[Mesa-dev] MSVC, MinGW build break

2015-10-16 Thread Brian Paul

Hi Rob,

Your recent commit "nir: remove dependency on glsl" broke the build for 
MSVC and MinGW.


For MSVC:

[...]
  Linking 
build\windows-x86-debug\gallium\tests\graw\occlusion-query.exe ...

  Linking build\windows-x86-debug\gallium\tests\graw\quad-sample.exe ...
glsl.lib(loop_controls.obj) : error LNK2001: unresolved external symbol 
"public: static struct glsl_type const * const glsl_type::bool_type" 
(?bool_type@glsl_type@@2QBU1@B)
glsl.lib(lower_discard_flow.obj) : error LNK2001: unresolved external 
symbol "public: static struct glsl_type const * const 
glsl_type::bool_type" (?bool_type@glsl_type@@2QBU1@B)

[...]
glsl.lib(builtin_functions.obj) : error LNK2001: unresolved external 
symbol "public: static struct glsl_type const * const 
glsl_type::float_type" (?float_type@glsl_type@@2QBU1@B)

[...]
mesa.lib(ff_fragment_shader.obj) : error LNK2001: unresolved external 
symbol "public: static struct glsl_type const * const 
glsl_type::sampler2DShadow_type" (?sampler2DShadow_type@glsl_type@@2QBU1@B)

[etc. for hundreds more]



For MinGW:
[...]
  Linking build/windows-x86-debug/gallium/tests/graw/fs-write-z.exe ...
  Linking build/windows-x86-debug/gallium/targets/osmesa/osmesa.dll ...
  Linking 
build/windows-x86-debug/gallium/targets/libgl-gdi/opengl32.dll ...
build/windows-x86-debug/mesa/libmesa.a(ir_to_mesa.o): In function 
`reladdr_to_temp':
/var/lib/hudson/jobs/mesa-mingw/workspace/src/mesa/program/ir_to_mesa.cpp:793: 
undefined reference to `glsl_type::vec4_type'

build/windows-x86-debug/mesa/libmesa.a(ir_to_mesa.o): In function `visit':
/var/lib/hudson/jobs/mesa-mingw/workspace/src/mesa/program/ir_to_mesa.cpp:1092: 
undefined reference to `glsl_type::vec4_type'
/var/lib/hudson/jobs/mesa-mingw/workspace/src/mesa/program/ir_to_mesa.cpp:1115: 
undefined reference to `glsl_type::vec4_type'
/var/lib/hudson/jobs/mesa-mingw/workspace/src/mesa/program/ir_to_mesa.cpp:1487: 
undefined reference to `glsl_type::float_type'
/var/lib/hudson/jobs/mesa-mingw/workspace/src/mesa/program/ir_to_mesa.cpp:1497: 
undefined reference to `glsl_type::float_type'
/var/lib/hudson/jobs/mesa-mingw/workspace/src/mesa/program/ir_to_mesa.cpp:1871: 
undefined reference to `glsl_type::vec4_type'
/var/lib/hudson/jobs/mesa-mingw/workspace/src/mesa/program/ir_to_mesa.cpp:1883: 
undefined reference to `glsl_type::vec4_type'



Hopefully it's something simple to fix.

-Brian
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 shader-db] report.py: rework and update for cycle info

2015-10-16 Thread Connor Abbott
Now that we have three separate things we want to measure (instructions,
cycles, and loops), it's impractical to keep adding special code for
changes in each thing. Instead, for each program in before and after we
store a table of measurement -> value, and when reporting we loop over
each measurement and report helped/hurt before reporting the gained/lost
programs.

v2:
- Swap loop count and cycle count to be compatible with older shader-db.
- Fix indentation.
- Report summary information (hurt/helped) last to make it easier to find.
- Squash in second commit to avoid duplicating the list of measurements
due to the above.
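
As a rough illustration of the per-measurement table described above (a minimal sketch, not the code in this patch): each program maps to a dict of measurement -> value, and a single loop body classifies helped/hurt for whichever measurement is asked for. The function name `classify` and the sample data are made up for the example.

```python
def classify(before, after, measurement):
    """Return (helped, hurt) program keys for one measurement."""
    helped, hurt = [], []
    for prog, b in before.items():
        a = after.get(prog)
        if a is None:
            continue  # program was lost; reported separately
        # If the loop count changed, loops may have been unrolled, so
        # the other measurements would be misleading for this program.
        if measurement != "loops" and b["loops"] != a["loops"]:
            continue
        if a[measurement] < b[measurement]:
            helped.append(prog)
        elif a[measurement] > b[measurement]:
            hurt.append(prog)
    return helped, hurt


# Hypothetical data, keyed by (program name, shader type).
before = {
    ("a", "FS SIMD8"): {"instructions": 10, "cycles": 100, "loops": 0},
    ("b", "VS vec4"): {"instructions": 20, "cycles": 200, "loops": 1},
    ("c", "FS SIMD16"): {"instructions": 30, "cycles": 300, "loops": 1},
}
after = {
    ("a", "FS SIMD8"): {"instructions": 8, "cycles": 120, "loops": 0},
    ("b", "VS vec4"): {"instructions": 20, "cycles": 150, "loops": 1},
    ("c", "FS SIMD16"): {"instructions": 50, "cycles": 250, "loops": 0},
}

for m in ["instructions", "cycles", "loops"]:
    helped, hurt = classify(before, after, m)
    print(m, "helped:", helped, "hurt:", hurt)
```

Adding a fourth measurement then only means adding a key to each table and a name to the list being looped over.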

Signed-off-by: Connor Abbott 
---
 report.py | 164 ++
 1 file changed, 89 insertions(+), 75 deletions(-)

diff --git a/report.py b/report.py
index 4c06714..409474a 100755
--- a/report.py
+++ b/report.py
@@ -10,17 +10,22 @@ def get_results(filename):
 
 results = {}
 
-re_match = re.compile(r"(\S+) - (.S \S+) shader: (\S*) inst, (\S*) loops")
+re_match = re.compile(r"(\S+) - (.S \S+) shader: (\S*) inst, (\S*) loops, (\S*) cycles")
 for line in lines:
 match = re.search(re_match, line)
 if match is None:
 continue
 
 groups = match.groups()
-count = int(groups[2])
-loop = int(groups[3])
-if count != 0:
-results[(groups[0], groups[1])] = count, loop
+inst_count = int(groups[2])
+loop_count = int(groups[3])
+cycle_count = int(groups[4])
+if inst_count != 0:
+results[(groups[0], groups[1])] = {
+"instructions": inst_count,
+"cycles": cycle_count,
+"loops": loop_count
+}
 
 return results
 
@@ -43,83 +48,92 @@ def get_result_string(p, b, a):
 p = p + ' '
 return p + change(b, a)
 
+def split_list(string):
+return string.split(",")
 
 def main():
 parser = argparse.ArgumentParser()
+parser.add_argument("--measurements", "-m", type=split_list,
+default=["instructions", "cycles", "loops"],
+help="comma-separated list of measurements to report")
 parser.add_argument("before", type=get_results, help="the output of the original code")
 parser.add_argument("after", type=get_results, help="the output of the new code")
 args = parser.parse_args()
 
-total_before = 0
-total_after = 0
-total_before_loop = 0
-total_after_loop = 0
-affected_before = 0
-affected_after = 0
+total_before = {}
+total_after = {}
+affected_before = {}
+affected_after = {}
+num_hurt = {}
+num_helped = {}
 
-helped = []
-hurt = []
-lost = []
-gained = []
-loop_change = []
-for p in args.before:
-(name, type) = p
-namestr = name + " " + type
-before_count = args.before[p][0]
-before_loop = args.before[p][1]
+for m in args.measurements:
+total_before[m] = 0
+total_after[m] = 0
+affected_before[m] = 0
+affected_after[m] = 0
+
+print m
+
+helped = []
+hurt = []
+for p in args.before:
+before_count = args.before[p][m]
+
+if args.after.get(p) is None:
+continue
 
-if args.after.get(p) is not None:
-after_count = args.after[p][0]
-after_loop = args.after[p][1]
+# If the number of loops changed, then we may have unrolled some
+# loops, in which case other measurements will be misleading.
+if m != "loops" and args.before[p]["loops"] != args.after[p]["loops"]:
+continue
 
-total_before_loop += before_loop
-total_after_loop += after_loop
+after_count = args.after[p][m]
 
-if before_loop == after_loop:
-total_before += before_count
-total_after += after_count
+total_before[m] += before_count
+total_after[m] += after_count
 
 if before_count != after_count:
-affected_before += before_count
-affected_after += after_count
+affected_before[m] += before_count
+affected_after[m] += after_count
 
-if after_loop != before_loop:
-loop_change.append(p);
-elif after_count > before_count:
+if after_count > before_count:
 hurt.append(p)
 else:
 helped.append(p)
-else:
-lost.append(namestr)
 
-for p in args.after:
-if args.before.get(p) is None:
-gained.append(p[0] + " " + p[1])
+helped.sort(
+key=lambda k: float(args.before[k][m] - args.after[k][m]) / args.before[k][m])
+for p in helped:
+namestr = p[0] + " " + p[1]
+print(m + " helpe

[Mesa-dev] [PATCH v2 4/6] i965: dump scheduling cycle estimates

2015-10-16 Thread Connor Abbott
The heuristic we're using is rather lame, since it assumes everything is
non-uniform and loops execute 10 times, but it should be enough for
measuring improvements in the scheduler that don't result in a change in
the number of instructions.

v2:
- Switch loops and cycle counts to be compatible with older shader-db.
- Make loop heuristic 10x to match with spilling code.
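
To make the heuristic concrete, here is a small model of it in Python (a sketch only; the real implementation is the C++ in the diff below). Blocks are represented as hypothetical `(first_opcode, last_opcode, cycle_count)` tuples, and the opcode strings are placeholders for BRW_OPCODE_DO / BRW_OPCODE_WHILE.

```python
def estimate_cycles(blocks, loop_weight=10):
    """Sum per-block cycle counts, weighting loop bodies by loop_weight.

    blocks: list of (first_opcode, last_opcode, cycle_count) tuples in
    program order. Nested loops multiply the weight.
    """
    total = 0
    multiplier = 1
    for first_op, last_op, cycles in blocks:
        if first_op == "do":          # entering a loop body
            multiplier *= loop_weight
        total += cycles * multiplier
        if last_op == "while":        # leaving a loop body
            multiplier //= loop_weight
    return total


# Straight-line block, a two-block loop, then another straight-line block:
# 5 + 10*2 + 10*3 + 4
single_loop = [("other", "other", 5), ("do", "other", 2),
               ("other", "while", 3), ("other", "other", 4)]
print(estimate_cycles(single_loop))   # 59

# A loop nested in a loop: 10*1 + 100*2 + 10*1
nested = [("do", "other", 1), ("do", "while", 2), ("other", "while", 1)]
print(estimate_cycles(nested))        # 220
```

Since every shader gets the same fixed weight, the totals are only meaningful for before/after comparisons of the same shader, not as absolute timings.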

Signed-off-by: Connor Abbott 
---
 src/mesa/drivers/dri/i965/brw_cfg.h  |  4 
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 11 ++-
 .../drivers/dri/i965/brw_schedule_instructions.cpp   | 20 
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  9 +
 4 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h b/src/mesa/drivers/dri/i965/brw_cfg.h
index a094917..d0bdb00 100644
--- a/src/mesa/drivers/dri/i965/brw_cfg.h
+++ b/src/mesa/drivers/dri/i965/brw_cfg.h
@@ -90,6 +90,8 @@ struct bblock_t {
struct exec_list parents;
struct exec_list children;
int num;
+
+   unsigned cycle_count;
 };
 
 static inline struct backend_instruction *
@@ -285,6 +287,8 @@ struct cfg_t {
int num_blocks;
 
bool idom_dirty;
+
+   unsigned cycle_count;
 };
 
 /* Note that this is implemented with a double for loop -- break will
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 17e19cf..3bb0e7d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -2180,9 +2180,9 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
 
if (unlikely(debug_flag)) {
   fprintf(stderr, "Native code for %s\n"
-  "SIMD%d shader: %d instructions. %d loops. %d:%d spills:fills. Promoted %u constants. Compacted %d to %d"
+  "SIMD%d shader: %d instructions. %d loops. %u cycles. %d:%d spills:fills. Promoted %u constants. Compacted %d to %d"
   " bytes (%.0f%%)\n",
-  shader_name, dispatch_width, before_size / 16, loop_count,
+  shader_name, dispatch_width, before_size / 16, loop_count, cfg->cycle_count,
   spill_count, fill_count, promoted_constants, before_size, after_size,
   100.0f * (before_size - after_size) / before_size);
 
@@ -2192,12 +2192,13 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
}
 
compiler->shader_debug_log(log_data,
-  "%s SIMD%d shader: %d inst, %d loops, "
+  "%s SIMD%d shader: %d inst, %d loops, %u cycles, "
   "%d:%d spills:fills, Promoted %u constants, "
   "compacted %d to %d bytes.\n",
   stage_abbrev, dispatch_width, before_size / 16,
-  loop_count, spill_count, fill_count,
-  promoted_constants, before_size, after_size);
+  loop_count, cfg->cycle_count, spill_count,
+  fill_count, promoted_constants, before_size,
+  after_size);
 
return start_offset;
 }
diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
index 1652261..e14d041 100644
--- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
@@ -1467,6 +1467,24 @@ instruction_scheduler::schedule_instructions(bblock_t *block)
if (block->end()->opcode == BRW_OPCODE_NOP)
   block->end()->remove(block);
assert(instructions_to_schedule == 0);
+
+   block->cycle_count = time;
+}
+
+static unsigned get_cycle_count(cfg_t *cfg)
+{
+   unsigned count = 0, multiplier = 1;
+   foreach_block(block, cfg) {
+  if (block->start()->opcode == BRW_OPCODE_DO)
+ multiplier *= 10; /* assume that loops execute ~10 times */
+
+  count += block->cycle_count * multiplier;
+
+  if (block->end()->opcode == BRW_OPCODE_WHILE)
+ multiplier /= 10;
+   }
+
+   return count;
 }
 
 void
@@ -1507,6 +1525,8 @@ instruction_scheduler::run(cfg_t *cfg)
   post_reg_alloc);
   bs->dump_instructions();
}
+
+   cfg->cycle_count = get_cycle_count(cfg);
 }
 
 void
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index dcacc90..8c926ec 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1656,10 +1656,10 @@ vec4_generator::generate_code(const cfg_t *cfg)
  fprintf(stderr, "Native code for %s program %d:\n", stage_name,
  prog->Id);
   }
-  fprintf(stderr, "%s vec4 shader: %d instructions. %d loops. Compacted %d to %d"
+  fprintf(stderr, "%s vec4 shader: %d instructions. %d loops. %u cycles. Compacted %d to %d"
 

[Mesa-dev] [PATCH] st/mesa: optimize 4-component ubyte glDrawPixels

2015-10-16 Thread Brian Paul
If we didn't find a gallium surface format that exactly matched the
glDrawPixels format/type combination, we used some other 32-bit packed
RGBA format and swizzled the whole image in the mesa texstore/format code.

That slow path can be avoided in some common cases by using the
pipe_sampler_view's swizzle terms to do the swizzling at texture sampling
time instead.

For now, only GL_RGBA/ubyte and GL_BGRA/ubyte combinations are supported.
In the future other formats and types like GL_UNSIGNED_INT_8_8_8_8 could
be added.

v2: fix incorrect swizzle setup (need to invert the tex format's swizzle)
---
 src/mesa/state_tracker/st_cb_drawpixels.c | 104 +++---
 1 file changed, 95 insertions(+), 9 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c
index 79fb9ec..000d4f2 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -395,15 +395,35 @@ make_texture(struct st_context *st,
* Note that the image is actually going to be upside down in
* the texture.  We deal with that with texcoords.
*/
-  success = _mesa_texstore(ctx, 2,   /* dims */
-   baseInternalFormat, /* baseInternalFormat */
-   mformat,  /* mesa_format */
-   transfer->stride, /* dstRowStride, bytes */
-   &dest,/* destSlices */
-   width, height, 1, /* size */
-   format, type, /* src format/type */
-   pixels,   /* data source */
-   unpack);
+  if ((format == GL_RGBA || format == GL_BGRA)
+  && type == GL_UNSIGNED_BYTE) {
+ /* Use a memcpy-based texstore to avoid software pixel swizzling.
+  * We'll do the necessary swizzling with the pipe_sampler_view to
+  * give much better performance.
+  * XXX in the future, expand this to accommodate more format and
+  * type combinations.
+  */
+ _mesa_memcpy_texture(ctx, 2,
+  mformat,  /* mesa_format */
+  transfer->stride, /* dstRowStride, bytes */
+  &dest,/* destSlices */
+  width, height, 1, /* size */
+  format, type, /* src format/type */
+  pixels,   /* data source */
+  unpack);
+ success = GL_TRUE;
+  }
+  else {
+ success = _mesa_texstore(ctx, 2,   /* dims */
+  baseInternalFormat, /* baseInternalFormat */
+  mformat,  /* mesa_format */
+  transfer->stride, /* dstRowStride, bytes */
+  &dest,/* destSlices */
+  width, height, 1, /* size */
+  format, type, /* src format/type */
+  pixels,   /* data source */
+  unpack);
+  }
 
   /* unmap */
   pipe_transfer_unmap(pipe, transfer);
@@ -958,6 +978,69 @@ clamp_size(struct pipe_context *pipe, GLsizei *width, GLsizei *height,
 
 
 /**
+ * Search the array of 4 swizzle components for the named component and return
+ * its position.
+ */
+static int
+search_swizzle(const unsigned char swizzle[4], unsigned component)
+{
+   int i;
+   for (i = 0; i < 4; i++) {
+  if (swizzle[i] == component)
+ return i;
+   }
+   assert(!"search_swizzle() failed");
+   return 0;
+}
+
+
+/**
+ * Set the sampler view's swizzle terms.  This is used to handle RGBA
+ * swizzling when the incoming image format isn't an exact match for
+ * the actual texture format.  For example, if we have glDrawPixels(
+ * GL_RGBA, GL_UNSIGNED_BYTE) and we chose the texture format
+ * PIPE_FORMAT_B8G8R8A8 then we can use the sampler view swizzle to
+ * avoid swizzling all the pixels in software in the texstore code.
+ */
+static void
+setup_sampler_swizzle(struct pipe_sampler_view *sv, GLenum format, GLenum type)
+{
+   if ((format == GL_RGBA || format == GL_BGRA) && type == GL_UNSIGNED_BYTE) {
+  const struct util_format_description *desc =
+ util_format_description(sv->texture->format);
+  unsigned c0, c1, c2, c3;
+
+  /* Every gallium driver supports at least one 32-bit packed RGBA format.
+   * We must have chosen one for (GL_RGBA, GL_UNSIGNED_BYTE).
+   */
+  assert(desc->block.bits == 32);
+
+  /* invert the format's swizzle to setup the sampler's swizzle */
+  if (format == GL_RGBA) {
+ c0 = UTIL_FORMAT_SWIZZLE_X;
+ c1 = UTIL_FORMAT_SWIZZLE_Y;
+ c2 = UTIL_FORMAT_SWIZZLE_Z;
+

Re: [Mesa-dev] [PATCH 4/4] st/mesa: optimize 4-component ubyte glDrawPixels

2015-10-16 Thread Brian Paul

On 10/16/2015 05:53 PM, Jose Fonseca wrote:

On 15/10/15 20:01, Brian Paul wrote:

If we didn't find a gallium surface format that exactly matched the
glDrawPixels format/type combination, we used some other 32-bit packed
RGBA format and swizzled the whole image in the mesa texstore/format
code.

That slow path can be avoided in some common cases by using the
pipe_sampler_view's swizzle terms to do the swizzling at texture sampling
time instead.

For now, only GL_RGBA/ubyte and GL_BGRA/ubyte combinations are supported.
In the future other formats and types like GL_UNSIGNED_INT_8_8_8_8 could
be added.
---
  src/mesa/state_tracker/st_cb_drawpixels.c | 73
+++
  1 file changed, 64 insertions(+), 9 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c
b/src/mesa/state_tracker/st_cb_drawpixels.c
index 05f6e6b..a135761 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -395,15 +395,35 @@ make_texture(struct st_context *st,
 * Note that the image is actually going to be upside down in
 * the texture.  We deal with that with texcoords.
 */
-  success = _mesa_texstore(ctx, 2,   /* dims */
-   baseInternalFormat, /*
baseInternalFormat */
-   mformat,  /* mesa_format */
-   transfer->stride, /* dstRowStride,
bytes */
-   &dest,/* destSlices */
-   width, height, 1, /* size */
-   format, type, /* src format/type */
-   pixels,   /* data source */
-   unpack);
+  if ((format == GL_RGBA || format == GL_BGRA)
+  && type == GL_UNSIGNED_BYTE) {
+ /* Use a memcpy-based texstore to avoid software pixel
swizzling.
+  * We'll do the necessary swizzling with the
pipe_sampler_view to
+  * give much better performance.
+  * XXX in the future, expand this to accommodate more format and
+  * type combinations.
+  */
+ _mesa_memcpy_texture(ctx, 2,
+  mformat,  /* mesa_format */
+  transfer->stride, /* dstRowStride,
bytes */
+  &dest,/* destSlices */
+  width, height, 1, /* size */
+  format, type, /* src format/type */
+  pixels,   /* data source */
+  unpack);
+ success = GL_TRUE;
+  }
+  else {
+ success = _mesa_texstore(ctx, 2,   /* dims */
+  baseInternalFormat, /*
baseInternalFormat */
+  mformat,  /* mesa_format */
+  transfer->stride, /* dstRowStride,
bytes */
+  &dest,/* destSlices */
+  width, height, 1, /* size */
+  format, type, /* src
format/type */
+  pixels,   /* data source */
+  unpack);
+  }

/* unmap */
pipe_transfer_unmap(pipe, transfer);
@@ -958,6 +978,38 @@ clamp_size(struct pipe_context *pipe, GLsizei
*width, GLsizei *height,


  /**
+ * Set the sampler view's swizzle terms.  This is used to handle RGBA
+ * swizzling when the incoming image format isn't an exact match for
+ * the actual texture format.  For example, if we have glDrawPixels(
+ * GL_RGBA, GL_UNSIGNED_BYTE) and we chose the texture format
+ * PIPE_FORMAT_B8G8R8A8 then we can use the sampler view swizzle to
+ * avoid swizzling all the pixels in software in the texstore code.
+ */
+static void
+setup_sampler_swizzle(struct pipe_sampler_view *sv, GLenum format,
GLenum type)
+{
+   if ((format == GL_RGBA || format == GL_BGRA) && type ==
GL_UNSIGNED_BYTE) {
+  const struct util_format_description *desc =
+ util_format_description(sv->texture->format);
+  /* Every gallium driver supports at least one 32-bit packed
RGBA format.
+   * We must have chosen one for (GL_RGBA, GL_UNSIGNED_BYTE).
+   */
+  assert(desc->block.bits == 32);
+  /* use the format's swizzle to setup the sampler swizzle */
+  sv->swizzle_r = desc->swizzle[0];
+  sv->swizzle_g = desc->swizzle[1];
+  sv->swizzle_b = desc->swizzle[2];
+  sv->swizzle_a = desc->swizzle[3];


I think it should be the other way around: the sampler view's swizzle
should _undo_ the format swizzle, not apply it again.

This indeed works for RGBA8_UNORM / BGRA8_UNORM, but by mere
coincidence.  It will fail for something like ABGR8_UNORM.
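
A small model of the point being made here, as a sketch under assumptions rather than the Gallium code: swizzles are plain index lists, `sample` and `invert` are hypothetical helpers, and the two layouts are illustrative permutations (one that is its own inverse, like the R/B swap, and one that is not).

```python
X, Y, Z, W = range(4)


def sample(texel, format_swizzle, view_swizzle):
    """Model a texture fetch followed by the sampler view swizzle."""
    # Format swizzle: which stored component feeds each of R, G, B, A.
    rgba = [texel[c] for c in format_swizzle]
    # View swizzle: which of those channels the shader sees as R, G, B, A.
    return [rgba[c] for c in view_swizzle]


def invert(swizzle):
    """Return the swizzle that undoes `swizzle` (its inverse permutation)."""
    return [swizzle.index(c) for c in (X, Y, Z, W)]


# Bytes as memcpy'd from a GL_RGBA / GL_UNSIGNED_BYTE image.
user_rgba = ["r", "g", "b", "a"]

swapped = [Z, Y, X, W]   # R and B swapped; applying it twice is a no-op
rotated = [Y, Z, W, X]   # hypothetical layout that is not self-inverse

# Re-applying the format swizzle only works when it is its own inverse.
print(sample(user_rgba, swapped, swapped))
print(sample(user_rgba, rotated, rotated))

# Inverting the format swizzle restores the user's order in both cases.
print(sample(user_rgba, swapped, invert(swapped)))
print(sample(user_rgba, rotated, invert(rotated)))
```

For the swapped layout both approaches give the same view swizzle, which is why applying the format swizzle directly appears to work there.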

If you don't want to deal with the swizzle inversion now, it might be
better to explicitly check that the texture->form

Re: [Mesa-dev] [PATCH 06/10] vbo: simplify some code in vbo_exec_debug_verts()

2015-10-16 Thread Brian Paul

On 10/16/2015 08:14 PM, Sinclair Yeh wrote:

On Fri, Oct 16, 2015 at 03:25:13PM -0600, Brian Paul wrote:

As before, use a new 'last_prim' pointer to simplify things.  Plus, add
some const qualifiers.
---
  src/mesa/vbo/vbo_exec_draw.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/mesa/vbo/vbo_exec_draw.c b/src/mesa/vbo/vbo_exec_draw.c
index 781991b..412ebb6 100644
--- a/src/mesa/vbo/vbo_exec_draw.c
+++ b/src/mesa/vbo/vbo_exec_draw.c
@@ -75,13 +75,13 @@ vbo_exec_debug_verts( struct vbo_exec_context *exec )
  static GLuint
  vbo_copy_vertices( struct vbo_exec_context *exec )
  {
-   GLuint nr = exec->vtx.prim[exec->vtx.prim_count-1].count;
+   struct _mesa_prim *last_prim = &exec->vtx.prim[exec->vtx.prim_count - 1];
+   const GLuint nr = last_prim->count;
 GLuint ovf, i;
-   GLuint sz = exec->vtx.vertex_size;
+   const GLuint sz = exec->vtx.vertex_size;
 fi_type *dst = exec->vtx.copied.buffer;
-   const fi_type *src = (exec->vtx.buffer_map +
- exec->vtx.prim[exec->vtx.prim_count-1].start *
- exec->vtx.vertex_size);
+   const fi_type *src =
+  exec->vtx.buffer_map + last_prim->start * exec->vtx.vertex_size;

Can use "sz" here ^


Yeah, and I can use it in one of the later patches too.

Plus, I just noticed that the function name in the commit msg is wrong too.



Otherwise, looks good to me.


Thanks.

-Brian






 switch (exec->ctx->Driver.CurrentExecPrimitive) {
 case GL_POINTS:
@@ -127,7 +127,7 @@ vbo_copy_vertices( struct vbo_exec_context *exec )
 case GL_TRIANGLE_STRIP:
/* no parity issue, but need to make sure the tri is not drawn twice */
if (nr & 1) {
-exec->vtx.prim[exec->vtx.prim_count-1].count--;
+last_prim->count--;
}
/* fallthrough */
 case GL_QUAD_STRIP:
--
1.9.1





Re: [Mesa-dev] [PATCH 06/10] vbo: simplify some code in vbo_exec_debug_verts()

2015-10-16 Thread Sinclair Yeh
On Fri, Oct 16, 2015 at 03:25:13PM -0600, Brian Paul wrote:
> As before, use a new 'last_prim' pointer to simplify things.  Plus, add
> some const qualifiers.
> ---
>  src/mesa/vbo/vbo_exec_draw.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/vbo/vbo_exec_draw.c b/src/mesa/vbo/vbo_exec_draw.c
> index 781991b..412ebb6 100644
> --- a/src/mesa/vbo/vbo_exec_draw.c
> +++ b/src/mesa/vbo/vbo_exec_draw.c
> @@ -75,13 +75,13 @@ vbo_exec_debug_verts( struct vbo_exec_context *exec )
>  static GLuint
>  vbo_copy_vertices( struct vbo_exec_context *exec )
>  {
> -   GLuint nr = exec->vtx.prim[exec->vtx.prim_count-1].count;
> +   struct _mesa_prim *last_prim = &exec->vtx.prim[exec->vtx.prim_count - 1];
> +   const GLuint nr = last_prim->count;
> GLuint ovf, i;
> -   GLuint sz = exec->vtx.vertex_size;
> +   const GLuint sz = exec->vtx.vertex_size;
> fi_type *dst = exec->vtx.copied.buffer;
> -   const fi_type *src = (exec->vtx.buffer_map +
> - exec->vtx.prim[exec->vtx.prim_count-1].start * 
> - exec->vtx.vertex_size);
> +   const fi_type *src =
> +  exec->vtx.buffer_map + last_prim->start * exec->vtx.vertex_size;
Can use "sz" here ^

Otherwise, looks good to me.

>  
> switch (exec->ctx->Driver.CurrentExecPrimitive) {
> case GL_POINTS:
> @@ -127,7 +127,7 @@ vbo_copy_vertices( struct vbo_exec_context *exec )
> case GL_TRIANGLE_STRIP:
>/* no parity issue, but need to make sure the tri is not drawn twice */
>if (nr & 1) {
> -  exec->vtx.prim[exec->vtx.prim_count-1].count--;
> +  last_prim->count--;
>}
>/* fallthrough */
> case GL_QUAD_STRIP:
> -- 
> 1.9.1
> 


Re: [Mesa-dev] [PATCH 1/4] st/mesa: use MAX3() instead of MAX2(MAX2) in draw_textured_quad()

2015-10-16 Thread Sinclair Yeh
1 and 3 look good to me.


On Thu, Oct 15, 2015 at 01:01:40PM -0600, Brian Paul wrote:
> ---
>  src/mesa/state_tracker/st_cb_drawpixels.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c 
> b/src/mesa/state_tracker/st_cb_drawpixels.c
> index 7e8633e..e4d3580 100644
> --- a/src/mesa/state_tracker/st_cb_drawpixels.c
> +++ b/src/mesa/state_tracker/st_cb_drawpixels.c
> @@ -667,7 +667,8 @@ draw_textured_quad(struct gl_context *ctx, GLint x, GLint 
> y, GLfloat z,
> /* user textures, plus the drawpix textures */
> if (fpv) {
>struct pipe_sampler_view *sampler_views[PIPE_MAX_SAMPLERS];
> -  uint num = MAX2(MAX2(fpv->drawpix_sampler, fpv->pixelmap_sampler) + 1,
> +  uint num = MAX3(fpv->drawpix_sampler + 1,
> +  fpv->pixelmap_sampler + 1,
>st->state.num_sampler_views[PIPE_SHADER_FRAGMENT]);
>  
>memcpy(sampler_views, st->state.sampler_views[PIPE_SHADER_FRAGMENT],
> -- 
> 1.9.1
> 


Re: [Mesa-dev] [PATCH 3/4] mesa: make memcpy_texture() non-static

2015-10-16 Thread Sinclair Yeh
Reviewed-by: Sinclair Yeh 

On Thu, Oct 15, 2015 at 01:01:42PM -0600, Brian Paul wrote:
> So that we can use it directly from the mesa/gallium state tracker.
> ---
>  src/mesa/main/texstore.c | 40 
>  src/mesa/main/texstore.h | 11 +++
>  2 files changed, 31 insertions(+), 20 deletions(-)
> 
> diff --git a/src/mesa/main/texstore.c b/src/mesa/main/texstore.c
> index e50964e..4b13c42 100644
> --- a/src/mesa/main/texstore.c
> +++ b/src/mesa/main/texstore.c
> @@ -97,16 +97,16 @@ static const GLubyte map_1032[6] = { 1, 0, 3, 2, ZERO, 
> ONE };
>   * No pixel transfer operations or special texel encodings allowed.
>   * 1D, 2D and 3D images supported.
>   */
> -static void
> -memcpy_texture(struct gl_context *ctx,
> -GLuint dimensions,
> -   mesa_format dstFormat,
> -   GLint dstRowStride,
> -   GLubyte **dstSlices,
> -   GLint srcWidth, GLint srcHeight, GLint srcDepth,
> -   GLenum srcFormat, GLenum srcType,
> -   const GLvoid *srcAddr,
> -   const struct gl_pixelstore_attrib *srcPacking)
> +void
> +_mesa_memcpy_texture(struct gl_context *ctx,
> + GLuint dimensions,
> + mesa_format dstFormat,
> + GLint dstRowStride,
> + GLubyte **dstSlices,
> + GLint srcWidth, GLint srcHeight, GLint srcDepth,
> + GLenum srcFormat, GLenum srcType,
> + const GLvoid *srcAddr,
> + const struct gl_pixelstore_attrib *srcPacking)
>  {
> const GLint srcRowStride = _mesa_image_row_stride(srcPacking, srcWidth,
>   srcFormat, srcType);
> @@ -296,11 +296,11 @@ _mesa_texstore_ycbcr(TEXSTORE_PARAMS)
> assert(baseInternalFormat == GL_YCBCR_MESA);
>  
> /* always just memcpy since no pixel transfer ops apply */
> -   memcpy_texture(ctx, dims,
> -  dstFormat,
> -  dstRowStride, dstSlices,
> -  srcWidth, srcHeight, srcDepth, srcFormat, srcType,
> -  srcAddr, srcPacking);
> +   _mesa_memcpy_texture(ctx, dims,
> +dstFormat,
> +dstRowStride, dstSlices,
> +srcWidth, srcHeight, srcDepth, srcFormat, srcType,
> +srcAddr, srcPacking);
>  
> /* Check if we need byte swapping */
> /* XXX the logic here _might_ be wrong */
> @@ -899,11 +899,11 @@ _mesa_texstore_memcpy(TEXSTORE_PARAMS)
>return GL_FALSE;
> }
>  
> -   memcpy_texture(ctx, dims,
> -  dstFormat,
> -  dstRowStride, dstSlices,
> -  srcWidth, srcHeight, srcDepth, srcFormat, srcType,
> -  srcAddr, srcPacking);
> +   _mesa_memcpy_texture(ctx, dims,
> +dstFormat,
> +dstRowStride, dstSlices,
> +srcWidth, srcHeight, srcDepth, srcFormat, srcType,
> +srcAddr, srcPacking);
> return GL_TRUE;
>  }
>  /**
> diff --git a/src/mesa/main/texstore.h b/src/mesa/main/texstore.h
> index 2c974f7..f08dc08 100644
> --- a/src/mesa/main/texstore.h
> +++ b/src/mesa/main/texstore.h
> @@ -74,6 +74,17 @@ _mesa_texstore_needs_transfer_ops(struct gl_context *ctx,
>GLenum baseInternalFormat,
>mesa_format dstFormat);
>  
> +extern void
> +_mesa_memcpy_texture(struct gl_context *ctx,
> + GLuint dimensions,
> + mesa_format dstFormat,
> + GLint dstRowStride,
> + GLubyte **dstSlices,
> + GLint srcWidth, GLint srcHeight, GLint srcDepth,
> + GLenum srcFormat, GLenum srcType,
> + const GLvoid *srcAddr,
> + const struct gl_pixelstore_attrib *srcPacking);
> +
>  extern GLboolean
>  _mesa_texstore_can_use_memcpy(struct gl_context *ctx,
>GLenum baseInternalFormat, mesa_format 
> dstFormat,
> -- 
> 1.9.1
> 


Re: [Mesa-dev] [PATCH] st/mesa: check for out-of-memory in st_DrawPixels()

2015-10-16 Thread Jose Fonseca

On 16/10/15 23:24, Brian Paul wrote:

Before, if make_texture() or st_create_texture_sampler_view() failed
we silently no-op'd the glDrawPixels.  Now, set GL_OUT_OF_MEMORY.
This also allows us to un-nest a bunch of code.

v2: also check if allocation of sv[1] fails, per Jose.
---
  src/mesa/state_tracker/st_cb_drawpixels.c | 76 ++-
  1 file changed, 44 insertions(+), 32 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c 
b/src/mesa/state_tracker/st_cb_drawpixels.c
index e4d3580..79fb9ec 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -975,6 +975,7 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y,
 int num_sampler_view = 1;
 struct gl_pixelstore_attrib clippedUnpack;
 struct st_fp_variant *fpv = NULL;
+   struct pipe_resource *pt;

 /* Mesa state should be up to date by now */
 assert(ctx->NewState == 0x0);
@@ -1030,42 +1031,53 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y,
st_upload_constants(st, fpv->parameters, PIPE_SHADER_FRAGMENT);
 }

-   /* draw with textured quad */
-   {
-  struct pipe_resource *pt
- = make_texture(st, width, height, format, type, unpack, pixels);
-  if (pt) {
- sv[0] = st_create_texture_sampler_view(st->pipe, pt);
-
- if (sv[0]) {
-/* Create a second sampler view to read stencil.
- * The stencil is written using the shader stencil export
- * functionality. */
-if (write_stencil) {
-   enum pipe_format stencil_format =
- util_format_stencil_only(pt->format);
-   /* we should not be doing pixel map/transfer (see above) */
-   assert(num_sampler_view == 1);
-   sv[1] = st_create_texture_sampler_view_format(st->pipe, pt,
- stencil_format);
-   num_sampler_view++;
-}
+   /* Put glDrawPixels image into a texture */
+   pt = make_texture(st, width, height, format, type, unpack, pixels);
+   if (!pt) {
+  _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels");
+  return;
+   }

-draw_textured_quad(ctx, x, y, ctx->Current.RasterPos[2],
-   width, height,
-   ctx->Pixel.ZoomX, ctx->Pixel.ZoomY,
-   sv,
-   num_sampler_view,
-   driver_vp,
-   driver_fp, fpv,
-   color, GL_FALSE, write_depth, write_stencil);
-pipe_sampler_view_reference(&sv[0], NULL);
-if (num_sampler_view > 1)
-   pipe_sampler_view_reference(&sv[1], NULL);
- }
+   /* create sampler view for the image */
+   sv[0] = st_create_texture_sampler_view(st->pipe, pt);
+   if (!sv[0]) {
+  _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels");
+  pipe_resource_reference(&pt, NULL);
+  return;
+   }
+
+   /* Create a second sampler view to read stencil.  The stencil is
+* written using the shader stencil export functionality.
+*/
+   if (write_stencil) {
+  enum pipe_format stencil_format =
+ util_format_stencil_only(pt->format);
+  /* we should not be doing pixel map/transfer (see above) */
+  assert(num_sampler_view == 1);
+  sv[1] = st_create_texture_sampler_view_format(st->pipe, pt,
+stencil_format);
+  if (!sv[1]) {
+ _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels");
   pipe_resource_reference(&pt, NULL);
+ pipe_sampler_view_reference(&sv[0], NULL);
+ return;
}
+  num_sampler_view++;
 }
+
+   draw_textured_quad(ctx, x, y, ctx->Current.RasterPos[2],
+  width, height,
+  ctx->Pixel.ZoomX, ctx->Pixel.ZoomY,
+  sv,
+  num_sampler_view,
+  driver_vp,
+  driver_fp, fpv,
+  color, GL_FALSE, write_depth, write_stencil);
+   pipe_sampler_view_reference(&sv[0], NULL);
+   if (num_sampler_view > 1)
+  pipe_sampler_view_reference(&sv[1], NULL);
+
+   pipe_resource_reference(&pt, NULL);
  }





Looks good. Patch 1-3 of this series is

Reviewed-by: Jose Fonseca 



Re: [Mesa-dev] [PATCH 4/4] st/mesa: optimize 4-component ubyte glDrawPixels

2015-10-16 Thread Jose Fonseca

On 15/10/15 20:01, Brian Paul wrote:

If we didn't find a gallium surface format that exactly matched the
glDrawPixels format/type combination, we used some other 32-bit packed
RGBA format and swizzled the whole image in the mesa texstore/format code.

That slow path can be avoided in some common cases by using the
pipe_sampler_view's swizzle terms to do the swizzling at texture sampling
time instead.

For now, only GL_RGBA/ubyte and GL_BGRA/ubyte combinations are supported.
In the future other formats and types like GL_UNSIGNED_INT_8_8_8_8 could
be added.
---
  src/mesa/state_tracker/st_cb_drawpixels.c | 73 +++
  1 file changed, 64 insertions(+), 9 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c 
b/src/mesa/state_tracker/st_cb_drawpixels.c
index 05f6e6b..a135761 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -395,15 +395,35 @@ make_texture(struct st_context *st,
 * Note that the image is actually going to be upside down in
 * the texture.  We deal with that with texcoords.
 */
-  success = _mesa_texstore(ctx, 2,   /* dims */
-   baseInternalFormat, /* baseInternalFormat */
-   mformat,  /* mesa_format */
-   transfer->stride, /* dstRowStride, bytes */
-   &dest,/* destSlices */
-   width, height, 1, /* size */
-   format, type, /* src format/type */
-   pixels,   /* data source */
-   unpack);
+  if ((format == GL_RGBA || format == GL_BGRA)
+  && type == GL_UNSIGNED_BYTE) {
+ /* Use a memcpy-based texstore to avoid software pixel swizzling.
+  * We'll do the necessary swizzling with the pipe_sampler_view to
+  * give much better performance.
+  * XXX in the future, expand this to accommodate more format and
+  * type combinations.
+  */
+ _mesa_memcpy_texture(ctx, 2,
+  mformat,  /* mesa_format */
+  transfer->stride, /* dstRowStride, bytes */
+  &dest,/* destSlices */
+  width, height, 1, /* size */
+  format, type, /* src format/type */
+  pixels,   /* data source */
+  unpack);
+ success = GL_TRUE;
+  }
+  else {
+ success = _mesa_texstore(ctx, 2,   /* dims */
+  baseInternalFormat, /* baseInternalFormat */
+  mformat,  /* mesa_format */
+  transfer->stride, /* dstRowStride, bytes */
+  &dest,/* destSlices */
+  width, height, 1, /* size */
+  format, type, /* src format/type */
+  pixels,   /* data source */
+  unpack);
+  }

/* unmap */
pipe_transfer_unmap(pipe, transfer);
@@ -958,6 +978,38 @@ clamp_size(struct pipe_context *pipe, GLsizei *width, 
GLsizei *height,


  /**
+ * Set the sampler view's swizzle terms.  This is used to handle RGBA
+ * swizzling when the incoming image format isn't an exact match for
+ * the actual texture format.  For example, if we have glDrawPixels(
+ * GL_RGBA, GL_UNSIGNED_BYTE) and we chose the texture format
+ * PIPE_FORMAT_B8G8R8A8 then we can use the sampler view swizzle to
+ * avoid swizzling all the pixels in software in the texstore code.
+ */
+static void
+setup_sampler_swizzle(struct pipe_sampler_view *sv, GLenum format, GLenum type)
+{
+   if ((format == GL_RGBA || format == GL_BGRA) && type == GL_UNSIGNED_BYTE) {
+  const struct util_format_description *desc =
+ util_format_description(sv->texture->format);
+  /* Every gallium driver supports at least one 32-bit packed RGBA format.
+   * We must have chosen one for (GL_RGBA, GL_UNSIGNED_BYTE).
+   */
+  assert(desc->block.bits == 32);
+  /* use the format's swizzle to setup the sampler swizzle */
+  sv->swizzle_r = desc->swizzle[0];
+  sv->swizzle_g = desc->swizzle[1];
+  sv->swizzle_b = desc->swizzle[2];
+  sv->swizzle_a = desc->swizzle[3];


I think it should be the other way around: the sampler view's swizzle 
should _undo_ the format swizzle, not apply it again.


This indeed works for RGBA8_UNORM / BGRA8_UNORM, but by mere
coincidence.  It will fail for something like ABGR8_UNORM.


If you don't want to deal with the swizzle inversion now, it might be
better to explicitly check that the texture->format is RGBA8_UNORM /
BGRA8_UNORM.
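
To make the inversion concrete, here is a minimal standalone sketch. It does not use the gallium headers: the `swz` enum, `invert_swizzle()` and `apply_swizzle()` are hypothetical names invented for this illustration, and it assumes the format swizzle is a pure permutation of X/Y/Z/W with no constant 0/1 selectors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the four component selectors.  These are NOT
 * the gallium definitions, only illustrative values. */
enum swz { SWZ_X = 0, SWZ_Y = 1, SWZ_Z = 2, SWZ_W = 3 };

/* dst[i] = src[swz[i]]: read each output channel from the selected input. */
static void
apply_swizzle(const unsigned char src[4], const unsigned char swz[4],
              unsigned char dst[4])
{
   for (int i = 0; i < 4; i++) {
      assert(swz[i] <= SWZ_W);
      dst[i] = src[swz[i]];
   }
}

/*
 * fmt[i] names the stored component that supplies logical channel i
 * (0 = R, 1 = G, 2 = B, 3 = A).  On success inv[] is the permutation that
 * undoes fmt[], i.e. inv[fmt[i]] == i for every i.
 * Returns false if fmt[] is not a permutation of X, Y, Z, W.
 */
static bool
invert_swizzle(const unsigned char fmt[4], unsigned char inv[4])
{
   bool seen[4] = { false, false, false, false };

   for (int i = 0; i < 4; i++) {
      if (fmt[i] > SWZ_W || seen[fmt[i]])
         return false;
      seen[fmt[i]] = true;
      inv[fmt[i]] = (unsigned char) i;
   }
   return true;
}
```

A swap such as {Z, Y, X, W} is its own inverse, so applying it a second time happens to restore the original order. A rotation such as {Y, Z, W, X} (picked here only as an example) is not, and only the result of `invert_swizzle()` restores the order in that case.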


Jos

Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments

2015-10-16 Thread Chad Versace
On Tue 13 Oct 2015, Ben Widawsky wrote:
> The impetus for this patch comes from a seemingly benign statement within the
> spec (quoted within the patch). For me, this patch was at some point critical
> for getting stable piglit results (though this did not seem to be the case on a
> branch Chad was working on).
> 
> It is very important for clearing multiple color buffer attachments and can be
> observed in the following piglit tests:
> spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
> spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
> 
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 
> +
>  1 file changed, 84 insertions(+), 13 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index 7bf52f0..9e6711e 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable)
> brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM;
>  }
>  
> +/**
> + * Individually fast clear each color buffer attachment. On previous gens 
> this
> + * isn't required. The motivation for this comes from one line (which seems 
> to
> + * be specific to SKL+). The list item is in section titled _MCS Buffer for
> + * Render Target(s)_
> + *
> + *   "Since only one RT is bound with a clear pass, only one RT can be 
> cleared
> + *   at a time. To clear multiple RTs, multiple clear passes are required."
> + *
> + * The code follows the same idea as the resolve code which creates a fake 
> FBO
> + * to avoid interfering with too much of the GL state.
> + */
> +static void
> +fast_clear_attachments(struct brw_context *brw,
> +   struct gl_framebuffer *fb,
> +   uint32_t fast_clear_buffers,
> +   struct rect fast_clear_rect)
> +{
> +   assert(brw->gen >= 9);
> +   struct gl_context *ctx = &brw->ctx;
> +
> +   GLuint old_fb = ctx->DrawBuffer->Name;
> +
> +   for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) {
> +  struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf];
> +  struct intel_renderbuffer *irb = intel_renderbuffer(rb);
> +  GLuint fbo, rbo;
> +  int index = fb->_ColorDrawBufferIndexes[buf];
> +
> +  if (!((1 << index) & fast_clear_buffers))
> + continue;
> +
> +  _mesa_GenFramebuffers(1, &fbo);
> +  rbo = brw_get_rb_for_slice(brw, irb->mt, 0, 0, false);

Hard-coding level=0 layer=0 here makes me uncomfortable. We'll have to
fix it later when implementing fast clears for level > 0 and layer > 0
on gen >= 8.

Anyway, there's no need to create a new renderbuffer. You should instead
extract the existing gl_renderbuffer or gl_texture from
gl_framebuffer::Attachments[buf]. If you do that, then there's an added
bonus: the hard-coded level=0 layer=0 disappears.

> +
> +  _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
> +  _mesa_FramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER,
> +GL_COLOR_ATTACHMENT0,
> +GL_RENDERBUFFER, rbo);
> +  _mesa_DrawBuffer(GL_COLOR_ATTACHMENT0);

There's no need to create a new framebuffer on each iteration of the
loop. You should create one framebuffer, before the loop, then reuse it
for each iteration.

Well... if this function were using real GL, not metainsanity, then
there would be no need to create a new renderbuffer and framebuffer for
each loop iteration. Meta, due to some insane reason, may require you to
create a new framebuffer and renderbuffer, but I doubt it.
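
A rough sketch of the loop shape being suggested, with the framebuffer created once and reused per iteration. Everything here is a hypothetical stand-in: the `fake_*` helpers only count calls and are not Mesa or GL entry points, and whether meta actually permits this reuse is exactly the open question above.

```c
#include <assert.h>
#include <stdint.h>

/* Call counters, so the control flow can be checked without a GL context. */
struct clear_stats {
   unsigned fbo_creates;
   unsigned fbo_deletes;
   unsigned attaches;
   unsigned clears;
};

/* Hypothetical stand-ins for framebuffer create/delete, attach and clear. */
static unsigned
fake_create_fbo(struct clear_stats *s)
{
   s->fbo_creates++;
   return 1;
}

static void
fake_delete_fbo(struct clear_stats *s, unsigned fbo)
{
   (void) fbo;
   s->fbo_deletes++;
}

static void
fake_attach(struct clear_stats *s, unsigned fbo, unsigned buf)
{
   (void) fbo;
   (void) buf;
   s->attaches++;
}

static void
fake_clear(struct clear_stats *s, unsigned fbo)
{
   (void) fbo;
   s->clears++;
}

/* Clear every buffer whose bit is set in mask, one buffer per pass. */
static void
clear_selected_buffers(struct clear_stats *s, unsigned num_bufs, uint32_t mask)
{
   assert(num_bufs <= 32);

   /* One temporary framebuffer for the whole loop. */
   unsigned fbo = fake_create_fbo(s);

   for (unsigned buf = 0; buf < num_bufs; buf++) {
      if (!((UINT32_C(1) << buf) & mask))
         continue;

      /* Only the attachment changes between passes. */
      fake_attach(s, fbo, buf);
      fake_clear(s, fbo);
   }

   fake_delete_fbo(s, fbo);
}
```

With a mask selecting two of eight buffers, this performs two attach/clear passes but only one create/delete pair.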

> +
> +  brw_fast_clear_init(brw);
> +
> +  use_rectlist(brw, true);
> +
> +  brw_bind_rep_write_shader(brw, (float *) fast_clear_color);
> +
> +  /* SKL+ also has a resolve mode for compressed render targets and thus 
> more
> +   * bits to let us select the type of resolve.  For fast clear 
> resolves, it
> +   * turns out we can use the same value as pre-SKL though.
> +   */
> +  set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE);
> +  brw_draw_rectlist(ctx, &fast_clear_rect, MAX2(1, fb->MaxNumLayers));
> +  set_fast_clear_op(brw, 0);
> +  use_rectlist(brw, false);
> +
> +  _mesa_DeleteRenderbuffers(1, &rbo);
> +  _mesa_DeleteFramebuffers(1, &fbo);
> +
> +  /* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll
> +   * resolve them eventually.
> +   */
> +  irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
> +   }
> +
> +   _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, old_fb);
> +}
> +
>  bool
>  brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb,
>  GLbitfield buffers, bool partial_clear)
> @@ -600,12 +668,27 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
> gl_framebuffer *fb,
> use_rectlist(brw, true);
> 

Re: [Mesa-dev] [PATCH 2/7] st/va: properly defines VAImageFormat formats and improve VaCreateImage

2015-10-16 Thread Ilia Mirkin
Not sure how VA specifies things, but if the RGBA8 stuff is supposed
to be in CPU-endian as packed 32-bit ints, I think you're meant to use
PIPE_FORMAT_RGBA_UNORM and so on. However if it's always supposed
to be little-endian or array-based, then the way you have it is fine.
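
To illustrate the distinction (a standalone sketch, not VA or gallium code; the function names and the choice of R in the low byte are assumptions made for the example): a packed 32-bit value has a fixed meaning per bit range, but its byte order in memory follows the host, whereas an array-based layout fixes the byte order regardless of the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack four 8-bit channels into one 32-bit word, R in the low 8 bits.
 * The channel order is an assumption made for this example. */
static uint32_t
pack_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
   return (uint32_t) r | ((uint32_t) g << 8) |
          ((uint32_t) b << 16) | ((uint32_t) a << 24);
}

/* Store in host byte order: the byte sequence depends on the CPU. */
static void
store_native(uint32_t word, uint8_t out[4])
{
   memcpy(out, &word, sizeof word);
}

/* Store explicitly little-endian: out[0] is always the low byte. */
static void
store_le(uint32_t word, uint8_t out[4])
{
   out[0] = (uint8_t) (word & 0xff);
   out[1] = (uint8_t) ((word >> 8) & 0xff);
   out[2] = (uint8_t) ((word >> 16) & 0xff);
   out[3] = (uint8_t) ((word >> 24) & 0xff);
}

/* Returns 1 when the least significant byte is stored first. */
static int
host_is_little_endian(void)
{
   const uint16_t probe = 1;
   uint8_t first;

   memcpy(&first, &probe, 1);
   return first == 1;
}

/* The explicit little-endian store is array order R, G, B, A on any host. */
static void
check_le_layout(void)
{
   uint8_t bytes[4];

   store_le(pack_rgba(0x11, 0x22, 0x33, 0x44), bytes);
   assert(bytes[0] == 0x11 && bytes[1] == 0x22);
   assert(bytes[2] == 0x33 && bytes[3] == 0x44);
}
```

On a little-endian host `store_native()` and `store_le()` produce the same bytes; on a big-endian host they differ, which is why the choice between a CPU-endian packed format and an array-based one matters.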

  -ilia

On Fri, Oct 16, 2015 at 7:14 PM, Julien Isorce  wrote:
> Also add RGBA, RGBX and BGRX.
> Also extend ChromaToPipe and implement PipeToYCbCr.
>
> Note that gstreamer-vaapi checks all the VAImageFormat fields.
>
> Signed-off-by: Julien Isorce 
> ---
>  src/gallium/state_trackers/va/image.c  | 10 ++--
>  src/gallium/state_trackers/va/va_private.h | 38 
> +-
>  2 files changed, 45 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/state_trackers/va/image.c 
> b/src/gallium/state_trackers/va/image.c
> index 022240d..c7fbe1a 100644
> --- a/src/gallium/state_trackers/va/image.c
> +++ b/src/gallium/state_trackers/va/image.c
> @@ -44,7 +44,10 @@ static const VAImageFormat 
> formats[VL_VA_MAX_IMAGE_FORMATS] =
> {VA_FOURCC('Y','V','1','2')},
> {VA_FOURCC('Y','U','Y','V')},
> {VA_FOURCC('U','Y','V','Y')},
> -   {VA_FOURCC('B','G','R','A')}
> +   {.fourcc = VA_FOURCC('B','G','R','A'), .byte_order = VA_LSB_FIRST, 32, 
> 32, 0x00ff, 0xff00, 0x00ff, 0xff00},
> +   {.fourcc = VA_FOURCC('R','G','B','A'), .byte_order = VA_LSB_FIRST, 32, 
> 32, 0x00ff, 0xff00, 0x00ff, 0xff00},
> +   {.fourcc = VA_FOURCC('B','G','R','X'), .byte_order = VA_LSB_FIRST, 32, 
> 24, 0x00ff, 0xff00, 0x00ff, 0x},
> +   {.fourcc = VA_FOURCC('R','G','B','X'), .byte_order = VA_LSB_FIRST, 32, 
> 24, 0x00ff, 0xff00, 0x00ff, 0x}
>  };
>
>  static void
> @@ -116,7 +119,7 @@ vlVaCreateImage(VADriverContextP ctx, VAImageFormat 
> *format, int width, int heig
> img->width = width;
> img->height = height;
> w = align(width, 2);
> -   h = align(width, 2);
> +   h = align(height, 2);
>
> switch (format->fourcc) {
> case VA_FOURCC('N','V','1','2'):
> @@ -149,6 +152,9 @@ vlVaCreateImage(VADriverContextP ctx, VAImageFormat 
> *format, int width, int heig
>break;
>
> case VA_FOURCC('B','G','R','A'):
> +   case VA_FOURCC('R','G','B','A'):
> +   case VA_FOURCC('B','G','R','X'):
> +   case VA_FOURCC('R','G','B','X'):
>img->num_planes = 1;
>img->pitches[0] = w * 4;
>img->offsets[0] = 0;
> diff --git a/src/gallium/state_trackers/va/va_private.h 
> b/src/gallium/state_trackers/va/va_private.h
> index 1ea7be7..3479156 100644
> --- a/src/gallium/state_trackers/va/va_private.h
> +++ b/src/gallium/state_trackers/va/va_private.h
> @@ -46,7 +46,7 @@
>  #define VL_VA_DRIVER(ctx) ((vlVaDriver *)ctx->pDriverData)
>  #define VL_VA_PSCREEN(ctx) (VL_VA_DRIVER(ctx)->vscreen->pscreen)
>
> -#define VL_VA_MAX_IMAGE_FORMATS 6
> +#define VL_VA_MAX_IMAGE_FORMATS 9
>
>  static inline enum pipe_video_chroma_format
>  ChromaToPipe(int format)
> @@ -58,6 +58,8 @@ ChromaToPipe(int format)
>return PIPE_VIDEO_CHROMA_FORMAT_422;
> case VA_RT_FORMAT_YUV444:
>return PIPE_VIDEO_CHROMA_FORMAT_444;
> +   case VA_RT_FORMAT_RGB32:
> +   return 0;
> default:
>assert(0);
>return PIPE_VIDEO_CHROMA_FORMAT_420;
> @@ -80,12 +82,46 @@ YCbCrToPipe(unsigned format)
>return PIPE_FORMAT_UYVY;
> case VA_FOURCC('B','G','R','A'):
>return PIPE_FORMAT_B8G8R8A8_UNORM;
> +   case VA_FOURCC('R','G','B','A'):
> +  return PIPE_FORMAT_R8G8B8A8_UNORM;
> +   case VA_FOURCC('B','G','R','X'):
> +  return PIPE_FORMAT_B8G8R8X8_UNORM;
> +   case VA_FOURCC('R','G','B','X'):
> +  return PIPE_FORMAT_R8G8B8X8_UNORM;
> default:
>assert(0);
>return PIPE_FORMAT_NONE;
> }
>  }
>
> +static inline unsigned
> +PipeToYCbCr(enum pipe_format p_format)
> +{
> +   switch (p_format) {
> +   case PIPE_FORMAT_NV12:
> +  return VA_FOURCC('N','V','1','2');
> +   case PIPE_FORMAT_IYUV:
> +  return VA_FOURCC('I','4','2','0');
> +   case PIPE_FORMAT_YV12:
> +  return VA_FOURCC('Y','V','1','2');
> +   case PIPE_FORMAT_UYVY:
> +  return VA_FOURCC('U','Y','V','Y');
> +   case PIPE_FORMAT_YUYV:
> +  return VA_FOURCC('Y','U','Y','V');
> +   case PIPE_FORMAT_B8G8R8A8_UNORM:
> +  return VA_FOURCC('B','G','R','A');
> +   case PIPE_FORMAT_R8G8B8A8_UNORM:
> +  return VA_FOURCC('R','G','B','A');
> +   case PIPE_FORMAT_B8G8R8X8_UNORM:
> +  return VA_FOURCC('B','G','R','X');
> +   case PIPE_FORMAT_R8G8B8X8_UNORM:
> +  return VA_FOURCC('R','G','B','X');
> +   default:
> +  assert(0);
> +  return -1;
> +   }
> +}
> +
>  static inline VAProfile
>  PipeToProfile(enum pipe_video_profile profile)
>  {
> --
> 1.9.1
>

[Mesa-dev] [PATCH 7/7] st/va: add headless support, i.e. VA_DISPLAY_DRM

2015-10-16 Thread Julien Isorce
This patch allows using gallium vaapi without requiring
an X server to be running for your second graphics card.

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/Makefile.am |  9 ++
 src/gallium/state_trackers/va/context.c   | 49 +++
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/src/gallium/state_trackers/va/Makefile.am 
b/src/gallium/state_trackers/va/Makefile.am
index 2a93a90..348cfe1 100644
--- a/src/gallium/state_trackers/va/Makefile.am
+++ b/src/gallium/state_trackers/va/Makefile.am
@@ -30,6 +30,15 @@ AM_CFLAGS = \
$(VA_CFLAGS) \
-DVA_DRIVER_INIT_FUNC="__vaDriverInit_$(VA_MAJOR)_$(VA_MINOR)"
 
+AM_CFLAGS += \
+   $(GALLIUM_PIPE_LOADER_DEFINES) \
+   -DPIPE_SEARCH_DIR=\"$(libdir)/gallium-pipe\"
+
+if HAVE_GALLIUM_STATIC_TARGETS
+AM_CFLAGS += \
+   -DGALLIUM_STATIC_TARGETS=1
+endif
+
 AM_CPPFLAGS = \
-I$(top_srcdir)/include
 
diff --git a/src/gallium/state_trackers/va/context.c 
b/src/gallium/state_trackers/va/context.c
index ddc863b..9ab2710 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -28,7 +28,8 @@
 
 #include "pipe/p_screen.h"
 #include "pipe/p_video_codec.h"
-
+#include "pipe-loader/pipe_loader.h"
+#include "state_tracker/drm_driver.h"
 #include "util/u_memory.h"
 #include "util/u_handle_table.h"
 #include "util/u_video.h"
@@ -98,7 +99,7 @@ static struct VADriverVTableVPP vtable_vpp =
 PUBLIC VAStatus
 VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
 {
-   vlVaDriver *drv;
+   vlVaDriver *drv = NULL;
 
if (!ctx)
   return VA_STATUS_ERROR_INVALID_CONTEXT;
@@ -107,8 +108,40 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
if (!drv)
   return VA_STATUS_ERROR_ALLOCATION_FAILED;
 
-   drv->vscreen = vl_screen_create(ctx->native_dpy, ctx->x11_screen);
-   if (!drv->vscreen)
+   drv->vscreen = NULL;
+
+   switch (ctx->display_type) {
+   case VA_DISPLAY_X11:
+  drv->vscreen = vl_screen_create(ctx->native_dpy, ctx->x11_screen);
+  if (!drv->vscreen)
+ goto error_screen;
+  break;
+
+   case VA_DISPLAY_DRM:
+   case VA_DISPLAY_DRM_RENDERNODES: {
+  struct drm_state *drm_info = (struct drm_state *) ctx->drm_state;
+  if (!drm_info)
+ goto error_screen;
+
+  drv->vscreen = CALLOC_STRUCT(vl_screen);
+
+#if GALLIUM_STATIC_TARGETS
+  drv->vscreen->pscreen = dd_create_screen(drm_info->fd);
+#else
+  int loader_fd = dup(drm_info->fd);
+  if (loader_fd == -1)
+ goto error_screen;
+
+  if (pipe_loader_drm_probe_fd(&drv->dev, loader_fd))
+ drv->vscreen->pscreen = pipe_loader_create_screen(drv->dev, 
PIPE_SEARCH_DIR);
+#endif
+  }
+  break;
+   default:
+  goto error_screen;
+   }
+
+   if (!drv->vscreen->pscreen)
   goto error_screen;
 
drv->pipe = drv->vscreen->pscreen->context_create(drv->vscreen->pscreen,
@@ -146,8 +179,11 @@ error_htab:
 
 error_pipe:
vl_screen_destroy(drv->vscreen);
+   drv->vscreen = NULL;
 
 error_screen:
+   if (drv->vscreen)
+  FREE(drv->vscreen);
FREE(drv);
return VA_STATUS_ERROR_ALLOCATION_FAILED;
 }
@@ -266,7 +302,10 @@ vlVaTerminate(VADriverContextP ctx)
vl_compositor_cleanup_state(&drv->cstate);
vl_compositor_cleanup(&drv->compositor);
drv->pipe->destroy(drv->pipe);
-   vl_screen_destroy(drv->vscreen);
+   if (ctx->display_type == VA_DISPLAY_X11)
+  vl_screen_destroy(drv->vscreen);
+   else
+   FREE(drv->vscreen);
handle_table_destroy(drv->htab);
FREE(drv);
 
-- 
1.9.1



[Mesa-dev] [PATCH 6/7] st/va: add initial Video Post Processing support

2015-10-16 Thread Julien Isorce
Improve the following functions to support the VA_PROFILE_NONE profile (vpp):
vlVaQueryConfigProfiles
vlVaQueryConfigEntrypoints
vlVaCreateConfig
vlVaQueryConfigAttributes

Add VADriverVTableVPP and improve the following functions to support vpp:
vlVaCreateContext
vlVaDestroyContext
vlVaBeginPicture
vlVaRenderPicture
vlVaEndPicture

Add handleVAProcPipelineParameterBufferType helper.

One application is:
VASurfaceNV12 -> gstvaapipostproc -> VASurfaceRGBA

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/config.c | 20 +++
 src/gallium/state_trackers/va/context.c| 94 +++---
 src/gallium/state_trackers/va/picture.c| 89 +++-
 src/gallium/state_trackers/va/surface.c| 73 +++
 src/gallium/state_trackers/va/va_private.h | 13 -
 5 files changed, 254 insertions(+), 35 deletions(-)

diff --git a/src/gallium/state_trackers/va/config.c 
b/src/gallium/state_trackers/va/config.c
index cfb0b25..bde6615 100644
--- a/src/gallium/state_trackers/va/config.c
+++ b/src/gallium/state_trackers/va/config.c
@@ -52,6 +52,9 @@ vlVaQueryConfigProfiles(VADriverContextP ctx, VAProfile 
*profile_list, int *num_
 profile_list[(*num_profiles)++] = vap;
   }
 
+   /* Support postprocessing through vl_compositor */
+   profile_list[(*num_profiles)++] = VAProfileNone;
+
return VA_STATUS_SUCCESS;
 }
 
@@ -67,6 +70,11 @@ vlVaQueryConfigEntrypoints(VADriverContextP ctx, VAProfile 
profile,
 
*num_entrypoints = 0;
 
+   if (profile == VAProfileNone) {
+   entrypoint_list[(*num_entrypoints)++] = VAEntrypointVideoProc;
+   return VA_STATUS_SUCCESS;
+   }
+
p = ProfileToPipe(profile);
if (p == PIPE_VIDEO_PROFILE_UNKNOWN)
   return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
@@ -118,6 +126,11 @@ vlVaCreateConfig(VADriverContextP ctx, VAProfile profile, 
VAEntrypoint entrypoin
if (!ctx)
   return VA_STATUS_ERROR_INVALID_CONTEXT;
 
+   if (profile == VAProfileNone && entrypoint == VAEntrypointVideoProc) {
+   *config_id = PIPE_VIDEO_PROFILE_UNKNOWN;
+   return VA_STATUS_SUCCESS;
+   }
+
p = ProfileToPipe(profile);
if (p == PIPE_VIDEO_PROFILE_UNKNOWN)
   return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
@@ -151,6 +164,13 @@ vlVaQueryConfigAttributes(VADriverContextP ctx, VAConfigID 
config_id, VAProfile
   return VA_STATUS_ERROR_INVALID_CONTEXT;
 
*profile = PipeToProfile(config_id);
+
+   if (config_id == PIPE_VIDEO_PROFILE_UNKNOWN) {
+  *entrypoint = VAEntrypointVideoProc;
+   *num_attribs = 0;
+  return VA_STATUS_SUCCESS;
+   }
+
*entrypoint = VAEntrypointVLD;
 
*num_attribs = 1;
diff --git a/src/gallium/state_trackers/va/context.c 
b/src/gallium/state_trackers/va/context.c
index 8949d42..ddc863b 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -87,6 +87,14 @@ static struct VADriverVTable vtable =
&vlVaQuerySurfaceAttributes
 };
 
+static struct VADriverVTableVPP vtable_vpp =
+{
+   VA_DRIVER_VTABLE_VPP_VERSION,
+   &vlVaQueryVideoProcFilters,
+   &vlVaQueryVideoProcFilterCaps,
+   &vlVaQueryVideoProcPipelineCaps
+};
+
 PUBLIC VAStatus
 VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
 {
@@ -122,6 +130,7 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
ctx->version_major = 0;
ctx->version_minor = 1;
*ctx->vtable = vtable;
+   *ctx->vtable_vpp = vtable_vpp;
ctx->max_profiles = PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH - 
PIPE_VIDEO_PROFILE_UNKNOWN;
ctx->max_entrypoints = 1;
ctx->max_attributes = 1;
@@ -151,11 +160,16 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID 
config_id, int picture_width,
struct pipe_video_codec templat = {};
vlVaDriver *drv;
vlVaContext *context;
+   int is_vpp = 0;
 
if (!ctx)
   return VA_STATUS_ERROR_INVALID_CONTEXT;
 
-   if (!(picture_width && picture_height))
+   is_vpp = config_id == PIPE_VIDEO_PROFILE_UNKNOWN &&
+   picture_width == 0 && picture_height == 0 && flag ==0 && !render_targets
+   && num_render_targets == 0;
+
+   if (!(picture_width && picture_height) && !is_vpp)
   return VA_STATUS_ERROR_INVALID_IMAGE_FORMAT;
 
drv = VL_VA_DRIVER(ctx);
@@ -163,38 +177,48 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID 
config_id, int picture_width,
if (!context)
   return VA_STATUS_ERROR_ALLOCATION_FAILED;
 
-   templat.profile = config_id;
-   templat.entrypoint = PIPE_VIDEO_ENTRYPOINT_BITSTREAM;
-   templat.chroma_format = PIPE_VIDEO_CHROMA_FORMAT_420;
-   templat.width = picture_width;
-   templat.height = picture_height;
-   templat.max_references = num_render_targets;
-   templat.expect_chunked_decode = true;
-
-   if (u_reduce_video_profile(templat.profile) ==
-   PIPE_VIDEO_FORMAT_MPEG4_AVC)
-  templat.level = u_get_h264_level(templat.width, templat.height,
-&templat.max_references);
-
-   context->decoder = drv->pipe->create_video_codec(drv->pipe, &templat);
-   if (!context->de

Re: [Mesa-dev] [PATCH 01/10] i965/gen8+: Remove redundant zeroing of surface state

2015-10-16 Thread Chad Versace
On Tue 13 Oct 2015, Ben Widawsky wrote:
> The allocate_surface_state already zeroes out the surface state, and doing it
> later in the function is destructive for what we want to accomplish when we
> split out support for gen9 fast clears (next patch).
> 
> NOTE: Only dword 12 actually needed to be fixed, but it seemed more consistent
> to remove the other instances as well. I can make an argument both ways (open
> coding it, vs. not). I can rework the next patch if required.
> 
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/gen8_surface_state.c | 12 
>  1 file changed, 12 deletions(-)

Patch 1 is
Reviewed-by: Chad Versace 



[Mesa-dev] [PATCH 5/7] st/va: implement dmabuf import for VaCreateSurfaces2

2015-10-16 Thread Julien Isorce
For now it is limited to RGBA, BGRA, RGBX, BGRX surfaces.

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/surface.c| 90 +-
 src/gallium/state_trackers/va/va_private.h |  1 +
 2 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/va/surface.c 
b/src/gallium/state_trackers/va/surface.c
index be435cb..eb5b8ca 100644
--- a/src/gallium/state_trackers/va/surface.c
+++ b/src/gallium/state_trackers/va/surface.c
@@ -376,7 +376,8 @@ vlVaQuerySurfaceAttributes(VADriverContextP ctx, VAConfigID 
config,
 attribs[i].type = VASurfaceAttribMemoryType;
 attribs[i].value.type = VAGenericValueTypeInteger;
 attribs[i].flags = VA_SURFACE_ATTRIB_GETTABLE | VA_SURFACE_ATTRIB_SETTABLE;
-attribs[i].value.value.i = VA_SURFACE_ATTRIB_MEM_TYPE_VA;
+attribs[i].value.value.i = VA_SURFACE_ATTRIB_MEM_TYPE_VA |
+VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME;
 i++;
 
 attribs[i].type = VASurfaceAttribExternalBufferDescriptor;
@@ -410,6 +411,82 @@ vlVaQuerySurfaceAttributes(VADriverContextP ctx, 
VAConfigID config,
 return vaStatus;
 }
 
+static VAStatus
+suface_from_external_memory(VADriverContextP ctx, vlVaSurface *surface,
+VASurfaceAttribExternalBuffers *memory_attibute,
+int index, VASurfaceID *surfaces,
+struct pipe_video_buffer *templat)
+{
+vlVaDriver *drv = NULL;
+struct pipe_screen *pscreen = NULL;
+struct pipe_resource *resource = NULL;
+struct pipe_resource res_templ;
+struct winsys_handle whandle;
+struct pipe_resource *resources[VL_NUM_COMPONENTS];
+
+if (!ctx)
+return VA_STATUS_ERROR_INVALID_PARAMETER;
+
+pscreen = VL_VA_PSCREEN(ctx);
+drv = VL_VA_DRIVER(ctx);
+
+if (!memory_attibute || !memory_attibute->buffers || index > 
memory_attibute->num_buffers)
+return VA_STATUS_ERROR_INVALID_PARAMETER;
+
+if (surface->templat.width != memory_attibute->width ||
+surface->templat.height != memory_attibute->height ||
+memory_attibute->num_planes < 1)
+return VA_STATUS_ERROR_INVALID_PARAMETER;
+
+switch (memory_attibute->pixel_format) {
+case VA_FOURCC_RGBA:
+case VA_FOURCC_RGBX:
+case VA_FOURCC_BGRA:
+case VA_FOURCC_BGRX:
+if (memory_attibute->num_planes != 1)
+return VA_STATUS_ERROR_INVALID_PARAMETER;
+break;
+default:
+return VA_STATUS_ERROR_INVALID_PARAMETER;
+}
+
+memset(&res_templ, 0, sizeof(res_templ));
+res_templ.target = PIPE_TEXTURE_2D;
+res_templ.last_level = 0;
+res_templ.depth0 = 1;
+res_templ.array_size = 1;
+res_templ.width0 = memory_attibute->width;
+res_templ.height0 = memory_attibute->height;
+res_templ.format = surface->templat.buffer_format;
+res_templ.bind = PIPE_BIND_SAMPLER_VIEW;
+res_templ.usage = PIPE_USAGE_DEFAULT;
+
+memset(&whandle, 0, sizeof(struct winsys_handle));
+whandle.type = DRM_API_HANDLE_TYPE_FD;
+whandle.handle = memory_attibute->buffers[index];
+whandle.stride = memory_attibute->pitches[index];
+
+resource = pscreen->resource_from_handle(pscreen, &res_templ, &whandle);
+
+if (!resource)
+   return VA_STATUS_ERROR_ALLOCATION_FAILED;
+
+memset(resources, 0, sizeof resources);
+resources[0] = resource;
+
+surface->buffer = vl_video_buffer_create_ex2(drv->pipe, templat, 
resources);
+if (!surface->buffer)
+return VA_STATUS_ERROR_ALLOCATION_FAILED;
+
+util_dynarray_init(&surface->subpics);
+surfaces[index] = handle_table_add(drv->htab, surface);
+
+if (!surfaces[index])
+  return VA_STATUS_ERROR_ALLOCATION_FAILED;
+
+return VA_STATUS_SUCCESS;
+}
+
 VAStatus
 vlVaCreateSurfaces2(VADriverContextP ctx, unsigned int format,
 unsigned int width, unsigned int height,
@@ -450,6 +527,9 @@ vlVaCreateSurfaces2(VADriverContextP ctx, unsigned int 
format,
 
 switch (attrib_list[i].value.value.i) {
 case VA_SURFACE_ATTRIB_MEM_TYPE_VA:
+case VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME:
+memory_type = attrib_list[i].value.value.i;
+break;
 default:{
 return VA_STATUS_ERROR_UNSUPPORTED_MEMORY_TYPE;}
 }
@@ -477,6 +557,9 @@ vlVaCreateSurfaces2(VADriverContextP ctx, unsigned int 
format,
 switch (memory_type) {
 case VA_SURFACE_ATTRIB_MEM_TYPE_VA:
 break;
+case VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME:
+expected_fourcc = memory_attibute->pixel_format;
+break;
 default:
 assert(0);
 }
@@ -522,6 +605,11 @@ vlVaCreateSurfaces2(VADriverContextP ctx, unsigned int 
format,
 util_dynarray_init(&surf->subpics);
 surfaces[i] = handle_table_add(drv->htab, surf);
 break;
+case VA_SURFACE_ATTRIB_MEM_TYPE_D

[Mesa-dev] [PATCH 1/7] nvc0: fix crash when nv50_miptree_from_handle fails

2015-10-16 Thread Julien Isorce
Signed-off-by: Julien Isorce 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_resource.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
index 12b5a02..15c803c 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
@@ -26,7 +26,8 @@ nvc0_resource_from_handle(struct pipe_screen * screen,
} else {
   struct pipe_resource *res = nv50_miptree_from_handle(screen,
templ, whandle);
-  nv04_resource(res)->vtbl = &nvc0_miptree_vtbl;
+  if (res)
+ nv04_resource(res)->vtbl = &nvc0_miptree_vtbl;
   return res;
}
 }
-- 
1.9.1



[Mesa-dev] [PATCH 2/7] st/va: properly defines VAImageFormat formats and improve VaCreateImage

2015-10-16 Thread Julien Isorce
Also add RGBA, RGBX and BGRX.
Also extend ChromaToPipe and implement PipeToYCbCr.

Note that gstreamer-vaapi checks all the VAImageFormat fields.

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/image.c  | 10 ++--
 src/gallium/state_trackers/va/va_private.h | 38 +-
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/va/image.c 
b/src/gallium/state_trackers/va/image.c
index 022240d..c7fbe1a 100644
--- a/src/gallium/state_trackers/va/image.c
+++ b/src/gallium/state_trackers/va/image.c
@@ -44,7 +44,10 @@ static const VAImageFormat formats[VL_VA_MAX_IMAGE_FORMATS] =
{VA_FOURCC('Y','V','1','2')},
{VA_FOURCC('Y','U','Y','V')},
{VA_FOURCC('U','Y','V','Y')},
-   {VA_FOURCC('B','G','R','A')}
+   {.fourcc = VA_FOURCC('B','G','R','A'), .byte_order = VA_LSB_FIRST, 32, 32, 
0x00ff, 0xff00, 0x00ff, 0xff00},
+   {.fourcc = VA_FOURCC('R','G','B','A'), .byte_order = VA_LSB_FIRST, 32, 32, 
0x00ff, 0xff00, 0x00ff, 0xff00},
+   {.fourcc = VA_FOURCC('B','G','R','X'), .byte_order = VA_LSB_FIRST, 32, 24, 
0x00ff, 0xff00, 0x00ff, 0x},
+   {.fourcc = VA_FOURCC('R','G','B','X'), .byte_order = VA_LSB_FIRST, 32, 24, 
0x00ff, 0xff00, 0x00ff, 0x}
 };
 
 static void
@@ -116,7 +119,7 @@ vlVaCreateImage(VADriverContextP ctx, VAImageFormat 
*format, int width, int heig
img->width = width;
img->height = height;
w = align(width, 2);
-   h = align(width, 2);
+   h = align(height, 2);
 
switch (format->fourcc) {
case VA_FOURCC('N','V','1','2'):
@@ -149,6 +152,9 @@ vlVaCreateImage(VADriverContextP ctx, VAImageFormat 
*format, int width, int heig
   break;
 
case VA_FOURCC('B','G','R','A'):
+   case VA_FOURCC('R','G','B','A'):
+   case VA_FOURCC('B','G','R','X'):
+   case VA_FOURCC('R','G','B','X'):
   img->num_planes = 1;
   img->pitches[0] = w * 4;
   img->offsets[0] = 0;
diff --git a/src/gallium/state_trackers/va/va_private.h 
b/src/gallium/state_trackers/va/va_private.h
index 1ea7be7..3479156 100644
--- a/src/gallium/state_trackers/va/va_private.h
+++ b/src/gallium/state_trackers/va/va_private.h
@@ -46,7 +46,7 @@
 #define VL_VA_DRIVER(ctx) ((vlVaDriver *)ctx->pDriverData)
 #define VL_VA_PSCREEN(ctx) (VL_VA_DRIVER(ctx)->vscreen->pscreen)
 
-#define VL_VA_MAX_IMAGE_FORMATS 6
+#define VL_VA_MAX_IMAGE_FORMATS 9
 
 static inline enum pipe_video_chroma_format
 ChromaToPipe(int format)
@@ -58,6 +58,8 @@ ChromaToPipe(int format)
   return PIPE_VIDEO_CHROMA_FORMAT_422;
case VA_RT_FORMAT_YUV444:
   return PIPE_VIDEO_CHROMA_FORMAT_444;
+   case VA_RT_FORMAT_RGB32:
+   return 0;
default:
   assert(0);
   return PIPE_VIDEO_CHROMA_FORMAT_420;
@@ -80,12 +82,46 @@ YCbCrToPipe(unsigned format)
   return PIPE_FORMAT_UYVY;
case VA_FOURCC('B','G','R','A'):
   return PIPE_FORMAT_B8G8R8A8_UNORM;
+   case VA_FOURCC('R','G','B','A'):
+  return PIPE_FORMAT_R8G8B8A8_UNORM;
+   case VA_FOURCC('B','G','R','X'):
+  return PIPE_FORMAT_B8G8R8X8_UNORM;
+   case VA_FOURCC('R','G','B','X'):
+  return PIPE_FORMAT_R8G8B8X8_UNORM;
default:
   assert(0);
   return PIPE_FORMAT_NONE;
}
 }
 
+static inline unsigned
+PipeToYCbCr(enum pipe_format p_format)
+{
+   switch (p_format) {
+   case PIPE_FORMAT_NV12:
+  return VA_FOURCC('N','V','1','2');
+   case PIPE_FORMAT_IYUV:
+  return VA_FOURCC('I','4','2','0');
+   case PIPE_FORMAT_YV12:
+  return VA_FOURCC('Y','V','1','2');
+   case PIPE_FORMAT_UYVY:
+  return VA_FOURCC('U','Y','V','Y');
+   case PIPE_FORMAT_YUYV:
+  return VA_FOURCC('Y','U','Y','V');
+   case PIPE_FORMAT_B8G8R8A8_UNORM:
+  return VA_FOURCC('B','G','R','A');
+   case PIPE_FORMAT_R8G8B8A8_UNORM:
+  return VA_FOURCC('R','G','B','A');
+   case PIPE_FORMAT_B8G8R8X8_UNORM:
+  return VA_FOURCC('B','G','R','X');
+   case PIPE_FORMAT_R8G8B8X8_UNORM:
+  return VA_FOURCC('R','G','B','X');
+   default:
+  assert(0);
+  return -1;
+   }
+}
+
 static inline VAProfile
 PipeToProfile(enum pipe_video_profile profile)
 {
-- 
1.9.1



Re: [Mesa-dev] [PATCH 08/10] i965/meta: Assert fast clears and rep clears never overlap

2015-10-16 Thread Chad Versace
On Tue 13 Oct 2015, Ben Widawsky wrote:
> There is nothing wrong with the code today, but as one modifies the code it
> turns out to be not too difficult to mess up the code, and this easy assertion
> should catch such driver implementation failures quickly.
> 
> Cc: Kristian Høgsberg 
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 2 ++
>  1 file changed, 2 insertions(+)

Patch 8 is
Reviewed-by: Chad Versace 



[Mesa-dev] [PATCH 3/7] st/va: in VaPutImage only destroy previous buffer if pipe->create_video_buffer succeeds

2015-10-16 Thread Julien Isorce
If the formats differ, vlVaPutImage re-creates the video buffer
with the right format.
But if creating the new video buffer fails, the surface
loses its video buffer.
Destroy the previous buffer only after the new one has been created.

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/image.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/gallium/state_trackers/va/image.c 
b/src/gallium/state_trackers/va/image.c
index c7fbe1a..d3b0cd5 100644
--- a/src/gallium/state_trackers/va/image.c
+++ b/src/gallium/state_trackers/va/image.c
@@ -338,13 +338,22 @@ vlVaPutImage(VADriverContextP ctx, VASurfaceID surface, 
VAImageID image,
if (format == PIPE_FORMAT_NONE)
   return VA_STATUS_ERROR_OPERATION_FAILED;
 
-   if (surf->buffer == NULL || format != surf->buffer->buffer_format) {
+   if (format != surf->buffer->buffer_format) {
+  struct pipe_video_buffer *tmp_buf = NULL;
+  enum pipe_format old_surf_format = surf->templat.buffer_format;
+
+  surf->templat.buffer_format = format;
+  tmp_buf = drv->pipe->create_video_buffer(drv->pipe, &surf->templat);
+
+  if (!tmp_buf) {
+  surf->templat.buffer_format = old_surf_format;
+  return VA_STATUS_ERROR_ALLOCATION_FAILED;
+  }
+
   if (surf->buffer)
  surf->buffer->destroy(surf->buffer);
-  surf->templat.buffer_format = format;
-  surf->buffer = drv->pipe->create_video_buffer(drv->pipe, &surf->templat);
-  if (!surf->buffer)
- return VA_STATUS_ERROR_ALLOCATION_FAILED;
+
+  surf->buffer = tmp_buf;
}
 
views = surf->buffer->get_sampler_view_planes(surf->buffer);
-- 
1.9.1
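
The idea in the patch above is "allocate the replacement first, swap only on success". Below is a minimal standalone sketch of that ordering. struct sketch_surface, struct sketch_buffer and the simulate_failure flag are invented for illustration and do not correspond to the gallium types.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the video buffer and the surface owning it. */
struct sketch_buffer {
   int format;
};

struct sketch_surface {
   int format;
   struct sketch_buffer *buffer;
};

/* Hypothetical allocator; simulate_failure forces the error path. */
static struct sketch_buffer *
sketch_buffer_create(int format, int simulate_failure)
{
   struct sketch_buffer *buf;

   if (simulate_failure)
      return NULL;

   buf = calloc(1, sizeof(*buf));
   if (buf)
      buf->format = format;
   return buf;
}

/* Returns 0 on success, -1 on failure.
 * On failure the surface keeps its old buffer and its old format. */
static int
sketch_surface_set_format(struct sketch_surface *surf, int format,
                          int simulate_failure)
{
   struct sketch_buffer *tmp;

   assert(surf);

   if (surf->buffer && surf->buffer->format == format)
      return 0;

   /* Create the replacement before touching the surface. */
   tmp = sketch_buffer_create(format, simulate_failure);
   if (!tmp)
      return -1;

   /* Only now is it safe to drop the previous buffer. */
   free(surf->buffer);
   surf->buffer = tmp;
   surf->format = format;
   return 0;
}
```

The sketch also tolerates a surface that has no buffer yet; that is a choice made here, not something the patch states.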



[Mesa-dev] [PATCH 4/7] st/va: implement VaCreateSurfaces2 and VaQuerySurfaceAttributes

2015-10-16 Thread Julien Isorce
Inspired by 
http://cgit.freedesktop.org/vaapi/intel-driver/tree/src/i965_drv_video.c

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/context.c|   5 +-
 src/gallium/state_trackers/va/surface.c| 288 -
 src/gallium/state_trackers/va/va_private.h |   7 +
 3 files changed, 249 insertions(+), 51 deletions(-)

diff --git a/src/gallium/state_trackers/va/context.c 
b/src/gallium/state_trackers/va/context.c
index 8b003ae..8949d42 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -81,7 +81,10 @@ static struct VADriverVTable vtable =
&vlVaSetDisplayAttributes,
&vlVaBufferInfo,
&vlVaLockSurface,
-   &vlVaUnlockSurface
+   &vlVaUnlockSurface,
+   &vlVaGetSurfaceAttributes,
+   &vlVaCreateSurfaces2,
+   &vlVaQuerySurfaceAttributes
 };
 
 PUBLIC VAStatus
diff --git a/src/gallium/state_trackers/va/surface.c 
b/src/gallium/state_trackers/va/surface.c
index 8d4487b..be435cb 100644
--- a/src/gallium/state_trackers/va/surface.c
+++ b/src/gallium/state_trackers/va/surface.c
@@ -29,6 +29,8 @@
 #include "pipe/p_screen.h"
 #include "pipe/p_video_codec.h"
 
+#include "state_tracker/drm_driver.h"
+
 #include "util/u_memory.h"
 #include "util/u_handle_table.h"
 #include "util/u_rect.h"
@@ -36,6 +38,7 @@
 #include "util/u_surface.h"
 
 #include "vl/vl_compositor.h"
+#include "vl/vl_video_buffer.h"
 #include "vl/vl_winsys.h"
 
 #include "va_private.h"
@@ -44,56 +47,8 @@ VAStatus
 vlVaCreateSurfaces(VADriverContextP ctx, int width, int height, int format,
int num_surfaces, VASurfaceID *surfaces)
 {
-   struct pipe_video_buffer templat = {};
-   struct pipe_screen *pscreen;
-   vlVaDriver *drv;
-   int i;
-
-   if (!ctx)
-  return VA_STATUS_ERROR_INVALID_CONTEXT;
-
-   if (!(width && height))
-  return VA_STATUS_ERROR_INVALID_IMAGE_FORMAT;
-
-   drv = VL_VA_DRIVER(ctx);
-   pscreen = VL_VA_PSCREEN(ctx);
-
-   templat.buffer_format = pscreen->get_video_param
-   (
-  pscreen,
-  PIPE_VIDEO_PROFILE_UNKNOWN,
-  PIPE_VIDEO_ENTRYPOINT_BITSTREAM,
-  PIPE_VIDEO_CAP_PREFERED_FORMAT
-   );
-   templat.chroma_format = ChromaToPipe(format);
-   templat.width = width;
-   templat.height = height;
-   templat.interlaced = pscreen->get_video_param
-   (
-  pscreen,
-  PIPE_VIDEO_PROFILE_UNKNOWN,
-  PIPE_VIDEO_ENTRYPOINT_BITSTREAM,
-  PIPE_VIDEO_CAP_PREFERS_INTERLACED
-   );
-
-   for (i = 0; i < num_surfaces; ++i) {
-  vlVaSurface *surf = CALLOC(1, sizeof(vlVaSurface));
-  if (!surf)
- goto no_res;
-
-  surf->templat = templat;
-  surf->buffer = drv->pipe->create_video_buffer(drv->pipe, &templat);
-  util_dynarray_init(&surf->subpics);
-  surfaces[i] = handle_table_add(drv->htab, surf);
-   }
-
-   return VA_STATUS_SUCCESS;
-
-no_res:
-   if (i)
-  vlVaDestroySurfaces(ctx, surfaces, i);
-
-   return VA_STATUS_ERROR_ALLOCATION_FAILED;
+return vlVaCreateSurfaces2(ctx, format, width, height, surfaces, 
num_surfaces,
+   NULL, 0);
 }
 
 VAStatus
@@ -349,3 +304,236 @@ vlVaUnlockSurface(VADriverContextP ctx, VASurfaceID 
surface)
 
return VA_STATUS_ERROR_UNIMPLEMENTED;
 }
+
+VAStatus
+vlVaGetSurfaceAttributes(VADriverContextP ctx, VAConfigID config,
+ VASurfaceAttrib *attrib_list, unsigned int 
num_attribs)
+{
+return VA_STATUS_ERROR_UNIMPLEMENTED; /* DEPRECATED */
+}
+
+VAStatus
+vlVaQuerySurfaceAttributes(VADriverContextP ctx, VAConfigID config,
+   VASurfaceAttrib *attrib_list, unsigned int 
*num_attribs)
+{
+VAStatus vaStatus = VA_STATUS_SUCCESS;
+vlVaDriver *drv = NULL;
+VASurfaceAttrib *attribs = NULL;
+struct pipe_screen *pscreen = NULL;
+int i = 0;
+
+if (config == VA_INVALID_ID)
+return VA_STATUS_ERROR_INVALID_CONFIG;
+
+if (!attrib_list && !num_attribs)
+return VA_STATUS_ERROR_INVALID_PARAMETER;
+
+if (attrib_list == NULL) {
+*num_attribs = VASurfaceAttribCount;
+return VA_STATUS_SUCCESS;
+}
+
+attribs = CALLOC(VASurfaceAttribCount, sizeof(VASurfaceAttrib));
+
+if (attribs == NULL)
+return VA_STATUS_ERROR_ALLOCATION_FAILED;
+
+if (!ctx)
+   return VA_STATUS_ERROR_INVALID_CONTEXT;
+
+drv = VL_VA_DRIVER(ctx);
+
+if (!drv)
+return VA_STATUS_ERROR_INVALID_CONTEXT;
+
+pscreen = VL_VA_PSCREEN(ctx);
+
+if (!pscreen)
+   return VA_STATUS_ERROR_INVALID_CONTEXT;
+
+if (config == PIPE_VIDEO_PROFILE_UNKNOWN) {
+   /* Assume VAEntrypointVideoProc for now. */
+   attribs[i].type = VASurfaceAttribPixelFormat;
+   attribs[i].value.type = VAGenericValueTypeInteger;
+   attribs[i].flags = VA_SURFACE_ATTRIB_GETTABLE | 
VA_SURFACE_ATTRIB_SETTABLE;
+   attribs[i].value.value.i = VA_FOURCC_BGRA;
+   i++;
+
+   attribs[i].type = VASurfaceAttribPixelFormat;
+   attribs[i].value.type = VAGenericValueTypeInteger;
+  
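
The hunk above is cut off in this archive. The visible part shows the usual two-call query convention: when attrib_list is NULL, the function only reports how many entries it would write. Below is a minimal sketch of that convention. All names (sketch_query_attribs, sketch_fetch_all, SKETCH_ATTRIB_COUNT and the integer attributes) are invented for illustration and are not taken from libva.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ATTRIB_COUNT 3u

/* Hypothetical attribute values the "driver" can report. */
static const int sketch_attribs[SKETCH_ATTRIB_COUNT] = { 10, 20, 30 };

/* Returns 0 on success, -1 on bad parameters.
 * With list == NULL only *count is written (size query).
 * Otherwise at most *count entries are copied and *count is
 * updated to the number actually written. */
static int
sketch_query_attribs(int *list, unsigned *count)
{
   unsigned i, n;

   if (!count)
      return -1;

   if (!list) {
      *count = SKETCH_ATTRIB_COUNT;
      return 0;
   }

   n = *count < SKETCH_ATTRIB_COUNT ? *count : SKETCH_ATTRIB_COUNT;
   for (i = 0; i < n; i++)
      list[i] = sketch_attribs[i];
   *count = n;
   return 0;
}

/* Typical caller: ask for the size first, then fetch.
 * Returns the number of entries written, or 0 if out is too small. */
static unsigned
sketch_fetch_all(int *out, unsigned capacity)
{
   unsigned count = 0;

   if (sketch_query_attribs(NULL, &count) != 0 || count > capacity)
      return 0;
   if (sketch_query_attribs(out, &count) != 0)
      return 0;

   assert(count <= capacity);
   return count;
}
```

Validating the arguments before allocating anything, as the sketch does, also means there is nothing to free on the early error returns.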

[Mesa-dev] [PATCH 0/7] st/va: VPP, dmabuf import and headless.

2015-10-16 Thread Julien Isorce
This patch series adds initial support for Video Post Processing.
It also implements VaCreateSurfaces2, both for general use and
to import a dmabuf.
Finally it adds support for headless mode, i.e. using DRM
instead of X11 for device setup.

Julien Isorce (7):
  nvc0: fix crash when nv50_miptree_from_handle fails
  st/va: properly defines VAImageFormat formats and improve
VaCreateImage
  st/va: in VaPutImage only destroy previous buffer if
pipe->create_video_buffer succeeds
  st/va: implement VaCreateSurfaces2 and VaQuerySurfaceAttributes
  st/va: implement dmabuf import for VaCreateSurfaces2
  st/va: add initial Video Post Processing support
  st/va: add headless support, i.e. VA_DISPLAY_DRM

 src/gallium/drivers/nouveau/nvc0/nvc0_resource.c |   3 +-
 src/gallium/state_trackers/va/Makefile.am|   9 +
 src/gallium/state_trackers/va/config.c   |  20 +
 src/gallium/state_trackers/va/context.c  | 148 ++--
 src/gallium/state_trackers/va/image.c|  29 +-
 src/gallium/state_trackers/va/picture.c  |  89 -
 src/gallium/state_trackers/va/surface.c  | 449 ---
 src/gallium/state_trackers/va/va_private.h   |  59 ++-
 8 files changed, 706 insertions(+), 100 deletions(-)

-- 
1.9.1



Re: [Mesa-dev] [PATCH 03/10] i965/skl: Enable fast color clears on SKL

2015-10-16 Thread Chad Versace
But this patch doesn't enable fast clears! The reverts in patches 6 and
7 need to be folded into this patch, otherwise the patch does not do
what it claims.

Also, you can't enable fast clears before patches 4 and 5 without introducing
regressions. Patches 4 and 5 must precede this patch.

On Tue 13 Oct 2015, Ben Widawsky wrote:
> Based on a patch originally from Kristian. Skylake has extended capabilities
> with regard to fast clears, but that is saved for another patch.
> 
> The same effect could be achieved with the following, however I think the way
> I've done it is more in line with how the docs explain it.
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -150,9 +150,13 @@ intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
>/* In release builds, fall through */
> case I915_TILING_Y:
>*width_px = 32 / mt->cpp;
> -  *height = 4;
> +  if (brw->gen >= 9)
> + *height = 2;
> +  else
> + *height = 4;
> 
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 54 
> +
>  src/mesa/drivers/dri/i965/gen8_surface_state.c  | 34 
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  9 +
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h   |  7 +++-
>  4 files changed, 78 insertions(+), 26 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index fbde3f0..7bf52f0 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -204,7 +204,7 @@ brw_draw_rectlist(struct gl_context *ctx, struct rect 
> *rect, int num_instances)
>  }
>  
>  static void
> -get_fast_clear_rect(struct gl_framebuffer *fb,
> +get_fast_clear_rect(struct brw_context *brw, struct gl_framebuffer *fb,
>  struct intel_renderbuffer *irb, struct rect *rect)
>  {
> unsigned int x_align, y_align;
> @@ -228,7 +228,14 @@ get_fast_clear_rect(struct gl_framebuffer *fb,
> */
>intel_get_non_msrt_mcs_alignment(irb->mt, &x_align, &y_align);
>x_align *= 16;
> -  y_align *= 32;
> +
> +  /* SKL+ line alignment requirement for Y-tiled are half those of the 
> prior
> +   * generations.
> +   */
> +  if (brw->gen >= 9)
> + y_align *= 16;
> +  else
> + y_align *= 32;
>  
>/* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render
> * Target(s)", beneath the "Fast Color Clear" bullet (p327):
> @@ -265,8 +272,10 @@ get_fast_clear_rect(struct gl_framebuffer *fb,
> * terms of (width,height) of the RT.
> *
> * MSAA  Width of Clear Rect  Height of Clear Rect
> +   *  2X Ceil(1/8*width)  Ceil(1/2*height)
> *  4X Ceil(1/8*width)  Ceil(1/2*height)
> *  8X Ceil(1/2*width)  Ceil(1/2*height)
> +   * 16X widthCeil(1/2*height)
> *
> * The text "with upper left co-ordinate to coincide with actual
> * rectangle being cleared" is a little confusing--it seems to imply
> @@ -289,6 +298,9 @@ get_fast_clear_rect(struct gl_framebuffer *fb,
>case 8:
>   x_scaledown = 2;
>   break;
> +  case 16:
> + x_scaledown = 1;
> + break;
>default:
>   unreachable("Unexpected sample count for fast clear");
>}
> @@ -358,18 +370,24 @@ is_color_fast_clear_compatible(struct brw_context *brw,
>  
>  /**
>   * Convert the given color to a bitfield suitable for ORing into DWORD 7 of
> - * SURFACE_STATE.
> + * SURFACE_STATE (DWORD 12-15 on SKL+).
>   */
> -static uint32_t
> -compute_fast_clear_color_bits(const union gl_color_union *color)
> +static void
> +set_fast_clear_color(struct brw_context *brw,
> + struct intel_mipmap_tree *mt,
> + const union gl_color_union *color)
>  {
> -   uint32_t bits = 0;
> -   for (int i = 0; i < 4; i++) {
> -  /* Testing for non-0 works for integer and float colors */
> -  if (color->f[i] != 0.0f)
> - bits |= 1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i));
> +   if (brw->gen >= 9) {
> +  mt->gen9_fast_clear_color = *color;
> +   } else {
> +  mt->fast_clear_color_value = 0;
> +  for (int i = 0; i < 4; i++) {
> + /* Testing for non-0 works for integer and float colors */
> + if (color->f[i] != 0.0f)
> + mt->fast_clear_color_value |=
> +1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i));

Please put braces round the multi-line if-statement.
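
To illustrate the brace request, here is a standalone sketch of the pre-SKL bit packing with the multi-line statement braced. SKETCH_CLEAR_COLOR_SHIFT is an assumed value for this sketch (the real GEN7_SURFACE_CLEAR_COLOR_SHIFT lives in the i965 headers), and the plain float array stands in for union gl_color_union.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shift for this sketch only; see the i965 headers for the real one. */
#define SKETCH_CLEAR_COLOR_SHIFT 28

/* Sets one bit per non-zero channel, channel 0 in the highest of the 4 bits. */
static uint32_t
sketch_clear_color_bits(const float color[4])
{
   uint32_t bits = 0;

   for (int i = 0; i < 4; i++) {
      if (color[i] != 0.0f) {
         bits |=
            UINT32_C(1) << (SKETCH_CLEAR_COLOR_SHIFT + (3 - i));
      }
   }

   /* Nothing outside the 4-bit field may be set. */
   assert((bits & ~(UINT32_C(0xF) << SKETCH_CLEAR_COLOR_SHIFT)) == 0);
   return bits;
}
```

The sketch shifts an unsigned 1 so that setting the top bit stays well defined; whether the driver code needs the same change is not something this thread establishes.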

> +  }
> }
> -   return bits;
>  }
>  
>  static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 };
> @@ -504,8 +522,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
> gl_framebuffer *fb,
>  
>switch (clear_type) {
>case FAST_CLEAR:
> - irb->

Re: [Mesa-dev] [PATCH 04/10] i965/skl: skip fast clears for certain surface formats

2015-10-16 Thread Chad Versace
On Tue 13 Oct 2015, Ben Widawsky wrote:
> Initially I had this planned as a patch to be squashed in to the enabling 
> patch
> because there is no point enabling fast clears without this. However, Chad
> merged a patch which disables fast clears on gen9 explicitly, and so I can 
> hide
> this behind the revert of that patch. This is a nice I really wanted this 
> patch
> as a distinct patch for review. This is a new, weird, and poorly documented
> restriction for SKL. (In fact, I am still not 100% certain the restriction is
> entirely necessary, but there are around 30 piglit regressions without this).
> 
> SKL adds compressible render targets and as a result mutates some of the
> programming for fast clears and resolves. There is a new internal surface type
> called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "The Auxiliary surface 
> is
> a CCS (Color Control Surface) with compression disabled or an MCS with
> compression enabled, depending on number of multisamples. MCS (Multisample
> Control Surface) is a special type of CCS."
> 
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/brw_context.h |  1 +
>  src/mesa/drivers/dri/i965/brw_surface_formats.c | 27 
> +
>  src/mesa/drivers/dri/i965/gen8_surface_state.c  |  8 ++--
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  3 +++
>  4 files changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index e59478a..32b8250 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1546,6 +1546,7 @@ struct brw_context
>  
> uint32_t render_target_format[MESA_FORMAT_COUNT];
> bool format_supported_as_render_target[MESA_FORMAT_COUNT];
> +   bool losslessly_compressable[MESA_FORMAT_COUNT];

I agree with Neil. It's a shame to increase the context size for static
information. And there is already a static array in
brw_surface_formats.c for exactly this type of information.

>  
> /* Interpolation modes, one byte per vue slot.
>  * Used Gen4/5 by the clip|sf|wm stages. Ignored on Gen6+.
> diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
> b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> index 97fff60..d706ecc 100644
> --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
> +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> @@ -693,6 +693,33 @@ brw_init_surface_formats(struct brw_context *brw)
>}
> }
>  
> +   if (brw->gen >= 9) {
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UNORM16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SNORM16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_SINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_UINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBX_FLOAT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_B8G8R8A8_UNORM] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_UNORM] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_SNORM] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT8] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT8] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_SINT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_UINT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT16] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R_UINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R_SINT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_R_FLOAT32] = true;
> +  brw->losslessly_compressable[MESA_FORMAT_B8G8R8X8_UNORM] = true;
> +   }

Properties of surface formats should go into the monster table that
occurs earlier in the file. Then you can replace
brw_context::losslessly_compressable with a query and keep brw_context
at its current size.
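
Roughly what I have in mind, as a standalone sketch rather than real driver code. The table layout, the field name lossless_compression_gen and the two format names are made up for illustration; the actual table in brw_surface_formats.c has its own columns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical format list; the driver would index by its own format enum. */
enum sketch_format {
   SKETCH_FORMAT_RGBA_FLOAT32,
   SKETCH_FORMAT_OTHER,
   SKETCH_FORMAT_COUNT
};

/* One static row per format. A gen of 0 means "never supported". */
struct sketch_format_info {
   bool exists;
   int lossless_compression_gen;
};

static const struct sketch_format_info
sketch_format_table[SKETCH_FORMAT_COUNT] = {
   [SKETCH_FORMAT_RGBA_FLOAT32] = { true, 9 },
   [SKETCH_FORMAT_OTHER]        = { true, 0 },
};

/* Query that replaces a per-context bool array. */
static bool
sketch_losslessly_compressible(int gen, enum sketch_format format)
{
   const struct sketch_format_info *info;

   assert((int)format >= 0 && (int)format < SKETCH_FORMAT_COUNT);
   info = &sketch_format_table[format];

   return info->exists &&
          info->lossless_compression_gen != 0 &&
          gen >= info->lossless_compression_gen;
}
```

Because the table is const and shared, nothing is added to the context and no per-context initialization is needed.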

> +
> /* We will check this table for FBO completeness, but the surface format
>  * table above only covered color rendering.
>  */
> diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> index 995b4dd..b19b492 100644
> --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
> @@ -243,8 +243,10 @@ gen8_emit_texture_surface_state(struct brw_context *brw,
> * "When Auxiliary Surface Mode i

[Mesa-dev] [PATCH] r600g: Pass conservative depth parameters to hw

2015-10-16 Thread Glenn Kennard
Supported on R700 and up.

Signed-off-by: Glenn Kennard 
---
Not exactly a commonly used extension, but might as well set the
hardware registers rather than just dropping the hint on the floor.

 src/gallium/drivers/r600/evergreen_state.c | 13 +
 src/gallium/drivers/r600/evergreend.h  |  7 +++
 src/gallium/drivers/r600/r600_shader.c |  1 +
 src/gallium/drivers/r600/r600_shader.h |  2 ++
 src/gallium/drivers/r600/r600_state.c  | 15 +++
 src/gallium/drivers/r600/r600d.h   |  8 
 6 files changed, 46 insertions(+)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index c6702a9..96c6b11 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -2940,6 +2940,19 @@ void evergreen_update_ps_state(struct pipe_context *ctx, 
struct r600_pipe_shader
db_shader_control |= S_02880C_STENCIL_EXPORT_ENABLE(stencil_export);
db_shader_control |= S_02880C_MASK_EXPORT_ENABLE(mask_export);
 
+   switch (rshader->ps_conservative_z) {
+   default: /* fall through */
+   case TGSI_FS_DEPTH_LAYOUT_ANY:
+   db_shader_control |= 
S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_ANY_Z);
+   break;
+   case TGSI_FS_DEPTH_LAYOUT_GREATER:
+   db_shader_control |= 
S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_GREATER_THAN_Z);
+   break;
+   case TGSI_FS_DEPTH_LAYOUT_LESS:
+   db_shader_control |= 
S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_LESS_THAN_Z);
+   break;
+   }
+
exports_ps = 0;
for (i = 0; i < rshader->noutput; i++) {
if (rshader->output[i].name == TGSI_SEMANTIC_POSITION ||
diff --git a/src/gallium/drivers/r600/evergreend.h 
b/src/gallium/drivers/r600/evergreend.h
index 937ffcb..a9a65f7 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -815,6 +815,13 @@
 #define V_02880C_EXPORT_DB_FOUR16  0x01
 #define V_02880C_EXPORT_DB_TWO 0x02
 #define   S_02880C_ALPHA_TO_MASK_DISABLE(x)(((x) & 0x1) << 12)
+#define   S_02880C_CONSERVATIVE_Z_EXPORT(x)(((x) & 0x03) << 16)
+#define   G_02880C_CONSERVATIVE_Z_EXPORT(x)(((x) >> 16) & 0x03)
+#define   C_02880C_CONSERVATIVE_Z_EXPORT   0xFFFC
+#define V_02880C_EXPORT_ANY_Z  0
+#define V_02880C_EXPORT_LESS_THAN_Z1
+#define V_02880C_EXPORT_GREATER_THAN_Z 2
+#define V_02880C_EXPORT_RESERVED   3
 
 #define R_028A00_PA_SU_POINT_SIZE0x028A00
 #define   S_028A00_HEIGHT(x)   (((x) & 0x) << 0)
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 8efe902..560696d 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -2048,6 +2048,7 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
 
shader->fs_write_all = 
ctx.info.properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS];
shader->vs_position_window_space = 
ctx.info.properties[TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION];
+   shader->ps_conservative_z = 
ctx.info.properties[TGSI_PROPERTY_FS_DEPTH_LAYOUT];
 
if (shader->vs_as_gs_a)
vs_add_primid_output(&ctx, key.vs.prim_id_out);
diff --git a/src/gallium/drivers/r600/r600_shader.h 
b/src/gallium/drivers/r600/r600_shader.h
index c240e71..e085263 100644
--- a/src/gallium/drivers/r600/r600_shader.h
+++ b/src/gallium/drivers/r600/r600_shader.h
@@ -76,6 +76,8 @@ struct r600_shader {
boolean uses_tex_buffers;
boolean gs_prim_id_input;
 
+   unsignedps_conservative_z;
+
/* Size in bytes of a data item in the ring(s) (single vertex data).
   Stages with only one ring items 123 will be set to 0. */
unsignedring_item_sizes[4];
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 1be3e1b..09b2325 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -2533,6 +2533,21 @@ void r600_update_ps_state(struct pipe_context *ctx, 
struct r600_pipe_shader *sha
if (rshader->uses_kill)
db_shader_control |= S_02880C_KILL_ENABLE(1);
 
+   if (rctx->b.chip_class >= R700) {
+   switch (rshader->ps_conservative_z) {
+   default: /* fall through */
+   case TGSI_FS_DEPTH_LAYOUT_ANY:
+   db_shader_control |= 
S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_ANY_Z);
+   break;
+   case TGSI_FS_DEPTH_LAYOUT_GREATER:
+   db_shader_control |= 
S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_GREATER_THAN_Z);
+
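
The diff above is cut off in this archive. The core of the change is packing a 2-bit layout hint into bits 16-17 of DB_SHADER_CONTROL. Below is a minimal sketch of that packing, using the field position and the ANY/LESS/GREATER values shown in the evergreend.h hunk. The SKETCH_* names and enum sketch_depth_layout are hypothetical, and the clear mask is derived from the shift rather than copied from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_Z_EXPORT_SHIFT 16
#define SKETCH_Z_EXPORT_MASK  UINT32_C(0x3)

/* Set, get and clear helpers in the style of the S_/G_/C_ register macros. */
#define SKETCH_S_Z_EXPORT(x) \
   (((uint32_t)(x) & SKETCH_Z_EXPORT_MASK) << SKETCH_Z_EXPORT_SHIFT)
#define SKETCH_G_Z_EXPORT(x) \
   (((uint32_t)(x) >> SKETCH_Z_EXPORT_SHIFT) & SKETCH_Z_EXPORT_MASK)
#define SKETCH_C_Z_EXPORT \
   (~(SKETCH_Z_EXPORT_MASK << SKETCH_Z_EXPORT_SHIFT))

/* Field values as listed in the patch. */
#define SKETCH_EXPORT_ANY_Z          0
#define SKETCH_EXPORT_LESS_THAN_Z    1
#define SKETCH_EXPORT_GREATER_THAN_Z 2

/* Hypothetical stand-in for the shader's depth layout property. */
enum sketch_depth_layout {
   SKETCH_LAYOUT_ANY,
   SKETCH_LAYOUT_GREATER,
   SKETCH_LAYOUT_LESS
};

/* Returns reg with the conservative-Z field replaced by the layout hint. */
static uint32_t
sketch_set_z_export(uint32_t reg, enum sketch_depth_layout layout)
{
   uint32_t value;

   switch (layout) {
   case SKETCH_LAYOUT_GREATER:
      value = SKETCH_EXPORT_GREATER_THAN_Z;
      break;
   case SKETCH_LAYOUT_LESS:
      value = SKETCH_EXPORT_LESS_THAN_Z;
      break;
   case SKETCH_LAYOUT_ANY:
   default:
      value = SKETCH_EXPORT_ANY_Z;
      break;
   }

   reg = (reg & SKETCH_C_Z_EXPORT) | SKETCH_S_Z_EXPORT(value);
   assert(SKETCH_G_Z_EXPORT(reg) == value);
   return reg;
}
```

Clearing the field before OR-ing keeps the helper correct even when the register already holds a previous hint; the patch itself only ORs into a freshly built value.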

[Mesa-dev] [PATCH] st/mesa: check for out-of-memory in st_DrawPixels()

2015-10-16 Thread Brian Paul
Before, if make_texture() or st_create_texture_sampler_view() failed
we silently no-op'd the glDrawPixels.  Now, set GL_OUT_OF_MEMORY.
This also allows us to un-nest a bunch of code.

v2: also check if allocation of sv[1] fails, per Jose.
---
 src/mesa/state_tracker/st_cb_drawpixels.c | 76 ++-
 1 file changed, 44 insertions(+), 32 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c 
b/src/mesa/state_tracker/st_cb_drawpixels.c
index e4d3580..79fb9ec 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -975,6 +975,7 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y,
int num_sampler_view = 1;
struct gl_pixelstore_attrib clippedUnpack;
struct st_fp_variant *fpv = NULL;
+   struct pipe_resource *pt;
 
/* Mesa state should be up to date by now */
assert(ctx->NewState == 0x0);
@@ -1030,42 +1031,53 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y,
   st_upload_constants(st, fpv->parameters, PIPE_SHADER_FRAGMENT);
}
 
-   /* draw with textured quad */
-   {
-  struct pipe_resource *pt
- = make_texture(st, width, height, format, type, unpack, pixels);
-  if (pt) {
- sv[0] = st_create_texture_sampler_view(st->pipe, pt);
-
- if (sv[0]) {
-/* Create a second sampler view to read stencil.
- * The stencil is written using the shader stencil export
- * functionality. */
-if (write_stencil) {
-   enum pipe_format stencil_format =
- util_format_stencil_only(pt->format);
-   /* we should not be doing pixel map/transfer (see above) */
-   assert(num_sampler_view == 1);
-   sv[1] = st_create_texture_sampler_view_format(st->pipe, pt,
- stencil_format);
-   num_sampler_view++;
-}
+   /* Put glDrawPixels image into a texture */
+   pt = make_texture(st, width, height, format, type, unpack, pixels);
+   if (!pt) {
+  _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels");
+  return;
+   }
 
-draw_textured_quad(ctx, x, y, ctx->Current.RasterPos[2],
-   width, height,
-   ctx->Pixel.ZoomX, ctx->Pixel.ZoomY,
-   sv,
-   num_sampler_view,
-   driver_vp,
-   driver_fp, fpv,
-   color, GL_FALSE, write_depth, write_stencil);
-pipe_sampler_view_reference(&sv[0], NULL);
-if (num_sampler_view > 1)
-   pipe_sampler_view_reference(&sv[1], NULL);
- }
+   /* create sampler view for the image */
+   sv[0] = st_create_texture_sampler_view(st->pipe, pt);
+   if (!sv[0]) {
+  _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels");
+  pipe_resource_reference(&pt, NULL);
+  return;
+   }
+
+   /* Create a second sampler view to read stencil.  The stencil is
+* written using the shader stencil export functionality.
+*/
+   if (write_stencil) {
+  enum pipe_format stencil_format =
+ util_format_stencil_only(pt->format);
+  /* we should not be doing pixel map/transfer (see above) */
+  assert(num_sampler_view == 1);
+  sv[1] = st_create_texture_sampler_view_format(st->pipe, pt,
+stencil_format);
+  if (!sv[1]) {
+ _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels");
  pipe_resource_reference(&pt, NULL);
+ pipe_sampler_view_reference(&sv[0], NULL);
+ return;
   }
+  num_sampler_view++;
}
+
+   draw_textured_quad(ctx, x, y, ctx->Current.RasterPos[2],
+  width, height,
+  ctx->Pixel.ZoomX, ctx->Pixel.ZoomY,
+  sv,
+  num_sampler_view,
+  driver_vp,
+  driver_fp, fpv,
+  color, GL_FALSE, write_depth, write_stencil);
+   pipe_sampler_view_reference(&sv[0], NULL);
+   if (num_sampler_view > 1)
+  pipe_sampler_view_reference(&sv[1], NULL);
+
+   pipe_resource_reference(&pt, NULL);
 }
 
 
-- 
1.9.1



Re: [Mesa-dev] [PATCH 01/10] tnl: add some comments in render_line_loop code

2015-10-16 Thread Brian Paul

On 10/16/2015 04:14 PM, Matt Turner wrote:

On Fri, Oct 16, 2015 at 2:25 PM, Brian Paul  wrote:

And remove '(void) flags' line which is not needed.
---
  src/mesa/tnl/t_vb_rendertmp.h | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/tnl/t_vb_rendertmp.h b/src/mesa/tnl/t_vb_rendertmp.h
index 44dee76..26a1695 100644
--- a/src/mesa/tnl/t_vb_rendertmp.h
+++ b/src/mesa/tnl/t_vb_rendertmp.h
@@ -124,19 +124,19 @@ static void TAG(render_line_loop)( struct gl_context *ctx,
 GLuint i;
 LOCAL_VARS;

-   (void) flags;
-
 INIT(GL_LINE_LOOP);

 if (start+1 < count) {
if (TEST_PRIM_BEGIN(flags)) {
  RESET_STIPPLE;
+ /* draw the first line from v[0] to v[1] */
   if (ctx->Light.ProvokingVertex == GL_LAST_VERTEX_CONVENTION_EXT)
  RENDER_LINE( ELT(start), ELT(start+1) );
   else
  RENDER_LINE( ELT(start+1), ELT(start) );
}

+  /* draw lines from v[1] to V[n-1] */


Presumably lower case v[n-1] here.


I'll fix that before pushing.

-Brian



Re: [Mesa-dev] [PATCH 2/4] st/mesa: check of out-of-memory in st_DrawPixels()

2015-10-16 Thread Brian Paul

On 10/16/2015 04:11 PM, Jose Fonseca wrote:

On 15/10/15 20:01, Brian Paul wrote:

Before, if make_texture() or st_create_texture_sampler_view() failed
we silently no-op'd the glDrawPixels.  Now, set GL_OUT_OF_MEMORY.
This also allows us to un-nest a bunch of code.
---
  src/mesa/state_tracker/st_cb_drawpixels.c | 74
+--
  1 file changed, 40 insertions(+), 34 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c
b/src/mesa/state_tracker/st_cb_drawpixels.c
index e4d3580..05f6e6b 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -975,6 +975,7 @@ st_DrawPixels(struct gl_context *ctx, GLint x,
GLint y,
 int num_sampler_view = 1;
 struct gl_pixelstore_attrib clippedUnpack;
 struct st_fp_variant *fpv = NULL;
+   struct pipe_resource *pt;

 /* Mesa state should be up to date by now */
 assert(ctx->NewState == 0x0);
@@ -1030,42 +1031,47 @@ st_DrawPixels(struct gl_context *ctx, GLint x,
GLint y,
st_upload_constants(st, fpv->parameters, PIPE_SHADER_FRAGMENT);
 }

-   /* draw with textured quad */
-   {
-  struct pipe_resource *pt
- = make_texture(st, width, height, format, type, unpack,
pixels);
-  if (pt) {
- sv[0] = st_create_texture_sampler_view(st->pipe, pt);
-
- if (sv[0]) {
-/* Create a second sampler view to read stencil.
- * The stencil is written using the shader stencil export
- * functionality. */
-if (write_stencil) {
-   enum pipe_format stencil_format =
- util_format_stencil_only(pt->format);
-   /* we should not be doing pixel map/transfer (see
above) */
-   assert(num_sampler_view == 1);
-   sv[1] =
st_create_texture_sampler_view_format(st->pipe, pt,
-
stencil_format);
-   num_sampler_view++;
-}
+   /* Put glDrawPixels image into a texture */
+   pt = make_texture(st, width, height, format, type, unpack, pixels);
+   if (!pt) {
+  _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels");
+  return;
+   }

-draw_textured_quad(ctx, x, y, ctx->Current.RasterPos[2],
-   width, height,
-   ctx->Pixel.ZoomX, ctx->Pixel.ZoomY,
-   sv,
-   num_sampler_view,
-   driver_vp,
-   driver_fp, fpv,
-   color, GL_FALSE, write_depth,
write_stencil);
-pipe_sampler_view_reference(&sv[0], NULL);
-if (num_sampler_view > 1)
-   pipe_sampler_view_reference(&sv[1], NULL);
- }
- pipe_resource_reference(&pt, NULL);
-  }
+   /* create sampler view for the image */
+   sv[0] = st_create_texture_sampler_view(st->pipe, pt);
+   if (!sv[0]) {
+  _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels");
+  pipe_resource_reference(&pt, NULL);
+  return;
 }
+
+   /* Create a second sampler view to read stencil.  The stencil is
+* written using the shader stencil export functionality.
+*/
+   if (write_stencil) {
+  enum pipe_format stencil_format =
+ util_format_stencil_only(pt->format);
+  /* we should not be doing pixel map/transfer (see above) */
+  assert(num_sampler_view == 1);
+  sv[1] = st_create_texture_sampler_view_format(st->pipe, pt,
+stencil_format);


Should check null sv[1] here too.


Yes, I'll post a v2.

-Brian




Re: [Mesa-dev] [PATCH v4 0/6] nouveau: add support for vaapi

2015-10-16 Thread Ilia Mirkin
On Fri, Oct 16, 2015 at 6:13 PM, Julien Isorce  wrote:
>
>
> On 18 September 2015 at 21:34, Ilia Mirkin  wrote:
>>
>> On Fri, Sep 18, 2015 at 4:29 PM, Julien Isorce 
>> wrote:
>> >
>> >
>> > On 17 September 2015 at 17:52, Ilia Mirkin  wrote:
>> >>
>> >> On Wed, Sep 16, 2015 at 8:22 AM, Julien Isorce 
>> >> wrote:
>> >> > I added below version4 updates. It works for all codecs except h264.
>> >> > The video is visible but with a lot of blockiness.
>> >> > Can someone with a Radeon confirm that "LIBVA_DRIVER_NAME=gallium mpv
>> >> > --hwdec=vaapi"
>> >> > is working on h264 videos ?
>> >> > I want to make sure it is not a bug in st/va.
>> >>
>> >> The sad reality is that h264 is the only thing that matters (at least
>> >> from this list of supported codecs). My concern is that this series
>> >> will regress the situation for people who want to use VA-API -- right
>> >> now they can use the vdpau <-> vaapi adapter, whereas with this patch
>> >> series, they will end up with a va-api driver that doesn't work. So I
>> >> can't merge this as-is.
>> >
>> >
>> > Makes perfect sense.
>> >
>> >>
>> >>
>> >> Are the various lengths (for inter-bo size/etc) being computed
>> >> properly
>> >
>> >
>> > In the past I compared the final content of the nouveau_bo buffer at
>> > each
>> > endFrame step, with the content using vdpau. They were the same.
>> > I will re-check.
>> > Does it make sense to do that actually ?
>> > Is there anything else I could compare with vdpau ?
>>
>> If you're feeding the exact same stuff and everything is exactly the
>> same, then the results would also be the same. Clearly there's SOME
>> difference SOMEWHERE :)
>
>
> Indeed I compared only dec->bsp_bo, not dec->inter_bo :) I'll check that.
> But in the first place I do not see where dec->inter_bo is filled. Could you
> point out where this is done ?

nouveau doesn't fill inter_bo -- it's a bo shared between the VLD and
VDEC engines iirc.


Re: [Mesa-dev] [PATCH 01/10] tnl: add some comments in render_line_loop code

2015-10-16 Thread Matt Turner
On Fri, Oct 16, 2015 at 2:25 PM, Brian Paul  wrote:
> And remove '(void) flags' line which is not needed.
> ---
>  src/mesa/tnl/t_vb_rendertmp.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/tnl/t_vb_rendertmp.h b/src/mesa/tnl/t_vb_rendertmp.h
> index 44dee76..26a1695 100644
> --- a/src/mesa/tnl/t_vb_rendertmp.h
> +++ b/src/mesa/tnl/t_vb_rendertmp.h
> @@ -124,19 +124,19 @@ static void TAG(render_line_loop)( struct gl_context 
> *ctx,
> GLuint i;
> LOCAL_VARS;
>
> -   (void) flags;
> -
> INIT(GL_LINE_LOOP);
>
> if (start+1 < count) {
>if (TEST_PRIM_BEGIN(flags)) {
>  RESET_STIPPLE;
> + /* draw the first line from v[0] to v[1] */
>   if (ctx->Light.ProvokingVertex == GL_LAST_VERTEX_CONVENTION_EXT)
>  RENDER_LINE( ELT(start), ELT(start+1) );
>   else
>  RENDER_LINE( ELT(start+1), ELT(start) );
>}
>
> +  /* draw lines from v[1] to V[n-1] */

Presumably lower case v[n-1] here.


Re: [Mesa-dev] [PATCH v4 0/6] nouveau: add support for vaapi

2015-10-16 Thread Julien Isorce
On 18 September 2015 at 21:34, Ilia Mirkin  wrote:

> On Fri, Sep 18, 2015 at 4:29 PM, Julien Isorce 
> wrote:
> >
> >
> > On 17 September 2015 at 17:52, Ilia Mirkin  wrote:
> >>
> >> On Wed, Sep 16, 2015 at 8:22 AM, Julien Isorce 
> >> wrote:
> >> > I added below version4 updates. It works for all codecs except h264.
> >> > Video is visible but with a lot of blockiness.
> >> > Can someone with a Radeon confirm that "LIBVA_DRIVER_NAME=gallium mpv
> >> > --hwdec=vaapi"
> >> > is working on h264 videos ?
> >> > I want to make sure it is not a bug in st/va.
> >>
> >> The sad reality is that h264 is the only thing that matters (at least
> >> from this list of supported codecs). My concern is that this series
> >> will regress the situation for people who want to use VA-API -- right
> >> now they can use the vdpau <-> vaapi adapter, whereas with this patch
> >> series, they will end up with a va-api driver that doesn't work. So I
> >> can't merge this as-is.
> >
> >
> > Makes perfect sense.
> >
> >>
> >>
> >> Are the various lengths (for inter-bo size/etc) being computed
> >> properly
> >
> >
> > In the past I compared the final content of the nouveau_bo buffer at each
> > endFrame step, with the content using vdpau. They were the same.
> > I will re-check.
> > Does it make sense to do that actually ?
> > Is there anything else I could compare with vdpau ?
>
> If you're feeding the exact same stuff and everything is exactly the
> same, then the results would also be the same. Clearly there's SOME
> difference SOMEWHERE :)
>

Indeed I compared only dec->bsp_bo, not dec->inter_bo :) I'll check that.
But in the first place I do not see where dec->inter_bo is filled. Could
you point out where this is done ?


>
> >
> >>
> >> Are you writing stuff to the correct inter bo? IIRC we flip
> >> between two of them, perhaps that logic got upset?
> >
> >
> > Probably I missed something. Does the flip happen at each endFrame ?
> > Could you point out where this flip is in the current upstream code
> > exactly?
>

> I mean stuff like this:
>
>struct nouveau_bo *bsp_bo = dec->bsp_bo[comm_seq %
> NOUVEAU_VP3_VIDEO_QDEPTH];
>

I see what you meant by flip now.

>
> src/gallium/drivers/nouveau/nvc0/nvc0_video_bsp.c:  bo_refs[1].bo
> = dec->inter_bo[comm_seq & 1] = inter_bo = tmp_bo;
>
> and so on. Make sure that comm_seq is incremented once per frame, not
> once per chunk :)
>

Yes it is incremented just once per frame, not per chunk. See
"nvc0_decoder_begin_frame" in "[PATCH v4 2/6] nvc0: add support for st/va"

Thx for your comments
Julien
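
To make the "once per frame, not once per chunk" point concrete, here is a minimal sketch. It is not the nouveau code: struct decoder, begin_frame() and submit_chunk() are hypothetical, and QDEPTH is an assumed value standing in for NOUVEAU_VP3_VIDEO_QDEPTH. Only the two index expressions are taken from the lines quoted above.

```c
#include <assert.h>

#define QDEPTH 2   /* assumed queue depth; stands in for NOUVEAU_VP3_VIDEO_QDEPTH */

/* Hypothetical decoder state; only the sequence counter matters here. */
struct decoder {
   unsigned comm_seq;   /* frame sequence number */
};

/* Called once at the start of every frame: the only place the counter moves. */
static void
begin_frame(struct decoder *dec)
{
   dec->comm_seq++;
}

/* Which bitstream buffer the current frame uses (comm_seq % depth). */
static unsigned
bsp_index(const struct decoder *dec)
{
   return dec->comm_seq % QDEPTH;
}

/* Which of the two inter buffers the current frame uses (comm_seq & 1). */
static unsigned
inter_index(const struct decoder *dec)
{
   return dec->comm_seq & 1;
}

/*
 * Submitting a chunk (slice) of the current frame must not touch comm_seq,
 * so every chunk of one frame lands in the same inter buffer.
 * Returns the inter buffer index that was used.
 */
static unsigned
submit_chunk(const struct decoder *dec)
{
   return inter_index(dec);
}
```

If the counter were bumped inside submit_chunk() instead, consecutive chunks of one frame would alternate between the two inter buffers, which is the kind of mix-up being asked about.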


>
>   -ilia
>


Re: [Mesa-dev] [PATCH 2/4] st/mesa: check of out-of-memory in st_DrawPixels()

2015-10-16 Thread Jose Fonseca

On 15/10/15 20:01, Brian Paul wrote:

Before, if make_texture() or st_create_texture_sampler_view() failed
we silently no-op'd the glDrawPixels.  Now, set GL_OUT_OF_MEMORY.
This also allows us to un-nest a bunch of code.
---
  src/mesa/state_tracker/st_cb_drawpixels.c | 74 +--
  1 file changed, 40 insertions(+), 34 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c 
b/src/mesa/state_tracker/st_cb_drawpixels.c
index e4d3580..05f6e6b 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -975,6 +975,7 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y,
 int num_sampler_view = 1;
 struct gl_pixelstore_attrib clippedUnpack;
 struct st_fp_variant *fpv = NULL;
+   struct pipe_resource *pt;

 /* Mesa state should be up to date by now */
 assert(ctx->NewState == 0x0);
@@ -1030,42 +1031,47 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y,
st_upload_constants(st, fpv->parameters, PIPE_SHADER_FRAGMENT);
 }

-   /* draw with textured quad */
-   {
-  struct pipe_resource *pt
- = make_texture(st, width, height, format, type, unpack, pixels);
-  if (pt) {
- sv[0] = st_create_texture_sampler_view(st->pipe, pt);
-
- if (sv[0]) {
-/* Create a second sampler view to read stencil.
- * The stencil is written using the shader stencil export
- * functionality. */
-if (write_stencil) {
-   enum pipe_format stencil_format =
- util_format_stencil_only(pt->format);
-   /* we should not be doing pixel map/transfer (see above) */
-   assert(num_sampler_view == 1);
-   sv[1] = st_create_texture_sampler_view_format(st->pipe, pt,
- stencil_format);
-   num_sampler_view++;
-}
+   /* Put glDrawPixels image into a texture */
+   pt = make_texture(st, width, height, format, type, unpack, pixels);
+   if (!pt) {
+  _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels");
+  return;
+   }

-draw_textured_quad(ctx, x, y, ctx->Current.RasterPos[2],
-   width, height,
-   ctx->Pixel.ZoomX, ctx->Pixel.ZoomY,
-   sv,
-   num_sampler_view,
-   driver_vp,
-   driver_fp, fpv,
-   color, GL_FALSE, write_depth, write_stencil);
-pipe_sampler_view_reference(&sv[0], NULL);
-if (num_sampler_view > 1)
-   pipe_sampler_view_reference(&sv[1], NULL);
- }
- pipe_resource_reference(&pt, NULL);
-  }
+   /* create sampler view for the image */
+   sv[0] = st_create_texture_sampler_view(st->pipe, pt);
+   if (!sv[0]) {
+  _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels");
+  pipe_resource_reference(&pt, NULL);
+  return;
 }
+
+   /* Create a second sampler view to read stencil.  The stencil is
+* written using the shader stencil export functionality.
+*/
+   if (write_stencil) {
+  enum pipe_format stencil_format =
+ util_format_stencil_only(pt->format);
+  /* we should not be doing pixel map/transfer (see above) */
+  assert(num_sampler_view == 1);
+  sv[1] = st_create_texture_sampler_view_format(st->pipe, pt,
+stencil_format);


Should check null sv[1] here too.


+  num_sampler_view++;
+   }
+
+   draw_textured_quad(ctx, x, y, ctx->Current.RasterPos[2],
+  width, height,
+  ctx->Pixel.ZoomX, ctx->Pixel.ZoomY,
+  sv,
+  num_sampler_view,
+  driver_vp,
+  driver_fp, fpv,
+  color, GL_FALSE, write_depth, write_stencil);
+   pipe_sampler_view_reference(&sv[0], NULL);
+   if (num_sampler_view > 1)
+  pipe_sampler_view_reference(&sv[1], NULL);
+
+   pipe_resource_reference(&pt, NULL);
  }






Jose


Re: [Mesa-dev] [PATCH 2/2] mesa: fix incorrect opcode in save_BlendFunci()

2015-10-16 Thread Jose Fonseca

On 15/10/15 15:51, Brian Paul wrote:

Fixes assertion failure with new piglit
arb_draw_buffers_blend-state_set_get test.

Cc: mesa-sta...@lists.freedesktop.org
---
  src/mesa/main/dlist.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/dlist.c b/src/mesa/main/dlist.c
index fdb839c..2b65b2e 100644
--- a/src/mesa/main/dlist.c
+++ b/src/mesa/main/dlist.c
@@ -1400,7 +1400,7 @@ save_BlendFunci(GLuint buf, GLenum sfactor, GLenum 
dfactor)
 GET_CURRENT_CONTEXT(ctx);
 Node *n;
 ASSERT_OUTSIDE_SAVE_BEGIN_END_AND_FLUSH(ctx);
-   n = alloc_instruction(ctx, OPCODE_BLEND_FUNC_SEPARATE_I, 3);
+   n = alloc_instruction(ctx, OPCODE_BLEND_FUNC_I, 3);
 if (n) {
n[1].ui = buf;
n[2].e = sfactor;



Series is

Reviewed-by: Jose Fonseca 


Re: [Mesa-dev] [PATCH] nvc0: add support for performance monitoring metrics on Fermi

2015-10-16 Thread Samuel Pitoiset



On 10/16/2015 11:57 PM, Ilia Mirkin wrote:

On Fri, Oct 16, 2015 at 5:35 PM, Samuel Pitoiset
 wrote:


On 10/16/2015 11:22 PM, Ilia Mirkin wrote:

On Fri, Oct 16, 2015 at 5:29 PM, Samuel Pitoiset
 wrote:

As explained in the CUDA toolkit documentation, "a metric is a
characteristic of an application that is calculated from one or more
event values."

Signed-off-by: Samuel Pitoiset 
---
   src/gallium/drivers/nouveau/Makefile.sources   |   2 +
   src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c   |  19 +-
   .../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 444
+
   .../drivers/nouveau/nvc0/nvc0_query_hw_metric.h|  42 ++
   4 files changed, 504 insertions(+), 3 deletions(-)
   create mode 100644
src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
   create mode 100644
src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.h

diff --git a/src/gallium/drivers/nouveau/Makefile.sources
b/src/gallium/drivers/nouveau/Makefile.sources
index edc6cf4..c18e9f5 100644
--- a/src/gallium/drivers/nouveau/Makefile.sources
+++ b/src/gallium/drivers/nouveau/Makefile.sources
@@ -154,6 +154,8 @@ NVC0_C_SOURCES := \
  nvc0/nvc0_query.h \
  nvc0/nvc0_query_hw.c \
  nvc0/nvc0_query_hw.h \
+   nvc0/nvc0_query_hw_metric.c \
+   nvc0/nvc0_query_hw_metric.h \
  nvc0/nvc0_query_hw_sm.c \
  nvc0/nvc0_query_hw_sm.h \
  nvc0/nvc0_query_sw.c \
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
index 91254be..90ee82f 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
@@ -25,6 +25,7 @@

   #include "nvc0/nvc0_context.h"
   #include "nvc0/nvc0_query_hw.h"
+#include "nvc0/nvc0_query_hw_metric.h"
   #include "nvc0/nvc0_query_hw_sm.h"

   #define NVC0_HW_QUERY_STATE_READY   0
@@ -371,6 +372,12 @@ nvc0_hw_create_query(struct nvc0_context *nvc0,
unsigned type, unsigned index)
 return (struct nvc0_query *)hq;
  }

+   hq = nvc0_hw_metric_create_query(nvc0, type);
+   if (hq) {
+  hq->base.funcs = &hw_query_funcs;
+  return (struct nvc0_query *)hq;
+   }
+
  hq = CALLOC_STRUCT(nvc0_hw_query);
  if (!hq)
 return NULL;
@@ -435,14 +442,20 @@ int
   nvc0_hw_get_driver_query_info(struct nvc0_screen *screen, unsigned id,
 struct pipe_driver_query_info *info)
   {
-   int num_hw_sm_queries = 0;
+   int num_hw_sm_queries = 0, num_hw_metric_queries = 0;

  num_hw_sm_queries = nvc0_hw_sm_get_driver_query_info(screen, 0,
NULL);
+   num_hw_metric_queries =
+  nvc0_hw_metric_get_driver_query_info(screen, 0, NULL);

  if (!info)
-  return num_hw_sm_queries;
+  return num_hw_sm_queries + num_hw_metric_queries;
+
+   if (id < num_hw_sm_queries)
+  return nvc0_hw_sm_get_driver_query_info(screen, id, info);

-   return nvc0_hw_sm_get_driver_query_info(screen, id, info);
+   return nvc0_hw_metric_get_driver_query_info(screen,
+   id - num_hw_sm_queries,
info);
   }

   void
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
new file mode 100644
index 000..dbe350a
--- /dev/null
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
@@ -0,0 +1,444 @@
+/*
+ * Copyright 2015 Samuel Pitoiset
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining
a
+ * copy of this software and associated documentation files (the
"Software"),
+ * to deal in the Software without restriction, including without
limitation
+ * the rights to use, copy, modify, merge, publish, distribute,
sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be
included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "nvc0/nvc0_context.h"
+#include "nvc0/nvc0_query_hw_metric.h"
+#include "nvc0/nvc0_query_hw_sm.h"
+
+/* === PERFORMANCE MONITORING METRICS for NVC0:NVE4 === */
+static const char *nvc0_hw_metric_names[] =
+{
+   "metric-achieved_occupancy",
+   "metric-branch_efficiency",
+   "metric-inst_issued",
+   "metric-inst_per_wrap",
+   "metric-inst_replay_overhead",
+   "metric-issued_ipc",
+   "metric-issue_slots",
+   "metric-issue_slot_utilization",
+   "metric-i

Re: [Mesa-dev] [PATCH] nvc0: add support for performance monitoring metrics on Fermi

2015-10-16 Thread Ilia Mirkin
On Fri, Oct 16, 2015 at 5:35 PM, Samuel Pitoiset
 wrote:
>
>
> On 10/16/2015 11:22 PM, Ilia Mirkin wrote:
>>
>> On Fri, Oct 16, 2015 at 5:29 PM, Samuel Pitoiset
>>  wrote:
>>>
>>> As explained in the CUDA toolkit documentation, "a metric is a
>>> characteristic of an application that is calculated from one or more
>>> event values."
>>>
>>> Signed-off-by: Samuel Pitoiset 
>>> ---
>>>   src/gallium/drivers/nouveau/Makefile.sources   |   2 +
>>>   src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c   |  19 +-
>>>   .../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 444
>>> +
>>>   .../drivers/nouveau/nvc0/nvc0_query_hw_metric.h|  42 ++
>>>   4 files changed, 504 insertions(+), 3 deletions(-)
>>>   create mode 100644
>>> src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
>>>   create mode 100644
>>> src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.h
>>>
>>> diff --git a/src/gallium/drivers/nouveau/Makefile.sources
>>> b/src/gallium/drivers/nouveau/Makefile.sources
>>> index edc6cf4..c18e9f5 100644
>>> --- a/src/gallium/drivers/nouveau/Makefile.sources
>>> +++ b/src/gallium/drivers/nouveau/Makefile.sources
>>> @@ -154,6 +154,8 @@ NVC0_C_SOURCES := \
>>>  nvc0/nvc0_query.h \
>>>  nvc0/nvc0_query_hw.c \
>>>  nvc0/nvc0_query_hw.h \
>>> +   nvc0/nvc0_query_hw_metric.c \
>>> +   nvc0/nvc0_query_hw_metric.h \
>>>  nvc0/nvc0_query_hw_sm.c \
>>>  nvc0/nvc0_query_hw_sm.h \
>>>  nvc0/nvc0_query_sw.c \
>>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
>>> b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
>>> index 91254be..90ee82f 100644
>>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
>>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
>>> @@ -25,6 +25,7 @@
>>>
>>>   #include "nvc0/nvc0_context.h"
>>>   #include "nvc0/nvc0_query_hw.h"
>>> +#include "nvc0/nvc0_query_hw_metric.h"
>>>   #include "nvc0/nvc0_query_hw_sm.h"
>>>
>>>   #define NVC0_HW_QUERY_STATE_READY   0
>>> @@ -371,6 +372,12 @@ nvc0_hw_create_query(struct nvc0_context *nvc0,
>>> unsigned type, unsigned index)
>>> return (struct nvc0_query *)hq;
>>>  }
>>>
>>> +   hq = nvc0_hw_metric_create_query(nvc0, type);
>>> +   if (hq) {
>>> +  hq->base.funcs = &hw_query_funcs;
>>> +  return (struct nvc0_query *)hq;
>>> +   }
>>> +
>>>  hq = CALLOC_STRUCT(nvc0_hw_query);
>>>  if (!hq)
>>> return NULL;
>>> @@ -435,14 +442,20 @@ int
>>>   nvc0_hw_get_driver_query_info(struct nvc0_screen *screen, unsigned id,
>>> struct pipe_driver_query_info *info)
>>>   {
>>> -   int num_hw_sm_queries = 0;
>>> +   int num_hw_sm_queries = 0, num_hw_metric_queries = 0;
>>>
>>>  num_hw_sm_queries = nvc0_hw_sm_get_driver_query_info(screen, 0,
>>> NULL);
>>> +   num_hw_metric_queries =
>>> +  nvc0_hw_metric_get_driver_query_info(screen, 0, NULL);
>>>
>>>  if (!info)
>>> -  return num_hw_sm_queries;
>>> +  return num_hw_sm_queries + num_hw_metric_queries;
>>> +
>>> +   if (id < num_hw_sm_queries)
>>> +  return nvc0_hw_sm_get_driver_query_info(screen, id, info);
>>>
>>> -   return nvc0_hw_sm_get_driver_query_info(screen, id, info);
>>> +   return nvc0_hw_metric_get_driver_query_info(screen,
>>> +   id - num_hw_sm_queries,
>>> info);
>>>   }
>>>
>>>   void
>>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
>>> b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
>>> new file mode 100644
>>> index 000..dbe350a
>>> --- /dev/null
>>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
>>> @@ -0,0 +1,444 @@
>>> +/*
>>> + * Copyright 2015 Samuel Pitoiset
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining
>>> a
>>> + * copy of this software and associated documentation files (the
>>> "Software"),
>>> + * to deal in the Software without restriction, including without
>>> limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute,
>>> sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be
>>> included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>> EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>> MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
>>> SHALL
>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>> + */
>>> +
>>> +#include "nvc0/nvc0_context.h"
>>> +#include "nvc0/nvc

[Mesa-dev] [Bug 81174] Gallium: GL_LINE_LOOP broken with more than 512 points

2015-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=81174

--- Comment #16 from Brian Paul  ---
I'm digging into this bug because it pertains to an issue with a particular app
and the VMware gallium driver.

The VBO code for splitting GL_LINE_LOOP is actually correct, I believe, but our
implementations of vbo_context::draw_prims(), such as st_draw_vbo() and
brw_draw_prims() are subtly broken.  And they have been broken since day one!

The issue comes from the two 'begin' and 'end' flags in the _mesa_prim
structure.  These flags indicate whether the primitive's vertices start at a
glBegin() and whether the prim's vertices end at a glEnd().  Suppose we have a
long GL_LINE_LOOP that gets broken into three pieces.  Here are the flags for
the three _mesa_prims that we draw:

_mesa_prim[0].begin = 1
_mesa_prim[0].end = 0
_mesa_prim[1].begin = 0
_mesa_prim[1].end = 0
_mesa_prim[2].begin = 0
_mesa_prim[2].end = 1

For all three drawing calls, the 0th vertex in the primitive's vertex buffer
will be a copy of the first glVertex() issued after glBegin.  If N is the
number of vertices in the _mesa_prim:

For _mesa_prim[0] we should draw the line segments from v[0] .. v[N-1]
For _mesa_prim[1] we should draw the line segments from v[1] .. v[N-1]
For _mesa_prim[2] we should draw the line segments from v[1] .. v[N-1] and an
extra line from v[N-1] to v[0]

You can see this in the old 'tnl' code's t_vb_rendertmp.h code for
GL_LINE_LOOP.

Our implementations of draw_prims() ignore those flags and always draw
v[0] .. v[N-1] so we get the stray lines that people are seeing.

Furthermore, draw_prims() is supposed to look at the 'begin' flag to know when
to reset the line stipple counter.  We don't do that in the state tracker
either.

I've posted a patch series that fixes this.  The basic idea is when we have to
split a GL_LINE_LOOP, draw the pieces with GL_LINE_STRIP instead so that
drivers don't need to worry about the 'begin' and 'end' flags (except where the
stipple counter matters).  Drivers will only get a GL_LINE_LOOP when all the
vertices live in one vertex buffer.

Unfortunately, after fixing the VBO code, there's still a bug somewhere in the
gallium 'draw' code.  See comments in the patch series for more information.
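
To illustrate the begin/end rules described above, here is a small sketch of what a draw function honoring the flags would emit for one piece of a split line loop. It is not the tnl or state tracker code; struct segment and line_loop_piece() are made-up names, and vertices are represented only by their index within the piece.

```c
#include <assert.h>

/* One line segment, given as two vertex indices within the piece. */
struct segment { unsigned a, b; };

/*
 * Emit the line segments for one piece of a split GL_LINE_LOOP.
 * 'n' is the number of vertices in this piece; vertex 0 is always a copy
 * of the loop's very first vertex.  'begin' and 'end' mirror the
 * _mesa_prim flags.  Returns the number of segments written to 'out',
 * which must have room for n entries.
 */
static unsigned
line_loop_piece(unsigned n, int begin, int end, struct segment *out)
{
   unsigned count = 0, i;

   if (n < 2)
      return 0;

   if (begin) {
      /* only the first piece draws the line v[0] -> v[1] */
      out[count].a = 0;
      out[count].b = 1;
      count++;
   }

   /* every piece draws v[1] .. v[n-1] */
   for (i = 2; i < n; i++) {
      out[count].a = i - 1;
      out[count].b = i;
      count++;
   }

   if (end) {
      /* only the last piece closes the loop back to the first vertex */
      out[count].a = n - 1;
      out[count].b = 0;
      count++;
   }

   return count;
}
```

A middle piece (begin = 0, end = 0) with four vertices therefore yields only the two segments 1-2 and 2-3. Ignoring the flags would add a 0-1 segment there, which is the stray line.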



[Mesa-dev] [PATCH 10/10] vbo: convert display list GL_LINE_LOOP prims to GL_LINE_STRIP

2015-10-16 Thread Brian Paul
When a long GL_LINE_LOOP prim was split across primitives we drew
stray lines.  See previous commit for details.

This patch converts GL_LINE_LOOP prims into GL_LINE_STRIP prims so
that drivers don't have to worry about the _mesa_prim::begin/end flags.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81174
---
 src/mesa/vbo/vbo_save_api.c | 53 +
 1 file changed, 53 insertions(+)

diff --git a/src/mesa/vbo/vbo_save_api.c b/src/mesa/vbo/vbo_save_api.c
index 6688ba0..d49aa15 100644
--- a/src/mesa/vbo/vbo_save_api.c
+++ b/src/mesa/vbo/vbo_save_api.c
@@ -360,6 +360,51 @@ merge_prims(struct _mesa_prim *prim_list,
*prim_count = prev_prim - prim_list + 1;
 }
 
+
+/**
+ * Convert GL_LINE_LOOP primitive into GL_LINE_STRIP so that drivers
+ * don't have to worry about handling the _mesa_prim::begin/end flags.
+ * See https://bugs.freedesktop.org/show_bug.cgi?id=81174
+ */
+static void
+convert_line_loop_to_strip(struct vbo_save_context *save,
+   struct vbo_save_vertex_list *node)
+{
+   struct _mesa_prim *prim = &node->prim[node->prim_count - 1];
+
+   assert(prim->mode == GL_LINE_LOOP);
+
+   if (prim->end) {
+  /* Copy the 0th vertex to end of the buffer and extend the
+   * vertex count by one to finish the line loop.
+   */
+  const GLuint sz = save->vertex_size;
+  /* 0th vertex: */
+  const fi_type *src = save->buffer + prim->start * sz;
+  /* end of buffer: */
+  fi_type *dst = save->buffer + (prim->start + prim->count) * sz;
+
+  memcpy(dst, src, sz * sizeof(float));
+
+  prim->count++;
+  node->count++;
+  save->vert_count++;
+  save->buffer_ptr += sz;
+  save->vertex_store->used += sz;
+   }
+
+   if (!prim->begin) {
+  /* Drawing the second or later section of a long line loop.
+   * Skip the 0th vertex.
+   */
+  prim->start++;
+  prim->count--;
+   }
+
+   prim->mode = GL_LINE_STRIP;
+}
+
+
 /**
  * Insert the active immediate struct onto the display list currently
  * being built.
@@ -441,6 +486,10 @@ _save_compile_vertex_list(struct gl_context *ctx)
 */
save->copied.nr = _save_copy_vertices(ctx, node, save->buffer);
 
+   if (node->prim[node->prim_count - 1].mode == GL_LINE_LOOP) {
+  convert_line_loop_to_strip(save, node);
+   }
+
merge_prims(node->prim, &node->prim_count);
 
/* Deal with GL_COMPILE_AND_EXECUTE:
@@ -482,6 +531,10 @@ _save_compile_vertex_list(struct gl_context *ctx)
   save->buffer_ptr = vbo_save_map_vertex_store(ctx, save->vertex_store);
   save->out_of_memory = save->buffer_ptr == NULL;
}
+   else {
+  /* update buffer_ptr for next vertex */
+  save->buffer_ptr = save->vertex_store->buffer + save->vertex_store->used;
+   }
 
if (save->prim_store->used > VBO_SAVE_PRIM_SIZE - 6) {
   save->prim_store->refcount--;
-- 
1.9.1
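
For readers who want to try the conversion outside of Mesa, the logic of convert_line_loop_to_strip() can be sketched on a plain float array as below. This is a reduced illustration under assumptions, not the patch itself: struct prim and loop_piece_to_strip() are hypothetical names, the bookkeeping of the save context is collapsed into a single vertex count, and the caller is assumed to have left room for one extra vertex.

```c
#include <assert.h>
#include <string.h>

/* Reduced stand-in for struct _mesa_prim. */
struct prim {
   unsigned start, count;   /* in vertices */
   int begin, end;          /* glBegin / glEnd seen for this piece? */
};

/*
 * Turn one piece of a line loop into an equivalent line strip.
 * 'verts' holds 'used_verts' vertices of 'vertex_size' floats each and
 * must have room for one more vertex.  The prim must be the last one in
 * the buffer.  Returns the new number of vertices stored in 'verts'.
 */
static unsigned
loop_piece_to_strip(struct prim *p, float *verts,
                    unsigned vertex_size, unsigned used_verts)
{
   assert(p->start + p->count == used_verts);

   if (p->end) {
      /* append a copy of the 0th vertex so the strip closes the loop */
      const float *src = verts + p->start * vertex_size;
      float *dst = verts + (p->start + p->count) * vertex_size;

      memcpy(dst, src, vertex_size * sizeof(float));
      p->count++;
      used_verts++;
   }

   if (!p->begin) {
      /* second or later piece: skip the 0th vertex */
      p->start++;
      p->count--;
   }

   return used_verts;
}
```

Note the order of the two steps: the closing vertex is copied before the start is advanced, so the copy always comes from the 0th vertex of the piece.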



[Mesa-dev] [PATCH 04/10] vbo: replace the comment on vbo_copy_vertices()

2015-10-16 Thread Brian Paul
---
 src/mesa/vbo/vbo_exec_draw.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/mesa/vbo/vbo_exec_draw.c b/src/mesa/vbo/vbo_exec_draw.c
index 174cbc3..781991b 100644
--- a/src/mesa/vbo/vbo_exec_draw.c
+++ b/src/mesa/vbo/vbo_exec_draw.c
@@ -64,9 +64,13 @@ vbo_exec_debug_verts( struct vbo_exec_context *exec )
 }
 
 
-/*
- * NOTE: Need to have calculated primitives by this point -- do it on the fly.
- * NOTE: Old 'parity' issue is gone.
+/**
+ * Copy zero, one or two vertices from the current vertex buffer into
+ * the temporary "copy" buffer.
+ * This is used when a single primitive overflows a vertex buffer and
+ * we need to continue the primitive in a new vertex buffer.
+ * The temporary "copy" buffer holds the vertices which need to get
+ * copied from the old buffer to the new one.
  */
 static GLuint
 vbo_copy_vertices( struct vbo_exec_context *exec )
-- 
1.9.1
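
As an illustration of the new comment, the sketch below shows which vertices have to be carried into the next buffer for the two line primitives discussed in this series. It is a simplified model and not vbo_copy_vertices() itself: enum mode and vertices_to_carry() are hypothetical, only indices are returned instead of copying vertex data, and the other primitive types are left out.

```c
#include <assert.h>

/* Hypothetical reduced set of primitive modes. */
enum mode { MODE_LINE_STRIP, MODE_LINE_LOOP };

/*
 * Decide which vertices of an overflowing primitive must be copied to the
 * start of the next vertex buffer.  'nr' is the number of vertices emitted
 * so far.  The indices to keep are written to 'keep'; the return value is
 * how many there are (zero, one or two).
 */
static unsigned
vertices_to_carry(enum mode mode, unsigned nr, unsigned keep[2])
{
   if (nr == 0)
      return 0;

   switch (mode) {
   case MODE_LINE_STRIP:
      /* the next segment starts at the last vertex */
      keep[0] = nr - 1;
      return 1;
   case MODE_LINE_LOOP:
      /* keep the first vertex to close the loop later ... */
      keep[0] = 0;
      if (nr == 1)
         return 1;
      /* ... and the last vertex to continue from */
      keep[1] = nr - 1;
      return 2;
   }

   return 0;
}
```

This is why the 0th vertex of every later piece of a line loop is a copy of the loop's first vertex.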



[Mesa-dev] [PATCH 02/10] vbo: remove unneeded ctx parameter for merge_prims()

2015-10-16 Thread Brian Paul
---
 src/mesa/vbo/vbo_save_api.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/mesa/vbo/vbo_save_api.c b/src/mesa/vbo/vbo_save_api.c
index fdc677f..6688ba0 100644
--- a/src/mesa/vbo/vbo_save_api.c
+++ b/src/mesa/vbo/vbo_save_api.c
@@ -330,8 +330,7 @@ _save_reset_counters(struct gl_context *ctx)
  * previous prim.
  */
 static void
-merge_prims(struct gl_context *ctx,
-struct _mesa_prim *prim_list,
+merge_prims(struct _mesa_prim *prim_list,
 GLuint *prim_count)
 {
GLuint i;
@@ -442,7 +441,7 @@ _save_compile_vertex_list(struct gl_context *ctx)
 */
save->copied.nr = _save_copy_vertices(ctx, node, save->buffer);
 
-   merge_prims(ctx, node->prim, &node->prim_count);
+   merge_prims(node->prim, &node->prim_count);
 
/* Deal with GL_COMPILE_AND_EXECUTE:
 */
-- 
1.9.1



[Mesa-dev] [PATCH 07/10] vbo: simplify some code in vbo_exec_End()

2015-10-16 Thread Brian Paul
---
 src/mesa/vbo/vbo_exec_api.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/mesa/vbo/vbo_exec_api.c b/src/mesa/vbo/vbo_exec_api.c
index 2a78eac..903aa42 100644
--- a/src/mesa/vbo/vbo_exec_api.c
+++ b/src/mesa/vbo/vbo_exec_api.c
@@ -823,11 +823,10 @@ static void GLAPIENTRY vbo_exec_End( void )
 
if (exec->vtx.prim_count > 0) {
   /* close off current primitive */
-  int idx = exec->vtx.vert_count;
-  int i = exec->vtx.prim_count - 1;
+  struct _mesa_prim *last_prim = &exec->vtx.prim[exec->vtx.prim_count - 1];
 
-  exec->vtx.prim[i].end = 1;
-  exec->vtx.prim[i].count = idx - exec->vtx.prim[i].start;
+  last_prim->end = 1;
+  last_prim->count = exec->vtx.vert_count - last_prim->start;
 
   try_vbo_merge(exec);
}
-- 
1.9.1



[Mesa-dev] [PATCH 08/10] vbo: add new vbo_compute_max_verts() helper function

2015-10-16 Thread Brian Paul
---
 src/mesa/vbo/vbo_context.h   | 14 ++
 src/mesa/vbo/vbo_exec_api.c  |  3 +--
 src/mesa/vbo/vbo_exec_draw.c |  3 +--
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/src/mesa/vbo/vbo_context.h b/src/mesa/vbo/vbo_context.h
index a376efe..1e85335 100644
--- a/src/mesa/vbo/vbo_context.h
+++ b/src/mesa/vbo/vbo_context.h
@@ -196,6 +196,20 @@ vbo_get_default_vals_as_union(GLenum format)
}
 }
 
+
+/**
+ * Compute the max number of vertices which can be stored in
+ * a vertex buffer, given the current vertex size, and the amount
+ * of space already used.
+ */
+static inline unsigned
+vbo_compute_max_verts(const struct vbo_exec_context *exec)
+{
+   return (VBO_VERT_BUFFER_SIZE - exec->vtx.buffer_used) /
+  (exec->vtx.vertex_size * sizeof(GLfloat));
+}
+
+
 #ifdef __cplusplus
 } // extern "C"
 #endif
diff --git a/src/mesa/vbo/vbo_exec_api.c b/src/mesa/vbo/vbo_exec_api.c
index 903aa42..0116f56 100644
--- a/src/mesa/vbo/vbo_exec_api.c
+++ b/src/mesa/vbo/vbo_exec_api.c
@@ -293,8 +293,7 @@ vbo_exec_wrap_upgrade_vertex(struct vbo_exec_context *exec,
 */
exec->vtx.attrsz[attr] = newSize;
exec->vtx.vertex_size += newSize - oldSize;
-   exec->vtx.max_vert = ((VBO_VERT_BUFFER_SIZE - exec->vtx.buffer_used) / 
- (exec->vtx.vertex_size * sizeof(GLfloat)));
+   exec->vtx.max_vert = vbo_compute_max_verts(exec);
exec->vtx.vert_count = 0;
exec->vtx.buffer_ptr = exec->vtx.buffer_map;
 
diff --git a/src/mesa/vbo/vbo_exec_draw.c b/src/mesa/vbo/vbo_exec_draw.c
index 412ebb6..397fc09 100644
--- a/src/mesa/vbo/vbo_exec_draw.c
+++ b/src/mesa/vbo/vbo_exec_draw.c
@@ -436,8 +436,7 @@ vbo_exec_vtx_flush(struct vbo_exec_context *exec, GLboolean 
keepUnmapped)
if (keepUnmapped || exec->vtx.vertex_size == 0)
   exec->vtx.max_vert = 0;
else
-  exec->vtx.max_vert = ((VBO_VERT_BUFFER_SIZE - exec->vtx.buffer_used) /
-(exec->vtx.vertex_size * sizeof(GLfloat)));
+  exec->vtx.max_vert = vbo_compute_max_verts(exec);
 
exec->vtx.buffer_ptr = exec->vtx.buffer_map;
exec->vtx.prim_count = 0;
-- 
1.9.1
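
A quick worked example of the arithmetic in vbo_compute_max_verts(), as a standalone sketch. BUFFER_SIZE is an assumed value standing in for VBO_VERT_BUFFER_SIZE, and compute_max_verts() takes the two fields as plain parameters instead of reading them from the exec context.

```c
#include <assert.h>

#define BUFFER_SIZE (64 * 1024)   /* assumed; stands in for VBO_VERT_BUFFER_SIZE */

/*
 * Number of whole vertices that still fit in the buffer.
 * 'buffer_used' is in bytes, 'vertex_size' is in floats.
 * The caller must not pass vertex_size == 0 (the flush path in the
 * patch checks for that before calling the helper).
 */
static unsigned
compute_max_verts(unsigned buffer_used, unsigned vertex_size)
{
   assert(vertex_size > 0);
   assert(buffer_used <= BUFFER_SIZE);

   return (BUFFER_SIZE - buffer_used) / (vertex_size * sizeof(float));
}
```

With these assumed numbers and 4-byte floats, an empty buffer and a 4-float vertex give 65536 / 16 = 4096 vertices, and with 1024 bytes already used the result drops to 4032. The division truncates, so a 3-float vertex in an empty buffer gives 5461 with 4 bytes left over.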



[Mesa-dev] [PATCH 06/10] vbo: simplify some code in vbo_exec_debug_verts()

2015-10-16 Thread Brian Paul
As before, use a new 'last_prim' pointer to simplify things.  Plus, add
some const qualifiers.
---
 src/mesa/vbo/vbo_exec_draw.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/mesa/vbo/vbo_exec_draw.c b/src/mesa/vbo/vbo_exec_draw.c
index 781991b..412ebb6 100644
--- a/src/mesa/vbo/vbo_exec_draw.c
+++ b/src/mesa/vbo/vbo_exec_draw.c
@@ -75,13 +75,13 @@ vbo_exec_debug_verts( struct vbo_exec_context *exec )
 static GLuint
 vbo_copy_vertices( struct vbo_exec_context *exec )
 {
-   GLuint nr = exec->vtx.prim[exec->vtx.prim_count-1].count;
+   struct _mesa_prim *last_prim = &exec->vtx.prim[exec->vtx.prim_count - 1];
+   const GLuint nr = last_prim->count;
GLuint ovf, i;
-   GLuint sz = exec->vtx.vertex_size;
+   const GLuint sz = exec->vtx.vertex_size;
fi_type *dst = exec->vtx.copied.buffer;
-   const fi_type *src = (exec->vtx.buffer_map +
- exec->vtx.prim[exec->vtx.prim_count-1].start * 
- exec->vtx.vertex_size);
+   const fi_type *src =
+  exec->vtx.buffer_map + last_prim->start * exec->vtx.vertex_size;
 
switch (exec->ctx->Driver.CurrentExecPrimitive) {
case GL_POINTS:
@@ -127,7 +127,7 @@ vbo_copy_vertices( struct vbo_exec_context *exec )
case GL_TRIANGLE_STRIP:
   /* no parity issue, but need to make sure the tri is not drawn twice */
   if (nr & 1) {
-exec->vtx.prim[exec->vtx.prim_count-1].count--;
+last_prim->count--;
   }
   /* fallthrough */
case GL_QUAD_STRIP:
-- 
1.9.1



[Mesa-dev] [PATCH 01/10] tnl: add some comments in render_line_loop code

2015-10-16 Thread Brian Paul
And remove '(void) flags' line which is not needed.
---
 src/mesa/tnl/t_vb_rendertmp.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/tnl/t_vb_rendertmp.h b/src/mesa/tnl/t_vb_rendertmp.h
index 44dee76..26a1695 100644
--- a/src/mesa/tnl/t_vb_rendertmp.h
+++ b/src/mesa/tnl/t_vb_rendertmp.h
@@ -124,19 +124,19 @@ static void TAG(render_line_loop)( struct gl_context *ctx,
GLuint i;
LOCAL_VARS;
 
-   (void) flags;
-
INIT(GL_LINE_LOOP);
 
if (start+1 < count) {
   if (TEST_PRIM_BEGIN(flags)) {
 RESET_STIPPLE;
+ /* draw the first line from v[0] to v[1] */
  if (ctx->Light.ProvokingVertex == GL_LAST_VERTEX_CONVENTION_EXT)
 RENDER_LINE( ELT(start), ELT(start+1) );
  else
 RENDER_LINE( ELT(start+1), ELT(start) );
   }
 
+  /* draw lines from v[1] to V[n-1] */
   for ( i = start+2 ; i < count ; i++) {
  if (ctx->Light.ProvokingVertex == GL_LAST_VERTEX_CONVENTION_EXT)
 RENDER_LINE( ELT(i-1), ELT(i) );
@@ -145,6 +145,7 @@ static void TAG(render_line_loop)( struct gl_context *ctx,
   }
 
   if ( TEST_PRIM_END(flags)) {
+ /* draw final line from v[n-1] to v[0] (the very first vertex) */
  if (ctx->Light.ProvokingVertex == GL_LAST_VERTEX_CONVENTION_EXT)
 RENDER_LINE( ELT(count-1), ELT(start) );
  else
-- 
1.9.1



[Mesa-dev] [PATCH 09/10] vbo: fix GL_LINE_LOOP stray line bug

2015-10-16 Thread Brian Paul
When long GL_LINE_LOOP primitives don't fit in one vertex buffer they
have to be split across buffers.  The code to do this was basically correct
but drivers had to pay special attention to the _mesa_prim::begin,end flags
in order to draw the sections of the line loop properly.  Apparently, the
only drivers to do this were those using the old 'tnl' module for software
vertex processing.

Now we convert the split pieces of GL_LINE_LOOP prims into GL_LINE_STRIP
primitives so that drivers don't have to worry about the special begin/end
flags.  The only time a driver will get a GL_LINE_LOOP prim is when the
whole thing fits in one vertex buffer.
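
A minimal sketch of the splitting idea described above, working on vertex indices rather than on the real VBO structures. All names here (sketch_prim, sketch_split_line_loop, close_loop) are hypothetical and are not part of Mesa; the sketch also assumes one slot per buffer is kept free for the closing vertex, and that each piece after the first re-uses the previous piece's last vertex so the connecting segment is still drawn.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical primitive modes for this sketch only. */
enum sketch_mode { SKETCH_LINE_LOOP, SKETCH_LINE_STRIP };

struct sketch_prim {
   enum sketch_mode mode;
   size_t first;     /* index of the first loop vertex in this piece */
   size_t count;     /* number of loop vertices in this piece */
   int close_loop;   /* nonzero: vertex 0 is appended after the last vertex */
};

/*
 * Split a line loop of n vertices into pieces that each fit in a buffer
 * holding max_verts vertices.  Returns the number of pieces written.
 */
size_t
sketch_split_line_loop(size_t n, size_t max_verts,
                       struct sketch_prim *out, size_t max_prims)
{
   size_t cap, first = 0, np = 0;

   assert(max_verts >= 3);
   if (max_prims == 0)
      return 0;

   if (n <= max_verts) {
      /* Whole loop fits: it can stay a line loop. */
      out[0] = (struct sketch_prim){ SKETCH_LINE_LOOP, 0, n, 0 };
      return 1;
   }

   cap = max_verts - 1;   /* keep one slot free for the closing vertex */

   while (np < max_prims) {
      size_t remaining = n - first;

      if (remaining <= cap) {
         /* Final piece: draw as a strip and close back to vertex 0. */
         out[np++] = (struct sketch_prim){ SKETCH_LINE_STRIP, first,
                                           remaining, 1 };
         break;
      }
      out[np++] = (struct sketch_prim){ SKETCH_LINE_STRIP, first, cap, 0 };
      first += cap - 1;   /* next piece starts at this piece's last vertex */
   }
   return np;
}
```

With this layout a consumer only ever sees a complete line loop or plain line strips, so it never needs to inspect begin/end style flags.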

Mostly fixes bug 81174, but not completely.  There's another bug somewhere
in the src/gallium/auxiliary/draw/ code.  If the piglit lineloop test is
run with -count 4096, rendering is correct, but with -count 4097 there are
stray lines.  4096 is a magic number in the draw code (search for "4096").

Also note that this does not fix long line loops in display lists.  The
next patch fixes that.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81174
---
 src/mesa/vbo/vbo_context.h   |  5 -
 src/mesa/vbo/vbo_exec_api.c  | 38 +-
 src/mesa/vbo/vbo_exec_draw.c | 12 
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/src/mesa/vbo/vbo_context.h b/src/mesa/vbo/vbo_context.h
index 1e85335..28f43b4 100644
--- a/src/mesa/vbo/vbo_context.h
+++ b/src/mesa/vbo/vbo_context.h
@@ -205,7 +205,10 @@ vbo_get_default_vals_as_union(GLenum format)
 static inline unsigned
 vbo_compute_max_verts(const struct vbo_exec_context *exec)
 {
-   return (VBO_VERT_BUFFER_SIZE - exec->vtx.buffer_used) /
+   /* Subtract one so we're always sure to have room for an extra
+* vertex for GL_LINE_LOOP -> GL_LINE_STRIP conversion.
+*/
+   return (VBO_VERT_BUFFER_SIZE - exec->vtx.buffer_used - 1) /
   (exec->vtx.vertex_size * sizeof(GLfloat));
 }
 
diff --git a/src/mesa/vbo/vbo_exec_api.c b/src/mesa/vbo/vbo_exec_api.c
index 0116f56..3c25ec0 100644
--- a/src/mesa/vbo/vbo_exec_api.c
+++ b/src/mesa/vbo/vbo_exec_api.c
@@ -61,7 +61,8 @@ static void reset_attrfv( struct vbo_exec_context *exec );
 
 /**
  * Close off the last primitive, execute the buffer, restart the
- * primitive.  
+ * primitive.  This is called when we fill a vertex buffer before
+ * hitting glEnd.
  */
 static void vbo_exec_wrap_buffers( struct vbo_exec_context *exec )
 {
@@ -83,6 +84,22 @@ static void vbo_exec_wrap_buffers( struct vbo_exec_context *exec )
 
   last_count = last_prim->count;
 
+  /* Special handling for wrapping GL_LINE_LOOP */
+  if (last_prim->mode == GL_LINE_LOOP &&
+  last_count > 0 &&
+  !last_prim->end) {
+ /* draw this section of the incomplete line loop as a line strip */
+ last_prim->mode = GL_LINE_STRIP;
+ if (!last_prim->begin) {
+/* This is not the first section of the line loop, so don't
+ * draw the 0th vertex.  We're saving it until we draw the
+ * very last section of the loop.
+ */
+last_prim->start++;
+last_prim->count--;
+ }
+  }
+
   /* Execute the buffer and save copied vertices.
*/
   if (exec->vtx.vert_count)
@@ -98,6 +115,7 @@ static void vbo_exec_wrap_buffers( struct vbo_exec_context *exec )
 
   if (_mesa_inside_begin_end(exec->ctx)) {
 exec->vtx.prim[0].mode = exec->ctx->Driver.CurrentExecPrimitive;
+exec->vtx.prim[0].begin = 0;
 exec->vtx.prim[0].start = 0;
 exec->vtx.prim[0].count = 0;
 exec->vtx.prim_count++;
@@ -827,6 +845,24 @@ static void GLAPIENTRY vbo_exec_End( void )
   last_prim->end = 1;
   last_prim->count = exec->vtx.vert_count - last_prim->start;
 
+  /* Special handling for GL_LINE_LOOP */
+  if (last_prim->mode == GL_LINE_LOOP && last_prim->begin == 0) {
+ /* We're finishing drawing a line loop.  Append 0th vertex onto
+  * end of vertex buffer so we can draw it as a line strip.
+  */
+ const fi_type *src = exec->vtx.buffer_map;
+ fi_type *dst = exec->vtx.buffer_map +
+exec->vtx.vert_count * exec->vtx.vertex_size;
+
+ /* copy 0th vertex to end of buffer */
+ memcpy(dst, src, exec->vtx.vertex_size * sizeof(fi_type));
+
+ assert(last_prim->start == 0);
+ last_prim->start++;  /* skip vertex0 */
+ /* note that last_prim->count stays unchanged */
+ last_prim->mode = GL_LINE_STRIP;
+  }
+
   try_vbo_merge(exec);
}
 
diff --git a/src/mesa/vbo/vbo_exec_draw.c b/src/mesa/vbo/vbo_exec_draw.c
index 397fc09..f91f7c8 100644
--- a/src/mesa/vbo/vbo_exec_draw.c
+++ b/src/mesa/vbo/vbo_exec_draw.c
@@ -110,6 +110,18 @@ vbo_copy_vertices( struct vbo_exec_context *exec )
 return 1;
   }
case GL_LINE_LOOP:
+  if (last_prim->begin == 0) {
+ /* We're dealing with the second or late

[Mesa-dev] [PATCH 05/10] vbo: simplify some code in vbo_exec_wrap_buffers()

2015-10-16 Thread Brian Paul
Use a new 'last_prim' pointer to simplify things.
---
 src/mesa/vbo/vbo_exec_api.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/mesa/vbo/vbo_exec_api.c b/src/mesa/vbo/vbo_exec_api.c
index c1f2146..2a78eac 100644
--- a/src/mesa/vbo/vbo_exec_api.c
+++ b/src/mesa/vbo/vbo_exec_api.c
@@ -71,17 +71,17 @@ static void vbo_exec_wrap_buffers( struct vbo_exec_context *exec )
   exec->vtx.buffer_ptr = exec->vtx.buffer_map;
}
else {
-  GLuint last_begin = exec->vtx.prim[exec->vtx.prim_count-1].begin;
+  struct _mesa_prim *last_prim = &exec->vtx.prim[exec->vtx.prim_count - 1];
+  const GLuint last_begin = last_prim->begin;
   GLuint last_count;
 
+  assert(exec->vtx.prim_count > 0);
+
   if (_mesa_inside_begin_end(exec->ctx)) {
-GLint i = exec->vtx.prim_count - 1;
-assert(i >= 0);
-exec->vtx.prim[i].count = (exec->vtx.vert_count - 
-   exec->vtx.prim[i].start);
+last_prim->count = exec->vtx.vert_count - last_prim->start;
   }
 
-  last_count = exec->vtx.prim[exec->vtx.prim_count-1].count;
+  last_count = last_prim->count;
 
   /* Execute the buffer and save copied vertices.
*/
-- 
1.9.1



[Mesa-dev] [PATCH 03/10] vbo: make vbo_exec_vtx_wrap() static

2015-10-16 Thread Brian Paul
---
 src/mesa/vbo/vbo_exec.h | 2 --
 src/mesa/vbo/vbo_exec_api.c | 3 ++-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/mesa/vbo/vbo_exec.h b/src/mesa/vbo/vbo_exec.h
index 00378eb..a80b2c9 100644
--- a/src/mesa/vbo/vbo_exec.h
+++ b/src/mesa/vbo/vbo_exec.h
@@ -160,8 +160,6 @@ void vbo_exec_vtx_flush( struct vbo_exec_context *exec, GLboolean unmap );
 void vbo_exec_vtx_map( struct vbo_exec_context *exec );
 
 
-void vbo_exec_vtx_wrap( struct vbo_exec_context *exec );
-
 void vbo_exec_eval_update( struct vbo_exec_context *exec );
 
 void vbo_exec_do_EvalCoord2f( struct vbo_exec_context *exec, 
diff --git a/src/mesa/vbo/vbo_exec_api.c b/src/mesa/vbo/vbo_exec_api.c
index 789869a..c1f2146 100644
--- a/src/mesa/vbo/vbo_exec_api.c
+++ b/src/mesa/vbo/vbo_exec_api.c
@@ -113,7 +113,8 @@ static void vbo_exec_wrap_buffers( struct vbo_exec_context *exec )
  * Deal with buffer wrapping where provoked by the vertex buffer
  * filling up, as opposed to upgrade_vertex().
  */
-void vbo_exec_vtx_wrap( struct vbo_exec_context *exec )
+static void
+vbo_exec_vtx_wrap(struct vbo_exec_context *exec)
 {
fi_type *data = exec->vtx.copied.buffer;
GLuint i;
-- 
1.9.1



Re: [Mesa-dev] [PATCH] nvc0: add support for performance monitoring metrics on Fermi

2015-10-16 Thread Samuel Pitoiset



On 10/16/2015 11:22 PM, Ilia Mirkin wrote:

On Fri, Oct 16, 2015 at 5:29 PM, Samuel Pitoiset wrote:

As explained in the CUDA toolkit documentation, "a metric is a
characteristic of an application that is calculated from one or more
event values."

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/drivers/nouveau/Makefile.sources   |   2 +
  src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c   |  19 +-
  .../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 444 +
  .../drivers/nouveau/nvc0/nvc0_query_hw_metric.h|  42 ++
  4 files changed, 504 insertions(+), 3 deletions(-)
  create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
  create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.h

diff --git a/src/gallium/drivers/nouveau/Makefile.sources b/src/gallium/drivers/nouveau/Makefile.sources
index edc6cf4..c18e9f5 100644
--- a/src/gallium/drivers/nouveau/Makefile.sources
+++ b/src/gallium/drivers/nouveau/Makefile.sources
@@ -154,6 +154,8 @@ NVC0_C_SOURCES := \
 nvc0/nvc0_query.h \
 nvc0/nvc0_query_hw.c \
 nvc0/nvc0_query_hw.h \
+   nvc0/nvc0_query_hw_metric.c \
+   nvc0/nvc0_query_hw_metric.h \
 nvc0/nvc0_query_hw_sm.c \
 nvc0/nvc0_query_hw_sm.h \
 nvc0/nvc0_query_sw.c \
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
index 91254be..90ee82f 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
@@ -25,6 +25,7 @@

  #include "nvc0/nvc0_context.h"
  #include "nvc0/nvc0_query_hw.h"
+#include "nvc0/nvc0_query_hw_metric.h"
  #include "nvc0/nvc0_query_hw_sm.h"

  #define NVC0_HW_QUERY_STATE_READY   0
@@ -371,6 +372,12 @@ nvc0_hw_create_query(struct nvc0_context *nvc0, unsigned type, unsigned index)
return (struct nvc0_query *)hq;
 }

+   hq = nvc0_hw_metric_create_query(nvc0, type);
+   if (hq) {
+  hq->base.funcs = &hw_query_funcs;
+  return (struct nvc0_query *)hq;
+   }
+
 hq = CALLOC_STRUCT(nvc0_hw_query);
 if (!hq)
return NULL;
@@ -435,14 +442,20 @@ int
  nvc0_hw_get_driver_query_info(struct nvc0_screen *screen, unsigned id,
struct pipe_driver_query_info *info)
  {
-   int num_hw_sm_queries = 0;
+   int num_hw_sm_queries = 0, num_hw_metric_queries = 0;

 num_hw_sm_queries = nvc0_hw_sm_get_driver_query_info(screen, 0, NULL);
+   num_hw_metric_queries =
+  nvc0_hw_metric_get_driver_query_info(screen, 0, NULL);

 if (!info)
-  return num_hw_sm_queries;
+  return num_hw_sm_queries + num_hw_metric_queries;
+
+   if (id < num_hw_sm_queries)
+  return nvc0_hw_sm_get_driver_query_info(screen, id, info);

-   return nvc0_hw_sm_get_driver_query_info(screen, id, info);
+   return nvc0_hw_metric_get_driver_query_info(screen,
+   id - num_hw_sm_queries, info);
  }

  void
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
new file mode 100644
index 000..dbe350a
--- /dev/null
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
@@ -0,0 +1,444 @@
+/*
+ * Copyright 2015 Samuel Pitoiset
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "nvc0/nvc0_context.h"
+#include "nvc0/nvc0_query_hw_metric.h"
+#include "nvc0/nvc0_query_hw_sm.h"
+
+/* === PERFORMANCE MONITORING METRICS for NVC0:NVE4 === */
+static const char *nvc0_hw_metric_names[] =
+{
+   "metric-achieved_occupancy",
+   "metric-branch_efficiency",
+   "metric-inst_issued",
+   "metric-inst_per_wrap",
+   "metric-inst_replay_overhead",
+   "metric-issued_ipc",
+   "metric-issue_slots",
+   "metric-issue_slot_utilization",
+   "metric-ipc",
+};
+
+struct nvc0_hw_metric_query_cfg {
+   uint32_t queries[8];
+   uint32_t num_queries;
+};
+
+#define _SM(n) NVC0_HW_S

Re: [Mesa-dev] [PATCH] nvc0: add support for performance monitoring metrics on Fermi

2015-10-16 Thread Ilia Mirkin
On Fri, Oct 16, 2015 at 5:29 PM, Samuel Pitoiset wrote:
> As explained in the CUDA toolkit documentation, "a metric is a
> characteristic of an application that is calculated from one or more
> event values."
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/Makefile.sources   |   2 +
>  src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c   |  19 +-
>  .../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 444 +
>  .../drivers/nouveau/nvc0/nvc0_query_hw_metric.h|  42 ++
>  4 files changed, 504 insertions(+), 3 deletions(-)
>  create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
>  create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.h
>
> diff --git a/src/gallium/drivers/nouveau/Makefile.sources b/src/gallium/drivers/nouveau/Makefile.sources
> index edc6cf4..c18e9f5 100644
> --- a/src/gallium/drivers/nouveau/Makefile.sources
> +++ b/src/gallium/drivers/nouveau/Makefile.sources
> @@ -154,6 +154,8 @@ NVC0_C_SOURCES := \
> nvc0/nvc0_query.h \
> nvc0/nvc0_query_hw.c \
> nvc0/nvc0_query_hw.h \
> +   nvc0/nvc0_query_hw_metric.c \
> +   nvc0/nvc0_query_hw_metric.h \
> nvc0/nvc0_query_hw_sm.c \
> nvc0/nvc0_query_hw_sm.h \
> nvc0/nvc0_query_sw.c \
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
> index 91254be..90ee82f 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
> @@ -25,6 +25,7 @@
>
>  #include "nvc0/nvc0_context.h"
>  #include "nvc0/nvc0_query_hw.h"
> +#include "nvc0/nvc0_query_hw_metric.h"
>  #include "nvc0/nvc0_query_hw_sm.h"
>
>  #define NVC0_HW_QUERY_STATE_READY   0
> @@ -371,6 +372,12 @@ nvc0_hw_create_query(struct nvc0_context *nvc0, unsigned type, unsigned index)
>return (struct nvc0_query *)hq;
> }
>
> +   hq = nvc0_hw_metric_create_query(nvc0, type);
> +   if (hq) {
> +  hq->base.funcs = &hw_query_funcs;
> +  return (struct nvc0_query *)hq;
> +   }
> +
> hq = CALLOC_STRUCT(nvc0_hw_query);
> if (!hq)
>return NULL;
> @@ -435,14 +442,20 @@ int
>  nvc0_hw_get_driver_query_info(struct nvc0_screen *screen, unsigned id,
>struct pipe_driver_query_info *info)
>  {
> -   int num_hw_sm_queries = 0;
> +   int num_hw_sm_queries = 0, num_hw_metric_queries = 0;
>
> num_hw_sm_queries = nvc0_hw_sm_get_driver_query_info(screen, 0, NULL);
> +   num_hw_metric_queries =
> +  nvc0_hw_metric_get_driver_query_info(screen, 0, NULL);
>
> if (!info)
> -  return num_hw_sm_queries;
> +  return num_hw_sm_queries + num_hw_metric_queries;
> +
> +   if (id < num_hw_sm_queries)
> +  return nvc0_hw_sm_get_driver_query_info(screen, id, info);
>
> -   return nvc0_hw_sm_get_driver_query_info(screen, id, info);
> +   return nvc0_hw_metric_get_driver_query_info(screen,
> +   id - num_hw_sm_queries, info);
>  }
>
>  void
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
> new file mode 100644
> index 000..dbe350a
> --- /dev/null
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
> @@ -0,0 +1,444 @@
> +/*
> + * Copyright 2015 Samuel Pitoiset
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include "nvc0/nvc0_context.h"
> +#include "nvc0/nvc0_query_hw_metric.h"
> +#include "nvc0/nvc0_query_hw_sm.h"
> +
> +/* === PERFORMANCE MONITORING METRICS for NVC0:NVE4 === */
> +static const char *nvc0_hw_metric_names[] =
> +{
> +   "metric-achieved_occupancy",
> +   "metric-branch_efficiency",
> +   "metric-inst_issued",
> +   "metric-inst_per_wrap",
> +   "metric-inst_replay_overhead",
> +   "metric-issued_ipc",
> +   "metric-issue_slots",
> +   "metric-iss

[Mesa-dev] [PATCH] nvc0: add support for performance monitoring metrics on Fermi

2015-10-16 Thread Samuel Pitoiset
As explained in the CUDA toolkit documentation, "a metric is a
characteristic of an application that is calculated from one or more
event values."
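
As a hedged illustration of "a metric calculated from event values", the sketch below derives an instructions-per-cycle figure from two raw event counts. The struct, the function name, and the exact formula (instructions executed divided by active cycles) are assumptions made for this example; they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for raw event values read back from the GPU. */
struct sketch_events {
   uint64_t inst_executed;   /* event: instructions executed */
   uint64_t active_cycles;   /* event: cycles with at least one active warp */
};

/*
 * Hypothetical metric: instructions per cycle, computed from two events.
 * Returns 0.0 when no cycles were counted to avoid dividing by zero.
 */
double
sketch_metric_ipc(const struct sketch_events *ev)
{
   assert(ev != NULL);
   if (ev->active_cycles == 0)
      return 0.0;
   return (double)ev->inst_executed / (double)ev->active_cycles;
}
```

The point of the split is that events are what the hardware counts, while a metric is computed afterwards from one or more of those counts.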

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/Makefile.sources   |   2 +
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c   |  19 +-
 .../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 444 +
 .../drivers/nouveau/nvc0/nvc0_query_hw_metric.h|  42 ++
 4 files changed, 504 insertions(+), 3 deletions(-)
 create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
 create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.h

diff --git a/src/gallium/drivers/nouveau/Makefile.sources b/src/gallium/drivers/nouveau/Makefile.sources
index edc6cf4..c18e9f5 100644
--- a/src/gallium/drivers/nouveau/Makefile.sources
+++ b/src/gallium/drivers/nouveau/Makefile.sources
@@ -154,6 +154,8 @@ NVC0_C_SOURCES := \
nvc0/nvc0_query.h \
nvc0/nvc0_query_hw.c \
nvc0/nvc0_query_hw.h \
+   nvc0/nvc0_query_hw_metric.c \
+   nvc0/nvc0_query_hw_metric.h \
nvc0/nvc0_query_hw_sm.c \
nvc0/nvc0_query_hw_sm.h \
nvc0/nvc0_query_sw.c \
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
index 91254be..90ee82f 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
@@ -25,6 +25,7 @@
 
 #include "nvc0/nvc0_context.h"
 #include "nvc0/nvc0_query_hw.h"
+#include "nvc0/nvc0_query_hw_metric.h"
 #include "nvc0/nvc0_query_hw_sm.h"
 
 #define NVC0_HW_QUERY_STATE_READY   0
@@ -371,6 +372,12 @@ nvc0_hw_create_query(struct nvc0_context *nvc0, unsigned type, unsigned index)
   return (struct nvc0_query *)hq;
}
 
+   hq = nvc0_hw_metric_create_query(nvc0, type);
+   if (hq) {
+  hq->base.funcs = &hw_query_funcs;
+  return (struct nvc0_query *)hq;
+   }
+
hq = CALLOC_STRUCT(nvc0_hw_query);
if (!hq)
   return NULL;
@@ -435,14 +442,20 @@ int
 nvc0_hw_get_driver_query_info(struct nvc0_screen *screen, unsigned id,
   struct pipe_driver_query_info *info)
 {
-   int num_hw_sm_queries = 0;
+   int num_hw_sm_queries = 0, num_hw_metric_queries = 0;
 
num_hw_sm_queries = nvc0_hw_sm_get_driver_query_info(screen, 0, NULL);
+   num_hw_metric_queries =
+  nvc0_hw_metric_get_driver_query_info(screen, 0, NULL);
 
if (!info)
-  return num_hw_sm_queries;
+  return num_hw_sm_queries + num_hw_metric_queries;
+
+   if (id < num_hw_sm_queries)
+  return nvc0_hw_sm_get_driver_query_info(screen, id, info);
 
-   return nvc0_hw_sm_get_driver_query_info(screen, id, info);
+   return nvc0_hw_metric_get_driver_query_info(screen,
+   id - num_hw_sm_queries, info);
 }
 
 void
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
new file mode 100644
index 000..dbe350a
--- /dev/null
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c
@@ -0,0 +1,444 @@
+/*
+ * Copyright 2015 Samuel Pitoiset
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "nvc0/nvc0_context.h"
+#include "nvc0/nvc0_query_hw_metric.h"
+#include "nvc0/nvc0_query_hw_sm.h"
+
+/* === PERFORMANCE MONITORING METRICS for NVC0:NVE4 === */
+static const char *nvc0_hw_metric_names[] =
+{
+   "metric-achieved_occupancy",
+   "metric-branch_efficiency",
+   "metric-inst_issued",
+   "metric-inst_per_wrap",
+   "metric-inst_replay_overhead",
+   "metric-issued_ipc",
+   "metric-issue_slots",
+   "metric-issue_slot_utilization",
+   "metric-ipc",
+};
+
+struct nvc0_hw_metric_query_cfg {
+   uint32_t queries[8];
+   uint32_t num_queries;
+};
+
+#define _SM(n) NVC0_HW_SM_QUERY(NVC0_HW_SM_QUERY_ ##n)
+#define _M(n, c) [NVC0_HW_METRIC_QUERY_##n] = c
+
+/*  Compute capability 2.0 (GF100/GF110)

Re: [Mesa-dev] [PATCH 1/5] main: Use NumUniformBlocks to count UBOs

2015-10-16 Thread Kristian Høgsberg
On Fri, Oct 16, 2015 at 2:58 AM, Iago Toral Quiroga  wrote:
> Now that we have separate index spaces for UBOs and SSBOs we do not need
> to iterate through BufferInterfaceBlocks any more, we can just take the
> UBO count directly from NumUniformBlocks.

Nice cleanup, all five patches

Reviewed-by: Kristian Høgsberg 
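
A small sketch, under stated assumptions, of why the scan can be replaced by a stored count. The types and field names below are hypothetical stand-ins, not Mesa's gl_shader_program; the sketch assumes the program keeps a UBO-only count that is maintained alongside the mixed block list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block record: true for an SSBO, false for a UBO. */
struct sketch_block {
   bool is_shader_storage;
};

/* Hypothetical program record holding a mixed list plus a UBO-only count. */
struct sketch_program {
   const struct sketch_block *blocks;   /* UBOs and SSBOs together */
   size_t num_blocks;
   size_t num_uniform_blocks;           /* assumed to be kept up to date */
};

/* Old approach: walk the mixed list and count entries that are not SSBOs. */
size_t
sketch_count_ubos_by_scan(const struct sketch_program *prog)
{
   size_t n = 0;

   assert(prog != NULL);
   for (size_t i = 0; i < prog->num_blocks; i++) {
      if (!prog->blocks[i].is_shader_storage)
         n++;
   }
   return n;
}

/* New approach: read the count that is tracked separately. */
size_t
sketch_count_ubos_direct(const struct sketch_program *prog)
{
   assert(prog != NULL);
   return prog->num_uniform_blocks;
}
```

Both functions agree only as long as the separate count is kept consistent with the list, which is the precondition the commit message relies on.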

> ---
>  src/mesa/main/shaderapi.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
> index 6a2f60d..26995ad 100644
> --- a/src/mesa/main/shaderapi.c
> +++ b/src/mesa/main/shaderapi.c
> @@ -729,11 +729,7 @@ get_programiv(struct gl_context *ctx, GLuint program, GLenum pname,
>if (!has_ubo)
>   break;
>
> -  *params = 0;
> -  for (unsigned i = 0; i < shProg->NumBufferInterfaceBlocks; i++) {
> - if (!shProg->BufferInterfaceBlocks[i].IsShaderStorage)
> -(*params)++;
> -  }
> +  *params = shProg->NumUniformBlocks;
>return;
> case GL_PROGRAM_BINARY_RETRIEVABLE_HINT:
>/* This enum isn't part of the OES extension for OpenGL ES 2.0.  It is
> --
> 1.9.1
>


Re: [Mesa-dev] [PATCH 16/16] nvc0: add a note about MP counters on GF100/GF110

2015-10-16 Thread Samuel Pitoiset



On 10/16/2015 07:50 PM, Ilia Mirkin wrote:

Series is Reviewed-by: Ilia Mirkin 

I had a couple of very minor comments that you can feel free to accept
or ignore.


Thank you for this review Ilia, and I think I'll accept all of your 
changes. :)




On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset wrote:

MP counters on GF100/GF110 (compute capability 2.0) are buggy
because there is a context-switch problem that we need to fix.
Results might be wrong sometimes, be careful!

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index c4b40a6..c5ce3e3 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -286,6 +286,11 @@ static const struct nvc0_hw_sm_query_cfg nve4_hw_sm_queries[] =
  #undef _M2B

  /* === PERFORMANCE MONITORING COUNTERS for NVC0:NVE4 === */
+/* NOTES:
+ * - MP counters on GF100/GF110 (compute capability 2.0) are buggy
+ *   because there is a context-switch problem that we need to fix.
+ *   Results might be wrong sometimes, be careful!
+ */
  static const char *nvc0_hw_sm_query_names[] =
  {
 /* MP counters */
--
2.6.1



Re: [Mesa-dev] [PATCH 15/16] nvc0: add MP counters variants for GF100/GF110

2015-10-16 Thread Ilia Mirkin
Minor preferences for naming things SM20/SM21 when referring to
compute capabilities, but your call.

On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset wrote:
> GF100 and GF110 chipsets are compute capability 2.0, while the other
> Fermi chipsets are compute capability 2.1. That is why some MP counters
> differ between these chipsets and we need to handle variants.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 559 ++---
>  .../drivers/nouveau/nvc0/nvc0_query_hw_sm.h|   1 +
>  2 files changed, 483 insertions(+), 77 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> index 174beef..c4b40a6 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> @@ -298,6 +298,7 @@ static const char *nvc0_hw_sm_query_names[] =
> "gred_count",
> "gst_request",
> "inst_executed",
> +   "inst_issued",
> "inst_issued1_0",
> "inst_issued1_1",
> "inst_issued2_0",
> @@ -373,82 +374,456 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
>  };
>
>  #define _C(f, o, g, m, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, m, 0, g, s }
> -#define _Q(n, c, ...) [NVC0_HW_SM_QUERY_##n] = { \
> -   { __VA_ARGS__ }, c, NVC0_COUNTER_OPn_SUM, { 1, 1 }, \
> -}
> +#define _Q(n, c) [NVC0_HW_SM_QUERY_##n] = c
> +
> +/*  Compute capability 2.0 (GF100/GF110)  */
> +static const struct nvc0_hw_sm_query_cfg
> +nvc0_active_cycles =
> +{
> +   .ctr[0]   = _C(0x, LOGOP, 0x11, 0x00ff, 0x),
> +   .num_counters = 1,
> +   .op   = NVC0_COUNTER_OPn_SUM,
> +   .norm = { 1, 1 },
> +};
> +
> +static const struct nvc0_hw_sm_query_cfg
> +nvc0_active_warps =
> +{
> +   .ctr[0]   = _C(0x, LOGOP, 0x24, 0x00ff, 0x0010),
> +   .ctr[1]   = _C(0x, LOGOP, 0x24, 0x00ff, 0x0020),
> +   .ctr[2]   = _C(0x, LOGOP, 0x24, 0x00ff, 0x0030),
> +   .ctr[3]   = _C(0x, LOGOP, 0x24, 0x00ff, 0x0040),
> +   .ctr[4]   = _C(0x, LOGOP, 0x24, 0x00ff, 0x0050),
> +   .ctr[5]   = _C(0x, LOGOP, 0x24, 0x00ff, 0x0060),
> +   .num_counters = 6,
> +   .op   = NVC0_COUNTER_OPn_SUM,
> +   .norm = { 1, 1 },
> +};
> +
> +static const struct nvc0_hw_sm_query_cfg
> +nvc0_atom_count =
> +{
> +   .ctr[0]   = _C(0x, LOGOP, 0x63, 0x00ff, 0x0030),
> +   .num_counters = 1,
> +   .op   = NVC0_COUNTER_OPn_SUM,
> +   .norm = { 1, 1 },
> +};
> +
> +static const struct nvc0_hw_sm_query_cfg
> +nvc0_branch =
> +{
> +   .ctr[0]   = _C(0x, LOGOP, 0x1a, 0x00ff, 0x),
> +   .ctr[1]   = _C(0x, LOGOP, 0x1a, 0x00ff, 0x0010),
> +   .num_counters = 2,
> +   .op   = NVC0_COUNTER_OPn_SUM,
> +   .norm = { 1, 1 },
> +};
> +
> +static const struct nvc0_hw_sm_query_cfg
> +nvc0_divergent_branch =
> +{
> +   .ctr[0]   = _C(0x, LOGOP, 0x19, 0x00ff, 0x0020),
> +   .ctr[1]   = _C(0x, LOGOP, 0x19, 0x00ff, 0x0030),
> +   .num_counters = 2,
> +   .op   = NVC0_COUNTER_OPn_SUM,
> +   .norm = { 1, 1 },
> +};
> +
> +static const struct nvc0_hw_sm_query_cfg
> +nvc0_gld_request =
> +{
> +   .ctr[0]   = _C(0x, LOGOP, 0x64, 0x00ff, 0x0030),
> +   .num_counters = 1,
> +   .op   = NVC0_COUNTER_OPn_SUM,
> +   .norm = { 1, 1 },
> +};
> +
> +static const struct nvc0_hw_sm_query_cfg
> +nvc0_gred_count =
> +{
> +   .ctr[0]   = _C(0x, LOGOP, 0x63, 0x00ff, 0x0040),
> +   .num_counters = 1,
> +   .op   = NVC0_COUNTER_OPn_SUM,
> +   .norm = { 1, 1 },
> +};
> +
> +static const struct nvc0_hw_sm_query_cfg
> +nvc0_gst_request =
> +{
> +   .ctr[0]   = _C(0x, LOGOP, 0x64, 0x00ff, 0x0060),
> +   .num_counters = 1,
> +   .op   = NVC0_COUNTER_OPn_SUM,
> +   .norm = { 1, 1 },
> +};
> +
> +static const struct nvc0_hw_sm_query_cfg
> +nvc0_inst_executed =
> +{
> +   .ctr[0]   = _C(0x, LOGOP, 0x2d, 0x, 0x1000),
> +   .ctr[1]   = _C(0x, LOGOP, 0x2d, 0x, 0x1010),
> +   .num_counters = 2,
> +   .op   = NVC0_COUNTER_OPn_SUM,
> +   .norm = { 1, 1 },
> +};
> +
> +static const struct nvc0_hw_sm_query_cfg
> +nvc0_inst_issued =
> +{
> +   .ctr[0]   = _C(0x, LOGOP, 0x27, 0x, 0x7060),
> +   .ctr[1]   = _C(0x, LOGOP, 0x27, 0x, 0x7070),
> +   .num_counters = 2,
> +   .op   = NVC0_COUNTER_OPn_SUM,
> +   .norm = { 1, 1 },
> +};
>
> -static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] =
> -{
> -   _Q(ACTIVE_CYCLES,   1, _C(0x, LOGOP, 0x11, 0x00ff, 
> 0x)),
> -   _Q(ACTIVE_WARPS,6, _C(0x, LOGOP, 0x24, 0x00ff, 
> 0x0010),
> -  

Re: [Mesa-dev] [PATCH 16/16] nvc0: add a note about MP counters on GF100/GF110

2015-10-16 Thread Ilia Mirkin
Series is Reviewed-by: Ilia Mirkin 

I had a couple of very minor comments that you can feel free to accept
or ignore.

On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset wrote:
> MP counters on GF100/GF110 (compute capability 2.0) are buggy
> because there is a context-switch problem that we need to fix.
> Results might be wrong sometimes, be careful!
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> index c4b40a6..c5ce3e3 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> @@ -286,6 +286,11 @@ static const struct nvc0_hw_sm_query_cfg nve4_hw_sm_queries[] =
>  #undef _M2B
>
>  /* === PERFORMANCE MONITORING COUNTERS for NVC0:NVE4 === */
> +/* NOTES:
> + * - MP counters on GF100/GF110 (compute capability 2.0) are buggy
> + *   because there is a context-switch problem that we need to fix.
> + *   Results might be wrong sometimes, be careful!
> + */
>  static const char *nvc0_hw_sm_query_names[] =
>  {
> /* MP counters */
> --
> 2.6.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/16] nvc0: fix monitoring multiple MP counters queries on Fermi

2015-10-16 Thread Ilia Mirkin
On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset
 wrote:
> For strange reasons, the signal id depends on the slot selected on Fermi
> but not on Kepler. Fortunately, the signal ids are just offset by the
> slot id!
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 147 +++--
>  1 file changed, 79 insertions(+), 68 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> index 99e9073..6ee9fa6 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> @@ -128,7 +128,7 @@ struct nvc0_hw_sm_counter_cfg
>  {
> uint32_t func: 16; /* mask or 4-bit logic op (depending on mode) */
> uint32_t mode: 4;  /* LOGOP,B6,LOGOP_B6(_PULSE) */
> -   uint32_t num_src : 3;  /* number of sources (1 - 6, only for NVC0:NVE4) */
> +   uint32_t src_mask; /* mask for signal selection (only for NVC0:NVE4) */

Put this after the sized fields for better packing.
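To illustrate the packing remark above, here is a minimal sketch that
compares the two field orders. The struct names and the helper are
hypothetical, and bit-field layout is implementation-defined, so the
sizes are only what common ABIs typically produce (often 16 bytes
versus 12), not a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Field order as in the patch: a full 32-bit member splits the bit-fields. */
struct cfg_interleaved {
   uint32_t func    : 16;
   uint32_t mode    : 4;
   uint32_t src_mask;       /* non-bit-field member between the bit-fields */
   uint32_t sig_dom : 1;
   uint32_t sig_sel : 8;
   uint32_t src_sel;
};

/* Suggested order: sized fields first (16 + 4 + 1 + 8 = 29 bits). */
struct cfg_grouped {
   uint32_t func    : 16;
   uint32_t mode    : 4;
   uint32_t sig_dom : 1;
   uint32_t sig_sel : 8;
   uint32_t src_mask;
   uint32_t src_sel;
};

/* Bytes saved by grouping; layout-dependent, so it may be 0 on some ABIs. */
size_t packing_savings(void)
{
   return sizeof(struct cfg_interleaved) - sizeof(struct cfg_grouped);
}
```

Printing sizeof() for both structs on the target compiler is the quickest
way to check whether the reordering actually saves space there.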

> uint32_t sig_dom : 1;  /* if 0, MP_PM_A (per warp-sched), if 1, MP_PM_B */
> uint32_t sig_sel : 8;  /* signal group */
> uint32_t src_sel;  /* signal selection for up to 4 sources */
> @@ -280,78 +280,78 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
> 0x80001de7ULL
>  };
>
> -#define _C(f, o, g, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, 0, 0, g, s }
> +#define _C(f, o, g, m, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, m, 0, g, s }
>  #define _Q(n, c, ...) [NVC0_HW_SM_QUERY_##n] = {  \
> { __VA_ARGS__ }, c, NVC0_COUNTER_OPn_SUM, { 1, 1 },\
>  }
>
>  static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] =
>  {
> -   _Q(ACTIVE_CYCLES,   1, _C(0x, LOGOP, 0x11, 0x)),
> -   _Q(ACTIVE_WARPS,6, _C(0x, LOGOP, 0x24, 0x0010),
> -  _C(0x, LOGOP, 0x24, 0x0021),
> -  _C(0x, LOGOP, 0x24, 0x0032),
> -  _C(0x, LOGOP, 0x24, 0x0043),
> -  _C(0x, LOGOP, 0x24, 0x0054),
> -  _C(0x, LOGOP, 0x24, 0x0065)),
> -   _Q(ATOM_COUNT,  1, _C(0x, LOGOP, 0x63, 0x0030)),
> -   _Q(BRANCH,  2, _C(0x, LOGOP, 0x1a, 0x),
> -  _C(0x, LOGOP, 0x1a, 0x0011)),
> -   _Q(DIVERGENT_BRANCH,2, _C(0x, LOGOP, 0x19, 0x0020),
> -  _C(0x, LOGOP, 0x19, 0x0031)),
> -   _Q(GLD_REQUEST, 1, _C(0x, LOGOP, 0x64, 0x0030)),
> -   _Q(GRED_COUNT,  1, _C(0x, LOGOP, 0x63, 0x0040)),
> -   _Q(GST_REQUEST, 1, _C(0x, LOGOP, 0x64, 0x0060)),
> -   _Q(INST_EXECUTED,   3, _C(0x, LOGOP, 0x2d, 0x),
> -  _C(0x, LOGOP, 0x2d, 0x0011),
> -  _C(0x, LOGOP, 0x2d, 0x0022)),
> -   _Q(INST_ISSUED1_0,  1, _C(0x, LOGOP, 0x7e, 0x0010)),
> -   _Q(INST_ISSUED1_1,  1, _C(0x, LOGOP, 0x7e, 0x0040)),
> -   _Q(INST_ISSUED2_0,  1, _C(0x, LOGOP, 0x7e, 0x0020)),
> -   _Q(INST_ISSUED2_1,  1, _C(0x, LOGOP, 0x7e, 0x0050)),
> -   _Q(LOCAL_LD,1, _C(0x, LOGOP, 0x64, 0x0020)),
> -   _Q(LOCAL_ST,1, _C(0x, LOGOP, 0x64, 0x0050)),
> -   _Q(PROF_TRIGGER_0,  1, _C(0x, LOGOP, 0x01, 0x)),
> -   _Q(PROF_TRIGGER_1,  1, _C(0x, LOGOP, 0x01, 0x0010)),
> -   _Q(PROF_TRIGGER_2,  1, _C(0x, LOGOP, 0x01, 0x0020)),
> -   _Q(PROF_TRIGGER_3,  1, _C(0x, LOGOP, 0x01, 0x0030)),
> -   _Q(PROF_TRIGGER_4,  1, _C(0x, LOGOP, 0x01, 0x0040)),
> -   _Q(PROF_TRIGGER_5,  1, _C(0x, LOGOP, 0x01, 0x0050)),
> -   _Q(PROF_TRIGGER_6,  1, _C(0x, LOGOP, 0x01, 0x0060)),
> -   _Q(PROF_TRIGGER_7,  1, _C(0x, LOGOP, 0x01, 0x0070)),
> -   _Q(SHARED_LD,   1, _C(0x, LOGOP, 0x64, 0x0010)),
> -   _Q(SHARED_ST,   1, _C(0x, LOGOP, 0x64, 0x0040)),
> -   _Q(THREADS_LAUNCHED,6, _C(0x, LOGOP, 0x26, 0x0010),
> -  _C(0x, LOGOP, 0x26, 0x0021),
> -  _C(0x, LOGOP, 0x26, 0x0032),
> -  _C(0x, LOGOP, 0x26, 0x0043),
> -  _C(0x, LOGOP, 0x26, 0x0054),
> -  _C(0x, LOGOP, 0x26, 0x0065)),
> -   _Q(TH_INST_EXECUTED_0,  6, _C(0x, LOGOP, 0xa3, 0x),
> -  _C(0x, LOGOP, 0xa3, 0x0011),
> -  _C(0x, LOGOP, 0xa3, 0x0022),
> -  _C(0x, LOGOP, 0xa3, 0x0033),
> -  _C(0x, LOGOP, 0xa3, 0x0044),
> - 

Re: [Mesa-dev] [PATCH 07/16] nvc0: fix queries which use multiple MP counters on Fermi

2015-10-16 Thread Samuel Pitoiset

On 10/16/2015 07:32 PM, Ilia Mirkin wrote:
> Other than the missing * (1 << c), what was wrong with the old logic?

MP counters were always configured starting from slot 0 to cfg->num_src. 
So, if you monitored two hardware events at the same time, the first one 
was overwritten by the second one.


Now, I check if the slot is free before pushing the configuration 
through the pushbuf.
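
The idea described above can be sketched as follows. This is not the
driver code: alloc_slot, free_slot and the used-slot bitmask are
hypothetical names, and the count of 8 slots is an assumption taken from
the MP.C0..MP.C7 layout used elsewhere in this series.

```c
#include <stdint.h>

#define NUM_SLOTS 8   /* assumption: MP.C0..MP.C7 */

/* Claim the first free counter slot; returns its index, or -1 if all
 * slots are already in use. used_mask has one bit per slot. */
int alloc_slot(uint8_t *used_mask)
{
   for (int s = 0; s < NUM_SLOTS; ++s) {
      if (!(*used_mask & (1u << s))) {
         *used_mask |= (uint8_t)(1u << s);
         return s;
      }
   }
   return -1;
}

/* Release a slot when the query that owns it ends. */
void free_slot(uint8_t *used_mask, int slot)
{
   *used_mask &= (uint8_t)~(1u << slot);
}
```

With this scheme a second query gets the next free slot instead of
starting again from slot 0 and overwriting the first one.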




On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset
 wrote:

Queries which use more than one MP counter were misconfigured, and
computing the final result was also wrong, because sources need to
be configured on different hardware counters instead.

According to the blob, computing the result is now as follows:

FOR  i..n
val += ctr[i] * pow(2, i)
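
The pseudocode above can be written in C as a shift, which is the
"* (1 << c)" form mentioned in the review. This is only a sketch:
combine_counters is a hypothetical helper, and the 64-bit accumulator is
an assumption made here to avoid overflow, not something the patch
states.

```c
#include <stdint.h>

/* Sum n counter values, weighting counter i by 2^i. */
uint64_t combine_counters(const uint32_t *ctr, unsigned n)
{
   uint64_t val = 0;

   for (unsigned i = 0; i < n; ++i)
      val += (uint64_t)ctr[i] << i;   /* ctr[i] * pow(2, i) */
   return val;
}
```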

Signed-off-by: Samuel Pitoiset 
---
  .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 128 +
  1 file changed, 81 insertions(+), 47 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index 2060662..99e9073 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -131,7 +131,7 @@ struct nvc0_hw_sm_counter_cfg
 uint32_t num_src : 3;  /* number of sources (1 - 6, only for NVC0:NVE4) */
 uint32_t sig_dom : 1;  /* if 0, MP_PM_A (per warp-sched), if 1, MP_PM_B */
 uint32_t sig_sel : 8;  /* signal group */
-   uint64_t src_sel;  /* signal selection for up to 6 sources (48 bit) */
+   uint32_t src_sel;  /* signal selection for up to 4 sources */
  };

  #define NVC0_COUNTER_OPn_SUM0
@@ -280,44 +280,82 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
 0x80001de7ULL
  };

-#define _Q(n, f, m, g, c, s0, s1, s2, s3, s4, s5) [NVC0_HW_SM_QUERY_##n] = { { { f, 
NVC0_COMPUTE_MP_PM_OP_MODE_##m, c, 0, g, s0|(s1 << 8)|(s2 << 16)|(s3 << 24)|(s4##ULL 
<< 32)|(s5##ULL << 40) }, {}, {}, {} }, 1, NVC0_COUNTER_OPn_SUM, { 1, 1 } }
+#define _C(f, o, g, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, 0, 0, g, s }
+#define _Q(n, c, ...) [NVC0_HW_SM_QUERY_##n] = {  \
+   { __VA_ARGS__ }, c, NVC0_COUNTER_OPn_SUM, { 1, 1 },\
+}

  static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] =
  {
-   _Q(ACTIVE_CYCLES,   0x, LOGOP, 0x11, 1, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(ACTIVE_WARPS,0x, LOGOP, 0x24, 6, 0x10, 0x21, 0x32, 0x43, 
0x54, 0x65),
-   _Q(ATOM_COUNT,  0x, LOGOP, 0x63, 1, 0x30, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(BRANCH,  0x, LOGOP, 0x1a, 2, 0x00, 0x11, 0x00, 0x00, 
0x00, 0x00),
-   _Q(DIVERGENT_BRANCH,0x, LOGOP, 0x19, 2, 0x20, 0x31, 0x00, 0x00, 
0x00, 0x00),
-   _Q(GLD_REQUEST, 0x, LOGOP, 0x64, 1, 0x30, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(GRED_COUNT,  0x, LOGOP, 0x63, 1, 0x40, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(GST_REQUEST, 0x, LOGOP, 0x64, 1, 0x60, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(INST_EXECUTED,   0x, LOGOP, 0x2d, 3, 0x00, 0x11, 0x22, 0x00, 
0x00, 0x00),
-   _Q(INST_ISSUED1_0,  0x, LOGOP, 0x7e, 1, 0x10, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(INST_ISSUED1_1,  0x, LOGOP, 0x7e, 1, 0x40, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(INST_ISSUED2_0,  0x, LOGOP, 0x7e, 1, 0x20, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(INST_ISSUED2_1,  0x, LOGOP, 0x7e, 1, 0x50, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(LOCAL_LD,0x, LOGOP, 0x64, 1, 0x20, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(LOCAL_ST,0x, LOGOP, 0x64, 1, 0x50, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_0,  0x, LOGOP, 0x01, 1, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_1,  0x, LOGOP, 0x01, 1, 0x10, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_2,  0x, LOGOP, 0x01, 1, 0x20, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_3,  0x, LOGOP, 0x01, 1, 0x30, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_4,  0x, LOGOP, 0x01, 1, 0x40, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_5,  0x, LOGOP, 0x01, 1, 0x50, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_6,  0x, LOGOP, 0x01, 1, 0x60, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_7,  0x, LOGOP, 0x01, 1, 0x70, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(SHARED_LD,   0x, LOGOP, 0x64, 1, 0x10, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(SHARED_ST,   0x, LOGOP, 0x64, 1, 0x40, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(THREADS_LAUNCHED,0x, LOGOP, 0x26, 6, 0x10, 0x21, 0x32, 0x43, 
0x54, 0x65),
-   _Q(TH_INST_EXECUTED_0,  0x, LOGOP, 0xa3, 6, 0x00, 0x11, 0x22, 0x33, 
0x44, 0x55),
-   _Q(TH_INST_EXECUTED_1,  0x, LOGOP, 0xa5, 6, 0x00, 0x11, 0x22, 0x33, 
0x44, 0x55),
-   _Q(TH_INST_EXECUTED_2,  0x, LOGOP, 0xa4, 6, 0x00, 0x11, 0x22, 0x33, 
0x44, 0x55),
-   _Q(TH_INST_EXECUTED_3,  0x, LOGOP, 0xa6, 6, 0x00, 0x11, 0x22, 0x33, 
0x44, 0x5

Re: [Mesa-dev] [PATCH 07/16] nvc0: fix queries which use multiple MP counters on Fermi

2015-10-16 Thread Ilia Mirkin
Other than the missing * (1 << c), what was wrong with the old logic?

On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset
 wrote:
> Queries which use more than one MP counter were misconfigured, and
> computing the final result was also wrong, because sources need to
> be configured on different hardware counters instead.
>
> According to the blob, computing the result is now as follows:
>
> FOR  i..n
> val += ctr[i] * pow(2, i)
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 128 
> +
>  1 file changed, 81 insertions(+), 47 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> index 2060662..99e9073 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> @@ -131,7 +131,7 @@ struct nvc0_hw_sm_counter_cfg
> uint32_t num_src : 3;  /* number of sources (1 - 6, only for NVC0:NVE4) */
> uint32_t sig_dom : 1;  /* if 0, MP_PM_A (per warp-sched), if 1, MP_PM_B */
> uint32_t sig_sel : 8;  /* signal group */
> -   uint64_t src_sel;  /* signal selection for up to 6 sources (48 bit) */
> +   uint32_t src_sel;  /* signal selection for up to 4 sources */
>  };
>
>  #define NVC0_COUNTER_OPn_SUM0
> @@ -280,44 +280,82 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
> 0x80001de7ULL
>  };
>
> -#define _Q(n, f, m, g, c, s0, s1, s2, s3, s4, s5) [NVC0_HW_SM_QUERY_##n] = { 
> { { f, NVC0_COMPUTE_MP_PM_OP_MODE_##m, c, 0, g, s0|(s1 << 8)|(s2 << 16)|(s3 
> << 24)|(s4##ULL << 32)|(s5##ULL << 40) }, {}, {}, {} }, 1, 
> NVC0_COUNTER_OPn_SUM, { 1, 1 } }
> +#define _C(f, o, g, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, 0, 0, g, s }
> +#define _Q(n, c, ...) [NVC0_HW_SM_QUERY_##n] = {  \
> +   { __VA_ARGS__ }, c, NVC0_COUNTER_OPn_SUM, { 1, 1 },\
> +}
>
>  static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] =
>  {
> -   _Q(ACTIVE_CYCLES,   0x, LOGOP, 0x11, 1, 0x00, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(ACTIVE_WARPS,0x, LOGOP, 0x24, 6, 0x10, 0x21, 0x32, 0x43, 
> 0x54, 0x65),
> -   _Q(ATOM_COUNT,  0x, LOGOP, 0x63, 1, 0x30, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(BRANCH,  0x, LOGOP, 0x1a, 2, 0x00, 0x11, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(DIVERGENT_BRANCH,0x, LOGOP, 0x19, 2, 0x20, 0x31, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(GLD_REQUEST, 0x, LOGOP, 0x64, 1, 0x30, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(GRED_COUNT,  0x, LOGOP, 0x63, 1, 0x40, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(GST_REQUEST, 0x, LOGOP, 0x64, 1, 0x60, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(INST_EXECUTED,   0x, LOGOP, 0x2d, 3, 0x00, 0x11, 0x22, 0x00, 
> 0x00, 0x00),
> -   _Q(INST_ISSUED1_0,  0x, LOGOP, 0x7e, 1, 0x10, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(INST_ISSUED1_1,  0x, LOGOP, 0x7e, 1, 0x40, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(INST_ISSUED2_0,  0x, LOGOP, 0x7e, 1, 0x20, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(INST_ISSUED2_1,  0x, LOGOP, 0x7e, 1, 0x50, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(LOCAL_LD,0x, LOGOP, 0x64, 1, 0x20, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(LOCAL_ST,0x, LOGOP, 0x64, 1, 0x50, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(PROF_TRIGGER_0,  0x, LOGOP, 0x01, 1, 0x00, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(PROF_TRIGGER_1,  0x, LOGOP, 0x01, 1, 0x10, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(PROF_TRIGGER_2,  0x, LOGOP, 0x01, 1, 0x20, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(PROF_TRIGGER_3,  0x, LOGOP, 0x01, 1, 0x30, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(PROF_TRIGGER_4,  0x, LOGOP, 0x01, 1, 0x40, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(PROF_TRIGGER_5,  0x, LOGOP, 0x01, 1, 0x50, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(PROF_TRIGGER_6,  0x, LOGOP, 0x01, 1, 0x60, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(PROF_TRIGGER_7,  0x, LOGOP, 0x01, 1, 0x70, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(SHARED_LD,   0x, LOGOP, 0x64, 1, 0x10, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(SHARED_ST,   0x, LOGOP, 0x64, 1, 0x40, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> -   _Q(THREADS_LAUNCHED,0x, LOGOP, 0x26, 6, 0x10, 0x21, 0x32, 0x43, 
> 0x54, 0x65),
> -   _Q(TH_INST_EXECUTED_0,  0x, LOGOP, 0xa3, 6, 0x00, 0x11, 0x22, 0x33, 
> 0x44, 0x55),
> -   _Q(TH_INST_EXECUTED_1,  0x, LOGOP, 0xa5, 6, 0x00, 0x11, 0x22, 0x33, 
> 0x44, 0x55),
> -   _Q(TH_INST_EXECUTED_2,  0x, LOGOP, 0xa4, 6, 0x00, 0x11, 0x22, 0x33, 
> 0x44, 0x55),
> -   _Q(TH_INST_EXECUTED_3,  0x, LOGOP, 0xa6, 6, 0x00, 0x11, 0x22, 0x33, 
> 0x44, 0x55),
> -   _Q(WARPS_LAUNCHED,  0x, LOGOP, 0x26, 1, 0x00, 0x00, 0x00, 0x00, 
> 0x00, 0x00),
> +   _Q(ACTIVE_CYCLES,   

Re: [Mesa-dev] [PATCH 10/16] nvc0: store the number of GPCs to nvc0_screen

2015-10-16 Thread Samuel Pitoiset



On 10/16/2015 07:24 PM, Ilia Mirkin wrote:

On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset
 wrote:

NOUVEAU_GETPARAM_GRAPH_UNITS param returns the number of GPCs, the total
number of TPCs and the number of ROP units. Note that when the DRM
version is too old the default number of GPCs is fixed to 4.

This will be used to launch the compute kernel which is used to read MP
performance counters over all GPCs.

Signed-off-by: Samuel Pitoiset 
---
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 +
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.h | 1 +
  2 files changed, 2 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index afd91e6..ba53d10 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -914,6 +914,7 @@ nvc0_screen_create(struct nouveau_device *dev)
else
   value = (16 << 8) | 4;
 }
+   screen->gpc_count = value & 0x;

gpc_count is a u8... I guess you wanted to make this &0xff? Doesn't
*really* matter in practice, but a bit confusing.


Yes, good catch.




 screen->mp_count = value >> 8;
 screen->mp_count_compute = screen->mp_count;

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
index 8cf7560..857eb03 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
@@ -67,6 +67,7 @@ struct nvc0_screen {
 struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */
 struct nouveau_bo *poly_cache;

+   uint8_t gpc_count;
 uint16_t mp_count;
 uint16_t mp_count_compute; /* magic reg can make compute use fewer MPs */

--
2.6.1



Re: [Mesa-dev] [PATCH 10/16] nvc0: store the number of GPCs to nvc0_screen

2015-10-16 Thread Ilia Mirkin
On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset
 wrote:
> NOUVEAU_GETPARAM_GRAPH_UNITS param returns the number of GPCs, the total
> number of TPCs and the number of ROP units. Note that when the DRM
> version is too old the default number of GPCs is fixed to 4.
>
> This will be used to launch the compute kernel which is used to read MP
> performance counters over all GPCs.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 +
>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.h | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> index afd91e6..ba53d10 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> @@ -914,6 +914,7 @@ nvc0_screen_create(struct nouveau_device *dev)
>else
>   value = (16 << 8) | 4;
> }
> +   screen->gpc_count = value & 0x;

gpc_count is a u8... I guess you wanted to make this &0xff? Doesn't
*really* matter in practice, but a bit confusing.
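
For reference, a small sketch of the unpacking with the suggested 0xff
mask. The helper names are hypothetical; the layout (GPC count in the
low byte, MP count above it) is inferred only from the fallback value
"(16 << 8) | 4" and the "value >> 8" line in the patch.

```c
#include <stdint.h>

/* Low byte: number of GPCs (fits the uint8_t gpc_count field). */
uint8_t unpack_gpc_count(uint32_t value)
{
   return (uint8_t)(value & 0xff);
}

/* Remaining bits, as stored in the uint16_t mp_count field. */
uint16_t unpack_mp_count(uint32_t value)
{
   return (uint16_t)(value >> 8);
}
```

With the fallback value this yields 4 GPCs and 16 MPs.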

> screen->mp_count = value >> 8;
> screen->mp_count_compute = screen->mp_count;
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
> index 8cf7560..857eb03 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
> @@ -67,6 +67,7 @@ struct nvc0_screen {
> struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */
> struct nouveau_bo *poly_cache;
>
> +   uint8_t gpc_count;
> uint16_t mp_count;
> uint16_t mp_count_compute; /* magic reg can make compute use fewer MPs */
>
> --
> 2.6.1
>


[Mesa-dev] [PATCH 14/16] nvc0: move SW/HW queries info to their respective files

2015-10-16 Thread Samuel Pitoiset
This will help with handling HW SM query variants on Fermi.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query.c  | 185 +
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c   |  14 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.h   |   3 +
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 133 +++
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.h|   4 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_query_sw.c   |  64 +++
 src/gallium/drivers/nouveau/nvc0/nvc0_query_sw.h   |   3 +
 7 files changed, 228 insertions(+), 178 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
index 80f311b..e4752e2 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
@@ -141,163 +141,19 @@ nvc0_render_condition(struct pipe_context *pipe,
PUSH_DATA (push, hq->bo->offset + hq->offset);
 }
 
-/* === DRIVER STATISTICS === */
-
-#ifdef NOUVEAU_ENABLE_DRIVER_STATISTICS
-
-static const char *nvc0_sw_query_drv_stat_names[] =
-{
-   "drv-tex_obj_current_count",
-   "drv-tex_obj_current_bytes",
-   "drv-buf_obj_current_count",
-   "drv-buf_obj_current_bytes_vid",
-   "drv-buf_obj_current_bytes_sys",
-   "drv-tex_transfers_rd",
-   "drv-tex_transfers_wr",
-   "drv-tex_copy_count",
-   "drv-tex_blit_count",
-   "drv-tex_cache_flush_count",
-   "drv-buf_transfers_rd",
-   "drv-buf_transfers_wr",
-   "drv-buf_read_bytes_staging_vid",
-   "drv-buf_write_bytes_direct",
-   "drv-buf_write_bytes_staging_vid",
-   "drv-buf_write_bytes_staging_sys",
-   "drv-buf_copy_bytes",
-   "drv-buf_non_kernel_fence_sync_count",
-   "drv-any_non_kernel_fence_sync_count",
-   "drv-query_sync_count",
-   "drv-gpu_serialize_count",
-   "drv-draw_calls_array",
-   "drv-draw_calls_indexed",
-   "drv-draw_calls_fallback_count",
-   "drv-user_buffer_upload_bytes",
-   "drv-constbuf_upload_count",
-   "drv-constbuf_upload_bytes",
-   "drv-pushbuf_count",
-   "drv-resource_validate_count"
-};
-
-#endif /* NOUVEAU_ENABLE_DRIVER_STATISTICS */
-
-/* === PERFORMANCE MONITORING COUNTERS for NVE4+ === */
-
-/* NOTE: intentionally using the same names as NV */
-static const char *nve4_hw_sm_query_names[] =
-{
-   /* MP counters */
-   "active_cycles",
-   "active_warps",
-   "atom_count",
-   "branch",
-   "divergent_branch",
-   "gld_request",
-   "global_ld_mem_divergence_replays",
-   "global_store_transaction",
-   "global_st_mem_divergence_replays",
-   "gred_count",
-   "gst_request",
-   "inst_executed",
-   "inst_issued",
-   "inst_issued1",
-   "inst_issued2",
-   "l1_global_load_hit",
-   "l1_global_load_miss",
-   "l1_local_load_hit",
-   "l1_local_load_miss",
-   "l1_local_store_hit",
-   "l1_local_store_miss",
-   "l1_shared_load_transactions",
-   "l1_shared_store_transactions",
-   "local_load",
-   "local_load_transactions",
-   "local_store",
-   "local_store_transactions",
-   "prof_trigger_00",
-   "prof_trigger_01",
-   "prof_trigger_02",
-   "prof_trigger_03",
-   "prof_trigger_04",
-   "prof_trigger_05",
-   "prof_trigger_06",
-   "prof_trigger_07",
-   "shared_load",
-   "shared_load_replay",
-   "shared_store",
-   "shared_store_replay",
-   "sm_cta_launched",
-   "threads_launched",
-   "uncached_global_load_transaction",
-   "warps_launched",
-   /* metrics, i.e. functions of the MP counters */
-   "metric-ipc",   /* inst_executed, clock */
-   "metric-ipac",  /* inst_executed, active_cycles */
-   "metric-ipec",  /* inst_executed, (bool)inst_executed */
-   "metric-achieved_occupancy",/* active_warps, active_cycles */
-   "metric-sm_efficiency", /* active_cycles, clock */
-   "metric-inst_replay_overhead"   /* inst_issued, inst_executed */
-};
-
-/* === PERFORMANCE MONITORING COUNTERS for NVC0:NVE4 === */
-static const char *nvc0_hw_sm_query_names[] =
-{
-   /* MP counters */
-   "active_cycles",
-   "active_warps",
-   "atom_count",
-   "branch",
-   "divergent_branch",
-   "gld_request",
-   "gred_count",
-   "gst_request",
-   "inst_executed",
-   "inst_issued1_0",
-   "inst_issued1_1",
-   "inst_issued2_0",
-   "inst_issued2_1",
-   "local_load",
-   "local_store",
-   "prof_trigger_00",
-   "prof_trigger_01",
-   "prof_trigger_02",
-   "prof_trigger_03",
-   "prof_trigger_04",
-   "prof_trigger_05",
-   "prof_trigger_06",
-   "prof_trigger_07",
-   "shared_load",
-   "shared_store",
-   "threads_launched",
-   "thread_inst_executed_0",
-   "thread_inst_executed_1",
-   "thread_inst_executed_2",
-   "thread_inst_executed_3",
-   "warps_launched",
-};
-
 int
 nvc0_screen_get_driver_query_info(struct pipe_screen *pscreen,
   unsigned id,
   struct pipe_driver_query_info *info)
 {
struct nvc0_screen *screen = nvc0_screen(pscreen);
-   int count = 0;
-
-   count += NVC0_SW_QUERY_DRV_STAT_COUNT;
+   int num_sw_queries = 

[Mesa-dev] [PATCH 09/16] nvc0: fix unaligned mem access when reading MP counters on Fermi

2015-10-16 Thread Samuel Pitoiset
Memory accesses have to be aligned to 128 bits. Note that this
doesn't happen when the card only has TPC.

This patch fixes the following dmesg fail:

gr: GPC0/TPC1/MP trap: global 0004 [MULTIPLE_WARP_ERRORS] warp 000f
[UNALIGNED_MEM_ACCESS]
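
The arithmetic behind the new stride can be sketched as below. The enum
and helper names are hypothetical; the numbers (8 counters, 1 sequence
word, 3 padding words, sequence at index b + 8) come from the layout
comment and the loops in this patch.

```c
#include <stddef.h>
#include <stdint.h>

enum { MP_NUM_COUNTERS = 8, MP_NUM_SEQUENCE = 1, MP_NUM_PADDING = 3 };

/* Bytes reserved per MP: (8 + 1 + 3) * 4 = 48 = 0x30. */
size_t mp_stride_bytes(void)
{
   return (MP_NUM_COUNTERS + MP_NUM_SEQUENCE + MP_NUM_PADDING) *
          sizeof(uint32_t);
}

/* Index of the sequence word of a given MP in the uint32_t data array. */
unsigned mp_sequence_index(unsigned mp)
{
   const unsigned b = (unsigned)(mp_stride_bytes() / sizeof(uint32_t)) * mp;

   return b + 8;
}

/* 128 bits = 16 bytes. */
int is_128bit_aligned(size_t byte_offset)
{
   return byte_offset % 16 == 0;
}
```

The old stride of 0x24 bytes is not a multiple of 16, so every MP after
the first started at an unaligned offset; 0x30 keeps each block aligned.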

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index 6ee9fa6..840f200 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -247,7 +247,7 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
 * mov b32 $r11 c0[0x4]
 * ext u32 $r8 $r9 0x414
 * (not $p0) exit
-* mul $r8 u32 $r8 u32 36
+* mul $r8 u32 $r8 u32 48
 * add b32 $r10 $c $r10 $r8
 * add b32 $r11 $r11 0x0 $c
 * mov b32 $r8 c0[0x8]
@@ -270,7 +270,7 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
0x280040001002dde4ULL,
0x7000c01050921c03ULL,
0x800021e7ULL,
-   0x100090821c02ULL,
+   0x1000c0821c02ULL,
0x480120a29c03ULL,
0x08b2dc42ULL,
0x2800400020021de4ULL,
@@ -473,7 +473,7 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
 
/* set sequence field to 0 (used to check if result is available) */
for (i = 0; i < screen->mp_count; ++i) {
-  const unsigned b = (0x24 / 4) * i;
+  const unsigned b = (0x30 / 4) * i;
   hq->data[b + 8] = 0;
}
hq->sequence++;
@@ -617,7 +617,7 @@ nvc0_hw_sm_query_read_data(uint32_t count[32][8],
unsigned p, c;
 
for (p = 0; p < mp_count; ++p) {
-  const unsigned b = (0x24 / 4) * p;
+  const unsigned b = (0x30 / 4) * p;
 
   for (c = 0; c < cfg->num_counters; ++c) {
  if (hq->data[b + 8] != hq->sequence) {
@@ -815,7 +815,10 @@ nvc0_hw_sm_create_query(struct nvc0_context *nvc0, 
unsigned type)
 */
space = (4 * 4 + 4 + 4) * nvc0->screen->mp_count * sizeof(uint32_t);
} else {
-  /* for each MP:
+  /*
+   * Note that padding is used to align memory access to 128 bits.
+   *
+   * for each MP:
* [00] = MP.C0
* [04] = MP.C1
* [08] = MP.C2
@@ -825,8 +828,11 @@ nvc0_hw_sm_create_query(struct nvc0_context *nvc0, 
unsigned type)
* [18] = MP.C6
* [1c] = MP.C7
* [20] = MP.sequence
+   * [24] = padding
+   * [28] = padding
+   * [2c] = padding
*/
-  space = (8 + 1) * nvc0->screen->mp_count * sizeof(uint32_t);
+  space = (8 + 1 + 3) * nvc0->screen->mp_count * sizeof(uint32_t);
}
 
if (!nvc0_hw_query_allocate(nvc0, &hq->base, space)) {
-- 
2.6.1



[Mesa-dev] [PATCH 10/16] nvc0: store the number of GPCs to nvc0_screen

2015-10-16 Thread Samuel Pitoiset
NOUVEAU_GETPARAM_GRAPH_UNITS param returns the number of GPCs, the total
number of TPCs and the number of ROP units. Note that when the DRM
version is too old the default number of GPCs is fixed to 4.

This will be used to launch the compute kernel which is used to read MP
performance counters over all GPCs.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index afd91e6..ba53d10 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -914,6 +914,7 @@ nvc0_screen_create(struct nouveau_device *dev)
   else
  value = (16 << 8) | 4;
}
+   screen->gpc_count = value & 0x;
screen->mp_count = value >> 8;
screen->mp_count_compute = screen->mp_count;
 
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
index 8cf7560..857eb03 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
@@ -67,6 +67,7 @@ struct nvc0_screen {
struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */
struct nouveau_bo *poly_cache;
 
+   uint8_t gpc_count;
uint16_t mp_count;
uint16_t mp_count_compute; /* magic reg can make compute use fewer MPs */
 
-- 
2.6.1



[Mesa-dev] [PATCH 08/16] nvc0: fix monitoring multiple MP counters queries on Fermi

2015-10-16 Thread Samuel Pitoiset
For strange reasons, the signal id depends on the slot selected on Fermi
but not on Kepler. Fortunately, the signal ids are just offset by the
slot id!

Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 147 +++--
 1 file changed, 79 insertions(+), 68 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index 99e9073..6ee9fa6 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -128,7 +128,7 @@ struct nvc0_hw_sm_counter_cfg
 {
uint32_t func: 16; /* mask or 4-bit logic op (depending on mode) */
uint32_t mode: 4;  /* LOGOP,B6,LOGOP_B6(_PULSE) */
-   uint32_t num_src : 3;  /* number of sources (1 - 6, only for NVC0:NVE4) */
+   uint32_t src_mask; /* mask for signal selection (only for NVC0:NVE4) */
uint32_t sig_dom : 1;  /* if 0, MP_PM_A (per warp-sched), if 1, MP_PM_B */
uint32_t sig_sel : 8;  /* signal group */
uint32_t src_sel;  /* signal selection for up to 4 sources */
@@ -280,78 +280,78 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
0x80001de7ULL
 };
 
-#define _C(f, o, g, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, 0, 0, g, s }
+#define _C(f, o, g, m, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, m, 0, g, s }
 #define _Q(n, c, ...) [NVC0_HW_SM_QUERY_##n] = {  \
{ __VA_ARGS__ }, c, NVC0_COUNTER_OPn_SUM, { 1, 1 },\
 }
 
 static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] =
 {
-   _Q(ACTIVE_CYCLES,   1, _C(0x, LOGOP, 0x11, 0x)),
-   _Q(ACTIVE_WARPS,6, _C(0x, LOGOP, 0x24, 0x0010),
-  _C(0x, LOGOP, 0x24, 0x0021),
-  _C(0x, LOGOP, 0x24, 0x0032),
-  _C(0x, LOGOP, 0x24, 0x0043),
-  _C(0x, LOGOP, 0x24, 0x0054),
-  _C(0x, LOGOP, 0x24, 0x0065)),
-   _Q(ATOM_COUNT,  1, _C(0x, LOGOP, 0x63, 0x0030)),
-   _Q(BRANCH,  2, _C(0x, LOGOP, 0x1a, 0x),
-  _C(0x, LOGOP, 0x1a, 0x0011)),
-   _Q(DIVERGENT_BRANCH,2, _C(0x, LOGOP, 0x19, 0x0020),
-  _C(0x, LOGOP, 0x19, 0x0031)),
-   _Q(GLD_REQUEST, 1, _C(0x, LOGOP, 0x64, 0x0030)),
-   _Q(GRED_COUNT,  1, _C(0x, LOGOP, 0x63, 0x0040)),
-   _Q(GST_REQUEST, 1, _C(0x, LOGOP, 0x64, 0x0060)),
-   _Q(INST_EXECUTED,   3, _C(0x, LOGOP, 0x2d, 0x),
-  _C(0x, LOGOP, 0x2d, 0x0011),
-  _C(0x, LOGOP, 0x2d, 0x0022)),
-   _Q(INST_ISSUED1_0,  1, _C(0x, LOGOP, 0x7e, 0x0010)),
-   _Q(INST_ISSUED1_1,  1, _C(0x, LOGOP, 0x7e, 0x0040)),
-   _Q(INST_ISSUED2_0,  1, _C(0x, LOGOP, 0x7e, 0x0020)),
-   _Q(INST_ISSUED2_1,  1, _C(0x, LOGOP, 0x7e, 0x0050)),
-   _Q(LOCAL_LD,1, _C(0x, LOGOP, 0x64, 0x0020)),
-   _Q(LOCAL_ST,1, _C(0x, LOGOP, 0x64, 0x0050)),
-   _Q(PROF_TRIGGER_0,  1, _C(0x, LOGOP, 0x01, 0x)),
-   _Q(PROF_TRIGGER_1,  1, _C(0x, LOGOP, 0x01, 0x0010)),
-   _Q(PROF_TRIGGER_2,  1, _C(0x, LOGOP, 0x01, 0x0020)),
-   _Q(PROF_TRIGGER_3,  1, _C(0x, LOGOP, 0x01, 0x0030)),
-   _Q(PROF_TRIGGER_4,  1, _C(0x, LOGOP, 0x01, 0x0040)),
-   _Q(PROF_TRIGGER_5,  1, _C(0x, LOGOP, 0x01, 0x0050)),
-   _Q(PROF_TRIGGER_6,  1, _C(0x, LOGOP, 0x01, 0x0060)),
-   _Q(PROF_TRIGGER_7,  1, _C(0x, LOGOP, 0x01, 0x0070)),
-   _Q(SHARED_LD,   1, _C(0x, LOGOP, 0x64, 0x0010)),
-   _Q(SHARED_ST,   1, _C(0x, LOGOP, 0x64, 0x0040)),
-   _Q(THREADS_LAUNCHED,6, _C(0x, LOGOP, 0x26, 0x0010),
-  _C(0x, LOGOP, 0x26, 0x0021),
-  _C(0x, LOGOP, 0x26, 0x0032),
-  _C(0x, LOGOP, 0x26, 0x0043),
-  _C(0x, LOGOP, 0x26, 0x0054),
-  _C(0x, LOGOP, 0x26, 0x0065)),
-   _Q(TH_INST_EXECUTED_0,  6, _C(0x, LOGOP, 0xa3, 0x),
-  _C(0x, LOGOP, 0xa3, 0x0011),
-  _C(0x, LOGOP, 0xa3, 0x0022),
-  _C(0x, LOGOP, 0xa3, 0x0033),
-  _C(0x, LOGOP, 0xa3, 0x0044),
-  _C(0x, LOGOP, 0xa3, 0x0055)),
-   _Q(TH_INST_EXECUTED_1,  6, _C(0x, LOGOP, 0xa5, 0x),
-  _C(0x, LOGOP, 0xa5, 0x0011),
-  _C(0x, LOGOP, 0xa5, 0x0022),
-

[Mesa-dev] [PATCH 13/16] nvc0: enable compute support by default on Fermi

2015-10-16 Thread Samuel Pitoiset
Compute support was not enabled by default because weird effects
on 3D state happened, but I can't reproduce them anymore.

This also enables MP performance counters by default on Fermi.
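
As a rough illustration of the dispatch after this change, the sketch
below mirrors the switch on "chipset & ~0xf" from the diff. The enum and
function names are hypothetical, and anything other than 0xc0, 0xd0 and
0xe0 is lumped into a catch-all because the excerpt does not show how
those cases are handled.

```c
/* Which compute setup path a chipset would take (hypothetical labels). */
enum compute_path {
   COMPUTE_PATH_NVC0,    /* Fermi: now taken unconditionally */
   COMPUTE_PATH_NVE4,    /* Kepler */
   COMPUTE_PATH_OTHER    /* not covered by this sketch */
};

enum compute_path pick_compute_path(unsigned chipset)
{
   switch (chipset & ~0xfu) {
   case 0xc0:
   case 0xd0:
      return COMPUTE_PATH_NVC0;
   case 0xe0:
      return COMPUTE_PATH_NVE4;
   default:
      return COMPUTE_PATH_OTHER;
   }
}
```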

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query.c  | 3 +--
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 7 +--
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
index c81b85a..80f311b 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
@@ -291,7 +291,6 @@ nvc0_screen_get_driver_query_info(struct pipe_screen 
*pscreen,
 count += NVE4_HW_SM_QUERY_COUNT;
  } else
  if (screen->base.class_3d < NVE4_3D_CLASS) {
-/* NVC0_COMPUTE is not always enabled */
 count += NVC0_HW_SM_QUERY_COUNT;
  }
   }
@@ -358,7 +357,7 @@ nvc0_screen_get_driver_query_group_info(struct pipe_screen 
*pscreen,
 count++;
  } else
  if (screen->base.class_3d < NVE4_3D_CLASS) {
-count++; /* NVC0_COMPUTE is not always enabled */
+count++;
  }
   }
}
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index ba53d10..d6a4ac5 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -561,12 +561,7 @@ nvc0_screen_init_compute(struct nvc0_screen *screen)
switch (screen->base.device->chipset & ~0xf) {
case 0xc0:
case 0xd0:
-  /* Using COMPUTE has weird effects on 3D state, we need to
-   * investigate this further before enabling it by default.
-   */
-  if (debug_get_bool_option("NVC0_COMPUTE", false))
- return nvc0_screen_compute_setup(screen, screen->base.pushbuf);
-  return 0;
+  return nvc0_screen_compute_setup(screen, screen->base.pushbuf);
case 0xe0:
   return nve4_screen_compute_setup(screen, screen->base.pushbuf);
case 0xf0:
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 12/16] nvc0: allow only one active query for the MP counters group

2015-10-16 Thread Samuel Pitoiset
Because we can't expose the number of hardware counters needed for each
different query, we don't want to allow more than one active query
simultaneously to avoid failure when the maximum number of counters
is reached. Note that these groups of GPU counters are currently only
used by AMD_performance_monitor.

As on Kepler, this limits the maximum number of active queries
to 1 on Fermi.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
index f8d4ba1..c81b85a 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c
@@ -371,22 +371,20 @@ nvc0_screen_get_driver_query_group_info(struct 
pipe_screen *pscreen,
  info->name = "MP counters";
  info->type = PIPE_DRIVER_QUERY_GROUP_TYPE_GPU;
 
+ /* Because we can't expose the number of hardware counters needed for
+  * each different query, we don't want to allow more than one active
+  * query simultaneously to avoid failure when the maximum number of
+  * counters is reached. Note that these groups of GPU counters are
+  * currently only used by AMD_performance_monitor.
+  */
+ info->max_active_queries = 1;
+
  if (screen->base.class_3d == NVE4_3D_CLASS) {
 info->num_queries = NVE4_HW_SM_QUERY_COUNT;
-
- /* On NVE4+, each multiprocessor have 8 hardware counters 
separated
-  * in two distinct domains, but we allow only one active query
-  * simultaneously because some of them use more than one hardware
-  * counter and this will result in an undefined behaviour. */
- info->max_active_queries = 1; /* TODO: handle multiple hw 
counters */
- return 1;
+return 1;
  } else
  if (screen->base.class_3d < NVE4_3D_CLASS) {
 info->num_queries = NVC0_HW_SM_QUERY_COUNT;
-
-/* On NVC0:NVE4, each multiprocessor have 8 hardware counters
- * in a single domain. */
-info->max_active_queries = 8;
 return 1;
  }
   }
-- 
2.6.1



[Mesa-dev] [PATCH 16/16] nvc0: add a note about MP counters on GF100/GF110

2015-10-16 Thread Samuel Pitoiset
MP counters on GF100/GF110 (compute capability 2.0) are buggy
because there is a context-switch problem that we need to fix.
Results might be wrong sometimes, be careful!

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index c4b40a6..c5ce3e3 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -286,6 +286,11 @@ static const struct nvc0_hw_sm_query_cfg 
nve4_hw_sm_queries[] =
 #undef _M2B
 
 /* === PERFORMANCE MONITORING COUNTERS for NVC0:NVE4 === */
+/* NOTES:
+ * - MP counters on GF100/GF110 (compute capability 2.0) are buggy
+ *   because there is a context-switch problem that we need to fix.
+ *   Results might be wrong sometimes, be careful!
+ */
 static const char *nvc0_hw_sm_query_names[] =
 {
/* MP counters */
-- 
2.6.1



[Mesa-dev] [PATCH 15/16] nvc0: add MP counters variants for GF100/GF110

2015-10-16 Thread Samuel Pitoiset
GF100 and GF110 chipsets are compute capability 2.0, while the other
Fermi chipsets are compute capability 2.1. As a result, some MP counters
differ between these chipsets and we need to handle variants.

Signed-off-by: Samuel Pitoiset
---
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 559 ++---
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.h|   1 +
 2 files changed, 483 insertions(+), 77 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index 174beef..c4b40a6 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -298,6 +298,7 @@ static const char *nvc0_hw_sm_query_names[] =
"gred_count",
"gst_request",
"inst_executed",
+   "inst_issued",
"inst_issued1_0",
"inst_issued1_1",
"inst_issued2_0",
@@ -373,82 +374,456 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
 };
 
 #define _C(f, o, g, m, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, m, 0, g, s }
-#define _Q(n, c, ...) [NVC0_HW_SM_QUERY_##n] = {  \
-   { __VA_ARGS__ }, c, NVC0_COUNTER_OPn_SUM, { 1, 1 },\
-}
+#define _Q(n, c) [NVC0_HW_SM_QUERY_##n] = c
+
+/*  Compute capability 2.0 (GF100/GF110)  */
+static const struct nvc0_hw_sm_query_cfg
+nvc0_active_cycles =
+{
+   .ctr[0]   = _C(0x, LOGOP, 0x11, 0x00ff, 0x),
+   .num_counters = 1,
+   .op   = NVC0_COUNTER_OPn_SUM,
+   .norm = { 1, 1 },
+};
+
+static const struct nvc0_hw_sm_query_cfg
+nvc0_active_warps =
+{
+   .ctr[0]   = _C(0x, LOGOP, 0x24, 0x00ff, 0x0010),
+   .ctr[1]   = _C(0x, LOGOP, 0x24, 0x00ff, 0x0020),
+   .ctr[2]   = _C(0x, LOGOP, 0x24, 0x00ff, 0x0030),
+   .ctr[3]   = _C(0x, LOGOP, 0x24, 0x00ff, 0x0040),
+   .ctr[4]   = _C(0x, LOGOP, 0x24, 0x00ff, 0x0050),
+   .ctr[5]   = _C(0x, LOGOP, 0x24, 0x00ff, 0x0060),
+   .num_counters = 6,
+   .op   = NVC0_COUNTER_OPn_SUM,
+   .norm = { 1, 1 },
+};
+
+static const struct nvc0_hw_sm_query_cfg
+nvc0_atom_count =
+{
+   .ctr[0]   = _C(0x, LOGOP, 0x63, 0x00ff, 0x0030),
+   .num_counters = 1,
+   .op   = NVC0_COUNTER_OPn_SUM,
+   .norm = { 1, 1 },
+};
+
+static const struct nvc0_hw_sm_query_cfg
+nvc0_branch =
+{
+   .ctr[0]   = _C(0x, LOGOP, 0x1a, 0x00ff, 0x),
+   .ctr[1]   = _C(0x, LOGOP, 0x1a, 0x00ff, 0x0010),
+   .num_counters = 2,
+   .op   = NVC0_COUNTER_OPn_SUM,
+   .norm = { 1, 1 },
+};
+
+static const struct nvc0_hw_sm_query_cfg
+nvc0_divergent_branch =
+{
+   .ctr[0]   = _C(0x, LOGOP, 0x19, 0x00ff, 0x0020),
+   .ctr[1]   = _C(0x, LOGOP, 0x19, 0x00ff, 0x0030),
+   .num_counters = 2,
+   .op   = NVC0_COUNTER_OPn_SUM,
+   .norm = { 1, 1 },
+};
+
+static const struct nvc0_hw_sm_query_cfg
+nvc0_gld_request =
+{
+   .ctr[0]   = _C(0x, LOGOP, 0x64, 0x00ff, 0x0030),
+   .num_counters = 1,
+   .op   = NVC0_COUNTER_OPn_SUM,
+   .norm = { 1, 1 },
+};
+
+static const struct nvc0_hw_sm_query_cfg
+nvc0_gred_count =
+{
+   .ctr[0]   = _C(0x, LOGOP, 0x63, 0x00ff, 0x0040),
+   .num_counters = 1,
+   .op   = NVC0_COUNTER_OPn_SUM,
+   .norm = { 1, 1 },
+};
+
+static const struct nvc0_hw_sm_query_cfg
+nvc0_gst_request =
+{
+   .ctr[0]   = _C(0x, LOGOP, 0x64, 0x00ff, 0x0060),
+   .num_counters = 1,
+   .op   = NVC0_COUNTER_OPn_SUM,
+   .norm = { 1, 1 },
+};
+
+static const struct nvc0_hw_sm_query_cfg
+nvc0_inst_executed =
+{
+   .ctr[0]   = _C(0x, LOGOP, 0x2d, 0x, 0x1000),
+   .ctr[1]   = _C(0x, LOGOP, 0x2d, 0x, 0x1010),
+   .num_counters = 2,
+   .op   = NVC0_COUNTER_OPn_SUM,
+   .norm = { 1, 1 },
+};
+
+static const struct nvc0_hw_sm_query_cfg
+nvc0_inst_issued =
+{
+   .ctr[0]   = _C(0x, LOGOP, 0x27, 0x, 0x7060),
+   .ctr[1]   = _C(0x, LOGOP, 0x27, 0x, 0x7070),
+   .num_counters = 2,
+   .op   = NVC0_COUNTER_OPn_SUM,
+   .norm = { 1, 1 },
+};
 
-static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] =
-{
-   _Q(ACTIVE_CYCLES,   1, _C(0x, LOGOP, 0x11, 0x00ff, 0x)),
-   _Q(ACTIVE_WARPS,6, _C(0x, LOGOP, 0x24, 0x00ff, 0x0010),
-  _C(0x, LOGOP, 0x24, 0x00ff, 0x0020),
-  _C(0x, LOGOP, 0x24, 0x00ff, 0x0030),
-  _C(0x, LOGOP, 0x24, 0x00ff, 0x0040),
-  _C(0x, LOGOP, 0x24, 0x00ff, 0x0050),
-  _C(0x, LOGOP, 0x24, 0x00ff, 0x0060)),
-   _Q(ATOM_COUNT,  1, _C(0x, LOGOP, 0x6

[Mesa-dev] [PATCH 00/16] nvc0: improve MP counters support on Fermi

2015-10-16 Thread Samuel Pitoiset
Hello,

This series fixes some issues related to MP performance counters on Fermi.

MP counters for GF100/GF110 have also been improved because they are compute
capability 2.0 while the other Fermi chipsets are 2.1 and some HW events are
different.

Compute support is now enabled by default on Fermi because I can't reproduce
those weird effects on 3D state. This has probably been fixed as a side effect.
Anyway, if someone complains about it, I'll be glad to fix it.

This series has been tested with the following apps:
 - xonotic-glx
 - heaven
 - valley
 - glxgears, glxspheres64 and so on ...

And with the following Fermi chipsets:
 - GF100
 - GF108
 - GF110
 - GF114
 - GF116
 - GF119

Note that with GF100/GF110, some MP counters are not correctly context-switched
and results might be wrong. This is a known issue that we need to fix on the
Nouveau side.

There are no regressions with piglit.

I'll submit another series in the next few days which adds some performance
monitoring metrics on Fermi, and I'll double-check MP counters on Kepler.

Thanks.

Samuel Pitoiset (16):
  nvc0: remove useless call to query_get_cfg() in nvc0_hw_sm_query_end()
  nvc0: split out begin_query() hook used by MP counters
  nvc0: rip off the kepler MP-enabling logic from the Fermi codepath
  nvc0: correctly enable the MP counters' multiplexer on Fermi
  nvc0: fix sequence field init for MP counters on Fermi
  nvc0: allow to use 8 MP counters on Fermi
  nvc0: fix queries which use multiple MP counters on Fermi
  nvc0: fix monitoring multiple MP counters queries on Fermi
  nvc0: fix unaligned mem access when reading MP counters on Fermi
  nvc0: store the number of GPCs to nvc0_screen
  nvc0: read MP counters of all GPCs on Fermi
  nvc0: allow only one active query for the MP counters group
  nvc0: enable compute support by default on Fermi
  nvc0: move SW/HW queries info to their respective files
  nvc0: add MP counters variants for GF100/GF110
  nvc0: add a note about MP counters on GF100/GF110

 src/gallium/drivers/nouveau/nvc0/nvc0_query.c  | 208 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c   |  14 +
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.h   |   3 +
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 809 ++---
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.h|   7 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_query_sw.c   |  64 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_query_sw.h   |   3 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |   8 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.h |   1 +
 9 files changed, 834 insertions(+), 283 deletions(-)

-- 
2.6.1



[Mesa-dev] [PATCH 11/16] nvc0: read MP counters of all GPCs on Fermi

2015-10-16 Thread Samuel Pitoiset
When a card has more than one GPC, the grid used by the compute
kernel which reads MP performance counters seems to be too small.
The consequence is that the kernel is not launched on all TPCs.

Increasing the grid size using the number of GPCs now launches
enough blocks and we can read MP performance counters of all TPCs.
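
As a rough illustration of the block count (a sketch, not driver code): the
number of blocks launched is the product of the grid dimensions, so using the
GPC count as the second dimension multiplies it. The helper name and the
counts in the note below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: total number of blocks for a 3D compute grid. */
static unsigned
grid_block_count(const unsigned grid[3])
{
   assert(grid != NULL);
   return grid[0] * grid[1] * grid[2];
}
```

With a hypothetical mp_count of 4 and gpc_count of 2, the old grid
{ 4, 1, 1 } gives 4 blocks while the new grid { 4, 2, 1 } gives 8.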

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index 840f200..c22ad4b 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -528,7 +528,7 @@ nvc0_hw_sm_end_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
uint32_t mask;
uint32_t input[3];
const uint block[3] = { 32, is_nve4 ? 4 : 1, 1 };
-   const uint grid[3] = { screen->mp_count, 1, 1 };
+   const uint grid[3] = { screen->mp_count, screen->gpc_count, 1 };
unsigned c;
 
if (unlikely(!screen->pm.prog)) {
-- 
2.6.1



[Mesa-dev] [PATCH 05/16] nvc0: fix sequence field init for MP counters on Fermi

2015-10-16 Thread Samuel Pitoiset
Sequence fields are located at MP[i] + 0x20 in the buffer object.
This is used to check if result is available for MP[i].
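
The index arithmetic can be sketched as below. The 0x24-byte stride per MP and
the 0x20 offset come from this patch; reading that layout as eight 32-bit
counters followed by the sequence word is an assumption, and the helper name
is hypothetical.

```c
#include <assert.h>

enum {
   MP_STRIDE_BYTES     = 0x24, /* per-MP record size used by the patch */
   MP_SEQ_OFFSET_BYTES = 0x20  /* sequence field offset inside a record */
};

/* Hypothetical helper: index into a uint32_t array (such as hq->data)
 * of the sequence field belonging to MP[i]. */
static unsigned
mp_sequence_index(unsigned i)
{
   assert(MP_STRIDE_BYTES % 4 == 0 && MP_SEQ_OFFSET_BYTES % 4 == 0);
   return (MP_STRIDE_BYTES / 4) * i + MP_SEQ_OFFSET_BYTES / 4;
}
```

For MP[0] this gives index 8 and for MP[1] index 17, which is what
hq->data[b + 8] with b = (0x24 / 4) * i computes in the hunk below.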

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index f7b49da..b810d25 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -439,8 +439,10 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
PUSH_SPACE(push, 4 * 8 * 6 + 4);
 
/* set sequence field to 0 (used to check if result is available) */
-   for (i = 0; i < screen->mp_count; ++i)
-  hq->data[i * 10 + 10] = 0;
+   for (i = 0; i < screen->mp_count; ++i) {
+  const unsigned b = (0x24 / 4) * i;
+  hq->data[b + 8] = 0;
+   }
hq->sequence++;
 
for (i = 0; i < cfg->num_counters; ++i) {
-- 
2.6.1



[Mesa-dev] [PATCH 04/16] nvc0: correctly enable the MP counters' multiplexer on Fermi

2015-10-16 Thread Samuel Pitoiset
Writing 0x408000 to 0x419e00 (like on Kepler) has no effect on Fermi
because we only have one domain of 8 counters. Instead, we have to
write 0x8000.
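
For reference, the Kepler-style value can be reproduced from the expression
this patch removes. This is only a sketch of the arithmetic; the function names
are hypothetical and the meaning of the individual bits is not documented in
this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Kepler-style multiplexer value, following the removed expression:
 * bit 22, plus one bit selected by the domain d (0 or 1), plus the
 * other domain's bit if that domain already has active counters. */
static uint32_t
kepler_mux_value(unsigned d, int other_domain_active)
{
   assert(d <= 1);
   uint32_t m = (1u << 22) | (1u << (7 + 8 * !d));
   if (other_domain_active)
      m |= 1u << (7 + 8 * d);
   return m;
}

/* Fermi has a single domain, so the patch writes a constant instead. */
static uint32_t
fermi_mux_value(void)
{
   return 0x8000;
}
```

kepler_mux_value(0, 0) evaluates to 0x408000, the value mentioned above.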

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index a74bfee..f7b49da 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -448,11 +448,8 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
   unsigned s;
 
   if (!screen->pm.num_hw_sm_active[d]) {
- uint32_t m = (1 << 22) | (1 << (7 + (8 * !d)));
- if (screen->pm.num_hw_sm_active[!d])
-m |= 1 << (7 + (8 * d));
  BEGIN_NVC0(push, SUBC_SW(0x0600), 1);
- PUSH_DATA (push, m);
+ PUSH_DATA (push, 0x8000);
   }
   screen->pm.num_hw_sm_active[d]++;
 
-- 
2.6.1



[Mesa-dev] [PATCH 07/16] nvc0: fix queries which use multiple MP counters on Fermi

2015-10-16 Thread Samuel Pitoiset
Queries which use more than one MP counter were misconfigured, and the
final result was also computed incorrectly, because sources need to
be configured on different hardware counters instead.

According to the blob, computing the result is now as follows:

FOR  i..n
val += ctr[i] * pow(2, i)
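
A minimal C sketch of that formula, assuming the loop runs over the n counters
used by one query and that a query never uses more than 8 of them. The function
name is hypothetical and this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine the raw values of the n counters used by
 * one query, weighting counter i by 2^i. */
static uint64_t
combine_mp_counters(const uint32_t *ctr, unsigned n)
{
   uint64_t val = 0;

   assert(n <= 8); /* assumption: at most 8 MP counters per query */
   for (unsigned i = 0; i < n; ++i)
      val += (uint64_t)ctr[i] << i; /* same as ctr[i] * pow(2, i) */
   return val;
}
```

The shift is done in 64 bits so that large counter values do not overflow
before they are accumulated.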

Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 128 +
 1 file changed, 81 insertions(+), 47 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index 2060662..99e9073 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -131,7 +131,7 @@ struct nvc0_hw_sm_counter_cfg
uint32_t num_src : 3;  /* number of sources (1 - 6, only for NVC0:NVE4) */
uint32_t sig_dom : 1;  /* if 0, MP_PM_A (per warp-sched), if 1, MP_PM_B */
uint32_t sig_sel : 8;  /* signal group */
-   uint64_t src_sel;  /* signal selection for up to 6 sources (48 bit) */
+   uint32_t src_sel;  /* signal selection for up to 4 sources */
 };
 
 #define NVC0_COUNTER_OPn_SUM0
@@ -280,44 +280,82 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
0x80001de7ULL
 };
 
-#define _Q(n, f, m, g, c, s0, s1, s2, s3, s4, s5) [NVC0_HW_SM_QUERY_##n] = { { 
{ f, NVC0_COMPUTE_MP_PM_OP_MODE_##m, c, 0, g, s0|(s1 << 8)|(s2 << 16)|(s3 << 
24)|(s4##ULL << 32)|(s5##ULL << 40) }, {}, {}, {} }, 1, NVC0_COUNTER_OPn_SUM, { 
1, 1 } }
+#define _C(f, o, g, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, 0, 0, g, s }
+#define _Q(n, c, ...) [NVC0_HW_SM_QUERY_##n] = {  \
+   { __VA_ARGS__ }, c, NVC0_COUNTER_OPn_SUM, { 1, 1 },\
+}
 
 static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] =
 {
-   _Q(ACTIVE_CYCLES,   0x, LOGOP, 0x11, 1, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(ACTIVE_WARPS,0x, LOGOP, 0x24, 6, 0x10, 0x21, 0x32, 0x43, 
0x54, 0x65),
-   _Q(ATOM_COUNT,  0x, LOGOP, 0x63, 1, 0x30, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(BRANCH,  0x, LOGOP, 0x1a, 2, 0x00, 0x11, 0x00, 0x00, 
0x00, 0x00),
-   _Q(DIVERGENT_BRANCH,0x, LOGOP, 0x19, 2, 0x20, 0x31, 0x00, 0x00, 
0x00, 0x00),
-   _Q(GLD_REQUEST, 0x, LOGOP, 0x64, 1, 0x30, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(GRED_COUNT,  0x, LOGOP, 0x63, 1, 0x40, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(GST_REQUEST, 0x, LOGOP, 0x64, 1, 0x60, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(INST_EXECUTED,   0x, LOGOP, 0x2d, 3, 0x00, 0x11, 0x22, 0x00, 
0x00, 0x00),
-   _Q(INST_ISSUED1_0,  0x, LOGOP, 0x7e, 1, 0x10, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(INST_ISSUED1_1,  0x, LOGOP, 0x7e, 1, 0x40, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(INST_ISSUED2_0,  0x, LOGOP, 0x7e, 1, 0x20, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(INST_ISSUED2_1,  0x, LOGOP, 0x7e, 1, 0x50, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(LOCAL_LD,0x, LOGOP, 0x64, 1, 0x20, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(LOCAL_ST,0x, LOGOP, 0x64, 1, 0x50, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_0,  0x, LOGOP, 0x01, 1, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_1,  0x, LOGOP, 0x01, 1, 0x10, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_2,  0x, LOGOP, 0x01, 1, 0x20, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_3,  0x, LOGOP, 0x01, 1, 0x30, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_4,  0x, LOGOP, 0x01, 1, 0x40, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_5,  0x, LOGOP, 0x01, 1, 0x50, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_6,  0x, LOGOP, 0x01, 1, 0x60, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(PROF_TRIGGER_7,  0x, LOGOP, 0x01, 1, 0x70, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(SHARED_LD,   0x, LOGOP, 0x64, 1, 0x10, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(SHARED_ST,   0x, LOGOP, 0x64, 1, 0x40, 0x00, 0x00, 0x00, 
0x00, 0x00),
-   _Q(THREADS_LAUNCHED,0x, LOGOP, 0x26, 6, 0x10, 0x21, 0x32, 0x43, 
0x54, 0x65),
-   _Q(TH_INST_EXECUTED_0,  0x, LOGOP, 0xa3, 6, 0x00, 0x11, 0x22, 0x33, 
0x44, 0x55),
-   _Q(TH_INST_EXECUTED_1,  0x, LOGOP, 0xa5, 6, 0x00, 0x11, 0x22, 0x33, 
0x44, 0x55),
-   _Q(TH_INST_EXECUTED_2,  0x, LOGOP, 0xa4, 6, 0x00, 0x11, 0x22, 0x33, 
0x44, 0x55),
-   _Q(TH_INST_EXECUTED_3,  0x, LOGOP, 0xa6, 6, 0x00, 0x11, 0x22, 0x33, 
0x44, 0x55),
-   _Q(WARPS_LAUNCHED,  0x, LOGOP, 0x26, 1, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00),
+   _Q(ACTIVE_CYCLES,   1, _C(0x, LOGOP, 0x11, 0x)),
+   _Q(ACTIVE_WARPS,6, _C(0x, LOGOP, 0x24, 0x0010),
+  _C(0x, LOGOP, 0x24, 0x0021),
+  _C(0x, LOGOP, 0x24, 0x0032),
+  _C(0x, LOGOP, 0x24, 0x0043),
+ 

[Mesa-dev] [PATCH 01/16] nvc0: remove useless call to query_get_cfg() in nvc0_hw_sm_query_end()

2015-10-16 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index 3bdb90a..8e2239f 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -439,9 +439,6 @@ nvc0_hw_sm_end_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
const uint block[3] = { 32, is_nve4 ? 4 : 1, 1 };
const uint grid[3] = { screen->mp_count, 1, 1 };
unsigned c;
-   const struct nvc0_hw_sm_query_cfg *cfg;
-
-   cfg = nvc0_hw_sm_query_get_cfg(nvc0, hq);
 
if (unlikely(!screen->pm.prog)) {
   struct nvc0_program *prog = CALLOC_STRUCT(nvc0_program);
@@ -495,6 +492,7 @@ nvc0_hw_sm_end_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
PUSH_SPACE(push, 16);
mask = 0;
for (c = 0; c < 8; ++c) {
+  const struct nvc0_hw_sm_query_cfg *cfg;
   unsigned i;
 
   hsq = screen->pm.mp_counter[c];
-- 
2.6.1



[Mesa-dev] [PATCH 03/16] nvc0: rip off the kepler MP-enabling logic from the Fermi codepath

2015-10-16 Thread Samuel Pitoiset
Writing 0x1fcb to 0x419eac is definitely not related to MP counters and
has no effect on Fermi (although this enables MP counters on Kepler).

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index f83966a..a74bfee 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -436,13 +436,7 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
}
 
assert(cfg->num_counters <= 4);
-   PUSH_SPACE(push, 4 * 8 * 6 + 6);
-
-   if (!screen->pm.mp_counters_enabled) {
-  screen->pm.mp_counters_enabled = true;
-  BEGIN_NVC0(push, SUBC_SW(0x06ac), 1);
-  PUSH_DATA (push, 0x1fcb);
-   }
+   PUSH_SPACE(push, 4 * 8 * 6 + 4);
 
/* set sequence field to 0 (used to check if result is available) */
for (i = 0; i < screen->mp_count; ++i)
-- 
2.6.1



[Mesa-dev] [PATCH 02/16] nvc0: split out begin_query() hook used by MP counters

2015-10-16 Thread Samuel Pitoiset
The way we configure MP performance counters is going to be pretty
different between Fermi and Kepler. Having two separate functions
is much better.

Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 108 -
 1 file changed, 84 insertions(+), 24 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index 8e2239f..f83966a 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -338,16 +338,91 @@ nvc0_hw_sm_destroy_query(struct nvc0_context *nvc0, 
struct nvc0_hw_query *hq)
 }
 
 static boolean
+nve4_hw_sm_begin_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
+{
+   struct nvc0_screen *screen = nvc0->screen;
+   struct nouveau_pushbuf *push = nvc0->base.pushbuf;
+   struct nvc0_hw_sm_query *hsq = nvc0_hw_sm_query(hq);
+   const struct nvc0_hw_sm_query_cfg *cfg;
+   unsigned i, c;
+   unsigned num_ab[2] = { 0, 0 };
+
+   cfg = nvc0_hw_sm_query_get_cfg(nvc0, hq);
+
+   /* check if we have enough free counter slots */
+   for (i = 0; i < cfg->num_counters; ++i)
+  num_ab[cfg->ctr[i].sig_dom]++;
+
+   if (screen->pm.num_hw_sm_active[0] + num_ab[0] > 4 ||
+   screen->pm.num_hw_sm_active[1] + num_ab[1] > 4) {
+  NOUVEAU_ERR("Not enough free MP counter slots !\n");
+  return false;
+   }
+
+   assert(cfg->num_counters <= 4);
+   PUSH_SPACE(push, 4 * 8 * + 6);
+
+   if (!screen->pm.mp_counters_enabled) {
+  screen->pm.mp_counters_enabled = true;
+  BEGIN_NVC0(push, SUBC_SW(0x06ac), 1);
+  PUSH_DATA (push, 0x1fcb);
+   }
+
+   /* set sequence field to 0 (used to check if result is available) */
+   for (i = 0; i < screen->mp_count; ++i)
+  hq->data[i * 10 + 10] = 0;
+   hq->sequence++;
+
+   for (i = 0; i < cfg->num_counters; ++i) {
+  const unsigned d = cfg->ctr[i].sig_dom;
+
+  if (!screen->pm.num_hw_sm_active[d]) {
+ uint32_t m = (1 << 22) | (1 << (7 + (8 * !d)));
+ if (screen->pm.num_hw_sm_active[!d])
+m |= 1 << (7 + (8 * d));
+ BEGIN_NVC0(push, SUBC_SW(0x0600), 1);
+ PUSH_DATA (push, m);
+  }
+  screen->pm.num_hw_sm_active[d]++;
+
+  for (c = d * 4; c < (d * 4 + 4); ++c) {
+ if (!screen->pm.mp_counter[c]) {
+hsq->ctr[i] = c;
+screen->pm.mp_counter[c] = hsq;
+break;
+ }
+  }
+  assert(c <= (d * 4 + 3)); /* must succeed, already checked for space */
+
+  /* configure and reset the counter(s) */
+ if (d == 0)
+BEGIN_NVC0(push, NVE4_COMPUTE(MP_PM_A_SIGSEL(c & 3)), 1);
+ else
+BEGIN_NVC0(push, NVE4_COMPUTE(MP_PM_B_SIGSEL(c & 3)), 1);
+ PUSH_DATA (push, cfg->ctr[i].sig_sel);
+ BEGIN_NVC0(push, NVE4_COMPUTE(MP_PM_SRCSEL(c)), 1);
+ PUSH_DATA (push, cfg->ctr[i].src_sel + 0x2108421 * (c & 3));
+ BEGIN_NVC0(push, NVE4_COMPUTE(MP_PM_FUNC(c)), 1);
+ PUSH_DATA (push, (cfg->ctr[i].func << 4) | cfg->ctr[i].mode);
+ BEGIN_NVC0(push, NVE4_COMPUTE(MP_PM_SET(c)), 1);
+ PUSH_DATA (push, 0);
+   }
+   return true;
+}
+
+static boolean
 nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
 {
struct nvc0_screen *screen = nvc0->screen;
struct nouveau_pushbuf *push = nvc0->base.pushbuf;
-   const bool is_nve4 = screen->base.class_3d >= NVE4_3D_CLASS;
struct nvc0_hw_sm_query *hsq = nvc0_hw_sm_query(hq);
const struct nvc0_hw_sm_query_cfg *cfg;
unsigned i, c;
unsigned num_ab[2] = { 0, 0 };
 
+   if (screen->base.class_3d >= NVE4_3D_CLASS)
+  return nve4_hw_sm_begin_query(nvc0, hq);
+
cfg = nvc0_hw_sm_query_get_cfg(nvc0, hq);
 
/* check if we have enough free counter slots */
@@ -361,7 +436,7 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
}
 
assert(cfg->num_counters <= 4);
-   PUSH_SPACE(push, 4 * 8 * (is_nve4 ? 1 : 6) + 6);
+   PUSH_SPACE(push, 4 * 8 * 6 + 6);
 
if (!screen->pm.mp_counters_enabled) {
   screen->pm.mp_counters_enabled = true;
@@ -376,6 +451,7 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
 
for (i = 0; i < cfg->num_counters; ++i) {
   const unsigned d = cfg->ctr[i].sig_dom;
+  unsigned s;
 
   if (!screen->pm.num_hw_sm_active[d]) {
  uint32_t m = (1 << 22) | (1 << (7 + (8 * !d)));
@@ -396,31 +472,15 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
   assert(c <= (d * 4 + 3)); /* must succeed, already checked for space */
 
   /* configure and reset the counter(s) */
-  if (is_nve4) {
- if (d == 0)
-BEGIN_NVC0(push, NVE4_COMPUTE(MP_PM_A_SIGSEL(c & 3)), 1);
- else
-BEGIN_NVC0(push, NVE4_COMPUTE(MP_PM_B_SIGSEL(c & 3)), 1);
+  for (s = 0; s < cfg->ctr[i].num_src; s++) {
+ BEGIN_NVC0(push, NVC0_COMPUTE(MP_PM_SIGSEL(s)), 1);
  PUSH_DATA (push, cfg

[Mesa-dev] [PATCH 06/16] nvc0: allow to use 8 MP counters on Fermi

2015-10-16 Thread Samuel Pitoiset
On Fermi, we have one domain of 8 MP counters while we have
two domains of 4 MP counters on Kepler.
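
The difference in slot accounting can be sketched as follows. On Kepler the
check is made per domain against 4 slots; with a single domain, Fermi only
needs one check against 8. The names below are hypothetical and the code is a
simplified model, not the driver's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FERMI_MP_COUNTERS 8 /* one domain of 8 counters */

/* True if `needed` more counters fit next to `active` ones. */
static bool
fermi_has_free_slots(unsigned active, unsigned needed)
{
   return active + needed <= FERMI_MP_COUNTERS;
}

/* Claim the first free slot and return its index, or -1 if none is left. */
static int
fermi_claim_slot(bool used[FERMI_MP_COUNTERS])
{
   assert(used != NULL);
   for (int c = 0; c < FERMI_MP_COUNTERS; ++c) {
      if (!used[c]) {
         used[c] = true;
         return c;
      }
   }
   return -1;
}
```

A query needing 6 counters therefore fits on an idle Fermi MP, whereas the old
limit of 4 would have rejected it.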

Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 30 +-
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.h|  2 +-
 2 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index b810d25..2060662 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -144,7 +144,7 @@ struct nvc0_hw_sm_counter_cfg
 
 struct nvc0_hw_sm_query_cfg
 {
-   struct nvc0_hw_sm_counter_cfg ctr[4];
+   struct nvc0_hw_sm_counter_cfg ctr[8];
uint8_t num_counters;
uint8_t op;
uint8_t norm[2]; /* normalization num,denom */
@@ -418,7 +418,6 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
struct nvc0_hw_sm_query *hsq = nvc0_hw_sm_query(hq);
const struct nvc0_hw_sm_query_cfg *cfg;
unsigned i, c;
-   unsigned num_ab[2] = { 0, 0 };
 
if (screen->base.class_3d >= NVE4_3D_CLASS)
   return nve4_hw_sm_begin_query(nvc0, hq);
@@ -426,17 +425,13 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
cfg = nvc0_hw_sm_query_get_cfg(nvc0, hq);
 
/* check if we have enough free counter slots */
-   for (i = 0; i < cfg->num_counters; ++i)
-  num_ab[cfg->ctr[i].sig_dom]++;
-
-   if (screen->pm.num_hw_sm_active[0] + num_ab[0] > 4 ||
-   screen->pm.num_hw_sm_active[1] + num_ab[1] > 4) {
+   if (screen->pm.num_hw_sm_active[0] + cfg->num_counters > 8) {
   NOUVEAU_ERR("Not enough free MP counter slots !\n");
   return false;
}
 
-   assert(cfg->num_counters <= 4);
-   PUSH_SPACE(push, 4 * 8 * 6 + 4);
+   assert(cfg->num_counters <= 8);
+   PUSH_SPACE(push, 4 * 8 * 6 + 2);
 
/* set sequence field to 0 (used to check if result is available) */
for (i = 0; i < screen->mp_count; ++i) {
@@ -446,23 +441,21 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
hq->sequence++;
 
for (i = 0; i < cfg->num_counters; ++i) {
-  const unsigned d = cfg->ctr[i].sig_dom;
   unsigned s;
 
-  if (!screen->pm.num_hw_sm_active[d]) {
+  if (!screen->pm.num_hw_sm_active[0]) {
  BEGIN_NVC0(push, SUBC_SW(0x0600), 1);
  PUSH_DATA (push, 0x8000);
   }
-  screen->pm.num_hw_sm_active[d]++;
+  screen->pm.num_hw_sm_active[0]++;
 
-  for (c = d * 4; c < (d * 4 + 4); ++c) {
+  for (c = 0; c < 8; ++c) {
  if (!screen->pm.mp_counter[c]) {
 hsq->ctr[i] = c;
 screen->pm.mp_counter[c] = hsq;
 break;
  }
   }
-  assert(c <= (d * 4 + 3)); /* must succeed, already checked for space */
 
   /* configure and reset the counter(s) */
   for (s = 0; s < cfg->ctr[i].num_src; s++) {
@@ -522,7 +515,8 @@ nvc0_hw_sm_end_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
/* release counters for this query */
for (c = 0; c < 8; ++c) {
   if (screen->pm.mp_counter[c] == hsq) {
- screen->pm.num_hw_sm_active[c / 4]--;
+ uint8_t d = is_nve4 ? c / 4 : 0; /* only one domain for NVC0:NVE4 */
+ screen->pm.num_hw_sm_active[d]--;
  screen->pm.mp_counter[c] = NULL;
   }
}
@@ -568,7 +562,7 @@ nvc0_hw_sm_end_query(struct nvc0_context *nvc0, struct 
nvc0_hw_query *hq)
 }
 
 static inline bool
-nvc0_hw_sm_query_read_data(uint32_t count[32][4],
+nvc0_hw_sm_query_read_data(uint32_t count[32][8],
struct nvc0_context *nvc0, bool wait,
struct nvc0_hw_query *hq,
const struct nvc0_hw_sm_query_cfg *cfg,
@@ -594,7 +588,7 @@ nvc0_hw_sm_query_read_data(uint32_t count[32][4],
 }
 
 static inline bool
-nve4_hw_sm_query_read_data(uint32_t count[32][4],
+nve4_hw_sm_query_read_data(uint32_t count[32][8],
struct nvc0_context *nvc0, bool wait,
struct nvc0_hw_query *hq,
const struct nvc0_hw_sm_query_cfg *cfg,
@@ -640,7 +634,7 @@ static boolean
 nvc0_hw_sm_get_query_result(struct nvc0_context *nvc0, struct nvc0_hw_query 
*hq,
 boolean wait, union pipe_query_result *result)
 {
-   uint32_t count[32][4];
+   uint32_t count[32][8];
uint64_t value = 0;
unsigned mp_count = MIN2(nvc0->screen->mp_count_compute, 32);
unsigned p, c;
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.h 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.h
index bab6f34..0ad8a91 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.h
@@ -5,7 +5,7 @@
 
 struct nvc0_hw_sm_query {
struct nvc0_hw_query base;
-   int8_t ctr[4];
+   uint8_t ctr[8];
 };
 
 static inline struct nvc0_hw_sm_query *
-- 
2.6.1


Re: [Mesa-dev] [PATCH 1/2] Used correct parameter to derive the value of the "h" variable in vlVaCreateImage

2015-10-16 Thread Emil Velikov
On 16 October 2015 at 15:53, Christian König  wrote:
> From: Indrajit Das 
>
> Reviewed-by: Christian König 
Nicely spotted!

In the future, please use the correct prefix for the summary ("st/va:"
here; git log will show you the convention in other places) and add the
stable tag on bugfixes. There is no need to resend; things can be
squashed at commit time by Christian or me.

Cc: "11.0" 
Reviewed-by: Emil Velikov 

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] Added support for NV12 to IYUV conversion in vlVaGetImage

2015-10-16 Thread Emil Velikov
Hi guys,

Out of curiosity, do you know offhand of any users of IYUV/I420?
I was under the impression that everyone is doing nv12/nv21/yv12 in
99% of the cases.

On 16 October 2015 at 15:53, Christian König  wrote:
> From: Indrajit Das 
>
> Reviewed-by: Christian König 
> ---
>  src/gallium/state_trackers/va/image.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/state_trackers/va/image.c 
> b/src/gallium/state_trackers/va/image.c
> index 3b36430..b37a971 100644
> --- a/src/gallium/state_trackers/va/image.c
> +++ b/src/gallium/state_trackers/va/image.c
> @@ -240,9 +240,11 @@ vlVaGetImage(VADriverContextP ctx, VASurfaceID surface, 
> int x, int y,
>return VA_STATUS_ERROR_OPERATION_FAILED;
>
> if (format != surf->buffer->buffer_format) {
> -  /* support NV12 to YV12 conversion now only */
> -  if (format == PIPE_FORMAT_YV12 &&
> -  surf->buffer->buffer_format == PIPE_FORMAT_NV12)
> +  /* support NV12 to YV12 and IYUV conversion now only */
> +  if ((format == PIPE_FORMAT_YV12 &&
> +  surf->buffer->buffer_format == PIPE_FORMAT_NV12) ||
> +  (format == PIPE_FORMAT_IYUV &&
> +  surf->buffer->buffer_format == PIPE_FORMAT_NV12))
I was just about to mention that we're not flipping the plane
pointers (and their respective info), only to see that we're checking
against VA_FOURCC a few lines below :-)

Perhaps we should add a couple of trivial helpers, u_copy_nv12_to_iyuv
and u_copy_iyuv_to_nv12, which wrap around their yv12 brethren? That
way one can use them directly, be that here, in vdpau and/or
elsewhere.
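
A rough sketch of the wrapper idea, to show how little code it would take. Everything below is hypothetical: the sketch_* functions stand in for the real yv12 helpers, whose signatures are not reproduced here, and only the chroma plane is handled. The assumption is the usual layout: NV12 interleaves chroma as U,V pairs, YV12 stores the V plane before the U plane, and IYUV stores U before V.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a YV12-style helper: split an interleaved
 * NV12 chroma plane (U0 V0 U1 V1 ...) into two planar destinations.
 * planes[0] receives V and planes[1] receives U, i.e. YV12 order. */
static void
sketch_nv12_chroma_to_yv12(uint8_t *const planes[2], const size_t pitches[2],
                           const uint8_t *src, size_t src_pitch,
                           unsigned chroma_width, unsigned chroma_height)
{
   assert(planes[0] && planes[1] && src);

   for (unsigned y = 0; y < chroma_height; y++) {
      const uint8_t *s = src + (size_t)y * src_pitch;
      uint8_t *v = planes[0] + (size_t)y * pitches[0];
      uint8_t *u = planes[1] + (size_t)y * pitches[1];

      for (unsigned x = 0; x < chroma_width; x++) {
         u[x] = s[2 * x];      /* even bytes are U */
         v[x] = s[2 * x + 1];  /* odd bytes are V */
      }
   }
}

/* Hypothetical IYUV wrapper: the same copy with the two chroma
 * destinations swapped, so planes[0] receives U and planes[1] V. */
static void
sketch_nv12_chroma_to_iyuv(uint8_t *const planes[2], const size_t pitches[2],
                           const uint8_t *src, size_t src_pitch,
                           unsigned chroma_width, unsigned chroma_height)
{
   uint8_t *const swapped[2] = { planes[1], planes[0] };
   const size_t swapped_pitches[2] = { pitches[1], pitches[0] };

   sketch_nv12_chroma_to_yv12(swapped, swapped_pitches, src, src_pitch,
                              chroma_width, chroma_height);
}
```

The point of the wrapper is that callers no longer need to remember to swap the plane pointers themselves when the target is IYUV.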

Cheers,
Emil


Re: [Mesa-dev] [PATCH 15/17] i965/fs: Move some of the prog_data setup into brw_wm_emit

2015-10-16 Thread Jason Ekstrand
On Fri, Oct 16, 2015 at 12:35 AM, Pohjolainen, Topi
 wrote:
> On Fri, Oct 09, 2015 at 05:50:22AM -0700, Jason Ekstrand wrote:
>> On Fri, Oct 9, 2015 at 12:10 AM, Pohjolainen, Topi
>>  wrote:
>> > On Thu, Oct 08, 2015 at 05:22:47PM -0700, Jason Ekstrand wrote:
>> >> This commit moves the common/modern stuff.  Some legacy stuff such as
>> >> setting use_alt_mode was left because it needs to know whether or not 
>> >> we're
>> >> an ARB program.
>> >> ---
>> >>  src/mesa/drivers/dri/i965/brw_fs.cpp | 98 
>> >> 
>> >>  src/mesa/drivers/dri/i965/brw_wm.c   | 98 
>> >> 
>> >>  2 files changed, 98 insertions(+), 98 deletions(-)
>> >>
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
>> >> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> >> index 146f4b4..0e39b50 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> >> @@ -5114,6 +5114,90 @@ fs_visitor::run_cs()
>> >> return !failed;
>> >>  }
>> >>
>> >> +/**
>> >> + * Return a bitfield where bit n is set if barycentric interpolation 
>> >> mode n
>> >> + * (see enum brw_wm_barycentric_interp_mode) is needed by the fragment 
>> >> shader.
>> >> + */
>> >> +static unsigned
>> >> +brw_compute_barycentric_interp_modes(const struct brw_device_info 
>> >> *devinfo,
>> >> + bool shade_model_flat,
>> >> + bool persample_shading,
>> >> + const nir_shader *shader)
>> >> +{
>> >> +   unsigned barycentric_interp_modes = 0;
>> >> +
>> >> +   nir_foreach_variable(var, &shader->inputs) {
>> >> +  enum glsl_interp_qualifier interp_qualifier =
>> >> + (enum glsl_interp_qualifier)var->data.interpolation;
>> >> +  bool is_centroid = var->data.centroid && !persample_shading;
>> >> +  bool is_sample = var->data.sample || persample_shading;
>> >> +  bool is_gl_Color = (var->data.location == VARYING_SLOT_COL0) ||
>> >> + (var->data.location == VARYING_SLOT_COL1);
>> >> +
>> >> +  /* Ignore WPOS and FACE, because they don't require interpolation. 
>> >> */
>> >> +  if (var->data.location == VARYING_SLOT_POS ||
>> >> +  var->data.location == VARYING_SLOT_FACE)
>> >> + continue;
>> >> +
>> >> +  /* Determine the set (or sets) of barycentric coordinates needed to
>> >> +   * interpolate this variable.  Note that when
>> >> +   * brw->needs_unlit_centroid_workaround is set, centroid 
>> >> interpolation
>> >> +   * uses PIXEL interpolation for unlit pixels and CENTROID 
>> >> interpolation
>> >> +   * for lit pixels, so we need both sets of barycentric coordinates.
>> >> +   */
>> >> +  if (interp_qualifier == INTERP_QUALIFIER_NOPERSPECTIVE) {
>> >> + if (is_centroid) {
>> >> +barycentric_interp_modes |=
>> >> +   1 << BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC;
>> >> + } else if (is_sample) {
>> >> +barycentric_interp_modes |=
>> >> +   1 << BRW_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC;
>> >> + }
>> >> + if ((!is_centroid && !is_sample) ||
>> >> + devinfo->needs_unlit_centroid_workaround) {
>> >> +barycentric_interp_modes |=
>> >> +   1 << BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC;
>> >> + }
>> >> +  } else if (interp_qualifier == INTERP_QUALIFIER_SMOOTH ||
>> >> + (!(shade_model_flat && is_gl_Color) &&
>> >> +  interp_qualifier == INTERP_QUALIFIER_NONE)) {
>> >> + if (is_centroid) {
>> >> +barycentric_interp_modes |=
>> >> +   1 << BRW_WM_PERSPECTIVE_CENTROID_BARYCENTRIC;
>> >> + } else if (is_sample) {
>> >> +barycentric_interp_modes |=
>> >> +   1 << BRW_WM_PERSPECTIVE_SAMPLE_BARYCENTRIC;
>> >> + }
>> >> + if ((!is_centroid && !is_sample) ||
>> >> + devinfo->needs_unlit_centroid_workaround) {
>> >> +barycentric_interp_modes |=
>> >> +   1 << BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC;
>> >> + }
>> >> +  }
>> >> +   }
>> >> +
>> >> +   return barycentric_interp_modes;
>> >> +}
>> >> +
>> >> +static uint8_t
>> >> +computed_depth_mode(const nir_shader *shader)
>> >> +{
>> >> +   if (shader->info.outputs_written & BITFIELD64_BIT(FRAG_RESULT_DEPTH)) 
>> >> {
>> >> +  switch (shader->info.fs.depth_layout) {
>> >> +  case FRAG_DEPTH_LAYOUT_NONE:
>> >> +  case FRAG_DEPTH_LAYOUT_ANY:
>> >> + return BRW_PSCDEPTH_ON;
>> >> +  case FRAG_DEPTH_LAYOUT_GREATER:
>> >> + return BRW_PSCDEPTH_ON_GE;
>> >> +  case FRAG_DEPTH_LAYOUT_LESS:
>> >> + return BRW_PSCDEPTH_ON_LE;
>> >> +  case FRAG_DEPTH_LAYOUT_UNCHANGED:
>> >> + return BRW_PSCDEPTH_OFF;
>> >> +  }
>> >> +   }
>> >> +   return BRW_PSCDEPTH_OFF;
>> >> +}
>> >> +
>> >>  const unsigned *
>
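
The per-variable decision in the quoted brw_compute_barycentric_interp_modes() can be condensed into the sketch below. It is only an illustration under stated assumptions: the enum values and the sketch_bary_modes name are hypothetical, and the perspective/non-perspective split of the real enum is collapsed into a single set of three modes.

```c
#include <stdbool.h>

/* Hypothetical, reduced mode list; the real enum distinguishes
 * perspective and non-perspective variants of each entry. */
enum sketch_bary_mode {
   SKETCH_BARY_PIXEL = 0,
   SKETCH_BARY_CENTROID = 1,
   SKETCH_BARY_SAMPLE = 2
};

/* Return a bitfield with bit n set if mode n is needed for one input.
 * Centroid takes priority over sample; pixel is needed either when
 * neither applies or when the unlit-centroid workaround is active. */
static unsigned
sketch_bary_modes(bool is_centroid, bool is_sample,
                  bool unlit_centroid_workaround)
{
   unsigned modes = 0;

   if (is_centroid)
      modes |= 1u << SKETCH_BARY_CENTROID;
   else if (is_sample)
      modes |= 1u << SKETCH_BARY_SAMPLE;

   if ((!is_centroid && !is_sample) || unlit_centroid_workaround)
      modes |= 1u << SKETCH_BARY_PIXEL;

   return modes;
}
```

The caller would OR the result over all shader inputs, which is what the loop in the patch does.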

Re: [Mesa-dev] [PATCH] i965/vs: Move URB entry_size and read_length calculations to compile_vs

2015-10-16 Thread Jason Ekstrand
Topi,
Seeing as you're on a roll reviewing my move-the-code patches, mind one more?
--Jason

On Thu, Oct 15, 2015 at 12:06 PM, Jason Ekstrand  wrote:
> This patch applies on top of my previous series to shuffle a bunch of
> the compiler code around.
>
> On Thu, Oct 15, 2015 at 12:05 PM, Jason Ekstrand  wrote:
>> ---
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 34 
>> ++
>>  src/mesa/drivers/dri/i965/brw_vs.c | 34 
>> --
>>  2 files changed, 34 insertions(+), 34 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> index ca4d23a..00e2d63 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> @@ -1933,6 +1933,40 @@ brw_compile_vs(const struct brw_compiler *compiler, 
>> void *log_data,
>>  {
>> const unsigned *assembly = NULL;
>>
>> +   unsigned nr_attributes = _mesa_bitcount_64(prog_data->inputs_read);
>> +
>> +   /* gl_VertexID and gl_InstanceID are system values, but arrive via an
>> +* incoming vertex attribute.  So, add an extra slot.
>> +*/
>> +   if (shader->info.system_values_read &
>> +   (BITFIELD64_BIT(SYSTEM_VALUE_VERTEX_ID_ZERO_BASE) |
>> +BITFIELD64_BIT(SYSTEM_VALUE_INSTANCE_ID))) {
>> +  nr_attributes++;
>> +   }
>> +
>> +   /* The 3DSTATE_VS documentation lists the lower bound on "Vertex URB 
>> Entry
>> +* Read Length" as 1 in vec4 mode, and 0 in SIMD8 mode.  Empirically, in
>> +* vec4 mode, the hardware appears to wedge unless we read something.
>> +*/
>> +   if (compiler->scalar_vs)
>> +  prog_data->base.urb_read_length = DIV_ROUND_UP(nr_attributes, 2);
>> +   else
>> +  prog_data->base.urb_read_length = DIV_ROUND_UP(MAX2(nr_attributes, 
>> 1), 2);
>> +
>> +   prog_data->nr_attributes = nr_attributes;
>> +
>> +   /* Since vertex shaders reuse the same VUE entry for inputs and outputs
>> +* (overwriting the original contents), we need to make sure the size is
>> +* the larger of the two.
>> +*/
>> +   const unsigned vue_entries =
>> +  MAX2(nr_attributes, (unsigned)prog_data->base.vue_map.num_slots);
>> +
>> +   if (compiler->devinfo->gen == 6)
>> +  prog_data->base.urb_entry_size = DIV_ROUND_UP(vue_entries, 8);
>> +   else
>> +  prog_data->base.urb_entry_size = DIV_ROUND_UP(vue_entries, 4);
>> +
>> if (compiler->scalar_vs) {
>>prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
>> b/src/mesa/drivers/dri/i965/brw_vs.c
>> index 6c161d0..c9afc63 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vs.c
>> +++ b/src/mesa/drivers/dri/i965/brw_vs.c
>> @@ -160,40 +160,6 @@ brw_codegen_vs_prog(struct brw_context *brw,
>> &prog_data.base.vue_map, outputs_written,
>> prog ? prog->SeparateShader : false);
>>
>> -   unsigned nr_attributes = _mesa_bitcount_64(prog_data.inputs_read);
>> -
>> -   /* gl_VertexID and gl_InstanceID are system values, but arrive via an
>> -* incoming vertex attribute.  So, add an extra slot.
>> -*/
>> -   if (vp->program.Base.SystemValuesRead &
>> -   (BITFIELD64_BIT(SYSTEM_VALUE_VERTEX_ID_ZERO_BASE) |
>> -BITFIELD64_BIT(SYSTEM_VALUE_INSTANCE_ID))) {
>> -  nr_attributes++;
>> -   }
>> -
>> -   /* The 3DSTATE_VS documentation lists the lower bound on "Vertex URB 
>> Entry
>> -* Read Length" as 1 in vec4 mode, and 0 in SIMD8 mode.  Empirically, in
>> -* vec4 mode, the hardware appears to wedge unless we read something.
>> -*/
>> -   if (brw->intelScreen->compiler->scalar_vs)
>> -  prog_data.base.urb_read_length = DIV_ROUND_UP(nr_attributes, 2);
>> -   else
>> -  prog_data.base.urb_read_length = DIV_ROUND_UP(MAX2(nr_attributes, 1), 
>> 2);
>> -
>> -   prog_data.nr_attributes = nr_attributes;
>> -
>> -   /* Since vertex shaders reuse the same VUE entry for inputs and outputs
>> -* (overwriting the original contents), we need to make sure the size is
>> -* the larger of the two.
>> -*/
>> -   const unsigned vue_entries =
>> -  MAX2(nr_attributes, prog_data.base.vue_map.num_slots);
>> -
>> -   if (brw->gen == 6)
>> -  prog_data.base.urb_entry_size = DIV_ROUND_UP(vue_entries, 8);
>> -   else
>> -  prog_data.base.urb_entry_size = DIV_ROUND_UP(vue_entries, 4);
>> -
>> if (0) {
>>_mesa_fprint_program_opt(stderr, &vp->program.Base, PROG_PRINT_DEBUG,
>>true);
>> --
>> 2.5.0.400.gff86faf
>>
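
The arithmetic being moved by this patch can be sketched in isolation as follows. This is a self-contained illustration, not the driver code: the struct, macro, and function names are hypothetical, and the hardware generation is passed as a plain integer.

```c
#include <stdbool.h>

#define SKETCH_DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))
#define SKETCH_MAX2(a, b) ((a) > (b) ? (a) : (b))

struct sketch_urb_sizes {
   unsigned read_length; /* attributes are read two per unit */
   unsigned entry_size;  /* VUE slots per unit depends on the gen */
};

static struct sketch_urb_sizes
sketch_vs_urb_sizes(unsigned nr_attributes, unsigned vue_slots,
                    bool scalar_vs, int gen)
{
   struct sketch_urb_sizes s;

   /* vec4 mode has to read at least one attribute; SIMD8 may read none. */
   unsigned read_attrs =
      scalar_vs ? nr_attributes : SKETCH_MAX2(nr_attributes, 1u);
   s.read_length = SKETCH_DIV_ROUND_UP(read_attrs, 2);

   /* Inputs and outputs share one VUE entry, so size it for the larger. */
   unsigned vue_entries = SKETCH_MAX2(nr_attributes, vue_slots);
   s.entry_size = SKETCH_DIV_ROUND_UP(vue_entries, gen == 6 ? 8 : 4);

   return s;
}
```

For example, a vec4 shader with no attributes and five VUE slots on gen 7 ends up with a read length of 1 and an entry size of 2, while the same shader in SIMD8 mode gets a read length of 0.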


Re: [Mesa-dev] [PATCH] st/mesa: convert programs to gl_program *

2015-10-16 Thread Brian Paul

On 10/16/2015 01:16 AM, Boyan Ding wrote:

Fixes following warnings:
state_tracker/st_cb_program.c: In function ‘st_new_program’:
state_tracker/st_cb_program.c:108:36: warning: passing argument 1 of
‘_mesa_init_gl_program’ from incompatible pointer type
[-Wincompatible-pointer-types]
return _mesa_init_gl_program(&prog->Base, target, id);
 ^
In file included from state_tracker/st_cb_program.c:38:0:
./program/program.h:67:1: note: expected ‘struct gl_program *’ but
argument is of type ‘struct gl_vertex_program *’
  _mesa_init_gl_program(struct gl_program *prog, GLenum target, GLuint
id);
  ^
state_tracker/st_cb_program.c:112:36: warning: passing argument 1 of
‘_mesa_init_gl_program’ from incompatible pointer type
[-Wincompatible-pointer-types]
return _mesa_init_gl_program(&prog->Base, target, id);
 ^
In file included from state_tracker/st_cb_program.c:38:0:
./program/program.h:67:1: note: expected ‘struct gl_program *’ but
argument is of type ‘struct gl_fragment_program *’
  _mesa_init_gl_program(struct gl_program *prog, GLenum target, GLuint
id);
  ^
state_tracker/st_cb_program.c:116:36: warning: passing argument 1 of
‘_mesa_init_gl_program’ from incompatible pointer type
[-Wincompatible-pointer-types]
return _mesa_init_gl_program(&prog->Base, target, id);
 ^
In file included from state_tracker/st_cb_program.c:38:0:
./program/program.h:67:1: note: expected ‘struct gl_program *’ but
argument is of type ‘struct gl_geometry_program *’
  _mesa_init_gl_program(struct gl_program *prog, GLenum target, GLuint
id);
  ^
state_tracker/st_cb_program.c:120:36: warning: passing argument 1 of
‘_mesa_init_gl_program’ from incompatible pointer type
[-Wincompatible-pointer-types]
return _mesa_init_gl_program(&prog->Base, target, id);
 ^
In file included from state_tracker/st_cb_program.c:38:0:
./program/program.h:67:1: note: expected ‘struct gl_program *’ but
argument is of type ‘struct gl_tess_ctrl_program *’
  _mesa_init_gl_program(struct gl_program *prog, GLenum target, GLuint
id);
  ^
state_tracker/st_cb_program.c:124:36: warning: passing argument 1 of
‘_mesa_init_gl_program’ from incompatible pointer type
[-Wincompatible-pointer-types]
return _mesa_init_gl_program(&prog->Base, target, id);
 ^
In file included from state_tracker/st_cb_program.c:38:0:
./program/program.h:67:1: note: expected ‘struct gl_program *’ but
argument is of type ‘struct gl_tess_eval_program *’
  _mesa_init_gl_program(struct gl_program *prog, GLenum target, GLuint
id);
  ^
Fixes: bcb56c2c6 (program: convert _mesa_init_gl_program to take struct
gl_program *)

Signed-off-by: Boyan Ding 
---
  src/mesa/state_tracker/st_cb_program.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)



I already fixed this but forgot to push the patch yesterday.  I'll do so 
now...


-Brian




Re: [Mesa-dev] Mesa (master): mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()

2015-10-16 Thread Brian Paul

On 10/16/2015 08:17 AM, Brian Paul wrote:

On 10/16/2015 12:36 AM, Michel Dänzer wrote:


Hi Brian,


On 15.10.2015 22:23, Brian Paul wrote:

Module: Mesa
Branch: master
Commit: 0de5e0f3fb0f3671a3ecec6ab4473f9131ecd0ae
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=0de5e0f3fb0f3671a3ecec6ab4473f9131ecd0ae


Author: Brian Paul 
Date:   Wed Oct 14 09:08:50 2015 -0600

mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()

Changing the matrix mode alone has no effect on rendering and does
not need to trigger a flush or state validation.


Famous last words. :)

This change broke the piglit tests

spec@arb_vertex_program@clip-plane-transformation arb
spec@glsl-1.10@execution@clipping@clip-plane-transformation fixed

for me with radeonsi and llvmpipe.


Hmm, I don't know how I missed that.  I did a full piglit run with that
whole series of patches and didn't notice this.  I'll look into it.
Thanks for catching it.


I think I'll just have to revert this.  It looks like the state 
tracker's update_clip() isn't getting called at some point and the 
gallium driver doesn't get the updated clip planes.  I suspect my change 
to matrix.c actually exposed a bug elsewhere, but I don't have time to 
hunt it down right now.


-Brian



[Mesa-dev] [PATCH 2/2] Added support for NV12 to IYUV conversion in vlVaGetImage

2015-10-16 Thread Christian König
From: Indrajit Das 

Reviewed-by: Christian König 
---
 src/gallium/state_trackers/va/image.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/va/image.c 
b/src/gallium/state_trackers/va/image.c
index 3b36430..b37a971 100644
--- a/src/gallium/state_trackers/va/image.c
+++ b/src/gallium/state_trackers/va/image.c
@@ -240,9 +240,11 @@ vlVaGetImage(VADriverContextP ctx, VASurfaceID surface, 
int x, int y,
   return VA_STATUS_ERROR_OPERATION_FAILED;
 
if (format != surf->buffer->buffer_format) {
-  /* support NV12 to YV12 conversion now only */
-  if (format == PIPE_FORMAT_YV12 &&
-  surf->buffer->buffer_format == PIPE_FORMAT_NV12)
+  /* support NV12 to YV12 and IYUV conversion now only */
+  if ((format == PIPE_FORMAT_YV12 &&
+  surf->buffer->buffer_format == PIPE_FORMAT_NV12) ||
+  (format == PIPE_FORMAT_IYUV &&
+  surf->buffer->buffer_format == PIPE_FORMAT_NV12))
  convert = true;
   else
  return VA_STATUS_ERROR_OPERATION_FAILED;
-- 
1.9.1



[Mesa-dev] [PATCH 1/2] Used correct parameter to derive the value of the "h" variable in vlVaCreateImage

2015-10-16 Thread Christian König
From: Indrajit Das 

Reviewed-by: Christian König 
---
 src/gallium/state_trackers/va/image.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/va/image.c 
b/src/gallium/state_trackers/va/image.c
index 022240d..3b36430 100644
--- a/src/gallium/state_trackers/va/image.c
+++ b/src/gallium/state_trackers/va/image.c
@@ -116,7 +116,7 @@ vlVaCreateImage(VADriverContextP ctx, VAImageFormat 
*format, int width, int heig
img->width = width;
img->height = height;
w = align(width, 2);
-   h = align(width, 2);
+   h = align(height, 2);
 
switch (format->fourcc) {
case VA_FOURCC('N','V','1','2'):
-- 
1.9.1
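
To see why the one-word change matters, here is a small sketch of a power-of-two round-up. sketch_align is a hypothetical stand-in written for this example, assuming the alignment is a power of two; it is not copied from Mesa.

```c
/* Hypothetical helper: round value up to a multiple of alignment.
 * Assumption: alignment is a power of two. */
static unsigned
sketch_align(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}
```

For a 1920x1081 image the old code computed h from the width and got 1920, whereas deriving it from the height gives 1082, so any non-square image ended up with the wrong aligned height.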



Re: [Mesa-dev] Mesa (master): mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()

2015-10-16 Thread Brian Paul

On 10/16/2015 12:36 AM, Michel Dänzer wrote:


Hi Brian,


On 15.10.2015 22:23, Brian Paul wrote:

Module: Mesa
Branch: master
Commit: 0de5e0f3fb0f3671a3ecec6ab4473f9131ecd0ae
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=0de5e0f3fb0f3671a3ecec6ab4473f9131ecd0ae

Author: Brian Paul 
Date:   Wed Oct 14 09:08:50 2015 -0600

mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()

Changing the matrix mode alone has no effect on rendering and does
not need to trigger a flush or state validation.


Famous last words. :)

This change broke the piglit tests

spec@arb_vertex_program@clip-plane-transformation arb
spec@glsl-1.10@execution@clipping@clip-plane-transformation fixed

for me with radeonsi and llvmpipe.


Hmm, I don't know how I missed that.  I did a full piglit run with that 
whole series of patches and didn't notice this.  I'll look into it. 
Thanks for catching it.


-Brian



Re: [Mesa-dev] [RFC 1/2] gallium: add renderonly driver

2015-10-16 Thread Thierry Reding
Hi Christian,

First off, thanks for reviving this effort. It's been one of the things
that I've had nagging at me for much too long and I think it needs to be
solved. So I'm hopeful that the more people we get looking at this the
more likely it will be to come up with a solution that works well for
everyone.

That said, I don't agree with the approach you've chosen here. I'll try
to clarify why below.

On Sun, Oct 11, 2015 at 05:09:21PM +0200, Christian Gmeiner wrote:
> This commit adds a generic renderonly driver library, which fulfills
> the requirements for tegra and etnaviv. As a result it is possible to
> run unmodified egl software directly (without any compositor) on
> supported devices.

Technically this isn't a library but rather a midlayer. There's a subtle
difference, but the implications are what concerns me.

Back when I wrote the original driver for Tegra/Nouveau I also looked
into possibilities to make this more generic. Since I know how bad mid-
layers can be (from kernel experience) I shied away from something like
this early on. What I tried to do next was abstract away enough to make
this usable by more than just a single driver. Unfortunately the end
result was that not much could be reused, so drivers ended up still
having to implement all of the pipe_* objects, only to call generic
functions. Most of the code needed in the various callbacks ended up not
being much more than just a single line, so the gains from a helper
library weren't very big.

Another reason why I think this level of abstraction doesn't gain us
much is that we already have a good level of abstraction, which is
Gallium. I realize that implementing only the skeleton for a full
Gallium driver is rather complicated, but that's due to the fact that
graphics drivers are complex beasts.

That said, I think for some areas it might be beneficial to have helpers
to reduce the amount of duplication. However I think at this point in
time we haven't had enough real-world exposure for this kind of driver
to know what the requirements are. For that reason I think it is
premature to use a generic midlayer such as this. Yes, I know that the
alternative is roughly 2000 lines of code per driver, but on one hand
that's nothing compared to the amount of code required by a proper GPU
driver and on the other hand this will (ideally) be temporary until we
get a better picture of where things need to go. At which point it may
become more obvious how we can solve the boilerplate problem while at
the same time avoiding the restrictions imposed by the midlayer.

> In every use case we import a dumb buffer from scanout gpu into
> the renderonly gpu.
> 
> If the scanout hardware does support the used tiling format from the
> renderonly gpu, a driver can define a function which is used to 'setup'
> the needed tiling on that imported buffer. This function gets called
> during rendertarget resource creation.
> 
> If the scanout hardware does not support the used tiling format we need
> to create an extra rendertarget resource for the renderonly gpu.
> During XXX we blit the renderonly rendertarget onto the imported dumb
> buffer.
> 
> We assume that the renderonly driver provides a blit function that is
> capable of resolving the tiled buffer into an untiled one.

I understand that there's a desire to reduce the amount of boilerplate,
but I think this approach of using a midlayer has several flaws. One of
the typical pitfalls with a midlayer such as this is that it has the
potential to grow into an unmaintainable mess. Granted, this currently
doesn't look all that bad, but that's primarily because it supports only
two types of devices. I suspect that the more devices we add, the more
hooks and quirks we'll need. Every combination of GPU and display is
likely going to have their own specialties that need to be handled and
which are beyond simple things like the tiling format.

We also know that there are issues with the current approach (EGL
clients in Weston don't properly display). It's unknown what the reason
for this is and it may require largish changes to the architecture to
fix it.

For all of the above reasons I think it'd be better to live with a
little boilerplate for now and refactor things as they become obvious
candidates for refactoring.

[...]
> diff --git a/src/gallium/drivers/renderonly/renderonly_screen.c 
> b/src/gallium/drivers/renderonly/renderonly_screen.c
[...]
> +static const char *
> +renderonly_get_vendor(struct pipe_screen *pscreen)
> +{
> + return "renderonly";
> +}

I don't think this is going to do us much good. Applications may want to
know more precisely what kind of device they're running on and change
behaviour accordingly.

> +static void renderonly_screen_destroy(struct pipe_screen *pscreen)
> +{
> + struct renderonly_screen *screen = to_renderonly_screen(pscreen);
> +
> + screen->gpu->destroy(screen->gpu);
> + free(pscreen);
> +}
> +
> +static int
> +renderonly_screen_get_param(struct pipe_screen *pscree

Re: [Mesa-dev] [RFC 1/2] gallium: add renderonly driver

2015-10-16 Thread Thierry Reding
On Fri, Oct 16, 2015 at 12:09:52AM +0100, Emil Velikov wrote:
> Hi Christian,
> 
> I'm glad to see Thierry's work revived. Hopefully this will soon be
> the basis of many more drivers.
> 
> On 11 October 2015 at 16:09, Christian Gmeiner
>  wrote:
> > This commit adds a generic renderonly driver library, which fulfills
> > the requirements for tegra and etnaviv. As a result it is possible to
> > run unmodified egl software directly (without any compositor) on
> > supported devices.
> >
> > In every use case we import a dumb buffer from scanout gpu into
> > the renderonly gpu.
> >
> > If the scanout hardware does support the used tiling format from the
> > renderonly gpu, a driver can define a function which is used to 'setup'
> > the needed tiling on that imported buffer. This function gets called
> > during rendertarget resource creation.
> >
> > If the scanout hardware does not support the used tiling format we need
> > to create an extra rendertarget resource for the renderonly gpu.
> > During XXX we blit the renderonly rendertarget onto the imported dumb
> > buffer.
> >
> I'd assume you meant to add something over the XXX here :-P
> 
> But seriously some people might not be too happy with the blit onto
> dumb buffer. Personally I'm ok, esp. since we don't have anything better
> atm.
> 
> That aside, there are a few minor nitpicks below. With those sorted I
> believe the patch is good to land.

I'd prefer if at least the Tegra part wasn't merged just yet. We know
that it only works in a restricted set of use-cases. So until we figure
it out all the way I don't think it makes sense to have this code in
Mesa yet. I suppose the renderonly part could go in, though I do have
some reservations about the architecture. I'll try to comment on that
in a separate email because this subthread lacks some of the necessary
context.

Thierry




Re: [Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block

2015-10-16 Thread Iago Toral
On Wed, 2015-10-14 at 21:40 +0300, Francisco Jerez wrote:
> Jordan Justen  writes:
> 
> > On 2015-10-13 22:49:08, Iago Toral wrote:
> >> On Tue, 2015-10-13 at 09:44 -0700, Jordan Justen wrote:
> >> > On 2015-10-13 05:17:37, Francisco Jerez wrote:
> >> > > Iago Toral Quiroga  writes:
> >> > > 
> >> > > > This fixes the following test:
> >> > > >
> >> > > > [require]
> >> > > > GL >= 3.3
> >> > > > GLSL >= 3.30
> >> > > > GL_ARB_shader_storage_buffer_object
> >> > > >
> >> > > > [fragment shader]
> >> > > > #version 330
> >> > > > #extension GL_ARB_shader_storage_buffer_object: require
> >> > > >
> >> > > > buffer SSBO {
> >> > > > mat4 sm4;
> >> > > > };
> >> > > >
> >> > > >
> >> > > > mat4 um4;
> >> > > >
> >> > > > void main() {
> >> > > > sm4 *= um4;
> >> > > 
> >> > > This is using the value of "um4", which is never assigned, so liveness
> >> > > analysis will correctly extend its live interval up to the start of the
> >> > > block.
> >> > 
> >> > This test was derived by simplifying a CTS test case.
> >> > 
> >> > Anyway, I'm not sure what happened on the way to the commit message,
> >> > but um4 should be a uniform.
> >> > 
> >> > http://sprunge.us/cEUe
> >> 
> >> Oh yes, that was me playing around with the example. The patches also
> >> fix the uniform version. Jordan, can you verify if this fixes the CTS
> >> test case?
> >
> > Unfortunately, no. The CTS case has some control flow. I had removed
> > it to minimize the test case.
> >
> > Here is a small shader_test that has control flow and still fails to
> > compile with your patches:
> >
> > http://sprunge.us/LIjA
> >
> >> In any case, since Curro is working on a more general fix for this
> >> stuff, I guess you'd rather wait for his patches...
> >
> > It depends how long we'd have to wait. :) Anyway, since we don't have
> > a short-term fix anyhow, let's wait to see what curro has to say.
> >
> Assuming that at least some of the scalar writes in the shader are being
> introduced by emit_uniformize(), an alternative hack that might get the
> shader to compile for the moment would be to change emit_uniformize() to
> emit a full SIMD-width BROADCAST instruction instead of a scalar one
> (see attachment) -- Which is pretty useless in principle because only
> the first component will ever be used but it might keep dataflow
> analysis from getting confused.

BTW, having a look at dEQP SSBO tests, I see that many of them use CS
and a huge amount of these hit this problem, we are talking close to 700
tests (there are about 2000 SSBO tests in total). It seems that CS
programs hit this problem a lot and this patch that you attached seems
to get all these tests to compile just fine.

Iago



Re: [Mesa-dev] [PATCH] i965: Fix is-renderable check in intel_image_target_renderbuffer_storage

2015-10-16 Thread Tapani Pälli

Tested-by: Tapani Pälli 

On 10/16/2015 12:01 AM, Ian Romanick wrote:

From: Ian Romanick 

Previously we could create a renderbuffer with format
MESA_FORMAT_R8G8B8A8_UNORM, convert that renderbuffer to an EGLImage,
then FAIL to convert the EGLImage back to a renderbuffer because
reasons.  Just use the same check in
intel_image_target_renderbuffer_storage that brw_render_target_supported
uses.

There are more checks in brw_render_target_supported, but I don't think
they are necessary here.  A different approach would be to refactor
brw_render_target_supported to take rb->Format and rb->NumSamples as
parameters (instead of a gl_renderbuffer) and use the new function here.

Fixes:

 ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image

Signed-off-by: Ian Romanick 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92476
Cc: "10.3 10.4 10.5 10.6 11.0" 
---
  src/mesa/drivers/dri/i965/intel_fbo.c | 6 +-
  1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_fbo.c 
b/src/mesa/drivers/dri/i965/intel_fbo.c
index 5a6b0dd..7f281fa 100644
--- a/src/mesa/drivers/dri/i965/intel_fbo.c
+++ b/src/mesa/drivers/dri/i965/intel_fbo.c
@@ -348,14 +348,10 @@ intel_image_target_renderbuffer_storage(struct gl_context 
*ctx,
 }

 /* __DRIimage is opaque to the core so it has to be checked here */
-   switch (image->format) {
-   case MESA_FORMAT_R8G8B8A8_UNORM:
+   if (!brw->format_supported_as_render_target[image->format]) {
_mesa_error(ctx, GL_INVALID_OPERATION,
  "glEGLImageTargetRenderbufferStorage(unsupported image format");
return;
-  break;
-   default:
-  break;
 }

 irb = intel_renderbuffer(rb);
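
The shape of the change, replacing a hard-coded switch with a lookup in a per-format table, can be sketched as below. This is a simplified illustration: the enum, the context struct, and sketch_image_is_renderable are hypothetical, and only the table member name is borrowed from the diff.

```c
#include <stdbool.h>

/* Hypothetical, tiny format list standing in for the real enum. */
enum sketch_format {
   SKETCH_FORMAT_NONE,
   SKETCH_FORMAT_R8G8B8A8_UNORM,
   SKETCH_FORMAT_B8G8R8A8_UNORM,
   SKETCH_FORMAT_COUNT
};

/* Hypothetical context holding one "renderable" flag per format. */
struct sketch_context {
   bool format_supported_as_render_target[SKETCH_FORMAT_COUNT];
};

/* One table lookup answers the question for every format, so the
 * image import path and the renderbuffer creation path cannot
 * disagree about which formats are renderable. */
static bool
sketch_image_is_renderable(const struct sketch_context *ctx,
                           enum sketch_format format)
{
   if ((unsigned)format >= SKETCH_FORMAT_COUNT)
      return false;

   return ctx->format_supported_as_render_target[format];
}
```

A caller would report GL_INVALID_OPERATION and return early when this check fails, as the patched function does.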




[Mesa-dev] [PATCH 5/5] glsl_to_tgsi: Use {Num}UniformBlocks instead of {Num}BufferInterfaceBlocks

2015-10-16 Thread Iago Toral Quiroga
The latter holds both UBOs and SSBOs, but here we only want UBOs.
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 06f510d..f481e89 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -5388,10 +5388,10 @@ st_translate_program(
}
 
if (program->shader) {
-  unsigned num_ubos = program->shader->NumBufferInterfaceBlocks;
+  unsigned num_ubos = program->shader->NumUniformBlocks;
 
   for (i = 0; i < num_ubos; i++) {
- unsigned size = program->shader->BufferInterfaceBlocks[i].UniformBufferSize;
+ unsigned size = program->shader->UniformBlocks[i]->UniformBufferSize;
  unsigned num_const_vecs = (size + 15) / 16;
  unsigned first, last;
  assert(num_const_vecs > 0);
-- 
1.9.1
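
The context lines in the hunk above compute num_const_vecs as
(size + 15) / 16, which rounds a byte size up to whole 16-byte vec4 slots.
A small sketch of that arithmetic, with a hypothetical function name:

```c
/* Number of 16-byte (vec4) slots needed to hold `size` bytes.
 * Adding 15 before the integer division rounds up instead of down. */
static unsigned
sketch_num_const_vecs(unsigned size)
{
   return (size + 15) / 16;
}
```

For example, a 17-byte block needs two slots and a 64-byte block needs four.
A size of zero yields zero slots, which is the case the assert in the
patched code guards against.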



[Mesa-dev] [PATCH 3/5] i965: Do not use NumBufferInterfaceBlocks

2015-10-16 Thread Iago Toral Quiroga
This is the only place in the driver where we use this. Since we now work
with separate index spaces, always use NumUniformBlocks and
NumShaderStorageBlocks instead of NumBufferInterfaceBlocks to be more
consistent with the rest of the code.
---
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index a304eec..6ebe648 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -972,7 +972,7 @@ brw_upload_ubo_surfaces(struct brw_context *brw,
   }
}
 
-   if (shader->NumBufferInterfaceBlocks)
+   if (shader->NumUniformBlocks || shader->NumShaderStorageBlocks)
   brw->ctx.NewDriverState |= BRW_NEW_SURFACES;
 }
 
-- 
1.9.1



[Mesa-dev] [PATCH 2/5] main: GL_ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH is about UBOs, not SSBOs

2015-10-16 Thread Iago Toral Quiroga
---
 src/mesa/main/shaderapi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index 26995ad..18e463d 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -713,10 +713,10 @@ get_programiv(struct gl_context *ctx, GLuint program, GLenum pname,
   if (!has_ubo)
  break;
 
-  for (i = 0; i < shProg->NumBufferInterfaceBlocks; i++) {
+  for (i = 0; i < shProg->NumUniformBlocks; i++) {
 /* Add one for the terminating NUL character.
  */
-const GLint len = strlen(shProg->BufferInterfaceBlocks[i].Name) + 1;
+const GLint len = strlen(shProg->UniformBlocks[i]->Name) + 1;
 
 if (len > max_len)
max_len = len;
-- 
1.9.1
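
The loop touched by this patch reports the longest block name, counting the
terminating NUL. A minimal sketch of the same computation over a plain array
of names follows; the function name and the array-of-strings input are
assumptions made for illustration, not Mesa's data layout.

```c
#include <string.h>

/* Longest name length among `n` names, including the terminating NUL.
 * Returns 0 when there are no names. */
static int
sketch_max_block_name_length(const char *const *names, unsigned n)
{
   int max_len = 0;

   for (unsigned i = 0; i < n; i++) {
      /* Add one for the terminating NUL character. */
      const int len = (int) strlen(names[i]) + 1;

      if (len > max_len)
         max_len = len;
   }
   return max_len;
}
```

If the list fed to this loop also contained SSBO names, a long SSBO name
could inflate the value reported for uniform blocks, which is why the patch
iterates over the UBO-only list.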



[Mesa-dev] [PATCH 4/5] st/mesa: Use {Num}UniformBlocks instead of {Num}BufferInterfaceBlocks

2015-10-16 Thread Iago Toral Quiroga
The latter holds both UBOs and SSBOs, but here we only want UBOs.
---
 src/mesa/state_tracker/st_atom_constbuf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_atom_constbuf.c b/src/mesa/state_tracker/st_atom_constbuf.c
index 69e26cb..acaa85d 100644
--- a/src/mesa/state_tracker/st_atom_constbuf.c
+++ b/src/mesa/state_tracker/st_atom_constbuf.c
@@ -234,11 +234,11 @@ static void st_bind_ubos(struct st_context *st,
if (!shader)
   return;
 
-   for (i = 0; i < shader->NumBufferInterfaceBlocks; i++) {
+   for (i = 0; i < shader->NumUniformBlocks; i++) {
   struct gl_uniform_buffer_binding *binding;
   struct st_buffer_object *st_obj;
 
-  binding = &st->ctx->UniformBufferBindings[shader->BufferInterfaceBlocks[i].Binding];
+  binding = &st->ctx->UniformBufferBindings[shader->UniformBlocks[i]->Binding];
   st_obj = st_buffer_object(binding->BufferObject);
 
   cb.buffer = st_obj->buffer;
-- 
1.9.1



[Mesa-dev] [PATCH 1/5] main: Use NumUniformBlocks to count UBOs

2015-10-16 Thread Iago Toral Quiroga
Now that we have separate index spaces for UBOs and SSBOs we do not need
to iterate through BufferInterfaceBlocks any more, we can just take the
UBO count directly from NumUniformBlocks.
---
 src/mesa/main/shaderapi.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index 6a2f60d..26995ad 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -729,11 +729,7 @@ get_programiv(struct gl_context *ctx, GLuint program, GLenum pname,
   if (!has_ubo)
  break;
 
-  *params = 0;
-  for (unsigned i = 0; i < shProg->NumBufferInterfaceBlocks; i++) {
- if (!shProg->BufferInterfaceBlocks[i].IsShaderStorage)
-(*params)++;
-  }
+  *params = shProg->NumUniformBlocks;
   return;
case GL_PROGRAM_BINARY_RETRIEVABLE_HINT:
   /* This enum isn't part of the OES extension for OpenGL ES 2.0.  It is
-- 
1.9.1
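
The difference between the two counting strategies in this patch can be
sketched as follows. The struct and function names are hypothetical. The
per-kind arrays hold pointers into the combined list, which is an assumption
based on the series indexing UniformBlocks[i] with "->" and
BufferInterfaceBlocks[i] with ".".

```c
#include <stdbool.h>

#define SKETCH_MAX_BLOCKS 8

/* Hypothetical stand-in for one interface block. */
struct sketch_block {
   const char *name;
   bool is_shader_storage;   /* true for an SSBO, false for a UBO */
};

/* Hypothetical program with separate index spaces for UBOs and SSBOs. */
struct sketch_program {
   const struct sketch_block *uniform_blocks[SKETCH_MAX_BLOCKS];
   const struct sketch_block *shader_storage_blocks[SKETCH_MAX_BLOCKS];
   unsigned num_uniform_blocks;
   unsigned num_shader_storage_blocks;
};

/* Old approach: walk the combined list on every query and skip SSBOs. */
static unsigned
sketch_count_ubos_by_filter(const struct sketch_block *blocks, unsigned n)
{
   unsigned count = 0;

   for (unsigned i = 0; i < n; i++) {
      if (!blocks[i].is_shader_storage)
         count++;
   }
   return count;
}

/* New approach: split the combined list once; afterwards the UBO count is
 * simply num_uniform_blocks. Blocks beyond the fixed capacity are dropped
 * in this sketch. */
static void
sketch_split_blocks(struct sketch_program *prog,
                    const struct sketch_block *blocks, unsigned n)
{
   prog->num_uniform_blocks = 0;
   prog->num_shader_storage_blocks = 0;

   for (unsigned i = 0; i < n; i++) {
      if (blocks[i].is_shader_storage) {
         if (prog->num_shader_storage_blocks < SKETCH_MAX_BLOCKS)
            prog->shader_storage_blocks[prog->num_shader_storage_blocks++] =
               &blocks[i];
      } else {
         if (prog->num_uniform_blocks < SKETCH_MAX_BLOCKS)
            prog->uniform_blocks[prog->num_uniform_blocks++] = &blocks[i];
      }
   }
}
```

Both approaches give the same UBO count, but the split version does the
filtering once instead of on every query.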



Re: [Mesa-dev] [PATCH v2] mesa: Set api prefix to version string when overriding version

2015-10-16 Thread Boyan Ding
2015-10-16 14:36 GMT+08:00 Tapani Pälli :
> Otherwise there are problems when user overrides version and application
> such as Piglit wants to detect used api with glGetString(GL_VERSION).
>
> This makes it currently impossible to run glslparsertest tests for
> OpenGL ES when using version override.
>
> Below is example when using MESA_GLES_VERSION_OVERRIDE=3.1.
>
> Before:
> "3.1 Mesa 11.1.0-devel (git-24a1a15)"
>
> After:
> "OpenGL ES 3.1 Mesa 11.1.0-devel (git-78042ff)"
>
> v2: only include api prefix for OpenGL ES (Boyan Ding)
>
> Signed-off-by: Tapani Pälli 

Cc: "11.0" 

> ---
>  src/mesa/main/version.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/version.c b/src/mesa/main/version.c
> index 498b2f8..017bc3a 100644
> --- a/src/mesa/main/version.c
> +++ b/src/mesa/main/version.c
> @@ -24,6 +24,7 @@
>
>
>  #include 
> +#include "context.h"
>  #include "imports.h"
>  #include "mtypes.h"
>  #include "version.h"
> @@ -181,7 +182,15 @@ _mesa_override_gl_version(struct gl_context *ctx)
>  {
> if (_mesa_override_gl_version_contextless(&ctx->Const, &ctx->API,
>   &ctx->Version)) {
> -  create_version_string(ctx, "");
> +  /* We need to include API in version string for OpenGL ES, otherwise
> +   * application can not detect GLES via glGetString(GL_VERSION) query.
> +   *
> +   * From OpenGL ES 3.2 spec, Page 436:
> +   *
> +   * "The VERSION string is laid out as follows:
> +   *  "OpenGL ES N.M vendor-specific information" "
> +   */
> +  create_version_string(ctx, _mesa_is_gles(ctx) ? "OpenGL ES " : "");
> }
>  }
>
> --
> 2.4.3
>
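
The effect of the patch on GL_VERSION, and the kind of prefix check an
application might apply to it, can be sketched like this. Both functions are
hypothetical illustrations: the first is not Mesa's create_version_string(),
and the second is not Piglit's actual detection code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build "<prefix>N.M <vendor info>", using the "OpenGL ES " prefix only
 * for GLES contexts, as the patch does for the override path. */
static void
sketch_version_string(char *buf, size_t size, bool is_gles,
                      int major, int minor, const char *vendor_info)
{
   snprintf(buf, size, "%s%d.%d %s",
            is_gles ? "OpenGL ES " : "", major, minor, vendor_info);
}

/* Application-side check: a GLES version string starts with "OpenGL ES ". */
static bool
sketch_version_is_gles(const char *version)
{
   static const char prefix[] = "OpenGL ES ";

   return strncmp(version, prefix, sizeof(prefix) - 1) == 0;
}
```

With the "Before" string from the commit message, such a check would
misidentify an overridden GLES context as desktop GL. With the "After"
string it matches.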


[Mesa-dev] [Bug 92361] [BSW SKL] Regression: glx@glx-copy-sub-buffer failed

2015-10-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92361

--- Comment #3 from cprigent  ---
New result with the latest setup (Mesa 11.0.3):

glx@glx-copy-sub-buffer Not run 
glx@glx-copy-sub-buffer samples=2 Fail
glx@glx-copy-sub-buffer samples=4 Fail
glx@glx-copy-sub-buffer samples=6 Fail
glx@glx-copy-sub-buffer samples=8 Fail


Platform: Braswell M 
CPU : Intel(R) Celeron N3060 1.60GHz @ 1.6 GHz (family: 6, model: 76 stepping: 4)
SoC : BSW D0
QDF : K6XC
CRB : BRASWELL RVP Fab2
Mandatory Reworks : All 
Feature Reworks: F28, F32, F33, F35, F37
Optional reworks : O-01a; O-02, O-03
Software 
Linux distribution: Ubuntu 14.04 LTS 64 bits 
BIOS : BRAS.X64.B084.R00.1508310642
TXE FW : 2.0.0.2073
Ksc : 1.08

Linux distribution: Ubuntu 14.04 LTS 64 bits
kernel 4.3.0-rc5-drm-intel-nightly+ 819f710081d7ea116b9b44a9264061d2c030f009
from git://anongit.freedesktop.org/drm-intel
Mesa - 11.0.3 from http://cgit.freedesktop.org/mesa/mesa/
xf86-video-intel - 2.99.917 from
http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/
Libdrm - 2.4.65 from http://cgit.freedesktop.org/mesa/drm/
Libva - 1.6.1 from http://cgit.freedesktop.org/libva/
vaapi intel-driver - 1.6.1 from http://cgit.freedesktop.org/vaapi/intel-driver
Cairo - 1.14.2 from http://cgit.freedesktop.org/cairo
Xorg Xserver - 1.17.2 from http://cgit.freedesktop.org/xorg/xserver

Kernel commit 819f710081d7ea116b9b44a9264061d2c030f009
Author: Daniel Vetter 
Date:   Wed Oct 14 19:05:17 2015 +0200
drm-intel-nightly: 2015y-10m-14d-17h-04m-36s UTC integration manifest

-- 
You are receiving this mail because:
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH v2] mesa: Set api prefix to version string when overriding version

2015-10-16 Thread Iago Toral
On Fri, 2015-10-16 at 09:36 +0300, Tapani Pälli wrote:
> Otherwise there are problems when user overrides version and application
> such as Piglit wants to detect used api with glGetString(GL_VERSION).
> 
> This makes it currently impossible to run glslparsertest tests for
> OpenGL ES when using version override.
> 
> Below is example when using MESA_GLES_VERSION_OVERRIDE=3.1.
> 
> Before:
>   "3.1 Mesa 11.1.0-devel (git-24a1a15)"
> 
> After:
>   "OpenGL ES 3.1 Mesa 11.1.0-devel (git-78042ff)"
> 
> v2: only include api prefix for OpenGL ES (Boyan Ding)
> 
> Signed-off-by: Tapani Pälli 
> ---
>  src/mesa/main/version.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/main/version.c b/src/mesa/main/version.c
> index 498b2f8..017bc3a 100644
> --- a/src/mesa/main/version.c
> +++ b/src/mesa/main/version.c
> @@ -24,6 +24,7 @@
>  
> 
>  #include 
> +#include "context.h"
>  #include "imports.h"
>  #include "mtypes.h"
>  #include "version.h"
> @@ -181,7 +182,15 @@ _mesa_override_gl_version(struct gl_context *ctx)
>  {
> if (_mesa_override_gl_version_contextless(&ctx->Const, &ctx->API,
>   &ctx->Version)) {
> -  create_version_string(ctx, "");
> +  /* We need to include API in version string for OpenGL ES, otherwise
> +   * application can not detect GLES via glGetString(GL_VERSION) query.
> +   *
> +   * From OpenGL ES 3.2 spec, Page 436:
> +   *
> +   * "The VERSION string is laid out as follows:
> +   *  "OpenGL ES N.M vendor-specific information" "
> +   */

Maybe also include the desktop spec?

From OpenGL 4.5 spec, Page 538:
"The VERSION and SHADING_LANGUAGE_VERSION strings are laid out as
follows:
"

Reviewed-by: Iago Toral Quiroga 

> +  create_version_string(ctx, _mesa_is_gles(ctx) ? "OpenGL ES " : "");
> }
>  }
>  




Re: [Mesa-dev] [PATCH] glsl: fix check SSBOs support for builtin functions

2015-10-16 Thread Samuel Iglesias Gonsálvez


On 16/10/15 09:36, Iago Toral wrote:
> On Fri, 2015-10-16 at 09:10 +0200, Samuel Iglesias Gonsalvez wrote:
>> has_shader_storage_buffer_objects() returns true also if the OpenGL
>> context is 4.30 or ES 3.1.
>>
>> Previously, we were saying that all atomic*() GLSL builtin functions
>> for SSBOs were not available when OpenGL ES 3.1 context was in use.
>>
>> Fixes 48 dEQP-GLES31 tests:
>>
>> dEQP-GLES31.functional.ssbo.atomic.*
> 
> Reviewed-by: Iago Toral Quiroga 
> 
> That said:
> 
>> Signed-off-by: Samuel Iglesias Gonsalvez 
>> ---
>>  src/glsl/builtin_functions.cpp | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
>> index f0f6be2..aae25f8 100644
>> --- a/src/glsl/builtin_functions.cpp
>> +++ b/src/glsl/builtin_functions.cpp
>> @@ -403,7 +403,7 @@ shader_atomic_counters(const _mesa_glsl_parse_state 
>> *state)
>>  static bool
>>  shader_storage_buffer_object(const _mesa_glsl_parse_state *state)
>>  {
>> -   return state->ARB_shader_storage_buffer_object_enable;
> 
> shouldn't we set the above to true with GLES 3.1?
> 

ARB_shader_storage_buffer_object_enable is true when the driver supports
the ARB_shader_storage_buffer_object extension *and* we explicitly
enable the extension inside a GLSL (or GLSL ES) shader:

#extension ARB_shader_storage_buffer_object : enable

Same happens with ARB_shader_storage_buffer_object_warn. See
_mesa_glsl_supported_extensions[] in glsl_parser_extras.cpp for more
details.

When we are under OpenGL 4.3 (or higher) or OpenGL ES 3.1 (or higher)
context, we don't need to explicitly enable the extension in the GLSL
shader as it should be already supported by the driver. That second test
is done in has_shader_storage_buffer_objects().

Sam

>> +   return state->has_shader_storage_buffer_objects();
>>  }
>>  
>>  static bool
> 
> 
> 
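
The two conditions Sam describes can be sketched as a single predicate:
either the shader enabled the extension explicitly, or the language version
is new enough that the feature is core. The struct and function names below
are hypothetical, and the sketch assumes the version test is made against
the shading language version (430 for desktop GLSL, 310 for GLSL ES).

```c
#include <stdbool.h>

/* Hypothetical, reduced parse state. */
struct sketch_parse_state {
   bool es_shader;               /* true for a GLSL ES shader */
   unsigned language_version;    /* e.g. 430 or 310 */
   bool ssbo_extension_enabled;  /* set by "#extension ... : enable" */
};

/* True when the shader's language version meets the requirement for its
 * API. A requirement of 0 means "never available in this API". */
static bool
sketch_is_version(const struct sketch_parse_state *state,
                  unsigned required_glsl, unsigned required_glsl_es)
{
   const unsigned required =
      state->es_shader ? required_glsl_es : required_glsl;

   return required != 0 && state->language_version >= required;
}

/* SSBO builtins are available via the extension or via core versions. */
static bool
sketch_has_ssbos(const struct sketch_parse_state *state)
{
   return state->ssbo_extension_enabled ||
          sketch_is_version(state, 430, 310);
}
```

Checking only the enable flag misses the second branch, which matches the
failure described in the commit message for OpenGL ES 3.1 contexts.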


Re: [Mesa-dev] [PATCH v2 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler

2015-10-16 Thread Pohjolainen, Topi
On Thu, Oct 15, 2015 at 07:29:31AM -0700, Jason Ekstrand wrote:
>On Oct 14, 2015 10:48 PM, "Pohjolainen, Topi" 
>wrote:
>>
>> On Wed, Oct 14, 2015 at 11:53:37AM -0700, Jason Ekstrand wrote:
>> > On Wed, Oct 14, 2015 at 1:41 AM, Pohjolainen, Topi
>> >  wrote:
>> > > On Wed, Oct 14, 2015 at 11:25:40AM +0300, Pohjolainen, Topi wrote:
>> > >> On Sat, Oct 10, 2015 at 08:09:01AM -0700, Jason Ekstrand wrote:
>> > >> > This commit removes all dependence on GL state by getting rid of
>the
>> > >> > brw_context parameter and the GL data structures.
>> > >> >
>> > >> > v2 (Jason Ekstrand):
>> > >> >- Patch use_legacy_snorm_formula through as a function
>argument rather
>> > >> >  than trying to go through the shader key.
>> > >> > ---
>> > >> >  src/mesa/drivers/dri/i965/brw_vec4.cpp | 70
>+-
>> > >> >  src/mesa/drivers/dri/i965/brw_vs.c | 16 +++-
>> > >> >  src/mesa/drivers/dri/i965/brw_vs.h | 12 --
>> > >> >  3 files changed, 49 insertions(+), 49 deletions(-)
>> > >> >
>> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> > >> > index 4b8390f..8e38729 100644
>> > >> > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> > >> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> > >> > @@ -1937,51 +1937,42 @@ extern "C" {
>> > >> >   * Returns the final assembly and the program's size.
>> > >> >   */
>> > >> >  const unsigned *
>> > >> > -brw_vs_emit(struct brw_context *brw,
>> > >> > +brw_vs_emit(const struct brw_compiler *compiler, void *log_data,
>> > >> >  void *mem_ctx,
>> > >> >  const struct brw_vs_prog_key *key,
>> > >> >  struct brw_vs_prog_data *prog_data,
>> > >> > -struct gl_vertex_program *vp,
>> > >> > -struct gl_shader_program *prog,
>> > >> > +const nir_shader *shader,
>> > >> > +gl_clip_plane *clip_planes,
>> > >> > +bool use_legacy_snorm_formula,
>> > >> >  int shader_time_index,
>> > >> > -unsigned *final_assembly_size)
>> > >> > +unsigned *final_assembly_size,
>> > >> > +char **error_str)
>> > >> >  {
>> > >> > const unsigned *assembly = NULL;
>> > >> >
>> > >> > -   if (brw->intelScreen->compiler->scalar_vs) {
>> > >> > +   if (compiler->scalar_vs) {
>> > >> >prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
>> > >> >
>> > >> > -  fs_visitor v(brw->intelScreen->compiler, brw,
>> > >> > -   mem_ctx, key, &prog_data->base.base,
>> > >> > +  fs_visitor v(compiler, log_data, mem_ctx, key,
>&prog_data->base.base,
>> > >> > NULL, /* prog; Only used for
>TEXTURE_RECTANGLE on gen < 8 */
>> > >> > -   vp->Base.nir, 8, shader_time_index);
>> > >> > -  if (!v.run_vs(brw_select_clip_planes(&brw->ctx))) {
>> > >> > - if (prog) {
>> > >> > -prog->LinkStatus = false;
>> > >> > -ralloc_strcat(&prog->InfoLog, v.fail_msg);
>> > >> > - }
>> > >> > -
>> > >> > - _mesa_problem(NULL, "Failed to compile vertex shader:
>%s\n",
>> > >> > -   v.fail_msg);
>> > >> > +   shader, 8, shader_time_index);
>> > >> > +  if (!v.run_vs(clip_planes)) {
>> > >> > + if (error_str)
>> > >> > +*error_str = ralloc_strdup(mem_ctx, v.fail_msg);
>> > >>
>> > >> I don't particularly like the complexity of the error reporting
>mechanism.
>> > >> First vec4_visitor::fail() uses ralloc_asprintf() to create one
>string, then
>> > >> we make a copy of it here and finally the caller of brw_vs_emit()
>makes yet
>> > >> another copy using ralloc_strcat().
>> > >> I wonder if we could pass the final destination all the way for the
>> > >> vec4_visitor::fail() to augment with ralloc_asprintf() and hence
>avoid all
>> > >
>> > > Or more appropriately using ralloc_asprintf_append()...
>> > >
>> > >> the indirection in the middle. What do you think?
>> >
>> > I'd be moderately ok with just doing "*error_str = v.fail_msg" and
>> > avoiding the extra copy.  I'm not a big fan of the extra copy, but I
>> > decided to leave it in for a couple of reasons
>> >
>> > 1) It only happens on the error path so it's not a big deal.
>>
>> I wasn't concerned about the overhead either, as you said this is error
>path
>> only.
>>
>> >
>> > 2) Not copying it is kind of a layering violation.  You're grabbing a
>> > string from an object without copying it, destroying the object, and
>> > then handing it back to the thing that called you.  The only
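
The layering concern raised in point 2 can be illustrated with a small
sketch: the callee copies the failure message into storage the caller owns
before the object that produced the message is destroyed. This uses plain
malloc/free rather than ralloc, where lifetimes are tied to memory contexts,
so it shows only the ownership idea. All names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical visitor that owns its failure message. */
struct sketch_visitor {
   char *fail_msg;   /* freed together with the visitor */
};

/* Portable string duplicate; returns NULL on allocation failure. */
static char *
sketch_strdup(const char *s)
{
   size_t n = strlen(s) + 1;
   char *copy = malloc(n);

   if (copy)
      memcpy(copy, s, n);
   return copy;
}

static void
sketch_visitor_fail(struct sketch_visitor *v, const char *msg)
{
   free(v->fail_msg);
   v->fail_msg = sketch_strdup(msg);
}

static void
sketch_visitor_destroy(struct sketch_visitor *v)
{
   free(v->fail_msg);
   v->fail_msg = NULL;
}

/* Returns 0 on success and -1 on failure. On failure, *error_str (if
 * requested) receives a copy that the caller owns and must free. */
static int
sketch_compile(int should_fail, char **error_str)
{
   struct sketch_visitor v = { NULL };
   int result = 0;

   if (should_fail) {
      sketch_visitor_fail(&v, "no spare registers");
      if (error_str)
         *error_str = v.fail_msg ? sketch_strdup(v.fail_msg) : NULL;
      result = -1;
   }

   /* The visitor's own string dies here; the caller's copy survives. */
   sketch_visitor_destroy(&v);
   return result;
}
```

Handing back v.fail_msg directly would save the copy, but the caller would
then hold a pointer whose lifetime is controlled by an object it never saw.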

Re: [Mesa-dev] [PATCH] glsl: fix check SSBOs support for builtin functions

2015-10-16 Thread Iago Toral
On Fri, 2015-10-16 at 09:10 +0200, Samuel Iglesias Gonsalvez wrote:
> has_shader_storage_buffer_objects() returns true also if the OpenGL
> context is 4.30 or ES 3.1.
> 
> Previously, we were saying that all atomic*() GLSL builtin functions
> for SSBOs were not available when OpenGL ES 3.1 context was in use.
> 
> Fixes 48 dEQP-GLES31 tests:
> 
> dEQP-GLES31.functional.ssbo.atomic.*

Reviewed-by: Iago Toral Quiroga 

That said:

> Signed-off-by: Samuel Iglesias Gonsalvez 
> ---
>  src/glsl/builtin_functions.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
> index f0f6be2..aae25f8 100644
> --- a/src/glsl/builtin_functions.cpp
> +++ b/src/glsl/builtin_functions.cpp
> @@ -403,7 +403,7 @@ shader_atomic_counters(const _mesa_glsl_parse_state 
> *state)
>  static bool
>  shader_storage_buffer_object(const _mesa_glsl_parse_state *state)
>  {
> -   return state->ARB_shader_storage_buffer_object_enable;

shouldn't we set the above to true with GLES 3.1?

> +   return state->has_shader_storage_buffer_objects();
>  }
>  
>  static bool



