[Mesa-dev] MSVC, MinGW build break
Hi Rob,

Your recent commit "nir: remove dependency on glsl" broke the build for MSVC and MinGW.

For MSVC:

[...]
  Linking build\windows-x86-debug\gallium\tests\graw\occlusion-query.exe ...
  Linking build\windows-x86-debug\gallium\tests\graw\quad-sample.exe ...
glsl.lib(loop_controls.obj) : error LNK2001: unresolved external symbol "public: static struct glsl_type const * const glsl_type::bool_type" (?bool_type@glsl_type@@2QBU1@B)
glsl.lib(lower_discard_flow.obj) : error LNK2001: unresolved external symbol "public: static struct glsl_type const * const glsl_type::bool_type" (?bool_type@glsl_type@@2QBU1@B)
[...]
glsl.lib(builtin_functions.obj) : error LNK2001: unresolved external symbol "public: static struct glsl_type const * const glsl_type::float_type" (?float_type@glsl_type@@2QBU1@B)
[...]
mesa.lib(ff_fragment_shader.obj) : error LNK2001: unresolved external symbol "public: static struct glsl_type const * const glsl_type::sampler2DShadow_type" (?sampler2DShadow_type@glsl_type@@2QBU1@B)
[etc. for hundreds more]

For MinGW:

[...]
  Linking build/windows-x86-debug/gallium/tests/graw/fs-write-z.exe ...
  Linking build/windows-x86-debug/gallium/targets/osmesa/osmesa.dll ...
  Linking build/windows-x86-debug/gallium/targets/libgl-gdi/opengl32.dll ...
build/windows-x86-debug/mesa/libmesa.a(ir_to_mesa.o): In function `reladdr_to_temp':
/var/lib/hudson/jobs/mesa-mingw/workspace/src/mesa/program/ir_to_mesa.cpp:793: undefined reference to `glsl_type::vec4_type'
build/windows-x86-debug/mesa/libmesa.a(ir_to_mesa.o): In function `visit':
/var/lib/hudson/jobs/mesa-mingw/workspace/src/mesa/program/ir_to_mesa.cpp:1092: undefined reference to `glsl_type::vec4_type'
/var/lib/hudson/jobs/mesa-mingw/workspace/src/mesa/program/ir_to_mesa.cpp:1115: undefined reference to `glsl_type::vec4_type'
/var/lib/hudson/jobs/mesa-mingw/workspace/src/mesa/program/ir_to_mesa.cpp:1487: undefined reference to `glsl_type::float_type'
/var/lib/hudson/jobs/mesa-mingw/workspace/src/mesa/program/ir_to_mesa.cpp:1497: undefined reference to `glsl_type::float_type'
/var/lib/hudson/jobs/mesa-mingw/workspace/src/mesa/program/ir_to_mesa.cpp:1871: undefined reference to `glsl_type::vec4_type'
/var/lib/hudson/jobs/mesa-mingw/workspace/src/mesa/program/ir_to_mesa.cpp:1883: undefined reference to `glsl_type::vec4_type'

Hopefully it's something simple to fix.

-Brian

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 shader-db] report.py: rework and update for cycle info
Now that we have three separate things we want to measure (instructions, cycles, and loops), it's impractical to keep adding special code for changes in each thing. Instead, for each program in before and after we store a table of measurement -> value, and when reporting we loop over each measurement and report helped/hurt before reporting the gained/lost programs. v2: - Swap loop count and cycle count to be compatible with older shader-db. - Fix indentation. - Report summary information (hurt/helped) last to make it easier to find. - Squash in second commit to avoid duplicating the list of measurements due to the above. Signed-off-by: Connor Abbott --- report.py | 164 ++ 1 file changed, 89 insertions(+), 75 deletions(-) diff --git a/report.py b/report.py index 4c06714..409474a 100755 --- a/report.py +++ b/report.py @@ -10,17 +10,22 @@ def get_results(filename): results = {} -re_match = re.compile(r"(\S+) - (.S \S+) shader: (\S*) inst, (\S*) loops") +re_match = re.compile(r"(\S+) - (.S \S+) shader: (\S*) inst, (\S*) loops, (\S*) cycles") for line in lines: match = re.search(re_match, line) if match is None: continue groups = match.groups() -count = int(groups[2]) -loop = int(groups[3]) -if count != 0: -results[(groups[0], groups[1])] = count, loop +inst_count = int(groups[2]) +loop_count = int(groups[3]) +cycle_count = int(groups[4]) +if inst_count != 0: +results[(groups[0], groups[1])] = { +"instructions": inst_count, +"cycles": cycle_count, +"loops": loop_count +} return results @@ -43,83 +48,92 @@ def get_result_string(p, b, a): p = p + ' ' return p + change(b, a) +def split_list(string): +return string.split(",") def main(): parser = argparse.ArgumentParser() +parser.add_argument("--measurements", "-m", type=split_list, +default=["instructions", "cycles", "loops"], +help="comma-separated list of measurements to report") parser.add_argument("before", type=get_results, help="the output of the original code") parser.add_argument("after", type=get_results, 
help="the output of the new code") args = parser.parse_args() -total_before = 0 -total_after = 0 -total_before_loop = 0 -total_after_loop = 0 -affected_before = 0 -affected_after = 0 +total_before = {} +total_after = {} +affected_before = {} +affected_after = {} +num_hurt = {} +num_helped = {} -helped = [] -hurt = [] -lost = [] -gained = [] -loop_change = [] -for p in args.before: -(name, type) = p -namestr = name + " " + type -before_count = args.before[p][0] -before_loop = args.before[p][1] +for m in args.measurements: +total_before[m] = 0 +total_after[m] = 0 +affected_before[m] = 0 +affected_after[m] = 0 + +print m + +helped = [] +hurt = [] +for p in args.before: +before_count = args.before[p][m] + +if args.after.get(p) is None: +continue -if args.after.get(p) is not None: -after_count = args.after[p][0] -after_loop = args.after[p][1] +# If the number of loops changed, then we may have unrolled some +# loops, in which case other measurements will be misleading. +if m != "loops" and args.before[p]["loops"] != args.after[p]["loops"]: +continue -total_before_loop += before_loop -total_after_loop += after_loop +after_count = args.after[p][m] -if before_loop == after_loop: -total_before += before_count -total_after += after_count +total_before[m] += before_count +total_after[m] += after_count if before_count != after_count: -affected_before += before_count -affected_after += after_count +affected_before[m] += before_count +affected_after[m] += after_count -if after_loop != before_loop: -loop_change.append(p); -elif after_count > before_count: +if after_count > before_count: hurt.append(p) else: helped.append(p) -else: -lost.append(namestr) -for p in args.after: -if args.before.get(p) is None: -gained.append(p[0] + " " + p[1]) +helped.sort( +key=lambda k: float(args.before[k][m] - args.after[k][m]) / args.before[k][m]) +for p in helped: +namestr = p[0] + " " + p[1] +print(m + " helpe
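The reworked report.py logic described above (one measurement -> value table per program, then a helped/hurt pass per measurement) can be sketched roughly as follows. This is a minimal Python 3 illustration, not the patch itself: the regular expression is copied from the "+" side of the diff, while the function names parse_results and classify are hypothetical, and any input lines fed to it are assumed to follow the "inst, loops, cycles" log format that the regex expects.

```python
import re

# Pattern taken from the "+" side of the patch:
# program name, stage string, instruction count, loop count, cycle count.
RE_MATCH = re.compile(
    r"(\S+) - (.S \S+) shader: (\S*) inst, (\S*) loops, (\S*) cycles")


def parse_results(lines):
    """Map (program, stage) -> {measurement: value}; skip empty shaders.

    Hypothetical helper name; mirrors what get_results() does in the patch.
    """
    results = {}
    for line in lines:
        match = RE_MATCH.search(line)
        if match is None:
            continue
        name, stage, inst, loops, cycles = match.groups()
        if int(inst) != 0:
            results[(name, stage)] = {
                "instructions": int(inst),
                "cycles": int(cycles),
                "loops": int(loops),
            }
    return results


def classify(before, after, measurement):
    """Return (helped, hurt) program keys for a single measurement."""
    helped, hurt = [], []
    for key, b in before.items():
        a = after.get(key)
        if a is None:
            continue  # program was lost; not counted as helped or hurt
        # A changed loop count suggests unrolling, which makes the other
        # measurements misleading, so such programs are skipped for them.
        if measurement != "loops" and b["loops"] != a["loops"]:
            continue
        if a[measurement] > b[measurement]:
            hurt.append(key)
        elif a[measurement] < b[measurement]:
            helped.append(key)
    return helped, hurt
```

Calling classify() once per entry in the measurement list gives the per-measurement helped/hurt lists, so adding another measurement only requires extending the regex and the dictionary.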
[Mesa-dev] [PATCH v2 4/6] i965: dump scheduling cycle estimates
The heuristic we're using is rather lame, since it assumes everything is non-uniform and loops execute 10 times, but it should be enough for measuring improvements in the scheduler that don't result in a change in the number of instructions. v2: - Switch loops and cycle counts to be compatible with older shader-db. - Make loop heuristic 10x to match with spilling code. Signed-off-by: Connor Abbott --- src/mesa/drivers/dri/i965/brw_cfg.h | 4 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 11 ++- .../drivers/dri/i965/brw_schedule_instructions.cpp | 20 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 9 + 4 files changed, 35 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h b/src/mesa/drivers/dri/i965/brw_cfg.h index a094917..d0bdb00 100644 --- a/src/mesa/drivers/dri/i965/brw_cfg.h +++ b/src/mesa/drivers/dri/i965/brw_cfg.h @@ -90,6 +90,8 @@ struct bblock_t { struct exec_list parents; struct exec_list children; int num; + + unsigned cycle_count; }; static inline struct backend_instruction * @@ -285,6 +287,8 @@ struct cfg_t { int num_blocks; bool idom_dirty; + + unsigned cycle_count; }; /* Note that this is implemented with a double for loop -- break will diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index 17e19cf..3bb0e7d 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -2180,9 +2180,9 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width) if (unlikely(debug_flag)) { fprintf(stderr, "Native code for %s\n" - "SIMD%d shader: %d instructions. %d loops. %d:%d spills:fills. Promoted %u constants. Compacted %d to %d" + "SIMD%d shader: %d instructions. %d loops. %u cycles. %d:%d spills:fills. Promoted %u constants. 
Compacted %d to %d" " bytes (%.0f%%)\n", - shader_name, dispatch_width, before_size / 16, loop_count, + shader_name, dispatch_width, before_size / 16, loop_count, cfg->cycle_count, spill_count, fill_count, promoted_constants, before_size, after_size, 100.0f * (before_size - after_size) / before_size); @@ -2192,12 +2192,13 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width) } compiler->shader_debug_log(log_data, - "%s SIMD%d shader: %d inst, %d loops, " + "%s SIMD%d shader: %d inst, %d loops, %u cycles, " "%d:%d spills:fills, Promoted %u constants, " "compacted %d to %d bytes.\n", stage_abbrev, dispatch_width, before_size / 16, - loop_count, spill_count, fill_count, - promoted_constants, before_size, after_size); + loop_count, cfg->cycle_count, spill_count, + fill_count, promoted_constants, before_size, + after_size); return start_offset; } diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp index 1652261..e14d041 100644 --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp @@ -1467,6 +1467,24 @@ instruction_scheduler::schedule_instructions(bblock_t *block) if (block->end()->opcode == BRW_OPCODE_NOP) block->end()->remove(block); assert(instructions_to_schedule == 0); + + block->cycle_count = time; +} + +static unsigned get_cycle_count(cfg_t *cfg) +{ + unsigned count = 0, multiplier = 1; + foreach_block(block, cfg) { + if (block->start()->opcode == BRW_OPCODE_DO) + multiplier *= 10; /* assume that loops execute ~10 times */ + + count += block->cycle_count * multiplier; + + if (block->end()->opcode == BRW_OPCODE_WHILE) + multiplier /= 10; + } + + return count; } void @@ -1507,6 +1525,8 @@ instruction_scheduler::run(cfg_t *cfg) post_reg_alloc); bs->dump_instructions(); } + + cfg->cycle_count = get_cycle_count(cfg); } void diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index dcacc90..8c926ec 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -1656,10 +1656,10 @@ vec4_generator::generate_code(const cfg_t *cfg) fprintf(stderr, "Native code for %s program %d:\n", stage_name, prog->Id); } - fprintf(stderr, "%s vec4 shader: %d instructions. %d loops. Compacted %d to %d" + fprintf(stderr, "%s vec4 shader: %d instructions. %d loops. %u cycles. Compacted %d to %d"
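The loop-weighting heuristic in get_cycle_count() above can be illustrated with a small model. This is only a sketch in Python, not driver code: each basic block is reduced to a hypothetical (cycle_count, starts_loop, ends_loop) tuple standing in for "block starts with DO" and "block ends with WHILE", and the function name estimate_cycles is an assumption made for illustration.

```python
def estimate_cycles(blocks, loop_factor=10):
    """Sum per-block cycle counts, weighting blocks inside loops.

    blocks: iterable of (cycle_count, starts_loop, ends_loop) tuples in
    program order (a simplified stand-in for the CFG walk in the patch).
    loop_factor: assumed number of iterations per loop (10 in the patch).
    """
    total = 0
    multiplier = 1
    for cycles, starts_loop, ends_loop in blocks:
        if starts_loop:            # corresponds to a block starting with DO
            multiplier *= loop_factor
        total += cycles * multiplier
        if ends_loop:              # corresponds to a block ending with WHILE
            multiplier //= loop_factor
    return total
```

For example, blocks of 4, 3, 2 and 5 cycles where the middle two form one loop give 4 + 30 + 20 + 5 = 59 estimated cycles; a block nested two loops deep is weighted by 100.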
[Mesa-dev] [PATCH] st/mesa: optimize 4-component ubyte glDrawPixels
If we didn't find a gallium surface format that exactly matched the glDrawPixels format/type combination, we used some other 32-bit packed RGBA format and swizzled the whole image in the mesa texstore/format code. That slow path can be avoided in some common cases by using the pipe_samper_view's swizzle terms to do the swizzling at texture sampling time instead. For now, only GL_RGBA/ubyte and GL_BGRA/ubyte combinations are supported. In the future other formats and types like GL_UNSIGNED_INT_8_8_8_8 could be added. v2: fix incorrect swizzle setup (need to invert the tex format's swizzle) --- src/mesa/state_tracker/st_cb_drawpixels.c | 104 +++--- 1 file changed, 95 insertions(+), 9 deletions(-) diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c index 79fb9ec..000d4f2 100644 --- a/src/mesa/state_tracker/st_cb_drawpixels.c +++ b/src/mesa/state_tracker/st_cb_drawpixels.c @@ -395,15 +395,35 @@ make_texture(struct st_context *st, * Note that the image is actually going to be upside down in * the texture. We deal with that with texcoords. */ - success = _mesa_texstore(ctx, 2, /* dims */ - baseInternalFormat, /* baseInternalFormat */ - mformat, /* mesa_format */ - transfer->stride, /* dstRowStride, bytes */ - &dest,/* destSlices */ - width, height, 1, /* size */ - format, type, /* src format/type */ - pixels, /* data source */ - unpack); + if ((format == GL_RGBA || format == GL_BGRA) + && type == GL_UNSIGNED_BYTE) { + /* Use a memcpy-based texstore to avoid software pixel swizzling. + * We'll do the necessary swizzling with the pipe_sampler_view to + * give much better performance. + * XXX in the future, expand this to accomodate more format and + * type combinations. 
+ */ + _mesa_memcpy_texture(ctx, 2, + mformat, /* mesa_format */ + transfer->stride, /* dstRowStride, bytes */ + &dest,/* destSlices */ + width, height, 1, /* size */ + format, type, /* src format/type */ + pixels, /* data source */ + unpack); + success = GL_TRUE; + } + else { + success = _mesa_texstore(ctx, 2, /* dims */ + baseInternalFormat, /* baseInternalFormat */ + mformat, /* mesa_format */ + transfer->stride, /* dstRowStride, bytes */ + &dest,/* destSlices */ + width, height, 1, /* size */ + format, type, /* src format/type */ + pixels, /* data source */ + unpack); + } /* unmap */ pipe_transfer_unmap(pipe, transfer); @@ -958,6 +978,69 @@ clamp_size(struct pipe_context *pipe, GLsizei *width, GLsizei *height, /** + * Search the array of 4 swizzle components for the named component and return + * its position. + */ +static int +search_swizzle(const unsigned char swizzle[4], unsigned component) +{ + int i; + for (i = 0; i < 4; i++) { + if (swizzle[i] == component) + return i; + } + assert(!"search_swizzle() failed"); + return 0; +} + + +/** + * Set the sampler view's swizzle terms. This is used to handle RGBA + * swizzling when the incoming image format isn't an exact match for + * the actual texture format. For example, if we have glDrawPixels( + * GL_RGBA, GL_UNSIGNED_BYTE) and we chose the texture format + * PIPE_FORMAT_B8G8R8A8 then we can do use the sampler view swizzle to + * avoid swizzling all the pixels in software in the texstore code. + */ +static void +setup_sampler_swizzle(struct pipe_sampler_view *sv, GLenum format, GLenum type) +{ + if ((format == GL_RGBA || format == GL_BGRA) && type == GL_UNSIGNED_BYTE) { + const struct util_format_description *desc = + util_format_description(sv->texture->format); + unsigned c0, c1, c2, c3; + + /* Every gallium driver supports at least one 32-bit packed RGBA format. + * We must have chosen one for (GL_RGBA, GL_UNSIGNED_BYTE). 
+ */ + assert(desc->block.bits == 32); + + /* invert the format's swizzle to setup the sampler's swizzle */ + if (format == GL_RGBA) { + c0 = UTIL_FORMAT_SWIZZLE_X; + c1 = UTIL_FORMAT_SWIZZLE_Y; + c2 = UTIL_FORMAT_SWIZZLE_Z; +
Re: [Mesa-dev] [PATCH 4/4] st/mesa: optimize 4-component ubyte glDrawPixels
On 10/16/2015 05:53 PM, Jose Fonseca wrote: On 15/10/15 20:01, Brian Paul wrote: If we didn't find a gallium surface format that exactly matched the glDrawPixels format/type combination, we used some other 32-bit packed RGBA format and swizzled the whole image in the mesa texstore/format code. That slow path can be avoided in some common cases by using the pipe_samper_view's swizzle terms to do the swizzling at texture sampling time instead. For now, only GL_RGBA/ubyte and GL_BGRA/ubyte combinations are supported. In the future other formats and types like GL_UNSIGNED_INT_8_8_8_8 could be added. --- src/mesa/state_tracker/st_cb_drawpixels.c | 73 +++ 1 file changed, 64 insertions(+), 9 deletions(-) diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c index 05f6e6b..a135761 100644 --- a/src/mesa/state_tracker/st_cb_drawpixels.c +++ b/src/mesa/state_tracker/st_cb_drawpixels.c @@ -395,15 +395,35 @@ make_texture(struct st_context *st, * Note that the image is actually going to be upside down in * the texture. We deal with that with texcoords. */ - success = _mesa_texstore(ctx, 2, /* dims */ - baseInternalFormat, /* baseInternalFormat */ - mformat, /* mesa_format */ - transfer->stride, /* dstRowStride, bytes */ - &dest,/* destSlices */ - width, height, 1, /* size */ - format, type, /* src format/type */ - pixels, /* data source */ - unpack); + if ((format == GL_RGBA || format == GL_BGRA) + && type == GL_UNSIGNED_BYTE) { + /* Use a memcpy-based texstore to avoid software pixel swizzling. + * We'll do the necessary swizzling with the pipe_sampler_view to + * give much better performance. + * XXX in the future, expand this to accomodate more format and + * type combinations. 
+ */ + _mesa_memcpy_texture(ctx, 2, + mformat, /* mesa_format */ + transfer->stride, /* dstRowStride, bytes */ + &dest,/* destSlices */ + width, height, 1, /* size */ + format, type, /* src format/type */ + pixels, /* data source */ + unpack); + success = GL_TRUE; + } + else { + success = _mesa_texstore(ctx, 2, /* dims */ + baseInternalFormat, /* baseInternalFormat */ + mformat, /* mesa_format */ + transfer->stride, /* dstRowStride, bytes */ + &dest,/* destSlices */ + width, height, 1, /* size */ + format, type, /* src format/type */ + pixels, /* data source */ + unpack); + } /* unmap */ pipe_transfer_unmap(pipe, transfer); @@ -958,6 +978,38 @@ clamp_size(struct pipe_context *pipe, GLsizei *width, GLsizei *height, /** + * Set the sampler view's swizzle terms. This is used to handle RGBA + * swizzling when the incoming image format isn't an exact match for + * the actual texture format. For example, if we have glDrawPixels( + * GL_RGBA, GL_UNSIGNED_BYTE) and we chose the texture format + * PIPE_FORMAT_B8G8R8A8 then we can do use the sampler view swizzle to + * avoid swizzling all the pixels in software in the texstore code. + */ +static void +setup_sampler_swizzle(struct pipe_sampler_view *sv, GLenum format, GLenum type) +{ + if ((format == GL_RGBA || format == GL_BGRA) && type == GL_UNSIGNED_BYTE) { + const struct util_format_description *desc = + util_format_description(sv->texture->format); + /* Every gallium driver supports at least one 32-bit packed RGBA format. + * We must have chosen one for (GL_RGBA, GL_UNSIGNED_BYTE). + */ + assert(desc->block.bits == 32); + /* use the format's swizzle to setup the sampler swizzle */ + sv->swizzle_r = desc->swizzle[0]; + sv->swizzle_g = desc->swizzle[1]; + sv->swizzle_b = desc->swizzle[2]; + sv->swizzle_a = desc->swizzle[3]; I think it should be the other way around: the sampler view's swizzle should _undo_ the format swizzle, not apply it again. This indeed works for RGBA8_URNOM / BGRA8_UNORM, but by mere coincidence. 
It will fail for something like ABGR8_UNORM. If you don't want to deal with the swizzle inversion now, it might be better to explicitly check that the texture->form
Re: [Mesa-dev] [PATCH 06/10] vbo: simplify some code in vbo_exec_debug_verts()
On 10/16/2015 08:14 PM, Sinclair Yeh wrote:
> On Fri, Oct 16, 2015 at 03:25:13PM -0600, Brian Paul wrote:
>> As before, use a new 'last_prim' pointer to simplify things.  Plus, add
>> some const qualifiers.
>> ---
>>  src/mesa/vbo/vbo_exec_draw.c | 12 ++--
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/mesa/vbo/vbo_exec_draw.c b/src/mesa/vbo/vbo_exec_draw.c
>> index 781991b..412ebb6 100644
>> --- a/src/mesa/vbo/vbo_exec_draw.c
>> +++ b/src/mesa/vbo/vbo_exec_draw.c
>> @@ -75,13 +75,13 @@ vbo_exec_debug_verts( struct vbo_exec_context *exec )
>>  static GLuint
>>  vbo_copy_vertices( struct vbo_exec_context *exec )
>>  {
>> -   GLuint nr = exec->vtx.prim[exec->vtx.prim_count-1].count;
>> +   struct _mesa_prim *last_prim = &exec->vtx.prim[exec->vtx.prim_count - 1];
>> +   const GLuint nr = last_prim->count;
>>     GLuint ovf, i;
>> -   GLuint sz = exec->vtx.vertex_size;
>> +   const GLuint sz = exec->vtx.vertex_size;
>>     fi_type *dst = exec->vtx.copied.buffer;
>> -   const fi_type *src = (exec->vtx.buffer_map +
>> -                         exec->vtx.prim[exec->vtx.prim_count-1].start *
>> -                         exec->vtx.vertex_size);
>> +   const fi_type *src =
>> +      exec->vtx.buffer_map + last_prim->start * exec->vtx.vertex_size;
>
> Can use "sz" here ^

Yeah, and I can use it in one of the later patches too.  Plus, I just
noticed that the function name in the commit msg is wrong too.

> Otherwise, looks good to me.

Thanks.

-Brian

>>
>>     switch (exec->ctx->Driver.CurrentExecPrimitive) {
>>     case GL_POINTS:
>> @@ -127,7 +127,7 @@ vbo_copy_vertices( struct vbo_exec_context *exec )
>>     case GL_TRIANGLE_STRIP:
>>        /* no parity issue, but need to make sure the tri is not drawn twice */
>>        if (nr & 1) {
>> -         exec->vtx.prim[exec->vtx.prim_count-1].count--;
>> +         last_prim->count--;
>>        }
>>        /* fallthrough */
>>     case GL_QUAD_STRIP:
>> --
>> 1.9.1
Re: [Mesa-dev] [PATCH 06/10] vbo: simplify some code in vbo_exec_debug_verts()
On Fri, Oct 16, 2015 at 03:25:13PM -0600, Brian Paul wrote:
> As before, use a new 'last_prim' pointer to simplify things.  Plus, add
> some const qualifiers.
> ---
>  src/mesa/vbo/vbo_exec_draw.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/vbo/vbo_exec_draw.c b/src/mesa/vbo/vbo_exec_draw.c
> index 781991b..412ebb6 100644
> --- a/src/mesa/vbo/vbo_exec_draw.c
> +++ b/src/mesa/vbo/vbo_exec_draw.c
> @@ -75,13 +75,13 @@ vbo_exec_debug_verts( struct vbo_exec_context *exec )
>  static GLuint
>  vbo_copy_vertices( struct vbo_exec_context *exec )
>  {
> -   GLuint nr = exec->vtx.prim[exec->vtx.prim_count-1].count;
> +   struct _mesa_prim *last_prim = &exec->vtx.prim[exec->vtx.prim_count - 1];
> +   const GLuint nr = last_prim->count;
>     GLuint ovf, i;
> -   GLuint sz = exec->vtx.vertex_size;
> +   const GLuint sz = exec->vtx.vertex_size;
>     fi_type *dst = exec->vtx.copied.buffer;
> -   const fi_type *src = (exec->vtx.buffer_map +
> -                         exec->vtx.prim[exec->vtx.prim_count-1].start *
> -                         exec->vtx.vertex_size);
> +   const fi_type *src =
> +      exec->vtx.buffer_map + last_prim->start * exec->vtx.vertex_size;

Can use "sz" here ^

Otherwise, looks good to me.

>
>     switch (exec->ctx->Driver.CurrentExecPrimitive) {
>     case GL_POINTS:
> @@ -127,7 +127,7 @@ vbo_copy_vertices( struct vbo_exec_context *exec )
>     case GL_TRIANGLE_STRIP:
>        /* no parity issue, but need to make sure the tri is not drawn twice */
>        if (nr & 1) {
> -         exec->vtx.prim[exec->vtx.prim_count-1].count--;
> +         last_prim->count--;
>        }
>        /* fallthrough */
>     case GL_QUAD_STRIP:
> --
> 1.9.1
Re: [Mesa-dev] [PATCH 1/4] st/mesa: use MAX3() instead of MAX2(MAX2) in draw_textured_quad()
1 and 3 look good to me.

On Thu, Oct 15, 2015 at 01:01:40PM -0600, Brian Paul wrote:
> ---
>  src/mesa/state_tracker/st_cb_drawpixels.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c
> index 7e8633e..e4d3580 100644
> --- a/src/mesa/state_tracker/st_cb_drawpixels.c
> +++ b/src/mesa/state_tracker/st_cb_drawpixels.c
> @@ -667,7 +667,8 @@ draw_textured_quad(struct gl_context *ctx, GLint x, GLint y, GLfloat z,
>     /* user textures, plus the drawpix textures */
>     if (fpv) {
>        struct pipe_sampler_view *sampler_views[PIPE_MAX_SAMPLERS];
> -      uint num = MAX2(MAX2(fpv->drawpix_sampler, fpv->pixelmap_sampler) + 1,
> +      uint num = MAX3(fpv->drawpix_sampler + 1,
> +                      fpv->pixelmap_sampler + 1,
>                        st->state.num_sampler_views[PIPE_SHADER_FRAGMENT]);
>
>        memcpy(sampler_views, st->state.sampler_views[PIPE_SHADER_FRAGMENT],
> --
> 1.9.1
Re: [Mesa-dev] [PATCH 3/4] mesa: make memcpy_texture() non-static
Reviewed-by: Sinclair Yeh On Thu, Oct 15, 2015 at 01:01:42PM -0600, Brian Paul wrote: > So that we can use it directly from the mesa/gallium state tracker. > --- > src/mesa/main/texstore.c | 40 > src/mesa/main/texstore.h | 11 +++ > 2 files changed, 31 insertions(+), 20 deletions(-) > > diff --git a/src/mesa/main/texstore.c b/src/mesa/main/texstore.c > index e50964e..4b13c42 100644 > --- a/src/mesa/main/texstore.c > +++ b/src/mesa/main/texstore.c > @@ -97,16 +97,16 @@ static const GLubyte map_1032[6] = { 1, 0, 3, 2, ZERO, > ONE }; > * No pixel transfer operations or special texel encodings allowed. > * 1D, 2D and 3D images supported. > */ > -static void > -memcpy_texture(struct gl_context *ctx, > -GLuint dimensions, > - mesa_format dstFormat, > - GLint dstRowStride, > - GLubyte **dstSlices, > - GLint srcWidth, GLint srcHeight, GLint srcDepth, > - GLenum srcFormat, GLenum srcType, > - const GLvoid *srcAddr, > - const struct gl_pixelstore_attrib *srcPacking) > +void > +_mesa_memcpy_texture(struct gl_context *ctx, > + GLuint dimensions, > + mesa_format dstFormat, > + GLint dstRowStride, > + GLubyte **dstSlices, > + GLint srcWidth, GLint srcHeight, GLint srcDepth, > + GLenum srcFormat, GLenum srcType, > + const GLvoid *srcAddr, > + const struct gl_pixelstore_attrib *srcPacking) > { > const GLint srcRowStride = _mesa_image_row_stride(srcPacking, srcWidth, > srcFormat, srcType); > @@ -296,11 +296,11 @@ _mesa_texstore_ycbcr(TEXSTORE_PARAMS) > assert(baseInternalFormat == GL_YCBCR_MESA); > > /* always just memcpy since no pixel transfer ops apply */ > - memcpy_texture(ctx, dims, > - dstFormat, > - dstRowStride, dstSlices, > - srcWidth, srcHeight, srcDepth, srcFormat, srcType, > - srcAddr, srcPacking); > + _mesa_memcpy_texture(ctx, dims, > +dstFormat, > +dstRowStride, dstSlices, > +srcWidth, srcHeight, srcDepth, srcFormat, srcType, > +srcAddr, srcPacking); > > /* Check if we need byte swapping */ > /* XXX the logic here _might_ be wrong */ > @@ -899,11 +899,11 @@ 
_mesa_texstore_memcpy(TEXSTORE_PARAMS) >return GL_FALSE; > } > > - memcpy_texture(ctx, dims, > - dstFormat, > - dstRowStride, dstSlices, > - srcWidth, srcHeight, srcDepth, srcFormat, srcType, > - srcAddr, srcPacking); > + _mesa_memcpy_texture(ctx, dims, > +dstFormat, > +dstRowStride, dstSlices, > +srcWidth, srcHeight, srcDepth, srcFormat, srcType, > +srcAddr, srcPacking); > return GL_TRUE; > } > /** > diff --git a/src/mesa/main/texstore.h b/src/mesa/main/texstore.h > index 2c974f7..f08dc08 100644 > --- a/src/mesa/main/texstore.h > +++ b/src/mesa/main/texstore.h > @@ -74,6 +74,17 @@ _mesa_texstore_needs_transfer_ops(struct gl_context *ctx, >GLenum baseInternalFormat, >mesa_format dstFormat); > > +extern void > +_mesa_memcpy_texture(struct gl_context *ctx, > + GLuint dimensions, > + mesa_format dstFormat, > + GLint dstRowStride, > + GLubyte **dstSlices, > + GLint srcWidth, GLint srcHeight, GLint srcDepth, > + GLenum srcFormat, GLenum srcType, > + const GLvoid *srcAddr, > + const struct gl_pixelstore_attrib *srcPacking); > + > extern GLboolean > _mesa_texstore_can_use_memcpy(struct gl_context *ctx, >GLenum baseInternalFormat, mesa_format > dstFormat, > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/mesa: check for out-of-memory in st_DrawPixels()
On 16/10/15 23:24, Brian Paul wrote: Before, if make_texture() or st_create_texture_sampler_view() failed we silently no-op'd the glDrawPixels. Now, set GL_OUT_OF_MEMORY. This also allows us to un-nest a bunch of code. v2: also check if allocation of sv[1] fails, per Jose. --- src/mesa/state_tracker/st_cb_drawpixels.c | 76 ++- 1 file changed, 44 insertions(+), 32 deletions(-) diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c index e4d3580..79fb9ec 100644 --- a/src/mesa/state_tracker/st_cb_drawpixels.c +++ b/src/mesa/state_tracker/st_cb_drawpixels.c @@ -975,6 +975,7 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y, int num_sampler_view = 1; struct gl_pixelstore_attrib clippedUnpack; struct st_fp_variant *fpv = NULL; + struct pipe_resource *pt; /* Mesa state should be up to date by now */ assert(ctx->NewState == 0x0); @@ -1030,42 +1031,53 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y, st_upload_constants(st, fpv->parameters, PIPE_SHADER_FRAGMENT); } - /* draw with textured quad */ - { - struct pipe_resource *pt - = make_texture(st, width, height, format, type, unpack, pixels); - if (pt) { - sv[0] = st_create_texture_sampler_view(st->pipe, pt); - - if (sv[0]) { -/* Create a second sampler view to read stencil. - * The stencil is written using the shader stencil export - * functionality. 
*/ -if (write_stencil) { - enum pipe_format stencil_format = - util_format_stencil_only(pt->format); - /* we should not be doing pixel map/transfer (see above) */ - assert(num_sampler_view == 1); - sv[1] = st_create_texture_sampler_view_format(st->pipe, pt, - stencil_format); - num_sampler_view++; -} + /* Put glDrawPixels image into a texture */ + pt = make_texture(st, width, height, format, type, unpack, pixels); + if (!pt) { + _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels"); + return; + } -draw_textured_quad(ctx, x, y, ctx->Current.RasterPos[2], - width, height, - ctx->Pixel.ZoomX, ctx->Pixel.ZoomY, - sv, - num_sampler_view, - driver_vp, - driver_fp, fpv, - color, GL_FALSE, write_depth, write_stencil); -pipe_sampler_view_reference(&sv[0], NULL); -if (num_sampler_view > 1) - pipe_sampler_view_reference(&sv[1], NULL); - } + /* create sampler view for the image */ + sv[0] = st_create_texture_sampler_view(st->pipe, pt); + if (!sv[0]) { + _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels"); + pipe_resource_reference(&pt, NULL); + return; + } + + /* Create a second sampler view to read stencil. The stencil is +* written using the shader stencil export functionality. 
+*/ + if (write_stencil) { + enum pipe_format stencil_format = + util_format_stencil_only(pt->format); + /* we should not be doing pixel map/transfer (see above) */ + assert(num_sampler_view == 1); + sv[1] = st_create_texture_sampler_view_format(st->pipe, pt, +stencil_format); + if (!sv[1]) { + _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels"); pipe_resource_reference(&pt, NULL); + pipe_sampler_view_reference(&sv[0], NULL); + return; } + num_sampler_view++; } + + draw_textured_quad(ctx, x, y, ctx->Current.RasterPos[2], + width, height, + ctx->Pixel.ZoomX, ctx->Pixel.ZoomY, + sv, + num_sampler_view, + driver_vp, + driver_fp, fpv, + color, GL_FALSE, write_depth, write_stencil); + pipe_sampler_view_reference(&sv[0], NULL); + if (num_sampler_view > 1) + pipe_sampler_view_reference(&sv[1], NULL); + + pipe_resource_reference(&pt, NULL); } Looks good. Patch 1-3 of this series is Reviewed-by: Jose Fonseca ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/4] st/mesa: optimize 4-component ubyte glDrawPixels
On 15/10/15 20:01, Brian Paul wrote: If we didn't find a gallium surface format that exactly matched the glDrawPixels format/type combination, we used some other 32-bit packed RGBA format and swizzled the whole image in the mesa texstore/format code. That slow path can be avoided in some common cases by using the pipe_samper_view's swizzle terms to do the swizzling at texture sampling time instead. For now, only GL_RGBA/ubyte and GL_BGRA/ubyte combinations are supported. In the future other formats and types like GL_UNSIGNED_INT_8_8_8_8 could be added. --- src/mesa/state_tracker/st_cb_drawpixels.c | 73 +++ 1 file changed, 64 insertions(+), 9 deletions(-) diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c index 05f6e6b..a135761 100644 --- a/src/mesa/state_tracker/st_cb_drawpixels.c +++ b/src/mesa/state_tracker/st_cb_drawpixels.c @@ -395,15 +395,35 @@ make_texture(struct st_context *st, * Note that the image is actually going to be upside down in * the texture. We deal with that with texcoords. */ - success = _mesa_texstore(ctx, 2, /* dims */ - baseInternalFormat, /* baseInternalFormat */ - mformat, /* mesa_format */ - transfer->stride, /* dstRowStride, bytes */ - &dest,/* destSlices */ - width, height, 1, /* size */ - format, type, /* src format/type */ - pixels, /* data source */ - unpack); + if ((format == GL_RGBA || format == GL_BGRA) + && type == GL_UNSIGNED_BYTE) { + /* Use a memcpy-based texstore to avoid software pixel swizzling. + * We'll do the necessary swizzling with the pipe_sampler_view to + * give much better performance. + * XXX in the future, expand this to accomodate more format and + * type combinations. 
+ */ + _mesa_memcpy_texture(ctx, 2, + mformat, /* mesa_format */ + transfer->stride, /* dstRowStride, bytes */ + &dest,/* destSlices */ + width, height, 1, /* size */ + format, type, /* src format/type */ + pixels, /* data source */ + unpack); + success = GL_TRUE; + } + else { + success = _mesa_texstore(ctx, 2, /* dims */ + baseInternalFormat, /* baseInternalFormat */ + mformat, /* mesa_format */ + transfer->stride, /* dstRowStride, bytes */ + &dest,/* destSlices */ + width, height, 1, /* size */ + format, type, /* src format/type */ + pixels, /* data source */ + unpack); + } /* unmap */ pipe_transfer_unmap(pipe, transfer); @@ -958,6 +978,38 @@ clamp_size(struct pipe_context *pipe, GLsizei *width, GLsizei *height, /** + * Set the sampler view's swizzle terms. This is used to handle RGBA + * swizzling when the incoming image format isn't an exact match for + * the actual texture format. For example, if we have glDrawPixels( + * GL_RGBA, GL_UNSIGNED_BYTE) and we chose the texture format + * PIPE_FORMAT_B8G8R8A8 then we can do use the sampler view swizzle to + * avoid swizzling all the pixels in software in the texstore code. + */ +static void +setup_sampler_swizzle(struct pipe_sampler_view *sv, GLenum format, GLenum type) +{ + if ((format == GL_RGBA || format == GL_BGRA) && type == GL_UNSIGNED_BYTE) { + const struct util_format_description *desc = + util_format_description(sv->texture->format); + /* Every gallium driver supports at least one 32-bit packed RGBA format. + * We must have chosen one for (GL_RGBA, GL_UNSIGNED_BYTE). + */ + assert(desc->block.bits == 32); + /* use the format's swizzle to setup the sampler swizzle */ + sv->swizzle_r = desc->swizzle[0]; + sv->swizzle_g = desc->swizzle[1]; + sv->swizzle_b = desc->swizzle[2]; + sv->swizzle_a = desc->swizzle[3]; I think it should be the other way around: the sampler view's swizzle should _undo_ the format swizzle, not apply it again. This indeed works for RGBA8_URNOM / BGRA8_UNORM, but by mere coincidence. 
It will fail for something like ABGR8_UNORM. If you don't want to deal with the swizzle inversion now, it might be better to explicitly check that the texture->format is RGBA8_UNORM / BGRA8_UNORM. Jos
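For illustration, the inversion suggested in this reply could look roughly like the sketch below. It is not the code that was committed: the selector enum and invert_swizzle() are hypothetical stand-ins for gallium's swizzle values, and the table is assumed to be a pure permutation of X/Y/Z/W (no constant 0/1 terms). The format table says which storage channel holds each logical component, so the sampler view needs the inverse mapping. A B,G,R,A ordering is its own inverse, which is why applying the format swizzle directly happens to work there; a cyclic ordering is not.

```c
#include <assert.h>

/* Hypothetical selectors, standing in for gallium's swizzle enums. */
enum { SWZ_X = 0, SWZ_Y = 1, SWZ_Z = 2, SWZ_W = 3 };

/*
 * fmt[c] names the storage channel that holds logical component c
 * (0 = R, 1 = G, 2 = B, 3 = A).  The view swizzle that undoes this is
 * the inverse permutation: inv[fmt[c]] = c.
 * Assumes fmt is a permutation of X..W.
 */
static void
invert_swizzle(const unsigned char fmt[4], unsigned char inv[4])
{
   for (unsigned c = 0; c < 4; c++)
      inv[fmt[c]] = (unsigned char) c;
}

/* B,G,R,A in memory: R lives in channel Z, B in channel X. */
static const unsigned char swz_bgra[4] = { SWZ_Z, SWZ_Y, SWZ_X, SWZ_W };

/* A,R,G,B in memory: a cyclic order, chosen because it is not self-inverse. */
static const unsigned char swz_argb[4] = { SWZ_Y, SWZ_Z, SWZ_W, SWZ_X };
```

With these tables, inverting swz_bgra gives back the same table, while inverting swz_argb gives { W, X, Y, Z }, which differs from the format swizzle itself.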
Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments
On Tue 13 Oct 2015, Ben Widawsky wrote: > The impetus for this patch comes from a seemingly benign statement within the > spec (quoted within the patch). For me, this patch was at some point critical > for getting stable piglit results (though this did not seem to be the case on > a > branch Chad was working on). > > It is very important for clearing multiple color buffer attachments and can be > observed in the following piglit tests: > spec/arb_framebuffer_object/fbo-drawbuffers-none glclear > spec/ext_framebuffer_multisample/blit-multiple-render-targets 0 > > Signed-off-by: Ben Widawsky > --- > src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 > + > 1 file changed, 84 insertions(+), 13 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > index 7bf52f0..9e6711e 100644 > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable) > brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM; > } > > +/** > + * Individually fast clear each color buffer attachment. On previous gens > this > + * isn't required. The motivation for this comes from one line (which seems > to > + * be specific to SKL+). The list item is in section titled _MCS Buffer for > + * Render Target(s)_ > + * > + * "Since only one RT is bound with a clear pass, only one RT can be > cleared > + * at a time. To clear multiple RTs, multiple clear passes are required." > + * > + * The code follows the same idea as the resolve code which creates a fake > FBO > + * to avoid interfering with too much of the GL state. 
> + */ > +static void > +fast_clear_attachments(struct brw_context *brw, > + struct gl_framebuffer *fb, > + uint32_t fast_clear_buffers, > + struct rect fast_clear_rect) > +{ > + assert(brw->gen >= 9); > + struct gl_context *ctx = &brw->ctx; > + > + GLuint old_fb = ctx->DrawBuffer->Name; > + > + for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) { > + struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf]; > + struct intel_renderbuffer *irb = intel_renderbuffer(rb); > + GLuint fbo, rbo; > + int index = fb->_ColorDrawBufferIndexes[buf]; > + > + if (!((1 << index) & fast_clear_buffers)) > + continue; > + > + _mesa_GenFramebuffers(1, &fbo); > + rbo = brw_get_rb_for_slice(brw, irb->mt, 0, 0, false); Hard-coding level=0 layer=0 here makes me uncomfortable. We'll have to fix it later when implementing fast clears for level > 0 and layer > 0 on gen >= 8. Anyway, there's no need to create a new renderbuffer. You should instead extract the existing gl_renderbuffer or gl_texture from gl_framebuffer::Attachments[buf]. If you do that, then there's an added bonus: the hard-coded level=0 layer=0 disappears. > + > + _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo); > + _mesa_FramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER, > +GL_COLOR_ATTACHMENT0, > +GL_RENDERBUFFER, rbo); > + _mesa_DrawBuffer(GL_COLOR_ATTACHMENT0); There's no need to create a new framebuffer on each iteration of the loop. You should create one framebuffer before the loop, then reuse it for each iteration. Well... if this function were using real GL, not metainsanity, then there would be no need to create a new renderbuffer and framebuffer for each loop iteration. Meta, for some insane reason, may require you to create a new framebuffer and renderbuffer, but I doubt it. 
> + > + brw_fast_clear_init(brw); > + > + use_rectlist(brw, true); > + > + brw_bind_rep_write_shader(brw, (float *) fast_clear_color); > + > + /* SKL+ also has a resolve mode for compressed render targets and thus > more > + * bits to let us select the type of resolve. For fast clear > resolves, it > + * turns out we can use the same value as pre-SKL though. > + */ > + set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE); > + brw_draw_rectlist(ctx, &fast_clear_rect, MAX2(1, fb->MaxNumLayers)); > + set_fast_clear_op(brw, 0); > + use_rectlist(brw, false); > + > + _mesa_DeleteRenderbuffers(1, &rbo); > + _mesa_DeleteFramebuffers(1, &fbo); > + > + /* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll > + * resolve them eventually. > + */ > + irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR; > + } > + > + _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, old_fb); > +} > + > bool > brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb, > GLbitfield buffers, bool partial_clear) > @@ -600,12 +668,27 @@ brw_meta_fast_clear(struct brw_context *brw, struct > gl_framebuffer *fb, > use_rectlist(brw, true); >
Re: [Mesa-dev] [PATCH 2/7] st/va: properly defines VAImageFormat formats and improve VaCreateImage
Not sure how VA specifies things, but if the RGBA8 stuff is supposed to be in CPU-endian as packed 32-bit ints, I think you're meant to use PIPE_FORMAT_RGBA_UNORM and so on. However if it's always supposed to be little-endian or array-based, then the way you have it is fine. -ilia On Fri, Oct 16, 2015 at 7:14 PM, Julien Isorce wrote: > Also add RGBA, RGBX and BGRX. > Also extend ChromaToPipe and implement PipeToYCbCr. > > Note that gstreamer-vaapi check all the VAImageFormat fields. > > Signed-off-by: Julien Isorce > --- > src/gallium/state_trackers/va/image.c | 10 ++-- > src/gallium/state_trackers/va/va_private.h | 38 > +- > 2 files changed, 45 insertions(+), 3 deletions(-) > > diff --git a/src/gallium/state_trackers/va/image.c > b/src/gallium/state_trackers/va/image.c > index 022240d..c7fbe1a 100644 > --- a/src/gallium/state_trackers/va/image.c > +++ b/src/gallium/state_trackers/va/image.c > @@ -44,7 +44,10 @@ static const VAImageFormat > formats[VL_VA_MAX_IMAGE_FORMATS] = > {VA_FOURCC('Y','V','1','2')}, > {VA_FOURCC('Y','U','Y','V')}, > {VA_FOURCC('U','Y','V','Y')}, > - {VA_FOURCC('B','G','R','A')} > + {.fourcc = VA_FOURCC('B','G','R','A'), .byte_order = VA_LSB_FIRST, 32, > 32, 0x00ff, 0xff00, 0x00ff, 0xff00}, > + {.fourcc = VA_FOURCC('R','G','B','A'), .byte_order = VA_LSB_FIRST, 32, > 32, 0x00ff, 0xff00, 0x00ff, 0xff00}, > + {.fourcc = VA_FOURCC('B','G','R','X'), .byte_order = VA_LSB_FIRST, 32, > 24, 0x00ff, 0xff00, 0x00ff, 0x}, > + {.fourcc = VA_FOURCC('R','G','B','X'), .byte_order = VA_LSB_FIRST, 32, > 24, 0x00ff, 0xff00, 0x00ff, 0x} > }; > > static void > @@ -116,7 +119,7 @@ vlVaCreateImage(VADriverContextP ctx, VAImageFormat > *format, int width, int heig > img->width = width; > img->height = height; > w = align(width, 2); > - h = align(width, 2); > + h = align(height, 2); > > switch (format->fourcc) { > case VA_FOURCC('N','V','1','2'): > @@ -149,6 +152,9 @@ vlVaCreateImage(VADriverContextP ctx, VAImageFormat > *format, int width, int heig >break; > > case 
VA_FOURCC('B','G','R','A'): > + case VA_FOURCC('R','G','B','A'): > + case VA_FOURCC('B','G','R','X'): > + case VA_FOURCC('R','G','B','X'): >img->num_planes = 1; >img->pitches[0] = w * 4; >img->offsets[0] = 0; > diff --git a/src/gallium/state_trackers/va/va_private.h > b/src/gallium/state_trackers/va/va_private.h > index 1ea7be7..3479156 100644 > --- a/src/gallium/state_trackers/va/va_private.h > +++ b/src/gallium/state_trackers/va/va_private.h > @@ -46,7 +46,7 @@ > #define VL_VA_DRIVER(ctx) ((vlVaDriver *)ctx->pDriverData) > #define VL_VA_PSCREEN(ctx) (VL_VA_DRIVER(ctx)->vscreen->pscreen) > > -#define VL_VA_MAX_IMAGE_FORMATS 6 > +#define VL_VA_MAX_IMAGE_FORMATS 9 > > static inline enum pipe_video_chroma_format > ChromaToPipe(int format) > @@ -58,6 +58,8 @@ ChromaToPipe(int format) >return PIPE_VIDEO_CHROMA_FORMAT_422; > case VA_RT_FORMAT_YUV444: >return PIPE_VIDEO_CHROMA_FORMAT_444; > + case VA_RT_FORMAT_RGB32: > + return 0; > default: >assert(0); >return PIPE_VIDEO_CHROMA_FORMAT_420; > @@ -80,12 +82,46 @@ YCbCrToPipe(unsigned format) >return PIPE_FORMAT_UYVY; > case VA_FOURCC('B','G','R','A'): >return PIPE_FORMAT_B8G8R8A8_UNORM; > + case VA_FOURCC('R','G','B','A'): > + return PIPE_FORMAT_R8G8B8A8_UNORM; > + case VA_FOURCC('B','G','R','X'): > + return PIPE_FORMAT_B8G8R8X8_UNORM; > + case VA_FOURCC('R','G','B','X'): > + return PIPE_FORMAT_R8G8B8X8_UNORM; > default: >assert(0); >return PIPE_FORMAT_NONE; > } > } > > +static inline unsigned > +PipeToYCbCr(enum pipe_format p_format) > +{ > + switch (p_format) { > + case PIPE_FORMAT_NV12: > + return VA_FOURCC('N','V','1','2'); > + case PIPE_FORMAT_IYUV: > + return VA_FOURCC('I','4','2','0'); > + case PIPE_FORMAT_YV12: > + return VA_FOURCC('Y','V','1','2'); > + case PIPE_FORMAT_UYVY: > + return VA_FOURCC('U','Y','V','Y'); > + case PIPE_FORMAT_YUYV: > + return VA_FOURCC('Y','U','Y','V'); > + case PIPE_FORMAT_B8G8R8A8_UNORM: > + return VA_FOURCC('B','G','R','A'); > + case PIPE_FORMAT_R8G8B8A8_UNORM: > + return 
VA_FOURCC('R','G','B','A'); > + case PIPE_FORMAT_B8G8R8X8_UNORM: > + return VA_FOURCC('B','G','R','X'); > + case PIPE_FORMAT_R8G8B8X8_UNORM: > + return VA_FOURCC('R','G','B','X'); > + default: > + assert(0); > + return -1; > + } > +} > + > static inline VAProfile > PipeToProfile(enum pipe_video_profile profile) > { > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop
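The packed-versus-array distinction raised at the top of this reply can be shown with a small sketch. Nothing here is VA or gallium API: pack_rgba(), store_le32() and store_be32() are hypothetical helpers, and the "R in the low bits" packing is an assumption chosen for the example. An array-based format always has R at byte 0, whereas a format defined as a packed 32-bit integer ends up with a different byte order in memory depending on the CPU's endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packed layout: R in bits 0-7, G in 8-15, B in 16-23, A in 24-31. */
static uint32_t
pack_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
   return (uint32_t) r | ((uint32_t) g << 8) |
          ((uint32_t) b << 16) | ((uint32_t) a << 24);
}

/* How a little-endian CPU would lay the 32-bit value out in memory. */
static void
store_le32(uint32_t v, uint8_t out[4])
{
   for (unsigned i = 0; i < 4; i++)
      out[i] = (uint8_t) (v >> (8 * i));
}

/* How a big-endian CPU would lay the same value out in memory. */
static void
store_be32(uint32_t v, uint8_t out[4])
{
   for (unsigned i = 0; i < 4; i++)
      out[i] = (uint8_t) (v >> (8 * (3 - i)));
}
```

On the little-endian layout the bytes come out as R,G,B,A, matching an array-based RGBA format; on the big-endian layout the same packed value comes out as A,B,G,R, so the two definitions only coincide on little-endian hosts.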
[Mesa-dev] [PATCH 7/7] st/va: add headless support, i.e. VA_DISPLAY_DRM
This patch allows using gallium vaapi without requiring an X server to be running for your second graphics card. Signed-off-by: Julien Isorce --- src/gallium/state_trackers/va/Makefile.am | 9 ++ src/gallium/state_trackers/va/context.c | 49 +++ 2 files changed, 53 insertions(+), 5 deletions(-) diff --git a/src/gallium/state_trackers/va/Makefile.am b/src/gallium/state_trackers/va/Makefile.am index 2a93a90..348cfe1 100644 --- a/src/gallium/state_trackers/va/Makefile.am +++ b/src/gallium/state_trackers/va/Makefile.am @@ -30,6 +30,15 @@ AM_CFLAGS = \ $(VA_CFLAGS) \ -DVA_DRIVER_INIT_FUNC="__vaDriverInit_$(VA_MAJOR)_$(VA_MINOR)" +AM_CFLAGS += \ + $(GALLIUM_PIPE_LOADER_DEFINES) \ + -DPIPE_SEARCH_DIR=\"$(libdir)/gallium-pipe\" + +if HAVE_GALLIUM_STATIC_TARGETS +AM_CFLAGS += \ + -DGALLIUM_STATIC_TARGETS=1 +endif + AM_CPPFLAGS = \ -I$(top_srcdir)/include diff --git a/src/gallium/state_trackers/va/context.c b/src/gallium/state_trackers/va/context.c index ddc863b..9ab2710 100644 --- a/src/gallium/state_trackers/va/context.c +++ b/src/gallium/state_trackers/va/context.c @@ -28,7 +28,8 @@ #include "pipe/p_screen.h" #include "pipe/p_video_codec.h" - +#include "pipe-loader/pipe_loader.h" +#include "state_tracker/drm_driver.h" #include "util/u_memory.h" #include "util/u_handle_table.h" #include "util/u_video.h" @@ -98,7 +99,7 @@ static struct VADriverVTableVPP vtable_vpp = PUBLIC VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP ctx) { - vlVaDriver *drv; + vlVaDriver *drv = NULL; if (!ctx) return VA_STATUS_ERROR_INVALID_CONTEXT; @@ -107,8 +108,40 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx) if (!drv) return VA_STATUS_ERROR_ALLOCATION_FAILED; - drv->vscreen = vl_screen_create(ctx->native_dpy, ctx->x11_screen); - if (!drv->vscreen) + drv->vscreen = NULL; + + switch (ctx->display_type) { + case VA_DISPLAY_X11: + drv->vscreen = vl_screen_create(ctx->native_dpy, ctx->x11_screen); + if (!drv->vscreen) + goto error_screen; + break; + + case VA_DISPLAY_DRM: + case VA_DISPLAY_DRM_RENDERNODES: { + 
struct drm_state *drm_info = (struct drm_state *) ctx->drm_state; + if (!drm_info) + goto error_screen; + + drv->vscreen = CALLOC_STRUCT(vl_screen); + +#if GALLIUM_STATIC_TARGETS + drv->vscreen->pscreen = dd_create_screen(drm_info->fd); +#else + int loader_fd = dup(drm_info->fd); + if (loader_fd == -1) + goto error_screen; + + if (pipe_loader_drm_probe_fd(&drv->dev, loader_fd)) + drv->vscreen->pscreen = pipe_loader_create_screen(drv->dev, PIPE_SEARCH_DIR); +#endif + } + break; + default: + goto error_screen; + } + + if (!drv->vscreen->pscreen) goto error_screen; drv->pipe = drv->vscreen->pscreen->context_create(drv->vscreen->pscreen, @@ -146,8 +179,11 @@ error_htab: error_pipe: vl_screen_destroy(drv->vscreen); + drv->vscreen = NULL; error_screen: + if (drv->vscreen) + FREE(drv->vscreen); FREE(drv); return VA_STATUS_ERROR_ALLOCATION_FAILED; } @@ -266,7 +302,10 @@ vlVaTerminate(VADriverContextP ctx) vl_compositor_cleanup_state(&drv->cstate); vl_compositor_cleanup(&drv->compositor); drv->pipe->destroy(drv->pipe); - vl_screen_destroy(drv->vscreen); + if (ctx->display_type == VA_DISPLAY_X11) + vl_screen_destroy(drv->vscreen); + else + FREE(drv->vscreen); handle_table_destroy(drv->htab); FREE(drv); -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/7] st/va: add initial Video Post Processing support
Improve following functions to support VA_PROFILE_NONE profile (vpp): vlVaQueryConfigProfiles vlVaQueryConfigEntrypoints vlVaCreateConfig vlVaQueryConfigAttributes Add VADriverVTableVPP and improve following functions to support vpp: vlVaCreateContext vlVaDestroyContext vlVaBeginPicture vlVaRenderPicture vlVaEndPicture Add handleVAProcPipelineParameterBufferType helper. One of the application is: VASurfaceNV12 -> gstvaapipostproc -> VASurfaceRGBA Signed-off-by: Julien Isorce --- src/gallium/state_trackers/va/config.c | 20 +++ src/gallium/state_trackers/va/context.c| 94 +++--- src/gallium/state_trackers/va/picture.c| 89 +++- src/gallium/state_trackers/va/surface.c| 73 +++ src/gallium/state_trackers/va/va_private.h | 13 - 5 files changed, 254 insertions(+), 35 deletions(-) diff --git a/src/gallium/state_trackers/va/config.c b/src/gallium/state_trackers/va/config.c index cfb0b25..bde6615 100644 --- a/src/gallium/state_trackers/va/config.c +++ b/src/gallium/state_trackers/va/config.c @@ -52,6 +52,9 @@ vlVaQueryConfigProfiles(VADriverContextP ctx, VAProfile *profile_list, int *num_ profile_list[(*num_profiles)++] = vap; } + /* Support postprocessing through vl_compositor */ + profile_list[(*num_profiles)++] = VAProfileNone; + return VA_STATUS_SUCCESS; } @@ -67,6 +70,11 @@ vlVaQueryConfigEntrypoints(VADriverContextP ctx, VAProfile profile, *num_entrypoints = 0; + if (profile == VAProfileNone) { + entrypoint_list[(*num_entrypoints)++] = VAEntrypointVideoProc; + return VA_STATUS_SUCCESS; + } + p = ProfileToPipe(profile); if (p == PIPE_VIDEO_PROFILE_UNKNOWN) return VA_STATUS_ERROR_UNSUPPORTED_PROFILE; @@ -118,6 +126,11 @@ vlVaCreateConfig(VADriverContextP ctx, VAProfile profile, VAEntrypoint entrypoin if (!ctx) return VA_STATUS_ERROR_INVALID_CONTEXT; + if (profile == VAProfileNone && entrypoint == VAEntrypointVideoProc) { + *config_id = PIPE_VIDEO_PROFILE_UNKNOWN; + return VA_STATUS_SUCCESS; + } + p = ProfileToPipe(profile); if (p == PIPE_VIDEO_PROFILE_UNKNOWN) return 
VA_STATUS_ERROR_UNSUPPORTED_PROFILE; @@ -151,6 +164,13 @@ vlVaQueryConfigAttributes(VADriverContextP ctx, VAConfigID config_id, VAProfile return VA_STATUS_ERROR_INVALID_CONTEXT; *profile = PipeToProfile(config_id); + + if (config_id == PIPE_VIDEO_PROFILE_UNKNOWN) { + *entrypoint = VAEntrypointVideoProc; + *num_attribs = 0; + return VA_STATUS_SUCCESS; + } + *entrypoint = VAEntrypointVLD; *num_attribs = 1; diff --git a/src/gallium/state_trackers/va/context.c b/src/gallium/state_trackers/va/context.c index 8949d42..ddc863b 100644 --- a/src/gallium/state_trackers/va/context.c +++ b/src/gallium/state_trackers/va/context.c @@ -87,6 +87,14 @@ static struct VADriverVTable vtable = &vlVaQuerySurfaceAttributes }; +static struct VADriverVTableVPP vtable_vpp = +{ + VA_DRIVER_VTABLE_VPP_VERSION, + &vlVaQueryVideoProcFilters, + &vlVaQueryVideoProcFilterCaps, + &vlVaQueryVideoProcPipelineCaps +}; + PUBLIC VAStatus VA_DRIVER_INIT_FUNC(VADriverContextP ctx) { @@ -122,6 +130,7 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx) ctx->version_major = 0; ctx->version_minor = 1; *ctx->vtable = vtable; + *ctx->vtable_vpp = vtable_vpp; ctx->max_profiles = PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH - PIPE_VIDEO_PROFILE_UNKNOWN; ctx->max_entrypoints = 1; ctx->max_attributes = 1; @@ -151,11 +160,16 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID config_id, int picture_width, struct pipe_video_codec templat = {}; vlVaDriver *drv; vlVaContext *context; + int is_vpp = 0; if (!ctx) return VA_STATUS_ERROR_INVALID_CONTEXT; - if (!(picture_width && picture_height)) + is_vpp = config_id == PIPE_VIDEO_PROFILE_UNKNOWN && + picture_width == 0 && picture_height == 0 && flag ==0 && !render_targets + && num_render_targets == 0; + + if (!(picture_width && picture_height) && !is_vpp) return VA_STATUS_ERROR_INVALID_IMAGE_FORMAT; drv = VL_VA_DRIVER(ctx); @@ -163,38 +177,48 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID config_id, int picture_width, if (!context) return VA_STATUS_ERROR_ALLOCATION_FAILED; - 
templat.profile = config_id; - templat.entrypoint = PIPE_VIDEO_ENTRYPOINT_BITSTREAM; - templat.chroma_format = PIPE_VIDEO_CHROMA_FORMAT_420; - templat.width = picture_width; - templat.height = picture_height; - templat.max_references = num_render_targets; - templat.expect_chunked_decode = true; - - if (u_reduce_video_profile(templat.profile) == - PIPE_VIDEO_FORMAT_MPEG4_AVC) - templat.level = u_get_h264_level(templat.width, templat.height, -&templat.max_references); - - context->decoder = drv->pipe->create_video_codec(drv->pipe, &templat); - if (!context->de
Re: [Mesa-dev] [PATCH 01/10] i965/gen8+: Remove redundant zeroing of surface state
On Tue 13 Oct 2015, Ben Widawsky wrote: > The allocate_surface_state already zeroes out the surface state, and doing it > later in the function is destructive for what we want to accomplish when we > split out support for gen9 fast clears (next patch). > > NOTE: Only dword 12 actually needed to be fixed, but it seemed more consistent > to remove the other instances as well. I can make an argument both ways (open > coding it, vs. not). I can rework the next patch if requires. > > Signed-off-by: Ben Widawsky > --- > src/mesa/drivers/dri/i965/gen8_surface_state.c | 12 > 1 file changed, 12 deletions(-) Patch 1 is Reviewed-by: Chad Versace
[Mesa-dev] [PATCH 5/7] st/va: implement dmabuf import for VaCreateSurfaces2
For now it is limited to RGBA, BGRA, RGBX, BGRX surfaces. Signed-off-by: Julien Isorce --- src/gallium/state_trackers/va/surface.c| 90 +- src/gallium/state_trackers/va/va_private.h | 1 + 2 files changed, 90 insertions(+), 1 deletion(-) diff --git a/src/gallium/state_trackers/va/surface.c b/src/gallium/state_trackers/va/surface.c index be435cb..eb5b8ca 100644 --- a/src/gallium/state_trackers/va/surface.c +++ b/src/gallium/state_trackers/va/surface.c @@ -376,7 +376,8 @@ vlVaQuerySurfaceAttributes(VADriverContextP ctx, VAConfigID config, attribs[i].type = VASurfaceAttribMemoryType; attribs[i].value.type = VAGenericValueTypeInteger; attribs[i].flags = VA_SURFACE_ATTRIB_GETTABLE | VA_SURFACE_ATTRIB_SETTABLE; -attribs[i].value.value.i = VA_SURFACE_ATTRIB_MEM_TYPE_VA; +attribs[i].value.value.i = VA_SURFACE_ATTRIB_MEM_TYPE_VA | +VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME; i++; attribs[i].type = VASurfaceAttribExternalBufferDescriptor; @@ -410,6 +411,82 @@ vlVaQuerySurfaceAttributes(VADriverContextP ctx, VAConfigID config, return vaStatus; } +static VAStatus +suface_from_external_memory(VADriverContextP ctx, vlVaSurface *surface, +VASurfaceAttribExternalBuffers *memory_attibute, +int index, VASurfaceID *surfaces, +struct pipe_video_buffer *templat) +{ +vlVaDriver *drv = NULL; +struct pipe_screen *pscreen = NULL; +struct pipe_resource *resource = NULL; +struct pipe_resource res_templ; +struct winsys_handle whandle; +struct pipe_resource *resources[VL_NUM_COMPONENTS]; + +if (!ctx) +return VA_STATUS_ERROR_INVALID_PARAMETER; + +pscreen = VL_VA_PSCREEN(ctx); +drv = VL_VA_DRIVER(ctx); + +if (!memory_attibute || !memory_attibute->buffers || index > memory_attibute->num_buffers) +return VA_STATUS_ERROR_INVALID_PARAMETER; + +if (surface->templat.width != memory_attibute->width || +surface->templat.height != memory_attibute->height || +memory_attibute->num_planes < 1) +return VA_STATUS_ERROR_INVALID_PARAMETER; + +switch (memory_attibute->pixel_format) { +case VA_FOURCC_RGBA: +case 
VA_FOURCC_RGBX: +case VA_FOURCC_BGRA: +case VA_FOURCC_BGRX: +if (memory_attibute->num_planes != 1) +return VA_STATUS_ERROR_INVALID_PARAMETER; +break; +default: +return VA_STATUS_ERROR_INVALID_PARAMETER; +} + +memset(&res_templ, 0, sizeof(res_templ)); +res_templ.target = PIPE_TEXTURE_2D; +res_templ.last_level = 0; +res_templ.depth0 = 1; +res_templ.array_size = 1; +res_templ.width0 = memory_attibute->width; +res_templ.height0 = memory_attibute->height; +res_templ.format = surface->templat.buffer_format; +res_templ.bind = PIPE_BIND_SAMPLER_VIEW; +res_templ.usage = PIPE_USAGE_DEFAULT; + +memset(&whandle, 0, sizeof(struct winsys_handle)); +whandle.type = DRM_API_HANDLE_TYPE_FD; +whandle.handle = memory_attibute->buffers[index]; +whandle.stride = memory_attibute->pitches[index]; + +resource = pscreen->resource_from_handle(pscreen, &res_templ, &whandle); + +if (!resource) + return VA_STATUS_ERROR_ALLOCATION_FAILED; + +memset(resources, 0, sizeof resources); +resources[0] = resource; + +surface->buffer = vl_video_buffer_create_ex2(drv->pipe, templat, resources); +if (!surface->buffer) +return VA_STATUS_ERROR_ALLOCATION_FAILED; + +util_dynarray_init(&surface->subpics); +surfaces[index] = handle_table_add(drv->htab, surface); + +if (!surfaces[index]) + return VA_STATUS_ERROR_ALLOCATION_FAILED; + +return VA_STATUS_SUCCESS; +} + VAStatus vlVaCreateSurfaces2(VADriverContextP ctx, unsigned int format, unsigned int width, unsigned int height, @@ -450,6 +527,9 @@ vlVaCreateSurfaces2(VADriverContextP ctx, unsigned int format, switch (attrib_list[i].value.value.i) { case VA_SURFACE_ATTRIB_MEM_TYPE_VA: +case VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME: +memory_type = attrib_list[i].value.value.i; +break; default:{ return VA_STATUS_ERROR_UNSUPPORTED_MEMORY_TYPE;} } @@ -477,6 +557,9 @@ vlVaCreateSurfaces2(VADriverContextP ctx, unsigned int format, switch (memory_type) { case VA_SURFACE_ATTRIB_MEM_TYPE_VA: break; +case VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME: +expected_fourcc = 
memory_attibute->pixel_format; +break; default: assert(0); } @@ -522,6 +605,11 @@ vlVaCreateSurfaces2(VADriverContextP ctx, unsigned int format, util_dynarray_init(&surf->subpics); surfaces[i] = handle_table_add(drv->htab, surf); break; +case VA_SURFACE_ATTRIB_MEM_TYPE_D
[Mesa-dev] [PATCH 1/7] nvc0: fix crash when nv50_miptree_from_handle fails
Signed-off-by: Julien Isorce --- src/gallium/drivers/nouveau/nvc0/nvc0_resource.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c b/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c index 12b5a02..15c803c 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c @@ -26,7 +26,8 @@ nvc0_resource_from_handle(struct pipe_screen * screen, } else { struct pipe_resource *res = nv50_miptree_from_handle(screen, templ, whandle); - nv04_resource(res)->vtbl = &nvc0_miptree_vtbl; + if (res) + nv04_resource(res)->vtbl = &nvc0_miptree_vtbl; return res; } } -- 1.9.1
[Mesa-dev] [PATCH 2/7] st/va: properly defines VAImageFormat formats and improve VaCreateImage
Also add RGBA, RGBX and BGRX. Also extend ChromaToPipe and implement PipeToYCbCr. Note that gstreamer-vaapi check all the VAImageFormat fields. Signed-off-by: Julien Isorce --- src/gallium/state_trackers/va/image.c | 10 ++-- src/gallium/state_trackers/va/va_private.h | 38 +- 2 files changed, 45 insertions(+), 3 deletions(-) diff --git a/src/gallium/state_trackers/va/image.c b/src/gallium/state_trackers/va/image.c index 022240d..c7fbe1a 100644 --- a/src/gallium/state_trackers/va/image.c +++ b/src/gallium/state_trackers/va/image.c @@ -44,7 +44,10 @@ static const VAImageFormat formats[VL_VA_MAX_IMAGE_FORMATS] = {VA_FOURCC('Y','V','1','2')}, {VA_FOURCC('Y','U','Y','V')}, {VA_FOURCC('U','Y','V','Y')}, - {VA_FOURCC('B','G','R','A')} + {.fourcc = VA_FOURCC('B','G','R','A'), .byte_order = VA_LSB_FIRST, 32, 32, 0x00ff, 0xff00, 0x00ff, 0xff00}, + {.fourcc = VA_FOURCC('R','G','B','A'), .byte_order = VA_LSB_FIRST, 32, 32, 0x00ff, 0xff00, 0x00ff, 0xff00}, + {.fourcc = VA_FOURCC('B','G','R','X'), .byte_order = VA_LSB_FIRST, 32, 24, 0x00ff, 0xff00, 0x00ff, 0x}, + {.fourcc = VA_FOURCC('R','G','B','X'), .byte_order = VA_LSB_FIRST, 32, 24, 0x00ff, 0xff00, 0x00ff, 0x} }; static void @@ -116,7 +119,7 @@ vlVaCreateImage(VADriverContextP ctx, VAImageFormat *format, int width, int heig img->width = width; img->height = height; w = align(width, 2); - h = align(width, 2); + h = align(height, 2); switch (format->fourcc) { case VA_FOURCC('N','V','1','2'): @@ -149,6 +152,9 @@ vlVaCreateImage(VADriverContextP ctx, VAImageFormat *format, int width, int heig break; case VA_FOURCC('B','G','R','A'): + case VA_FOURCC('R','G','B','A'): + case VA_FOURCC('B','G','R','X'): + case VA_FOURCC('R','G','B','X'): img->num_planes = 1; img->pitches[0] = w * 4; img->offsets[0] = 0; diff --git a/src/gallium/state_trackers/va/va_private.h b/src/gallium/state_trackers/va/va_private.h index 1ea7be7..3479156 100644 --- a/src/gallium/state_trackers/va/va_private.h +++ b/src/gallium/state_trackers/va/va_private.h @@ 
-46,7 +46,7 @@ #define VL_VA_DRIVER(ctx) ((vlVaDriver *)ctx->pDriverData) #define VL_VA_PSCREEN(ctx) (VL_VA_DRIVER(ctx)->vscreen->pscreen) -#define VL_VA_MAX_IMAGE_FORMATS 6 +#define VL_VA_MAX_IMAGE_FORMATS 9 static inline enum pipe_video_chroma_format ChromaToPipe(int format) @@ -58,6 +58,8 @@ ChromaToPipe(int format) return PIPE_VIDEO_CHROMA_FORMAT_422; case VA_RT_FORMAT_YUV444: return PIPE_VIDEO_CHROMA_FORMAT_444; + case VA_RT_FORMAT_RGB32: + return 0; default: assert(0); return PIPE_VIDEO_CHROMA_FORMAT_420; @@ -80,12 +82,46 @@ YCbCrToPipe(unsigned format) return PIPE_FORMAT_UYVY; case VA_FOURCC('B','G','R','A'): return PIPE_FORMAT_B8G8R8A8_UNORM; + case VA_FOURCC('R','G','B','A'): + return PIPE_FORMAT_R8G8B8A8_UNORM; + case VA_FOURCC('B','G','R','X'): + return PIPE_FORMAT_B8G8R8X8_UNORM; + case VA_FOURCC('R','G','B','X'): + return PIPE_FORMAT_R8G8B8X8_UNORM; default: assert(0); return PIPE_FORMAT_NONE; } } +static inline unsigned +PipeToYCbCr(enum pipe_format p_format) +{ + switch (p_format) { + case PIPE_FORMAT_NV12: + return VA_FOURCC('N','V','1','2'); + case PIPE_FORMAT_IYUV: + return VA_FOURCC('I','4','2','0'); + case PIPE_FORMAT_YV12: + return VA_FOURCC('Y','V','1','2'); + case PIPE_FORMAT_UYVY: + return VA_FOURCC('U','Y','V','Y'); + case PIPE_FORMAT_YUYV: + return VA_FOURCC('Y','U','Y','V'); + case PIPE_FORMAT_B8G8R8A8_UNORM: + return VA_FOURCC('B','G','R','A'); + case PIPE_FORMAT_R8G8B8A8_UNORM: + return VA_FOURCC('R','G','B','A'); + case PIPE_FORMAT_B8G8R8X8_UNORM: + return VA_FOURCC('B','G','R','X'); + case PIPE_FORMAT_R8G8B8X8_UNORM: + return VA_FOURCC('R','G','B','X'); + default: + assert(0); + return -1; + } +} + static inline VAProfile PipeToProfile(enum pipe_video_profile profile) { -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
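As a side note on the `h = align(height, 2)` fix in this patch, the sketch below shows why taking the height from the width matters for non-square images. The names align_pot(), struct image_layout and rgba_layout() are hypothetical, and the total-size formula (pitch times aligned height) is an assumption for illustration; only `pitches[0] = w * 4` comes from the patch itself.

```c
#include <assert.h>

/* Hypothetical helper: round value up to a multiple of a power-of-two alignment. */
static unsigned
align_pot(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical single-plane layout for a 4-bytes-per-pixel image. */
struct image_layout {
   unsigned w, h;     /* dimensions after alignment */
   unsigned pitch;    /* bytes per row */
   unsigned size;     /* assumed total size: pitch * h */
};

static struct image_layout
rgba_layout(unsigned width, unsigned height)
{
   struct image_layout l;

   l.w = align_pot(width, 2);
   l.h = align_pot(height, 2);   /* the fix: derive h from height, not width */
   l.pitch = l.w * 4;
   l.size = l.pitch * l.h;
   return l;
}
```

For a 5x3 image this gives w = 6, h = 4 and a pitch of 24 bytes; with the old `align(width, 2)` the height would have come out as 6 and any size derived from it would be too large.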
Re: [Mesa-dev] [PATCH 08/10] i965/meta: Assert fast clears and rep clears never overlap
On Tue 13 Oct 2015, Ben Widawsky wrote: > There is nothing wrong with the code today, but as one modifies the code it > turns out to be not too difficult to mess up the code, and this easy assertion > should catch such driver implementation failures quickly. > > Cc: Kristian Høgsberg > Signed-off-by: Ben Widawsky > --- > src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 2 ++ > 1 file changed, 2 insertions(+) Patch 8 is Reviewed-by: Chad Versace
[Mesa-dev] [PATCH 3/7] st/va: in VaPutImage only destroy previous buffer if pipe->create_video_buffer succeeds
If the formats are not the same, the code re-creates the video buffer with the right format. But if creating this new video buffer fails, the surface loses its video buffer. Let's destroy the previous buffer only on success. Signed-off-by: Julien Isorce --- src/gallium/state_trackers/va/image.c | 19 ++- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/src/gallium/state_trackers/va/image.c b/src/gallium/state_trackers/va/image.c index c7fbe1a..d3b0cd5 100644 --- a/src/gallium/state_trackers/va/image.c +++ b/src/gallium/state_trackers/va/image.c @@ -338,13 +338,22 @@ vlVaPutImage(VADriverContextP ctx, VASurfaceID surface, VAImageID image, if (format == PIPE_FORMAT_NONE) return VA_STATUS_ERROR_OPERATION_FAILED; - if (surf->buffer == NULL || format != surf->buffer->buffer_format) { + if (format != surf->buffer->buffer_format) { + struct pipe_video_buffer *tmp_buf = NULL; + enum pipe_format old_surf_format = surf->templat.buffer_format; + + surf->templat.buffer_format = format; + tmp_buf = drv->pipe->create_video_buffer(drv->pipe, &surf->templat); + + if (!tmp_buf) { + surf->templat.buffer_format = old_surf_format; + return VA_STATUS_ERROR_ALLOCATION_FAILED; + } + if (surf->buffer) surf->buffer->destroy(surf->buffer); - surf->templat.buffer_format = format; - surf->buffer = drv->pipe->create_video_buffer(drv->pipe, &surf->templat); - if (!surf->buffer) - return VA_STATUS_ERROR_ALLOCATION_FAILED; + + surf->buffer = tmp_buf; } views = surf->buffer->get_sampler_view_planes(surf->buffer); -- 1.9.1
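The create-then-swap pattern this patch describes can be sketched outside of gallium as follows. Everything here is hypothetical (struct buffer, struct surface, buffer_create(), surface_set_format()); a negative format is simply an assumed way to make creation fail. Unlike the hunk above, the sketch keeps a NULL check on the existing buffer before reading its format.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the video buffer and the surface that owns it. */
struct buffer { int format; };
struct surface { int templ_format; struct buffer *buf; };

/* Hypothetical creator: fails (returns NULL) for negative formats. */
static struct buffer *
buffer_create(int format)
{
   struct buffer *b;

   if (format < 0)
      return NULL;
   b = malloc(sizeof(*b));
   if (b)
      b->format = format;
   return b;
}

/* Returns 0 on success, -1 on failure; on failure the surface is unchanged. */
static int
surface_set_format(struct surface *s, int format)
{
   struct buffer *tmp;
   int old_format = s->templ_format;

   if (s->buf && s->buf->format == format)
      return 0;

   /* Create the replacement first, using the updated template. */
   s->templ_format = format;
   tmp = buffer_create(s->templ_format);
   if (!tmp) {
      s->templ_format = old_format;   /* roll back, keep the old buffer */
      return -1;
   }

   /* Only now is it safe to drop the previous buffer. */
   free(s->buf);
   s->buf = tmp;
   return 0;
}
```

The point of the ordering is that a failed allocation leaves both the template and the old buffer untouched, so later calls still have something valid to work with.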
[Mesa-dev] [PATCH 4/7] st/va: implement VaCreateSurfaces2 and VaQuerySurfaceAttributes
Inspired from http://cgit.freedesktop.org/vaapi/intel-driver/tree/src/i965_drv_video.c Signed-off-by: Julien Isorce --- src/gallium/state_trackers/va/context.c| 5 +- src/gallium/state_trackers/va/surface.c| 288 - src/gallium/state_trackers/va/va_private.h | 7 + 3 files changed, 249 insertions(+), 51 deletions(-) diff --git a/src/gallium/state_trackers/va/context.c b/src/gallium/state_trackers/va/context.c index 8b003ae..8949d42 100644 --- a/src/gallium/state_trackers/va/context.c +++ b/src/gallium/state_trackers/va/context.c @@ -81,7 +81,10 @@ static struct VADriverVTable vtable = &vlVaSetDisplayAttributes, &vlVaBufferInfo, &vlVaLockSurface, - &vlVaUnlockSurface + &vlVaUnlockSurface, + &vlVaGetSurfaceAttributes, + &vlVaCreateSurfaces2, + &vlVaQuerySurfaceAttributes }; PUBLIC VAStatus diff --git a/src/gallium/state_trackers/va/surface.c b/src/gallium/state_trackers/va/surface.c index 8d4487b..be435cb 100644 --- a/src/gallium/state_trackers/va/surface.c +++ b/src/gallium/state_trackers/va/surface.c @@ -29,6 +29,8 @@ #include "pipe/p_screen.h" #include "pipe/p_video_codec.h" +#include "state_tracker/drm_driver.h" + #include "util/u_memory.h" #include "util/u_handle_table.h" #include "util/u_rect.h" @@ -36,6 +38,7 @@ #include "util/u_surface.h" #include "vl/vl_compositor.h" +#include "vl/vl_video_buffer.h" #include "vl/vl_winsys.h" #include "va_private.h" @@ -44,56 +47,8 @@ VAStatus vlVaCreateSurfaces(VADriverContextP ctx, int width, int height, int format, int num_surfaces, VASurfaceID *surfaces) { - struct pipe_video_buffer templat = {}; - struct pipe_screen *pscreen; - vlVaDriver *drv; - int i; - - if (!ctx) - return VA_STATUS_ERROR_INVALID_CONTEXT; - - if (!(width && height)) - return VA_STATUS_ERROR_INVALID_IMAGE_FORMAT; - - drv = VL_VA_DRIVER(ctx); - pscreen = VL_VA_PSCREEN(ctx); - - templat.buffer_format = pscreen->get_video_param - ( - pscreen, - PIPE_VIDEO_PROFILE_UNKNOWN, - PIPE_VIDEO_ENTRYPOINT_BITSTREAM, - PIPE_VIDEO_CAP_PREFERED_FORMAT - ); - 
templat.chroma_format = ChromaToPipe(format); - templat.width = width; - templat.height = height; - templat.interlaced = pscreen->get_video_param - ( - pscreen, - PIPE_VIDEO_PROFILE_UNKNOWN, - PIPE_VIDEO_ENTRYPOINT_BITSTREAM, - PIPE_VIDEO_CAP_PREFERS_INTERLACED - ); - - for (i = 0; i < num_surfaces; ++i) { - vlVaSurface *surf = CALLOC(1, sizeof(vlVaSurface)); - if (!surf) - goto no_res; - - surf->templat = templat; - surf->buffer = drv->pipe->create_video_buffer(drv->pipe, &templat); - util_dynarray_init(&surf->subpics); - surfaces[i] = handle_table_add(drv->htab, surf); - } - - return VA_STATUS_SUCCESS; - -no_res: - if (i) - vlVaDestroySurfaces(ctx, surfaces, i); - - return VA_STATUS_ERROR_ALLOCATION_FAILED; +return vlVaCreateSurfaces2(ctx, format, width, height, surfaces, num_surfaces, + NULL, 0); } VAStatus @@ -349,3 +304,236 @@ vlVaUnlockSurface(VADriverContextP ctx, VASurfaceID surface) return VA_STATUS_ERROR_UNIMPLEMENTED; } + +VAStatus +vlVaGetSurfaceAttributes(VADriverContextP ctx, VAConfigID config, + VASurfaceAttrib *attrib_list, unsigned int num_attribs) +{ +return VA_STATUS_ERROR_UNIMPLEMENTED; /* DEPRECATED */ +} + +VAStatus +vlVaQuerySurfaceAttributes(VADriverContextP ctx, VAConfigID config, + VASurfaceAttrib *attrib_list, unsigned int *num_attribs) +{ +VAStatus vaStatus = VA_STATUS_SUCCESS; +vlVaDriver *drv = NULL; +VASurfaceAttrib *attribs = NULL; +struct pipe_screen *pscreen = NULL; +int i = 0; + +if (config == VA_INVALID_ID) +return VA_STATUS_ERROR_INVALID_CONFIG; + +if (!attrib_list && !num_attribs) +return VA_STATUS_ERROR_INVALID_PARAMETER; + +if (attrib_list == NULL) { +*num_attribs = VASurfaceAttribCount; +return VA_STATUS_SUCCESS; +} + +attribs = CALLOC(VASurfaceAttribCount, sizeof(VASurfaceAttrib)); + +if (attribs == NULL) +return VA_STATUS_ERROR_ALLOCATION_FAILED; + +if (!ctx) + return VA_STATUS_ERROR_INVALID_CONTEXT; + +drv = VL_VA_DRIVER(ctx); + +if (!drv) +return VA_STATUS_ERROR_INVALID_CONTEXT; + +pscreen = VL_VA_PSCREEN(ctx); + +if 
(!pscreen) + return VA_STATUS_ERROR_INVALID_CONTEXT; + +if (config == PIPE_VIDEO_PROFILE_UNKNOWN) { + /* Assume VAEntrypointVideoProc for now. */ + attribs[i].type = VASurfaceAttribPixelFormat; + attribs[i].value.type = VAGenericValueTypeInteger; + attribs[i].flags = VA_SURFACE_ATTRIB_GETTABLE | VA_SURFACE_ATTRIB_SETTABLE; + attribs[i].value.value.i = VA_FOURCC_BGRA; + i++; + + attribs[i].type = VASurfaceAttribPixelFormat; + attribs[i].value.type = VAGenericValueTypeInteger; +
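The two-call convention that vlVaQuerySurfaceAttributes follows (a NULL list asks only for the count, a non-NULL list is filled in) can be sketched as below. This is a minimal illustration, not the patch's code: every type, status code and function name here is hypothetical, and the "too small" return path is an assumption about how a driver might report an undersized list. The sketch checks its arguments before allocating, so none of its early returns can leak the temporary array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes and attribute type; not the libva definitions. */
enum sketch_status {
   SKETCH_SUCCESS,
   SKETCH_ERROR_INVALID_PARAMETER,
   SKETCH_ERROR_ALLOCATION_FAILED,
   SKETCH_ERROR_MAX_NUM_EXCEEDED
};

struct sketch_attrib {
   int type;
   int value;
};

/* Upper bound reported to callers that only ask for a count. */
#define SKETCH_ATTRIB_MAX 4u

static enum sketch_status
sketch_query_attribs(struct sketch_attrib *list, unsigned *num)
{
   struct sketch_attrib *tmp;
   unsigned n = 0;

   /* Validate every argument before allocating anything. */
   if (!num)
      return SKETCH_ERROR_INVALID_PARAMETER;

   /* First call: the caller only wants to know how much room to reserve. */
   if (!list) {
      *num = SKETCH_ATTRIB_MAX;
      return SKETCH_SUCCESS;
   }

   tmp = calloc(SKETCH_ATTRIB_MAX, sizeof(*tmp));
   if (!tmp)
      return SKETCH_ERROR_ALLOCATION_FAILED;

   /* Two made-up attributes stand in for the real pixel-format entries. */
   tmp[n].type = 1; tmp[n].value = 10; n++;
   tmp[n].type = 2; tmp[n].value = 20; n++;

   /* Second call with too small a list: report the needed size. */
   if (n > *num) {
      *num = n;
      free(tmp);
      return SKETCH_ERROR_MAX_NUM_EXCEEDED;
   }

   memcpy(list, tmp, n * sizeof(*tmp));
   *num = n;
   free(tmp);
   return SKETCH_SUCCESS;
}
```

A caller would invoke sketch_query_attribs(NULL, &n) first, allocate n entries, then call again with the allocated list.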
[Mesa-dev] [PATCH 0/7] st/va: VPP, dmabuf import and headless.
This patch series adds initial support for Video Post Processing. It also
implements VaCreateSurfaces2, both for general use and to import a dmabuf.
Finally, it adds support for headless mode, i.e. using DRM instead of X11
for device setup.

Julien Isorce (7):
  nvc0: fix crash when nv50_miptree_from_handle fails
  st/va: properly defines VAImageFormat formats and improve VaCreateImage
  st/va: in VaPutImage only destroy previous buffer if pipe->create_video_buffer succeeds
  st/va: implement VaCreateSurfaces2 and VaQuerySurfaceAttributes
  st/va: implement dmabuf import for VaCreateSurfaces2
  st/va: add initial Video Post Processing support
  st/va: add headless support, i.e. VA_DISPLAY_DRM

 src/gallium/drivers/nouveau/nvc0/nvc0_resource.c |   3 +-
 src/gallium/state_trackers/va/Makefile.am        |   9 +
 src/gallium/state_trackers/va/config.c           |  20 +
 src/gallium/state_trackers/va/context.c          | 148 ++--
 src/gallium/state_trackers/va/image.c            |  29 +-
 src/gallium/state_trackers/va/picture.c          |  89 -
 src/gallium/state_trackers/va/surface.c          | 449 ---
 src/gallium/state_trackers/va/va_private.h       |  59 ++-
 8 files changed, 706 insertions(+), 100 deletions(-)

-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/10] i965/skl: Enable fast color clears on SKL
But this patch doesn't enable fast clears! The reverts in patches 6 and 7
need to be folded into this patch, otherwise the patch does not do what it
claims.

Also, you can't enable fast clears before patches 4 and 5 without
introducing regressions. Patches 4 and 5 must precede this patch.

On Tue 13 Oct 2015, Ben Widawsky wrote:
> Based on a patch originally from Kristian. Skylake has extended capabilities
> with regard to fast clears, but that is saved for another patch.
>
> The same effect could be acheived with the following, however I think the way
> I've done it is more in line with how the docs explain it.
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -150,9 +150,13 @@ intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
>        /* In release builds, fall through */
>     case I915_TILING_Y:
>        *width_px = 32 / mt->cpp;
> -      *height = 4;
> +      if (brw->gen >= 9)
> +         *height = 2;
> +      else
> +         *height = 4;
>
> Signed-off-by: Ben Widawsky
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 54 +
>  src/mesa/drivers/dri/i965/gen8_surface_state.c  | 34
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  9 +
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h   |  7 +++-
>  4 files changed, 78 insertions(+), 26 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index fbde3f0..7bf52f0 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -204,7 +204,7 @@ brw_draw_rectlist(struct gl_context *ctx, struct rect *rect, int num_instances)
>  }
>
>  static void
> -get_fast_clear_rect(struct gl_framebuffer *fb,
> +get_fast_clear_rect(struct brw_context *brw, struct gl_framebuffer *fb,
>                      struct intel_renderbuffer *irb, struct rect *rect)
>  {
>     unsigned int x_align, y_align;
> @@ -228,7 +228,14 @@ get_fast_clear_rect(struct gl_framebuffer *fb,
>      */
>intel_get_non_msrt_mcs_alignment(irb->mt, &x_align, &y_align); >x_align *= 16; > - y_align *= 32; > + > + /* SKL+ line alignment requirement for Y-tiled are half those of the > prior > + * generations. > + */ > + if (brw->gen >= 9) > + y_align *= 16; > + else > + y_align *= 32; > >/* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render > * Target(s)", beneath the "Fast Color Clear" bullet (p327): > @@ -265,8 +272,10 @@ get_fast_clear_rect(struct gl_framebuffer *fb, > * terms of (width,height) of the RT. > * > * MSAA Width of Clear Rect Height of Clear Rect > + * 2X Ceil(1/8*width) Ceil(1/2*height) > * 4X Ceil(1/8*width) Ceil(1/2*height) > * 8X Ceil(1/2*width) Ceil(1/2*height) > + * 16X widthCeil(1/2*height) > * > * The text "with upper left co-ordinate to coincide with actual > * rectangle being cleared" is a little confusing--it seems to imply > @@ -289,6 +298,9 @@ get_fast_clear_rect(struct gl_framebuffer *fb, >case 8: > x_scaledown = 2; > break; > + case 16: > + x_scaledown = 1; > + break; >default: > unreachable("Unexpected sample count for fast clear"); >} > @@ -358,18 +370,24 @@ is_color_fast_clear_compatible(struct brw_context *brw, > > /** > * Convert the given color to a bitfield suitable for ORing into DWORD 7 of > - * SURFACE_STATE. > + * SURFACE_STATE (DWORD 12-15 on SKL+). 
> */ > -static uint32_t > -compute_fast_clear_color_bits(const union gl_color_union *color) > +static void > +set_fast_clear_color(struct brw_context *brw, > + struct intel_mipmap_tree *mt, > + const union gl_color_union *color) > { > - uint32_t bits = 0; > - for (int i = 0; i < 4; i++) { > - /* Testing for non-0 works for integer and float colors */ > - if (color->f[i] != 0.0f) > - bits |= 1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i)); > + if (brw->gen >= 9) { > + mt->gen9_fast_clear_color = *color; > + } else { > + mt->fast_clear_color_value = 0; > + for (int i = 0; i < 4; i++) { > + /* Testing for non-0 works for integer and float colors */ > + if (color->f[i] != 0.0f) > + mt->fast_clear_color_value |= > +1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i)); Please put braces round the multi-line if-statement. > + } > } > - return bits; > } > > static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 }; > @@ -504,8 +522,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct > gl_framebuffer *fb, > >switch (clear_type) { >case FAST_CLEAR: > - irb->
Re: [Mesa-dev] [PATCH 04/10] i965/skl: skip fast clears for certain surface formats
On Tue 13 Oct 2015, Ben Widawsky wrote: > Initially I had this planned as a patch to be squashed in to the enabling > patch > because there is no point enabling fast clears without this. However, Chad > merged a patch which disables fast clears on gen9 explicitly, and so I can > hide > this behind the revert of that patch. This is a nice I really wanted this > patch > as a distinct patch for review. This is a new, weird, and poorly documented > restriction for SKL. (In fact, I am still not 100% certain the restriction is > entirely necessary, but there are around 30 piglit regressions without this). > > SKL adds compressible render targets and as a result mutates some of the > programming for fast clears and resolves. There is a new internal surface type > called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "The Auxiliary surface > is > a CCS (Color Control Surface) with compression disabled or an MCS with > compression enabled, depending on number of multisamples. MCS (Multisample > Control Surface) is a special type of CCS." > > Signed-off-by: Ben Widawsky > --- > src/mesa/drivers/dri/i965/brw_context.h | 1 + > src/mesa/drivers/dri/i965/brw_surface_formats.c | 27 > + > src/mesa/drivers/dri/i965/gen8_surface_state.c | 8 ++-- > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 3 +++ > 4 files changed, 37 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_context.h > b/src/mesa/drivers/dri/i965/brw_context.h > index e59478a..32b8250 100644 > --- a/src/mesa/drivers/dri/i965/brw_context.h > +++ b/src/mesa/drivers/dri/i965/brw_context.h > @@ -1546,6 +1546,7 @@ struct brw_context > > uint32_t render_target_format[MESA_FORMAT_COUNT]; > bool format_supported_as_render_target[MESA_FORMAT_COUNT]; > + bool losslessly_compressable[MESA_FORMAT_COUNT]; I agree with Neil. It's a shame to increase the context size for static information. And there is already a static array in brw_surface_formats.c for exactly this type of information. 
> > /* Interpolation modes, one byte per vue slot. > * Used Gen4/5 by the clip|sf|wm stages. Ignored on Gen6+. > diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c > b/src/mesa/drivers/dri/i965/brw_surface_formats.c > index 97fff60..d706ecc 100644 > --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c > +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c > @@ -693,6 +693,33 @@ brw_init_surface_formats(struct brw_context *brw) >} > } > > + if (brw->gen >= 9) { > + brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_UNORM16] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_SNORM16] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT16] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT16] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT16] = true; > + brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_RG_SINT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_RG_UINT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBX_FLOAT16] = true; > + brw->losslessly_compressable[MESA_FORMAT_B8G8R8A8_UNORM] = true; > + brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_UNORM] = true; > + brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_SNORM] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT8] = true; > + brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT8] = true; > + brw->losslessly_compressable[MESA_FORMAT_RG_SINT16] = true; > + brw->losslessly_compressable[MESA_FORMAT_RG_UINT16] = true; > + brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT16] = true; > + brw->losslessly_compressable[MESA_FORMAT_R_UINT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_R_SINT32] = true; > + brw->losslessly_compressable[MESA_FORMAT_R_FLOAT32] = true; > + 
brw->losslessly_compressable[MESA_FORMAT_B8G8R8X8_UNORM] = true; > + } Properties of surface formats should go into the monster table that occurs earlier in the file. Then you can replace brw_context::losslessly_compressable with a query and keep brw_context at its current size. > + > /* We will check this table for FBO completeness, but the surface format > * table above only covered color rendering. > */ > diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c > b/src/mesa/drivers/dri/i965/gen8_surface_state.c > index 995b4dd..b19b492 100644 > --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c > +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c > @@ -243,8 +243,10 @@ gen8_emit_texture_surface_state(struct brw_context *brw, > * "When Auxiliary Surface Mode i
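The review suggestion above (keep per-format properties in the static table and answer with a query instead of storing a per-context array) might look roughly like the following sketch. All names are hypothetical and do not match the real table in brw_surface_formats.c; only two formats from the patch are listed, and the R8 entry is purely an illustrative negative case, not a statement about the hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a handful of MESA_FORMAT_* values. */
enum sketch_format {
   SKETCH_FORMAT_RGBA_FLOAT32,
   SKETCH_FORMAT_B8G8R8A8_UNORM,
   SKETCH_FORMAT_R8_UNORM,
   SKETCH_FORMAT_COUNT
};

/* Static, read-only per-format properties shared by every context. */
struct sketch_format_info {
   bool exists;
   int lossless_compression_gen; /* first supporting generation, 0 = never */
};

static const struct sketch_format_info
sketch_format_table[SKETCH_FORMAT_COUNT] = {
   [SKETCH_FORMAT_RGBA_FLOAT32]   = { true, 9 },
   [SKETCH_FORMAT_B8G8R8A8_UNORM] = { true, 9 },
   [SKETCH_FORMAT_R8_UNORM]       = { true, 0 }, /* illustrative only */
};

/* Query that replaces a per-context bool array: no context state needed
 * beyond the hardware generation. */
static bool
sketch_losslessly_compressible(int gen, enum sketch_format format)
{
   const struct sketch_format_info *info;

   if ((unsigned) format >= SKETCH_FORMAT_COUNT)
      return false;

   info = &sketch_format_table[format];
   return info->exists &&
          info->lossless_compression_gen != 0 &&
          gen >= info->lossless_compression_gen;
}
```

Because the table is const and shared, the context structure does not grow, which is the point of the objection raised in the review.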
[Mesa-dev] [PATCH] r600g: Pass conservative depth parameters to hw
Supported on R700 and up. Signed-off-by: Glenn Kennard --- Not exactly a commonly used extension, but might as well set the hardware registers rather than just dropping the hint on the floor. src/gallium/drivers/r600/evergreen_state.c | 13 + src/gallium/drivers/r600/evergreend.h | 7 +++ src/gallium/drivers/r600/r600_shader.c | 1 + src/gallium/drivers/r600/r600_shader.h | 2 ++ src/gallium/drivers/r600/r600_state.c | 15 +++ src/gallium/drivers/r600/r600d.h | 8 6 files changed, 46 insertions(+) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index c6702a9..96c6b11 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -2940,6 +2940,19 @@ void evergreen_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader db_shader_control |= S_02880C_STENCIL_EXPORT_ENABLE(stencil_export); db_shader_control |= S_02880C_MASK_EXPORT_ENABLE(mask_export); + switch (rshader->ps_conservative_z) { + default: /* fall through */ + case TGSI_FS_DEPTH_LAYOUT_ANY: + db_shader_control |= S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_ANY_Z); + break; + case TGSI_FS_DEPTH_LAYOUT_GREATER: + db_shader_control |= S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_GREATER_THAN_Z); + break; + case TGSI_FS_DEPTH_LAYOUT_LESS: + db_shader_control |= S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_LESS_THAN_Z); + break; + } + exports_ps = 0; for (i = 0; i < rshader->noutput; i++) { if (rshader->output[i].name == TGSI_SEMANTIC_POSITION || diff --git a/src/gallium/drivers/r600/evergreend.h b/src/gallium/drivers/r600/evergreend.h index 937ffcb..a9a65f7 100644 --- a/src/gallium/drivers/r600/evergreend.h +++ b/src/gallium/drivers/r600/evergreend.h @@ -815,6 +815,13 @@ #define V_02880C_EXPORT_DB_FOUR16 0x01 #define V_02880C_EXPORT_DB_TWO 0x02 #define S_02880C_ALPHA_TO_MASK_DISABLE(x)(((x) & 0x1) << 12) +#define S_02880C_CONSERVATIVE_Z_EXPORT(x)(((x) & 0x03) << 16) +#define 
G_02880C_CONSERVATIVE_Z_EXPORT(x)(((x) >> 16) & 0x03) +#define C_02880C_CONSERVATIVE_Z_EXPORT 0xFFFC +#define V_02880C_EXPORT_ANY_Z 0 +#define V_02880C_EXPORT_LESS_THAN_Z1 +#define V_02880C_EXPORT_GREATER_THAN_Z 2 +#define V_02880C_EXPORT_RESERVED 3 #define R_028A00_PA_SU_POINT_SIZE0x028A00 #define S_028A00_HEIGHT(x) (((x) & 0x) << 0) diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 8efe902..560696d 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -2048,6 +2048,7 @@ static int r600_shader_from_tgsi(struct r600_context *rctx, shader->fs_write_all = ctx.info.properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS]; shader->vs_position_window_space = ctx.info.properties[TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION]; + shader->ps_conservative_z = ctx.info.properties[TGSI_PROPERTY_FS_DEPTH_LAYOUT]; if (shader->vs_as_gs_a) vs_add_primid_output(&ctx, key.vs.prim_id_out); diff --git a/src/gallium/drivers/r600/r600_shader.h b/src/gallium/drivers/r600/r600_shader.h index c240e71..e085263 100644 --- a/src/gallium/drivers/r600/r600_shader.h +++ b/src/gallium/drivers/r600/r600_shader.h @@ -76,6 +76,8 @@ struct r600_shader { boolean uses_tex_buffers; boolean gs_prim_id_input; + unsignedps_conservative_z; + /* Size in bytes of a data item in the ring(s) (single vertex data). Stages with only one ring items 123 will be set to 0. 
*/ unsignedring_item_sizes[4]; diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c index 1be3e1b..09b2325 100644 --- a/src/gallium/drivers/r600/r600_state.c +++ b/src/gallium/drivers/r600/r600_state.c @@ -2533,6 +2533,21 @@ void r600_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader *sha if (rshader->uses_kill) db_shader_control |= S_02880C_KILL_ENABLE(1); + if (rctx->b.chip_class >= R700) { + switch (rshader->ps_conservative_z) { + default: /* fall through */ + case TGSI_FS_DEPTH_LAYOUT_ANY: + db_shader_control |= S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_ANY_Z); + break; + case TGSI_FS_DEPTH_LAYOUT_GREATER: + db_shader_control |= S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_GREATER_THAN_Z); +
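The S_/G_/C_ register macros in the patch follow a set/get/clear-mask pattern for a 2-bit field at bit 16. A minimal sketch of that pattern is below. The macro and enum names are hypothetical stand-ins rather than the r600 headers, the layout enum values are arbitrary, and unlike the patch (which ORs into a freshly built db_shader_control) this sketch clears the field before setting it. The clear mask is computed from the shift and width instead of being written out by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-bit field occupying bits 17:16 of a 32-bit register. */
#define SKETCH_Z_EXPORT_SHIFT 16
#define SKETCH_Z_EXPORT_MASK  0x3u

/* Set, get and clear-mask macros in the same S_/G_/C_ spirit. */
#define S_SKETCH_Z_EXPORT(x) \
   (((uint32_t)(x) & SKETCH_Z_EXPORT_MASK) << SKETCH_Z_EXPORT_SHIFT)
#define G_SKETCH_Z_EXPORT(x) \
   (((uint32_t)(x) >> SKETCH_Z_EXPORT_SHIFT) & SKETCH_Z_EXPORT_MASK)
#define C_SKETCH_Z_EXPORT \
   (~(SKETCH_Z_EXPORT_MASK << SKETCH_Z_EXPORT_SHIFT))

/* Field values as listed in the patch: any = 0, less = 1, greater = 2. */
#define V_SKETCH_EXPORT_ANY_Z          0u
#define V_SKETCH_EXPORT_LESS_THAN_Z    1u
#define V_SKETCH_EXPORT_GREATER_THAN_Z 2u

/* Hypothetical depth-layout hint; the numeric values are arbitrary. */
enum sketch_depth_layout {
   SKETCH_LAYOUT_NONE,
   SKETCH_LAYOUT_ANY,
   SKETCH_LAYOUT_GREATER,
   SKETCH_LAYOUT_LESS
};

/* Translate the hint to the field value and write it into the register. */
static uint32_t
sketch_set_z_export(uint32_t reg, enum sketch_depth_layout layout)
{
   uint32_t value;

   switch (layout) {
   case SKETCH_LAYOUT_GREATER:
      value = V_SKETCH_EXPORT_GREATER_THAN_Z;
      break;
   case SKETCH_LAYOUT_LESS:
      value = V_SKETCH_EXPORT_LESS_THAN_Z;
      break;
   case SKETCH_LAYOUT_ANY:
   default: /* unknown hints fall back to "any" */
      value = V_SKETCH_EXPORT_ANY_Z;
      break;
   }

   return (reg & C_SKETCH_Z_EXPORT) | S_SKETCH_Z_EXPORT(value);
}
```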
[Mesa-dev] [PATCH] st/mesa: check for out-of-memory in st_DrawPixels()
Before, if make_texture() or st_create_texture_sampler_view() failed we silently no-op'd the glDrawPixels. Now, set GL_OUT_OF_MEMORY. This also allows us to un-nest a bunch of code. v2: also check if allocation of sv[1] fails, per Jose. --- src/mesa/state_tracker/st_cb_drawpixels.c | 76 ++- 1 file changed, 44 insertions(+), 32 deletions(-) diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c index e4d3580..79fb9ec 100644 --- a/src/mesa/state_tracker/st_cb_drawpixels.c +++ b/src/mesa/state_tracker/st_cb_drawpixels.c @@ -975,6 +975,7 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y, int num_sampler_view = 1; struct gl_pixelstore_attrib clippedUnpack; struct st_fp_variant *fpv = NULL; + struct pipe_resource *pt; /* Mesa state should be up to date by now */ assert(ctx->NewState == 0x0); @@ -1030,42 +1031,53 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y, st_upload_constants(st, fpv->parameters, PIPE_SHADER_FRAGMENT); } - /* draw with textured quad */ - { - struct pipe_resource *pt - = make_texture(st, width, height, format, type, unpack, pixels); - if (pt) { - sv[0] = st_create_texture_sampler_view(st->pipe, pt); - - if (sv[0]) { -/* Create a second sampler view to read stencil. - * The stencil is written using the shader stencil export - * functionality. 
*/ -if (write_stencil) { - enum pipe_format stencil_format = - util_format_stencil_only(pt->format); - /* we should not be doing pixel map/transfer (see above) */ - assert(num_sampler_view == 1); - sv[1] = st_create_texture_sampler_view_format(st->pipe, pt, - stencil_format); - num_sampler_view++; -} + /* Put glDrawPixels image into a texture */ + pt = make_texture(st, width, height, format, type, unpack, pixels); + if (!pt) { + _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels"); + return; + } -draw_textured_quad(ctx, x, y, ctx->Current.RasterPos[2], - width, height, - ctx->Pixel.ZoomX, ctx->Pixel.ZoomY, - sv, - num_sampler_view, - driver_vp, - driver_fp, fpv, - color, GL_FALSE, write_depth, write_stencil); -pipe_sampler_view_reference(&sv[0], NULL); -if (num_sampler_view > 1) - pipe_sampler_view_reference(&sv[1], NULL); - } + /* create sampler view for the image */ + sv[0] = st_create_texture_sampler_view(st->pipe, pt); + if (!sv[0]) { + _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels"); + pipe_resource_reference(&pt, NULL); + return; + } + + /* Create a second sampler view to read stencil. The stencil is +* written using the shader stencil export functionality. 
+*/ + if (write_stencil) { + enum pipe_format stencil_format = + util_format_stencil_only(pt->format); + /* we should not be doing pixel map/transfer (see above) */ + assert(num_sampler_view == 1); + sv[1] = st_create_texture_sampler_view_format(st->pipe, pt, +stencil_format); + if (!sv[1]) { + _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels"); pipe_resource_reference(&pt, NULL); + pipe_sampler_view_reference(&sv[0], NULL); + return; } + num_sampler_view++; } + + draw_textured_quad(ctx, x, y, ctx->Current.RasterPos[2], + width, height, + ctx->Pixel.ZoomX, ctx->Pixel.ZoomY, + sv, + num_sampler_view, + driver_vp, + driver_fp, fpv, + color, GL_FALSE, write_depth, write_stencil); + pipe_sampler_view_reference(&sv[0], NULL); + if (num_sampler_view > 1) + pipe_sampler_view_reference(&sv[1], NULL); + + pipe_resource_reference(&pt, NULL); } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
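The un-nested error handling in the patch above releases whatever was already created before each early return. A variant of the same idea, using a single cleanup label instead of the patch's repeated early returns, is sketched below. The allocation helpers, the failure hook and all names are hypothetical and exist only so the sketch can be exercised without a GL context; returning false stands in for raising GL_OUT_OF_MEMORY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical bookkeeping so leaks can be detected in a test. */
static int sketch_live_objects;  /* objects created but not yet released */
static int sketch_alloc_count;   /* allocations attempted in this call */
static int sketch_fail_at = -1;  /* index of the allocation to fail, -1 = none */

static void *
sketch_create(void)
{
   void *p;

   if (sketch_alloc_count++ == sketch_fail_at)
      return NULL; /* simulated out-of-memory */

   p = malloc(1);
   if (p)
      sketch_live_objects++;
   return p;
}

static void
sketch_release(void **obj)
{
   if (*obj) {
      free(*obj);
      *obj = NULL;
      sketch_live_objects--;
   }
}

/* Returns false (the out-of-memory case) if any allocation fails. */
static bool
sketch_draw_pixels(bool write_stencil)
{
   void *pt;
   void *sv[2] = { NULL, NULL };
   bool ok = false;

   sketch_alloc_count = 0;

   pt = sketch_create();          /* stands in for the texture */
   if (!pt)
      return false;

   sv[0] = sketch_create();       /* first sampler view */
   if (!sv[0])
      goto out;

   if (write_stencil) {
      sv[1] = sketch_create();    /* optional stencil sampler view */
      if (!sv[1])
         goto out;
   }

   /* The textured-quad draw would happen here. */
   ok = true;

out:
   /* Releasing a NULL slot is a no-op, so one path serves every case. */
   sketch_release(&sv[1]);
   sketch_release(&sv[0]);
   sketch_release(&pt);
   return ok;
}
```

The single exit path trades the patch's explicit per-failure cleanup for one block that is harder to get out of sync when another resource is added.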
Re: [Mesa-dev] [PATCH 01/10] tnl: add some comments in render_line_loop code
On 10/16/2015 04:14 PM, Matt Turner wrote:
> On Fri, Oct 16, 2015 at 2:25 PM, Brian Paul wrote:
>> And remove '(void) flags' line which is not needed.
>> ---
>>  src/mesa/tnl/t_vb_rendertmp.h | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/tnl/t_vb_rendertmp.h b/src/mesa/tnl/t_vb_rendertmp.h
>> index 44dee76..26a1695 100644
>> --- a/src/mesa/tnl/t_vb_rendertmp.h
>> +++ b/src/mesa/tnl/t_vb_rendertmp.h
>> @@ -124,19 +124,19 @@ static void TAG(render_line_loop)( struct gl_context *ctx,
>>     GLuint i;
>>     LOCAL_VARS;
>>
>> -   (void) flags;
>> -
>>     INIT(GL_LINE_LOOP);
>>
>>     if (start+1 < count) {
>>        if (TEST_PRIM_BEGIN(flags)) {
>>           RESET_STIPPLE;
>> +         /* draw the first line from v[0] to v[1] */
>>           if (ctx->Light.ProvokingVertex == GL_LAST_VERTEX_CONVENTION_EXT)
>>              RENDER_LINE( ELT(start), ELT(start+1) );
>>           else
>>              RENDER_LINE( ELT(start+1), ELT(start) );
>>        }
>>
>> +      /* draw lines from v[1] to V[n-1] */
>
> Presumably lower case v[n-1] here.

I'll fix that before pushing.

-Brian
Re: [Mesa-dev] [PATCH 2/4] st/mesa: check of out-of-memory in st_DrawPixels()
On 10/16/2015 04:11 PM, Jose Fonseca wrote: On 15/10/15 20:01, Brian Paul wrote: Before, if make_texture() or st_create_texture_sampler_view() failed we silently no-op'd the glDrawPixels. Now, set GL_OUT_OF_MEMORY. This also allows us to un-nest a bunch of code. --- src/mesa/state_tracker/st_cb_drawpixels.c | 74 +-- 1 file changed, 40 insertions(+), 34 deletions(-) diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c index e4d3580..05f6e6b 100644 --- a/src/mesa/state_tracker/st_cb_drawpixels.c +++ b/src/mesa/state_tracker/st_cb_drawpixels.c @@ -975,6 +975,7 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y, int num_sampler_view = 1; struct gl_pixelstore_attrib clippedUnpack; struct st_fp_variant *fpv = NULL; + struct pipe_resource *pt; /* Mesa state should be up to date by now */ assert(ctx->NewState == 0x0); @@ -1030,42 +1031,47 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y, st_upload_constants(st, fpv->parameters, PIPE_SHADER_FRAGMENT); } - /* draw with textured quad */ - { - struct pipe_resource *pt - = make_texture(st, width, height, format, type, unpack, pixels); - if (pt) { - sv[0] = st_create_texture_sampler_view(st->pipe, pt); - - if (sv[0]) { -/* Create a second sampler view to read stencil. - * The stencil is written using the shader stencil export - * functionality. 
*/ -if (write_stencil) { - enum pipe_format stencil_format = - util_format_stencil_only(pt->format); - /* we should not be doing pixel map/transfer (see above) */ - assert(num_sampler_view == 1); - sv[1] = st_create_texture_sampler_view_format(st->pipe, pt, - stencil_format); - num_sampler_view++; -} + /* Put glDrawPixels image into a texture */ + pt = make_texture(st, width, height, format, type, unpack, pixels); + if (!pt) { + _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels"); + return; + } -draw_textured_quad(ctx, x, y, ctx->Current.RasterPos[2], - width, height, - ctx->Pixel.ZoomX, ctx->Pixel.ZoomY, - sv, - num_sampler_view, - driver_vp, - driver_fp, fpv, - color, GL_FALSE, write_depth, write_stencil); -pipe_sampler_view_reference(&sv[0], NULL); -if (num_sampler_view > 1) - pipe_sampler_view_reference(&sv[1], NULL); - } - pipe_resource_reference(&pt, NULL); - } + /* create sampler view for the image */ + sv[0] = st_create_texture_sampler_view(st->pipe, pt); + if (!sv[0]) { + _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels"); + pipe_resource_reference(&pt, NULL); + return; } + + /* Create a second sampler view to read stencil. The stencil is +* written using the shader stencil export functionality. +*/ + if (write_stencil) { + enum pipe_format stencil_format = + util_format_stencil_only(pt->format); + /* we should not be doing pixel map/transfer (see above) */ + assert(num_sampler_view == 1); + sv[1] = st_create_texture_sampler_view_format(st->pipe, pt, +stencil_format); Should check null sv[1] here too. Yes, I'll post a v2. -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4 0/6] nouveau: add support for vaapi
On Fri, Oct 16, 2015 at 6:13 PM, Julien Isorce wrote:
>
>
> On 18 September 2015 at 21:34, Ilia Mirkin wrote:
>>
>> On Fri, Sep 18, 2015 at 4:29 PM, Julien Isorce
>> wrote:
>> >
>> >
>> > On 17 September 2015 at 17:52, Ilia Mirkin wrote:
>> >>
>> >> On Wed, Sep 16, 2015 at 8:22 AM, Julien Isorce
>> >> wrote:
>> >> > I added below version4 updates. It works for all codecs expect h264.
>> >> > Video is visible but lot of blockiness.
>> >> > Can someone with a Radeon confirm that "LIBVA_DRIVER_NAME=gallium mpv
>> >> > --hwdec=vaapi"
>> >> > is working on h264 videos ?
>> >> > I want to make sure it is not a bug in st/va.
>> >>
>> >> The sad reality is that h264 is the only thing that matters (at least
>> >> from this list of supported codecs). My concern is that this series
>> >> will regress the situation for people who want to use VA-API -- right
>> >> now they can use the vdpau <-> vaapi adapter, whereas with this patch
>> >> series, they will end up with a va-api driver that doesn't work. So I
>> >> can't merge this as-is.
>> >
>> >
>> > Make perfectly sense.
>> >
>> >>
>> >>
>> >> Are the various lengths (for inter-bo size/etc) being computed
>> >> properly
>> >
>> >
>> > In the past I compared the final content of the nouveau_bo buffer at each
>> > endFrame step , with the content using vdpau. There were the same.
>> > I will re-check.
>> > Does it make sense to do that actually ?
>> > Is there anything else I could compare with vdpau ?
>>
>> If you're feeding the exact same stuff and everything is exactly the
>> same, then the results would also be the same. Clearly there's SOME
>> difference SOMEWHERE :)
>
>
> Indeed I compared only dec->bsp_bo, not dec->inter_bo :) I'll check that.
> But in the first place I do not see where dec->inter_bo is filled. Could you
> point out where this is done ?

nouveau doesn't fill inter_bo -- it's a bo shared between the VLD and
VDEC engines iirc.
Re: [Mesa-dev] [PATCH 01/10] tnl: add some comments in render_line_loop code
On Fri, Oct 16, 2015 at 2:25 PM, Brian Paul wrote:
> And remove '(void) flags' line which is not needed.
> ---
>  src/mesa/tnl/t_vb_rendertmp.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/tnl/t_vb_rendertmp.h b/src/mesa/tnl/t_vb_rendertmp.h
> index 44dee76..26a1695 100644
> --- a/src/mesa/tnl/t_vb_rendertmp.h
> +++ b/src/mesa/tnl/t_vb_rendertmp.h
> @@ -124,19 +124,19 @@ static void TAG(render_line_loop)( struct gl_context *ctx,
>     GLuint i;
>     LOCAL_VARS;
>
> -   (void) flags;
> -
>     INIT(GL_LINE_LOOP);
>
>     if (start+1 < count) {
>        if (TEST_PRIM_BEGIN(flags)) {
>           RESET_STIPPLE;
> +         /* draw the first line from v[0] to v[1] */
>           if (ctx->Light.ProvokingVertex == GL_LAST_VERTEX_CONVENTION_EXT)
>              RENDER_LINE( ELT(start), ELT(start+1) );
>           else
>              RENDER_LINE( ELT(start+1), ELT(start) );
>        }
>
> +      /* draw lines from v[1] to V[n-1] */

Presumably lower case v[n-1] here.
Re: [Mesa-dev] [PATCH v4 0/6] nouveau: add support for vaapi
On 18 September 2015 at 21:34, Ilia Mirkin wrote:

> On Fri, Sep 18, 2015 at 4:29 PM, Julien Isorce
> wrote:
> >
> >
> > On 17 September 2015 at 17:52, Ilia Mirkin wrote:
> >>
> >> On Wed, Sep 16, 2015 at 8:22 AM, Julien Isorce
> >> wrote:
> >> > I added below version4 updates. It works for all codecs expect h264.
> >> > Video is visible but lot of blockiness.
> >> > Can someone with a Radeon confirm that "LIBVA_DRIVER_NAME=gallium mpv
> >> > --hwdec=vaapi"
> >> > is working on h264 videos ?
> >> > I want to make sure it is not a bug in st/va.
> >>
> >> The sad reality is that h264 is the only thing that matters (at least
> >> from this list of supported codecs). My concern is that this series
> >> will regress the situation for people who want to use VA-API -- right
> >> now they can use the vdpau <-> vaapi adapter, whereas with this patch
> >> series, they will end up with a va-api driver that doesn't work. So I
> >> can't merge this as-is.
> >
> >
> > Make perfectly sense.
> >
> >>
> >>
> >> Are the various lengths (for inter-bo size/etc) being computed
> >> properly
> >
> >
> > In the past I compared the final content of the nouveau_bo buffer at each
> > endFrame step , with the content using vdpau. There were the same.
> > I will re-check.
> > Does it make sense to do that actually ?
> > Is there anything else I could compare with vdpau ?
>
> If you're feeding the exact same stuff and everything is exactly the
> same, then the results would also be the same. Clearly there's SOME
> difference SOMEWHERE :)
>

Indeed I compared only dec->bsp_bo, not dec->inter_bo :) I'll check that.
But in the first place I do not see where dec->inter_bo is filled. Could you
point out where this is done ?

>
> >
> >>
> >> Are you writing stuff to the correct inter bo? IIRC we flip
> >> between two of them, perhaps that logic got upset?
> >
> >
> > Probably I missed something. Does the flip happen at each endFrame ?
> > Could you point out where this flip is in the current upstream code exactly ?
>
> I mean stuff like this:
>
>    struct nouveau_bo *bsp_bo = dec->bsp_bo[comm_seq % NOUVEAU_VP3_VIDEO_QDEPTH];
>

I see what you meant by flip now.

>
> src/gallium/drivers/nouveau/nvc0/nvc0_video_bsp.c: bo_refs[1].bo
> = dec->inter_bo[comm_seq & 1] = inter_bo = tmp_bo;
>
> and so on. Make sure that comm_seq is incremented once per frame, not
> once per chunk :)
>

Yes it is incremented just once per frame, not per chunk.
See "nvc0_decoder_begin_frame" in "[PATCH v4 2/6] nvc0: add support for
st/va"

Thx for your comments
Julien

>
> -ilia
>
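The "flip" discussed in this exchange is buffer selection by sequence number: the buffer index is derived from comm_seq, so it changes only when comm_seq does. A minimal sketch of why the counter must advance once per frame rather than once per chunk is below. The struct, the function names and the queue depth of 2 are assumptions made for illustration; they are not taken from the nouveau headers.

```c
#include <assert.h>

/* Assumed ring depth; stands in for NOUVEAU_VP3_VIDEO_QDEPTH. */
#define SKETCH_QDEPTH 2u

/* Hypothetical decoder state reduced to what the indexing needs. */
struct sketch_decoder {
   unsigned comm_seq;        /* advanced once per frame */
   unsigned chunks_in_frame; /* bitstream chunks seen for this frame */
};

static void
sketch_begin_frame(struct sketch_decoder *dec)
{
   dec->comm_seq++;          /* the only place the sequence advances */
   dec->chunks_in_frame = 0;
}

static void
sketch_decode_chunk(struct sketch_decoder *dec)
{
   /* comm_seq is deliberately left alone: all chunks of one frame must
    * land in the same buffers. */
   dec->chunks_in_frame++;
}

/* Index into the bitstream buffer ring, as in bsp_bo[comm_seq % QDEPTH]. */
static unsigned
sketch_bsp_index(const struct sketch_decoder *dec)
{
   return dec->comm_seq % SKETCH_QDEPTH;
}

/* Index into the pair of intermediate buffers, as in inter_bo[comm_seq & 1]. */
static unsigned
sketch_inter_index(const struct sketch_decoder *dec)
{
   return dec->comm_seq & 1u;
}
```

If sketch_decode_chunk also incremented comm_seq, a frame split into several chunks would be spread across both intermediate buffers, which is the failure mode the thread is trying to rule out.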
Re: [Mesa-dev] [PATCH 2/4] st/mesa: check of out-of-memory in st_DrawPixels()
On 15/10/15 20:01, Brian Paul wrote: Before, if make_texture() or st_create_texture_sampler_view() failed we silently no-op'd the glDrawPixels. Now, set GL_OUT_OF_MEMORY. This also allows us to un-nest a bunch of code. --- src/mesa/state_tracker/st_cb_drawpixels.c | 74 +-- 1 file changed, 40 insertions(+), 34 deletions(-) diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c index e4d3580..05f6e6b 100644 --- a/src/mesa/state_tracker/st_cb_drawpixels.c +++ b/src/mesa/state_tracker/st_cb_drawpixels.c @@ -975,6 +975,7 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y, int num_sampler_view = 1; struct gl_pixelstore_attrib clippedUnpack; struct st_fp_variant *fpv = NULL; + struct pipe_resource *pt; /* Mesa state should be up to date by now */ assert(ctx->NewState == 0x0); @@ -1030,42 +1031,47 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y, st_upload_constants(st, fpv->parameters, PIPE_SHADER_FRAGMENT); } - /* draw with textured quad */ - { - struct pipe_resource *pt - = make_texture(st, width, height, format, type, unpack, pixels); - if (pt) { - sv[0] = st_create_texture_sampler_view(st->pipe, pt); - - if (sv[0]) { -/* Create a second sampler view to read stencil. - * The stencil is written using the shader stencil export - * functionality. 
*/ -if (write_stencil) { - enum pipe_format stencil_format = - util_format_stencil_only(pt->format); - /* we should not be doing pixel map/transfer (see above) */ - assert(num_sampler_view == 1); - sv[1] = st_create_texture_sampler_view_format(st->pipe, pt, - stencil_format); - num_sampler_view++; -} + /* Put glDrawPixels image into a texture */ + pt = make_texture(st, width, height, format, type, unpack, pixels); + if (!pt) { + _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels"); + return; + } -draw_textured_quad(ctx, x, y, ctx->Current.RasterPos[2], - width, height, - ctx->Pixel.ZoomX, ctx->Pixel.ZoomY, - sv, - num_sampler_view, - driver_vp, - driver_fp, fpv, - color, GL_FALSE, write_depth, write_stencil); -pipe_sampler_view_reference(&sv[0], NULL); -if (num_sampler_view > 1) - pipe_sampler_view_reference(&sv[1], NULL); - } - pipe_resource_reference(&pt, NULL); - } + /* create sampler view for the image */ + sv[0] = st_create_texture_sampler_view(st->pipe, pt); + if (!sv[0]) { + _mesa_error(ctx, GL_OUT_OF_MEMORY, "glDrawPixels"); + pipe_resource_reference(&pt, NULL); + return; } + + /* Create a second sampler view to read stencil. The stencil is +* written using the shader stencil export functionality. +*/ + if (write_stencil) { + enum pipe_format stencil_format = + util_format_stencil_only(pt->format); + /* we should not be doing pixel map/transfer (see above) */ + assert(num_sampler_view == 1); + sv[1] = st_create_texture_sampler_view_format(st->pipe, pt, +stencil_format); Should check null sv[1] here too. 
+ num_sampler_view++; + } + + draw_textured_quad(ctx, x, y, ctx->Current.RasterPos[2], + width, height, + ctx->Pixel.ZoomX, ctx->Pixel.ZoomY, + sv, + num_sampler_view, + driver_vp, + driver_fp, fpv, + color, GL_FALSE, write_depth, write_stencil); + pipe_sampler_view_reference(&sv[0], NULL); + if (num_sampler_view > 1) + pipe_sampler_view_reference(&sv[1], NULL); + + pipe_resource_reference(&pt, NULL); } Jose ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
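The un-nested flow in the patch, together with the review remark that sv[1] should be checked too, can be sketched outside of Mesa as follows. This is only an illustration under stated assumptions: try_alloc(), release(), draw_pixels_sketch() and run_with_failure() are hypothetical stand-ins (not Mesa or gallium functions), and plain malloc'd objects take the place of the texture and sampler views.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum draw_status { DRAW_OK, DRAW_OUT_OF_MEMORY };

/* Test hooks (hypothetical): make the Nth allocation fail, track leaks. */
static int fail_on_alloc;   /* 1-based index of the failing allocation, 0 = never */
static int alloc_count;
static int live_objects;

/* Stand-in for make_texture() / st_create_texture_sampler_view(). */
static void *
try_alloc(void)
{
   void *p;

   if (++alloc_count == fail_on_alloc)
      return NULL;
   p = malloc(1);
   if (p)
      live_objects++;
   return p;
}

/* Stand-in for the pipe_*_reference(&obj, NULL) calls; NULL is a no-op. */
static void
release(void **p)
{
   if (*p) {
      free(*p);
      *p = NULL;
      live_objects--;
   }
}

/* Early-return structure: every failure path frees what was created so far. */
static enum draw_status
draw_pixels_sketch(bool write_stencil)
{
   void *tex;
   void *sv[2] = { NULL, NULL };

   tex = try_alloc();
   if (!tex)
      return DRAW_OUT_OF_MEMORY;

   sv[0] = try_alloc();
   if (!sv[0]) {
      release(&tex);
      return DRAW_OUT_OF_MEMORY;
   }

   if (write_stencil) {
      sv[1] = try_alloc();
      if (!sv[1]) {
         /* the extra check suggested in the review */
         release(&sv[0]);
         release(&tex);
         return DRAW_OUT_OF_MEMORY;
      }
   }

   /* ... the textured quad would be drawn here ... */

   release(&sv[1]);
   release(&sv[0]);
   release(&tex);
   return DRAW_OK;
}

/* Run the sketch with the given allocation forced to fail. */
static enum draw_status
run_with_failure(int which, bool write_stencil)
{
   fail_on_alloc = which;
   alloc_count = 0;
   return draw_pixels_sketch(write_stencil);
}
```

Calling run_with_failure(3, true) exercises the stencil-view failure path; live_objects should be back at zero afterwards, which is the property the early returns have to preserve.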
Re: [Mesa-dev] [PATCH 2/2] mesa: fix incorrect opcode in save_BlendFunci()
On 15/10/15 15:51, Brian Paul wrote: Fixes assertion failure with new piglit arb_draw_buffers_blend-state_set_get test. Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/main/dlist.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/main/dlist.c b/src/mesa/main/dlist.c index fdb839c..2b65b2e 100644 --- a/src/mesa/main/dlist.c +++ b/src/mesa/main/dlist.c @@ -1400,7 +1400,7 @@ save_BlendFunci(GLuint buf, GLenum sfactor, GLenum dfactor) GET_CURRENT_CONTEXT(ctx); Node *n; ASSERT_OUTSIDE_SAVE_BEGIN_END_AND_FLUSH(ctx); - n = alloc_instruction(ctx, OPCODE_BLEND_FUNC_SEPARATE_I, 3); + n = alloc_instruction(ctx, OPCODE_BLEND_FUNC_I, 3); if (n) { n[1].ui = buf; n[2].e = sfactor; Series is Reviewed-by: Jose Fonseca ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nvc0: add support for performance monitoring metrics on Fermi
On 10/16/2015 11:57 PM, Ilia Mirkin wrote: On Fri, Oct 16, 2015 at 5:35 PM, Samuel Pitoiset wrote: On 10/16/2015 11:22 PM, Ilia Mirkin wrote: On Fri, Oct 16, 2015 at 5:29 PM, Samuel Pitoiset wrote: As explained in the CUDA toolkit documentation, "a metric is a characteristic of an application that is calculated from one or more event values." Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/Makefile.sources | 2 + src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c | 19 +- .../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 444 + .../drivers/nouveau/nvc0/nvc0_query_hw_metric.h| 42 ++ 4 files changed, 504 insertions(+), 3 deletions(-) create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.h diff --git a/src/gallium/drivers/nouveau/Makefile.sources b/src/gallium/drivers/nouveau/Makefile.sources index edc6cf4..c18e9f5 100644 --- a/src/gallium/drivers/nouveau/Makefile.sources +++ b/src/gallium/drivers/nouveau/Makefile.sources @@ -154,6 +154,8 @@ NVC0_C_SOURCES := \ nvc0/nvc0_query.h \ nvc0/nvc0_query_hw.c \ nvc0/nvc0_query_hw.h \ + nvc0/nvc0_query_hw_metric.c \ + nvc0/nvc0_query_hw_metric.h \ nvc0/nvc0_query_hw_sm.c \ nvc0/nvc0_query_hw_sm.h \ nvc0/nvc0_query_sw.c \ diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c index 91254be..90ee82f 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c @@ -25,6 +25,7 @@ #include "nvc0/nvc0_context.h" #include "nvc0/nvc0_query_hw.h" +#include "nvc0/nvc0_query_hw_metric.h" #include "nvc0/nvc0_query_hw_sm.h" #define NVC0_HW_QUERY_STATE_READY 0 @@ -371,6 +372,12 @@ nvc0_hw_create_query(struct nvc0_context *nvc0, unsigned type, unsigned index) return (struct nvc0_query *)hq; } + hq = nvc0_hw_metric_create_query(nvc0, type); + if (hq) { + hq->base.funcs = &hw_query_funcs; + return (struct nvc0_query *)hq; + } + hq 
= CALLOC_STRUCT(nvc0_hw_query); if (!hq) return NULL; @@ -435,14 +442,20 @@ int nvc0_hw_get_driver_query_info(struct nvc0_screen *screen, unsigned id, struct pipe_driver_query_info *info) { - int num_hw_sm_queries = 0; + int num_hw_sm_queries = 0, num_hw_metric_queries = 0; num_hw_sm_queries = nvc0_hw_sm_get_driver_query_info(screen, 0, NULL); + num_hw_metric_queries = + nvc0_hw_metric_get_driver_query_info(screen, 0, NULL); if (!info) - return num_hw_sm_queries; + return num_hw_sm_queries + num_hw_metric_queries; + + if (id < num_hw_sm_queries) + return nvc0_hw_sm_get_driver_query_info(screen, id, info); - return nvc0_hw_sm_get_driver_query_info(screen, id, info); + return nvc0_hw_metric_get_driver_query_info(screen, + id - num_hw_sm_queries, info); } void diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c new file mode 100644 index 000..dbe350a --- /dev/null +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c @@ -0,0 +1,444 @@ +/* + * Copyright 2015 Samuel Pitoiset + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ + +#include "nvc0/nvc0_context.h" +#include "nvc0/nvc0_query_hw_metric.h" +#include "nvc0/nvc0_query_hw_sm.h" + +/* === PERFORMANCE MONITORING METRICS for NVC0:NVE4 === */ +static const char *nvc0_hw_metric_names[] = +{ + "metric-achieved_occupancy", + "metric-branch_efficiency", + "metric-inst_issued", + "metric-inst_per_warp", + "metric-inst_replay_overhead", + "metric-issued_ipc", + "metric-issue_slots", + "metric-issue_slot_utilization", + "metric-i
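The id dispatch that the nvc0_hw_get_driver_query_info() hunk earlier in this message introduces (SM queries first, metric queries after them, with the id rebased for the second group) can be sketched in isolation. This is a simplified illustration, not the driver code: struct query_info, group_get_info() and the name tables are hypothetical, and the "info == NULL returns the count" convention is taken from how the diff calls the per-group functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct pipe_driver_query_info. */
struct query_info {
   const char *name;
};

/* Made-up names; the real tables live in the driver. */
static const char *const sm_names[] = { "sm-event-0", "sm-event-1", "sm-event-2" };
static const char *const metric_names[] = { "metric-a", "metric-b" };

#define COUNT_OF(a) ((unsigned)(sizeof(a) / sizeof((a)[0])))

/* With info == NULL, return the number of queries in the group.
 * Otherwise fill in 'info' and return 1, or return 0 if id is out of range.
 */
static int
group_get_info(const char *const *names, unsigned count,
               unsigned id, struct query_info *info)
{
   if (!info)
      return (int)count;
   if (id >= count)
      return 0;
   info->name = names[id];
   return 1;
}

/* Ids [0, num_sm) go to the SM group, the rest to the metric group. */
static int
get_driver_query_info(unsigned id, struct query_info *info)
{
   const unsigned num_sm =
      (unsigned)group_get_info(sm_names, COUNT_OF(sm_names), 0, NULL);
   const unsigned num_metric =
      (unsigned)group_get_info(metric_names, COUNT_OF(metric_names), 0, NULL);

   if (!info)
      return (int)(num_sm + num_metric);

   if (id < num_sm)
      return group_get_info(sm_names, COUNT_OF(sm_names), id, info);

   /* rebase the id so the metric group sees indices starting at 0 */
   return group_get_info(metric_names, COUNT_OF(metric_names),
                         id - num_sm, info);
}

/* Storage for the usage example. */
static struct query_info example_info;
```

With three SM entries and two metric entries, id 3 resolves to the first metric entry, and any id of 5 or more is rejected by the metric group's own range check.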
Re: [Mesa-dev] [PATCH] nvc0: add support for performance monitoring metrics on Fermi
On Fri, Oct 16, 2015 at 5:35 PM, Samuel Pitoiset wrote: > > > On 10/16/2015 11:22 PM, Ilia Mirkin wrote: >> >> On Fri, Oct 16, 2015 at 5:29 PM, Samuel Pitoiset >> wrote: >>> >>> As explained in the CUDA toolkit documentation, "a metric is a >>> characteristic of an application that is calculated from one or more >>> event values." >>> >>> Signed-off-by: Samuel Pitoiset >>> --- >>> src/gallium/drivers/nouveau/Makefile.sources | 2 + >>> src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c | 19 +- >>> .../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 444 >>> + >>> .../drivers/nouveau/nvc0/nvc0_query_hw_metric.h| 42 ++ >>> 4 files changed, 504 insertions(+), 3 deletions(-) >>> create mode 100644 >>> src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c >>> create mode 100644 >>> src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.h >>> >>> diff --git a/src/gallium/drivers/nouveau/Makefile.sources >>> b/src/gallium/drivers/nouveau/Makefile.sources >>> index edc6cf4..c18e9f5 100644 >>> --- a/src/gallium/drivers/nouveau/Makefile.sources >>> +++ b/src/gallium/drivers/nouveau/Makefile.sources >>> @@ -154,6 +154,8 @@ NVC0_C_SOURCES := \ >>> nvc0/nvc0_query.h \ >>> nvc0/nvc0_query_hw.c \ >>> nvc0/nvc0_query_hw.h \ >>> + nvc0/nvc0_query_hw_metric.c \ >>> + nvc0/nvc0_query_hw_metric.h \ >>> nvc0/nvc0_query_hw_sm.c \ >>> nvc0/nvc0_query_hw_sm.h \ >>> nvc0/nvc0_query_sw.c \ >>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c >>> b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c >>> index 91254be..90ee82f 100644 >>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c >>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c >>> @@ -25,6 +25,7 @@ >>> >>> #include "nvc0/nvc0_context.h" >>> #include "nvc0/nvc0_query_hw.h" >>> +#include "nvc0/nvc0_query_hw_metric.h" >>> #include "nvc0/nvc0_query_hw_sm.h" >>> >>> #define NVC0_HW_QUERY_STATE_READY 0 >>> @@ -371,6 +372,12 @@ nvc0_hw_create_query(struct nvc0_context *nvc0, >>> unsigned type, unsigned index) >>> return 
(struct nvc0_query *)hq; >>> } >>> >>> + hq = nvc0_hw_metric_create_query(nvc0, type); >>> + if (hq) { >>> + hq->base.funcs = &hw_query_funcs; >>> + return (struct nvc0_query *)hq; >>> + } >>> + >>> hq = CALLOC_STRUCT(nvc0_hw_query); >>> if (!hq) >>> return NULL; >>> @@ -435,14 +442,20 @@ int >>> nvc0_hw_get_driver_query_info(struct nvc0_screen *screen, unsigned id, >>> struct pipe_driver_query_info *info) >>> { >>> - int num_hw_sm_queries = 0; >>> + int num_hw_sm_queries = 0, num_hw_metric_queries = 0; >>> >>> num_hw_sm_queries = nvc0_hw_sm_get_driver_query_info(screen, 0, >>> NULL); >>> + num_hw_metric_queries = >>> + nvc0_hw_metric_get_driver_query_info(screen, 0, NULL); >>> >>> if (!info) >>> - return num_hw_sm_queries; >>> + return num_hw_sm_queries + num_hw_metric_queries; >>> + >>> + if (id < num_hw_sm_queries) >>> + return nvc0_hw_sm_get_driver_query_info(screen, id, info); >>> >>> - return nvc0_hw_sm_get_driver_query_info(screen, id, info); >>> + return nvc0_hw_metric_get_driver_query_info(screen, >>> + id - num_hw_sm_queries, >>> info); >>> } >>> >>> void >>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c >>> b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c >>> new file mode 100644 >>> index 000..dbe350a >>> --- /dev/null >>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c >>> @@ -0,0 +1,444 @@ >>> +/* >>> + * Copyright 2015 Samuel Pitoiset >>> + * >>> + * Permission is hereby granted, free of charge, to any person obtaining >>> a >>> + * copy of this software and associated documentation files (the >>> "Software"), >>> + * to deal in the Software without restriction, including without >>> limitation >>> + * the rights to use, copy, modify, merge, publish, distribute, >>> sublicense, >>> + * and/or sell copies of the Software, and to permit persons to whom the >>> + * Software is furnished to do so, subject to the following conditions: >>> + * >>> + * The above copyright notice and this permission notice shall 
be >>> included in >>> + * all copies or substantial portions of the Software. >>> + * >>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >>> EXPRESS OR >>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >>> MERCHANTABILITY, >>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT >>> SHALL >>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR >>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, >>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR >>> + * OTHER DEALINGS IN THE SOFTWARE. >>> + */ >>> + >>> +#include "nvc0/nvc0_context.h" >>> +#include "nvc0/nvc
[Mesa-dev] [Bug 81174] Gallium: GL_LINE_LOOP broken with more than 512 points
https://bugs.freedesktop.org/show_bug.cgi?id=81174

--- Comment #16 from Brian Paul ---
I'm digging into this bug because it pertains to an issue with a particular app and the VMware gallium driver.

The VBO code for splitting GL_LINE_LOOP is actually correct, I believe, but our implementations of vbo_context::draw_prims(), such as st_draw_vbo() and brw_draw_prims(), are subtly broken. And have been broken since day one!

The issue comes from the 'begin' and 'end' flags in the _mesa_prim structure. These flags indicate whether the primitive's vertices start at a glBegin() and whether the prim's vertices end at a glEnd().

Suppose we have a long GL_LINE_LOOP that gets broken into three pieces. Here are the flags for the three _mesa_prims that we draw:

_mesa_prim[0].begin = 1
_mesa_prim[0].end = 0
_mesa_prim[1].begin = 0
_mesa_prim[1].end = 0
_mesa_prim[2].begin = 0
_mesa_prim[2].end = 1

For all three drawing calls, the 0th vertex in the primitive's vertex buffer will be a copy of the first glVertex() issued after glBegin.

If N is the number of vertices in the _mesa_prim:

For _mesa_prim[0] we should draw the line segments from v[0] .. v[N-1]
For _mesa_prim[1] we should draw the line segments from v[1] .. v[N-1]
For _mesa_prim[2] we should draw the line segments from v[1] .. v[N-1] and an extra line from v[N-1] to v[0]

You can see this in the old 'tnl' code's t_vb_rendertmp.h code for GL_LINE_LOOP.

Our implementations of draw_prims() ignore those flags and always draw v[0] .. v[N-1], so we get the stray lines that people are seeing.

Furthermore, draw_prims() is supposed to look at the 'begin' flag to know when to reset the line stipple counter. We don't do that in the state tracker either.

I've posted a patch series that fixes this. The basic idea is that when we have to split a GL_LINE_LOOP, we draw the pieces with GL_LINE_STRIP instead so that drivers don't need to worry about the 'begin' and 'end' flags (except where the stipple counter matters). Drivers will only get a GL_LINE_LOOP when all the vertices live in one vertex buffer.

Unfortunately, after fixing the VBO code, there's still a bug somewhere in the gallium 'draw' code. See comments in the patch series for more information.
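The per-piece rules from the comment above can be written down as a small function. This is a minimal sketch under the comment's own assumptions (vertex 0 of every piece is a copy of the first glVertex() after glBegin); struct segment, line_loop_piece_segments() and the scratch array are hypothetical names, not Mesa code.

```c
#include <assert.h>
#include <stdbool.h>

/* One line segment, as a pair of vertex indices within the piece. */
struct segment {
   unsigned a, b;
};

/* List the segments that should be drawn for one piece of a split
 * GL_LINE_LOOP holding 'count' vertices.
 * Returns the number of segments; at most 'max_out' are written to 'out'.
 */
static unsigned
line_loop_piece_segments(unsigned count, bool begin, bool end,
                         struct segment *out, unsigned max_out)
{
   /* Only the first piece draws from vertex 0; in later pieces vertex 0
    * is just the saved copy of the loop's very first vertex.
    */
   const unsigned first = begin ? 0 : 1;
   unsigned n = 0;
   unsigned i;

   for (i = first; i + 1 < count; i++) {
      if (n < max_out)
         out[n] = (struct segment){ i, i + 1 };
      n++;
   }

   /* The last piece closes the loop back to the first vertex. */
   if (end && count > 1) {
      if (n < max_out)
         out[n] = (struct segment){ count - 1, 0 };
      n++;
   }

   return n;
}

/* Output storage for the usage example. */
static struct segment scratch[8];
```

For a four-vertex piece this gives 3 segments for the first piece (begin=1, end=0), 2 for a middle piece, and 3 for the last piece including the closing line. A draw_prims() that ignores the flags would draw v[0]-v[1] in the middle and last pieces as well, which is the stray line.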
[Mesa-dev] [PATCH 10/10] vbo: convert display list GL_LINE_LOOP prims to GL_LINE_STRIP
When a long GL_LINE_LOOP prim was split across primitives we drew stray lines. See previous commit for details. This patch converts GL_LINE_LOOP prims into GL_LINE_STRIP prims so that drivers don't have to worry about the _mesa_prim::begin/end flags. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81174 --- src/mesa/vbo/vbo_save_api.c | 53 + 1 file changed, 53 insertions(+) diff --git a/src/mesa/vbo/vbo_save_api.c b/src/mesa/vbo/vbo_save_api.c index 6688ba0..d49aa15 100644 --- a/src/mesa/vbo/vbo_save_api.c +++ b/src/mesa/vbo/vbo_save_api.c @@ -360,6 +360,51 @@ merge_prims(struct _mesa_prim *prim_list, *prim_count = prev_prim - prim_list + 1; } + +/** + * Convert GL_LINE_LOOP primitive into GL_LINE_STRIP so that drivers + * don't have to worry about handling the _mesa_prim::begin/end flags. + * See https://bugs.freedesktop.org/show_bug.cgi?id=81174 + */ +static void +convert_line_loop_to_strip(struct vbo_save_context *save, + struct vbo_save_vertex_list *node) +{ + struct _mesa_prim *prim = &node->prim[node->prim_count - 1]; + + assert(prim->mode == GL_LINE_LOOP); + + if (prim->end) { + /* Copy the 0th vertex to end of the buffer and extend the + * vertex count by one to finish the line loop. + */ + const GLuint sz = save->vertex_size; + /* 0th vertex: */ + const fi_type *src = save->buffer + prim->start * sz; + /* end of buffer: */ + fi_type *dst = save->buffer + (prim->start + prim->count) * sz; + + memcpy(dst, src, sz * sizeof(float)); + + prim->count++; + node->count++; + save->vert_count++; + save->buffer_ptr += sz; + save->vertex_store->used += sz; + } + + if (!prim->begin) { + /* Drawing the second or later section of a long line loop. + * Skip the 0th vertex. + */ + prim->start++; + prim->count--; + } + + prim->mode = GL_LINE_STRIP; +} + + /** * Insert the active immediate struct onto the display list currently * being built. 
@@ -441,6 +486,10 @@ _save_compile_vertex_list(struct gl_context *ctx) */ save->copied.nr = _save_copy_vertices(ctx, node, save->buffer); + if (node->prim[node->prim_count - 1].mode == GL_LINE_LOOP) { + convert_line_loop_to_strip(save, node); + } + merge_prims(node->prim, &node->prim_count); /* Deal with GL_COMPILE_AND_EXECUTE: @@ -482,6 +531,10 @@ _save_compile_vertex_list(struct gl_context *ctx) save->buffer_ptr = vbo_save_map_vertex_store(ctx, save->vertex_store); save->out_of_memory = save->buffer_ptr == NULL; } + else { + /* update buffer_ptr for next vertex */ + save->buffer_ptr = save->vertex_store->buffer + save->vertex_store->used; + } if (save->prim_store->used > VBO_SAVE_PRIM_SIZE - 6) { save->prim_store->refcount--; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
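To see what convert_line_loop_to_strip() does to the primitive and the vertex data, here is a stripped-down sketch on a plain float array. It is an illustration only: struct prim, loop_to_strip(), example_buf and run_example() are hypothetical, the bookkeeping on the save context (vert_count, buffer_ptr, vertex_store->used) is left out, and the caller is assumed to have reserved room for one extra vertex.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in for _mesa_prim; field names mirror the patch. */
struct prim {
   unsigned start;   /* index of the first vertex */
   unsigned count;   /* number of vertices */
   bool begin;       /* piece starts at glBegin */
   bool end;         /* piece ends at glEnd */
   bool is_loop;     /* true: line loop, false: line strip */
};

/* 'buf' holds vertices of 'vertex_size' floats each and must have space
 * for one more vertex after the primitive's last one.
 */
static void
loop_to_strip(struct prim *p, float *buf, unsigned vertex_size)
{
   assert(p->is_loop);

   if (p->end) {
      /* Close the loop: append a copy of the 0th vertex. */
      const float *src = buf + p->start * vertex_size;
      float *dst = buf + (p->start + p->count) * vertex_size;

      memcpy(dst, src, vertex_size * sizeof(float));
      p->count++;
   }

   if (!p->begin) {
      /* Second or later piece: skip the saved copy of the 0th vertex. */
      p->start++;
      p->count--;
   }

   p->is_loop = false;
}

/* Example data: four 2D vertices plus one spare slot. */
static float example_buf[5 * 2];

static struct prim
run_example(bool begin, bool end)
{
   static const float square[4 * 2] = { 2, 3,  5, 3,  5, 7,  2, 7 };
   struct prim p = { 0, 4, begin, end, true };

   memcpy(example_buf, square, sizeof(square));
   example_buf[8] = -1.0f;   /* mark the spare slot as unused */
   example_buf[9] = -1.0f;

   loop_to_strip(&p, example_buf, 2);
   return p;
}
```

A complete loop (begin and end both set) grows from 4 to 5 vertices, with the fifth equal to the first. A final piece without 'begin' keeps count 4 but starts at vertex 1, so the strip runs v[1] .. v[3] and then back to the copied first vertex.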
[Mesa-dev] [PATCH 04/10] vbo: replace the comment on vbo_copy_vertices()
--- src/mesa/vbo/vbo_exec_draw.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/src/mesa/vbo/vbo_exec_draw.c b/src/mesa/vbo/vbo_exec_draw.c index 174cbc3..781991b 100644 --- a/src/mesa/vbo/vbo_exec_draw.c +++ b/src/mesa/vbo/vbo_exec_draw.c @@ -64,9 +64,13 @@ vbo_exec_debug_verts( struct vbo_exec_context *exec ) } -/* - * NOTE: Need to have calculated primitives by this point -- do it on the fly. - * NOTE: Old 'parity' issue is gone. +/** + * Copy zero, one or two vertices from the current vertex buffer into + * the temporary "copy" buffer. + * This is used when a single primitive overflows a vertex buffer and + * we need to continue the primitive in a new vertex buffer. + * The temporary "copy" buffer holds the vertices which need to get + * copied from the old buffer to the new one. */ static GLuint vbo_copy_vertices( struct vbo_exec_context *exec ) -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/10] vbo: remove unneeded ctx parameter for merge_prims()
--- src/mesa/vbo/vbo_save_api.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/mesa/vbo/vbo_save_api.c b/src/mesa/vbo/vbo_save_api.c index fdc677f..6688ba0 100644 --- a/src/mesa/vbo/vbo_save_api.c +++ b/src/mesa/vbo/vbo_save_api.c @@ -330,8 +330,7 @@ _save_reset_counters(struct gl_context *ctx) * previous prim. */ static void -merge_prims(struct gl_context *ctx, -struct _mesa_prim *prim_list, +merge_prims(struct _mesa_prim *prim_list, GLuint *prim_count) { GLuint i; @@ -442,7 +441,7 @@ _save_compile_vertex_list(struct gl_context *ctx) */ save->copied.nr = _save_copy_vertices(ctx, node, save->buffer); - merge_prims(ctx, node->prim, &node->prim_count); + merge_prims(node->prim, &node->prim_count); /* Deal with GL_COMPILE_AND_EXECUTE: */ -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/10] vbo: simplify some code in vbo_exec_End()
--- src/mesa/vbo/vbo_exec_api.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/src/mesa/vbo/vbo_exec_api.c b/src/mesa/vbo/vbo_exec_api.c index 2a78eac..903aa42 100644 --- a/src/mesa/vbo/vbo_exec_api.c +++ b/src/mesa/vbo/vbo_exec_api.c @@ -823,11 +823,10 @@ static void GLAPIENTRY vbo_exec_End( void ) if (exec->vtx.prim_count > 0) { /* close off current primitive */ - int idx = exec->vtx.vert_count; - int i = exec->vtx.prim_count - 1; + struct _mesa_prim *last_prim = &exec->vtx.prim[exec->vtx.prim_count - 1]; - exec->vtx.prim[i].end = 1; - exec->vtx.prim[i].count = idx - exec->vtx.prim[i].start; + last_prim->end = 1; + last_prim->count = exec->vtx.vert_count - last_prim->start; try_vbo_merge(exec); } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/10] vbo: add new vbo_compute_max_verts() helper function
--- src/mesa/vbo/vbo_context.h | 14 ++ src/mesa/vbo/vbo_exec_api.c | 3 +-- src/mesa/vbo/vbo_exec_draw.c | 3 +-- 3 files changed, 16 insertions(+), 4 deletions(-) diff --git a/src/mesa/vbo/vbo_context.h b/src/mesa/vbo/vbo_context.h index a376efe..1e85335 100644 --- a/src/mesa/vbo/vbo_context.h +++ b/src/mesa/vbo/vbo_context.h @@ -196,6 +196,20 @@ vbo_get_default_vals_as_union(GLenum format) } } + +/** + * Compute the max number of vertices which can be stored in + * a vertex buffer, given the current vertex size, and the amount + * of space already used. + */ +static inline unsigned +vbo_compute_max_verts(const struct vbo_exec_context *exec) +{ + return (VBO_VERT_BUFFER_SIZE - exec->vtx.buffer_used) / + (exec->vtx.vertex_size * sizeof(GLfloat)); +} + + #ifdef __cplusplus } // extern "C" #endif diff --git a/src/mesa/vbo/vbo_exec_api.c b/src/mesa/vbo/vbo_exec_api.c index 903aa42..0116f56 100644 --- a/src/mesa/vbo/vbo_exec_api.c +++ b/src/mesa/vbo/vbo_exec_api.c @@ -293,8 +293,7 @@ vbo_exec_wrap_upgrade_vertex(struct vbo_exec_context *exec, */ exec->vtx.attrsz[attr] = newSize; exec->vtx.vertex_size += newSize - oldSize; - exec->vtx.max_vert = ((VBO_VERT_BUFFER_SIZE - exec->vtx.buffer_used) / - (exec->vtx.vertex_size * sizeof(GLfloat))); + exec->vtx.max_vert = vbo_compute_max_verts(exec); exec->vtx.vert_count = 0; exec->vtx.buffer_ptr = exec->vtx.buffer_map; diff --git a/src/mesa/vbo/vbo_exec_draw.c b/src/mesa/vbo/vbo_exec_draw.c index 412ebb6..397fc09 100644 --- a/src/mesa/vbo/vbo_exec_draw.c +++ b/src/mesa/vbo/vbo_exec_draw.c @@ -436,8 +436,7 @@ vbo_exec_vtx_flush(struct vbo_exec_context *exec, GLboolean keepUnmapped) if (keepUnmapped || exec->vtx.vertex_size == 0) exec->vtx.max_vert = 0; else - exec->vtx.max_vert = ((VBO_VERT_BUFFER_SIZE - exec->vtx.buffer_used) / -(exec->vtx.vertex_size * sizeof(GLfloat))); + exec->vtx.max_vert = vbo_compute_max_verts(exec); exec->vtx.buffer_ptr = exec->vtx.buffer_map; exec->vtx.prim_count = 0; -- 1.9.1 ___ mesa-dev mailing list 
mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
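The arithmetic behind vbo_compute_max_verts() is just "remaining bytes divided by bytes per vertex". A standalone sketch, assuming a 64 KiB buffer (BUFFER_SIZE_BYTES is a placeholder here and is not claimed to be Mesa's VBO_VERT_BUFFER_SIZE) and a vertex size given in floats as in the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed buffer size for this sketch only. */
#define BUFFER_SIZE_BYTES ((size_t)64 * 1024)

/* Number of whole vertices that still fit in the buffer.
 * 'vertex_size_floats' is the vertex size in floats, 'bytes_used' the
 * number of bytes already consumed.
 */
static unsigned
compute_max_verts(size_t bytes_used, unsigned vertex_size_floats)
{
   /* The patch's helper has no such guard; its caller in
    * vbo_exec_vtx_flush() checks vertex_size == 0 before calling it.
    */
   if (vertex_size_floats == 0 || bytes_used >= BUFFER_SIZE_BYTES)
      return 0;

   return (unsigned)((BUFFER_SIZE_BYTES - bytes_used) /
                     (vertex_size_floats * sizeof(float)));
}
```

With 4-float vertices (16 bytes on platforms where float is 4 bytes) an empty 64 KiB buffer holds 4096 vertices, and each vertex already written reduces that by one.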
[Mesa-dev] [PATCH 06/10] vbo: simplify some code in vbo_exec_debug_verts()
As before, use a new 'last_prim' pointer to simplify things. Plus, add some const qualifiers. --- src/mesa/vbo/vbo_exec_draw.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/mesa/vbo/vbo_exec_draw.c b/src/mesa/vbo/vbo_exec_draw.c index 781991b..412ebb6 100644 --- a/src/mesa/vbo/vbo_exec_draw.c +++ b/src/mesa/vbo/vbo_exec_draw.c @@ -75,13 +75,13 @@ vbo_exec_debug_verts( struct vbo_exec_context *exec ) static GLuint vbo_copy_vertices( struct vbo_exec_context *exec ) { - GLuint nr = exec->vtx.prim[exec->vtx.prim_count-1].count; + struct _mesa_prim *last_prim = &exec->vtx.prim[exec->vtx.prim_count - 1]; + const GLuint nr = last_prim->count; GLuint ovf, i; - GLuint sz = exec->vtx.vertex_size; + const GLuint sz = exec->vtx.vertex_size; fi_type *dst = exec->vtx.copied.buffer; - const fi_type *src = (exec->vtx.buffer_map + - exec->vtx.prim[exec->vtx.prim_count-1].start * - exec->vtx.vertex_size); + const fi_type *src = + exec->vtx.buffer_map + last_prim->start * exec->vtx.vertex_size; switch (exec->ctx->Driver.CurrentExecPrimitive) { case GL_POINTS: @@ -127,7 +127,7 @@ vbo_copy_vertices( struct vbo_exec_context *exec ) case GL_TRIANGLE_STRIP: /* no parity issue, but need to make sure the tri is not drawn twice */ if (nr & 1) { -exec->vtx.prim[exec->vtx.prim_count-1].count--; +last_prim->count--; } /* fallthrough */ case GL_QUAD_STRIP: -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/10] tnl: add some comments in render_line_loop code
And remove '(void) flags' line which is not needed. --- src/mesa/tnl/t_vb_rendertmp.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/mesa/tnl/t_vb_rendertmp.h b/src/mesa/tnl/t_vb_rendertmp.h index 44dee76..26a1695 100644 --- a/src/mesa/tnl/t_vb_rendertmp.h +++ b/src/mesa/tnl/t_vb_rendertmp.h @@ -124,19 +124,19 @@ static void TAG(render_line_loop)( struct gl_context *ctx, GLuint i; LOCAL_VARS; - (void) flags; - INIT(GL_LINE_LOOP); if (start+1 < count) { if (TEST_PRIM_BEGIN(flags)) { RESET_STIPPLE; + /* draw the first line from v[0] to v[1] */ if (ctx->Light.ProvokingVertex == GL_LAST_VERTEX_CONVENTION_EXT) RENDER_LINE( ELT(start), ELT(start+1) ); else RENDER_LINE( ELT(start+1), ELT(start) ); } + /* draw lines from v[1] to V[n-1] */ for ( i = start+2 ; i < count ; i++) { if (ctx->Light.ProvokingVertex == GL_LAST_VERTEX_CONVENTION_EXT) RENDER_LINE( ELT(i-1), ELT(i) ); @@ -145,6 +145,7 @@ static void TAG(render_line_loop)( struct gl_context *ctx, } if ( TEST_PRIM_END(flags)) { + /* draw final line from v[n-1] to v[0] (the very first vertex) */ if (ctx->Light.ProvokingVertex == GL_LAST_VERTEX_CONVENTION_EXT) RENDER_LINE( ELT(count-1), ELT(start) ); else -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/10] vbo: fix GL_LINE_LOOP stray line bug
When long GL_LINE_LOOP primitives don't fit in one vertex buffer they have to be split across buffers. The code to do this was basically correct but drivers had to pay special attention to the _mesa_prim::begin/end flags in order to draw the sections of the line loop properly. Apparently, the only drivers to do this were those using the old 'tnl' module for software vertex processing. Now we convert the split pieces of GL_LINE_LOOP prims into GL_LINE_STRIP primitives so that drivers don't have to worry about the special begin/end flags. The only time a driver will get a GL_LINE_LOOP prim is when the whole thing fits in one vertex buffer. Mostly fixes bug 81174, but not completely. There's another bug somewhere in the src/gallium/auxiliary/draw/ code. If the piglit lineloop test is run with -count 4096, rendering is correct, but with -count 4097 there are stray lines. 4096 is a magic number in the draw code (search for "4096"). Also note that this does not fix long line loops in display lists. The next patch fixes that. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81174 --- src/mesa/vbo/vbo_context.h | 5 - src/mesa/vbo/vbo_exec_api.c | 38 +- src/mesa/vbo/vbo_exec_draw.c | 12 3 files changed, 53 insertions(+), 2 deletions(-) diff --git a/src/mesa/vbo/vbo_context.h b/src/mesa/vbo/vbo_context.h index 1e85335..28f43b4 100644 --- a/src/mesa/vbo/vbo_context.h +++ b/src/mesa/vbo/vbo_context.h @@ -205,7 +205,10 @@ vbo_get_default_vals_as_union(GLenum format) static inline unsigned vbo_compute_max_verts(const struct vbo_exec_context *exec) { - return (VBO_VERT_BUFFER_SIZE - exec->vtx.buffer_used) / + /* Subtract one so we're always sure to have room for an extra +* vertex for GL_LINE_LOOP -> GL_LINE_STRIP conversion. 
+*/ + return (VBO_VERT_BUFFER_SIZE - exec->vtx.buffer_used - 1) / (exec->vtx.vertex_size * sizeof(GLfloat)); } diff --git a/src/mesa/vbo/vbo_exec_api.c b/src/mesa/vbo/vbo_exec_api.c index 0116f56..3c25ec0 100644 --- a/src/mesa/vbo/vbo_exec_api.c +++ b/src/mesa/vbo/vbo_exec_api.c @@ -61,7 +61,8 @@ static void reset_attrfv( struct vbo_exec_context *exec ); /** * Close off the last primitive, execute the buffer, restart the - * primitive. + * primitive. This is called when we fill a vertex buffer before + * hitting glEnd. */ static void vbo_exec_wrap_buffers( struct vbo_exec_context *exec ) { @@ -83,6 +84,22 @@ static void vbo_exec_wrap_buffers( struct vbo_exec_context *exec ) last_count = last_prim->count; + /* Special handling for wrapping GL_LINE_LOOP */ + if (last_prim->mode == GL_LINE_LOOP && + last_count > 0 && + !last_prim->end) { + /* draw this section of the incomplete line loop as a line strip */ + last_prim->mode = GL_LINE_STRIP; + if (!last_prim->begin) { +/* This is not the first section of the line loop, so don't + * draw the 0th vertex. We're saving it until we draw the + * very last section of the loop. + */ +last_prim->start++; +last_prim->count--; + } + } + /* Execute the buffer and save copied vertices. */ if (exec->vtx.vert_count) @@ -98,6 +115,7 @@ static void vbo_exec_wrap_buffers( struct vbo_exec_context *exec ) if (_mesa_inside_begin_end(exec->ctx)) { exec->vtx.prim[0].mode = exec->ctx->Driver.CurrentExecPrimitive; +exec->vtx.prim[0].begin = 0; exec->vtx.prim[0].start = 0; exec->vtx.prim[0].count = 0; exec->vtx.prim_count++; @@ -827,6 +845,24 @@ static void GLAPIENTRY vbo_exec_End( void ) last_prim->end = 1; last_prim->count = exec->vtx.vert_count - last_prim->start; + /* Special handling for GL_LINE_LOOP */ + if (last_prim->mode == GL_LINE_LOOP && last_prim->begin == 0) { + /* We're finishing drawing a line loop. Append 0th vertex onto + * end of vertex buffer so we can draw it as a line strip. 
+ */ + const fi_type *src = exec->vtx.buffer_map; + fi_type *dst = exec->vtx.buffer_map + +exec->vtx.vert_count * exec->vtx.vertex_size; + + /* copy 0th vertex to end of buffer */ + memcpy(dst, src, exec->vtx.vertex_size * sizeof(fi_type)); + + assert(last_prim->start == 0); + last_prim->start++; /* skip vertex0 */ + /* note that last_prim->count stays unchanged */ + last_prim->mode = GL_LINE_STRIP; + } + try_vbo_merge(exec); } diff --git a/src/mesa/vbo/vbo_exec_draw.c b/src/mesa/vbo/vbo_exec_draw.c index 397fc09..f91f7c8 100644 --- a/src/mesa/vbo/vbo_exec_draw.c +++ b/src/mesa/vbo/vbo_exec_draw.c @@ -110,6 +110,18 @@ vbo_copy_vertices( struct vbo_exec_context *exec ) return 1; } case GL_LINE_LOOP: + if (last_prim->begin == 0) { + /* We're dealing with the second or late
[Mesa-dev] [PATCH 05/10] vbo: simplify some code in vbo_exec_wrap_buffers()
Use a new 'last_prim' pointer to simplify things. --- src/mesa/vbo/vbo_exec_api.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/mesa/vbo/vbo_exec_api.c b/src/mesa/vbo/vbo_exec_api.c index c1f2146..2a78eac 100644 --- a/src/mesa/vbo/vbo_exec_api.c +++ b/src/mesa/vbo/vbo_exec_api.c @@ -71,17 +71,17 @@ static void vbo_exec_wrap_buffers( struct vbo_exec_context *exec ) exec->vtx.buffer_ptr = exec->vtx.buffer_map; } else { - GLuint last_begin = exec->vtx.prim[exec->vtx.prim_count-1].begin; + struct _mesa_prim *last_prim = &exec->vtx.prim[exec->vtx.prim_count - 1]; + const GLuint last_begin = last_prim->begin; GLuint last_count; + assert(exec->vtx.prim_count > 0); + if (_mesa_inside_begin_end(exec->ctx)) { -GLint i = exec->vtx.prim_count - 1; -assert(i >= 0); -exec->vtx.prim[i].count = (exec->vtx.vert_count - - exec->vtx.prim[i].start); +last_prim->count = exec->vtx.vert_count - last_prim->start; } - last_count = exec->vtx.prim[exec->vtx.prim_count-1].count; + last_count = last_prim->count; /* Execute the buffer and save copied vertices. */ -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/10] vbo: make vbo_exec_vtx_wrap() static
--- src/mesa/vbo/vbo_exec.h | 2 -- src/mesa/vbo/vbo_exec_api.c | 3 ++- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/src/mesa/vbo/vbo_exec.h b/src/mesa/vbo/vbo_exec.h index 00378eb..a80b2c9 100644 --- a/src/mesa/vbo/vbo_exec.h +++ b/src/mesa/vbo/vbo_exec.h @@ -160,8 +160,6 @@ void vbo_exec_vtx_flush( struct vbo_exec_context *exec, GLboolean unmap ); void vbo_exec_vtx_map( struct vbo_exec_context *exec ); -void vbo_exec_vtx_wrap( struct vbo_exec_context *exec ); - void vbo_exec_eval_update( struct vbo_exec_context *exec ); void vbo_exec_do_EvalCoord2f( struct vbo_exec_context *exec, diff --git a/src/mesa/vbo/vbo_exec_api.c b/src/mesa/vbo/vbo_exec_api.c index 789869a..c1f2146 100644 --- a/src/mesa/vbo/vbo_exec_api.c +++ b/src/mesa/vbo/vbo_exec_api.c @@ -113,7 +113,8 @@ static void vbo_exec_wrap_buffers( struct vbo_exec_context *exec ) * Deal with buffer wrapping where provoked by the vertex buffer * filling up, as opposed to upgrade_vertex(). */ -void vbo_exec_vtx_wrap( struct vbo_exec_context *exec ) +static void +vbo_exec_vtx_wrap(struct vbo_exec_context *exec) { fi_type *data = exec->vtx.copied.buffer; GLuint i; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nvc0: add support for performance monitoring metrics on Fermi
On 10/16/2015 11:22 PM, Ilia Mirkin wrote: On Fri, Oct 16, 2015 at 5:29 PM, Samuel Pitoiset wrote: As explained in the CUDA toolkit documentation, "a metric is a characteristic of an application that is calculated from one or more event values." Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/Makefile.sources | 2 + src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c | 19 +- .../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 444 + .../drivers/nouveau/nvc0/nvc0_query_hw_metric.h| 42 ++ 4 files changed, 504 insertions(+), 3 deletions(-) create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.h diff --git a/src/gallium/drivers/nouveau/Makefile.sources b/src/gallium/drivers/nouveau/Makefile.sources index edc6cf4..c18e9f5 100644 --- a/src/gallium/drivers/nouveau/Makefile.sources +++ b/src/gallium/drivers/nouveau/Makefile.sources @@ -154,6 +154,8 @@ NVC0_C_SOURCES := \ nvc0/nvc0_query.h \ nvc0/nvc0_query_hw.c \ nvc0/nvc0_query_hw.h \ + nvc0/nvc0_query_hw_metric.c \ + nvc0/nvc0_query_hw_metric.h \ nvc0/nvc0_query_hw_sm.c \ nvc0/nvc0_query_hw_sm.h \ nvc0/nvc0_query_sw.c \ diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c index 91254be..90ee82f 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c @@ -25,6 +25,7 @@ #include "nvc0/nvc0_context.h" #include "nvc0/nvc0_query_hw.h" +#include "nvc0/nvc0_query_hw_metric.h" #include "nvc0/nvc0_query_hw_sm.h" #define NVC0_HW_QUERY_STATE_READY 0 @@ -371,6 +372,12 @@ nvc0_hw_create_query(struct nvc0_context *nvc0, unsigned type, unsigned index) return (struct nvc0_query *)hq; } + hq = nvc0_hw_metric_create_query(nvc0, type); + if (hq) { + hq->base.funcs = &hw_query_funcs; + return (struct nvc0_query *)hq; + } + hq = CALLOC_STRUCT(nvc0_hw_query); if (!hq) return NULL; @@ -435,14 +442,20 @@ int 
nvc0_hw_get_driver_query_info(struct nvc0_screen *screen, unsigned id, struct pipe_driver_query_info *info) { - int num_hw_sm_queries = 0; + int num_hw_sm_queries = 0, num_hw_metric_queries = 0; num_hw_sm_queries = nvc0_hw_sm_get_driver_query_info(screen, 0, NULL); + num_hw_metric_queries = + nvc0_hw_metric_get_driver_query_info(screen, 0, NULL); if (!info) - return num_hw_sm_queries; + return num_hw_sm_queries + num_hw_metric_queries; + + if (id < num_hw_sm_queries) + return nvc0_hw_sm_get_driver_query_info(screen, id, info); - return nvc0_hw_sm_get_driver_query_info(screen, id, info); + return nvc0_hw_metric_get_driver_query_info(screen, + id - num_hw_sm_queries, info); } void diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c new file mode 100644 index 000..dbe350a --- /dev/null +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c @@ -0,0 +1,444 @@ +/* + * Copyright 2015 Samuel Pitoiset + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ + +#include "nvc0/nvc0_context.h" +#include "nvc0/nvc0_query_hw_metric.h" +#include "nvc0/nvc0_query_hw_sm.h" + +/* === PERFORMANCE MONITORING METRICS for NVC0:NVE4 === */ +static const char *nvc0_hw_metric_names[] = +{ + "metric-achieved_occupancy", + "metric-branch_efficiency", + "metric-inst_issued", + "metric-inst_per_wrap", + "metric-inst_replay_overhead", + "metric-issued_ipc", + "metric-issue_slots", + "metric-issue_slot_utilization", + "metric-ipc", +}; + +struct nvc0_hw_metric_query_cfg { + uint32_t queries[8]; + uint32_t num_queries; +}; + +#define _SM(n) NVC0_HW_S
Re: [Mesa-dev] [PATCH] nvc0: add support for performance monitoring metrics on Fermi
On Fri, Oct 16, 2015 at 5:29 PM, Samuel Pitoiset wrote: > As explained in the CUDA toolkit documentation, "a metric is a > characteristic of an application that is calculated from one or more > event values." > > Signed-off-by: Samuel Pitoiset > --- > src/gallium/drivers/nouveau/Makefile.sources | 2 + > src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c | 19 +- > .../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 444 > + > .../drivers/nouveau/nvc0/nvc0_query_hw_metric.h| 42 ++ > 4 files changed, 504 insertions(+), 3 deletions(-) > create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c > create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.h > > diff --git a/src/gallium/drivers/nouveau/Makefile.sources > b/src/gallium/drivers/nouveau/Makefile.sources > index edc6cf4..c18e9f5 100644 > --- a/src/gallium/drivers/nouveau/Makefile.sources > +++ b/src/gallium/drivers/nouveau/Makefile.sources > @@ -154,6 +154,8 @@ NVC0_C_SOURCES := \ > nvc0/nvc0_query.h \ > nvc0/nvc0_query_hw.c \ > nvc0/nvc0_query_hw.h \ > + nvc0/nvc0_query_hw_metric.c \ > + nvc0/nvc0_query_hw_metric.h \ > nvc0/nvc0_query_hw_sm.c \ > nvc0/nvc0_query_hw_sm.h \ > nvc0/nvc0_query_sw.c \ > diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c > b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c > index 91254be..90ee82f 100644 > --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c > +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c > @@ -25,6 +25,7 @@ > > #include "nvc0/nvc0_context.h" > #include "nvc0/nvc0_query_hw.h" > +#include "nvc0/nvc0_query_hw_metric.h" > #include "nvc0/nvc0_query_hw_sm.h" > > #define NVC0_HW_QUERY_STATE_READY 0 > @@ -371,6 +372,12 @@ nvc0_hw_create_query(struct nvc0_context *nvc0, unsigned > type, unsigned index) >return (struct nvc0_query *)hq; > } > > + hq = nvc0_hw_metric_create_query(nvc0, type); > + if (hq) { > + hq->base.funcs = &hw_query_funcs; > + return (struct nvc0_query *)hq; > + } > + > hq = CALLOC_STRUCT(nvc0_hw_query); > 
if (!hq) >return NULL; > @@ -435,14 +442,20 @@ int > nvc0_hw_get_driver_query_info(struct nvc0_screen *screen, unsigned id, >struct pipe_driver_query_info *info) > { > - int num_hw_sm_queries = 0; > + int num_hw_sm_queries = 0, num_hw_metric_queries = 0; > > num_hw_sm_queries = nvc0_hw_sm_get_driver_query_info(screen, 0, NULL); > + num_hw_metric_queries = > + nvc0_hw_metric_get_driver_query_info(screen, 0, NULL); > > if (!info) > - return num_hw_sm_queries; > + return num_hw_sm_queries + num_hw_metric_queries; > + > + if (id < num_hw_sm_queries) > + return nvc0_hw_sm_get_driver_query_info(screen, id, info); > > - return nvc0_hw_sm_get_driver_query_info(screen, id, info); > + return nvc0_hw_metric_get_driver_query_info(screen, > + id - num_hw_sm_queries, info); > } > > void > diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c > b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c > new file mode 100644 > index 000..dbe350a > --- /dev/null > +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c > @@ -0,0 +1,444 @@ > +/* > + * Copyright 2015 Samuel Pitoiset > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice shall be included in > + * all copies or substantial portions of the Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR > + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, > + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR > + * OTHER DEALINGS IN THE SOFTWARE. > + */ > + > +#include "nvc0/nvc0_context.h" > +#include "nvc0/nvc0_query_hw_metric.h" > +#include "nvc0/nvc0_query_hw_sm.h" > + > +/* === PERFORMANCE MONITORING METRICS for NVC0:NVE4 === */ > +static const char *nvc0_hw_metric_names[] = > +{ > + "metric-achieved_occupancy", > + "metric-branch_efficiency", > + "metric-inst_issued", > + "metric-inst_per_wrap", > + "metric-inst_replay_overhead", > + "metric-issued_ipc", > + "metric-issue_slots", > + "metric-iss
[Mesa-dev] [PATCH] nvc0: add support for performance monitoring metrics on Fermi
As explained in the CUDA toolkit documentation, "a metric is a characteristic of an application that is calculated from one or more event values." Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/Makefile.sources | 2 + src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c | 19 +- .../drivers/nouveau/nvc0/nvc0_query_hw_metric.c| 444 + .../drivers/nouveau/nvc0/nvc0_query_hw_metric.h| 42 ++ 4 files changed, 504 insertions(+), 3 deletions(-) create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.h diff --git a/src/gallium/drivers/nouveau/Makefile.sources b/src/gallium/drivers/nouveau/Makefile.sources index edc6cf4..c18e9f5 100644 --- a/src/gallium/drivers/nouveau/Makefile.sources +++ b/src/gallium/drivers/nouveau/Makefile.sources @@ -154,6 +154,8 @@ NVC0_C_SOURCES := \ nvc0/nvc0_query.h \ nvc0/nvc0_query_hw.c \ nvc0/nvc0_query_hw.h \ + nvc0/nvc0_query_hw_metric.c \ + nvc0/nvc0_query_hw_metric.h \ nvc0/nvc0_query_hw_sm.c \ nvc0/nvc0_query_hw_sm.h \ nvc0/nvc0_query_sw.c \ diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c index 91254be..90ee82f 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c @@ -25,6 +25,7 @@ #include "nvc0/nvc0_context.h" #include "nvc0/nvc0_query_hw.h" +#include "nvc0/nvc0_query_hw_metric.h" #include "nvc0/nvc0_query_hw_sm.h" #define NVC0_HW_QUERY_STATE_READY 0 @@ -371,6 +372,12 @@ nvc0_hw_create_query(struct nvc0_context *nvc0, unsigned type, unsigned index) return (struct nvc0_query *)hq; } + hq = nvc0_hw_metric_create_query(nvc0, type); + if (hq) { + hq->base.funcs = &hw_query_funcs; + return (struct nvc0_query *)hq; + } + hq = CALLOC_STRUCT(nvc0_hw_query); if (!hq) return NULL; @@ -435,14 +442,20 @@ int nvc0_hw_get_driver_query_info(struct nvc0_screen *screen, unsigned id, struct pipe_driver_query_info *info) { - int 
num_hw_sm_queries = 0; + int num_hw_sm_queries = 0, num_hw_metric_queries = 0; num_hw_sm_queries = nvc0_hw_sm_get_driver_query_info(screen, 0, NULL); + num_hw_metric_queries = + nvc0_hw_metric_get_driver_query_info(screen, 0, NULL); if (!info) - return num_hw_sm_queries; + return num_hw_sm_queries + num_hw_metric_queries; + + if (id < num_hw_sm_queries) + return nvc0_hw_sm_get_driver_query_info(screen, id, info); - return nvc0_hw_sm_get_driver_query_info(screen, id, info); + return nvc0_hw_metric_get_driver_query_info(screen, + id - num_hw_sm_queries, info); } void diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c new file mode 100644 index 000..dbe350a --- /dev/null +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_metric.c @@ -0,0 +1,444 @@ +/* + * Copyright 2015 Samuel Pitoiset + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. 
+ */ + +#include "nvc0/nvc0_context.h" +#include "nvc0/nvc0_query_hw_metric.h" +#include "nvc0/nvc0_query_hw_sm.h" + +/* === PERFORMANCE MONITORING METRICS for NVC0:NVE4 === */ +static const char *nvc0_hw_metric_names[] = +{ + "metric-achieved_occupancy", + "metric-branch_efficiency", + "metric-inst_issued", + "metric-inst_per_wrap", + "metric-inst_replay_overhead", + "metric-issued_ipc", + "metric-issue_slots", + "metric-issue_slot_utilization", + "metric-ipc", +}; + +struct nvc0_hw_metric_query_cfg { + uint32_t queries[8]; + uint32_t num_queries; +}; + +#define _SM(n) NVC0_HW_SM_QUERY(NVC0_HW_SM_QUERY_ ##n) +#define _M(n, c) [NVC0_HW_METRIC_QUERY_##n] = c + +/* Compute capability 2.0 (GF100/GF110)
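The nvc0_hw_get_driver_query_info() hunk in this patch splits one flat id space across two query groups: ids below the SM count go to the SM group, the rest are rebased and passed to the metric group. The sketch below illustrates only that dispatch. The names (query_info, group_get_info, get_driver_query_info) and the return convention (group size when info is NULL, 1 on success, 0 when out of range) are assumptions made for the example, not the driver's actual API.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* Hypothetical, much-reduced stand-in for pipe_driver_query_info. */
struct query_info {
   const char *name;
};

const char *const sm_names[] = { "active_cycles", "active_warps", "branch" };
const char *const metric_names[] = { "metric-ipc", "metric-issue_slots" };

/* Assumed convention: info == NULL returns the group size; otherwise
 * fill *info and return 1, or return 0 if id is out of range. */
static int
group_get_info(const char *const *names, int count, unsigned id,
               struct query_info *info)
{
   if (!info)
      return count;
   if (id >= (unsigned)count)
      return 0;
   info->name = names[id];
   return 1;
}

/* Expose both groups through one contiguous id range. */
int
get_driver_query_info(unsigned id, struct query_info *info)
{
   int num_sm = group_get_info(sm_names, ARRAY_LEN(sm_names), 0, NULL);
   int num_metric = group_get_info(metric_names, ARRAY_LEN(metric_names),
                                   0, NULL);

   assert(num_sm >= 0 && num_metric >= 0);

   if (!info)
      return num_sm + num_metric;

   if (id < (unsigned)num_sm)
      return group_get_info(sm_names, num_sm, id, info);

   /* Rebase the id so the metric group sees indices starting at 0. */
   return group_get_info(metric_names, num_metric,
                         id - (unsigned)num_sm, info);
}
```

With three SM entries and two metric entries, id 3 resolves to the first metric, and id 5 is out of range.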
Re: [Mesa-dev] [PATCH 1/5] main: Use NumUniformBlocks to count UBOs
On Fri, Oct 16, 2015 at 2:58 AM, Iago Toral Quiroga wrote:
> Now that we have separate index spaces for UBOs and SSBOs we do not need
> to iterate through BufferInterfaceBlocks any more, we can just take the
> UBO count directly from NumUniformBlocks.

Nice cleanup, all five patches

Reviewed-by: Kristian Høgsberg

> ---
>  src/mesa/main/shaderapi.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
> index 6a2f60d..26995ad 100644
> --- a/src/mesa/main/shaderapi.c
> +++ b/src/mesa/main/shaderapi.c
> @@ -729,11 +729,7 @@ get_programiv(struct gl_context *ctx, GLuint program, GLenum pname,
>        if (!has_ubo)
>           break;
>
> -      *params = 0;
> -      for (unsigned i = 0; i < shProg->NumBufferInterfaceBlocks; i++) {
> -         if (!shProg->BufferInterfaceBlocks[i].IsShaderStorage)
> -            (*params)++;
> -      }
> +      *params = shProg->NumUniformBlocks;
>        return;
>     case GL_PROGRAM_BINARY_RETRIEVABLE_HINT:
>        /* This enum isn't part of the OES extension for OpenGL ES 2.0. It is
> --
> 1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
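To make the cleanup concrete, the sketch below contrasts the removed scan over a combined block list with the direct read of a separately maintained UBO count. The types and names (block, program, count_ubos_by_scan, count_ubos) are hypothetical simplifications, and the sketch assumes the count is kept in sync elsewhere (for example at link time), which it checks with an assert.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an entry of BufferInterfaceBlocks. */
struct block {
   bool is_shader_storage;
};

/* Hypothetical stand-in for the shader program object. */
struct program {
   const struct block *blocks;   /* combined UBO + SSBO list */
   unsigned num_blocks;
   unsigned num_uniform_blocks;  /* assumed to be maintained elsewhere */
};

/* Old approach: walk the combined list and skip SSBOs. */
unsigned
count_ubos_by_scan(const struct program *p)
{
   unsigned n = 0;

   for (unsigned i = 0; i < p->num_blocks; i++) {
      if (!p->blocks[i].is_shader_storage)
         n++;
   }
   return n;
}

/* New approach: the count is already available, so just return it. */
unsigned
count_ubos(const struct program *p)
{
   /* Sanity check for the sketch only: both views must agree. */
   assert(p->num_uniform_blocks == count_ubos_by_scan(p));
   return p->num_uniform_blocks;
}
```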
Re: [Mesa-dev] [PATCH 16/16] nvc0: add a note about MP counters on GF100/GF110
On 10/16/2015 07:50 PM, Ilia Mirkin wrote:
> Series is
>
> Reviewed-by: Ilia Mirkin
>
> I had a couple of very minor comments that you can feel free to accept
> or ignore.

Thank you for this review Ilia, and I think I'll accept all of your changes. :)

> On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset wrote:
>> MP counters on GF100/GF110 (compute capability 2.0) are buggy
>> because there is a context-switch problem that we need to fix.
>> Results might be wrong sometimes, be careful!
>>
>> Signed-off-by: Samuel Pitoiset
>> ---
>>  src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
>> index c4b40a6..c5ce3e3 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
>> @@ -286,6 +286,11 @@ static const struct nvc0_hw_sm_query_cfg nve4_hw_sm_queries[] =
>>  #undef _M2B
>>
>>  /* === PERFORMANCE MONITORING COUNTERS for NVC0:NVE4 === */
>> +/* NOTES:
>> + * - MP counters on GF100/GF110 (compute capability 2.0) are buggy
>> + *   because there is a context-switch problem that we need to fix.
>> + *   Results might be wrong sometimes, be careful!
>> + */
>>  static const char *nvc0_hw_sm_query_names[] =
>>  {
>>     /* MP counters */
>> --
>> 2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 15/16] nvc0: add MP counters variants for GF100/GF110
Minor preference for naming things SM20/SM21 when referring to compute
capabilities, but your call.

On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset wrote:
> GF100 and GF110 chipsets are compute capability 2.0, while the other
> Fermi chipsets are compute capability 2.1. That's why some MP counters
> are different between these chipsets and we need to handle variants.
>
> Signed-off-by: Samuel Pitoiset
> ---
>  .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 559 ++---
>  .../drivers/nouveau/nvc0/nvc0_query_hw_sm.h| 1 +
>  2 files changed, 483 insertions(+), 77 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> index 174beef..c4b40a6 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> @@ -298,6 +298,7 @@ static const char *nvc0_hw_sm_query_names[] =
>     "gred_count",
>     "gst_request",
>     "inst_executed",
> +   "inst_issued",
>     "inst_issued1_0",
>     "inst_issued1_1",
>     "inst_issued2_0",
> @@ -373,82 +374,456 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
>  };
>
>  #define _C(f, o, g, m, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, m, 0, g, s }
> -#define _Q(n, c, ...)
[NVC0_HW_SM_QUERY_##n] = { > \ > - { __VA_ARGS__ }, c, NVC0_COUNTER_OPn_SUM, { 1, 1 }, > \ > -} > +#define _Q(n, c) [NVC0_HW_SM_QUERY_##n] = c > + > +/* Compute capability 2.0 (GF100/GF110) */ > +static const struct nvc0_hw_sm_query_cfg > +nvc0_active_cycles = > +{ > + .ctr[0] = _C(0x, LOGOP, 0x11, 0x00ff, 0x), > + .num_counters = 1, > + .op = NVC0_COUNTER_OPn_SUM, > + .norm = { 1, 1 }, > +}; > + > +static const struct nvc0_hw_sm_query_cfg > +nvc0_active_warps = > +{ > + .ctr[0] = _C(0x, LOGOP, 0x24, 0x00ff, 0x0010), > + .ctr[1] = _C(0x, LOGOP, 0x24, 0x00ff, 0x0020), > + .ctr[2] = _C(0x, LOGOP, 0x24, 0x00ff, 0x0030), > + .ctr[3] = _C(0x, LOGOP, 0x24, 0x00ff, 0x0040), > + .ctr[4] = _C(0x, LOGOP, 0x24, 0x00ff, 0x0050), > + .ctr[5] = _C(0x, LOGOP, 0x24, 0x00ff, 0x0060), > + .num_counters = 6, > + .op = NVC0_COUNTER_OPn_SUM, > + .norm = { 1, 1 }, > +}; > + > +static const struct nvc0_hw_sm_query_cfg > +nvc0_atom_count = > +{ > + .ctr[0] = _C(0x, LOGOP, 0x63, 0x00ff, 0x0030), > + .num_counters = 1, > + .op = NVC0_COUNTER_OPn_SUM, > + .norm = { 1, 1 }, > +}; > + > +static const struct nvc0_hw_sm_query_cfg > +nvc0_branch = > +{ > + .ctr[0] = _C(0x, LOGOP, 0x1a, 0x00ff, 0x), > + .ctr[1] = _C(0x, LOGOP, 0x1a, 0x00ff, 0x0010), > + .num_counters = 2, > + .op = NVC0_COUNTER_OPn_SUM, > + .norm = { 1, 1 }, > +}; > + > +static const struct nvc0_hw_sm_query_cfg > +nvc0_divergent_branch = > +{ > + .ctr[0] = _C(0x, LOGOP, 0x19, 0x00ff, 0x0020), > + .ctr[1] = _C(0x, LOGOP, 0x19, 0x00ff, 0x0030), > + .num_counters = 2, > + .op = NVC0_COUNTER_OPn_SUM, > + .norm = { 1, 1 }, > +}; > + > +static const struct nvc0_hw_sm_query_cfg > +nvc0_gld_request = > +{ > + .ctr[0] = _C(0x, LOGOP, 0x64, 0x00ff, 0x0030), > + .num_counters = 1, > + .op = NVC0_COUNTER_OPn_SUM, > + .norm = { 1, 1 }, > +}; > + > +static const struct nvc0_hw_sm_query_cfg > +nvc0_gred_count = > +{ > + .ctr[0] = _C(0x, LOGOP, 0x63, 0x00ff, 0x0040), > + .num_counters = 1, > + .op = NVC0_COUNTER_OPn_SUM, > + .norm = { 1, 1 }, > 
+}; > + > +static const struct nvc0_hw_sm_query_cfg > +nvc0_gst_request = > +{ > + .ctr[0] = _C(0x, LOGOP, 0x64, 0x00ff, 0x0060), > + .num_counters = 1, > + .op = NVC0_COUNTER_OPn_SUM, > + .norm = { 1, 1 }, > +}; > + > +static const struct nvc0_hw_sm_query_cfg > +nvc0_inst_executed = > +{ > + .ctr[0] = _C(0x, LOGOP, 0x2d, 0x, 0x1000), > + .ctr[1] = _C(0x, LOGOP, 0x2d, 0x, 0x1010), > + .num_counters = 2, > + .op = NVC0_COUNTER_OPn_SUM, > + .norm = { 1, 1 }, > +}; > + > +static const struct nvc0_hw_sm_query_cfg > +nvc0_inst_issued = > +{ > + .ctr[0] = _C(0x, LOGOP, 0x27, 0x, 0x7060), > + .ctr[1] = _C(0x, LOGOP, 0x27, 0x, 0x7070), > + .num_counters = 2, > + .op = NVC0_COUNTER_OPn_SUM, > + .norm = { 1, 1 }, > +}; > > -static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] = > -{ > - _Q(ACTIVE_CYCLES, 1, _C(0x, LOGOP, 0x11, 0x00ff, > 0x)), > - _Q(ACTIVE_WARPS,6, _C(0x, LOGOP, 0x24, 0x00ff, > 0x0010), > -
Re: [Mesa-dev] [PATCH 16/16] nvc0: add a note about MP counters on GF100/GF110
Series is

Reviewed-by: Ilia Mirkin

I had a couple of very minor comments that you can feel free to accept
or ignore.

On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset wrote:
> MP counters on GF100/GF110 (compute capability 2.0) are buggy
> because there is a context-switch problem that we need to fix.
> Results might be wrong sometimes, be careful!
>
> Signed-off-by: Samuel Pitoiset
> ---
>  src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> index c4b40a6..c5ce3e3 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> @@ -286,6 +286,11 @@ static const struct nvc0_hw_sm_query_cfg nve4_hw_sm_queries[] =
>  #undef _M2B
>
>  /* === PERFORMANCE MONITORING COUNTERS for NVC0:NVE4 === */
> +/* NOTES:
> + * - MP counters on GF100/GF110 (compute capability 2.0) are buggy
> + *   because there is a context-switch problem that we need to fix.
> + *   Results might be wrong sometimes, be careful!
> + */
>  static const char *nvc0_hw_sm_query_names[] =
>  {
>     /* MP counters */
> --
> 2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/16] nvc0: fix monitoring multiple MP counters queries on Fermi
On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset wrote:
> For strange reasons, the signal id depends on the slot selected on Fermi
> but not on Kepler. Fortunately, the signal ids are just offset by the
> slot id!
>
> Signed-off-by: Samuel Pitoiset
> ---
>  .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 147 +++--
>  1 file changed, 79 insertions(+), 68 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> index 99e9073..6ee9fa6 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> @@ -128,7 +128,7 @@ struct nvc0_hw_sm_counter_cfg
>  {
>     uint32_t func: 16; /* mask or 4-bit logic op (depending on mode) */
>     uint32_t mode: 4; /* LOGOP,B6,LOGOP_B6(_PULSE) */
> -   uint32_t num_src : 3; /* number of sources (1 - 6, only for NVC0:NVE4) */
> +   uint32_t src_mask; /* mask for signal selection (only for NVC0:NVE4) */

Put this after the sized fields for better packing.

>     uint32_t sig_dom : 1; /* if 0, MP_PM_A (per warp-sched), if 1, MP_PM_B */
>     uint32_t sig_sel : 8; /* signal group */
>     uint32_t src_sel; /* signal selection for up to 4 sources */
> @@ -280,78 +280,78 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
>     0x80001de7ULL
>  };
>
> -#define _C(f, o, g, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, 0, 0, g, s }
> +#define _C(f, o, g, m, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, m, 0, g, s }
>  #define _Q(n, c, ...)
[NVC0_HW_SM_QUERY_##n] = { > \ > { __VA_ARGS__ }, c, NVC0_COUNTER_OPn_SUM, { 1, 1 }, > \ > } > > static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] = > { > - _Q(ACTIVE_CYCLES, 1, _C(0x, LOGOP, 0x11, 0x)), > - _Q(ACTIVE_WARPS,6, _C(0x, LOGOP, 0x24, 0x0010), > - _C(0x, LOGOP, 0x24, 0x0021), > - _C(0x, LOGOP, 0x24, 0x0032), > - _C(0x, LOGOP, 0x24, 0x0043), > - _C(0x, LOGOP, 0x24, 0x0054), > - _C(0x, LOGOP, 0x24, 0x0065)), > - _Q(ATOM_COUNT, 1, _C(0x, LOGOP, 0x63, 0x0030)), > - _Q(BRANCH, 2, _C(0x, LOGOP, 0x1a, 0x), > - _C(0x, LOGOP, 0x1a, 0x0011)), > - _Q(DIVERGENT_BRANCH,2, _C(0x, LOGOP, 0x19, 0x0020), > - _C(0x, LOGOP, 0x19, 0x0031)), > - _Q(GLD_REQUEST, 1, _C(0x, LOGOP, 0x64, 0x0030)), > - _Q(GRED_COUNT, 1, _C(0x, LOGOP, 0x63, 0x0040)), > - _Q(GST_REQUEST, 1, _C(0x, LOGOP, 0x64, 0x0060)), > - _Q(INST_EXECUTED, 3, _C(0x, LOGOP, 0x2d, 0x), > - _C(0x, LOGOP, 0x2d, 0x0011), > - _C(0x, LOGOP, 0x2d, 0x0022)), > - _Q(INST_ISSUED1_0, 1, _C(0x, LOGOP, 0x7e, 0x0010)), > - _Q(INST_ISSUED1_1, 1, _C(0x, LOGOP, 0x7e, 0x0040)), > - _Q(INST_ISSUED2_0, 1, _C(0x, LOGOP, 0x7e, 0x0020)), > - _Q(INST_ISSUED2_1, 1, _C(0x, LOGOP, 0x7e, 0x0050)), > - _Q(LOCAL_LD,1, _C(0x, LOGOP, 0x64, 0x0020)), > - _Q(LOCAL_ST,1, _C(0x, LOGOP, 0x64, 0x0050)), > - _Q(PROF_TRIGGER_0, 1, _C(0x, LOGOP, 0x01, 0x)), > - _Q(PROF_TRIGGER_1, 1, _C(0x, LOGOP, 0x01, 0x0010)), > - _Q(PROF_TRIGGER_2, 1, _C(0x, LOGOP, 0x01, 0x0020)), > - _Q(PROF_TRIGGER_3, 1, _C(0x, LOGOP, 0x01, 0x0030)), > - _Q(PROF_TRIGGER_4, 1, _C(0x, LOGOP, 0x01, 0x0040)), > - _Q(PROF_TRIGGER_5, 1, _C(0x, LOGOP, 0x01, 0x0050)), > - _Q(PROF_TRIGGER_6, 1, _C(0x, LOGOP, 0x01, 0x0060)), > - _Q(PROF_TRIGGER_7, 1, _C(0x, LOGOP, 0x01, 0x0070)), > - _Q(SHARED_LD, 1, _C(0x, LOGOP, 0x64, 0x0010)), > - _Q(SHARED_ST, 1, _C(0x, LOGOP, 0x64, 0x0040)), > - _Q(THREADS_LAUNCHED,6, _C(0x, LOGOP, 0x26, 0x0010), > - _C(0x, LOGOP, 0x26, 0x0021), > - _C(0x, LOGOP, 0x26, 0x0032), > - _C(0x, LOGOP, 0x26, 0x0043), > - _C(0x, LOGOP, 0x26, 0x0054), > - _C(0x, 
LOGOP, 0x26, 0x0065)), > - _Q(TH_INST_EXECUTED_0, 6, _C(0x, LOGOP, 0xa3, 0x), > - _C(0x, LOGOP, 0xa3, 0x0011), > - _C(0x, LOGOP, 0xa3, 0x0022), > - _C(0x, LOGOP, 0xa3, 0x0033), > - _C(0x, LOGOP, 0xa3, 0x0044), > -
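The review comment in this message ("Put this after the sized fields for better packing") can be illustrated with two layouts of the same fields. The struct names cfg_split and cfg_packed are made up for the example. Bit-field layout is implementation-defined in C, so the sketch only asserts that the reordered struct is no larger; with GCC or Clang on x86-64 the sizes would typically be 16 and 12 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Layout as posted: a full-width member sits between the bit-fields,
 * so the bit-fields end up in two separate 32-bit storage units. */
struct cfg_split {
   uint32_t func    : 16;
   uint32_t mode    : 4;
   uint32_t src_mask;
   uint32_t sig_dom : 1;
   uint32_t sig_sel : 8;
   uint32_t src_sel;
};

/* Suggested layout: all bit-fields first (16 + 4 + 1 + 8 = 29 bits,
 * which can share one 32-bit unit), then the full-width members. */
struct cfg_packed {
   uint32_t func    : 16;
   uint32_t mode    : 4;
   uint32_t sig_dom : 1;
   uint32_t sig_sel : 8;
   uint32_t src_mask;
   uint32_t src_sel;
};

/* Requires C11; checked at compile time. */
static_assert(sizeof(struct cfg_packed) <= sizeof(struct cfg_split),
              "grouping the bit-fields should never cost space");
```

Since one such struct exists per counter in every query entry of the table, a few bytes saved per entry add up across the whole array.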
Re: [Mesa-dev] [PATCH 07/16] nvc0: fix queries which use multiple MP counters on Fermi
On 10/16/2015 07:32 PM, Ilia Mirkin wrote:
> Other than the missing * (1 << c), what was wrong with the old logic?

MP counters were always configured starting from slot 0 to cfg->num_src.
So, if you monitored two hardware events at the same time, the first one
was overwritten by the second one. Now, I check if the slot is free before
pushing the configuration through the pushbuf.

> On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset wrote:
>> Queries which use more than one MP counter were misconfigured, and the
>> final result was also computed incorrectly, because sources need to
>> be configured on different hardware counters instead.
>>
>> According to the blob, computing the result is now as follows:
>>
>> FOR i..n
>>   val += ctr[i] * pow(2, i)
>>
>> Signed-off-by: Samuel Pitoiset
>> ---
>>  .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 128 +
>>  1 file changed, 81 insertions(+), 47 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
>> index 2060662..99e9073 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
>> @@ -131,7 +131,7 @@ struct nvc0_hw_sm_counter_cfg
>>     uint32_t num_src : 3; /* number of sources (1 - 6, only for NVC0:NVE4) */
>>     uint32_t sig_dom : 1; /* if 0, MP_PM_A (per warp-sched), if 1, MP_PM_B */
>>     uint32_t sig_sel : 8; /* signal group */
>> -   uint64_t src_sel; /* signal selection for up to 6 sources (48 bit) */
>> +   uint32_t src_sel; /* signal selection for up to 4 sources */
>>  };
>>
>>  #define NVC0_COUNTER_OPn_SUM 0
>> @@ -280,44 +280,82 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
>>     0x80001de7ULL
>>  };
>>
>> -#define _Q(n, f, m, g, c, s0, s1, s2, s3, s4, s5) [NVC0_HW_SM_QUERY_##n] = { { { f, NVC0_COMPUTE_MP_PM_OP_MODE_##m, c, 0, g, s0|(s1 << 8)|(s2 << 16)|(s3 << 24)|(s4##ULL << 32)|(s5##ULL << 40) }, {}, {}, {} }, 1, NVC0_COUNTER_OPn_SUM, { 1, 1 } }
>> +#define _C(f, o, g, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, 0, 0, g, s }
>> +#define _Q(n, c, ...)
[NVC0_HW_SM_QUERY_##n] = { \ + { __VA_ARGS__ }, c, NVC0_COUNTER_OPn_SUM, { 1, 1 },\ +} static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] = { - _Q(ACTIVE_CYCLES, 0x, LOGOP, 0x11, 1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(ACTIVE_WARPS,0x, LOGOP, 0x24, 6, 0x10, 0x21, 0x32, 0x43, 0x54, 0x65), - _Q(ATOM_COUNT, 0x, LOGOP, 0x63, 1, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(BRANCH, 0x, LOGOP, 0x1a, 2, 0x00, 0x11, 0x00, 0x00, 0x00, 0x00), - _Q(DIVERGENT_BRANCH,0x, LOGOP, 0x19, 2, 0x20, 0x31, 0x00, 0x00, 0x00, 0x00), - _Q(GLD_REQUEST, 0x, LOGOP, 0x64, 1, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(GRED_COUNT, 0x, LOGOP, 0x63, 1, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(GST_REQUEST, 0x, LOGOP, 0x64, 1, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(INST_EXECUTED, 0x, LOGOP, 0x2d, 3, 0x00, 0x11, 0x22, 0x00, 0x00, 0x00), - _Q(INST_ISSUED1_0, 0x, LOGOP, 0x7e, 1, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(INST_ISSUED1_1, 0x, LOGOP, 0x7e, 1, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(INST_ISSUED2_0, 0x, LOGOP, 0x7e, 1, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(INST_ISSUED2_1, 0x, LOGOP, 0x7e, 1, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(LOCAL_LD,0x, LOGOP, 0x64, 1, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(LOCAL_ST,0x, LOGOP, 0x64, 1, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(PROF_TRIGGER_0, 0x, LOGOP, 0x01, 1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(PROF_TRIGGER_1, 0x, LOGOP, 0x01, 1, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(PROF_TRIGGER_2, 0x, LOGOP, 0x01, 1, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(PROF_TRIGGER_3, 0x, LOGOP, 0x01, 1, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(PROF_TRIGGER_4, 0x, LOGOP, 0x01, 1, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(PROF_TRIGGER_5, 0x, LOGOP, 0x01, 1, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(PROF_TRIGGER_6, 0x, LOGOP, 0x01, 1, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(PROF_TRIGGER_7, 0x, LOGOP, 0x01, 1, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(SHARED_LD, 0x, LOGOP, 0x64, 1, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(SHARED_ST, 0x, 
LOGOP, 0x64, 1, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00), - _Q(THREADS_LAUNCHED,0x, LOGOP, 0x26, 6, 0x10, 0x21, 0x32, 0x43, 0x54, 0x65), - _Q(TH_INST_EXECUTED_0, 0x, LOGOP, 0xa3, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55), - _Q(TH_INST_EXECUTED_1, 0x, LOGOP, 0xa5, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55), - _Q(TH_INST_EXECUTED_2, 0x, LOGOP, 0xa4, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55), - _Q(TH_INST_EXECUTED_3, 0x, LOGOP, 0xa6, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x5
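The result formula quoted in the commit message (val += ctr[i] * pow(2, i)) is the same weighting as the "* (1 << c)" mentioned in the review. A minimal sketch of that accumulation follows. The function name combine_counters and the MAX_COUNTERS limit are assumptions made for the example, not names from the driver, and integer shifts are used in place of pow() to stay in integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical upper bound on hardware counters per query. */
#define MAX_COUNTERS 8

/* Combine per-counter readings as val = sum over i of ctr[i] * 2^i. */
uint64_t
combine_counters(const uint32_t *ctr, unsigned num_counters)
{
   uint64_t val = 0;

   assert(num_counters <= MAX_COUNTERS);

   for (unsigned i = 0; i < num_counters; i++) {
      /* Widen before shifting so the weighted term cannot overflow. */
      val += (uint64_t)ctr[i] << i;
   }
   return val;
}
```

For readings {3, 1, 2} this gives 3*1 + 1*2 + 2*4 = 13.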
Re: [Mesa-dev] [PATCH 07/16] nvc0: fix queries which use multiple MP counters on Fermi
Other than the missing * (1 << c), what was wrong with the old logic?

On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset wrote:
> Queries which use more than one MP counter were misconfigured, and the
> final result was also computed incorrectly, because sources need to
> be configured on different hardware counters instead.
>
> According to the blob, computing the result is now as follows:
>
> FOR i..n
>   val += ctr[i] * pow(2, i)
>
> Signed-off-by: Samuel Pitoiset
> ---
>  .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 128 +
>  1 file changed, 81 insertions(+), 47 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> index 2060662..99e9073 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
> @@ -131,7 +131,7 @@ struct nvc0_hw_sm_counter_cfg
>     uint32_t num_src : 3; /* number of sources (1 - 6, only for NVC0:NVE4) */
>     uint32_t sig_dom : 1; /* if 0, MP_PM_A (per warp-sched), if 1, MP_PM_B */
>     uint32_t sig_sel : 8; /* signal group */
> -   uint64_t src_sel; /* signal selection for up to 6 sources (48 bit) */
> +   uint32_t src_sel; /* signal selection for up to 4 sources */
>  };
>
>  #define NVC0_COUNTER_OPn_SUM 0
> @@ -280,44 +280,82 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
>     0x80001de7ULL
>  };
>
> -#define _Q(n, f, m, g, c, s0, s1, s2, s3, s4, s5) [NVC0_HW_SM_QUERY_##n] = { { { f, NVC0_COMPUTE_MP_PM_OP_MODE_##m, c, 0, g, s0|(s1 << 8)|(s2 << 16)|(s3 << 24)|(s4##ULL << 32)|(s5##ULL << 40) }, {}, {}, {} }, 1, NVC0_COUNTER_OPn_SUM, { 1, 1 } }
> +#define _C(f, o, g, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, 0, 0, g, s }
> +#define _Q(n, c, ...)
[NVC0_HW_SM_QUERY_##n] = { > \ > + { __VA_ARGS__ }, c, NVC0_COUNTER_OPn_SUM, { 1, 1 }, > \ > +} > > static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] = > { > - _Q(ACTIVE_CYCLES, 0x, LOGOP, 0x11, 1, 0x00, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(ACTIVE_WARPS,0x, LOGOP, 0x24, 6, 0x10, 0x21, 0x32, 0x43, > 0x54, 0x65), > - _Q(ATOM_COUNT, 0x, LOGOP, 0x63, 1, 0x30, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(BRANCH, 0x, LOGOP, 0x1a, 2, 0x00, 0x11, 0x00, 0x00, > 0x00, 0x00), > - _Q(DIVERGENT_BRANCH,0x, LOGOP, 0x19, 2, 0x20, 0x31, 0x00, 0x00, > 0x00, 0x00), > - _Q(GLD_REQUEST, 0x, LOGOP, 0x64, 1, 0x30, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(GRED_COUNT, 0x, LOGOP, 0x63, 1, 0x40, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(GST_REQUEST, 0x, LOGOP, 0x64, 1, 0x60, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(INST_EXECUTED, 0x, LOGOP, 0x2d, 3, 0x00, 0x11, 0x22, 0x00, > 0x00, 0x00), > - _Q(INST_ISSUED1_0, 0x, LOGOP, 0x7e, 1, 0x10, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(INST_ISSUED1_1, 0x, LOGOP, 0x7e, 1, 0x40, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(INST_ISSUED2_0, 0x, LOGOP, 0x7e, 1, 0x20, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(INST_ISSUED2_1, 0x, LOGOP, 0x7e, 1, 0x50, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(LOCAL_LD,0x, LOGOP, 0x64, 1, 0x20, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(LOCAL_ST,0x, LOGOP, 0x64, 1, 0x50, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(PROF_TRIGGER_0, 0x, LOGOP, 0x01, 1, 0x00, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(PROF_TRIGGER_1, 0x, LOGOP, 0x01, 1, 0x10, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(PROF_TRIGGER_2, 0x, LOGOP, 0x01, 1, 0x20, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(PROF_TRIGGER_3, 0x, LOGOP, 0x01, 1, 0x30, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(PROF_TRIGGER_4, 0x, LOGOP, 0x01, 1, 0x40, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(PROF_TRIGGER_5, 0x, LOGOP, 0x01, 1, 0x50, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(PROF_TRIGGER_6, 0x, LOGOP, 0x01, 1, 0x60, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(PROF_TRIGGER_7, 0x, LOGOP, 0x01, 1, 0x70, 0x00, 0x00, 0x00, 
> 0x00, 0x00), > - _Q(SHARED_LD, 0x, LOGOP, 0x64, 1, 0x10, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(SHARED_ST, 0x, LOGOP, 0x64, 1, 0x40, 0x00, 0x00, 0x00, > 0x00, 0x00), > - _Q(THREADS_LAUNCHED,0x, LOGOP, 0x26, 6, 0x10, 0x21, 0x32, 0x43, > 0x54, 0x65), > - _Q(TH_INST_EXECUTED_0, 0x, LOGOP, 0xa3, 6, 0x00, 0x11, 0x22, 0x33, > 0x44, 0x55), > - _Q(TH_INST_EXECUTED_1, 0x, LOGOP, 0xa5, 6, 0x00, 0x11, 0x22, 0x33, > 0x44, 0x55), > - _Q(TH_INST_EXECUTED_2, 0x, LOGOP, 0xa4, 6, 0x00, 0x11, 0x22, 0x33, > 0x44, 0x55), > - _Q(TH_INST_EXECUTED_3, 0x, LOGOP, 0xa6, 6, 0x00, 0x11, 0x22, 0x33, > 0x44, 0x55), > - _Q(WARPS_LAUNCHED, 0x, LOGOP, 0x26, 1, 0x00, 0x00, 0x00, 0x00, > 0x00, 0x00), > + _Q(ACTIVE_CYCLES,
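For readers following the "val += ctr[i] * pow(2, i)" pseudocode quoted above, here is a minimal C sketch of that weighted sum. It is an illustration only: combine_mp_counters() is a made-up name, not a function from the driver, and a left shift stands in for pow(2, i).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not driver code): weight counter i by 2^i and
 * accumulate, mirroring "val += ctr[i] * pow(2, i)". */
static uint64_t
combine_mp_counters(const uint32_t *ctr, unsigned num_counters)
{
   uint64_t val = 0;
   unsigned i;

   /* The series talks about at most 8 MP counters on Fermi. */
   assert(num_counters <= 8);

   for (i = 0; i < num_counters; ++i)
      val += (uint64_t)ctr[i] << i; /* same as ctr[i] * (1 << i) */

   return val;
}
```

With three counters all reading 1, the helper returns 1 + 2 + 4 = 7.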
Re: [Mesa-dev] [PATCH 10/16] nvc0: store the number of GPCs to nvc0_screen
On 10/16/2015 07:24 PM, Ilia Mirkin wrote:
> On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset wrote:
>> NOUVEAU_GETPARAM_GRAPH_UNITS param returns the number of GPCs, the total
>> number of TPCs and the number of ROP units. Note that when the DRM
>> version is too old the default number of GPCs is fixed to 4.
>>
>> This will be used to launch the compute kernel which is used to read MP
>> performance counters over all GPCs.
>>
>> Signed-off-by: Samuel Pitoiset
>> ---
>>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 +
>>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.h | 1 +
>>  2 files changed, 2 insertions(+)
>>
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> index afd91e6..ba53d10 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> @@ -914,6 +914,7 @@ nvc0_screen_create(struct nouveau_device *dev)
>>        else
>>           value = (16 << 8) | 4;
>>     }
>> +   screen->gpc_count = value & 0x;
>
> gpc_count is a u8... I guess you wanted to make this &0xff? Doesn't
> *really* matter in practice, but a bit confusing.

Yes, good catch.

>>     screen->mp_count = value >> 8;
>>     screen->mp_count_compute = screen->mp_count;
>>
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
>> index 8cf7560..857eb03 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
>> @@ -67,6 +67,7 @@ struct nvc0_screen {
>>     struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */
>>     struct nouveau_bo *poly_cache;
>>
>> +   uint8_t gpc_count;
>>     uint16_t mp_count;
>>     uint16_t mp_count_compute; /* magic reg can make compute use fewer MPs */
>>
>> --
>> 2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/16] nvc0: store the number of GPCs to nvc0_screen
On Fri, Oct 16, 2015 at 1:22 PM, Samuel Pitoiset wrote: > NOUVEAU_GETPARAM_GRAPH_UNITS param returns the number of GPCs, the total > number of TPCs and the number of ROP units. Note that when the DRM > version is too old the default number of GPCs is fixed to 4. > > This will be used to launch the compute kernel which is used to read MP > performance counters over all GPCs. > > Signed-off-by: Samuel Pitoiset > --- > src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + > src/gallium/drivers/nouveau/nvc0/nvc0_screen.h | 1 + > 2 files changed, 2 insertions(+) > > diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c > b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c > index afd91e6..ba53d10 100644 > --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c > +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c > @@ -914,6 +914,7 @@ nvc0_screen_create(struct nouveau_device *dev) >else > value = (16 << 8) | 4; > } > + screen->gpc_count = value & 0x; gpc_count is a u8... I guess you wanted to make this &0xff? Doesn't *really* matter in practice, but a bit confusing. > screen->mp_count = value >> 8; > screen->mp_count_compute = screen->mp_count; > > diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h > b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h > index 8cf7560..857eb03 100644 > --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h > +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h > @@ -67,6 +67,7 @@ struct nvc0_screen { > struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */ > struct nouveau_bo *poly_cache; > > + uint8_t gpc_count; > uint16_t mp_count; > uint16_t mp_count_compute; /* magic reg can make compute use fewer MPs */ > > -- > 2.6.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
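To make the masking point concrete, here is a small standalone C sketch of the unpacking with the suggested 0xff mask. The struct graph_units type and the unpack_graph_units() helper are invented for illustration; only the field widths, the shift by 8 and the fallback value (16 << 8) | 4 come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container mirroring the two nvc0_screen fields. */
struct graph_units {
   uint8_t  gpc_count; /* low byte of the packed value */
   uint16_t mp_count;  /* packed value shifted right by 8, stored as u16 */
};

/* Hypothetical helper: split the packed value the same way the patch does,
 * but with the explicit 0xff mask suggested in the review. */
static struct graph_units
unpack_graph_units(uint64_t value)
{
   struct graph_units u;

   u.gpc_count = value & 0xff;
   u.mp_count = (uint16_t)(value >> 8);
   return u;
}
```

Feeding it the old-DRM fallback (16 << 8) | 4 gives gpc_count = 4 and mp_count = 16, which matches the "default number of GPCs is fixed to 4" note in the commit message.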
[Mesa-dev] [PATCH 14/16] nvc0: move SW/HW queries info to their respective files
This will help for handling HW SM queries variants on Fermi. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nvc0/nvc0_query.c | 185 + src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c | 14 ++ src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.h | 3 + .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 133 +++ .../drivers/nouveau/nvc0/nvc0_query_hw_sm.h| 4 +- src/gallium/drivers/nouveau/nvc0/nvc0_query_sw.c | 64 +++ src/gallium/drivers/nouveau/nvc0/nvc0_query_sw.h | 3 + 7 files changed, 228 insertions(+), 178 deletions(-) diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c index 80f311b..e4752e2 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c @@ -141,163 +141,19 @@ nvc0_render_condition(struct pipe_context *pipe, PUSH_DATA (push, hq->bo->offset + hq->offset); } -/* === DRIVER STATISTICS === */ - -#ifdef NOUVEAU_ENABLE_DRIVER_STATISTICS - -static const char *nvc0_sw_query_drv_stat_names[] = -{ - "drv-tex_obj_current_count", - "drv-tex_obj_current_bytes", - "drv-buf_obj_current_count", - "drv-buf_obj_current_bytes_vid", - "drv-buf_obj_current_bytes_sys", - "drv-tex_transfers_rd", - "drv-tex_transfers_wr", - "drv-tex_copy_count", - "drv-tex_blit_count", - "drv-tex_cache_flush_count", - "drv-buf_transfers_rd", - "drv-buf_transfers_wr", - "drv-buf_read_bytes_staging_vid", - "drv-buf_write_bytes_direct", - "drv-buf_write_bytes_staging_vid", - "drv-buf_write_bytes_staging_sys", - "drv-buf_copy_bytes", - "drv-buf_non_kernel_fence_sync_count", - "drv-any_non_kernel_fence_sync_count", - "drv-query_sync_count", - "drv-gpu_serialize_count", - "drv-draw_calls_array", - "drv-draw_calls_indexed", - "drv-draw_calls_fallback_count", - "drv-user_buffer_upload_bytes", - "drv-constbuf_upload_count", - "drv-constbuf_upload_bytes", - "drv-pushbuf_count", - "drv-resource_validate_count" -}; - -#endif /* NOUVEAU_ENABLE_DRIVER_STATISTICS */ - -/* === PERFORMANCE 
MONITORING COUNTERS for NVE4+ === */ - -/* NOTE: intentionally using the same names as NV */ -static const char *nve4_hw_sm_query_names[] = -{ - /* MP counters */ - "active_cycles", - "active_warps", - "atom_count", - "branch", - "divergent_branch", - "gld_request", - "global_ld_mem_divergence_replays", - "global_store_transaction", - "global_st_mem_divergence_replays", - "gred_count", - "gst_request", - "inst_executed", - "inst_issued", - "inst_issued1", - "inst_issued2", - "l1_global_load_hit", - "l1_global_load_miss", - "l1_local_load_hit", - "l1_local_load_miss", - "l1_local_store_hit", - "l1_local_store_miss", - "l1_shared_load_transactions", - "l1_shared_store_transactions", - "local_load", - "local_load_transactions", - "local_store", - "local_store_transactions", - "prof_trigger_00", - "prof_trigger_01", - "prof_trigger_02", - "prof_trigger_03", - "prof_trigger_04", - "prof_trigger_05", - "prof_trigger_06", - "prof_trigger_07", - "shared_load", - "shared_load_replay", - "shared_store", - "shared_store_replay", - "sm_cta_launched", - "threads_launched", - "uncached_global_load_transaction", - "warps_launched", - /* metrics, i.e. 
functions of the MP counters */ - "metric-ipc", /* inst_executed, clock */ - "metric-ipac", /* inst_executed, active_cycles */ - "metric-ipec", /* inst_executed, (bool)inst_executed */ - "metric-achieved_occupancy",/* active_warps, active_cycles */ - "metric-sm_efficiency", /* active_cycles, clock */ - "metric-inst_replay_overhead" /* inst_issued, inst_executed */ -}; - -/* === PERFORMANCE MONITORING COUNTERS for NVC0:NVE4 === */ -static const char *nvc0_hw_sm_query_names[] = -{ - /* MP counters */ - "active_cycles", - "active_warps", - "atom_count", - "branch", - "divergent_branch", - "gld_request", - "gred_count", - "gst_request", - "inst_executed", - "inst_issued1_0", - "inst_issued1_1", - "inst_issued2_0", - "inst_issued2_1", - "local_load", - "local_store", - "prof_trigger_00", - "prof_trigger_01", - "prof_trigger_02", - "prof_trigger_03", - "prof_trigger_04", - "prof_trigger_05", - "prof_trigger_06", - "prof_trigger_07", - "shared_load", - "shared_store", - "threads_launched", - "thread_inst_executed_0", - "thread_inst_executed_1", - "thread_inst_executed_2", - "thread_inst_executed_3", - "warps_launched", -}; - int nvc0_screen_get_driver_query_info(struct pipe_screen *pscreen, unsigned id, struct pipe_driver_query_info *info) { struct nvc0_screen *screen = nvc0_screen(pscreen); - int count = 0; - - count += NVC0_SW_QUERY_DRV_STAT_COUNT; + int num_sw_queries =
[Mesa-dev] [PATCH 09/16] nvc0: fix unaligned mem access when reading MP counters on Fermi
Memory accesses have to be aligned to 128 bits. Note that this doesn't happen when the card only has TPC.

This patch fixes the following dmesg fail:

gr: GPC0/TPC1/MP trap: global 0004 [MULTIPLE_WARP_ERRORS] warp 000f [UNALIGNED_MEM_ACCESS]

Signed-off-by: Samuel Pitoiset
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index 6ee9fa6..840f200 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -247,7 +247,7 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
     * mov b32 $r11 c0[0x4]
     * ext u32 $r8 $r9 0x414
     * (not $p0) exit
-    * mul $r8 u32 $r8 u32 36
+    * mul $r8 u32 $r8 u32 48
     * add b32 $r10 $c $r10 $r8
     * add b32 $r11 $r11 0x0 $c
     * mov b32 $r8 c0[0x8]
@@ -270,7 +270,7 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
    0x280040001002dde4ULL,
    0x7000c01050921c03ULL,
    0x800021e7ULL,
-   0x100090821c02ULL,
+   0x1000c0821c02ULL,
    0x480120a29c03ULL,
    0x08b2dc42ULL,
    0x2800400020021de4ULL,
@@ -473,7 +473,7 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)

    /* set sequence field to 0 (used to check if result is available) */
    for (i = 0; i < screen->mp_count; ++i) {
-      const unsigned b = (0x24 / 4) * i;
+      const unsigned b = (0x30 / 4) * i;
       hq->data[b + 8] = 0;
    }
    hq->sequence++;
@@ -617,7 +617,7 @@ nvc0_hw_sm_query_read_data(uint32_t count[32][8],
    unsigned p, c;

    for (p = 0; p < mp_count; ++p) {
-      const unsigned b = (0x24 / 4) * p;
+      const unsigned b = (0x30 / 4) * p;

       for (c = 0; c < cfg->num_counters; ++c) {
          if (hq->data[b + 8] != hq->sequence) {
@@ -815,7 +815,10 @@ nvc0_hw_sm_create_query(struct nvc0_context *nvc0, unsigned type)
        */
       space = (4 * 4 + 4 + 4) * nvc0->screen->mp_count * sizeof(uint32_t);
    } else {
-      /* for each MP:
+      /*
+       * Note that padding is used to align memory access to 128 bits.
+       *
+       * for each MP:
        * [00] = MP.C0
        * [04] = MP.C1
        * [08] = MP.C2
@@ -825,8 +828,11 @@ nvc0_hw_sm_create_query(struct nvc0_context *nvc0, unsigned type)
        * [18] = MP.C6
        * [1c] = MP.C7
        * [20] = MP.sequence
+       * [24] = padding
+       * [28] = padding
+       * [2c] = padding
        */
-      space = (8 + 1) * nvc0->screen->mp_count * sizeof(uint32_t);
+      space = (8 + 1 + 3) * nvc0->screen->mp_count * sizeof(uint32_t);
    }

    if (!nvc0_hw_query_allocate(nvc0, &hq->base, space)) {
--
2.6.1
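As an aside, the padded per-MP layout from the comment in this patch can be written down as a C struct to check the arithmetic. This is a sketch under assumptions: struct mp_counter_slot, mp_slot_word_index() and mp_query_space() are invented names, and the driver itself indexes a flat uint32_t array rather than using such a struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one MP's slot in the query buffer. */
struct mp_counter_slot {
   uint32_t ctr[8];   /* [00]..[1c] = MP.C0..MP.C7 */
   uint32_t sequence; /* [20] = MP.sequence */
   uint32_t pad[3];   /* [24]..[2c] = padding */
};

/* 0x30 bytes is a multiple of 16 bytes (128 bits); the old 0x24 is not. */
_Static_assert(sizeof(struct mp_counter_slot) == 0x30, "slot must be 0x30 bytes");
_Static_assert(sizeof(struct mp_counter_slot) % 16 == 0, "slot must be 128-bit aligned");

/* Index (in 32-bit words) of the first counter of MP number p. */
static unsigned
mp_slot_word_index(unsigned p)
{
   return (0x30 / 4) * p;
}

/* Buffer size in bytes, same formula as the patch: (8 + 1 + 3) words per MP. */
static size_t
mp_query_space(unsigned mp_count)
{
   return (8 + 1 + 3) * mp_count * sizeof(uint32_t);
}
```

The point of the padding is that every slot then starts on a 16-byte boundary, so the sequence field of MP p sits at word index mp_slot_word_index(p) + 8 and no slot straddles a 128-bit boundary at its start.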
[Mesa-dev] [PATCH 10/16] nvc0: store the number of GPCs to nvc0_screen
NOUVEAU_GETPARAM_GRAPH_UNITS param returns the number of GPCs, the total number of TPCs and the number of ROP units. Note that when the DRM version is too old the default number of GPCs is fixed to 4. This will be used to launch the compute kernel which is used to read MP performance counters over all GPCs. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.h | 1 + 2 files changed, 2 insertions(+) diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c index afd91e6..ba53d10 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c @@ -914,6 +914,7 @@ nvc0_screen_create(struct nouveau_device *dev) else value = (16 << 8) | 4; } + screen->gpc_count = value & 0x; screen->mp_count = value >> 8; screen->mp_count_compute = screen->mp_count; diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h index 8cf7560..857eb03 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h @@ -67,6 +67,7 @@ struct nvc0_screen { struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */ struct nouveau_bo *poly_cache; + uint8_t gpc_count; uint16_t mp_count; uint16_t mp_count_compute; /* magic reg can make compute use fewer MPs */ -- 2.6.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/16] nvc0: fix monitoring multiple MP counters queries on Fermi
For strange reasons, the signal id depends on the slot selected on Fermi but not on Kepler. Fortunately, the signal ids are just offseted by the slot id! Signed-off-by: Samuel Pitoiset --- .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 147 +++-- 1 file changed, 79 insertions(+), 68 deletions(-) diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c index 99e9073..6ee9fa6 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c @@ -128,7 +128,7 @@ struct nvc0_hw_sm_counter_cfg { uint32_t func: 16; /* mask or 4-bit logic op (depending on mode) */ uint32_t mode: 4; /* LOGOP,B6,LOGOP_B6(_PULSE) */ - uint32_t num_src : 3; /* number of sources (1 - 6, only for NVC0:NVE4) */ + uint32_t src_mask; /* mask for signal selection (only for NVC0:NVE4) */ uint32_t sig_dom : 1; /* if 0, MP_PM_A (per warp-sched), if 1, MP_PM_B */ uint32_t sig_sel : 8; /* signal group */ uint32_t src_sel; /* signal selection for up to 4 sources */ @@ -280,78 +280,78 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] = 0x80001de7ULL }; -#define _C(f, o, g, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, 0, 0, g, s } +#define _C(f, o, g, m, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, m, 0, g, s } #define _Q(n, c, ...) 
[NVC0_HW_SM_QUERY_##n] = { \ { __VA_ARGS__ }, c, NVC0_COUNTER_OPn_SUM, { 1, 1 },\ } static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] = { - _Q(ACTIVE_CYCLES, 1, _C(0x, LOGOP, 0x11, 0x)), - _Q(ACTIVE_WARPS,6, _C(0x, LOGOP, 0x24, 0x0010), - _C(0x, LOGOP, 0x24, 0x0021), - _C(0x, LOGOP, 0x24, 0x0032), - _C(0x, LOGOP, 0x24, 0x0043), - _C(0x, LOGOP, 0x24, 0x0054), - _C(0x, LOGOP, 0x24, 0x0065)), - _Q(ATOM_COUNT, 1, _C(0x, LOGOP, 0x63, 0x0030)), - _Q(BRANCH, 2, _C(0x, LOGOP, 0x1a, 0x), - _C(0x, LOGOP, 0x1a, 0x0011)), - _Q(DIVERGENT_BRANCH,2, _C(0x, LOGOP, 0x19, 0x0020), - _C(0x, LOGOP, 0x19, 0x0031)), - _Q(GLD_REQUEST, 1, _C(0x, LOGOP, 0x64, 0x0030)), - _Q(GRED_COUNT, 1, _C(0x, LOGOP, 0x63, 0x0040)), - _Q(GST_REQUEST, 1, _C(0x, LOGOP, 0x64, 0x0060)), - _Q(INST_EXECUTED, 3, _C(0x, LOGOP, 0x2d, 0x), - _C(0x, LOGOP, 0x2d, 0x0011), - _C(0x, LOGOP, 0x2d, 0x0022)), - _Q(INST_ISSUED1_0, 1, _C(0x, LOGOP, 0x7e, 0x0010)), - _Q(INST_ISSUED1_1, 1, _C(0x, LOGOP, 0x7e, 0x0040)), - _Q(INST_ISSUED2_0, 1, _C(0x, LOGOP, 0x7e, 0x0020)), - _Q(INST_ISSUED2_1, 1, _C(0x, LOGOP, 0x7e, 0x0050)), - _Q(LOCAL_LD,1, _C(0x, LOGOP, 0x64, 0x0020)), - _Q(LOCAL_ST,1, _C(0x, LOGOP, 0x64, 0x0050)), - _Q(PROF_TRIGGER_0, 1, _C(0x, LOGOP, 0x01, 0x)), - _Q(PROF_TRIGGER_1, 1, _C(0x, LOGOP, 0x01, 0x0010)), - _Q(PROF_TRIGGER_2, 1, _C(0x, LOGOP, 0x01, 0x0020)), - _Q(PROF_TRIGGER_3, 1, _C(0x, LOGOP, 0x01, 0x0030)), - _Q(PROF_TRIGGER_4, 1, _C(0x, LOGOP, 0x01, 0x0040)), - _Q(PROF_TRIGGER_5, 1, _C(0x, LOGOP, 0x01, 0x0050)), - _Q(PROF_TRIGGER_6, 1, _C(0x, LOGOP, 0x01, 0x0060)), - _Q(PROF_TRIGGER_7, 1, _C(0x, LOGOP, 0x01, 0x0070)), - _Q(SHARED_LD, 1, _C(0x, LOGOP, 0x64, 0x0010)), - _Q(SHARED_ST, 1, _C(0x, LOGOP, 0x64, 0x0040)), - _Q(THREADS_LAUNCHED,6, _C(0x, LOGOP, 0x26, 0x0010), - _C(0x, LOGOP, 0x26, 0x0021), - _C(0x, LOGOP, 0x26, 0x0032), - _C(0x, LOGOP, 0x26, 0x0043), - _C(0x, LOGOP, 0x26, 0x0054), - _C(0x, LOGOP, 0x26, 0x0065)), - _Q(TH_INST_EXECUTED_0, 6, _C(0x, LOGOP, 0xa3, 0x), - _C(0x, LOGOP, 0xa3, 
0x0011), - _C(0x, LOGOP, 0xa3, 0x0022), - _C(0x, LOGOP, 0xa3, 0x0033), - _C(0x, LOGOP, 0xa3, 0x0044), - _C(0x, LOGOP, 0xa3, 0x0055)), - _Q(TH_INST_EXECUTED_1, 6, _C(0x, LOGOP, 0xa5, 0x), - _C(0x, LOGOP, 0xa5, 0x0011), - _C(0x, LOGOP, 0xa5, 0x0022), -
[Mesa-dev] [PATCH 13/16] nvc0: enable compute support by default on Fermi
Compute support was not enabled by default because weird effects on 3D state happened, but I can't reproduce them anymore. This also enables MP performance counters by default on Fermi. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nvc0/nvc0_query.c | 3 +-- src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 7 +-- 2 files changed, 2 insertions(+), 8 deletions(-) diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c index c81b85a..80f311b 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c @@ -291,7 +291,6 @@ nvc0_screen_get_driver_query_info(struct pipe_screen *pscreen, count += NVE4_HW_SM_QUERY_COUNT; } else if (screen->base.class_3d < NVE4_3D_CLASS) { -/* NVC0_COMPUTE is not always enabled */ count += NVC0_HW_SM_QUERY_COUNT; } } @@ -358,7 +357,7 @@ nvc0_screen_get_driver_query_group_info(struct pipe_screen *pscreen, count++; } else if (screen->base.class_3d < NVE4_3D_CLASS) { -count++; /* NVC0_COMPUTE is not always enabled */ +count++; } } } diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c index ba53d10..d6a4ac5 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c @@ -561,12 +561,7 @@ nvc0_screen_init_compute(struct nvc0_screen *screen) switch (screen->base.device->chipset & ~0xf) { case 0xc0: case 0xd0: - /* Using COMPUTE has weird effects on 3D state, we need to - * investigate this further before enabling it by default. - */ - if (debug_get_bool_option("NVC0_COMPUTE", false)) - return nvc0_screen_compute_setup(screen, screen->base.pushbuf); - return 0; + return nvc0_screen_compute_setup(screen, screen->base.pushbuf); case 0xe0: return nve4_screen_compute_setup(screen, screen->base.pushbuf); case 0xf0: -- 2.6.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 12/16] nvc0: allow only one active query for the MP counters group
Because we can't expose the number of hardware counters needed for each different query, we don't want to allow more than one active query simultaneously to avoid failure when the maximum number of counters is reached. Note that these groups of GPU counters are currently only used by AMD_performance_monitor. Like for Kepler, this limits the maximum number of active queries to 1 on Fermi. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nvc0/nvc0_query.c | 20 +--- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c index f8d4ba1..c81b85a 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query.c @@ -371,22 +371,20 @@ nvc0_screen_get_driver_query_group_info(struct pipe_screen *pscreen, info->name = "MP counters"; info->type = PIPE_DRIVER_QUERY_GROUP_TYPE_GPU; + /* Because we can't expose the number of hardware counters needed for + * each different query, we don't want to allow more than one active + * query simultaneously to avoid failure when the maximum number of + * counters is reached. Note that these groups of GPU counters are + * currently only used by AMD_performance_monitor. + */ + info->max_active_queries = 1; + if (screen->base.class_3d == NVE4_3D_CLASS) { info->num_queries = NVE4_HW_SM_QUERY_COUNT; - - /* On NVE4+, each multiprocessor have 8 hardware counters separated - * in two distinct domains, but we allow only one active query - * simultaneously because some of them use more than one hardware - * counter and this will result in an undefined behaviour. */ - info->max_active_queries = 1; /* TODO: handle multiple hw counters */ - return 1; +return 1; } else if (screen->base.class_3d < NVE4_3D_CLASS) { info->num_queries = NVC0_HW_SM_QUERY_COUNT; - -/* On NVC0:NVE4, each multiprocessor have 8 hardware counters - * in a single domain. 
*/ -info->max_active_queries = 8; return 1; } } -- 2.6.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 16/16] nvc0: add a note about MP counters on GF100/GF110
MP counters on GF100/GF110 (compute capability 2.0) are buggy because there is a context-switch problem that we need to fix. Results might be wrong sometimes, be careful!

Signed-off-by: Samuel Pitoiset
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index c4b40a6..c5ce3e3 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -286,6 +286,11 @@ static const struct nvc0_hw_sm_query_cfg nve4_hw_sm_queries[] =
 #undef _M2B

 /* === PERFORMANCE MONITORING COUNTERS for NVC0:NVE4 === */
+/* NOTES:
+ * - MP counters on GF100/GF110 (compute capability 2.0) are buggy
+ *   because there is a context-switch problem that we need to fix.
+ *   Results might be wrong sometimes, be careful!
+ */
 static const char *nvc0_hw_sm_query_names[] =
 {
    /* MP counters */
--
2.6.1
[Mesa-dev] [PATCH 15/16] nvc0: add MP counters variants for GF100/GF110
GF100 and GF110 chipsets are compute capability 2.0, while the other Fermi chipsets are compute capability 2.1. That is why some MP counters differ between these chipsets and we need to handle variants. Signed-off-by: Samuel Pitoiset --- .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 559 ++--- .../drivers/nouveau/nvc0/nvc0_query_hw_sm.h| 1 + 2 files changed, 483 insertions(+), 77 deletions(-) diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c index 174beef..c4b40a6 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c @@ -298,6 +298,7 @@ static const char *nvc0_hw_sm_query_names[] = "gred_count", "gst_request", "inst_executed", + "inst_issued", "inst_issued1_0", "inst_issued1_1", "inst_issued2_0", @@ -373,82 +374,456 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] = }; #define _C(f, o, g, m, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, m, 0, g, s } -#define _Q(n, c, ...) 
[NVC0_HW_SM_QUERY_##n] = { \ - { __VA_ARGS__ }, c, NVC0_COUNTER_OPn_SUM, { 1, 1 },\ -} +#define _Q(n, c) [NVC0_HW_SM_QUERY_##n] = c + +/* Compute capability 2.0 (GF100/GF110) */ +static const struct nvc0_hw_sm_query_cfg +nvc0_active_cycles = +{ + .ctr[0] = _C(0x, LOGOP, 0x11, 0x00ff, 0x), + .num_counters = 1, + .op = NVC0_COUNTER_OPn_SUM, + .norm = { 1, 1 }, +}; + +static const struct nvc0_hw_sm_query_cfg +nvc0_active_warps = +{ + .ctr[0] = _C(0x, LOGOP, 0x24, 0x00ff, 0x0010), + .ctr[1] = _C(0x, LOGOP, 0x24, 0x00ff, 0x0020), + .ctr[2] = _C(0x, LOGOP, 0x24, 0x00ff, 0x0030), + .ctr[3] = _C(0x, LOGOP, 0x24, 0x00ff, 0x0040), + .ctr[4] = _C(0x, LOGOP, 0x24, 0x00ff, 0x0050), + .ctr[5] = _C(0x, LOGOP, 0x24, 0x00ff, 0x0060), + .num_counters = 6, + .op = NVC0_COUNTER_OPn_SUM, + .norm = { 1, 1 }, +}; + +static const struct nvc0_hw_sm_query_cfg +nvc0_atom_count = +{ + .ctr[0] = _C(0x, LOGOP, 0x63, 0x00ff, 0x0030), + .num_counters = 1, + .op = NVC0_COUNTER_OPn_SUM, + .norm = { 1, 1 }, +}; + +static const struct nvc0_hw_sm_query_cfg +nvc0_branch = +{ + .ctr[0] = _C(0x, LOGOP, 0x1a, 0x00ff, 0x), + .ctr[1] = _C(0x, LOGOP, 0x1a, 0x00ff, 0x0010), + .num_counters = 2, + .op = NVC0_COUNTER_OPn_SUM, + .norm = { 1, 1 }, +}; + +static const struct nvc0_hw_sm_query_cfg +nvc0_divergent_branch = +{ + .ctr[0] = _C(0x, LOGOP, 0x19, 0x00ff, 0x0020), + .ctr[1] = _C(0x, LOGOP, 0x19, 0x00ff, 0x0030), + .num_counters = 2, + .op = NVC0_COUNTER_OPn_SUM, + .norm = { 1, 1 }, +}; + +static const struct nvc0_hw_sm_query_cfg +nvc0_gld_request = +{ + .ctr[0] = _C(0x, LOGOP, 0x64, 0x00ff, 0x0030), + .num_counters = 1, + .op = NVC0_COUNTER_OPn_SUM, + .norm = { 1, 1 }, +}; + +static const struct nvc0_hw_sm_query_cfg +nvc0_gred_count = +{ + .ctr[0] = _C(0x, LOGOP, 0x63, 0x00ff, 0x0040), + .num_counters = 1, + .op = NVC0_COUNTER_OPn_SUM, + .norm = { 1, 1 }, +}; + +static const struct nvc0_hw_sm_query_cfg +nvc0_gst_request = +{ + .ctr[0] = _C(0x, LOGOP, 0x64, 0x00ff, 0x0060), + .num_counters = 1, + .op = 
NVC0_COUNTER_OPn_SUM, + .norm = { 1, 1 }, +}; + +static const struct nvc0_hw_sm_query_cfg +nvc0_inst_executed = +{ + .ctr[0] = _C(0x, LOGOP, 0x2d, 0x, 0x1000), + .ctr[1] = _C(0x, LOGOP, 0x2d, 0x, 0x1010), + .num_counters = 2, + .op = NVC0_COUNTER_OPn_SUM, + .norm = { 1, 1 }, +}; + +static const struct nvc0_hw_sm_query_cfg +nvc0_inst_issued = +{ + .ctr[0] = _C(0x, LOGOP, 0x27, 0x, 0x7060), + .ctr[1] = _C(0x, LOGOP, 0x27, 0x, 0x7070), + .num_counters = 2, + .op = NVC0_COUNTER_OPn_SUM, + .norm = { 1, 1 }, +}; -static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] = -{ - _Q(ACTIVE_CYCLES, 1, _C(0x, LOGOP, 0x11, 0x00ff, 0x)), - _Q(ACTIVE_WARPS,6, _C(0x, LOGOP, 0x24, 0x00ff, 0x0010), - _C(0x, LOGOP, 0x24, 0x00ff, 0x0020), - _C(0x, LOGOP, 0x24, 0x00ff, 0x0030), - _C(0x, LOGOP, 0x24, 0x00ff, 0x0040), - _C(0x, LOGOP, 0x24, 0x00ff, 0x0050), - _C(0x, LOGOP, 0x24, 0x00ff, 0x0060)), - _Q(ATOM_COUNT, 1, _C(0x, LOGOP, 0x6
[Mesa-dev] [PATCH 00/16] nvc0: improve MP counters support on Fermi
Hello,

This series fixes some issues related to MP performance counters on Fermi. MP counters for GF100/GF110 have also been improved because they are compute capability 2.0 while the other Fermi chipsets are 2.1, and some HW events are different.

Compute support is now enabled by default on Fermi because I can't reproduce those weird effects on 3D state. This has probably been fixed as a side effect. Anyway, if someone complains about it, I'll be glad to fix it.

This series has been tested with the following apps:
- xonotic-glx
- heaven
- valley
- glxgears, glxspheres64 and so on ...

And with the following Fermi chipsets:
- GF100
- GF108
- GF110
- GF114
- GF116
- GF119

Note that with GF100/GF110, some MP counters are not correctly context-switched and results might be wrong. This is a known issue that we need to fix on the Nouveau side.

There are no regressions with piglit.

I'll submit another series in the next few days which adds some performance monitoring metrics on Fermi and I'll double check MP counters on Kepler.

Thanks.

Samuel Pitoiset (16):
  nvc0: remove useless call to query_get_cfg() in nvc0_hw_sm_query_end()
  nvc0: split out begin_query() hook used by MP counters
  nvc0: rip off the kepler MP-enabling logic from the Fermi codepath
  nvc0: correctly enable the MP counters' multiplexer on Fermi
  nvc0: fix sequence field init for MP counters on Fermi
  nvc0: allow to use 8 MP counters on Fermi
  nvc0: fix queries which use multiple MP counters on Fermi
  nvc0: fix monitoring multiple MP counters queries on Fermi
  nvc0: fix unaligned mem access when reading MP counters on Fermi
  nvc0: store the number of GPCs to nvc0_screen
  nvc0: read MP counters of all GPCs on Fermi
  nvc0: allow only one active query for the MP counters group
  nvc0: enable compute support by default on Fermi
  nvc0: move SW/HW queries info to their respective files
  nvc0: add MP counters variants for GF100/GF110
  nvc0: add a note about MP counters on GF100/GF110

 src/gallium/drivers/nouveau/nvc0/nvc0_query.c | 208 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c | 14 +
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.h | 3 +
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c| 809 ++---
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.h| 7 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_query_sw.c | 64 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_query_sw.h | 3 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 8 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.h | 1 +
 9 files changed, 834 insertions(+), 283 deletions(-)

--
2.6.1
[Mesa-dev] [PATCH 11/16] nvc0: read MP counters of all GPCs on Fermi
When a card has more than one GPC, the grid used by the compute kernel which reads MP performance counters seems to be too small. The consequence is that the kernel is not launched on all TPCs.

Increasing the grid size using the number of GPCs now launches enough blocks and we can read MP performance counters of all TPCs.

Signed-off-by: Samuel Pitoiset
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index 840f200..c22ad4b 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -528,7 +528,7 @@ nvc0_hw_sm_end_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
    uint32_t mask;
    uint32_t input[3];
    const uint block[3] = { 32, is_nve4 ? 4 : 1, 1 };
-   const uint grid[3] = { screen->mp_count, 1, 1 };
+   const uint grid[3] = { screen->mp_count, screen->gpc_count, 1 };
    unsigned c;

    if (unlikely(!screen->pm.prog)) {
--
2.6.1
[Mesa-dev] [PATCH 05/16] nvc0: fix sequence field init for MP counters on Fermi
Sequence fields are located at MP[i] + 0x20 in the buffer object. This is used to check if result is available for MP[i].

Signed-off-by: Samuel Pitoiset
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index f7b49da..b810d25 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -439,8 +439,10 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
    PUSH_SPACE(push, 4 * 8 * 6 + 4);

    /* set sequence field to 0 (used to check if result is available) */
-   for (i = 0; i < screen->mp_count; ++i)
-      hq->data[i * 10 + 10] = 0;
+   for (i = 0; i < screen->mp_count; ++i) {
+      const unsigned b = (0x24 / 4) * i;
+      hq->data[b + 8] = 0;
+   }
    hq->sequence++;

    for (i = 0; i < cfg->num_counters; ++i) {
--
2.6.1
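The index arithmetic in this patch can be checked with a small standalone sketch. It assumes, as the hunk implies, that each MP owns a record of 0x24 bytes in the buffer and that the sequence field sits 0x20 bytes into that record. The names mp_seq_index, mp_clear_sequences and the two macros are hypothetical, chosen for illustration only; they are not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, read off the hunk: one 0x24-byte record per MP, with the
 * sequence field 0x20 bytes into the record. */
#define MP_RECORD_BYTES     0x24u
#define MP_SEQ_OFFSET_BYTES 0x20u

/* Hypothetical helper: index of MP[i]'s sequence field in a uint32_t view of
 * the buffer. Same value as "(0x24 / 4) * i + 8" in the patch. */
static unsigned
mp_seq_index(unsigned i)
{
   return (MP_RECORD_BYTES / 4) * i + MP_SEQ_OFFSET_BYTES / 4;
}

/* Clear the sequence field of every MP, mirroring the loop in the patch. */
static void
mp_clear_sequences(uint32_t *data, unsigned mp_count)
{
   assert(data != NULL);
   for (unsigned i = 0; i < mp_count; ++i)
      data[mp_seq_index(i)] = 0;
}
```

Under these assumptions the fields land at word indices 8, 17, 26, ..., whereas the removed expression "i * 10 + 10" addressed words 10, 20, 30, ...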
[Mesa-dev] [PATCH 04/16] nvc0: correctly enable the MP counters' multiplexer on Fermi
Writing 0x408000 to 0x419e00 (like on Kepler) has no effect on Fermi because we only have one domain of 8 counters. Instead, we have to write 0x8000.

Signed-off-by: Samuel Pitoiset
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index a74bfee..f7b49da 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -448,11 +448,8 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
       unsigned s;

       if (!screen->pm.num_hw_sm_active[d]) {
-         uint32_t m = (1 << 22) | (1 << (7 + (8 * !d)));
-         if (screen->pm.num_hw_sm_active[!d])
-            m |= 1 << (7 + (8 * d));
          BEGIN_NVC0(push, SUBC_SW(0x0600), 1);
-         PUSH_DATA (push, m);
+         PUSH_DATA (push, 0x8000);
       }
       screen->pm.num_hw_sm_active[d]++;

--
2.6.1
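To see how the removed expression relates to the two constants in the commit message, here is a minimal sketch that evaluates it outside the driver. The function names are hypothetical; only the bit arithmetic is taken from the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value the removed (Kepler-style) code computed for domain d (0 or 1).
 * other_active says whether the other domain already has active counters. */
static uint32_t
kepler_style_mux(unsigned d, bool other_active)
{
   assert(d <= 1);
   uint32_t m = (1u << 22) | (1u << (7 + (8 * !d)));
   if (other_active)
      m |= 1u << (7 + (8 * d));
   return m;
}

/* Value the patch writes on Fermi, which has a single domain of counters. */
static uint32_t
fermi_mux(void)
{
   return 0x8000;
}
```

For d = 0 with no other domain active, the old expression gives 0x408000, the Kepler value quoted above; the Fermi constant keeps only bit 15 of it.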
[Mesa-dev] [PATCH 07/16] nvc0: fix queries which use multiple MP counters on Fermi
Queries which use more than one MP counter were misconfigured, and computing the final result was also wrong, because sources need to be configured on different hardware counters instead.

According to the blob, computing the result is now as follows:

FOR i..n
   val += ctr[i] * pow(2, i)

Signed-off-by: Samuel Pitoiset
---
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 128 +
 1 file changed, 81 insertions(+), 47 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index 2060662..99e9073 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -131,7 +131,7 @@ struct nvc0_hw_sm_counter_cfg
    uint32_t num_src : 3; /* number of sources (1 - 6, only for NVC0:NVE4) */
    uint32_t sig_dom : 1; /* if 0, MP_PM_A (per warp-sched), if 1, MP_PM_B */
    uint32_t sig_sel : 8; /* signal group */
-   uint64_t src_sel; /* signal selection for up to 6 sources (48 bit) */
+   uint32_t src_sel; /* signal selection for up to 4 sources */
 };

 #define NVC0_COUNTER_OPn_SUM 0
@@ -280,44 +280,82 @@ static const uint64_t nvc0_read_hw_sm_counters_code[] =
    0x80001de7ULL
 };

-#define _Q(n, f, m, g, c, s0, s1, s2, s3, s4, s5) [NVC0_HW_SM_QUERY_##n] = { { { f, NVC0_COMPUTE_MP_PM_OP_MODE_##m, c, 0, g, s0|(s1 << 8)|(s2 << 16)|(s3 << 24)|(s4##ULL << 32)|(s5##ULL << 40) }, {}, {}, {} }, 1, NVC0_COUNTER_OPn_SUM, { 1, 1 } }
+#define _C(f, o, g, s) { f, NVC0_COMPUTE_MP_PM_OP_MODE_##o, 0, 0, g, s }
+#define _Q(n, c, ...) [NVC0_HW_SM_QUERY_##n] = { \
+   { __VA_ARGS__ }, c, NVC0_COUNTER_OPn_SUM, { 1, 1 }, \
+}

 static const struct nvc0_hw_sm_query_cfg nvc0_hw_sm_queries[] =
 {
-   _Q(ACTIVE_CYCLES, 0x, LOGOP, 0x11, 1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(ACTIVE_WARPS, 0x, LOGOP, 0x24, 6, 0x10, 0x21, 0x32, 0x43, 0x54, 0x65),
-   _Q(ATOM_COUNT, 0x, LOGOP, 0x63, 1, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(BRANCH, 0x, LOGOP, 0x1a, 2, 0x00, 0x11, 0x00, 0x00, 0x00, 0x00),
-   _Q(DIVERGENT_BRANCH, 0x, LOGOP, 0x19, 2, 0x20, 0x31, 0x00, 0x00, 0x00, 0x00),
-   _Q(GLD_REQUEST, 0x, LOGOP, 0x64, 1, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(GRED_COUNT, 0x, LOGOP, 0x63, 1, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(GST_REQUEST, 0x, LOGOP, 0x64, 1, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(INST_EXECUTED, 0x, LOGOP, 0x2d, 3, 0x00, 0x11, 0x22, 0x00, 0x00, 0x00),
-   _Q(INST_ISSUED1_0, 0x, LOGOP, 0x7e, 1, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(INST_ISSUED1_1, 0x, LOGOP, 0x7e, 1, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(INST_ISSUED2_0, 0x, LOGOP, 0x7e, 1, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(INST_ISSUED2_1, 0x, LOGOP, 0x7e, 1, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(LOCAL_LD, 0x, LOGOP, 0x64, 1, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(LOCAL_ST, 0x, LOGOP, 0x64, 1, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(PROF_TRIGGER_0, 0x, LOGOP, 0x01, 1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(PROF_TRIGGER_1, 0x, LOGOP, 0x01, 1, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(PROF_TRIGGER_2, 0x, LOGOP, 0x01, 1, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(PROF_TRIGGER_3, 0x, LOGOP, 0x01, 1, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(PROF_TRIGGER_4, 0x, LOGOP, 0x01, 1, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(PROF_TRIGGER_5, 0x, LOGOP, 0x01, 1, 0x50, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(PROF_TRIGGER_6, 0x, LOGOP, 0x01, 1, 0x60, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(PROF_TRIGGER_7, 0x, LOGOP, 0x01, 1, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(SHARED_LD, 0x, LOGOP, 0x64, 1, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(SHARED_ST, 0x, LOGOP, 0x64, 1, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00),
-   _Q(THREADS_LAUNCHED, 0x, LOGOP, 0x26, 6, 0x10, 0x21, 0x32, 0x43, 0x54, 0x65),
-   _Q(TH_INST_EXECUTED_0, 0x, LOGOP, 0xa3, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55),
-   _Q(TH_INST_EXECUTED_1, 0x, LOGOP, 0xa5, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55),
-   _Q(TH_INST_EXECUTED_2, 0x, LOGOP, 0xa4, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55),
-   _Q(TH_INST_EXECUTED_3, 0x, LOGOP, 0xa6, 6, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55),
-   _Q(WARPS_LAUNCHED, 0x, LOGOP, 0x26, 1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00),
+   _Q(ACTIVE_CYCLES, 1, _C(0x, LOGOP, 0x11, 0x)),
+   _Q(ACTIVE_WARPS, 6, _C(0x, LOGOP, 0x24, 0x0010),
+      _C(0x, LOGOP, 0x24, 0x0021),
+      _C(0x, LOGOP, 0x24, 0x0032),
+      _C(0x, LOGOP, 0x24, 0x0043),
+
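The result formula quoted from the commit message (val += ctr[i] * pow(2, i)) can be sketched as a shift-and-add. This is an illustration only: the name combine_counters is hypothetical, and it assumes the loop index starts at 0, which the "FOR i..n" shorthand does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine n hardware counter readings as val += ctr[i] * 2^i.
 * Assumption: i runs from 0 to n - 1. */
static uint64_t
combine_counters(const uint32_t *ctr, unsigned n)
{
   assert(ctr != NULL && n <= 8);
   uint64_t val = 0;
   for (unsigned i = 0; i < n; ++i)
      val += (uint64_t)ctr[i] << i;
   return val;
}
```

Widening each reading to 64 bits before shifting keeps the weighted sum from overflowing a 32-bit accumulator.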
[Mesa-dev] [PATCH 01/16] nvc0: remove useless call to query_get_cfg() in nvc0_hw_sm_query_end()
Signed-off-by: Samuel Pitoiset
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index 3bdb90a..8e2239f 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -439,9 +439,6 @@ nvc0_hw_sm_end_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
    const uint block[3] = { 32, is_nve4 ? 4 : 1, 1 };
    const uint grid[3] = { screen->mp_count, 1, 1 };
    unsigned c;
-   const struct nvc0_hw_sm_query_cfg *cfg;
-
-   cfg = nvc0_hw_sm_query_get_cfg(nvc0, hq);

    if (unlikely(!screen->pm.prog)) {
       struct nvc0_program *prog = CALLOC_STRUCT(nvc0_program);
@@ -495,6 +492,7 @@ nvc0_hw_sm_end_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
    PUSH_SPACE(push, 16);
    mask = 0;
    for (c = 0; c < 8; ++c) {
+      const struct nvc0_hw_sm_query_cfg *cfg;
       unsigned i;

       hsq = screen->pm.mp_counter[c];
--
2.6.1
[Mesa-dev] [PATCH 03/16] nvc0: rip off the kepler MP-enabling logic from the Fermi codepath
Writing 0x1fcb to 0x419eac is definitely not related to MP counters and has no effect on Fermi (although this enables MP counters on Kepler).

Signed-off-by: Samuel Pitoiset
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index f83966a..a74bfee 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -436,13 +436,7 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
    }

    assert(cfg->num_counters <= 4);
-   PUSH_SPACE(push, 4 * 8 * 6 + 6);
-
-   if (!screen->pm.mp_counters_enabled) {
-      screen->pm.mp_counters_enabled = true;
-      BEGIN_NVC0(push, SUBC_SW(0x06ac), 1);
-      PUSH_DATA (push, 0x1fcb);
-   }
+   PUSH_SPACE(push, 4 * 8 * 6 + 4);

    /* set sequence field to 0 (used to check if result is available) */
    for (i = 0; i < screen->mp_count; ++i)
--
2.6.1
[Mesa-dev] [PATCH 02/16] nvc0: split out begin_query() hook used by MP counters
The way we configure MP performance counters is going to be pretty different between Fermi and Kepler. Having two separate functions is much better.

Signed-off-by: Samuel Pitoiset
---
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 108 -
 1 file changed, 84 insertions(+), 24 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index 8e2239f..f83966a 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -338,16 +338,91 @@ nvc0_hw_sm_destroy_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
 }

 static boolean
+nve4_hw_sm_begin_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
+{
+   struct nvc0_screen *screen = nvc0->screen;
+   struct nouveau_pushbuf *push = nvc0->base.pushbuf;
+   struct nvc0_hw_sm_query *hsq = nvc0_hw_sm_query(hq);
+   const struct nvc0_hw_sm_query_cfg *cfg;
+   unsigned i, c;
+   unsigned num_ab[2] = { 0, 0 };
+
+   cfg = nvc0_hw_sm_query_get_cfg(nvc0, hq);
+
+   /* check if we have enough free counter slots */
+   for (i = 0; i < cfg->num_counters; ++i)
+      num_ab[cfg->ctr[i].sig_dom]++;
+
+   if (screen->pm.num_hw_sm_active[0] + num_ab[0] > 4 ||
+       screen->pm.num_hw_sm_active[1] + num_ab[1] > 4) {
+      NOUVEAU_ERR("Not enough free MP counter slots !\n");
+      return false;
+   }
+
+   assert(cfg->num_counters <= 4);
+   PUSH_SPACE(push, 4 * 8 * + 6);
+
+   if (!screen->pm.mp_counters_enabled) {
+      screen->pm.mp_counters_enabled = true;
+      BEGIN_NVC0(push, SUBC_SW(0x06ac), 1);
+      PUSH_DATA (push, 0x1fcb);
+   }
+
+   /* set sequence field to 0 (used to check if result is available) */
+   for (i = 0; i < screen->mp_count; ++i)
+      hq->data[i * 10 + 10] = 0;
+   hq->sequence++;
+
+   for (i = 0; i < cfg->num_counters; ++i) {
+      const unsigned d = cfg->ctr[i].sig_dom;
+
+      if (!screen->pm.num_hw_sm_active[d]) {
+         uint32_t m = (1 << 22) | (1 << (7 + (8 * !d)));
+         if (screen->pm.num_hw_sm_active[!d])
+            m |= 1 << (7 + (8 * d));
+         BEGIN_NVC0(push, SUBC_SW(0x0600), 1);
+         PUSH_DATA (push, m);
+      }
+      screen->pm.num_hw_sm_active[d]++;
+
+      for (c = d * 4; c < (d * 4 + 4); ++c) {
+         if (!screen->pm.mp_counter[c]) {
+            hsq->ctr[i] = c;
+            screen->pm.mp_counter[c] = hsq;
+            break;
+         }
+      }
+      assert(c <= (d * 4 + 3)); /* must succeed, already checked for space */
+
+      /* configure and reset the counter(s) */
+      if (d == 0)
+         BEGIN_NVC0(push, NVE4_COMPUTE(MP_PM_A_SIGSEL(c & 3)), 1);
+      else
+         BEGIN_NVC0(push, NVE4_COMPUTE(MP_PM_B_SIGSEL(c & 3)), 1);
+      PUSH_DATA (push, cfg->ctr[i].sig_sel);
+      BEGIN_NVC0(push, NVE4_COMPUTE(MP_PM_SRCSEL(c)), 1);
+      PUSH_DATA (push, cfg->ctr[i].src_sel + 0x2108421 * (c & 3));
+      BEGIN_NVC0(push, NVE4_COMPUTE(MP_PM_FUNC(c)), 1);
+      PUSH_DATA (push, (cfg->ctr[i].func << 4) | cfg->ctr[i].mode);
+      BEGIN_NVC0(push, NVE4_COMPUTE(MP_PM_SET(c)), 1);
+      PUSH_DATA (push, 0);
+   }
+   return true;
+}
+
+static boolean
 nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
 {
    struct nvc0_screen *screen = nvc0->screen;
    struct nouveau_pushbuf *push = nvc0->base.pushbuf;
-   const bool is_nve4 = screen->base.class_3d >= NVE4_3D_CLASS;
    struct nvc0_hw_sm_query *hsq = nvc0_hw_sm_query(hq);
    const struct nvc0_hw_sm_query_cfg *cfg;
    unsigned i, c;
    unsigned num_ab[2] = { 0, 0 };

+   if (screen->base.class_3d >= NVE4_3D_CLASS)
+      return nve4_hw_sm_begin_query(nvc0, hq);
+
    cfg = nvc0_hw_sm_query_get_cfg(nvc0, hq);

    /* check if we have enough free counter slots */
@@ -361,7 +436,7 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
    }

    assert(cfg->num_counters <= 4);
-   PUSH_SPACE(push, 4 * 8 * (is_nve4 ? 1 : 6) + 6);
+   PUSH_SPACE(push, 4 * 8 * 6 + 6);

    if (!screen->pm.mp_counters_enabled) {
       screen->pm.mp_counters_enabled = true;
@@ -376,6 +451,7 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)

    for (i = 0; i < cfg->num_counters; ++i) {
       const unsigned d = cfg->ctr[i].sig_dom;
+      unsigned s;

       if (!screen->pm.num_hw_sm_active[d]) {
          uint32_t m = (1 << 22) | (1 << (7 + (8 * !d)));
@@ -396,31 +472,15 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
       assert(c <= (d * 4 + 3)); /* must succeed, already checked for space */

       /* configure and reset the counter(s) */
-      if (is_nve4) {
-         if (d == 0)
-            BEGIN_NVC0(push, NVE4_COMPUTE(MP_PM_A_SIGSEL(c & 3)), 1);
-         else
-            BEGIN_NVC0(push, NVE4_COMPUTE(MP_PM_B_SIGSEL(c & 3)), 1);
+      for (s = 0; s < cfg->ctr[i].num_src; s++) {
+         BEGIN_NVC0(push, NVC0_COMPUTE(MP_PM_SIGSEL(s)), 1);
          PUSH_DATA (push, cfg
[Mesa-dev] [PATCH 06/16] nvc0: allow to use 8 MP counters on Fermi
On Fermi, we have one domain of 8 MP counters while we have two domains of 4 MP counters on Kepler.

Signed-off-by: Samuel Pitoiset
---
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.c | 30 +-
 .../drivers/nouveau/nvc0/nvc0_query_hw_sm.h | 2 +-
 2 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
index b810d25..2060662 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.c
@@ -144,7 +144,7 @@ struct nvc0_hw_sm_counter_cfg

 struct nvc0_hw_sm_query_cfg
 {
-   struct nvc0_hw_sm_counter_cfg ctr[4];
+   struct nvc0_hw_sm_counter_cfg ctr[8];
    uint8_t num_counters;
    uint8_t op;
    uint8_t norm[2]; /* normalization num,denom */
@@ -418,7 +418,6 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
    struct nvc0_hw_sm_query *hsq = nvc0_hw_sm_query(hq);
    const struct nvc0_hw_sm_query_cfg *cfg;
    unsigned i, c;
-   unsigned num_ab[2] = { 0, 0 };

    if (screen->base.class_3d >= NVE4_3D_CLASS)
       return nve4_hw_sm_begin_query(nvc0, hq);
@@ -426,17 +425,13 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
    cfg = nvc0_hw_sm_query_get_cfg(nvc0, hq);

    /* check if we have enough free counter slots */
-   for (i = 0; i < cfg->num_counters; ++i)
-      num_ab[cfg->ctr[i].sig_dom]++;
-
-   if (screen->pm.num_hw_sm_active[0] + num_ab[0] > 4 ||
-       screen->pm.num_hw_sm_active[1] + num_ab[1] > 4) {
+   if (screen->pm.num_hw_sm_active[0] + cfg->num_counters > 8) {
       NOUVEAU_ERR("Not enough free MP counter slots !\n");
       return false;
    }

-   assert(cfg->num_counters <= 4);
-   PUSH_SPACE(push, 4 * 8 * 6 + 4);
+   assert(cfg->num_counters <= 8);
+   PUSH_SPACE(push, 4 * 8 * 6 + 2);

    /* set sequence field to 0 (used to check if result is available) */
    for (i = 0; i < screen->mp_count; ++i) {
@@ -446,23 +441,21 @@ nvc0_hw_sm_begin_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
    hq->sequence++;

    for (i = 0; i < cfg->num_counters; ++i) {
-      const unsigned d = cfg->ctr[i].sig_dom;
       unsigned s;

-      if (!screen->pm.num_hw_sm_active[d]) {
+      if (!screen->pm.num_hw_sm_active[0]) {
          BEGIN_NVC0(push, SUBC_SW(0x0600), 1);
          PUSH_DATA (push, 0x8000);
       }
-      screen->pm.num_hw_sm_active[d]++;
+      screen->pm.num_hw_sm_active[0]++;

-      for (c = d * 4; c < (d * 4 + 4); ++c) {
+      for (c = 0; c < 8; ++c) {
          if (!screen->pm.mp_counter[c]) {
             hsq->ctr[i] = c;
             screen->pm.mp_counter[c] = hsq;
             break;
          }
       }
-      assert(c <= (d * 4 + 3)); /* must succeed, already checked for space */

       /* configure and reset the counter(s) */
       for (s = 0; s < cfg->ctr[i].num_src; s++) {
@@ -522,7 +515,8 @@ nvc0_hw_sm_end_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
    /* release counters for this query */
    for (c = 0; c < 8; ++c) {
       if (screen->pm.mp_counter[c] == hsq) {
-         screen->pm.num_hw_sm_active[c / 4]--;
+         uint8_t d = is_nve4 ? c / 4 : 0; /* only one domain for NVC0:NVE4 */
+         screen->pm.num_hw_sm_active[d]--;
          screen->pm.mp_counter[c] = NULL;
       }
    }
@@ -568,7 +562,7 @@ nvc0_hw_sm_end_query(struct nvc0_context *nvc0, struct nvc0_hw_query *hq)
 }

 static inline bool
-nvc0_hw_sm_query_read_data(uint32_t count[32][4],
+nvc0_hw_sm_query_read_data(uint32_t count[32][8],
                            struct nvc0_context *nvc0, bool wait,
                            struct nvc0_hw_query *hq,
                            const struct nvc0_hw_sm_query_cfg *cfg,
@@ -594,7 +588,7 @@ nvc0_hw_sm_query_read_data(uint32_t count[32][4],
 }

 static inline bool
-nve4_hw_sm_query_read_data(uint32_t count[32][4],
+nve4_hw_sm_query_read_data(uint32_t count[32][8],
                            struct nvc0_context *nvc0, bool wait,
                            struct nvc0_hw_query *hq,
                            const struct nvc0_hw_sm_query_cfg *cfg,
@@ -640,7 +634,7 @@ static boolean
 nvc0_hw_sm_get_query_result(struct nvc0_context *nvc0, struct nvc0_hw_query *hq,
                             boolean wait, union pipe_query_result *result)
 {
-   uint32_t count[32][4];
+   uint32_t count[32][8];
    uint64_t value = 0;
    unsigned mp_count = MIN2(nvc0->screen->mp_count_compute, 32);
    unsigned p, c;
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.h b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.h
index bab6f34..0ad8a91 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw_sm.h
@@ -5,7 +5,7 @@

 struct nvc0_hw_sm_query {
    struct nvc0_hw_query base;
-   int8_t ctr[4];
+   uint8_t ctr[8];
 };

 static inline struct nvc0_hw_sm_query *
--
2.6.1
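The difference between the two layouts (one domain of 8 counters on Fermi, two domains of 4 on Kepler) can be sketched in isolation. Everything below is a simplified stand-in: the names are hypothetical and a plain bool array replaces the driver's per-slot owner pointers.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_MP_COUNTERS 8

/* Domain that counter slot c belongs to: two domains of 4 slots on Kepler
 * (NVE4+), a single domain of 8 slots on Fermi. */
static unsigned
counter_domain(unsigned c, bool is_nve4)
{
   assert(c < NUM_MP_COUNTERS);
   return is_nve4 ? c / 4 : 0;
}

/* Fermi-style check: do `wanted` more counters fit next to `active` ones? */
static bool
fermi_has_free_slots(unsigned active, unsigned wanted)
{
   return active + wanted <= NUM_MP_COUNTERS;
}

/* Hypothetical allocator: claim the first unused slot, or return -1. */
static int
fermi_alloc_slot(bool used[NUM_MP_COUNTERS])
{
   for (int c = 0; c < NUM_MP_COUNTERS; ++c) {
      if (!used[c]) {
         used[c] = true;
         return c;
      }
   }
   return -1;
}
```

With a single domain, the free-slot check reduces to one comparison against 8, which is what the patch does with num_hw_sm_active[0].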
Re: [Mesa-dev] [PATCH 1/2] Used correct parameter to derive the value of the "h" variable in vlVaCreateImage
On 16 October 2015 at 15:53, Christian König wrote:
> From: Indrajit Das
>
> Reviewed-by: Christian König

Nicely spotted !

For the future, use the correct prefix for the summary ("st/va:" here, but git log will show you the way in other places) and add the stable tag on bugfixes. There is no need to resend - things can be squashed at commit time by Christian/me.

Cc: "11.0"
Reviewed-by: Emil Velikov

Thanks
Emil
Re: [Mesa-dev] [PATCH 2/2] Added support for NV12 to IYUV conversion in vlVaGetImage
Hi guys,

Out of curiosity - do you know off-hand about any users of IYUV/I420 ? I was under the impression that everyone is doing nv12/nv21/yv12 in 99% of the cases.

On 16 October 2015 at 15:53, Christian König wrote:
> From: Indrajit Das
>
> Reviewed-by: Christian König
> ---
>  src/gallium/state_trackers/va/image.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/state_trackers/va/image.c
> b/src/gallium/state_trackers/va/image.c
> index 3b36430..b37a971 100644
> --- a/src/gallium/state_trackers/va/image.c
> +++ b/src/gallium/state_trackers/va/image.c
> @@ -240,9 +240,11 @@ vlVaGetImage(VADriverContextP ctx, VASurfaceID surface,
> int x, int y,
>        return VA_STATUS_ERROR_OPERATION_FAILED;
>
>     if (format != surf->buffer->buffer_format) {
> -      /* support NV12 to YV12 conversion now only */
> -      if (format == PIPE_FORMAT_YV12 &&
> -          surf->buffer->buffer_format == PIPE_FORMAT_NV12)
> +      /* support NV12 to YV12 and IYUV conversion now only */
> +      if ((format == PIPE_FORMAT_YV12 &&
> +          surf->buffer->buffer_format == PIPE_FORMAT_NV12) ||
> +          (format == PIPE_FORMAT_IYUV &&
> +          surf->buffer->buffer_format == PIPE_FORMAT_NV12))

I was just about to mention that we're not flipping the planes pointers (and their respective info) only to see that we're checking vs VA_FOURCC a few lines below :-)

Perhaps we should add a couple of trivial helpers u_copy_nv12_to_iyuv and u_copy_iyuv_to_nv12, which wrap around their yv12 brethren ? This way one can use them directly - be that here, in vdpau and/or elsewhere.

Cheers,
Emil
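The wrapper idea suggested above can be sketched for the chroma planes alone. This is not the gallium util code: the function names and signatures below are hypothetical. It relies only on the plane orders of the formats: NV12 interleaves chroma as U, V pairs, YV12 stores the V plane before the U plane, and IYUV (I420) stores U before V.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split an interleaved NV12 chroma plane (U0 V0 U1 V1 ...) into two planar
 * chroma planes of `count` samples each. */
static void
split_nv12_chroma(const uint8_t *src_uv, uint8_t *dst_u, uint8_t *dst_v,
                  size_t count)
{
   assert(src_uv != NULL && dst_u != NULL && dst_v != NULL);
   for (size_t i = 0; i < count; ++i) {
      dst_u[i] = src_uv[2 * i];
      dst_v[i] = src_uv[2 * i + 1];
   }
}

/* Hypothetical YV12 variant: the first chroma plane is V, the second is U. */
static void
copy_chroma_nv12_to_yv12(const uint8_t *src_uv, uint8_t *plane1,
                         uint8_t *plane2, size_t count)
{
   split_nv12_chroma(src_uv, plane2, plane1, count);
}

/* Hypothetical IYUV variant: wraps the YV12 one with the planes swapped, so
 * the first chroma plane is U and the second is V. */
static void
copy_chroma_nv12_to_iyuv(const uint8_t *src_uv, uint8_t *plane1,
                         uint8_t *plane2, size_t count)
{
   copy_chroma_nv12_to_yv12(src_uv, plane2, plane1, count);
}
```

The luma plane is identical in all three formats, so only the order of the two destination chroma planes changes between the wrappers.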
Re: [Mesa-dev] [PATCH 15/17] i965/fs: Move some of the prog_data setup into brw_wm_emit
On Fri, Oct 16, 2015 at 12:35 AM, Pohjolainen, Topi wrote: > On Fri, Oct 09, 2015 at 05:50:22AM -0700, Jason Ekstrand wrote: >> On Fri, Oct 9, 2015 at 12:10 AM, Pohjolainen, Topi >> wrote: >> > On Thu, Oct 08, 2015 at 05:22:47PM -0700, Jason Ekstrand wrote: >> >> This commit moves the common/modern stuff. Some legacy stuff such as >> >> setting use_alt_mode was left because it needs to know whether or not >> >> we're >> >> an ARB program. >> >> --- >> >> src/mesa/drivers/dri/i965/brw_fs.cpp | 98 >> >> >> >> src/mesa/drivers/dri/i965/brw_wm.c | 98 >> >> >> >> 2 files changed, 98 insertions(+), 98 deletions(-) >> >> >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >> >> b/src/mesa/drivers/dri/i965/brw_fs.cpp >> >> index 146f4b4..0e39b50 100644 >> >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >> >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >> >> @@ -5114,6 +5114,90 @@ fs_visitor::run_cs() >> >> return !failed; >> >> } >> >> >> >> +/** >> >> + * Return a bitfield where bit n is set if barycentric interpolation >> >> mode n >> >> + * (see enum brw_wm_barycentric_interp_mode) is needed by the fragment >> >> shader. >> >> + */ >> >> +static unsigned >> >> +brw_compute_barycentric_interp_modes(const struct brw_device_info >> >> *devinfo, >> >> + bool shade_model_flat, >> >> + bool persample_shading, >> >> + const nir_shader *shader) >> >> +{ >> >> + unsigned barycentric_interp_modes = 0; >> >> + >> >> + nir_foreach_variable(var, &shader->inputs) { >> >> + enum glsl_interp_qualifier interp_qualifier = >> >> + (enum glsl_interp_qualifier)var->data.interpolation; >> >> + bool is_centroid = var->data.centroid && !persample_shading; >> >> + bool is_sample = var->data.sample || persample_shading; >> >> + bool is_gl_Color = (var->data.location == VARYING_SLOT_COL0) || >> >> + (var->data.location == VARYING_SLOT_COL1); >> >> + >> >> + /* Ignore WPOS and FACE, because they don't require interpolation. 
>> >> */ >> >> + if (var->data.location == VARYING_SLOT_POS || >> >> + var->data.location == VARYING_SLOT_FACE) >> >> + continue; >> >> + >> >> + /* Determine the set (or sets) of barycentric coordinates needed to >> >> + * interpolate this variable. Note that when >> >> + * brw->needs_unlit_centroid_workaround is set, centroid >> >> interpolation >> >> + * uses PIXEL interpolation for unlit pixels and CENTROID >> >> interpolation >> >> + * for lit pixels, so we need both sets of barycentric coordinates. >> >> + */ >> >> + if (interp_qualifier == INTERP_QUALIFIER_NOPERSPECTIVE) { >> >> + if (is_centroid) { >> >> +barycentric_interp_modes |= >> >> + 1 << BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC; >> >> + } else if (is_sample) { >> >> +barycentric_interp_modes |= >> >> + 1 << BRW_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC; >> >> + } >> >> + if ((!is_centroid && !is_sample) || >> >> + devinfo->needs_unlit_centroid_workaround) { >> >> +barycentric_interp_modes |= >> >> + 1 << BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC; >> >> + } >> >> + } else if (interp_qualifier == INTERP_QUALIFIER_SMOOTH || >> >> + (!(shade_model_flat && is_gl_Color) && >> >> + interp_qualifier == INTERP_QUALIFIER_NONE)) { >> >> + if (is_centroid) { >> >> +barycentric_interp_modes |= >> >> + 1 << BRW_WM_PERSPECTIVE_CENTROID_BARYCENTRIC; >> >> + } else if (is_sample) { >> >> +barycentric_interp_modes |= >> >> + 1 << BRW_WM_PERSPECTIVE_SAMPLE_BARYCENTRIC; >> >> + } >> >> + if ((!is_centroid && !is_sample) || >> >> + devinfo->needs_unlit_centroid_workaround) { >> >> +barycentric_interp_modes |= >> >> + 1 << BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC; >> >> + } >> >> + } >> >> + } >> >> + >> >> + return barycentric_interp_modes; >> >> +} >> >> + >> >> +static uint8_t >> >> +computed_depth_mode(const nir_shader *shader) >> >> +{ >> >> + if (shader->info.outputs_written & BITFIELD64_BIT(FRAG_RESULT_DEPTH)) >> >> { >> >> + switch (shader->info.fs.depth_layout) { >> >> + case FRAG_DEPTH_LAYOUT_NONE: >> >> + case 
FRAG_DEPTH_LAYOUT_ANY: >> >> + return BRW_PSCDEPTH_ON; >> >> + case FRAG_DEPTH_LAYOUT_GREATER: >> >> + return BRW_PSCDEPTH_ON_GE; >> >> + case FRAG_DEPTH_LAYOUT_LESS: >> >> + return BRW_PSCDEPTH_ON_LE; >> >> + case FRAG_DEPTH_LAYOUT_UNCHANGED: >> >> + return BRW_PSCDEPTH_OFF; >> >> + } >> >> + } >> >> + return BRW_PSCDEPTH_OFF; >> >> +} >> >> + >> >> const unsigned * >
Re: [Mesa-dev] [PATCH] i965/vs: Move URB entry_size and read_length calculations to compile_vs
Topi, Seeing as you're on a roll reviewing my move-the-code patches, mind one more? --Jason On Thu, Oct 15, 2015 at 12:06 PM, Jason Ekstrand wrote: > This patch applies on top of my previous series to shuffle a bunch of > the compiler code around. > > On Thu, Oct 15, 2015 at 12:05 PM, Jason Ekstrand wrote: >> --- >> src/mesa/drivers/dri/i965/brw_vec4.cpp | 34 >> ++ >> src/mesa/drivers/dri/i965/brw_vs.c | 34 >> -- >> 2 files changed, 34 insertions(+), 34 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >> index ca4d23a..00e2d63 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >> @@ -1933,6 +1933,40 @@ brw_compile_vs(const struct brw_compiler *compiler, >> void *log_data, >> { >> const unsigned *assembly = NULL; >> >> + unsigned nr_attributes = _mesa_bitcount_64(prog_data->inputs_read); >> + >> + /* gl_VertexID and gl_InstanceID are system values, but arrive via an >> +* incoming vertex attribute. So, add an extra slot. >> +*/ >> + if (shader->info.system_values_read & >> + (BITFIELD64_BIT(SYSTEM_VALUE_VERTEX_ID_ZERO_BASE) | >> +BITFIELD64_BIT(SYSTEM_VALUE_INSTANCE_ID))) { >> + nr_attributes++; >> + } >> + >> + /* The 3DSTATE_VS documentation lists the lower bound on "Vertex URB >> Entry >> +* Read Length" as 1 in vec4 mode, and 0 in SIMD8 mode. Empirically, in >> +* vec4 mode, the hardware appears to wedge unless we read something. >> +*/ >> + if (compiler->scalar_vs) >> + prog_data->base.urb_read_length = DIV_ROUND_UP(nr_attributes, 2); >> + else >> + prog_data->base.urb_read_length = DIV_ROUND_UP(MAX2(nr_attributes, >> 1), 2); >> + >> + prog_data->nr_attributes = nr_attributes; >> + >> + /* Since vertex shaders reuse the same VUE entry for inputs and outputs >> +* (overwriting the original contents), we need to make sure the size is >> +* the larger of the two. 
>> +*/ >> + const unsigned vue_entries = >> + MAX2(nr_attributes, (unsigned)prog_data->base.vue_map.num_slots); >> + >> + if (compiler->devinfo->gen == 6) >> + prog_data->base.urb_entry_size = DIV_ROUND_UP(vue_entries, 8); >> + else >> + prog_data->base.urb_entry_size = DIV_ROUND_UP(vue_entries, 4); >> + >> if (compiler->scalar_vs) { >>prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8; >> >> diff --git a/src/mesa/drivers/dri/i965/brw_vs.c >> b/src/mesa/drivers/dri/i965/brw_vs.c >> index 6c161d0..c9afc63 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vs.c >> +++ b/src/mesa/drivers/dri/i965/brw_vs.c >> @@ -160,40 +160,6 @@ brw_codegen_vs_prog(struct brw_context *brw, >> &prog_data.base.vue_map, outputs_written, >> prog ? prog->SeparateShader : false); >> >> - unsigned nr_attributes = _mesa_bitcount_64(prog_data.inputs_read); >> - >> - /* gl_VertexID and gl_InstanceID are system values, but arrive via an >> -* incoming vertex attribute. So, add an extra slot. >> -*/ >> - if (vp->program.Base.SystemValuesRead & >> - (BITFIELD64_BIT(SYSTEM_VALUE_VERTEX_ID_ZERO_BASE) | >> -BITFIELD64_BIT(SYSTEM_VALUE_INSTANCE_ID))) { >> - nr_attributes++; >> - } >> - >> - /* The 3DSTATE_VS documentation lists the lower bound on "Vertex URB >> Entry >> -* Read Length" as 1 in vec4 mode, and 0 in SIMD8 mode. Empirically, in >> -* vec4 mode, the hardware appears to wedge unless we read something. >> -*/ >> - if (brw->intelScreen->compiler->scalar_vs) >> - prog_data.base.urb_read_length = DIV_ROUND_UP(nr_attributes, 2); >> - else >> - prog_data.base.urb_read_length = DIV_ROUND_UP(MAX2(nr_attributes, 1), >> 2); >> - >> - prog_data.nr_attributes = nr_attributes; >> - >> - /* Since vertex shaders reuse the same VUE entry for inputs and outputs >> -* (overwriting the original contents), we need to make sure the size is >> -* the larger of the two. 
>> -*/ >> - const unsigned vue_entries = >> - MAX2(nr_attributes, prog_data.base.vue_map.num_slots); >> - >> - if (brw->gen == 6) >> - prog_data.base.urb_entry_size = DIV_ROUND_UP(vue_entries, 8); >> - else >> - prog_data.base.urb_entry_size = DIV_ROUND_UP(vue_entries, 4); >> - >> if (0) { >>_mesa_fprint_program_opt(stderr, &vp->program.Base, PROG_PRINT_DEBUG, >>true); >> -- >> 2.5.0.400.gff86faf >> ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
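The calculation being moved in this patch can be exercised on its own. The sketch below restates it outside the driver; the function names are hypothetical, DIV_ROUND_UP and MAX2 are redefined locally so the block is self-contained, and the generation is passed as a plain integer.

```c
#include <assert.h>
#include <stdbool.h>

#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* URB read length: vec4 mode must read at least one attribute, per the
 * comment in the patch; SIMD8 (scalar) mode may read none. */
static unsigned
vs_urb_read_length(unsigned nr_attributes, bool scalar_vs)
{
   if (scalar_vs)
      return DIV_ROUND_UP(nr_attributes, 2);
   return DIV_ROUND_UP(MAX2(nr_attributes, 1u), 2);
}

/* URB entry size: inputs and outputs share one VUE entry, so size it for the
 * larger of the two. The divisor is 8 on gen 6 and 4 otherwise. */
static unsigned
vs_urb_entry_size(unsigned nr_attributes, unsigned num_vue_slots, int gen)
{
   assert(gen > 0);
   const unsigned vue_entries = MAX2(nr_attributes, num_vue_slots);
   return DIV_ROUND_UP(vue_entries, gen == 6 ? 8u : 4u);
}
```

For example, a vec4 vertex shader with no attributes still gets a read length of 1, while the same shader in SIMD8 mode gets 0.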
Re: [Mesa-dev] [PATCH] st/mesa: convert programs to gl_program *
On 10/16/2015 01:16 AM, Boyan Ding wrote:
> Fixes following warnings:
>
> state_tracker/st_cb_program.c: In function ‘st_new_program’:
> state_tracker/st_cb_program.c:108:36: warning: passing argument 1 of ‘_mesa_init_gl_program’ from incompatible pointer type [-Wincompatible-pointer-types]
>       return _mesa_init_gl_program(&prog->Base, target, id);
>                                    ^
> In file included from state_tracker/st_cb_program.c:38:0:
> ./program/program.h:67:1: note: expected ‘struct gl_program *’ but argument is of type ‘struct gl_vertex_program *’
> _mesa_init_gl_program(struct gl_program *prog, GLenum target, GLuint id);
> ^
> state_tracker/st_cb_program.c:112:36: warning: passing argument 1 of ‘_mesa_init_gl_program’ from incompatible pointer type [-Wincompatible-pointer-types]
>       return _mesa_init_gl_program(&prog->Base, target, id);
>                                    ^
> In file included from state_tracker/st_cb_program.c:38:0:
> ./program/program.h:67:1: note: expected ‘struct gl_program *’ but argument is of type ‘struct gl_fragment_program *’
> _mesa_init_gl_program(struct gl_program *prog, GLenum target, GLuint id);
> ^
> state_tracker/st_cb_program.c:116:36: warning: passing argument 1 of ‘_mesa_init_gl_program’ from incompatible pointer type [-Wincompatible-pointer-types]
>       return _mesa_init_gl_program(&prog->Base, target, id);
>                                    ^
> In file included from state_tracker/st_cb_program.c:38:0:
> ./program/program.h:67:1: note: expected ‘struct gl_program *’ but argument is of type ‘struct gl_geometry_program *’
> _mesa_init_gl_program(struct gl_program *prog, GLenum target, GLuint id);
> ^
> state_tracker/st_cb_program.c:120:36: warning: passing argument 1 of ‘_mesa_init_gl_program’ from incompatible pointer type [-Wincompatible-pointer-types]
>       return _mesa_init_gl_program(&prog->Base, target, id);
>                                    ^
> In file included from state_tracker/st_cb_program.c:38:0:
> ./program/program.h:67:1: note: expected ‘struct gl_program *’ but argument is of type ‘struct gl_tess_ctrl_program *’
> _mesa_init_gl_program(struct gl_program *prog, GLenum target, GLuint id);
> ^
> state_tracker/st_cb_program.c:124:36: warning: passing argument 1 of ‘_mesa_init_gl_program’ from incompatible pointer type [-Wincompatible-pointer-types]
>       return _mesa_init_gl_program(&prog->Base, target, id);
>                                    ^
> In file included from state_tracker/st_cb_program.c:38:0:
> ./program/program.h:67:1: note: expected ‘struct gl_program *’ but argument is of type ‘struct gl_tess_eval_program *’
> _mesa_init_gl_program(struct gl_program *prog, GLenum target, GLuint id);
> ^
>
> Fixes: bcb56c2c6 (program: convert _mesa_init_gl_program to take struct gl_program *)
> Signed-off-by: Boyan Ding
> ---
>  src/mesa/state_tracker/st_cb_program.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)

I already fixed this but forgot to push the patch yesterday. I'll do so now...

-Brian
Re: [Mesa-dev] Mesa (master): mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()
On 10/16/2015 08:17 AM, Brian Paul wrote:
On 10/16/2015 12:36 AM, Michel Dänzer wrote:
Hi Brian,
On 15.10.2015 22:23, Brian Paul wrote:
Module: Mesa
Branch: master
Commit: 0de5e0f3fb0f3671a3ecec6ab4473f9131ecd0ae
URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=0de5e0f3fb0f3671a3ecec6ab4473f9131ecd0ae
Author: Brian Paul
Date: Wed Oct 14 09:08:50 2015 -0600

mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()

Changing the matrix mode alone has no effect on rendering and does not need to trigger a flush or state validation.

Famous last words. :) This change broke the piglit tests
spec@arb_vertex_program@clip-plane-transformation arb
spec@glsl-1.10@execution@clipping@clip-plane-transformation fixed
for me with radeonsi and llvmpipe.

Hmm, I don't know how I missed that. I did a full piglit run with that whole series of patches and didn't notice this. I'll look into it. Thanks for catching it.

I think I'll just have to revert this. It looks like the state tracker's update_clip() isn't getting called at some point and the gallium driver doesn't get the updated clip planes. I suspect my change to matrix.c actually exposed a bug elsewhere, but I don't have time to hunt it down right now.

-Brian
[Mesa-dev] [PATCH 2/2] Added support for NV12 to IYUV conversion in vlVaGetImage
From: Indrajit Das
Reviewed-by: Christian König
---
 src/gallium/state_trackers/va/image.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/va/image.c b/src/gallium/state_trackers/va/image.c
index 3b36430..b37a971 100644
--- a/src/gallium/state_trackers/va/image.c
+++ b/src/gallium/state_trackers/va/image.c
@@ -240,9 +240,11 @@ vlVaGetImage(VADriverContextP ctx, VASurfaceID surface, int x, int y,
       return VA_STATUS_ERROR_OPERATION_FAILED;
 
    if (format != surf->buffer->buffer_format) {
-      /* support NV12 to YV12 conversion now only */
-      if (format == PIPE_FORMAT_YV12 &&
-         surf->buffer->buffer_format == PIPE_FORMAT_NV12)
+      /* support NV12 to YV12 and IYUV conversion now only */
+      if ((format == PIPE_FORMAT_YV12 &&
+         surf->buffer->buffer_format == PIPE_FORMAT_NV12) ||
+         (format == PIPE_FORMAT_IYUV &&
+         surf->buffer->buffer_format == PIPE_FORMAT_NV12))
          convert = true;
       else
          return VA_STATUS_ERROR_OPERATION_FAILED;
-- 
1.9.1
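For readers unfamiliar with the formats involved: the patch only widens the check that decides whether a conversion is attempted, and the conversion code itself is not shown in this message. The following is a minimal sketch of what such a chroma conversion amounts to, assuming the standard layouts (NV12 has one interleaved U/V plane, IYUV stores a U plane then a V plane, YV12 stores V then U). The function names are hypothetical and are not taken from the VA state tracker.

```c
#include <stddef.h>
#include <stdint.h>

/* Split an interleaved NV12 chroma plane (U0 V0 U1 V1 ...) into separate
 * U and V planes. chroma_samples is the number of U (or V) samples. */
static void
deinterleave_uv(const uint8_t *uv, uint8_t *u, uint8_t *v,
                size_t chroma_samples)
{
   for (size_t i = 0; i < chroma_samples; i++) {
      u[i] = uv[2 * i];
      v[i] = uv[2 * i + 1];
   }
}

/* Hypothetical helper: write the two chroma planes back to back into
 * dst_chroma, which must hold 2 * chroma_samples bytes.
 * v_first == 0 gives the IYUV order (U then V),
 * v_first != 0 gives the YV12 order (V then U). */
static void
nv12_chroma_to_planar(const uint8_t *uv, uint8_t *dst_chroma,
                      size_t chroma_samples, int v_first)
{
   uint8_t *first = dst_chroma;
   uint8_t *second = dst_chroma + chroma_samples;

   if (v_first)
      deinterleave_uv(uv, second, first, chroma_samples);
   else
      deinterleave_uv(uv, first, second, chroma_samples);
}
```

Seen this way, IYUV and YV12 differ only in which chroma plane comes first, which is presumably why the same NV12 source check can cover both targets.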
[Mesa-dev] [PATCH 1/2] Used correct parameter to derive the value of the "h" variable in vlVaCreateImage
From: Indrajit Das
Reviewed-by: Christian König
---
 src/gallium/state_trackers/va/image.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/va/image.c b/src/gallium/state_trackers/va/image.c
index 022240d..3b36430 100644
--- a/src/gallium/state_trackers/va/image.c
+++ b/src/gallium/state_trackers/va/image.c
@@ -116,7 +116,7 @@ vlVaCreateImage(VADriverContextP ctx, VAImageFormat *format, int width, int heig
    img->width = width;
    img->height = height;
    w = align(width, 2);
-   h = align(width, 2);
+   h = align(height, 2);
 
    switch (format->fourcc) {
    case VA_FOURCC('N','V','1','2'):
-- 
1.9.1
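To illustrate why the one-word change matters, here is a small sketch. `align_up` is a hypothetical stand-in for Mesa's `align()` helper (assumed here to round up to a power-of-two multiple), and `nv12_size` uses the textbook NV12 layout rather than the driver's exact bookkeeping, so treat both as assumptions.

```c
#include <stdint.h>

/* Hypothetical stand-in for align(): round value up to the next multiple
 * of alignment, which must be a power of two. */
static inline uint32_t
align_up(uint32_t value, uint32_t alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Bytes needed for an NV12 image: a full-resolution Y plane followed by
 * a half-height plane of interleaved U/V samples. */
static inline uint32_t
nv12_size(uint32_t width, uint32_t height)
{
   uint32_t w = align_up(width, 2);
   uint32_t h = align_up(height, 2);   /* the pre-patch code used width here */

   return w * h + w * h / 2;
}
```

With `h` derived from the width, any non-square image would get plane sizes computed from the wrong dimension; a 640x480 image would be treated as 640x640.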
Re: [Mesa-dev] Mesa (master): mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()
On 10/16/2015 12:36 AM, Michel Dänzer wrote:
Hi Brian,
On 15.10.2015 22:23, Brian Paul wrote:
Module: Mesa
Branch: master
Commit: 0de5e0f3fb0f3671a3ecec6ab4473f9131ecd0ae
URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=0de5e0f3fb0f3671a3ecec6ab4473f9131ecd0ae
Author: Brian Paul
Date: Wed Oct 14 09:08:50 2015 -0600

mesa: remove FLUSH_VERTICES() in _mesa_MatrixMode()

Changing the matrix mode alone has no effect on rendering and does not need to trigger a flush or state validation.

Famous last words. :) This change broke the piglit tests
spec@arb_vertex_program@clip-plane-transformation arb
spec@glsl-1.10@execution@clipping@clip-plane-transformation fixed
for me with radeonsi and llvmpipe.

Hmm, I don't know how I missed that. I did a full piglit run with that whole series of patches and didn't notice this. I'll look into it. Thanks for catching it.

-Brian
Re: [Mesa-dev] [RFC 1/2] gallium: add renderonly driver
Hi Christian,

First off, thanks for reviving this effort. It's been one of the things that I've had nagging at me for much too long and I think it needs to be solved. So I'm hopeful that the more people we get looking at this, the more likely we are to come up with a solution that works well for everyone.

That said, I don't agree with the approach you've chosen here. I'll try to clarify why below.

On Sun, Oct 11, 2015 at 05:09:21PM +0200, Christian Gmeiner wrote:
> This commit adds a generic renderonly driver library, which fullfille
> the requirements for tegra and etnaviv. As a result it is possible to
> run unmodified egl software directly (without any compositor) on
> supported devices.

Technically this isn't a library but rather a midlayer. There's a subtle difference, but the implications are what concern me.

Back when I wrote the original driver for Tegra/Nouveau I also looked into possibilities to make this more generic. Since I know how bad midlayers can be (from kernel experience) I shied away from something like this early on.

What I tried to do next was abstract away enough to make this usable by more than just a single driver. Unfortunately the end result was that not much could be reused, so drivers ended up still having to implement all of the pipe_* objects, only to call generic functions. Most of the code needed in the various callbacks ended up not being much more than just a single line, so the gains from a helper library weren't very big.

Another reason why I think this level of abstraction doesn't gain us much is that we already have a good level of abstraction, which is Gallium. I realize that implementing only the skeleton for a full Gallium driver is rather complicated, but that's due to the fact that graphics drivers are complex beasts.

That said, I think for some areas it might be beneficial to have helpers to reduce the amount of duplication.
However I think at this point in time we haven't had enough real-world exposure for this kind of driver to know what the requirements are. For that reason I think it is premature to use a generic midlayer such as this. Yes, I know that the alternative is roughly 2000 lines of code per driver, but on one hand that's nothing compared to the amount of code required by a proper GPU driver and on the other hand this will (ideally) be temporary until we get a better picture of where things need to go. At which point it may become more obvious how we can solve the boilerplate problem while at the same time avoiding the restrictions imposed by the midlayer. > In every use case we import a dumb buffer from scanout gpu into > the renderonly gpu. > > If the scanout hardware does support the used tiling format from the > renderonly gpu, a driver can define a function which is used to 'setup' > the needed tiling on that imported buffer. This functions gets called > during rendertarget resource creation. > > If the scanout hardware does not support the used tiling format we need > to create an extra rendertarget resource for the renderonly gpu. > During XXX we blit the renderonly rendertarget onto the imported dumb > buffer. > > We assume that the renderonly driver provides a blit function that is > capable of resolving the tilied into untiled one. I understand that there's a want to eliminate the amount of boilerplate, but I think this approach of using a midlayer has several flaws. One of the typical pitfalls with a midlayer such as this is that it has the potential to grow into an unmaintainable mess. Granted, this currently doesn't look all that bad, but that's primarily because it supports only two types of devices. I suspect that the more devices we add, the more hooks and quirks we'll need. Every combination of GPU and display is likely going to have their own specialties that need to be handled and which are beyond simple things like the tiling format. 
We also know that there are issues with the current approach (EGL clients in Weston don't properly display). It's unknown what the reason for this is and it may require largish changes to the architecture to fix it.

For all of the above reasons I think it'd be better to live with a little boilerplate for now and refactor things as they become obvious candidates for refactoring.

[...]
> diff --git a/src/gallium/drivers/renderonly/renderonly_screen.c
> b/src/gallium/drivers/renderonly/renderonly_screen.c
[...]
> +static const char *
> +renderonly_get_vendor(struct pipe_screen *pscreen)
> +{
> + return "renderonly";
> +}

I don't think this is going to do us much good. Applications may want to know more precisely what kind of device they're running on and change behaviour accordingly.

> +static void renderonly_screen_destroy(struct pipe_screen *pscreen)
> +{
> + struct renderonly_screen *screen = to_renderonly_screen(pscreen);
> +
> + screen->gpu->destroy(screen->gpu);
> + free(pscreen);
> +}
> +
> +static int
> +renderonly_screen_get_param(struct pipe_screen *pscree
Re: [Mesa-dev] [RFC 1/2] gallium: add renderonly driver
On Fri, Oct 16, 2015 at 12:09:52AM +0100, Emil Velikov wrote: > Hi Christian, > > I'm glad to see Thierry's work revived. Hopefully this will soon be > the basis of many more drivers. > > On 11 October 2015 at 16:09, Christian Gmeiner > wrote: > > This commit adds a generic renderonly driver library, which fullfille > > the requirements for tegra and etnaviv. As a result it is possible to > > run unmodified egl software directly (without any compositor) on > > supported devices. > > > > In every use case we import a dumb buffer from scanout gpu into > > the renderonly gpu. > > > > If the scanout hardware does support the used tiling format from the > > renderonly gpu, a driver can define a function which is used to 'setup' > > the needed tiling on that imported buffer. This functions gets called > > during rendertarget resource creation. > > > > If the scanout hardware does not support the used tiling format we need > > to create an extra rendertarget resource for the renderonly gpu. > > During XXX we blit the renderonly rendertarget onto the imported dumb > > buffer. > > > I'd assume you meant to add something over the XXX here :-P > > But seriously some people might not be too happy with the blit onto > dumb buffer. Personally I ok, esp. since we don't have anything better > atm. > > That aside, there are a few minor nitpicks below. With those sorted I > believe the patch is good to land. I'd prefer if at least the Tegra part wasn't merged just yet. We know that it only works in a restricted set of use-cases. So until we figure it out all the way I don't think it makes sense to have this code in Mesa yet. I suppose the renderonly part could go in, though I do have some reservations about the architecture. I'll try to comment on that in a separate email because this subthread lacks some of the necessary context. 
Thierry
Re: [Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block
On Wed, 2015-10-14 at 21:40 +0300, Francisco Jerez wrote: > Jordan Justen writes: > > > On 2015-10-13 22:49:08, Iago Toral wrote: > >> On Tue, 2015-10-13 at 09:44 -0700, Jordan Justen wrote: > >> > On 2015-10-13 05:17:37, Francisco Jerez wrote: > >> > > Iago Toral Quiroga writes: > >> > > > >> > > > This fixes the following test: > >> > > > > >> > > > [require] > >> > > > GL >= 3.3 > >> > > > GLSL >= 3.30 > >> > > > GL_ARB_shader_storage_buffer_object > >> > > > > >> > > > [fragment shader] > >> > > > #version 330 > >> > > > #extension GL_ARB_shader_storage_buffer_object: require > >> > > > > >> > > > buffer SSBO { > >> > > > mat4 sm4; > >> > > > }; > >> > > > > >> > > > > >> > > > mat4 um4; > >> > > > > >> > > > void main() { > >> > > > sm4 *= um4; > >> > > > >> > > This is using the value of "um4", which is never assigned, so liveness > >> > > analysis will correctly extend its live interval up to the start of the > >> > > block. > >> > > >> > This test was derived by simplifying a CTS test case. > >> > > >> > Anyway, I'm not sure what happened on the way to the commit message, > >> > but um4 should be a uniform. > >> > > >> > http://sprunge.us/cEUe > >> > >> Oh yes, that was me playing around with the example. The patches also > >> fix the uniform version. Jordan, can you verify if this fixes the CTS > >> test case? > > > > Unfortunately, no. The CTS case has some control flow. I had removed > > it to minimize the test case. > > > > Here is a small shader_test that has control flow and still fails to > > compile with your patches: > > > > http://sprunge.us/LIjA > > > >> In any case, since Curro is working on a more general fix for this > >> stuff, I guess you'd rather wait for his patches... > > > > It depends how long we'd have to wait. :) Anyway, since we don't have > > a short-term fix anyhow, let's wait to see what curro has to say. 
> > > Assuming that at least some of the scalar writes in the shader are being > introduced by emit_uniformize(), an alternative hack that might get the > shader to compile for the moment would be to change emit_uniformize() to > emit a full SIMD-width BROADCAST instruction instead of a scalar one > (see attachment) -- Which is pretty useless in principle because only > the first component will ever be used but it might keep dataflow > analysis from getting confused. BTW, having a look at dEQP SSBO tests, I see that many of them use CS and a huge amount of these hit this problem, we are talking close to 700 tests (there are about 2000 SSBO tests in total). It seems that CS programs hit this problem a lot and this patch that you attached seems to get all these tests to compile just fine. Iago ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Fix is-renderable check in intel_image_target_renderbuffer_storage
Tested-by: Tapani Pälli On 10/16/2015 12:01 AM, Ian Romanick wrote: From: Ian Romanick Previously we could create a renderbuffer with format MESA_FORMAT_R8G8B8A8_UNORM, convert that renderbuffer to an EGLImage, then FAIL to convert the EGLImage back to a renderbuffer because reasons. Just use the same check in intel_image_target_renderbuffer_storage that brw_render_target_supported uses. There are more checks in brw_render_target_supported, but I don't think they are necessary here. A different approach would be to refactor brw_render_target_supported to take rb->Format and rb->NumSamples as parameters (instead of a gl_renderbuffer) and use the new function here. Fixes: ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image Signed-off-by: Ian Romanick Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92476 Cc: "10.3 10.4 10.5 10.6 11.0" --- src/mesa/drivers/dri/i965/intel_fbo.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_fbo.c b/src/mesa/drivers/dri/i965/intel_fbo.c index 5a6b0dd..7f281fa 100644 --- a/src/mesa/drivers/dri/i965/intel_fbo.c +++ b/src/mesa/drivers/dri/i965/intel_fbo.c @@ -348,14 +348,10 @@ intel_image_target_renderbuffer_storage(struct gl_context *ctx, } /* __DRIimage is opaque to the core so it has to be checked here */ - switch (image->format) { - case MESA_FORMAT_R8G8B8A8_UNORM: + if (!brw->format_supported_as_render_target[image->format]) { _mesa_error(ctx, GL_INVALID_OPERATION, "glEGLImageTargetRenderbufferStorage(unsupported image format"); return; - break; - default: - break; } irb = intel_renderbuffer(rb); ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
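The shape of the fix (replace a hard-coded per-format switch with a lookup in a capability table) can be sketched as follows. This is only an illustration: the enum, the table contents and the function name are hypothetical, and the real driver indexes `brw->format_supported_as_render_target[]` by `mesa_format`, with the table filled in at screen or context setup.

```c
#include <stdbool.h>

/* Hypothetical, tiny format enum standing in for mesa_format. */
enum format {
   FORMAT_NONE,
   FORMAT_R8G8B8A8_UNORM,
   FORMAT_B8G8R8A8_UNORM,
   FORMAT_COUNT
};

/* Capability table, assumed to be filled once when the driver starts.
 * Entries not listed default to false. */
static const bool renderable[FORMAT_COUNT] = {
   [FORMAT_R8G8B8A8_UNORM] = true,
   [FORMAT_B8G8R8A8_UNORM] = true,
};

/* A single table lookup replaces the per-format switch, so the answer
 * cannot drift out of sync with what the table says elsewhere. */
static bool
can_bind_as_renderbuffer(enum format f)
{
   return (unsigned)f < FORMAT_COUNT && renderable[f];
}
```

The benefit is that creating a renderbuffer and importing one from an EGLImage consult the same source of truth, which is the inconsistency the commit message describes.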
[Mesa-dev] [PATCH 5/5] glsl_to_tgsi: Use {Num}UniformBlocks instead of {Num}BufferInterfaceBlocks
The latter holds both UBOs and SSBOs, but here we only want UBOs.
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 06f510d..f481e89 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -5388,10 +5388,10 @@ st_translate_program(
    }
 
    if (program->shader) {
-      unsigned num_ubos = program->shader->NumBufferInterfaceBlocks;
+      unsigned num_ubos = program->shader->NumUniformBlocks;
 
       for (i = 0; i < num_ubos; i++) {
-         unsigned size = program->shader->BufferInterfaceBlocks[i].UniformBufferSize;
+         unsigned size = program->shader->UniformBlocks[i]->UniformBufferSize;
          unsigned num_const_vecs = (size + 15) / 16;
          unsigned first, last;
          assert(num_const_vecs > 0);
-- 
1.9.1
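As a side note on the context lines of this hunk, `(size + 15) / 16` is a ceiling division: it gives the number of 16-byte slots (one vec4 of 32-bit components each, which is an assumption about how the result is used) needed to hold `size` bytes. A minimal sketch, with a hypothetical function name:

```c
/* Number of 16-byte slots needed to hold size_in_bytes, rounding up. */
static inline unsigned
const_vecs_for_size(unsigned size_in_bytes)
{
   return (size_in_bytes + 15) / 16;
}
```

This is also why the loop must only see real UBOs: a block with a size of zero would trip the `assert(num_const_vecs > 0)` that follows.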
[Mesa-dev] [PATCH 3/5] i965: Do not use NumBufferInterfaceBlocks
This is the only place in the driver where we use this. Since we now work with separate index spaces, always use NumUniformBlocks and NumShaderStorageBlocks instead of NumBufferInterfaceBlocks to be more consistent with the rest of the code.
---
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index a304eec..6ebe648 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -972,7 +972,7 @@ brw_upload_ubo_surfaces(struct brw_context *brw,
       }
    }
 
-   if (shader->NumBufferInterfaceBlocks)
+   if (shader->NumUniformBlocks || shader->NumShaderStorageBlocks)
       brw->ctx.NewDriverState |= BRW_NEW_SURFACES;
 }
-- 
1.9.1
[Mesa-dev] [PATCH 2/5] main: GL_ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH is about UBOS, not SSBOs
---
 src/mesa/main/shaderapi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index 26995ad..18e463d 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -713,10 +713,10 @@ get_programiv(struct gl_context *ctx, GLuint program, GLenum pname,
       if (!has_ubo)
          break;
 
-      for (i = 0; i < shProg->NumBufferInterfaceBlocks; i++) {
+      for (i = 0; i < shProg->NumUniformBlocks; i++) {
          /* Add one for the terminating NUL character. */
-         const GLint len = strlen(shProg->BufferInterfaceBlocks[i].Name) + 1;
+         const GLint len = strlen(shProg->UniformBlocks[i]->Name) + 1;
 
          if (len > max_len)
             max_len = len;
-- 
1.9.1
[Mesa-dev] [PATCH 4/5] st/mesa: Use {Num}UniformBlocks instead of {Num}BufferInterfaceBlocks
The latter holds both UBOs and SSBOs, but here we only want UBOs.
---
 src/mesa/state_tracker/st_atom_constbuf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_atom_constbuf.c b/src/mesa/state_tracker/st_atom_constbuf.c
index 69e26cb..acaa85d 100644
--- a/src/mesa/state_tracker/st_atom_constbuf.c
+++ b/src/mesa/state_tracker/st_atom_constbuf.c
@@ -234,11 +234,11 @@ static void st_bind_ubos(struct st_context *st,
    if (!shader)
       return;
 
-   for (i = 0; i < shader->NumBufferInterfaceBlocks; i++) {
+   for (i = 0; i < shader->NumUniformBlocks; i++) {
       struct gl_uniform_buffer_binding *binding;
       struct st_buffer_object *st_obj;
 
-      binding = &st->ctx->UniformBufferBindings[shader->BufferInterfaceBlocks[i].Binding];
+      binding = &st->ctx->UniformBufferBindings[shader->UniformBlocks[i]->Binding];
       st_obj = st_buffer_object(binding->BufferObject);
 
       cb.buffer = st_obj->buffer;
-- 
1.9.1
[Mesa-dev] [PATCH 1/5] main: Use NumUniformBlocks to count UBOs
Now that we have separate index spaces for UBOs and SSBOs we do not need to iterate through BufferInterfaceBlocks any more, we can just take the UBO count directly from NumUniformBlocks.
---
 src/mesa/main/shaderapi.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index 6a2f60d..26995ad 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -729,11 +729,7 @@ get_programiv(struct gl_context *ctx, GLuint program, GLenum pname,
       if (!has_ubo)
          break;
 
-      *params = 0;
-      for (unsigned i = 0; i < shProg->NumBufferInterfaceBlocks; i++) {
-         if (!shProg->BufferInterfaceBlocks[i].IsShaderStorage)
-            (*params)++;
-      }
+      *params = shProg->NumUniformBlocks;
       return;
    case GL_PROGRAM_BINARY_RETRIEVABLE_HINT:
       /* This enum isn't part of the OES extension for OpenGL ES 2.0. It is
-- 
1.9.1
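The difference between the two approaches in this series can be sketched with simplified data structures. Everything below is hypothetical (the real `gl_shader_program` has many more fields and different types); it only shows that a count kept alongside a separate UBO list must agree with filtering the combined list.

```c
#include <stdbool.h>

/* Hypothetical, simplified block record. */
struct block {
   const char *name;
   bool is_shader_storage;   /* true for an SSBO, false for a UBO */
};

/* Hypothetical program record with a combined list plus a separate
 * count for UBOs, mirroring the "separate index spaces" idea. */
struct program_blocks {
   const struct block *all_blocks;
   unsigned num_all_blocks;
   unsigned num_uniform_blocks;
};

/* Old approach: walk the combined list and skip the SSBOs. */
static unsigned
count_ubos_by_filtering(const struct block *blocks, unsigned num_blocks)
{
   unsigned count = 0;

   for (unsigned i = 0; i < num_blocks; i++) {
      if (!blocks[i].is_shader_storage)
         count++;
   }
   return count;
}

/* New approach: the count is already stored, so just return it. */
static unsigned
count_ubos_direct(const struct program_blocks *prog)
{
   return prog->num_uniform_blocks;
}
```

The direct version is only correct if whoever builds the lists keeps the stored count in sync, which in this series is assumed to be the linker's job.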
Re: [Mesa-dev] [PATCH v2] mesa: Set api prefix to version string when overriding version
2015-10-16 14:36 GMT+08:00 Tapani Pälli : > Otherwise there are problems when user overrides version and application > such as Piglit wants to detect used api with glGetString(GL_VERSION). > > This makes it currently impossible to run glslparsertest tests for > OpenGL ES when using version override. > > Below is example when using MESA_GLES_VERSION_OVERRIDE=3.1. > > Before: > "3.1 Mesa 11.1.0-devel (git-24a1a15)" > > After: > "OpenGL ES 3.1 Mesa 11.1.0-devel (git-78042ff)" > > v2: only include api prefix for OpenGL ES (Boyan Ding) > > Signed-off-by: Tapani Pälli Cc: "11.0" > --- > src/mesa/main/version.c | 11 ++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/main/version.c b/src/mesa/main/version.c > index 498b2f8..017bc3a 100644 > --- a/src/mesa/main/version.c > +++ b/src/mesa/main/version.c > @@ -24,6 +24,7 @@ > > > #include > +#include "context.h" > #include "imports.h" > #include "mtypes.h" > #include "version.h" > @@ -181,7 +182,15 @@ _mesa_override_gl_version(struct gl_context *ctx) > { > if (_mesa_override_gl_version_contextless(&ctx->Const, &ctx->API, > &ctx->Version)) { > - create_version_string(ctx, ""); > + /* We need to include API in version string for OpenGL ES, otherwise > + * application can not detect GLES via glGetString(GL_VERSION) query. > + * > + * From OpenGL ES 3.2 spec, Page 436: > + * > + * "The VERSION string is laid out as follows: > + * "OpenGL ES N.M vendor-specific information" " > + */ > + create_version_string(ctx, _mesa_is_gles(ctx) ? "OpenGL ES " : ""); > } > } > > -- > 2.4.3 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
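For context, this is roughly the kind of check an application or test harness might perform on the string returned by glGetString(GL_VERSION). It is a sketch only: the function name is hypothetical and this is not Piglit's actual detection code.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if a GL_VERSION string starts with the "OpenGL ES " prefix
 * that the OpenGL ES spec requires. */
static bool
version_string_is_gles(const char *version)
{
   static const char prefix[] = "OpenGL ES ";

   /* sizeof includes the terminating NUL, so compare one byte less. */
   return strncmp(version, prefix, sizeof(prefix) - 1) == 0;
}
```

Applied to the two strings quoted in the commit message, the "Before" string is not recognised as OpenGL ES while the "After" string is.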
[Mesa-dev] [Bug 92361] [BSW SKL] Regression: glx@glx-copy-sub-buffer failed
https://bugs.freedesktop.org/show_bug.cgi?id=92361

--- Comment #3 from cprigent ---
New result with last setup (Mesa 11.0.3):

glx@glx-copy-sub-buffer            Not run
glx@glx-copy-sub-buffer samples=2  Fail
glx@glx-copy-sub-buffer samples=4  Fail
glx@glx-copy-sub-buffer samples=6  Fail
glx@glx-copy-sub-buffer samples=8  Fail

Platform: Braswell M
CPU : Intel(R) Celeron N3060 1.60GHz @ 1.6 GHz (family: 6, model: 76 stepping: 4)
SoC : BSW D0
QDF : K6XC
CRB : BRASWELL RVP Fab2
Mandatory Reworks : All
Feature Reworks: F28, F32, F33, F35, F37
Optional reworks : O-01a; O-02, O-03

Software
Linux distribution: Ubuntu 14.04 LTS 64 bits
BIOS : BRAS.X64.B084.R00.1508310642
TXE FW : 2.0.0.2073
Ksc : 1.08
Linux distribution: Ubuntu 14.04 LTS 64 bits
kernel 4.3.0-rc5-drm-intel-nightly+ 819f710081d7ea116b9b44a9264061d2c030f009 from git://anongit.freedesktop.org/drm-intel
Mesa - 11.0.3 from http://cgit.freedesktop.org/mesa/mesa/
xf86-video-intel - 2.99.917 from http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/
Libdrm - 2.4.65 from http://cgit.freedesktop.org/mesa/drm/
Libva - 1.6.1 from http://cgit.freedesktop.org/libva/
vaapi intel-driver - 1.6.1 from http://cgit.freedesktop.org/vaapi/intel-driver
Cairo - 1.14.2 from http://cgit.freedesktop.org/cairo
Xorg Xserver - 1.17.2 from http://cgit.freedesktop.org/xorg/xserver

Kernel
commit 819f710081d7ea116b9b44a9264061d2c030f009
Author: Daniel Vetter
Date: Wed Oct 14 19:05:17 2015 +0200
drm-intel-nightly: 2015y-10m-14d-17h-04m-36s UTC integration manifest
Re: [Mesa-dev] [PATCH v2] mesa: Set api prefix to version string when overriding version
On Fri, 2015-10-16 at 09:36 +0300, Tapani Pälli wrote: > Otherwise there are problems when user overrides version and application > such as Piglit wants to detect used api with glGetString(GL_VERSION). > > This makes it currently impossible to run glslparsertest tests for > OpenGL ES when using version override. > > Below is example when using MESA_GLES_VERSION_OVERRIDE=3.1. > > Before: > "3.1 Mesa 11.1.0-devel (git-24a1a15)" > > After: > "OpenGL ES 3.1 Mesa 11.1.0-devel (git-78042ff)" > > v2: only include api prefix for OpenGL ES (Boyan Ding) > > Signed-off-by: Tapani Pälli > --- > src/mesa/main/version.c | 11 ++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/main/version.c b/src/mesa/main/version.c > index 498b2f8..017bc3a 100644 > --- a/src/mesa/main/version.c > +++ b/src/mesa/main/version.c > @@ -24,6 +24,7 @@ > > > #include > +#include "context.h" > #include "imports.h" > #include "mtypes.h" > #include "version.h" > @@ -181,7 +182,15 @@ _mesa_override_gl_version(struct gl_context *ctx) > { > if (_mesa_override_gl_version_contextless(&ctx->Const, &ctx->API, > &ctx->Version)) { > - create_version_string(ctx, ""); > + /* We need to include API in version string for OpenGL ES, otherwise > + * application can not detect GLES via glGetString(GL_VERSION) query. > + * > + * From OpenGL ES 3.2 spec, Page 436: > + * > + * "The VERSION string is laid out as follows: > + * "OpenGL ES N.M vendor-specific information" " > + */ Maybe also include the desktop spec? From OpenGL 4.5 spec, Page 538: "The VERSION and SHADING_LANGUAGE_VERSION strings are laid out as follows: " Reviewed-by: Iago Toral Quiroga > + create_version_string(ctx, _mesa_is_gles(ctx) ? "OpenGL ES " : ""); > } > } > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: fix check SSBOs support for builtin functions
On 16/10/15 09:36, Iago Toral wrote:
> On Fri, 2015-10-16 at 09:10 +0200, Samuel Iglesias Gonsalvez wrote:
>> has_shader_storage_buffer_objects() returns true also if the OpenGL
>> context is 4.30 or ES 3.1.
>>
>> Previously, we were saying that all atomic*() GLSL builtin functions
>> for SSBOs were not available when OpenGL ES 3.1 context was in use.
>>
>> Fixes 48 dEQP-GLES31 tests:
>>
>> dEQP-GLES31.functional.ssbo.atomic.*
>
> Reviewed-by: Iago Toral Quiroga
>
> That said:
>
>> Signed-off-by: Samuel Iglesias Gonsalvez
>> ---
>> src/glsl/builtin_functions.cpp | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
>> index f0f6be2..aae25f8 100644
>> --- a/src/glsl/builtin_functions.cpp
>> +++ b/src/glsl/builtin_functions.cpp
>> @@ -403,7 +403,7 @@ shader_atomic_counters(const _mesa_glsl_parse_state
>> *state)
>> static bool
>> shader_storage_buffer_object(const _mesa_glsl_parse_state *state)
>> {
>> - return state->ARB_shader_storage_buffer_object_enable;
>
> shouldn't we set the above to true with GLES 3.1?
>

ARB_shader_storage_buffer_object_enable is true when the driver supports the ARB_shader_storage_buffer_object extension *and* we explicitly enable the extension inside a GLSL (or GLSL ES) shader:

#extension GL_ARB_shader_storage_buffer_object : enable

Same happens with ARB_shader_storage_buffer_object_warn. See _mesa_glsl_supported_extensions[] in glsl_parser_extras.cpp for more details.

When we are under an OpenGL 4.3 (or higher) or OpenGL ES 3.1 (or higher) context, we don't need to explicitly enable the extension in the GLSL shader, as it should already be supported by the driver. That second test is done in has_shader_storage_buffer_objects().

Sam

>> + return state->has_shader_storage_buffer_objects();
>> }
>>
>> static bool
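The distinction Sam describes can be sketched as two predicates. This is an illustration under stated assumptions: the struct and its field names are hypothetical, the thresholds 430 and 310 are taken from the versions named in the commit message, and the real has_shader_storage_buffer_objects() in Mesa may be implemented differently.

```c
#include <stdbool.h>

/* Hypothetical, pared-down parse state. */
struct parse_state {
   bool es_shader;               /* true when compiling GLSL ES */
   unsigned language_version;    /* e.g. 430 for desktop, 310 for ES */
   bool ssbo_extension_enabled;  /* set by an "#extension ... : enable" line */
};

/* Old check: only true when the shader enabled the extension explicitly. */
static bool
ssbo_available_old(const struct parse_state *s)
{
   return s->ssbo_extension_enabled;
}

/* New check: the extension was enabled, or the language version already
 * includes the feature in core. */
static bool
ssbo_available_new(const struct parse_state *s)
{
   unsigned required = s->es_shader ? 310 : 430;

   return s->ssbo_extension_enabled || s->language_version >= required;
}
```

An ES 3.1 shader without any #extension line is the case the old check got wrong, which matches the dEQP-GLES31 failures listed in the patch.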
Re: [Mesa-dev] [PATCH v2 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler
On Thu, Oct 15, 2015 at 07:29:31AM -0700, Jason Ekstrand wrote:
> On Oct 14, 2015 10:48 PM, "Pohjolainen, Topi" wrote:
>>
>> On Wed, Oct 14, 2015 at 11:53:37AM -0700, Jason Ekstrand wrote:
>> > On Wed, Oct 14, 2015 at 1:41 AM, Pohjolainen, Topi wrote:
>> > > On Wed, Oct 14, 2015 at 11:25:40AM +0300, Pohjolainen, Topi wrote:
>> > >> On Sat, Oct 10, 2015 at 08:09:01AM -0700, Jason Ekstrand wrote:
>> > >> > This commit removes all dependence on GL state by getting rid of the
>> > >> > brw_context parameter and the GL data structures.
>> > >> >
>> > >> > v2 (Jason Ekstrand):
>> > >> >    - Patch use_legacy_snorm_formula through as a function argument rather
>> > >> >      than trying to go through the shader key.
>> > >> > ---
>> > >> >  src/mesa/drivers/dri/i965/brw_vec4.cpp | 70 +-
>> > >> >  src/mesa/drivers/dri/i965/brw_vs.c     | 16 +++-
>> > >> >  src/mesa/drivers/dri/i965/brw_vs.h     | 12 --
>> > >> >  3 files changed, 49 insertions(+), 49 deletions(-)
>> > >> >
>> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> > >> > index 4b8390f..8e38729 100644
>> > >> > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> > >> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> > >> > @@ -1937,51 +1937,42 @@ extern "C" {
>> > >> >   * Returns the final assembly and the program's size.
>> > >> >   */
>> > >> >  const unsigned *
>> > >> > -brw_vs_emit(struct brw_context *brw,
>> > >> > +brw_vs_emit(const struct brw_compiler *compiler, void *log_data,
>> > >> >              void *mem_ctx,
>> > >> >              const struct brw_vs_prog_key *key,
>> > >> >              struct brw_vs_prog_data *prog_data,
>> > >> > -            struct gl_vertex_program *vp,
>> > >> > -            struct gl_shader_program *prog,
>> > >> > +            const nir_shader *shader,
>> > >> > +            gl_clip_plane *clip_planes,
>> > >> > +            bool use_legacy_snorm_formula,
>> > >> >              int shader_time_index,
>> > >> > -            unsigned *final_assembly_size)
>> > >> > +            unsigned *final_assembly_size,
>> > >> > +            char **error_str)
>> > >> >  {
>> > >> >     const unsigned *assembly = NULL;
>> > >> >
>> > >> > -   if (brw->intelScreen->compiler->scalar_vs) {
>> > >> > +   if (compiler->scalar_vs) {
>> > >> >        prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
>> > >> >
>> > >> > -      fs_visitor v(brw->intelScreen->compiler, brw,
>> > >> > -                   mem_ctx, key, &prog_data->base.base,
>> > >> > +      fs_visitor v(compiler, log_data, mem_ctx, key, &prog_data->base.base,
>> > >> >                     NULL, /* prog; Only used for TEXTURE_RECTANGLE on gen < 8 */
>> > >> > -                   vp->Base.nir, 8, shader_time_index);
>> > >> > -      if (!v.run_vs(brw_select_clip_planes(&brw->ctx))) {
>> > >> > -         if (prog) {
>> > >> > -            prog->LinkStatus = false;
>> > >> > -            ralloc_strcat(&prog->InfoLog, v.fail_msg);
>> > >> > -         }
>> > >> > -
>> > >> > -         _mesa_problem(NULL, "Failed to compile vertex shader: %s\n",
>> > >> > -                       v.fail_msg);
>> > >> > +                   shader, 8, shader_time_index);
>> > >> > +      if (!v.run_vs(clip_planes)) {
>> > >> > +         if (error_str)
>> > >> > +            *error_str = ralloc_strdup(mem_ctx, v.fail_msg);
>> > >>
>> > >> I don't particularly like the complexity of the error reporting mechanism.
>> > >> First vec4_visitor::fail() uses ralloc_asprintf() to create one string, then
>> > >> we make a copy of it here and finally the caller of brw_vs_emit() makes yet
>> > >> another copy using ralloc_strcat().
>> > >> I wonder if we could pass the final destination all the way for the
>> > >> vec4_visitor::fail() to augment with ralloc_asprintf() and hence avoid all
>> > >
>> > > Or more appropriately using ralloc_asprintf_append()...
>> > >
>> > >> the indirection in the middle. What do you think?
>> >
>> > I'd be moderately ok with just doing "*error_str = v.fail_msg" and
>> > avoiding the extra copy. I'm not a big fan of the extra copy, but I
>> > decided to leave it in for a couple of reasons
>> >
>> > 1) It only happens on the error path so it's not a big deal.
>>
>> I wasn't concerned about the overhead either, as you said this is error path
>> only.
>>
>> >
>> > 2) Not copying it is kind of a layering violation. You're grabbing a
>> > string from an object without copying it, destroying the object, and
>> > then handing it back to the thing that called you. The only
Re: [Mesa-dev] [PATCH] glsl: fix check SSBOs support for builtin functions
On Fri, 2015-10-16 at 09:10 +0200, Samuel Iglesias Gonsalvez wrote:
> has_shader_storage_buffer_objects() returns true also if the OpenGL
> context is 4.30 or ES 3.1.
>
> Previously, we were saying that all atomic*() GLSL builtin functions
> for SSBOs were not available when OpenGL ES 3.1 context was in use.
>
> Fixes 48 dEQP-GLES31 tests:
>
> dEQP-GLES31.functional.ssbo.atomic.*

Reviewed-by: Iago Toral Quiroga

That said:

> Signed-off-by: Samuel Iglesias Gonsalvez
> ---
>  src/glsl/builtin_functions.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
> index f0f6be2..aae25f8 100644
> --- a/src/glsl/builtin_functions.cpp
> +++ b/src/glsl/builtin_functions.cpp
> @@ -403,7 +403,7 @@ shader_atomic_counters(const _mesa_glsl_parse_state *state)
>  static bool
>  shader_storage_buffer_object(const _mesa_glsl_parse_state *state)
>  {
> -   return state->ARB_shader_storage_buffer_object_enable;

shouldn't we set the above to true with GLES 3.1?

> +   return state->has_shader_storage_buffer_objects();
>  }
>
>  static bool
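
For anyone following the thread, here is a rough sketch of the difference between the two predicates in the patch above. It is not the Mesa code: parse_state_sketch is a hypothetical, trimmed-down stand-in for _mesa_glsl_parse_state, and the body of has_shader_storage_buffer_objects() is only assumed from the commit message (extension enabled, or version at least 4.30 on desktop or 3.10 on ES). The helper names ssbo_available_old(), ssbo_available_new() and check_es31_case() are made up for the illustration.

```cpp
#include <cassert>

// Hypothetical, trimmed-down stand-in for _mesa_glsl_parse_state; only the
// fields needed for this illustration are modeled.
struct parse_state_sketch {
   bool es_shader;            // true for a GLSL ES shader
   unsigned language_version; // 430 for GLSL 4.30, 310 for GLSL ES 3.10
   bool ARB_shader_storage_buffer_object_enable;

   // Assumed behavior, per the commit message: true when the extension is
   // enabled, or when the version is at least 4.30 (desktop) / 3.10 (ES).
   bool has_shader_storage_buffer_objects() const
   {
      if (ARB_shader_storage_buffer_object_enable)
         return true;
      return es_shader ? language_version >= 310 : language_version >= 430;
   }
};

// Availability predicate before the patch: extension flag only.
static bool
ssbo_available_old(const parse_state_sketch *state)
{
   return state->ARB_shader_storage_buffer_object_enable;
}

// Availability predicate after the patch: extension flag or core version.
static bool
ssbo_available_new(const parse_state_sketch *state)
{
   return state->has_shader_storage_buffer_objects();
}

// An ES 3.1 shader that never enabled the ARB extension.
static void
check_es31_case()
{
   const parse_state_sketch es31 = { true, 310, false };
   assert(!ssbo_available_old(&es31)); // SSBO atomic builtins were hidden
   assert(ssbo_available_new(&es31));  // SSBO atomic builtins are exposed
}
```

Under these assumptions the old predicate rejects an ES 3.1 shader that has not enabled the extension, which matches the dEQP-GLES31 failures described in the commit message. Setting the enable flag to true for ES 3.1, as suggested in the review comment, would be an alternative way to reach the same result.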