[Mesa-dev] [PATCH v3] glsl: handle a switch where default is in the middle of cases
This fixes following tests in es3conform: shaders.switch.default_not_last_dynamic_vertex shaders.switch.default_not_last_dynamic_fragment and makes following tests in Piglit pass: glsl-1.30/execution/switch/fs-default-notlast-fallthrough glsl-1.30/execution/switch/fs-default_notlast No Piglit regressions. v2: take away unnecessary ir_if, just use conditional assignment v3: use foreach_in_list instead of foreach_list Signed-off-by: Tapani Pälli tapani.pa...@intel.com Reviewed-by: Roland Scheidegger srol...@vmware.com (v2) --- src/glsl/ast_to_hir.cpp | 83 +-- src/glsl/glsl_parser_extras.h | 3 ++ 2 files changed, 83 insertions(+), 3 deletions(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 885bee5..0e0013e 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -4513,6 +4513,12 @@ ast_switch_statement::hir(exec_list *instructions, instructions-push_tail(new(ctx) ir_assignment(deref_is_break_var, is_break_val)); + state-switch_state.run_default = + new(ctx) ir_variable(glsl_type::bool_type, + run_default_tmp, + ir_var_temporary); + instructions-push_tail(state-switch_state.run_default); + /* Cache test expression. */ test_to_hir(instructions, state); @@ -4567,8 +4573,71 @@ ir_rvalue * ast_case_statement_list::hir(exec_list *instructions, struct _mesa_glsl_parse_state *state) { - foreach_list_typed (ast_case_statement, case_stmt, link, this-cases) - case_stmt-hir(instructions, state); + exec_list default_case, after_default, tmp; + + foreach_list_typed (ast_case_statement, case_stmt, link, this-cases) { + case_stmt-hir(tmp, state); + + /* Default case. */ + if (state-switch_state.previous_default default_case.is_empty()) { + default_case.append_list(tmp); + continue; + } + + /* If default case found, append 'after_default' list. */ + if (!default_case.is_empty()) + after_default.append_list(tmp); + else + instructions-append_list(tmp); + } + + /* Handle the default case. This is done here because default might not be +* the last case. We need to add checks against following cases first to see +* if default should be chosen or not. +*/ + if (!default_case.is_empty()) { + + /* Default case was the last one, no checks required. */ + if (after_default.is_empty()) { + instructions-append_list(default_case); + return NULL; + } + + ir_rvalue *const true_val = new (state) ir_constant(true); + ir_dereference_variable *deref_run_default_var = + new(state) ir_dereference_variable(state-switch_state.run_default); + + /* Choose to run default case initially, following conditional + * assignments might change this. + */ + ir_assignment *const init_var = + new(state) ir_assignment(deref_run_default_var, true_val); + instructions-push_tail(init_var); + + foreach_in_list(ir_instruction, ir, after_default) { + ir_assignment *assign = ir-as_assignment(); + + if (!assign) +continue; + + /* Clone the check between case label and init expression. */ + ir_expression *exp = (ir_expression*) assign-condition; + ir_expression *clone = exp-clone(state, NULL); + + ir_dereference_variable *deref_var = +new(state) ir_dereference_variable(state-switch_state.run_default); + ir_rvalue *const false_val = new (state) ir_constant(false); + + ir_assignment *const set_false = +new(state) ir_assignment(deref_var, false_val, clone); + + instructions-push_tail(set_false); + } + + /* Append default case and all cases after it. */ + instructions-append_list(default_case); + instructions-append_list(after_default); + } /* Case statements do not have r-values. */ return NULL; @@ -4728,9 +4797,17 @@ ast_case_label::hir(exec_list *instructions, } state-switch_state.previous_default = this; + /* Set fallthru condition on 'run_default' bool. */ + ir_dereference_variable *deref_run_default = + new(ctx) ir_dereference_variable(state-switch_state.run_default); + ir_rvalue *const cond_true = new(ctx) ir_constant(true); + ir_expression *test_cond = new(ctx) ir_expression(ir_binop_all_equal, +cond_true, +deref_run_default); + /* Set falltrhu state. */ ir_assignment *set_fallthru = - new(ctx) ir_assignment(deref_fallthru_var, true_val); + new(ctx) ir_assignment(deref_fallthru_var, true_val, test_cond); instructions-push_tail(set_fallthru); } diff --git
[Mesa-dev] [Bug 78716] Fix Mesa bugs for running Unreal Engine 4.1 Cave effects demo compiled for Linux
https://bugs.freedesktop.org/show_bug.cgi?id=78716 --- Comment #10 from Tapani Pälli lem...@gmail.com --- Created attachment 102771 -- https://bugs.freedesktop.org/attachment.cgi?id=102771action=edit patch that adds 4.0 missing blend functions api attached patch makes some of these new demos work for me on Intel Haswell machine -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 78716] Fix Mesa bugs for running Unreal Engine 4.1 Cave effects demo compiled for Linux
https://bugs.freedesktop.org/show_bug.cgi?id=78716 --- Comment #11 from Kenneth Graunke kenn...@whitecape.org --- (In reply to comment #10) Created attachment 102771 [details] [review] patch that adds 4.0 missing blend functions api attached patch makes some of these new demos work for me on Intel Haswell machine Ah, that problem :) Tapani, would you mind sending that patch to the mailing list and Cc'ing Ian? We support ARB_draw_buffers_blend, which introduces these functions but with an ARB suffix. But, if you read ARB_blend_func_extended, it talks abbout removing references to BlendFunci (not BlendFunciARB!) if you don't support ARB_draw_buffers_blend...which makes it sound like ARB_draw_buffers_blend should have introduced them. https://www.opengl.org/registry/specs/ARB/draw_buffers_blend.txt https://www.opengl.org/registry/specs/ARB/blend_func_extended.txt Since it's just accepting the core names in addition to the ARB names, which we're going to do eventually anyway, I'm in favor of taking your patch. I've seen a few of Intel's GL tests hit this bug as well. -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Add an option to not generate the SIMD8 fragment shader
On Friday, July 11, 2014 11:26:30 AM Kristian Høgsberg wrote: For now, this can only be triggered with a new 'no8' INTEL_DEBUG option Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_context.h | 2 ++ src/mesa/drivers/dri/i965/brw_fs.cpp | 14 -- src/mesa/drivers/dri/i965/gen7_wm_state.c | 4 ++-- src/mesa/drivers/dri/i965/gen8_ps_state.c | 4 ++-- src/mesa/drivers/dri/i965/intel_debug.c | 1 + src/mesa/drivers/dri/i965/intel_debug.h | 1 + 6 files changed, 20 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 2943a20..11cea04 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -340,6 +340,7 @@ struct brw_wm_prog_data { /** @} */ } binding_table; + bool no_8; bool dual_src_blend; bool uses_pos_offset; bool uses_omask; @@ -1039,6 +1040,7 @@ struct brw_context bool has_compr4; bool has_negative_rhw_bug; bool has_pln; + bool no_simd8; /** * Some versions of Gen hardware don't do centroid interpolation correctly diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index a3ad375..f018d94 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3258,15 +3258,25 @@ brw_wm_fs_emit(struct brw_context *brw, } } + exec_list *simd8_instructions; + int no_simd8 = (INTEL_DEBUG DEBUG_NO8) || brw-no_simd8; + if (no_simd8 simd16_instructions) { + simd8_instructions = NULL; + prog_data-no_8 = true; + } else { + simd8_instructions = v.instructions; + prog_data-no_8 = false; + } + const unsigned *assembly = NULL; if (brw-gen = 8) { gen8_fs_generator g(brw, mem_ctx, key, prog_data, prog, fp, v.do_dual_src); - assembly = g.generate_assembly(v.instructions, simd16_instructions, + assembly = g.generate_assembly(simd8_instructions, simd16_instructions, final_assembly_size); } else { fs_generator g(brw, mem_ctx, key, prog_data, prog, fp, v.do_dual_src, v.runtime_check_aads_emit, INTEL_DEBUG DEBUG_WM); - assembly = g.generate_assembly(v.instructions, simd16_instructions, + assembly = g.generate_assembly(simd8_instructions, simd16_instructions, final_assembly_size); } diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c b/src/mesa/drivers/dri/i965/gen7_wm_state.c index c03d19d..c3e9316 100644 --- a/src/mesa/drivers/dri/i965/gen7_wm_state.c +++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c @@ -223,9 +223,9 @@ upload_ps_state(struct brw_context *brw) _mesa_get_min_invocations_per_fragment(ctx, brw-fragment_program, false); assert(min_inv_per_frag = 1); - if (brw-wm.prog_data-prog_offset_16) { + if (brw-wm.prog_data-prog_offset_16 || brw-wm.prog_data-no_8) { dw4 |= GEN7_PS_16_DISPATCH_ENABLE; - if (min_inv_per_frag == 1) { + if (!brw-wm.prog_data-no_8 min_inv_per_frag == 1) { dw4 |= GEN7_PS_8_DISPATCH_ENABLE; dw5 |= (brw-wm.prog_data-base.dispatch_grf_start_reg GEN7_PS_DISPATCH_START_GRF_SHIFT_0); diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c b/src/mesa/drivers/dri/i965/gen8_ps_state.c index f58d49c..49d4fe0 100644 --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c @@ -195,9 +195,9 @@ upload_ps_state(struct brw_context *brw) _mesa_get_min_invocations_per_fragment(ctx, brw-fragment_program, false); assert(min_invocations_per_fragment = 1); - if (brw-wm.prog_data-prog_offset_16) { + if (brw-wm.prog_data-prog_offset_16 || brw-wm.prog_data-no_8) { dw6 |= GEN7_PS_16_DISPATCH_ENABLE; - if (min_invocations_per_fragment == 1) { + if (!brw-wm.prog_data-no_8 min_invocations_per_fragment == 1) { dw6 |= GEN7_PS_8_DISPATCH_ENABLE; dw7 |= (brw-wm.prog_data-base.dispatch_grf_start_reg GEN7_PS_DISPATCH_START_GRF_SHIFT_0); diff --git a/src/mesa/drivers/dri/i965/intel_debug.c b/src/mesa/drivers/dri/i965/intel_debug.c index c72fce2..1fb8b6c 100644 --- a/src/mesa/drivers/dri/i965/intel_debug.c +++ b/src/mesa/drivers/dri/i965/intel_debug.c @@ -66,6 +66,7 @@ static const struct dri_debug_control debug_control[] = { { nodualobj, DEBUG_NO_DUAL_OBJECT_GS }, { optimizer, DEBUG_OPTIMIZER }, { noann, DEBUG_NO_ANNOTATION }, + { no8, DEBUG_NO8 }, { NULL,0 } }; diff --git a/src/mesa/drivers/dri/i965/intel_debug.h b/src/mesa/drivers/dri/i965/intel_debug.h index 37dc34a..8e1c299 100644 --- a/src/mesa/drivers/dri/i965/intel_debug.h +++ b/src/mesa/drivers/dri/i965/intel_debug.h @@ -62,6 +62,7 @@ extern uint64_t
Re: [Mesa-dev] [PATCH] glsl: Fix aggregates with dynamic initializers.
On Thursday, July 10, 2014 09:55:31 AM Cody Northrop wrote: Vectors are falling in to the ir_dereference_array() path. Without this change, the following glsl aborts the debug driver, or gets the wrong answer in release: mat2x2 a = mat2( vec2( 1.0, vertex.x ), vec2( 0.0, 1.0 ) ); Also submitting piglit tests, will reference in bug. v2: Rebase on Mesa master. v3: Remove unneeded check for arrays, which are covered by process_array_constructor(), recommended by Timothy Arceri. Thanks for the feedback, Timothy, and for the updated patch, Cody. Pushed. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] glsl: add a mechanism to allow #extension directives in the middle of shaders
On Tuesday, July 08, 2014 08:29:44 PM Marek Olšák wrote: From: Marek Olšák marek.ol...@amd.com This is needed to make Unigine Heaven 4.0 and Unigine Valley 1.0 work with sample shading. Also, if this is disabled, the error message at least makes sense now. --- src/glsl/glsl_parser.yy | 8 src/glsl/glsl_parser_extras.cpp | 2 ++ src/glsl/glsl_parser_extras.h | 2 ++ src/mesa/main/mtypes.h | 5 + 4 files changed, 17 insertions(+) diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy index b989749..4c87163 100644 --- a/src/glsl/glsl_parser.yy +++ b/src/glsl/glsl_parser.yy @@ -376,6 +376,14 @@ external_declaration_list: if ($2 != NULL) state-translation_unit.push_tail( $2-link); } + | external_declaration_list extension_statement { + if (!state-allow_extension_directive_midshader) { + _mesa_glsl_error( @2, state, + #extension directive is not allowed + in the middle of a shader); + YYERROR; + } + } ; variable_identifier: diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp index b327c2b..890123a 100644 --- a/src/glsl/glsl_parser_extras.cpp +++ b/src/glsl/glsl_parser_extras.cpp @@ -210,6 +210,8 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct gl_context *_ctx, this-early_fragment_tests = false; memset(this-atomic_counter_offsets, 0, sizeof(this-atomic_counter_offsets)); + this-allow_extension_directive_midshader = + ctx-Const.AllowGLSLExtensionDirectiveMidShader; } /** diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h index 1791816..f720d6a 100644 --- a/src/glsl/glsl_parser_extras.h +++ b/src/glsl/glsl_parser_extras.h @@ -490,6 +490,8 @@ struct _mesa_glsl_parse_state { /** Atomic counter offsets by binding */ unsigned atomic_counter_offsets[MAX_COMBINED_ATOMIC_BUFFERS]; + + bool allow_extension_directive_midshader; }; # define YYLLOC_DEFAULT(Current, Rhs, N) \ diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index a7126fd..b42a175 100644 --- a/src/mesa/main/mtypes.h +++ b/src/mesa/main/mtypes.h @@ -3349,6 +3349,11 @@ struct gl_constants GLuint ForceGLSLVersion; /** +* Allow GLSL #extension directives in the middle of shaders. +*/ + GLboolean AllowGLSLExtensionDirectiveMidShader; + + /** * Does the driver support real 32-bit integers? (Otherwise, integers are * simulated via floats.) */ Looks great to me, Marek - thanks for doing this. I'd written a similar patch, but introduced shift/reduce conflicts while doing it. Looks like you avoided that - nice :) Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glapi: Use GetProcAddress instead of dlsym on Windows.
On 07/11/2014 12:52 PM, Vinson Lee wrote: This patch fixes this MinGW build error. glapi_gentable.c: In function '_glapi_create_table_from_handle': glapi_gentable.c:123:9: error: implicit declaration of function 'dlsym' [-Werror=implicit-function-declaration] *procp = dlsym(handle, symboln); ^ Signed-off-by: Vinson Lee v...@freedesktop.org --- src/mapi/glapi/gen/gl_gentable.py | 4 1 file changed, 4 insertions(+) diff --git a/src/mapi/glapi/gen/gl_gentable.py b/src/mapi/glapi/gen/gl_gentable.py index d45a5e0..9db6a77 100644 --- a/src/mapi/glapi/gen/gl_gentable.py +++ b/src/mapi/glapi/gen/gl_gentable.py @@ -134,7 +134,11 @@ body_template = if(!disp-%(name)s) { void ** procp = (void **) disp-%(name)s; snprintf(symboln, sizeof(symboln), %%s%(entry_point)s, symbol_prefix); +#ifdef _WIN32 +*procp = GetProcAddress(handle, symboln); +#else *procp = dlsym(handle, symboln); +#endif } Hmm, I've been building on Mingw as-is for a long-time without ever seeing this problem. That said, this patch doesn't seem to break anything for me. Acked-by: Brian Paul bri...@vmware.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Add an option to not generate the SIMD8 fragment shader
On Mon, Jul 14, 2014 at 9:17 AM, Kenneth Graunke kenn...@whitecape.org wrote: On Friday, July 11, 2014 11:26:30 AM Kristian Høgsberg wrote: For now, this can only be triggered with a new 'no8' INTEL_DEBUG option Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_context.h | 2 ++ src/mesa/drivers/dri/i965/brw_fs.cpp | 14 -- src/mesa/drivers/dri/i965/gen7_wm_state.c | 4 ++-- src/mesa/drivers/dri/i965/gen8_ps_state.c | 4 ++-- src/mesa/drivers/dri/i965/intel_debug.c | 1 + src/mesa/drivers/dri/i965/intel_debug.h | 1 + 6 files changed, 20 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 2943a20..11cea04 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -340,6 +340,7 @@ struct brw_wm_prog_data { /** @} */ } binding_table; + bool no_8; bool dual_src_blend; bool uses_pos_offset; bool uses_omask; @@ -1039,6 +1040,7 @@ struct brw_context bool has_compr4; bool has_negative_rhw_bug; bool has_pln; + bool no_simd8; /** * Some versions of Gen hardware don't do centroid interpolation correctly diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index a3ad375..f018d94 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3258,15 +3258,25 @@ brw_wm_fs_emit(struct brw_context *brw, } } + exec_list *simd8_instructions; + int no_simd8 = (INTEL_DEBUG DEBUG_NO8) || brw-no_simd8; + if (no_simd8 simd16_instructions) { + simd8_instructions = NULL; + prog_data-no_8 = true; + } else { + simd8_instructions = v.instructions; + prog_data-no_8 = false; + } + const unsigned *assembly = NULL; if (brw-gen = 8) { gen8_fs_generator g(brw, mem_ctx, key, prog_data, prog, fp, v.do_dual_src); - assembly = g.generate_assembly(v.instructions, simd16_instructions, + assembly = g.generate_assembly(simd8_instructions, simd16_instructions, final_assembly_size); } else { fs_generator g(brw, mem_ctx, key, prog_data, prog, fp, v.do_dual_src, v.runtime_check_aads_emit, INTEL_DEBUG DEBUG_WM); - assembly = g.generate_assembly(v.instructions, simd16_instructions, + assembly = g.generate_assembly(simd8_instructions, simd16_instructions, final_assembly_size); } diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c b/src/mesa/drivers/dri/i965/gen7_wm_state.c index c03d19d..c3e9316 100644 --- a/src/mesa/drivers/dri/i965/gen7_wm_state.c +++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c @@ -223,9 +223,9 @@ upload_ps_state(struct brw_context *brw) _mesa_get_min_invocations_per_fragment(ctx, brw-fragment_program, false); assert(min_inv_per_frag = 1); - if (brw-wm.prog_data-prog_offset_16) { + if (brw-wm.prog_data-prog_offset_16 || brw-wm.prog_data-no_8) { dw4 |= GEN7_PS_16_DISPATCH_ENABLE; - if (min_inv_per_frag == 1) { + if (!brw-wm.prog_data-no_8 min_inv_per_frag == 1) { dw4 |= GEN7_PS_8_DISPATCH_ENABLE; dw5 |= (brw-wm.prog_data-base.dispatch_grf_start_reg GEN7_PS_DISPATCH_START_GRF_SHIFT_0); diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c b/src/mesa/drivers/dri/i965/gen8_ps_state.c index f58d49c..49d4fe0 100644 --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c @@ -195,9 +195,9 @@ upload_ps_state(struct brw_context *brw) _mesa_get_min_invocations_per_fragment(ctx, brw-fragment_program, false); assert(min_invocations_per_fragment = 1); - if (brw-wm.prog_data-prog_offset_16) { + if (brw-wm.prog_data-prog_offset_16 || brw-wm.prog_data-no_8) { dw6 |= GEN7_PS_16_DISPATCH_ENABLE; - if (min_invocations_per_fragment == 1) { + if (!brw-wm.prog_data-no_8 min_invocations_per_fragment == 1) { dw6 |= GEN7_PS_8_DISPATCH_ENABLE; dw7 |= (brw-wm.prog_data-base.dispatch_grf_start_reg GEN7_PS_DISPATCH_START_GRF_SHIFT_0); diff --git a/src/mesa/drivers/dri/i965/intel_debug.c b/src/mesa/drivers/dri/i965/intel_debug.c index c72fce2..1fb8b6c 100644 --- a/src/mesa/drivers/dri/i965/intel_debug.c +++ b/src/mesa/drivers/dri/i965/intel_debug.c @@ -66,6 +66,7 @@ static const struct dri_debug_control debug_control[] = { { nodualobj, DEBUG_NO_DUAL_OBJECT_GS }, { optimizer, DEBUG_OPTIMIZER }, { noann, DEBUG_NO_ANNOTATION }, + { no8, DEBUG_NO8 }, { NULL,0 } }; diff --git a/src/mesa/drivers/dri/i965/intel_debug.h b/src/mesa/drivers/dri/i965/intel_debug.h index 37dc34a..8e1c299 100644 --- a/src/mesa/drivers/dri/i965/intel_debug.h +++
[Mesa-dev] [PATCH] r600g: Implement GL_ARB_texture_gather
Only supported on evergreen and later. Limited to single component textures as the hardware GATHER4 instruction ignores texture swizzles. Piglit quick run passes on radeon 6670 with all applicable textureGather tests, no regressions. Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com --- docs/GL3.txt | 2 +- docs/relnotes/10.3.html| 2 +- src/gallium/drivers/r600/r600_pipe.c | 5 ++-- src/gallium/drivers/r600/r600_shader.c | 47 +- 4 files changed, 46 insertions(+), 10 deletions(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index a2f438b..20e57b0 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -118,7 +118,7 @@ GL 4.0: GL_ARB_tessellation_shader started (Fabian) GL_ARB_texture_buffer_object_rgb32 DONE (i965, nvc0, r600, radeonsi, softpipe) GL_ARB_texture_cube_map_arrayDONE (i965, nv50, nvc0, r600, radeonsi, softpipe) - GL_ARB_texture_gatherDONE (i965, nv50, nvc0, radeonsi) + GL_ARB_texture_gatherDONE (i965, nv50, nvc0, radeonsi, r600) GL_ARB_texture_query_lod DONE (i965, nv50, nvc0, radeonsi) GL_ARB_transform_feedback2 DONE (i965, nv50, nvc0, r600, radeonsi) GL_ARB_transform_feedback3 DONE (i965, nv50, nvc0, r600, radeonsi) diff --git a/docs/relnotes/10.3.html b/docs/relnotes/10.3.html index 2e718fc..1c0fab6 100644 --- a/docs/relnotes/10.3.html +++ b/docs/relnotes/10.3.html @@ -49,7 +49,7 @@ Note: some of the new features are only available with certain drivers. liGL_ARB_sample_shading on radeonsi/li liGL_ARB_stencil_texturing on nv50, nvc0, r600, and radeonsi/li liGL_ARB_texture_cube_map_array on radeonsi/li -liGL_ARB_texture_gather on radeonsi/li +liGL_ARB_texture_gather on radeonsi, r600/li liGL_ARB_texture_query_levels on nv50, nvc0, llvmpipe, r600, radeonsi, softpipe/li liGL_ARB_texture_query_lod on radeonsi/li liGL_ARB_viewport_array on nvc0/li diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index ca6399f..d967f0f 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -303,6 +303,9 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE: case PIPE_CAP_CUBE_MAP_ARRAY: case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT: + case PIPE_CAP_TEXTURE_GATHER_SM5: + return family = CHIP_CEDAR ? 1 : 0; + case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: return family = CHIP_CEDAR ? 1 : 0; /* Unsupported features. */ @@ -312,8 +315,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_FRAGMENT_COLOR_CLAMPED: case PIPE_CAP_VERTEX_COLOR_CLAMPED: case PIPE_CAP_USER_VERTEX_BUFFERS: - case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: - case PIPE_CAP_TEXTURE_GATHER_SM5: case PIPE_CAP_TEXTURE_QUERY_LOD: case PIPE_CAP_SAMPLE_SHADING: case PIPE_CAP_TEXTURE_GATHER_OFFSETS: diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 6952e3c..db928f3 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -4477,7 +4477,8 @@ static int tgsi_tex(struct r600_shader_ctx *ctx) if (inst-Instruction.Opcode == TGSI_OPCODE_TEX2 || inst-Instruction.Opcode == TGSI_OPCODE_TXB2 || - inst-Instruction.Opcode == TGSI_OPCODE_TXL2) + inst-Instruction.Opcode == TGSI_OPCODE_TXL2 || + inst-Instruction.Opcode == TGSI_OPCODE_TG4) sampler_src_reg = 2; src_gpr = tgsi_tex_get_src_gpr(ctx, 0); @@ -5079,6 +5080,13 @@ static int tgsi_tex(struct r600_shader_ctx *ctx) case FETCH_OP_SAMPLE_G: opcode = FETCH_OP_SAMPLE_C_G; break; + /* Texture gather variants */ + case FETCH_OP_GATHER4: + tex.op = FETCH_OP_GATHER4_C; + break; + case FETCH_OP_GATHER4_O: + tex.op = FETCH_OP_GATHER4_C_O; + break; } } @@ -5089,9 +5097,21 @@ static int tgsi_tex(struct r600_shader_ctx *ctx) tex.resource_id = tex.sampler_id + R600_MAX_CONST_BUFFERS; tex.src_gpr = src_gpr; tex.dst_gpr = ctx-file_offset[inst-Dst[0].Register.File] + inst-Dst[0].Register.Index; - tex.dst_sel_x = (inst-Dst[0].Register.WriteMask 1) ? 0 : 7; - tex.dst_sel_y = (inst-Dst[0].Register.WriteMask 2) ? 1 : 7; - tex.dst_sel_z = (inst-Dst[0].Register.WriteMask 4) ? 2 : 7; + + if (inst-Instruction.Opcode == TGSI_OPCODE_TG4) { + int8_t
Re: [Mesa-dev] [PATCH] r600g: Implement GL_ARB_texture_gather
On Mon, Jul 14, 2014 at 2:20 PM, Glenn Kennard glenn.kenn...@gmail.com wrote: Only supported on evergreen and later. Limited to single component textures as the hardware GATHER4 instruction ignores texture swizzles. Piglit quick run passes on radeon 6670 with all applicable textureGather tests, no regressions. Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com --- docs/GL3.txt | 2 +- docs/relnotes/10.3.html| 2 +- src/gallium/drivers/r600/r600_pipe.c | 5 ++-- src/gallium/drivers/r600/r600_shader.c | 47 +- 4 files changed, 46 insertions(+), 10 deletions(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index a2f438b..20e57b0 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -118,7 +118,7 @@ GL 4.0: GL_ARB_tessellation_shader started (Fabian) GL_ARB_texture_buffer_object_rgb32 DONE (i965, nvc0, r600, radeonsi, softpipe) GL_ARB_texture_cube_map_arrayDONE (i965, nv50, nvc0, r600, radeonsi, softpipe) - GL_ARB_texture_gatherDONE (i965, nv50, nvc0, radeonsi) + GL_ARB_texture_gatherDONE (i965, nv50, nvc0, radeonsi, r600) GL_ARB_texture_query_lod DONE (i965, nv50, nvc0, radeonsi) GL_ARB_transform_feedback2 DONE (i965, nv50, nvc0, r600, radeonsi) GL_ARB_transform_feedback3 DONE (i965, nv50, nvc0, r600, radeonsi) diff --git a/docs/relnotes/10.3.html b/docs/relnotes/10.3.html index 2e718fc..1c0fab6 100644 --- a/docs/relnotes/10.3.html +++ b/docs/relnotes/10.3.html @@ -49,7 +49,7 @@ Note: some of the new features are only available with certain drivers. liGL_ARB_sample_shading on radeonsi/li liGL_ARB_stencil_texturing on nv50, nvc0, r600, and radeonsi/li liGL_ARB_texture_cube_map_array on radeonsi/li -liGL_ARB_texture_gather on radeonsi/li +liGL_ARB_texture_gather on radeonsi, r600/li liGL_ARB_texture_query_levels on nv50, nvc0, llvmpipe, r600, radeonsi, softpipe/li liGL_ARB_texture_query_lod on radeonsi/li liGL_ARB_viewport_array on nvc0/li diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index ca6399f..d967f0f 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -303,6 +303,9 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE: case PIPE_CAP_CUBE_MAP_ARRAY: case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT: + case PIPE_CAP_TEXTURE_GATHER_SM5: + return family = CHIP_CEDAR ? 1 : 0; No clue what the hardware supports, and this CAP is actually not used by the state tracker. However I believe it is meant to imply that you can support the ARB_gs5 texture gather, including shadow comparisons and non-constant offsets. This is not yet turned on by core mesa, but do the textureGather tests still pass if you add MESA_EXTENSION_OVERRIDE=GL_ARB_gpu_shader5? (You can do a piglit-run with -t gather.) I'm not 100% sure, but I _think_ it also implies that you can deal with selecting any component from a 4-component texture... the ARB_gs5 spec says: Since this extension requires support for gathering from multi-component textures, the minimum value of MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB is increased to 4. My understanding is that the hardware is meant to be able to support GL4 one way or aonther, but sounds like more work needs to be done in order to claim the SM5 compatibility cap... Also, have you looked at Dave Airlie's impl? Not sure what's been going on there... http://cgit.freedesktop.org/~airlied/mesa/log/?h=r600g-texture-gather . He seemed to enable 4 components for = CEDAR. + case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: return family = CHIP_CEDAR ? 1 : 0; /* Unsupported features. */ @@ -312,8 +315,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_FRAGMENT_COLOR_CLAMPED: case PIPE_CAP_VERTEX_COLOR_CLAMPED: case PIPE_CAP_USER_VERTEX_BUFFERS: - case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: - case PIPE_CAP_TEXTURE_GATHER_SM5: case PIPE_CAP_TEXTURE_QUERY_LOD: case PIPE_CAP_SAMPLE_SHADING: case PIPE_CAP_TEXTURE_GATHER_OFFSETS: diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 6952e3c..db928f3 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -4477,7 +4477,8 @@ static int tgsi_tex(struct r600_shader_ctx *ctx) if (inst-Instruction.Opcode == TGSI_OPCODE_TEX2 || inst-Instruction.Opcode == TGSI_OPCODE_TXB2 || - inst-Instruction.Opcode == TGSI_OPCODE_TXL2) +
[Mesa-dev] [PATCH] configure: Don't override user -g or -O options for debug builds
From: Ian Romanick ian.d.roman...@intel.com I already pass '-ggdb3 -O1' or '-ggdb3 -Og' for CFLAGS, and I don't want configure to change them for me. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- configure.ac | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index 4646212..04abcc0 100644 --- a/configure.ac +++ b/configure.ac @@ -308,10 +308,20 @@ AC_ARG_ENABLE([debug], if test x$enable_debug = xyes; then DEFINES=$DEFINES -DDEBUG if test x$GCC = xyes; then -CFLAGS=$CFLAGS -g -O0 +if ! echo $CFLAGS | grep -q -e '-g'; then +CFLAGS=$CFLAGS -g +fi +if ! echo $CFLAGS | grep -q -e '-O'; then +CFLAGS=$CFLAGS -O0 +fi fi if test x$GXX = xyes; then -CXXFLAGS=$CXXFLAGS -g -O0 +if ! echo $CXXFLAGS | grep -q -e '-g'; then +CXXFLAGS=$CXXFLAGS -g +fi +if ! echo $CXXFLAGS | grep -q -e '-O'; then +CXXFLAGS=$CXXFLAGS -O0 +fi fi fi -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure: Don't override user -g or -O options for debug builds
On Monday, July 14, 2014 11:49:57 AM Ian Romanick wrote: From: Ian Romanick ian.d.roman...@intel.com I already pass '-ggdb3 -O1' or '-ggdb3 -Og' for CFLAGS, and I don't want configure to change them for me. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- configure.ac | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index 4646212..04abcc0 100644 --- a/configure.ac +++ b/configure.ac @@ -308,10 +308,20 @@ AC_ARG_ENABLE([debug], if test x$enable_debug = xyes; then DEFINES=$DEFINES -DDEBUG if test x$GCC = xyes; then -CFLAGS=$CFLAGS -g -O0 +if ! echo $CFLAGS | grep -q -e '-g'; then +CFLAGS=$CFLAGS -g +fi +if ! echo $CFLAGS | grep -q -e '-O'; then +CFLAGS=$CFLAGS -O0 +fi fi if test x$GXX = xyes; then -CXXFLAGS=$CXXFLAGS -g -O0 +if ! echo $CXXFLAGS | grep -q -e '-g'; then +CXXFLAGS=$CXXFLAGS -g +fi +if ! echo $CXXFLAGS | grep -q -e '-O'; then +CXXFLAGS=$CXXFLAGS -O0 +fi fi fi Seems reasonable to me. Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g: Implement GL_ARB_texture_gather
Reviewed-by: Marek Olšák marek.ol...@amd.com Yes, Dave Airlie's branch seems to support multi-component textures as well as texture offsets. However, his solution triggers shader recompilation when the format and/or swizzle is changed even if the shader doesn't use GATHER4. The recompilation should only be done for shaders and samplers which use that opcode. Marek On Mon, Jul 14, 2014 at 8:20 PM, Glenn Kennard glenn.kenn...@gmail.com wrote: Only supported on evergreen and later. Limited to single component textures as the hardware GATHER4 instruction ignores texture swizzles. Piglit quick run passes on radeon 6670 with all applicable textureGather tests, no regressions. Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com --- docs/GL3.txt | 2 +- docs/relnotes/10.3.html| 2 +- src/gallium/drivers/r600/r600_pipe.c | 5 ++-- src/gallium/drivers/r600/r600_shader.c | 47 +- 4 files changed, 46 insertions(+), 10 deletions(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index a2f438b..20e57b0 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -118,7 +118,7 @@ GL 4.0: GL_ARB_tessellation_shader started (Fabian) GL_ARB_texture_buffer_object_rgb32 DONE (i965, nvc0, r600, radeonsi, softpipe) GL_ARB_texture_cube_map_arrayDONE (i965, nv50, nvc0, r600, radeonsi, softpipe) - GL_ARB_texture_gatherDONE (i965, nv50, nvc0, radeonsi) + GL_ARB_texture_gatherDONE (i965, nv50, nvc0, radeonsi, r600) GL_ARB_texture_query_lod DONE (i965, nv50, nvc0, radeonsi) GL_ARB_transform_feedback2 DONE (i965, nv50, nvc0, r600, radeonsi) GL_ARB_transform_feedback3 DONE (i965, nv50, nvc0, r600, radeonsi) diff --git a/docs/relnotes/10.3.html b/docs/relnotes/10.3.html index 2e718fc..1c0fab6 100644 --- a/docs/relnotes/10.3.html +++ b/docs/relnotes/10.3.html @@ -49,7 +49,7 @@ Note: some of the new features are only available with certain drivers. liGL_ARB_sample_shading on radeonsi/li liGL_ARB_stencil_texturing on nv50, nvc0, r600, and radeonsi/li liGL_ARB_texture_cube_map_array on radeonsi/li -liGL_ARB_texture_gather on radeonsi/li +liGL_ARB_texture_gather on radeonsi, r600/li liGL_ARB_texture_query_levels on nv50, nvc0, llvmpipe, r600, radeonsi, softpipe/li liGL_ARB_texture_query_lod on radeonsi/li liGL_ARB_viewport_array on nvc0/li diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index ca6399f..d967f0f 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -303,6 +303,9 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE: case PIPE_CAP_CUBE_MAP_ARRAY: case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT: + case PIPE_CAP_TEXTURE_GATHER_SM5: + return family = CHIP_CEDAR ? 1 : 0; + case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: return family = CHIP_CEDAR ? 1 : 0; /* Unsupported features. */ @@ -312,8 +315,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_FRAGMENT_COLOR_CLAMPED: case PIPE_CAP_VERTEX_COLOR_CLAMPED: case PIPE_CAP_USER_VERTEX_BUFFERS: - case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: - case PIPE_CAP_TEXTURE_GATHER_SM5: case PIPE_CAP_TEXTURE_QUERY_LOD: case PIPE_CAP_SAMPLE_SHADING: case PIPE_CAP_TEXTURE_GATHER_OFFSETS: diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 6952e3c..db928f3 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -4477,7 +4477,8 @@ static int tgsi_tex(struct r600_shader_ctx *ctx) if (inst-Instruction.Opcode == TGSI_OPCODE_TEX2 || inst-Instruction.Opcode == TGSI_OPCODE_TXB2 || - inst-Instruction.Opcode == TGSI_OPCODE_TXL2) + inst-Instruction.Opcode == TGSI_OPCODE_TXL2 || + inst-Instruction.Opcode == TGSI_OPCODE_TG4) sampler_src_reg = 2; src_gpr = tgsi_tex_get_src_gpr(ctx, 0); @@ -5079,6 +5080,13 @@ static int tgsi_tex(struct r600_shader_ctx *ctx) case FETCH_OP_SAMPLE_G: opcode = FETCH_OP_SAMPLE_C_G; break; + /* Texture gather variants */ + case FETCH_OP_GATHER4: + tex.op = FETCH_OP_GATHER4_C; + break; + case FETCH_OP_GATHER4_O: + tex.op = FETCH_OP_GATHER4_C_O; + break; } } @@ -5089,9 +5097,21 @@ static
Re: [Mesa-dev] [PATCH] r600g: Implement GL_ARB_texture_gather
On Mon, Jul 14, 2014 at 3:06 PM, Marek Olšák mar...@gmail.com wrote: Reviewed-by: Marek Olšák marek.ol...@amd.com Yes, Dave Airlie's branch seems to support multi-component textures as well as texture offsets. However, his solution triggers shader recompilation when the format and/or swizzle is changed even if the shader doesn't use GATHER4. The recompilation should only be done for shaders and samplers which use that opcode. My objection was not to the implementation but to the enabling of PIPE_CAP_TEXTURE_GATHER_SM5 which seems like it can't be true given that the max number of components in the texture is only allowed to be 1, and the various other things are likely not actually supported (although I'm not sure). -ilia Marek On Mon, Jul 14, 2014 at 8:20 PM, Glenn Kennard glenn.kenn...@gmail.com wrote: Only supported on evergreen and later. Limited to single component textures as the hardware GATHER4 instruction ignores texture swizzles. Piglit quick run passes on radeon 6670 with all applicable textureGather tests, no regressions. Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com --- docs/GL3.txt | 2 +- docs/relnotes/10.3.html| 2 +- src/gallium/drivers/r600/r600_pipe.c | 5 ++-- src/gallium/drivers/r600/r600_shader.c | 47 +- 4 files changed, 46 insertions(+), 10 deletions(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index a2f438b..20e57b0 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -118,7 +118,7 @@ GL 4.0: GL_ARB_tessellation_shader started (Fabian) GL_ARB_texture_buffer_object_rgb32 DONE (i965, nvc0, r600, radeonsi, softpipe) GL_ARB_texture_cube_map_arrayDONE (i965, nv50, nvc0, r600, radeonsi, softpipe) - GL_ARB_texture_gatherDONE (i965, nv50, nvc0, radeonsi) + GL_ARB_texture_gatherDONE (i965, nv50, nvc0, radeonsi, r600) GL_ARB_texture_query_lod DONE (i965, nv50, nvc0, radeonsi) GL_ARB_transform_feedback2 DONE (i965, nv50, nvc0, r600, radeonsi) GL_ARB_transform_feedback3 DONE (i965, nv50, nvc0, r600, radeonsi) diff --git a/docs/relnotes/10.3.html b/docs/relnotes/10.3.html index 2e718fc..1c0fab6 100644 --- a/docs/relnotes/10.3.html +++ b/docs/relnotes/10.3.html @@ -49,7 +49,7 @@ Note: some of the new features are only available with certain drivers. liGL_ARB_sample_shading on radeonsi/li liGL_ARB_stencil_texturing on nv50, nvc0, r600, and radeonsi/li liGL_ARB_texture_cube_map_array on radeonsi/li -liGL_ARB_texture_gather on radeonsi/li +liGL_ARB_texture_gather on radeonsi, r600/li liGL_ARB_texture_query_levels on nv50, nvc0, llvmpipe, r600, radeonsi, softpipe/li liGL_ARB_texture_query_lod on radeonsi/li liGL_ARB_viewport_array on nvc0/li diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index ca6399f..d967f0f 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -303,6 +303,9 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE: case PIPE_CAP_CUBE_MAP_ARRAY: case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT: + case PIPE_CAP_TEXTURE_GATHER_SM5: + return family = CHIP_CEDAR ? 1 : 0; + case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: return family = CHIP_CEDAR ? 1 : 0; /* Unsupported features. */ @@ -312,8 +315,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_FRAGMENT_COLOR_CLAMPED: case PIPE_CAP_VERTEX_COLOR_CLAMPED: case PIPE_CAP_USER_VERTEX_BUFFERS: - case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: - case PIPE_CAP_TEXTURE_GATHER_SM5: case PIPE_CAP_TEXTURE_QUERY_LOD: case PIPE_CAP_SAMPLE_SHADING: case PIPE_CAP_TEXTURE_GATHER_OFFSETS: diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 6952e3c..db928f3 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -4477,7 +4477,8 @@ static int tgsi_tex(struct r600_shader_ctx *ctx) if (inst-Instruction.Opcode == TGSI_OPCODE_TEX2 || inst-Instruction.Opcode == TGSI_OPCODE_TXB2 || - inst-Instruction.Opcode == TGSI_OPCODE_TXL2) + inst-Instruction.Opcode == TGSI_OPCODE_TXL2 || + inst-Instruction.Opcode == TGSI_OPCODE_TG4) sampler_src_reg = 2; src_gpr = tgsi_tex_get_src_gpr(ctx, 0); @@ -5079,6 +5080,13 @@ static int tgsi_tex(struct r600_shader_ctx *ctx) case FETCH_OP_SAMPLE_G: opcode = FETCH_OP_SAMPLE_C_G;
Re: [Mesa-dev] [PATCH] r600g: Implement GL_ARB_texture_gather
Hm, indeed. PIPE_CAP_TEXTURE_GATHER_SM5 should be 0. Marek On Mon, Jul 14, 2014 at 9:10 PM, Ilia Mirkin imir...@alum.mit.edu wrote: On Mon, Jul 14, 2014 at 3:06 PM, Marek Olšák mar...@gmail.com wrote: Reviewed-by: Marek Olšák marek.ol...@amd.com Yes, Dave Airlie's branch seems to support multi-component textures as well as texture offsets. However, his solution triggers shader recompilation when the format and/or swizzle is changed even if the shader doesn't use GATHER4. The recompilation should only be done for shaders and samplers which use that opcode. My objection was not to the implementation but to the enabling of PIPE_CAP_TEXTURE_GATHER_SM5 which seems like it can't be true given that the max number of components in the texture is only allowed to be 1, and the various other things are likely not actually supported (although I'm not sure). -ilia Marek On Mon, Jul 14, 2014 at 8:20 PM, Glenn Kennard glenn.kenn...@gmail.com wrote: Only supported on evergreen and later. Limited to single component textures as the hardware GATHER4 instruction ignores texture swizzles. Piglit quick run passes on radeon 6670 with all applicable textureGather tests, no regressions. Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com --- docs/GL3.txt | 2 +- docs/relnotes/10.3.html| 2 +- src/gallium/drivers/r600/r600_pipe.c | 5 ++-- src/gallium/drivers/r600/r600_shader.c | 47 +- 4 files changed, 46 insertions(+), 10 deletions(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index a2f438b..20e57b0 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -118,7 +118,7 @@ GL 4.0: GL_ARB_tessellation_shader started (Fabian) GL_ARB_texture_buffer_object_rgb32 DONE (i965, nvc0, r600, radeonsi, softpipe) GL_ARB_texture_cube_map_arrayDONE (i965, nv50, nvc0, r600, radeonsi, softpipe) - GL_ARB_texture_gatherDONE (i965, nv50, nvc0, radeonsi) + GL_ARB_texture_gatherDONE (i965, nv50, nvc0, radeonsi, r600) GL_ARB_texture_query_lod DONE (i965, nv50, nvc0, radeonsi) GL_ARB_transform_feedback2 DONE (i965, nv50, nvc0, r600, radeonsi) GL_ARB_transform_feedback3 DONE (i965, nv50, nvc0, r600, radeonsi) diff --git a/docs/relnotes/10.3.html b/docs/relnotes/10.3.html index 2e718fc..1c0fab6 100644 --- a/docs/relnotes/10.3.html +++ b/docs/relnotes/10.3.html @@ -49,7 +49,7 @@ Note: some of the new features are only available with certain drivers. liGL_ARB_sample_shading on radeonsi/li liGL_ARB_stencil_texturing on nv50, nvc0, r600, and radeonsi/li liGL_ARB_texture_cube_map_array on radeonsi/li -liGL_ARB_texture_gather on radeonsi/li +liGL_ARB_texture_gather on radeonsi, r600/li liGL_ARB_texture_query_levels on nv50, nvc0, llvmpipe, r600, radeonsi, softpipe/li liGL_ARB_texture_query_lod on radeonsi/li liGL_ARB_viewport_array on nvc0/li diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index ca6399f..d967f0f 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -303,6 +303,9 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE: case PIPE_CAP_CUBE_MAP_ARRAY: case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT: + case PIPE_CAP_TEXTURE_GATHER_SM5: + return family = CHIP_CEDAR ? 1 : 0; + case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: return family = CHIP_CEDAR ? 1 : 0; /* Unsupported features. */ @@ -312,8 +315,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_FRAGMENT_COLOR_CLAMPED: case PIPE_CAP_VERTEX_COLOR_CLAMPED: case PIPE_CAP_USER_VERTEX_BUFFERS: - case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: - case PIPE_CAP_TEXTURE_GATHER_SM5: case PIPE_CAP_TEXTURE_QUERY_LOD: case PIPE_CAP_SAMPLE_SHADING: case PIPE_CAP_TEXTURE_GATHER_OFFSETS: diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 6952e3c..db928f3 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -4477,7 +4477,8 @@ static int tgsi_tex(struct r600_shader_ctx *ctx) if (inst-Instruction.Opcode == TGSI_OPCODE_TEX2 || inst-Instruction.Opcode == TGSI_OPCODE_TXB2 || - inst-Instruction.Opcode == TGSI_OPCODE_TXL2) + inst-Instruction.Opcode == TGSI_OPCODE_TXL2 || + inst-Instruction.Opcode == TGSI_OPCODE_TG4) sampler_src_reg = 2; src_gpr = tgsi_tex_get_src_gpr(ctx, 0); @@ -5079,6 +5080,13 @@ static int
Re: [Mesa-dev] [PATCH 1/2] radeonsi: switch descriptors to i32 vectors
Sorry Tom, the patch does break the driver with LLVM 3.4.2. It looks like LLVM wasn't installed properly. I have reverted it. Marek On Fri, Jul 11, 2014 at 4:20 PM, Tom Stellard t...@stellard.net wrote: On Fri, Jul 11, 2014 at 01:00:34AM +0200, Marek Olšák wrote: I have just tested it and it works with LLVM 3.4.2. Ok, thanks. Both patches are: Reviewed-by: Tom Stellard thomas.stell...@amd.com Marek On Thu, Jul 10, 2014 at 5:11 PM, Tom Stellard t...@stellard.net wrote: On Tue, Jul 08, 2014 at 01:37:02AM +0200, Marek Olšák wrote: From: Marek Olšák marek.ol...@amd.com This is a follow-up to the commit which adds texture fetches with offsets. Will this still work with LLVM 3.4.2 ? -Tom --- src/gallium/drivers/radeonsi/si_shader.c | 29 - 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 3dd6ad1..a28d682 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -1574,7 +1574,7 @@ static void tex_fetch_args( LLVMTypeRef i8 = LLVMInt8TypeInContext(gallivm-context); LLVMTypeRef v16i8 = LLVMVectorType(i8, 16); - /* Truncate v32i8 to v16i8. */ + /* Bitcast and truncate v8i32 to v16i8. */ LLVMValueRef res = si_shader_ctx-resources[sampler_index]; res = LLVMBuildBitCast(gallivm-builder, res, v2i128, ); res = LLVMBuildExtractElement(gallivm-builder, res, bld_base-uint_bld.zero, ); @@ -2305,12 +2305,18 @@ static void create_meta_data(struct si_shader_context *si_shader_ctx) si_shader_ctx-const_md = LLVMMDNodeInContext(gallivm-context, args, 3); } +static LLVMTypeRef const_array(LLVMTypeRef elem_type, int num_elements) +{ + return LLVMPointerType(LLVMArrayType(elem_type, num_elements), +CONST_ADDR_SPACE); +} + static void create_function(struct si_shader_context *si_shader_ctx) { struct lp_build_tgsi_context *bld_base = si_shader_ctx-radeon_bld.soa.bld_base; struct gallivm_state *gallivm = bld_base-base.gallivm; struct si_pipe_shader *shader = si_shader_ctx-shader; - LLVMTypeRef params[SI_NUM_PARAMS], f32, i8, i32, v2i32, v3i32; + LLVMTypeRef params[SI_NUM_PARAMS], f32, i8, i32, v2i32, v3i32, v16i8, v4i32, v8i32; unsigned i, last_sgpr, num_params; i8 = LLVMInt8TypeInContext(gallivm-context); @@ -2318,21 +2324,18 @@ static void create_function(struct si_shader_context *si_shader_ctx) f32 = LLVMFloatTypeInContext(gallivm-context); v2i32 = LLVMVectorType(i32, 2); v3i32 = LLVMVectorType(i32, 3); + v4i32 = LLVMVectorType(i32, 4); + v8i32 = LLVMVectorType(i32, 8); + v16i8 = LLVMVectorType(i8, 16); - params[SI_PARAM_CONST] = LLVMPointerType( - LLVMArrayType(LLVMVectorType(i8, 16), NUM_CONST_BUFFERS), CONST_ADDR_SPACE); - params[SI_PARAM_RW_BUFFERS] = params[SI_PARAM_CONST]; - - /* We assume at most 16 textures per program at the moment. - * This need probably need to be changed to support bindless textures */ - params[SI_PARAM_SAMPLER] = LLVMPointerType( - LLVMArrayType(LLVMVectorType(i8, 16), NUM_SAMPLER_STATES), CONST_ADDR_SPACE); - params[SI_PARAM_RESOURCE] = LLVMPointerType( - LLVMArrayType(LLVMVectorType(i8, 32), NUM_SAMPLER_VIEWS), CONST_ADDR_SPACE); + params[SI_PARAM_CONST] = const_array(v16i8, NUM_CONST_BUFFERS); + params[SI_PARAM_RW_BUFFERS] = const_array(v16i8, 6); /* XXX hardcoded */ + params[SI_PARAM_SAMPLER] = const_array(v4i32, NUM_SAMPLER_STATES); + params[SI_PARAM_RESOURCE] = const_array(v8i32, NUM_SAMPLER_VIEWS); switch (si_shader_ctx-type) { case TGSI_PROCESSOR_VERTEX: - params[SI_PARAM_VERTEX_BUFFER] = params[SI_PARAM_CONST]; + params[SI_PARAM_VERTEX_BUFFER] = const_array(v16i8, 16); /* XXX hardcoded */ params[SI_PARAM_START_INSTANCE] = i32; num_params = SI_PARAM_START_INSTANCE+1; if (shader-key.vs.as_es) { -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g: Implement GL_ARB_texture_gather
On Mon, 14 Jul 2014 20:33:08 +0200, Ilia Mirkin imir...@alum.mit.edu wrote: On Mon, Jul 14, 2014 at 2:20 PM, Glenn Kennard glenn.kenn...@gmail.com wrote: diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index ca6399f..d967f0f 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -303,6 +303,9 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE: case PIPE_CAP_CUBE_MAP_ARRAY: case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT: + case PIPE_CAP_TEXTURE_GATHER_SM5: + return family = CHIP_CEDAR ? 1 : 0; No clue what the hardware supports, and this CAP is actually not used by the state tracker. However I believe it is meant to imply that you can support the ARB_gs5 texture gather, including shadow comparisons and non-constant offsets. This is not yet turned on by core mesa, but do the textureGather tests still pass if you add MESA_EXTENSION_OVERRIDE=GL_ARB_gpu_shader5? (You can do a piglit-run with -t gather.) I'm not 100% sure, but I _think_ it also implies that you can deal with selecting any component from a 4-component texture... the ARB_gs5 spec says: Since this extension requires support for gathering from multi-component textures, the minimum value of MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB is increased to 4. My understanding is that the hardware is meant to be able to support GL4 one way or aonther, but sounds like more work needs to be done in order to claim the SM5 compatibility cap... Good catch, i missed that. It should return 0 for PIPE_CAP_TEXTURE_GATHER_SM5, and 1 for number of components, which lets it expose the ARB_texture_gather extension for now, leaving SM5 support for a later patch - the hardware apparently doesn't support texture sampler swizzling natively for the GATHER4 instruction so some logic is required to compile shader variants whenever the swizzle changes. Also, have you looked at Dave Airlie's impl? Not sure what's been going on there... http://cgit.freedesktop.org/~airlied/mesa/log/?h=r600g-texture-gather . He seemed to enable 4 components for = CEDAR. No, didn't pop up when i searched the archives for any prior work on this. Interesting! It looks like the GL_ARB_texture_gather portion in that branch is almost exactly equivalent to my patch. It also has some of the additional parts needed for GL_ARB_gpu_shader5 such as non-constant offsets, but i don't see where it would trigger shader recompiles if texture sampler swizzle changes? Does it pass piglit? David, any opinions on how to move forward with this feature? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure: Don't override user -g or -O options for debug builds
On Mon, Jul 14, 2014 at 11:49 AM, Ian Romanick i...@freedesktop.org wrote: From: Ian Romanick ian.d.roman...@intel.com I already pass '-ggdb3 -O1' or '-ggdb3 -Og' for CFLAGS, and I don't want configure to change them for me. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- configure.ac | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index 4646212..04abcc0 100644 --- a/configure.ac +++ b/configure.ac @@ -308,10 +308,20 @@ AC_ARG_ENABLE([debug], if test x$enable_debug = xyes; then DEFINES=$DEFINES -DDEBUG if test x$GCC = xyes; then -CFLAGS=$CFLAGS -g -O0 +if ! echo $CFLAGS | grep -q -e '-g'; then +CFLAGS=$CFLAGS -g +fi +if ! echo $CFLAGS | grep -q -e '-O'; then +CFLAGS=$CFLAGS -O0 +fi fi if test x$GXX = xyes; then -CXXFLAGS=$CXXFLAGS -g -O0 +if ! echo $CXXFLAGS | grep -q -e '-g'; then +CXXFLAGS=$CXXFLAGS -g +fi +if ! echo $CXXFLAGS | grep -q -e '-O'; then +CXXFLAGS=$CXXFLAGS -O0 +fi fi fi -- 1.8.1.4 This looks like strictly an improvement, so my feedback shouldn't hold this up, but maybe it's time to fight about whether --enable-debug should add -O0 at all. :) Reviewed-by: Matt Turner matts...@gmail.com Autotools defaults are -g -O2. Some people want debug builds to use -g, others -ggdb*. Some people want -O2 (optimize, but with assertions enabled), others -O0, others -Og. It'd be nice to not have --enable-debug turn on one combination of these. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radeon/llvm: Fix LLVM diagnostic error reporting
We were trying to print the error message after disposing the message object. --- src/gallium/drivers/radeon/radeon_llvm_emit.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c b/src/gallium/drivers/radeon/radeon_llvm_emit.c index 6a394b2..1b17dd4 100644 --- a/src/gallium/drivers/radeon/radeon_llvm_emit.c +++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c @@ -83,16 +83,13 @@ static LLVMTargetRef get_r600_target() { #if HAVE_LLVM = 0x0305 static void radeonDiagnosticHandler(LLVMDiagnosticInfoRef di, void *context) { - unsigned int *diagnosticflag; - char *diaginfo_message; - - diaginfo_message = LLVMGetDiagInfoDescription(di); - LLVMDisposeMessage(diaginfo_message); - - diagnosticflag = (unsigned int *)context; if (LLVMGetDiagInfoSeverity(di) == LLVMDSError) { + unsigned int *diagnosticflag = (unsigned int *)context; + char *diaginfo_message = LLVMGetDiagInfoDescription(di); + *diagnosticflag = 1; fprintf(stderr,LLVM triggered Diagnostic Handler: %s\n, diaginfo_message); + LLVMDisposeMessage(diaginfo_message); } } -- 1.8.1.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure: Don't override user -g or -O options for debug builds
On Mon, Jul 14, 2014 at 4:41 PM, Matt Turner matts...@gmail.com wrote: On Mon, Jul 14, 2014 at 11:49 AM, Ian Romanick i...@freedesktop.org wrote: From: Ian Romanick ian.d.roman...@intel.com I already pass '-ggdb3 -O1' or '-ggdb3 -Og' for CFLAGS, and I don't want configure to change them for me. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- configure.ac | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index 4646212..04abcc0 100644 --- a/configure.ac +++ b/configure.ac @@ -308,10 +308,20 @@ AC_ARG_ENABLE([debug], if test x$enable_debug = xyes; then DEFINES=$DEFINES -DDEBUG if test x$GCC = xyes; then -CFLAGS=$CFLAGS -g -O0 +if ! echo $CFLAGS | grep -q -e '-g'; then +CFLAGS=$CFLAGS -g +fi +if ! echo $CFLAGS | grep -q -e '-O'; then +CFLAGS=$CFLAGS -O0 +fi fi if test x$GXX = xyes; then -CXXFLAGS=$CXXFLAGS -g -O0 +if ! echo $CXXFLAGS | grep -q -e '-g'; then +CXXFLAGS=$CXXFLAGS -g +fi +if ! echo $CXXFLAGS | grep -q -e '-O'; then +CXXFLAGS=$CXXFLAGS -O0 +fi fi fi -- 1.8.1.4 This looks like strictly an improvement, so my feedback shouldn't hold this up, but maybe it's time to fight about whether --enable-debug should add -O0 at all. :) Reviewed-by: Matt Turner matts...@gmail.com Autotools defaults are -g -O2. Some people want debug builds to use -g, others -ggdb*. Some people want -O2 (optimize, but with assertions enabled), others -O0, others -Og. It'd be nice to not have --enable-debug turn on one combination of these. IIRC I've only seen negative comments on this rather than defend/praise. Does anyone actually _like_ this feature? Personally, I find it incredibly counter-intuitive. -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure: Don't override user -g or -O options for debug builds
You can also just add -DDEBUG by yourself since you override the flags anyway (me too). Marek On Mon, Jul 14, 2014 at 8:49 PM, Ian Romanick i...@freedesktop.org wrote: From: Ian Romanick ian.d.roman...@intel.com I already pass '-ggdb3 -O1' or '-ggdb3 -Og' for CFLAGS, and I don't want configure to change them for me. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- configure.ac | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index 4646212..04abcc0 100644 --- a/configure.ac +++ b/configure.ac @@ -308,10 +308,20 @@ AC_ARG_ENABLE([debug], if test x$enable_debug = xyes; then DEFINES=$DEFINES -DDEBUG if test x$GCC = xyes; then -CFLAGS=$CFLAGS -g -O0 +if ! echo $CFLAGS | grep -q -e '-g'; then +CFLAGS=$CFLAGS -g +fi +if ! echo $CFLAGS | grep -q -e '-O'; then +CFLAGS=$CFLAGS -O0 +fi fi if test x$GXX = xyes; then -CXXFLAGS=$CXXFLAGS -g -O0 +if ! echo $CXXFLAGS | grep -q -e '-g'; then +CXXFLAGS=$CXXFLAGS -g +fi +if ! echo $CXXFLAGS | grep -q -e '-O'; then +CXXFLAGS=$CXXFLAGS -O0 +fi fi fi -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g: Implement GL_ARB_texture_gather
Also, have you looked at Dave Airlie's impl? Not sure what's been going on there... http://cgit.freedesktop.org/~airlied/mesa/log/?h=r600g-texture-gather . He seemed to enable 4 components for = CEDAR. No, didn't pop up when i searched the archives for any prior work on this. Interesting! It looks like the GL_ARB_texture_gather portion in that branch is almost exactly equivalent to my patch. It also has some of the additional parts needed for GL_ARB_gpu_shader5 such as non-constant offsets, but i don't see where it would trigger shader recompiles if texture sampler swizzle changes? Does it pass piglit? David, any opinions on how to move forward with this feature? I didn't like my implementation sufficiently to push it forward, but I think the main thing was the extra key changes, I meant to get Marek's input on that code, but then it fell off my radar. So get input from Marek, repost and merge :) Dave. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeon/llvm: Fix LLVM diagnostic error reporting
Tested-by and Reviewed-by: Aaron Watry awa...@gmail.com On Mon, Jul 14, 2014 at 3:51 PM, Tom Stellard thomas.stell...@amd.com wrote: We were trying to print the error message after disposing the message object. --- src/gallium/drivers/radeon/radeon_llvm_emit.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c b/src/gallium/drivers/radeon/radeon_llvm_emit.c index 6a394b2..1b17dd4 100644 --- a/src/gallium/drivers/radeon/radeon_llvm_emit.c +++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c @@ -83,16 +83,13 @@ static LLVMTargetRef get_r600_target() { #if HAVE_LLVM = 0x0305 static void radeonDiagnosticHandler(LLVMDiagnosticInfoRef di, void *context) { - unsigned int *diagnosticflag; - char *diaginfo_message; - - diaginfo_message = LLVMGetDiagInfoDescription(di); - LLVMDisposeMessage(diaginfo_message); - - diagnosticflag = (unsigned int *)context; if (LLVMGetDiagInfoSeverity(di) == LLVMDSError) { + unsigned int *diagnosticflag = (unsigned int *)context; + char *diaginfo_message = LLVMGetDiagInfoDescription(di); + *diagnosticflag = 1; fprintf(stderr,LLVM triggered Diagnostic Handler: %s\n, diaginfo_message); + LLVMDisposeMessage(diaginfo_message); } } -- 1.8.1.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Initialize new chunks of realloc'd memory.
On Wednesday, July 09, 2014 12:41:25 PM Matt Turner wrote: Otherwise we'd compare uninitialized pointers with NULL and dereference, leading to crashes. --- src/mesa/drivers/dri/i965/intel_asm_annotation.c | 4 1 file changed, 4 insertions(+) diff --git a/src/mesa/drivers/dri/i965/intel_asm_annotation.c b/src/mesa/drivers/dri/i965/intel_asm_annotation.c index 4717baf..6a51d89 100644 --- a/src/mesa/drivers/dri/i965/intel_asm_annotation.c +++ b/src/mesa/drivers/dri/i965/intel_asm_annotation.c @@ -96,11 +96,15 @@ void annotate(struct brw_context *brw, struct backend_instruction *inst, unsigned offset) { if (annotation-ann_size = annotation-ann_count) { + int old_size = annotation-ann_size; annotation-ann_size = MAX2(1024, annotation-ann_size * 2); annotation-ann = reralloc(annotation-mem_ctx, annotation-ann, struct annotation, annotation-ann_size); if (!annotation-ann) return; + + memset(annotation-ann + old_size, 0, + (annotation-ann_size - old_size) * sizeof(struct annotation)); } struct annotation *ann = annotation-ann[annotation-ann_count++]; Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure: Don't override user -g or -O options for debug builds
On 14/07/14 19:49, Ian Romanick wrote: From: Ian Romanick ian.d.roman...@intel.com I already pass '-ggdb3 -O1' or '-ggdb3 -Og' for CFLAGS, and I don't want configure to change them for me. The topic is a rather messy and whatever we do there will always be a case where this breaks someone's setup/workflow. In order to minimize the endless bikeshed can we just drop the switch altogether, and leave it to the builder to set these ? This way people will be able to set -O, -g and/or -DDEBUG if they are unhappy with their distro's defaults. Thanks, Emil Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- configure.ac | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index 4646212..04abcc0 100644 --- a/configure.ac +++ b/configure.ac @@ -308,10 +308,20 @@ AC_ARG_ENABLE([debug], if test x$enable_debug = xyes; then DEFINES=$DEFINES -DDEBUG if test x$GCC = xyes; then -CFLAGS=$CFLAGS -g -O0 +if ! echo $CFLAGS | grep -q -e '-g'; then +CFLAGS=$CFLAGS -g +fi +if ! echo $CFLAGS | grep -q -e '-O'; then +CFLAGS=$CFLAGS -O0 +fi fi if test x$GXX = xyes; then -CXXFLAGS=$CXXFLAGS -g -O0 +if ! echo $CXXFLAGS | grep -q -e '-g'; then +CXXFLAGS=$CXXFLAGS -g +fi +if ! echo $CXXFLAGS | grep -q -e '-O'; then +CXXFLAGS=$CXXFLAGS -O0 +fi fi fi ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g: Implement GL_ARB_texture_gather
My advice is: The shader being compiled should be analyzed before compilation and an array of bool flags (or one uint16_t for all flags) should be produced saying which sampler units use TG4. Then, you only need to set the swizzle in the key for the i-th sampler if the i-th sampler is really used by TG4. The shader analysis can be done in tgsi_scan_shader, which should be pretty trivial. However, this won't work with GL4 sampler arrays and indirect addressing of those arrays, so well, if I were you, I would probably ignore the ZERO and ONE swizzle cases and just put up with the fact some piglit tests will always fail. That's the only use of the tg4_swizzle array as far as I can see, so it can be removed. Marek On Mon, Jul 14, 2014 at 11:27 PM, Dave Airlie airl...@gmail.com wrote: Also, have you looked at Dave Airlie's impl? Not sure what's been going on there... http://cgit.freedesktop.org/~airlied/mesa/log/?h=r600g-texture-gather . He seemed to enable 4 components for = CEDAR. No, didn't pop up when i searched the archives for any prior work on this. Interesting! It looks like the GL_ARB_texture_gather portion in that branch is almost exactly equivalent to my patch. It also has some of the additional parts needed for GL_ARB_gpu_shader5 such as non-constant offsets, but i don't see where it would trigger shader recompiles if texture sampler swizzle changes? Does it pass piglit? David, any opinions on how to move forward with this feature? I didn't like my implementation sufficiently to push it forward, but I think the main thing was the extra key changes, I meant to get Marek's input on that code, but then it fell off my radar. So get input from Marek, repost and merge :) Dave. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Perform CSE on sends-from-GRF rather than textures.
On Thursday, July 10, 2014 11:59:40 AM Matt Turner wrote: Should potentially allow a few more cases, while avoiding doing CSE on texture operations on Gen = 6 with the MRF. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80211 --- src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp index 5727801..7226aff 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp @@ -103,7 +103,7 @@ is_expression(const fs_inst *const inst) case SHADER_OPCODE_LOAD_PAYLOAD: return !is_copy_payload(inst); default: - return inst-is_tex(); + return inst-is_send_from_grf(); } } Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/10] glsl: Optimize (A || B) A == A
From: Thomas Helland thomashellan...@gmail.com And it's cousins Signed-off-by: Thomas Helland thomashellan...@gmail.com --- src/glsl/opt_algebraic.cpp | 10 ++ 1 file changed, 10 insertions(+) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index 748fd05..230fb1b 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -559,6 +559,16 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) ir-operands[0]-equals(op_expr[1]-operands[0])) { /* A !A == 0 */ return ir_constant::zero(mem_ctx, ir-type); + } else if (op_expr[0] op_expr[0]-operation == ir_binop_logic_or + (op_expr[0]-operands[0]-equals(ir-operands[1]) || + op_expr[0]-operands[1]-equals(ir-operands[1]))) { + /* (A || B) A == A or (B || A) A == A */ + return ir-operands[1]; + } else if (op_expr[1] op_expr[1]-operation == ir_binop_logic_or + (op_expr[1]-operands[0]-equals(ir-operands[0]) || + op_expr[1]-operands[1]-equals(ir-operands[0]))) { + /* A (A || B) == A or A (B || A) == A */ + return ir-operands[1]; } break; -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/10] glsl: Optimize log(x) + log(y) == log(x*y)
From: Thomas Helland thomashellan...@gmail.com And the log2() equivalent. Signed-off-by: Thomas Helland thomashellan...@gmail.com --- src/glsl/opt_algebraic.cpp | 12 1 file changed, 12 insertions(+) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index deaa49e..e2abf5f 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -380,6 +380,18 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) return expr; } + /* log(x) + log(y) == log(x*y) */ + if (op_expr[0] op_expr[0]-operation == ir_unop_log + op_expr[1] op_expr[1]-operation == ir_unop_log) + return new(mem_ctx) ir_expression( +ir_unop_log, mul(op_expr[0]-operands[0], op_expr[1]-operands[0]) ); + + /* log2(x) + log2(y) == log2(x*y) */ + if (op_expr[0] op_expr[0]-operation == ir_unop_log2 + op_expr[1] op_expr[1]-operation == ir_unop_log2) + return new(mem_ctx) ir_expression( +ir_unop_log2, mul(op_expr[0]-operands[0], op_expr[1]-operands[0]) ); + /* Replace (-x + y) * a + x and commutative variations with lrp(x, y, a). * * (-x + y) * a + x -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/10] glsl: Optimize X - X - 0
From: Thomas Helland thomashellan...@gmail.com Silly optimization indeed, but who knows. Signed-off-by: Thomas Helland thomashellan...@gmail.com --- src/glsl/opt_algebraic.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index ac7514a..c179d53 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -439,6 +439,9 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) return neg(ir-operands[1]); if (is_vec_zero(op_const[1])) return ir-operands[0]; + /* X - X == 0 */ + if (ir-operands[0]-equals(ir-operands[1])) + return ir_constant::zero(ir, ir-type); break; case ir_binop_mul: -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/10] glsl: Optimize some more pow() special cases
From: Thomas Helland thomashellan...@gmail.com Specifically x^-1 = rcp(x) .0^x = 0 .x^0 = 1 Signed-off-by: Thomas Helland thomashellan...@gmail.com --- src/glsl/opt_algebraic.cpp | 15 +++ 1 file changed, 15 insertions(+) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index fba9de6..1382067 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -662,6 +662,21 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) if (is_vec_one(op_const[1])) return ir-operands[0]; + /* x^-1 == rcp(x) */ + if (is_vec_negative_one(op_const[1])) + return new(mem_ctx) ir_expression(ir_unop_rcp, + ir-operands[0]-type, + ir-operands[0], + NULL); + + /* 0^x == 0 */ + if (is_vec_zero(op_const[0])) + return ir_constant::zero(ir, ir-type); + + /* x^0 == 1 */ + if (is_vec_zero(op_const[1])) + return new(mem_ctx) ir_constant(1.0f, 1); + /* pow(2,x) == exp2(x) */ if (is_vec_two(op_const[0])) return expr(ir_unop_exp2, ir-operands[1]); -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/10] glsl: Optimize !A || A == 1
From: Thomas Helland thomashellan...@gmail.com Signed-off-by: Thomas Helland thomashellan...@gmail.com --- src/glsl/opt_algebraic.cpp | 12 1 file changed, 12 insertions(+) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index c179d53..385f0a2 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -591,6 +591,18 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) } else if (ir-operands[0]-equals(ir-operands[1])) { /* (a || a) == a */ return ir-operands[0]; + } else if ( (op_expr[0] op_expr[0]-operation == ir_unop_logic_not + ir-operands[1]-equals(op_expr[0]-operands[0])) || + /* !A || A == 1 */ + (op_expr[1] op_expr[1]-operation == ir_unop_logic_not + ir-operands[0]-equals(op_expr[1]-operands[0])) ) { + /* A || !A == 1 */ + ir_constant_data data; + +for (unsigned i = 0; i 16; i++) + data.b[i] = true; + +return new(mem_ctx) ir_constant(ir-type, data); } break; -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/10] glsl: Optimize exp(x)*exp(y) == exp(x+y)
From: Thomas Helland thomashellan...@gmail.com And it's exp2() equivalent. Signed-off-by: Thomas Helland thomashellan...@gmail.com --- src/glsl/opt_algebraic.cpp | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index e2abf5f..2361d0f 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -470,6 +470,17 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) if (is_vec_negative_one(op_const[1])) return neg(ir-operands[0]); + if (op_expr[0] op_expr[1]) { + /* exp(x) * exp(y) == exp(x+y) */ + if (op_expr[0]-operation == ir_unop_exp + op_expr[1]-operation == ir_unop_exp) +return exp(add(op_expr[0]-operands[0], op_expr[1]-operands[0])); + /* exp2(x) * exp2(y) == exp2(x+y) */ + if (op_expr[0]-operation == ir_unop_exp2 + op_expr[1]-operation == ir_unop_exp2) +return new (mem_ctx) ir_expression(ir_unop_exp2, + add(op_expr[0]-operands[0], op_expr[1]-operands[0])); + } /* Reassociate multiplication of constants so that we can do * constant folding. -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/10] glsl: Optimize min(-8, sin(x)) == -8 and similar
From: Thomas Helland thomashellan...@gmail.com This patch is a bit sketchy. It only handles float-constants, otherwise it bails. Also, ir_unop_saturate should be added here when support for it lands in the ir. Signed-off-by: Thomas Helland thomashellan...@gmail.com --- src/glsl/opt_algebraic.cpp | 37 + 1 file changed, 37 insertions(+) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index 230fb1b..90b6cae 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -647,6 +647,43 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) break; + case ir_binop_min: { + ir_rvalue *tempExpr; + ir_rvalue *tempConst; + + if (op_expr[0] op_const[1]) { + tempExpr = ir-operands[0]; + tempConst = ir-operands[1]; + } else if (op_expr[1] op_const[0]) { + tempExpr = ir-operands[1]; + tempConst = ir-operands[0]; + } else { + break; + } + if(tempConst-type-is_float()) { + float constValue = *(tempConst-constant_expression_value()-value.f); + switch(tempExpr-as_expression()-operation) { + case ir_unop_cos: + case ir_unop_sin: + case ir_unop_sign: +/* If we are comparing an expression bound between -1 and 1 + * with a const = 1 we know that const will be the largest + */ +if(constValue = 1.0f) + return tempExpr; +/* If we are comparing an expression bound between -1 and 1 + * with a const = -1 we know that const will be the smallest + */ +if(constValue = -1.0f) + return tempConst; +break; + default: +break; + } + } + break; + } + case ir_unop_rcp: if (op_expr[0] op_expr[0]-operation == ir_unop_rcp) return op_expr[0]-operands[0]; -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/10] glsl: Optimize A - neg(B) == A + B
From: Thomas Helland thomashellan...@gmail.com ...and neg(A) - B == neg(A + B) Signed-off-by: Thomas Helland thomashellan...@gmail.com --- src/glsl/opt_algebraic.cpp | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index 2361d0f..fba9de6 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -454,6 +454,12 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) /* X - X == 0 */ if (ir-operands[0]-equals(ir-operands[1])) return ir_constant::zero(ir, ir-type); + /* A - neg(B) = A + B */ + if (op_expr[1] op_expr[1]-operation == ir_unop_neg) + return add(ir-operands[0], op_expr[1]-operands[0]); + /* neg(A) - B = neg(A + B) */ + if (op_expr[0] op_expr[0]-operation == ir_unop_neg) + return neg(add(op_expr[0]-operands[0], ir-operands[1])); break; case ir_binop_mul: -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 00/10] [RFC] Probably useless algebraic optimizations
From: Thomas Helland thomashellan...@gmail.com When writing that A || (A B) patch some days ago I also wrote some other patches that have no impact on my collection of shaders. (shader-db + Some TF2 and Portal-shaders). No reduction in instruction count, and no significant increase in compilation time. I decided to put them up here anyway, as with your collection of shaders maybe YMMV. These are mostly RFC-quality, and not all are as complete and nicely formatted as they could be. Possibly some are also implemented incorrectly. (I'm still trying to get a good understanding of the buildup of the ir, the visitors, etc) Feel free to do with these patches as you please; Ignore, test, review, flame, make cookies... Thomas Helland (10): glsl: Optimize X - X - 0 glsl: Optimize !A || A == 1 glsl: Optimize !A A == 0 glsl: Optimize (A || B) A == A glsl: Optimize min(-8, sin(x)) == -8 and similar glsl: Optimize max(8, sin(x)) == 8 and similar glsl: Optimize log(x) + log(y) == log(x*y) glsl: Optimize exp(x)*exp(y) == exp(x+y) glsl: Optimize A - neg(B) == A + B and neg(A) - B == neg(A + B) glsl: Optimize some more pow() special cases src/glsl/opt_algebraic.cpp | 152 + 1 file changed, 152 insertions(+) -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/10] glsl: Optimize max(8, sin(x)) == 8 and similar
From: Thomas Helland thomashellan...@gmail.com This also only handles floats. ir_unop_saturate should also be added when the support for this lands in the ir. Signed-off-by: Thomas Helland thomashellan...@gmail.com --- src/glsl/opt_algebraic.cpp | 38 ++ 1 file changed, 38 insertions(+) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index 90b6cae..deaa49e 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -684,6 +684,44 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) break; } + case ir_binop_max: { + ir_rvalue *tempExpr; + ir_rvalue *tempConst; + + if (op_expr[0] op_const[1]) { + tempExpr = ir-operands[0]; + tempConst = ir-operands[1]; + } else if (op_expr[1] op_const[0]) { + tempExpr = ir-operands[1]; + tempConst = ir-operands[0]; + } else { + break; + } + + if(tempConst-type-is_float()) { + float constValue = *(tempConst-constant_expression_value()-value.f); + switch(tempExpr-as_expression()-operation) { + case ir_unop_cos: + case ir_unop_sin: + case ir_unop_sign: +/* If we are comparing an expression bound between -1 and 1 + * with a const = 1 we know that const will be the largest + */ +if(constValue = 1.0f) + return tempConst; +/* If we are comparing an expression bound between -1 and 1 + * with a const = -1 we know that const will be the smallest + */ +if(constValue = -1.0f) + return tempExpr; +break; + default: +break; + } + } + break; + } + case ir_unop_rcp: if (op_expr[0] op_expr[0]-operation == ir_unop_rcp) return op_expr[0]-operands[0]; -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/10] glsl: Optimize !A A == 0
From: Thomas Helland thomashellan...@gmail.com Signed-off-by: Thomas Helland thomashellan...@gmail.com --- src/glsl/opt_algebraic.cpp | 8 1 file changed, 8 insertions(+) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index 385f0a2..748fd05 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -551,6 +551,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) } else if (ir-operands[0]-equals(ir-operands[1])) { /* (a a) == a */ return ir-operands[0]; + } else if (op_expr[0] op_expr[0]-operation == ir_unop_logic_not + ir-operands[1]-equals(op_expr[0]-operands[0])) { + /* !A A == 0 */ + return ir_constant::zero(mem_ctx, ir-type); + } else if (op_expr[1] op_expr[1]-operation == ir_unop_logic_not + ir-operands[0]-equals(op_expr[1]-operands[0])) { + /* A !A == 0 */ + return ir_constant::zero(mem_ctx, ir-type); } break; -- 2.0.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] glsl: Update expression types after rebalancing the tree.
On Thursday, July 10, 2014 11:26:52 AM Matt Turner wrote: If we saw a tree that looked like vec3 / \ vec3 float / \ vec3 float / \ vec3 float We would see that all of the expression types were vec3, and then rebalance to vec3 /\ vec3 vec3 -- should be float / \ /\ vec3 float float float This patch adds code to visit the rebalanced tree and update the expression types from the bottom up. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80880 --- src/glsl/opt_rebalance_tree.cpp | 17 + 1 file changed, 17 insertions(+) diff --git a/src/glsl/opt_rebalance_tree.cpp b/src/glsl/opt_rebalance_tree.cpp index 773aab3..daabdc9 100644 --- a/src/glsl/opt_rebalance_tree.cpp +++ b/src/glsl/opt_rebalance_tree.cpp @@ -60,6 +60,7 @@ #include ir_visitor.h #include ir_rvalue_visitor.h #include ir_optimization.h +#include main/macros.h /* for MAX2 */ /* The DSW algorithm generates a degenerate tree (really, a linked list) in * tree_to_vine(). We'd rather not leave a binary expression with only one @@ -261,6 +262,20 @@ handle_expression(ir_expression *expr) return expr; } +static void +update_types(ir_instruction *ir, void *) +{ + ir_expression *expr = ir-as_expression(); + if (!expr) + return; + + expr-type = + glsl_type::get_instance(expr-type-base_type, + MAX2(expr-operands[0]-type-components(), + expr-operands[1]-type-components()), + 1); +} + void ir_rebalance_visitor::handle_rvalue(ir_rvalue **rvalue) { @@ -285,6 +300,8 @@ ir_rebalance_visitor::handle_rvalue(ir_rvalue **rvalue) if (new_rvalue == *rvalue) return; + visit_tree(new_rvalue, NULL, NULL, update_types); + *rvalue = new_rvalue; this-progress = true; } Thanks for fixing this! Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/26] glsl: Use accessors for ir_variable::warn_extension
From: Ian Romanick ian.d.roman...@intel.com The payoff for this will come in the next patch. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/builtin_variables.cpp | 4 ++-- src/glsl/ir.cpp| 11 +++ src/glsl/ir.h | 22 +- 3 files changed, 30 insertions(+), 7 deletions(-) diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp index 3497302..6fc3319 100644 --- a/src/glsl/builtin_variables.cpp +++ b/src/glsl/builtin_variables.cpp @@ -902,14 +902,14 @@ builtin_variable_generator::generate_fs_special_vars() ir_variable *const var = add_output(FRAG_RESULT_STENCIL, int_t, gl_FragStencilRefARB); if (state-ARB_shader_stencil_export_warn) - var-warn_extension = GL_ARB_shader_stencil_export; + var-enable_extension_warning(GL_ARB_shader_stencil_export); } if (state-AMD_shader_stencil_export_enable) { ir_variable *const var = add_output(FRAG_RESULT_STENCIL, int_t, gl_FragStencilRefAMD); if (state-AMD_shader_stencil_export_warn) - var-warn_extension = GL_AMD_shader_stencil_export; + var-enable_extension_warning(GL_AMD_shader_stencil_export); } if (state-ARB_sample_shading_enable) { diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index 6a4fc1b..8575ac3 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -1607,6 +1607,17 @@ ir_variable::determine_interpolation_mode(bool flat_shade) return INTERP_QUALIFIER_SMOOTH; } +void +ir_variable::enable_extension_warning(const char *extension) +{ + this-warn_extension = extension; +} + +const char * +ir_variable::get_extension_warning() const +{ + return this-warn_extension; +} ir_function_signature::ir_function_signature(const glsl_type *return_type, builtin_available_predicate b) diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 7036c1e..bdd9a3c 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -527,6 +527,18 @@ public: } /** +* Enable emitting extension warnings for this variable +*/ + void enable_extension_warning(const char *extension); + + /** +* Get the extension warning string for this variable +* +* If warnings are not enabled, \c NULL is returned. +*/ + const char *get_extension_warning() const; + + /** * Declared type of the variable */ const struct glsl_type *type; @@ -778,11 +790,6 @@ public: /*@}*/ /** -* Emit a warning if this variable is accessed. -*/ - const char *warn_extension; - - /** * Value assigned in the initializer of a variable declared const */ ir_constant *constant_value; @@ -805,6 +812,11 @@ private: * \sa ir_variable::location */ const glsl_type *interface_type; + + /** +* Emit a warning if this variable is accessed. +*/ + const char *warn_extension; }; /** -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 15/26] glsl: Never put ir_var_temporary variables in the symbol table
From: Ian Romanick ian.d.roman...@intel.com Later patches will give every ir_var_temporary the same name in release builds. Adding a bunch of variables named compiler_temp to the symbol table can only cause problems. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/ast_to_hir.cpp| 4 ++-- src/glsl/glsl_parser_extras.cpp| 8 ++-- src/glsl/glsl_symbol_table.cpp | 2 ++ src/glsl/linker.cpp| 4 +++- src/glsl/lower_packed_varyings.cpp | 1 + 5 files changed, 14 insertions(+), 5 deletions(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 82ac29e..0a4b3a6 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -912,7 +912,6 @@ get_lvalue_copy(exec_list *instructions, ir_rvalue *lvalue) var = new(ctx) ir_variable(lvalue-type, _post_incdec_tmp, ir_var_temporary); instructions-push_tail(var); - var-data.mode = ir_var_auto; instructions-push_tail(new(ctx) ir_assignment(new(ctx) ir_dereference_variable(var), lvalue)); @@ -2499,6 +2498,7 @@ apply_type_qualifier_to_variable(const struct ast_type_qualifier *qual, /* If there is no qualifier that changes the mode of the variable, leave * the setting alone. */ + assert(var-data.mode != ir_var_temporary); if (qual-flags.q.in qual-flags.q.out) var-data.mode = ir_var_function_inout; else if (qual-flags.q.in) @@ -4944,7 +4944,7 @@ ast_type_specifier::hir(exec_list *instructions, */ ir_variable *const junk = new(state) ir_variable(type, #default precision, - ir_var_temporary); + ir_var_auto); state-symbols-add_variable(junk); } diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp index 93f5e75..0555c57 100644 --- a/src/glsl/glsl_parser_extras.cpp +++ b/src/glsl/glsl_parser_extras.cpp @@ -1544,9 +1544,13 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader, shader-symbols-add_function(decl); break; } - case ir_type_variable: - shader-symbols-add_variable((ir_variable *) ir); + case ir_type_variable: { + ir_variable *const var = (ir_variable *) ir; + + if (var-data.mode != ir_var_temporary) +shader-symbols-add_variable(var); break; + } default: break; } diff --git a/src/glsl/glsl_symbol_table.cpp b/src/glsl/glsl_symbol_table.cpp index a052362..2294dda 100644 --- a/src/glsl/glsl_symbol_table.cpp +++ b/src/glsl/glsl_symbol_table.cpp @@ -124,6 +124,8 @@ bool glsl_symbol_table::name_declared_this_scope(const char *name) bool glsl_symbol_table::add_variable(ir_variable *v) { + assert(v-data.mode != ir_var_temporary); + if (this-separate_function_namespace) { /* In 1.10, functions and variables have separate namespaces. */ symbol_table_entry *existing = get_entry(v-name); diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp index 2119943..842678f 100644 --- a/src/glsl/linker.cpp +++ b/src/glsl/linker.cpp @@ -969,7 +969,8 @@ populate_symbol_table(gl_shader *sh) if ((func = inst-as_function()) != NULL) { sh-symbols-add_function(func); } else if ((var = inst-as_variable()) != NULL) { -sh-symbols-add_variable(var); + if (var-data.mode != ir_var_temporary) +sh-symbols-add_variable(var); } } } @@ -2166,6 +2167,7 @@ demote_shader_inputs_and_outputs(gl_shader *sh, enum ir_variable_mode mode) * to have a location assigned. */ if (var-data.is_unmatched_generic_inout) { + assert(var-data.mode != ir_var_temporary); var-data.mode = ir_var_auto; } } diff --git a/src/glsl/lower_packed_varyings.cpp b/src/glsl/lower_packed_varyings.cpp index c72b80a..175f805 100644 --- a/src/glsl/lower_packed_varyings.cpp +++ b/src/glsl/lower_packed_varyings.cpp @@ -261,6 +261,7 @@ lower_packed_varyings_visitor::run(exec_list *instructions) !var-type-contains_integer()); /* Change the old varying into an ordinary global. */ + assert(var-data.mode != ir_var_temporary); var-data.mode = ir_var_auto; /* Create a reference to the old varying. */ -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/26] glsl: Store ir_variable::depth_layout using 3 bits
From: Ian Romanick ian.d.roman...@intel.com warn_extension_index was moved to improve packing. Valgrind massif results for a trimmed apitrace of dota2: ntime(i) total(B) useful-heap(B) extra-heap(B)stacks(B) Before (32-bit): 73 40,580,476,304 68,488,400 62,796,151 5,692,2490 After (32-bit): 73 40,575,751,558 68,116,528 62,618,607 5,497,9210 Before (64-bit): 71 37,124,890,613 95,889,584 88,089,008 7,800,5760 After (64-bit): 62 37,123,578,526 95,150,784 87,711,304 7,439,4800 A real savings of 173KiB on 32-bit and 368KiB on 64-bit. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/ast_to_hir.cpp | 4 ++-- src/glsl/ir.h | 19 +-- 2 files changed, 11 insertions(+), 12 deletions(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 542a869..82ac29e 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -2923,8 +2923,8 @@ get_variable_being_redeclared(ir_variable *var, YYLTYPE loc, gl_FragDepth: depth layout is declared here as '%s, but it was previously declared as '%s', - depth_layout_string(var-data.depth_layout), - depth_layout_string(earlier-data.depth_layout)); + depth_layout_string(ir_depth_layout(var-data.depth_layout)), + depth_layout_string(ir_depth_layout(earlier-data.depth_layout))); } earlier-data.depth_layout = var-data.depth_layout; diff --git a/src/glsl/ir.h b/src/glsl/ir.h index d4010fa..d33d153 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -704,6 +704,13 @@ public: */ unsigned index:1; + /** + * \brief Layout qualifier for gl_FragDepth. + * + * This is not equal to \c ir_depth_layout_none if and only if this + * variable is \c gl_FragDepth and a layout qualifier is specified. + */ + unsigned depth_layout:3; /** * ARB_shader_image_load_store qualifiers. @@ -714,9 +721,6 @@ public: unsigned image_volatile:1; unsigned image_restrict:1; - /** Image internal format if specified explicitly, otherwise GL_NONE. */ - uint16_t image_format; - /** * Emit a warning if this variable is accessed. */ @@ -724,13 +728,8 @@ public: uint8_t warn_extension_index; public: - /** - * \brief Layout qualifier for gl_FragDepth. - * - * This is not equal to \c ir_depth_layout_none if and only if this - * variable is \c gl_FragDepth and a layout qualifier is specified. - */ - ir_depth_layout depth_layout; + /** Image internal format if specified explicitly, otherwise GL_NONE. */ + uint16_t image_format; /** * Storage location of the base of this variable -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/26] glsl: Rebuild the symbol table without unreachable symbols
From: Ian Romanick ian.d.roman...@intel.com Previously we had to keep unreachable global symbols in the symbol table because the symbol table is used during linking. Having the symbol table retain pointers to freed memory... what could possibly go wrong? At the same time, this meant that we kept live references to tons of memory that was no longer needed. New strategy: destroy the old symbol table, and make a new one from the reachable symbols. Valgrind massif results for a trimmed apitrace of dota2: ntime(i) total(B) useful-heap(B) extra-heap(B)stacks(B) Before (32-bit): 59 40,642,425,451 76,337,968 69,720,886 6,617,0820 After (32-bit): 46 40,661,487,174 75,116,800 68,854,065 6,262,7350 Before (64-bit): 79 37,179,441,771 106,986,512 98,112,095 8,874,4170 After (64-bit): 64 37,200,329,700 104,872,672 96,514,546 8,358,1260 A real savings of 846KiB on 32-bit and 1.5MiB on 64-bit. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Cc: Kenneth Graunke kenn...@whitecape.org --- src/glsl/glsl_parser_extras.cpp | 37 - 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp index b327c2b..2962407 100644 --- a/src/glsl/glsl_parser_extras.cpp +++ b/src/glsl/glsl_parser_extras.cpp @@ -1485,7 +1485,7 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader, if (shader-InfoLog) ralloc_free(shader-InfoLog); - shader-symbols = state-symbols; + shader-symbols = new(shader-ir) glsl_symbol_table; shader-CompileStatus = !state-error; shader-InfoLog = state-info_log; shader-Version = state-language_version; @@ -1498,6 +1498,41 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader, /* Retain any live IR, but trash the rest. */ reparent_ir(shader-ir, shader-ir); + /* Destroy the symbol table. Create a new symbol table that contains only +* the variables and functions that still exist in the IR. The symbol +* table will be used later during linking. +* +* There must NOT be any freed objects still referenced by the symbol +* table. That could cause the linker to dereference freed memory. +* +* We don't have to worry about types or interface-types here because those +* are fly-weights that are looked up by glsl_type. +*/ + foreach_in_list (ir_instruction, ir, shader-ir) { + switch (ir-ir_type) { + case ir_type_function: { + /* If the function is a built-in that is partially overridden in the + * shader, the ir_function stored in the symbol table may not be the + * same as the one that appears in the shader. The one in the shader + * will only include definitions from inside the shader. We need the + * one from the symbol table because it will include built-in + * defintions and definitions from the shader. + */ + ir_function *const def = (ir_function *) ir; + ir_function *const decl = state-symbols-get_function(def-name); + + shader-symbols-add_function(decl); + break; + } + case ir_type_variable: + shader-symbols-add_variable((ir_variable *) ir); + break; + default: + break; + } + } + + delete state-symbols; ralloc_free(state); } -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 18/26] glsl: Don't allocate a name for ir_var_temporary variables
From: Ian Romanick ian.d.roman...@intel.com Valgrind massif results for a trimmed apitrace of dota2: ntime(i) total(B) useful-heap(B) extra-heap(B)stacks(B) Before (32-bit): 74 40,578,719,715 67,762,208 62,263,404 5,498,8040 After (32-bit): 52 40,565,579,466 66,359,800 61,187,818 5,171,9820 Before (64-bit): 74 37,129,541,061 95,195,160 87,369,671 7,825,4890 After (64-bit): 76 37,134,691,404 93,271,352 85,900,223 7,371,1290 A real savings of 1.0MiB on 32-bit and 1.4MiB on 64-bit. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/glsl_parser_extras.cpp | 3 +++ src/glsl/ir.cpp | 5 + src/glsl/ir.h | 19 +++ src/glsl/test_optpass.cpp | 1 + 4 files changed, 28 insertions(+) diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp index 0555c57..1fe38f3 100644 --- a/src/glsl/glsl_parser_extras.cpp +++ b/src/glsl/glsl_parser_extras.cpp @@ -1437,6 +1437,9 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader, new(shader) _mesa_glsl_parse_state(ctx, shader-Stage, shader); const char *source = shader-Source; + if (ctx-Const.GenerateTemporaryNames) + ir_variable::temporaries_allocate_names = true; + state-error = glcpp_preprocess(state, source, state-info_log, ctx-Extensions, ctx); diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index 0045c2d..bc008a4 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -1533,6 +1533,8 @@ ir_swizzle::variable_referenced() const } +bool ir_variable::temporaries_allocate_names = false; + const char ir_variable::tmp_name[] = compiler_temp; ir_variable::ir_variable(const struct glsl_type *type, const char *name, @@ -1541,6 +1543,9 @@ ir_variable::ir_variable(const struct glsl_type *type, const char *name, { this-type = type; + if (mode == ir_var_temporary !ir_variable::temporaries_allocate_names) + name = NULL; + /* The ir_variable clone method may call this constructor with name set to * tmp_name. */ diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 1aeb1ff..42ccb00 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -891,6 +891,25 @@ private: * Name used for anonymous compiler temporaries */ static const char tmp_name[]; + +public: + /** +* Should the construct keep names for ir_var_temporary variables? +* +* When this global is false, names passed to the constructor for +* \c ir_var_temporary variables will be dropped. Instead, the variable will +* be named compiler_temp. This name will be in static storage. +* +* \warning +* \b NEVER change the mode of an \c ir_var_temporary. +* +* \warning +* This variable is \b not thread-safe. It is global, \b not +* per-context. It begins life false. A context can, at some point, make +* it true. From that point on, it will be true forever. This should be +* okay since it will only be set true while debugging. +*/ + static bool temporaries_allocate_names; }; /** diff --git a/src/glsl/test_optpass.cpp b/src/glsl/test_optpass.cpp index e4878bf..1c29705 100644 --- a/src/glsl/test_optpass.cpp +++ b/src/glsl/test_optpass.cpp @@ -200,6 +200,7 @@ int test_optpass(int argc, char **argv) initialize_context_to_defaults(ctx, API_OPENGL_COMPAT); ctx-Driver.NewShader = _mesa_new_shader; + ir_variable::temporaries_allocate_names = true; struct gl_shader *shader = rzalloc(NULL, struct gl_shader); shader-Type = shader_type; -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 12/26] glsl: Squish ir_variable::max_ifc_array_access and ::state_slots together
From: Ian Romanick ian.d.roman...@intel.com At least one of these pointers must be NULL, and we can determine which will be NULL by looking at other fields. Use this information to store both pointers in the same location. If anyone can think of a better name for the union than u, I'm all ears. Valgrind massif results for a trimmed apitrace of dota2: ntime(i) total(B) useful-heap(B) extra-heap(B)stacks(B) Before (32-bit): 63 40,574,239,515 68,117,280 62,618,607 5,498,6730 After (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,5450 Before (64-bit): 53 37,126,451,468 95,150,256 87,711,304 7,438,9520 After (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,4080 A real savings of 173KiB on 32-bit and 368KiB on 64-bit. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/ir.cpp | 5 +++- src/glsl/ir.h | 75 --- src/glsl/ir_clone.cpp | 4 +-- 3 files changed, 48 insertions(+), 36 deletions(-) diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index cfdc7f0..40dbb6b 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -1535,10 +1535,13 @@ ir_swizzle::variable_referenced() const ir_variable::ir_variable(const struct glsl_type *type, const char *name, ir_variable_mode mode) - : ir_instruction(ir_type_variable), max_ifc_array_access(NULL) + : ir_instruction(ir_type_variable) { this-type = type; this-name = ralloc_strdup(this, name); + + this-u.max_ifc_array_access = NULL; + this-data.explicit_location = false; this-data.has_initializer = false; this-data.location = -1; diff --git a/src/glsl/ir.h b/src/glsl/ir.h index c47a1fb..e73b113 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -475,7 +475,7 @@ public: assert(this-interface_type == NULL); this-interface_type = type; if (this-is_interface_instance()) { - this-max_ifc_array_access = + this-u.max_ifc_array_access = rzalloc_array(this, unsigned, type-length); } } @@ -487,7 +487,7 @@ public: */ void change_interface_type(const struct glsl_type *type) { - if (this-max_ifc_array_access != NULL) { + if (this-u.max_ifc_array_access != NULL) { /* max_ifc_array_access has already been allocated, so make sure the * new interface has the same number of fields as the old one. */ @@ -504,7 +504,7 @@ public: */ void reinit_interface_type(const struct glsl_type *type) { - if (this-max_ifc_array_access != NULL) { + if (this-u.max_ifc_array_access != NULL) { #ifndef NDEBUG /* Redeclaring gl_PerVertex is only allowed if none of the built-ins * it defines have been accessed yet; so it's safe to throw away the @@ -512,10 +512,10 @@ public: * zero. */ for (unsigned i = 0; i this-interface_type-length; i++) -assert(this-max_ifc_array_access[i] == 0); +assert(this-u.max_ifc_array_access[i] == 0); #endif - ralloc_free(this-max_ifc_array_access); - this-max_ifc_array_access = NULL; + ralloc_free(this-u.max_ifc_array_access); + this-u.max_ifc_array_access = NULL; } this-interface_type = NULL; init_interface_type(type); @@ -534,38 +534,45 @@ public: */ inline unsigned *get_max_ifc_array_access() { - return this-max_ifc_array_access; + assert(this-data._num_state_slots == 0); + return this-u.max_ifc_array_access; } inline unsigned get_num_state_slots() const { + assert(!this-is_interface_instance() + || this-data._num_state_slots == 0); return this-data._num_state_slots; } inline void set_num_state_slots(unsigned n) { + assert(!this-is_interface_instance() + || n == 0); this-data._num_state_slots = n; } inline ir_state_slot *get_state_slots() { - return this-state_slots; + return this-is_interface_instance() ? NULL : this-u.state_slots; } inline const ir_state_slot *get_state_slots() const { - return this-state_slots; + return this-is_interface_instance() ? NULL : this-u.state_slots; } inline ir_state_slot *allocate_state_slots(unsigned n) { - this-state_slots = ralloc_array(this, ir_state_slot, n); + assert(!this-is_interface_instance()); + + this-u.state_slots = ralloc_array(this, ir_state_slot, n); this-data._num_state_slots = 0; - if (this-state_slots != NULL) + if (this-u.state_slots != NULL) this-data._num_state_slots = n; - return this-state_slots; + return this-u.state_slots; } /** @@ -834,28 +841,30 @@ public: private: static const char *const warn_extension_table[]; - /**
[Mesa-dev] [PATCH 01/26] glsl: Validate that built-in uniforms have backing state
From: Ian Romanick ian.d.roman...@intel.com All built-in uniforms are supposed to be backed by some GL state. The state_slots field describes this backing state. This helped me track down a bug in a later patch. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Acked-by: Anuj Phogat anuj.pho...@gmail.com --- src/glsl/ir_validate.cpp | 8 1 file changed, 8 insertions(+) diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp index 42142da..ef660e8 100644 --- a/src/glsl/ir_validate.cpp +++ b/src/glsl/ir_validate.cpp @@ -696,6 +696,14 @@ ir_validate::visit(ir_variable *ir) abort(); } + if (ir-data.mode == ir_var_uniform +strncmp(ir-name, gl_, 3) == 0 +ir-state_slots == NULL) { + printf(built-in uniform has no state\n); + ir-print(); + abort(); + } + return visit_continue; } -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 17/26] glsl: Use ir_var_temporary for compiler generated temporaries
From: Ian Romanick ian.d.roman...@intel.com These few places were using ir_var_auto for seemingly no reason. The names were not added to the symbol table. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/lower_offset_array.cpp | 3 ++- src/glsl/lower_texture_projection.cpp | 2 +- src/glsl/opt_cse.cpp | 2 +- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/src/glsl/lower_offset_array.cpp b/src/glsl/lower_offset_array.cpp index 0c235ed..5b48526 100644 --- a/src/glsl/lower_offset_array.cpp +++ b/src/glsl/lower_offset_array.cpp @@ -63,7 +63,8 @@ brw_lower_offset_array_visitor::handle_rvalue(ir_rvalue **rv) void *mem_ctx = ralloc_parent(ir); - ir_variable *var = new (mem_ctx) ir_variable(ir-type, result, ir_var_auto); + ir_variable *var = + new (mem_ctx) ir_variable(ir-type, result, ir_var_temporary); base_ir-insert_before(var); for (int i = 0; i 4; i++) { diff --git a/src/glsl/lower_texture_projection.cpp b/src/glsl/lower_texture_projection.cpp index 16d6376..95df106 100644 --- a/src/glsl/lower_texture_projection.cpp +++ b/src/glsl/lower_texture_projection.cpp @@ -62,7 +62,7 @@ lower_texture_projection_visitor::visit_leave(ir_texture *ir) void *mem_ctx = ralloc_parent(ir); ir_variable *var = new(mem_ctx) ir_variable(ir-projector-type, - projector, ir_var_auto); + projector, ir_var_temporary); base_ir-insert_before(var); ir_dereference *deref = new(mem_ctx) ir_dereference_variable(var); ir_expression *expr = new(mem_ctx) ir_expression(ir_unop_rcp, diff --git a/src/glsl/opt_cse.cpp b/src/glsl/opt_cse.cpp index 0e720cc..9c96835 100644 --- a/src/glsl/opt_cse.cpp +++ b/src/glsl/opt_cse.cpp @@ -276,7 +276,7 @@ cse_visitor::try_cse(ir_rvalue *rvalue) ir_variable *var = new(rvalue) ir_variable(rvalue-type, cse, -ir_var_auto); +ir_var_temporary); /* Write the previous expression result into a new variable. */ base_ir-insert_before(var); -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 16/26] glsl: Add context-level controls for whether temporaries have real names
From: Ian Romanick ian.d.roman...@intel.com No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/main.cpp | 1 + src/mesa/main/context.c | 6 ++ src/mesa/main/mtypes.h| 13 + src/mesa/main/shaderapi.c | 3 +++ 4 files changed, 23 insertions(+) diff --git a/src/glsl/main.cpp b/src/glsl/main.cpp index a4452e0..feed100 100644 --- a/src/glsl/main.cpp +++ b/src/glsl/main.cpp @@ -210,6 +210,7 @@ initialize_context(struct gl_context *ctx, gl_api api) break; } + ctx-Const.GenerateTemporaryNames = true; ctx-Driver.NewShader = _mesa_new_shader; } diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c index 50aae8b..562abf7 100644 --- a/src/mesa/main/context.c +++ b/src/mesa/main/context.c @@ -650,6 +650,12 @@ _mesa_init_constants(struct gl_context *ctx) ctx-Const.GLSLVersion = 0; /* GLSL not supported */ } +#ifdef DEBUG + ctx-Const.GenerateTemporaryNames = true; +#else + ctx-Const.GenerateTemporaryNames = false; +#endif + /* GL_ARB_framebuffer_object */ ctx-Const.MaxSamples = 0; diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index b699021..2025fd0 100644 --- a/src/mesa/main/mtypes.h +++ b/src/mesa/main/mtypes.h @@ -3455,6 +3455,19 @@ struct gl_constants */ GLboolean DisableVaryingPacking; + /** +* Should meaningful names be generated for compiler temporary variables? +* +* Generally, it is not useful to have the compiler generate meaningful +* names for temporary variables that it creates. This can, however, be a +* useful debugging aid. In Mesa debug builds or release builds when +* MESA_GLSL is set at run-time, meaningful names will be generated. +* Drivers can also force names to be generated by setting this field. +* For example, the i965 driver may set it when INTEL_DEBUG=vs (to dump +* vertex shader assembly) is set at run-time. +*/ + bool GenerateTemporaryNames; + /* * Maximum value supported for an index in DrawElements and friends. * diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c index 2bbef35..6c74583 100644 --- a/src/mesa/main/shaderapi.c +++ b/src/mesa/main/shaderapi.c @@ -123,6 +123,9 @@ _mesa_init_shader_state(struct gl_context *ctx) ctx-Shader.Flags = _mesa_get_shader_flags(); + if (ctx-Shader.Flags != 0) + ctx-Const.GenerateTemporaryNames = true; + /* Extended for ARB_separate_shader_objects */ ctx-Shader.RefCount = 1; mtx_init(ctx-Shader.Mutex, mtx_plain); -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 13/26] glsl: Store ir_variable_data::_num_state_slots and ::binding in 16-bits each
From: Ian Romanick ian.d.roman...@intel.com Valgrind massif results for a trimmed apitrace of dota2: ntime(i) total(B) useful-heap(B) extra-heap(B)stacks(B) Before (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,5450 After (32-bit): 71 40,583,408,411 67,761,528 62,263,519 5,498,0090 Before (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,4080 After (64-bit): 67 37,123,303,706 95,150,544 87,333,600 7,816,9440 A real savings of 173KiB on 32-bit and no change on 64-bit. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/ir.h | 24 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/src/glsl/ir.h b/src/glsl/ir.h index e73b113..1071858 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -768,10 +768,25 @@ public: uint16_t image_format; private: - unsigned _num_state_slots;/** Number of state slots used */ + /** + * Number of state slots used + * + * \note + * This could be stored in as few as 7-bits, if necessary. If it is made + * smaller, add an assertion to \c ir_variable::allocate_state_slots to + * be safe. + */ + uint16_t _num_state_slots; public: /** + * Initial binding point for a sampler, atomic, or UBO. + * + * For array types, this represents the binding point for the first element. + */ + int16_t binding; + + /** * Storage location of the base of this variable * * The precise meaning of this field depends on the nature of the variable. @@ -797,13 +812,6 @@ public: unsigned stream; /** - * Initial binding point for a sampler, atomic, or UBO. - * - * For array types, this represents the binding point for the first element. - */ - int binding; - - /** * Location an atomic counter is stored at. */ struct { -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 22/26] glsl: Store ir_variable::ir_type in 8 bits instead of 32
From: Ian Romanick ian.d.roman...@intel.com No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/ir.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/glsl/ir.h b/src/glsl/ir.h index ee42857..5c565ff 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -88,12 +88,13 @@ enum ir_node_type { */ class ir_instruction : public exec_node { private: - enum ir_node_type ir_type; + uint8_t ir_type; public: inline enum ir_node_type get_ir_type() const { - return this-ir_type; + STATIC_ASSERT(ir_type_max 256); + return (enum ir_node_type) this-ir_type; } /** -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/26] glsl: Replace ir_variable::warn_extension pointer with an 8-bit index
From: Ian Romanick ian.d.roman...@intel.com Also move the new warn_extension_index into ir_variable::data. This enables slightly better packing. Valgrind massif results for a trimmed apitrace of dota2: ntime(i) total(B) useful-heap(B) extra-heap(B)stacks(B) Before (32-bit): 82 40,580,040,531 68,488,992 62,973,695 5,515,2970 After (32-bit): 73 40,580,476,304 68,488,400 62,796,151 5,692,2490 Before (64-bit): 65 37,124,013,542 95,892,768 88,466,712 7,426,0560 After (64-bit): 71 37,124,890,613 95,889,584 88,089,008 7,800,5760 A real savings of 173KiB on 32-bit and 368KiB on 64-bit. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/ir.cpp | 21 ++--- src/glsl/ir.h | 18 +- src/glsl/ir_clone.cpp | 2 -- 3 files changed, 31 insertions(+), 10 deletions(-) diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index 8575ac3..cfdc7f0 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -1544,7 +1544,7 @@ ir_variable::ir_variable(const struct glsl_type *type, const char *name, this-data.location = -1; this-data.location_frac = 0; this-data.binding = 0; - this-warn_extension = NULL; + this-data.warn_extension_index = 0; this-constant_value = NULL; this-constant_initializer = NULL; this-data.origin_upper_left = false; @@ -1607,16 +1607,31 @@ ir_variable::determine_interpolation_mode(bool flat_shade) return INTERP_QUALIFIER_SMOOTH; } +const char *const ir_variable::warn_extension_table[] = { + , + GL_ARB_shader_stencil_export, + GL_AMD_shader_stencil_export, +}; + void ir_variable::enable_extension_warning(const char *extension) { - this-warn_extension = extension; + for (unsigned i = 0; i Elements(warn_extension_table); i++) { + if (strcmp(warn_extension_table[i], extension) == 0) { + this-data.warn_extension_index = i; + return; + } + } + + assert(!Should not get here.); + this-data.warn_extension_index = 0; } const char * ir_variable::get_extension_warning() const { - return this-warn_extension; + return this-data.warn_extension_index == 0 + ? NULL : warn_extension_table[this-data.warn_extension_index]; } ir_function_signature::ir_function_signature(const glsl_type *return_type, diff --git a/src/glsl/ir.h b/src/glsl/ir.h index bdd9a3c..d4010fa 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -718,6 +718,13 @@ public: uint16_t image_format; /** + * Emit a warning if this variable is accessed. + */ + private: + uint8_t warn_extension_index; + + public: + /** * \brief Layout qualifier for gl_FragDepth. * * This is not equal to \c ir_depth_layout_none if and only if this @@ -771,6 +778,10 @@ public: */ unsigned max_array_access; + /** + * Allow (only) ir_variable direct access private members. + */ + friend class ir_variable; } data; /** @@ -805,6 +816,8 @@ public: ir_constant *constant_initializer; private: + static const char *const warn_extension_table[]; + /** * For variables that are in an interface block or are an instance of an * interface block, this is the \c GLSL_TYPE_INTERFACE type for that block. @@ -812,11 +825,6 @@ private: * \sa ir_variable::location */ const glsl_type *interface_type; - - /** -* Emit a warning if this variable is accessed. -*/ - const char *warn_extension; }; /** diff --git a/src/glsl/ir_clone.cpp b/src/glsl/ir_clone.cpp index 4b444d4..86c0e31 100644 --- a/src/glsl/ir_clone.cpp +++ b/src/glsl/ir_clone.cpp @@ -53,8 +53,6 @@ ir_variable::clone(void *mem_ctx, struct hash_table *ht) const memcpy(var-data, this-data, sizeof(var-data)); - var-warn_extension = this-warn_extension; - var-num_state_slots = this-num_state_slots; if (this-state_slots) { /* FINISHME: This really wants to use something like talloc_reference, but -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/26] glsl: Use bit-flags image attributes and uint16_t for the image format
From: Ian Romanick ian.d.roman...@intel.com All of the GL image enums fit in 16-bits. Also move the fields from the anonymous image structucture to the next higher structure. This will enable packing the bits with the other bitfield. Valgrind massif results for a trimmed apitrace of dota2: ntime(i) total(B) useful-heap(B) extra-heap(B)stacks(B) Before (32-bit): 76 40,572,916,873 68,831,248 63,328,783 5,502,4650 After (32-bit): 70 40,577,421,777 68,487,584 62,973,695 5,513,8890 Before (64-bit): 60 36,822,640,058 96,526,824 88,735,296 7,791,5280 After (64-bit): 74 37,124,603,758 95,891,808 88,466,712 7,425,0960 A real savings of 346KiB on 32-bit and 262KiB on 64-bit. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/ast_function.cpp | 10 +- src/glsl/ast_to_hir.cpp| 14 +++--- src/glsl/builtin_functions.cpp | 10 +- src/glsl/ir.cpp| 20 ++-- src/glsl/ir.h | 27 +-- src/glsl/link_uniforms.cpp | 4 ++-- 6 files changed, 42 insertions(+), 43 deletions(-) diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp index 4981fe1..dbf4e66 100644 --- a/src/glsl/ast_function.cpp +++ b/src/glsl/ast_function.cpp @@ -103,35 +103,35 @@ verify_image_parameter(YYLTYPE *loc, _mesa_glsl_parse_state *state, * qualifiers. [...] It is legal to have additional qualifiers * on a formal parameter, but not to have fewer. */ - if (actual-data.image.coherent !formal-data.image.coherent) { + if (actual-data.image_coherent !formal-data.image_coherent) { _mesa_glsl_error(loc, state, function call parameter `%s' drops `coherent' qualifier, formal-name); return false; } - if (actual-data.image._volatile !formal-data.image._volatile) { + if (actual-data.image_volatile !formal-data.image_volatile) { _mesa_glsl_error(loc, state, function call parameter `%s' drops `volatile' qualifier, formal-name); return false; } - if (actual-data.image.restrict_flag !formal-data.image.restrict_flag) { + if (actual-data.image_restrict !formal-data.image_restrict) { _mesa_glsl_error(loc, state, function call parameter `%s' drops `restrict' qualifier, formal-name); return false; } - if (actual-data.image.read_only !formal-data.image.read_only) { + if (actual-data.image_read_only !formal-data.image_read_only) { _mesa_glsl_error(loc, state, function call parameter `%s' drops `readonly' qualifier, formal-name); return false; } - if (actual-data.image.write_only !formal-data.image.write_only) { + if (actual-data.image_write_only !formal-data.image_write_only) { _mesa_glsl_error(loc, state, function call parameter `%s' drops `writeonly' qualifier, formal-name); diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 885bee5..542a869 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -2360,11 +2360,11 @@ apply_image_qualifier_to_variable(const struct ast_type_qualifier *qual, global variables); } - var-data.image.read_only |= qual-flags.q.read_only; - var-data.image.write_only |= qual-flags.q.write_only; - var-data.image.coherent |= qual-flags.q.coherent; - var-data.image._volatile |= qual-flags.q._volatile; - var-data.image.restrict_flag |= qual-flags.q.restrict_flag; + var-data.image_read_only |= qual-flags.q.read_only; + var-data.image_write_only |= qual-flags.q.write_only; + var-data.image_coherent |= qual-flags.q.coherent; + var-data.image_volatile |= qual-flags.q._volatile; + var-data.image_restrict |= qual-flags.q.restrict_flag; var-data.read_only = true; if (qual-flags.q.explicit_image_format) { @@ -2378,7 +2378,7 @@ apply_image_qualifier_to_variable(const struct ast_type_qualifier *qual, base data type of the image); } - var-data.image.format = qual-image_format; + var-data.image_format = qual-image_format; } else { if (var-data.mode == ir_var_uniform !qual-flags.q.write_only) { _mesa_glsl_error(loc, state, uniforms not qualified with @@ -2386,7 +2386,7 @@ apply_image_qualifier_to_variable(const struct ast_type_qualifier *qual, qualifier); } - var-data.image.format = GL_NONE; + var-data.image_format = GL_NONE; } } } diff --git a/src/glsl/builtin_functions.cpp
[Mesa-dev] [PATCH 25/26] glsl: Keep common variable names in static storage
From: Ian Romanick ian.d.roman...@intel.com Valgrind massif results for a trimmed apitrace of dota2: ntime(i) total(B) useful-heap(B) extra-heap(B)stacks(B) Before (32-bit): 52 40,521,071,734 66,157,928 61,054,870 5,103,0580 After (32-bit): 79 40,523,892,028 65,999,288 60,937,396 5,061,8920 Before (64-bit): 48 37,089,379,412 92,630,712 85,454,539 7,176,1730 After (64-bit): 71 37,091,183,117 92,433,072 85,309,100 7,123,9720 A real savings of 114KiB on 32-bit and 142KiB on 64-bit. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/.gitignore | 1 + src/glsl/Makefile.am | 10 ++ src/glsl/common_variable_names.py | 220 ++ src/glsl/ir.cpp | 32 -- src/glsl/ir.h | 9 +- 5 files changed, 256 insertions(+), 16 deletions(-) create mode 100644 src/glsl/common_variable_names.py diff --git a/src/glsl/.gitignore b/src/glsl/.gitignore index 43720f6..16e7e81 100644 --- a/src/glsl/.gitignore +++ b/src/glsl/.gitignore @@ -1,3 +1,4 @@ +get_static_name.h glsl_compiler glsl_lexer.cpp glsl_parser.cpp diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am index 00261fd..c9de9f8 100644 --- a/src/glsl/Makefile.am +++ b/src/glsl/Makefile.am @@ -149,6 +149,15 @@ glsl_test_SOURCES = \ glsl_test_LDADD = libglsl.la +# I don't understand why automake recognizes that glsl_parser_extras.lo +# depends on glsl_parser.h, but it cannot recognize that ir.lo depends on +# get_static_name.h. + +ir.lo: get_static_name.h + +get_static_name.h: common_variable_names.py + $(PYTHON2) $(GLSL_SRCDIR)/common_variable_names.py $(GLSL_BUILDDIR)/get_static_name.h + # We write our own rules for yacc and lex below. We'd rather use automake, # but automake makes it especially difficult for a number of reasons: # @@ -205,6 +214,7 @@ BUILT_SOURCES = \ glcpp/glcpp-parse.c \ glcpp/glcpp-lex.c CLEANFILES = \ + get_static_name.h \ glcpp/glcpp-parse.h \ glsl_parser.h \ $(BUILT_SOURCES) diff --git a/src/glsl/common_variable_names.py b/src/glsl/common_variable_names.py new file mode 100644 index 000..7435a12 --- /dev/null +++ b/src/glsl/common_variable_names.py @@ -0,0 +1,220 @@ +# Copyright (c) 2014 Intel Corporation +# +# Permission is hereby granted, free of charge, to any person obtaining a +# copy of this software and associated documentation files (the Software), +# to deal in the Software without restriction, including without limitation +# the rights to use, copy, modify, merge, publish, distribute, sublicense, +# and/or sell copies of the Software, and to permit persons to whom the +# Software is furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice (including the next +# paragraph) shall be included in all copies or substantial portions of the +# Software. +# +# THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL +# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +# DEALINGS IN THE SOFTWARE. + +common_names = [ +atomic_temp_var, +color, +diffuse, +factor, +fucxadd, +fucxmul, +gl_ClipVertex, +gl_Color, +gl_FogFragCoord, +gl_FragColor, +gl_FragData, +gl_FragCoord, +gl_FrontColor, +gl_FrontFacing, +gl_FrontSecondaryColor, +gl_in_TexCoord0, +gl_ModelViewProjectionMatrixTranspose, +gl_MultiTexCoord0, +gl_out_FragData0, +gl_out_TexCoord0, +gl_Position, +gl_TexCoord, +gl_Vertex, +io_vLink0, +oT0, +oT1, +oT2, +oT3, +oT4, +oT5, +oT6, +oT7, +pos, +r10, +r11, +r12, +sampler0, +sampler1, +sampler2, +sampler3, +sampler4, +sampler5, +sampler6, +sampler7, +sampler8, +va_r, +vcbones, +vcscreen, +vLit, +vTempPos, +v_vColor, +v_vNormal, +v_vPositionView, +v_vTexcoordLight +] + +template = static const char static_names[] = + compiler_temp\\0 +% for name in common_names: + ${name}\\0 +% endfor + ; + +% for s in sized_lists: +% if len(sized_lists[s]) != 1: +static const ${index_type} table_${s}[] = { +% for name in sized_lists[s]: + ${location_dict[name]}, +
[Mesa-dev] [PATCH 00/26] GLSL memory diet
Most of these patches have been sent to the list already in one form or another. There are a few changes, removals, and additions. The series has also been re-ordered. - The extra memory accounting code has been removed. This was suggested by Ken. Instead, all memory usage data is from Valgrind massiv. - The store short names where padding would be code is still there, but many of the patches to shorten the names of temporaries has been removed. Those patches became unnecessary because... - In release builds (when MESA_GLSL is also unset), all temporaries get the name compiler_temp that is in static storage. - A small set of names that frequently occur in shader-db are kept in static storage. When a variable is created with one of these names, no additional storage is allocated. We could probably give local variables and post-linking globals the same treatment as temporaries, but a lot of the frequently occuring names (that don't fit in padding) are uniforms or shader inputs. These need to keep their names. I suspect the last six patches will be contentious. I'd like to get the first 20 (or at least the first two!) reviewed and landed sooner... then we can debate the last few more leisurely. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/26] glsl: Eliminate ir_variable::data.atomic.buffer_index
From: Ian Romanick ian.d.roman...@intel.com Just use ir_variable::data.binding... because that's the where the binding is stored for everything else that can use layout(binding=). Valgrind massif results for a trimmed apitrace of dota2: ntime(i) total(B) useful-heap(B) extra-heap(B)stacks(B) Before (32-bit): 50 40,564,927,443 69,185,408 63,683,871 5,501,5370 After (32-bit): 74 40,580,119,657 69,186,544 63,506,327 5,680,2170 Before (64-bit): 59 36,822,048,449 96,526,888 89,113,000 7,413,8880 After (64-bit): 89 36,822,971,897 96,526,616 88,735,296 7,791,3200 A real savings of 173KiB on 32-bit and 368KiB on 64-bit. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Reviewed-by: Kenneth Graunke kenn...@whitecape.org --- src/glsl/ir.cpp| 2 +- src/glsl/ir.h | 3 +-- src/glsl/link_atomics.cpp | 4 +++- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 2 +- 5 files changed, 7 insertions(+), 6 deletions(-) diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index 28fd94b..4376a5c 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -1543,6 +1543,7 @@ ir_variable::ir_variable(const struct glsl_type *type, const char *name, this-data.has_initializer = false; this-data.location = -1; this-data.location_frac = 0; + this-data.binding = 0; this-warn_extension = NULL; this-constant_value = NULL; this-constant_initializer = NULL; @@ -1558,7 +1559,6 @@ ir_variable::ir_variable(const struct glsl_type *type, const char *name, this-data.mode = mode; this-data.interpolation = INTERP_QUALIFIER_NONE; this-data.max_array_access = 0; - this-data.atomic.buffer_index = 0; this-data.atomic.offset = 0; this-data.image.read_only = false; this-data.image.write_only = false; diff --git a/src/glsl/ir.h b/src/glsl/ir.h index ea19924..9dbcb7a 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -722,7 +722,7 @@ public: int index; /** - * Initial binding point for a sampler or UBO. + * Initial binding point for a sampler, atomic, or UBO. * * For array types, this represents the binding point for the first element. */ @@ -732,7 +732,6 @@ public: * Location an atomic counter is stored at. */ struct { - unsigned buffer_index; unsigned offset; } atomic; diff --git a/src/glsl/link_atomics.cpp b/src/glsl/link_atomics.cpp index 75699fd..603873a 100644 --- a/src/glsl/link_atomics.cpp +++ b/src/glsl/link_atomics.cpp @@ -201,7 +201,9 @@ link_assign_atomic_counter_resources(struct gl_context *ctx, gl_uniform_storage *const storage = prog-UniformStorage[id]; mab.Uniforms[j] = id; - var-data.atomic.buffer_index = i; + if (!var-data.explicit_binding) +var-data.binding = i; + storage-atomic_buffer_index = i; storage-offset = var-data.atomic.offset; storage-array_stride = (var-type-is_array() ? diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 90bf3fa..4511f46 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -2447,7 +2447,7 @@ fs_visitor::visit_atomic_counter_intrinsic(ir_call *ir) ir-actual_parameters.get_head()); ir_variable *location = deref-variable_referenced(); unsigned surf_index = (prog_data-base.binding_table.abo_start + - location-data.atomic.buffer_index); + location-data.binding); /* Calculate the surface offset */ fs_reg offset(this, glsl_type::uint_type); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index e16e93c..481b885 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -2244,7 +2244,7 @@ vec4_visitor::visit_atomic_counter_intrinsic(ir_call *ir) ir-actual_parameters.get_head()); ir_variable *location = deref-variable_referenced(); unsigned surf_index = (prog_data-base.binding_table.abo_start + - location-data.atomic.buffer_index); + location-data.binding); /* Calculate the surface offset */ src_reg offset(this, glsl_type::uint_type); -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 20/26] i965/fs: Don't make a name for a vector splitting temporary
From: Ian Romanick ian.d.roman...@intel.com If the name is just going to get dropped, don't bother making it. If the name is made, release it sooner (rather than later). No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp b/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp index 422d801..03e5fdb 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp @@ -363,12 +363,17 @@ brw_do_vector_splitting(exec_list *instructions) entry-mem_ctx = ralloc_parent(entry-var); for (unsigned int i = 0; i entry-var-type-vector_elements; i++) { -const char *name = ralloc_asprintf(mem_ctx, %s_%c, - entry-var-name, - xyzw[i]); + char *const name = ir_variable::temporaries_allocate_names +? ralloc_asprintf(mem_ctx, %s_%c, + entry-var-name, + xyzw[i]) +: NULL; entry-components[i] = new(entry-mem_ctx) ir_variable(type, name, ir_var_temporary); + + ralloc_free(name); + entry-var-insert_before(entry-components[i]); } -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 23/26] glsl: Make compiler-added padding ir_instruction usable
From: Ian Romanick ian.d.roman...@intel.com No change Valgrind massif results for a trimmed apitrace of dota2. v2: Fix a couple spelling errors in the comment. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/ir.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 5c565ff..ead0863 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -91,6 +91,20 @@ private: uint8_t ir_type; public: + /** +* Alignment padding that would be added by the compiler +* +* Putting a field here makes what would otherwise be dead space usable. +* Subclasses of ir_instruction can store data here. Care must be taken +* for two reasons: +* +* 1. Direct descendants in the class hierarchy (e.g., \c ir_dereference +*and \c ir_dereference_array) must not try to use this space. +* +* 2. The size of the padding depends on the architecture. +*/ + uint8_t padding[sizeof(intptr_t) - 1]; + inline enum ir_node_type get_ir_type() const { STATIC_ASSERT(ir_type_max 256); -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/26] glsl: Make ir_variable::max_ifc_array_access private
From: Ian Romanick ian.d.roman...@intel.com The payoff for this will come in a few more patches. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/ast_array_index.cpp | 10 -- src/glsl/ir.h| 37 - src/glsl/ir_validate.cpp | 9 +++-- src/glsl/link_functions.cpp | 14 +++--- src/glsl/linker.cpp | 5 +++-- 5 files changed, 53 insertions(+), 22 deletions(-) diff --git a/src/glsl/ast_array_index.cpp b/src/glsl/ast_array_index.cpp index f3b060e..ecc4200 100644 --- a/src/glsl/ast_array_index.cpp +++ b/src/glsl/ast_array_index.cpp @@ -88,8 +88,14 @@ update_max_array_access(ir_rvalue *ir, unsigned idx, YYLTYPE *loc, unsigned field_index = deref_record-record-type-field_index(deref_record-field); assert(field_index interface_type-length); -if (idx deref_var-var-max_ifc_array_access[field_index]) { - deref_var-var-max_ifc_array_access[field_index] = idx; + +unsigned *const max_ifc_array_access = + deref_var-var-get_max_ifc_array_access(); + +assert(max_ifc_array_access != NULL); + +if (idx max_ifc_array_access[field_index]) { + max_ifc_array_access[field_index] = idx; /* Check whether this access will, as a side effect, implicitly * cause the size of a built-in array to be too large. diff --git a/src/glsl/ir.h b/src/glsl/ir.h index d33d153..1924098 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -527,6 +527,17 @@ public: } /** +* Get the max_ifc_array_access pointer +* +* A set function is not needed because the array is dynmically allocated +* as necessary. +*/ + inline unsigned *get_max_ifc_array_access() + { + return this-max_ifc_array_access; + } + + /** * Enable emitting extension warnings for this variable */ void enable_extension_warning(const char *extension); @@ -548,19 +559,6 @@ public: */ const char *name; - /** -* For variables which satisfy the is_interface_instance() predicate, this -* points to an array of integers such that if the ith member of the -* interface block is an array, max_ifc_array_access[i] is the maximum -* array element of that member that has been accessed. If the ith member -* of the interface block is not an array, max_ifc_array_access[i] is -* unused. -* -* For variables whose type is not an interface block, this pointer is -* NULL. -*/ - unsigned *max_ifc_array_access; - struct ir_variable_data { /** @@ -818,6 +816,19 @@ private: static const char *const warn_extension_table[]; /** +* For variables which satisfy the is_interface_instance() predicate, this +* points to an array of integers such that if the ith member of the +* interface block is an array, max_ifc_array_access[i] is the maximum +* array element of that member that has been accessed. If the ith member +* of the interface block is not an array, max_ifc_array_access[i] is +* unused. +* +* For variables whose type is not an interface block, this pointer is +* NULL. +*/ + unsigned *max_ifc_array_access; + + /** * For variables that are in an interface block or are an instance of an * interface block, this is the \c GLSL_TYPE_INTERFACE type for that block. * diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp index ef660e8..b7afd08 100644 --- a/src/glsl/ir_validate.cpp +++ b/src/glsl/ir_validate.cpp @@ -678,10 +678,15 @@ ir_validate::visit(ir_variable *ir) ir-get_interface_type()-fields.structure; for (unsigned i = 0; i ir-get_interface_type()-length; i++) { if (fields[i].type-array_size() 0) { -if (ir-max_ifc_array_access[i] = fields[i].type-length) { +const unsigned *const max_ifc_array_access = + ir-get_max_ifc_array_access(); + +assert(max_ifc_array_access != NULL); + +if (max_ifc_array_access[i] = fields[i].type-length) { printf(ir_variable has maximum access out of bounds for field %s (%d vs %d)\n, fields[i].name, - ir-max_ifc_array_access[i], fields[i].type-length); + max_ifc_array_access[i], fields[i].type-length); ir-print(); abort(); } diff --git a/src/glsl/link_functions.cpp b/src/glsl/link_functions.cpp index 2ce9609..c06ffa4 100644 --- a/src/glsl/link_functions.cpp +++ b/src/glsl/link_functions.cpp @@ -243,11 +243,19 @@ public: /* Similarly, we need implicit sizes of arrays within interface * blocks to be sized by the maximal access in *any* shader. */ + unsigned *const
[Mesa-dev] [PATCH 03/26] glsl: Eliminate unused built-in variables after compilation
From: Ian Romanick ian.d.roman...@intel.com After compilation (and before linking) we can eliminate quite a few built-in variables. Basically, any uniform or constant (e.g., gl_MaxVertexTextureImageUnits) that isn't used (with one exception) can be eliminated. System values, vertex shader inputs (with one exception), and fragment shader outputs that are not used and not re-declared in the shader text can also be removed. gl_ModelViewProjectMatrix and gl_Vertex are used by the built-in function ftransform. There are some complications with eliminating these variables (see the comment in the patch), so they are not eliminated. Valgrind massif results for a trimmed apitrace of dota2: ntime(i) total(B) useful-heap(B) extra-heap(B)stacks(B) Before (32-bit): 46 40,661,487,174 75,116,800 68,854,065 6,262,7350 After (32-bit): 50 40,564,927,443 69,185,408 63,683,871 5,501,5370 Before (64-bit): 64 37,200,329,700 104,872,672 96,514,546 8,358,1260 After (64-bit): 59 36,822,048,449 96,526,888 89,113,000 7,413,8880 A real savings of 4.9MiB on 32-bit and 7.0MiB on 64-bit. v2: Don't remove any built-in with Transpose in the name. v3: Fix comment typo noticed by Anuj. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Suggested-by: Eric Anholt e...@anholt.net Acked-by: Anuj Phogat anuj.pho...@gmail.com Cc: Eric Anholt e...@anholt.net --- src/glsl/Makefile.sources | 1 + src/glsl/glsl_parser_extras.cpp | 20 src/glsl/ir_optimization.h | 2 + src/glsl/opt_dead_builtin_variables.cpp | 81 + 4 files changed, 104 insertions(+) create mode 100644 src/glsl/opt_dead_builtin_variables.cpp diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources index b54eae7..5915bf8 100644 --- a/src/glsl/Makefile.sources +++ b/src/glsl/Makefile.sources @@ -87,6 +87,7 @@ LIBGLSL_FILES = \ $(GLSL_SRCDIR)/opt_copy_propagation.cpp \ $(GLSL_SRCDIR)/opt_copy_propagation_elements.cpp \ $(GLSL_SRCDIR)/opt_cse.cpp \ + $(GLSL_SRCDIR)/opt_dead_builtin_variables.cpp \ $(GLSL_SRCDIR)/opt_dead_builtin_varyings.cpp \ $(GLSL_SRCDIR)/opt_dead_code.cpp \ $(GLSL_SRCDIR)/opt_dead_code_local.cpp \ diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp index 2962407..93f5e75 100644 --- a/src/glsl/glsl_parser_extras.cpp +++ b/src/glsl/glsl_parser_extras.cpp @@ -1480,6 +1480,26 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader, ; validate_ir_tree(shader-ir); + + enum ir_variable_mode other; + switch (shader-Stage) { + case MESA_SHADER_VERTEX: + other = ir_var_shader_in; + break; + case MESA_SHADER_FRAGMENT: + other = ir_var_shader_out; + break; + default: + /* Something invalid to ensure optimize_dead_builtin_uniforms + * doesn't remove anything other than uniforms or constants. + */ + other = ir_var_mode_count; + break; + } + + optimize_dead_builtin_variables(shader-ir, other); + + validate_ir_tree(shader-ir); } if (shader-InfoLog) diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h index b83c225..57e04c7 100644 --- a/src/glsl/ir_optimization.h +++ b/src/glsl/ir_optimization.h @@ -124,6 +124,8 @@ void lower_named_interface_blocks(void *mem_ctx, gl_shader *shader); bool optimize_redundant_jumps(exec_list *instructions); bool optimize_split_arrays(exec_list *instructions, bool linked); bool lower_offset_arrays(exec_list *instructions); +void optimize_dead_builtin_variables(exec_list *instructions, + enum ir_variable_mode other); ir_rvalue * compare_index_block(exec_list *instructions, ir_variable *index, diff --git a/src/glsl/opt_dead_builtin_variables.cpp b/src/glsl/opt_dead_builtin_variables.cpp new file mode 100644 index 000..85c75d6 --- /dev/null +++ b/src/glsl/opt_dead_builtin_variables.cpp @@ -0,0 +1,81 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE
[Mesa-dev] [PATCH 26/26] glsl: Add Bloom filter to get_static_name
From: Ian Romanick ian.d.roman...@intel.com The use of the Bloom filter rejects the vast majority of strings that are not in the table before expensive comparisons are performed. This Bloom filter uses two hashes. One is explicit (calculate_hash in the code), and the other is implicit. The implicit hash is just the length of the name, and the filter is the switch-statement in get_static_name_do_not_call that rejects name with lengths that are not in the table. No change Valgrind massif results for a trimmed apitrace of dota2. NOTE: This patch could probably get squashed into the previous patch. I kept it separate because I thought it would make things easier to review. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/common_variable_names.py | 113 +- 1 file changed, 112 insertions(+), 1 deletion(-) diff --git a/src/glsl/common_variable_names.py b/src/glsl/common_variable_names.py index 7435a12..ee5571c 100644 --- a/src/glsl/common_variable_names.py +++ b/src/glsl/common_variable_names.py @@ -119,6 +119,46 @@ name_really_is_not_in_static_names(const char *name) } #endif /* DEBUG */ +static uint32_t +calculate_hash(const char *str) +{ +uint32_t hash = 5381; + +while (*str != '\\0') { +hash = (hash * 33) + *str; +str++; +} + +return hash; +} + +/** + * Check the Bloom filter for the name + */ +static bool +name_is_in_Bloom_filter(const char *name) +{ + static const uint32_t filter[${len(filter)}] = { +% for i in range(0,len(filter), 4): + ${{:#010x}.format(filter[i+0])}, ${{:#010x}.format(filter[i+1])}, ${{:#010x}.format(filter[i+2])}, ${{:#010x}.format(filter[i+3])}, +% endfor + }; + + const uint32_t h = calculate_hash(name); + + if (h ${min(all_hash)}) + return false; + + if (h ${max(all_hash)}) + return false; + + const unsigned bit = (h - ${min(all_hash)}) % ${len(filter) * 32}; + const unsigned i = bit / 32; + const unsigned j = bit % 32; + + return (filter[i] (1U j)) != 0; +} + /** * Search the static name table for the specified name * @@ -129,6 +169,11 @@ name_really_is_not_in_static_names(const char *name) static const char * get_static_name_do_not_call(const char *name) { + /* Do the trivial rejection test before anything. +*/ + if (!name_is_in_Bloom_filter(name)) + return NULL; + const unsigned len = strlen(name); const ${index_type} *table = NULL; unsigned table_size = 0; @@ -188,6 +233,14 @@ get_static_name(const char *name) } +def calculate_hash(str): +hash = 5381 + +for c in str: +hash = ((hash * 33) + ord(c)) 0x0 + +return hash + common_names.sort() # Partiation the list of names into sublists of names with the same length @@ -213,8 +266,66 @@ elif base (1 16): else: index_type = uint32_t +all_hash = [] +for name in common_names: + all_hash.append(calculate_hash(name)) + +# Experimentally determined that 8,192 bits is sufficient for this dataset. +# This was determined by: +# +# 1. Modify the generated code to log a message on entry to get_static_name. +# +# 2. Modify the generated code to log a message when name_is_in_Bloom_filter +# returns true. +# +# 3. Modifiy the generated code to log a message each time a name is rejected +# due to its length. This is the 'default' case in the switch-statement. +# This is an implicit hash that is part of the Bloom filter. +# +# 4. Modify the generated code to log a message each time a name has actually +# been tested (using strcmp) and was not in the table. This means logging at +# the end of get_static_name_do_not_call and inside the switch-statement where +# lonely names are handled. +# +# 5. Run a ton of shaders through (e.g., shader-db) and collect the output. +# Count the number of times each message occurred. +# +# Two hash functions were tried. The current one and Murmur2. Exhaustive +# testing over shader-db produced the following results. There were a total +# of 6,749,837 calls to get_static_name in each run. The # in the column +# header refers messages mentioned in the list above. +# +#Bits Current hash Murmur2 +##2 / #3 / #4 #2 / #3 / #4 +# 128 5249223 / 262597 / 4826172 763090 / 285531 / 317105 +# 256 4955982 / 162087 / 4633441 508512 / 152491 / 195567 +# 512334736 / 52 / 63130 381691 / 98277 / 122960 +#1024236521 / 43809 / 32258 326657 / 69346 / 96857 +#2048174275 / 1533 / 12288 258885 / 49537 / 48894 +#4096171153 /341 / 10358 185632 /682 / 24496 +#8192161649 /264 / 931 163035 /142 / 2439 +# 16384160782 / 0 / 328 162764 / 15 / 2295 +# +# Past 512 bits, the current hash was clearly superior to Murmur2. 8,192 bits +# seems to be a sweet spot of a very small table size
[Mesa-dev] [PATCH 14/26] glsl: Add the possibility for ir_variable to have a non-ralloced name
From: Ian Romanick ian.d.roman...@intel.com Specifically, ir_var_temporary variables constructed with a NULL name will all have the name compiler_temp in static storage. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/ir.cpp | 20 +++- src/glsl/ir.h| 10 ++ src/glsl/ir_validate.cpp | 2 +- 3 files changed, 30 insertions(+), 2 deletions(-) diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index 40dbb6b..0045c2d 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -1533,12 +1533,30 @@ ir_swizzle::variable_referenced() const } +const char ir_variable::tmp_name[] = compiler_temp; + ir_variable::ir_variable(const struct glsl_type *type, const char *name, ir_variable_mode mode) : ir_instruction(ir_type_variable) { this-type = type; - this-name = ralloc_strdup(this, name); + + /* The ir_variable clone method may call this constructor with name set to +* tmp_name. +*/ + assert(name != NULL + || mode == ir_var_temporary + || mode == ir_var_function_in + || mode == ir_var_function_out + || mode == ir_var_function_inout); + assert(name != ir_variable::tmp_name + || mode == ir_var_temporary); + if (mode == ir_var_temporary +(name == NULL || name == ir_variable::tmp_name)) { + this-name = ir_variable::tmp_name; + } else { + this-name = ralloc_strdup(this, name); + } this-u.max_ifc_array_access = NULL; diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 1071858..1aeb1ff 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -575,6 +575,11 @@ public: return this-u.state_slots; } + inline bool is_name_ralloced() const + { + return this-name != ir_variable::tmp_name; + } + /** * Enable emitting extension warnings for this variable */ @@ -881,6 +886,11 @@ private: * \sa ir_variable::location */ const glsl_type *interface_type; + + /** +* Name used for anonymous compiler temporaries +*/ + static const char tmp_name[]; }; /** diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp index 12b8874..23f55fc 100644 --- a/src/glsl/ir_validate.cpp +++ b/src/glsl/ir_validate.cpp @@ -650,7 +650,7 @@ ir_validate::visit(ir_variable *ir) * in the ir_dereference_variable handler to ensure that a variable is * declared before it is dereferenced. */ - if (ir-name) + if (ir-name ir-is_name_ralloced()) assert(ralloc_parent(ir-name) == ir); hash_table_insert(ht, ir, ir); -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 21/26] glsl: Make ir_instruction::ir_type private
From: Ian Romanick ian.d.roman...@intel.com In the next patch, the type of ir_type is going to change from enum to uint8_t. Since the type won't be an enum, we won't get compiler warnings about, for example, switch statements that don't have cases for all the enum values. Using a getter that returns the enum type will enable us to continue getting those warnings. Also, ir_type should never be changed after an object is created. Having it set in the constructor and no setter effectively makes it write-once. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/ast_function.cpp | 2 +- src/glsl/ast_to_hir.cpp| 2 +- src/glsl/glsl_parser_extras.cpp| 2 +- src/glsl/ir.h | 8 +++- src/glsl/ir_constant_expression.cpp| 4 ++-- src/glsl/ir_print_visitor.cpp | 2 +- src/glsl/ir_validate.cpp | 6 +++--- src/glsl/loop_analysis.cpp | 2 +- src/glsl/loop_controls.cpp | 2 +- src/glsl/loop_unroll.cpp | 2 +- src/glsl/lower_clip_distance.cpp | 4 ++-- src/glsl/lower_if_to_cond_assign.cpp | 4 ++-- src/glsl/lower_jumps.cpp | 6 +++--- src/glsl/lower_offset_array.cpp| 2 +- src/glsl/lower_ubo_reference.cpp | 4 ++-- src/glsl/lower_vector.cpp | 4 ++-- src/glsl/lower_vector_insert.cpp | 2 +- src/glsl/opt_constant_folding.cpp | 2 +- src/glsl/opt_cse.cpp | 2 +- src/glsl/opt_dead_builtin_variables.cpp| 2 +- src/glsl/opt_rebalance_tree.cpp| 4 ++-- src/glsl/opt_redundant_jumps.cpp | 6 +++--- src/glsl/opt_structure_splitting.cpp | 2 +- src/glsl/opt_vectorize.cpp | 2 +- src/mesa/program/ir_to_mesa.cpp| 2 +- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 +- 26 files changed, 44 insertions(+), 38 deletions(-) diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp index 2402e79..92c5bf1 100644 --- a/src/glsl/ast_function.cpp +++ b/src/glsl/ast_function.cpp @@ -171,7 +171,7 @@ verify_parameter_modes(_mesa_glsl_parse_state *state, /* Verify that 'const_in' parameters are ir_constants. */ if (formal-data.mode == ir_var_const_in - actual-ir_type != ir_type_constant) { + actual-get_ir_type() != ir_type_constant) { _mesa_glsl_error(loc, state, parameter `in %s' must be a constant expression, formal-name); diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 0a4b3a6..97b2033 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -767,7 +767,7 @@ do_assignment(exec_list *instructions, struct _mesa_glsl_parse_state *state, /* If the assignment LHS comes back as an ir_binop_vector_extract * expression, move it to the RHS as an ir_triop_vector_insert. */ - if (lhs-ir_type == ir_type_expression) { + if (lhs-get_ir_type() == ir_type_expression) { ir_expression *const lhs_expr = lhs-as_expression(); if (unlikely(lhs_expr-operation == ir_binop_vector_extract)) { diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp index 1fe38f3..ebfc031 100644 --- a/src/glsl/glsl_parser_extras.cpp +++ b/src/glsl/glsl_parser_extras.cpp @@ -1532,7 +1532,7 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader, * are fly-weights that are looked up by glsl_type. */ foreach_in_list (ir_instruction, ir, shader-ir) { - switch (ir-ir_type) { + switch (ir-get_ir_type()) { case ir_type_function: { /* If the function is a built-in that is partially overridden in the * shader, the ir_function stored in the symbol table may not be the diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 42ccb00..ee42857 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -87,9 +87,15 @@ enum ir_node_type { * Base class of all IR instructions */ class ir_instruction : public exec_node { -public: +private: enum ir_node_type ir_type; +public: + inline enum ir_node_type get_ir_type() const + { + return this-ir_type; + } + /** * GCC 4.7+ and clang warn when deleting an ir_instruction unless * there's a virtual destructor present. Because we almost diff --git a/src/glsl/ir_constant_expression.cpp b/src/glsl/ir_constant_expression.cpp index 5570ed4..e277314 100644 --- a/src/glsl/ir_constant_expression.cpp +++ b/src/glsl/ir_constant_expression.cpp @@ -402,7 +402,7 @@ constant_referenced(const ir_dereference *deref, if (variable_context == NULL) return false; - switch (deref-ir_type) { + switch (deref-get_ir_type()) { case ir_type_dereference_array: { const ir_dereference_array *const da = (const ir_dereference_array *) deref; @@
[Mesa-dev] [PATCH 24/26] glsl: Store short variable names inside ir_variable
From: Ian Romanick ian.d.roman...@intel.com Most of the overhead of the name allocation is the ralloc tracking, especially on 64-bit. The allocation of the variable name i is 2 bytes for the name and 40 bytes for the ralloc tracking. I collected data from shader-db. At the end of link_shaders I added a visitor to print the name of every ir_variable in every shader. Out of the 188,786 names logged, 47,636 were two characters or less (the size that will fit in the padding on 32-bit builds) and 89,610 were six characters or less (the size that will fit in the padding on 64-bit builds). Valgrind massif results for a trimmed apitrace of dota2: ntime(i) total(B) useful-heap(B) extra-heap(B)stacks(B) Before (32-bit): 56 40,527,975,617 66,347,992 61,187,818 5,160,1740 After (32-bit): 52 40,521,071,734 66,157,928 61,054,870 5,103,0580 Before (64-bit): 62 37,089,989,777 93,205,752 85,900,367 7,305,3850 After (64-bit): 48 37,089,379,412 92,630,712 85,454,539 7,176,1730 A real savings of 129KiB on 32-bit and 435KiB on 64-bit. The savings is much higher on 64-bit because ir_instruction::padding is 7 bytes (instead of 3 bytes on 32-bit). This allows a lot more names to fit in the padding. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/ir.cpp | 5 + src/glsl/ir.h | 3 ++- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index bc008a4..4b22439 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -1559,6 +1559,11 @@ ir_variable::ir_variable(const struct glsl_type *type, const char *name, if (mode == ir_var_temporary (name == NULL || name == ir_variable::tmp_name)) { this-name = ir_variable::tmp_name; + } else if (name == NULL) { + this-padding[0] = 0; + this-name = (char *) this-padding; + } else if (strlen(name) sizeof(this-padding)) { + this-name = strcpy((char *) this-padding, name); } else { this-name = ralloc_strdup(this, name); } diff --git a/src/glsl/ir.h b/src/glsl/ir.h index ead0863..770fe60 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -598,7 +598,8 @@ public: inline bool is_name_ralloced() const { - return this-name != ir_variable::tmp_name; + return this-name != ir_variable::tmp_name + this-name != (char *) this-padding; } /** -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 11/26] glsl: Make ir_variable::num_state_slots and ir_variable::state_slots private
From: Ian Romanick ian.d.roman...@intel.com Also move num_state_slots inside ir_variable_data for better packing. The payoff for this will come in a few more patches. No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/builtin_variables.cpp | 5 +-- src/glsl/ir.h | 61 +++--- src/glsl/ir_clone.cpp | 13 ++ src/glsl/ir_validate.cpp | 2 +- src/glsl/linker.cpp| 7 +-- src/mesa/drivers/dri/i965/brw_fs.cpp | 6 +-- src/mesa/drivers/dri/i965/brw_shader.cpp | 6 +-- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 6 +-- src/mesa/program/ir_to_mesa.cpp| 14 +++--- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 14 +++--- 10 files changed, 78 insertions(+), 56 deletions(-) diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp index 6fc3319..9e31fb8 100644 --- a/src/glsl/builtin_variables.cpp +++ b/src/glsl/builtin_variables.cpp @@ -478,12 +478,9 @@ builtin_variable_generator::add_uniform(const glsl_type *type, _mesa_builtin_uniform_desc[i]; const unsigned array_count = type-is_array() ? type-length : 1; - uni-num_state_slots = array_count * statevar-num_elements; ir_state_slot *slots = - ralloc_array(uni, ir_state_slot, uni-num_state_slots); - - uni-state_slots = slots; + uni-allocate_state_slots(array_count * statevar-num_elements); for (unsigned a = 0; a array_count; a++) { for (unsigned j = 0; j statevar-num_elements; j++) { diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 1924098..c47a1fb 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -537,6 +537,37 @@ public: return this-max_ifc_array_access; } + inline unsigned get_num_state_slots() const + { + return this-data._num_state_slots; + } + + inline void set_num_state_slots(unsigned n) + { + this-data._num_state_slots = n; + } + + inline ir_state_slot *get_state_slots() + { + return this-state_slots; + } + + inline const ir_state_slot *get_state_slots() const + { + return this-state_slots; + } + + inline ir_state_slot *allocate_state_slots(unsigned n) + { + this-state_slots = ralloc_array(this, ir_state_slot, n); + this-data._num_state_slots = 0; + + if (this-state_slots != NULL) + this-data._num_state_slots = n; + + return this-state_slots; + } + /** * Enable emitting extension warnings for this variable */ @@ -729,6 +760,10 @@ public: /** Image internal format if specified explicitly, otherwise GL_NONE. */ uint16_t image_format; + private: + unsigned _num_state_slots;/** Number of state slots used */ + + public: /** * Storage location of the base of this variable * @@ -782,22 +817,6 @@ public: } data; /** -* Built-in state that backs this uniform -* -* Once set at variable creation, \c state_slots must remain invariant. -* This is because, ideally, this array would be shared by all clones of -* this variable in the IR tree. In other words, we'd really like for it -* to be a fly-weight. -* -* If the variable is not a uniform, \c num_state_slots will be zero and -* \c state_slots will be \c NULL. -*/ - /*@{*/ - unsigned num_state_slots;/** Number of state slots used */ - ir_state_slot *state_slots; /** State descriptors. */ - /*@}*/ - - /** * Value assigned in the initializer of a variable declared const */ ir_constant *constant_value; @@ -829,6 +848,16 @@ private: unsigned *max_ifc_array_access; /** +* Built-in state that backs this uniform +* +* Once set at variable creation, \c state_slots must remain invariant. +* +* If the variable is not a uniform, \c _num_state_slots will be zero and +* \c state_slots will be \c NULL. +*/ + ir_state_slot *state_slots; + + /** * For variables that are in an interface block or are an instance of an * interface block, this is the \c GLSL_TYPE_INTERFACE type for that block. * diff --git a/src/glsl/ir_clone.cpp b/src/glsl/ir_clone.cpp index 86c0e31..64019fe 100644 --- a/src/glsl/ir_clone.cpp +++ b/src/glsl/ir_clone.cpp @@ -53,15 +53,10 @@ ir_variable::clone(void *mem_ctx, struct hash_table *ht) const memcpy(var-data, this-data, sizeof(var-data)); - var-num_state_slots = this-num_state_slots; - if (this-state_slots) { - /* FINISHME: This really wants to use something like talloc_reference, but - * FINISHME: ralloc doesn't have any similar function. - */ - var-state_slots = ralloc_array(var, ir_state_slot, - this-num_state_slots); - memcpy(var-state_slots, this-state_slots, -sizeof(this-state_slots[0]) *
[Mesa-dev] [PATCH 19/26] glsl: Don't make a name for the function return variable
From: Ian Romanick ian.d.roman...@intel.com If the name is just going to get dropped, don't bother making it. If the name is made, release it sooner (rather than later). No change Valgrind massif results for a trimmed apitrace of dota2. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/ast_function.cpp | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp index dbf4e66..2402e79 100644 --- a/src/glsl/ast_function.cpp +++ b/src/glsl/ast_function.cpp @@ -408,14 +408,17 @@ generate_call(exec_list *instructions, ir_function_signature *sig, ir_dereference_variable *deref = NULL; if (!sig-return_type-is_void()) { /* Create a new temporary to hold the return value. */ + char *const name = ir_variable::temporaries_allocate_names + ? ralloc_asprintf(ctx, %s_retval, sig-function_name()) + : NULL; + ir_variable *var; - var = new(ctx) ir_variable(sig-return_type, -ralloc_asprintf(ctx, %s_retval, -sig-function_name()), -ir_var_temporary); + var = new(ctx) ir_variable(sig-return_type, name, ir_var_temporary); instructions-push_tail(var); + ralloc_free(name); + deref = new(ctx) ir_dereference_variable(var); } ir_call *call = new(ctx) ir_call(sig, deref, actual_parameters); -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/26] glsl: Use a single bit for the dual-source blend index
From: Ian Romanick ian.d.roman...@intel.com The only values allowed are 0 and 1, and the value is checked before assigning. Valgrind massif results for a trimmed apitrace of dota2: ntime(i) total(B) useful-heap(B) extra-heap(B)stacks(B) Before (32-bit): 74 40,580,119,657 69,186,544 63,506,327 5,680,2170 After (32-bit): 76 40,572,916,873 68,831,248 63,328,783 5,502,4650 Before (64-bit): 89 36,822,971,897 96,526,616 88,735,296 7,791,3200 After (64-bit): 60 36,822,640,058 96,526,824 88,735,296 7,791,5280 A real savings of 173KiB on 32-bit and no change on 64-bit. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Reviewed-by: Kenneth Graunke kenn...@whitecape.org --- src/glsl/ir.h | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 9dbcb7a..147d337 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -684,6 +684,15 @@ public: unsigned must_be_shader_input:1; /** + * Output index for dual source blending. + * + * \note + * The GLSL spec only allows the values 0 or 1 for the index in \b dual + * source blending. + */ + unsigned index:1; + + /** * \brief Layout qualifier for gl_FragDepth. * * This is not equal to \c ir_depth_layout_none if and only if this @@ -717,11 +726,6 @@ public: unsigned stream; /** - * output index for dual source blending. - */ - int index; - - /** * Initial binding point for a sampler, atomic, or UBO. * * For array types, this represents the binding point for the first element. -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] glsl: add a mechanism to allow #extension directives in the middle of shaders
Looks good to me. Reviewed-by: Ian Romanick ian.d.roman...@intel.com On 07/08/2014 11:29 AM, Marek Olšák wrote: From: Marek Olšák marek.ol...@amd.com This is needed to make Unigine Heaven 4.0 and Unigine Valley 1.0 work with sample shading. Also, if this is disabled, the error message at least makes sense now. --- src/glsl/glsl_parser.yy | 8 src/glsl/glsl_parser_extras.cpp | 2 ++ src/glsl/glsl_parser_extras.h | 2 ++ src/mesa/main/mtypes.h | 5 + 4 files changed, 17 insertions(+) diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy index b989749..4c87163 100644 --- a/src/glsl/glsl_parser.yy +++ b/src/glsl/glsl_parser.yy @@ -376,6 +376,14 @@ external_declaration_list: if ($2 != NULL) state-translation_unit.push_tail( $2-link); } + | external_declaration_list extension_statement { + if (!state-allow_extension_directive_midshader) { + _mesa_glsl_error( @2, state, + #extension directive is not allowed + in the middle of a shader); + YYERROR; + } + } ; variable_identifier: diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp index b327c2b..890123a 100644 --- a/src/glsl/glsl_parser_extras.cpp +++ b/src/glsl/glsl_parser_extras.cpp @@ -210,6 +210,8 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct gl_context *_ctx, this-early_fragment_tests = false; memset(this-atomic_counter_offsets, 0, sizeof(this-atomic_counter_offsets)); + this-allow_extension_directive_midshader = + ctx-Const.AllowGLSLExtensionDirectiveMidShader; } /** diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h index 1791816..f720d6a 100644 --- a/src/glsl/glsl_parser_extras.h +++ b/src/glsl/glsl_parser_extras.h @@ -490,6 +490,8 @@ struct _mesa_glsl_parse_state { /** Atomic counter offsets by binding */ unsigned atomic_counter_offsets[MAX_COMBINED_ATOMIC_BUFFERS]; + + bool allow_extension_directive_midshader; }; # define YYLLOC_DEFAULT(Current, Rhs, N) \ diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index a7126fd..b42a175 100644 --- a/src/mesa/main/mtypes.h +++ b/src/mesa/main/mtypes.h @@ -3349,6 +3349,11 @@ struct gl_constants GLuint ForceGLSLVersion; /** +* Allow GLSL #extension directives in the middle of shaders. +*/ + GLboolean AllowGLSLExtensionDirectiveMidShader; + + /** * Does the driver support real 32-bit integers? (Otherwise, integers are * simulated via floats.) */ ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure: Don't override user -g or -O options for debug builds
On 07/14/2014 01:41 PM, Matt Turner wrote: On Mon, Jul 14, 2014 at 11:49 AM, Ian Romanick i...@freedesktop.org wrote: From: Ian Romanick ian.d.roman...@intel.com I already pass '-ggdb3 -O1' or '-ggdb3 -Og' for CFLAGS, and I don't want configure to change them for me. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- configure.ac | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/configure.ac b/configure.ac index 4646212..04abcc0 100644 --- a/configure.ac +++ b/configure.ac @@ -308,10 +308,20 @@ AC_ARG_ENABLE([debug], if test x$enable_debug = xyes; then DEFINES=$DEFINES -DDEBUG if test x$GCC = xyes; then -CFLAGS=$CFLAGS -g -O0 +if ! echo $CFLAGS | grep -q -e '-g'; then +CFLAGS=$CFLAGS -g +fi +if ! echo $CFLAGS | grep -q -e '-O'; then +CFLAGS=$CFLAGS -O0 +fi fi if test x$GXX = xyes; then -CXXFLAGS=$CXXFLAGS -g -O0 +if ! echo $CXXFLAGS | grep -q -e '-g'; then +CXXFLAGS=$CXXFLAGS -g +fi +if ! echo $CXXFLAGS | grep -q -e '-O'; then +CXXFLAGS=$CXXFLAGS -O0 +fi fi fi -- 1.8.1.4 This looks like strictly an improvement, so my feedback shouldn't hold this up, but maybe it's time to fight about whether --enable-debug should add -O0 at all. :) Reviewed-by: Matt Turner matts...@gmail.com Autotools defaults are -g -O2. Some people want debug builds to use -g, others -ggdb*. Some people want -O2 (optimize, but with assertions enabled), others -O0, others -Og. It'd be nice to not have --enable-debug turn on one combination of these. I thought about that when I wrote this patch, but I decided against it. Principle of least surprise: --enable-debug should enable debugging. Ages ago, Mesa's build system only added -g in dri-debug builds (yay for the static Makefiles). If you forgot to change it (or wrap the build with custom scripts), you would often be disappointed when trying to gdb Mesa bugs. New developers, that may not yet have custom scripts, will have this same issue. I think we should enable experienced developers to do what they want, and make things easier for new developers. That said, if the previaling opinion is to just remove it, I'm okay with that too... ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 81174] Gallium: GL_LINE_LOOP broken with more than 512 points
https://bugs.freedesktop.org/show_bug.cgi?id=81174 --- Comment #7 from Roland Scheidegger srol...@vmware.com --- Hmm interesting. I thought more than 512 vertices would fit before it uses a new buffer (the buffer size is 64kB though I don't know what the vertex size ends up with). But if the bug is due to that I guess indeed changing the prim type (probably in vbo_exec_wrap_buffers()) and fixing up the copied vertices stuff (if it doesn't do the right thing already) would do the trick. But indeed closing the loop finally would be tricky (well I guess would need to keep the vertex around when doing the buffer wrap, then can simply emit it as another vertex last). I have no idea of the effects this has on things like line stippling but I guess it couldn't be worse than it is now... Also, things like GL_POLYGON with fill mode line probably has the same problem and can't be fixed that way really. Note that the draw module has its own decomposition of primitives into smaller chunks too which could also be buggy (vsplit stuff). -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 81174] Gallium: GL_LINE_LOOP broken with more than 512 points
https://bugs.freedesktop.org/show_bug.cgi?id=81174 --- Comment #8 from Marek Olšák mar...@gmail.com --- Line loops aren't so critical, since they are rarely used. But what about triangle strips... every split kills 2 triangles... -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 81174] Gallium: GL_LINE_LOOP broken with more than 512 points
https://bugs.freedesktop.org/show_bug.cgi?id=81174 --- Comment #9 from Roland Scheidegger srol...@vmware.com --- (In reply to comment #8) Line loops aren't so critical, since they are rarely used. But what about triangle strips... every split kills 2 triangles... That shouldn't happen, the code is supposed to copy over the necessary two tris. -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glapi: add indexed blend functions (GL 4.0)
This makes some of the UE4 engine demos (Stylized, Mobile Temple) render correctly, tested on Intel Haswell machine. Signed-off-by: Tapani Pälli tapani.pa...@intel.com --- src/mapi/glapi/gen/GL4x.xml | 26 ++ src/mesa/main/tests/dispatch_sanity.cpp | 10 +- 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/src/mapi/glapi/gen/GL4x.xml b/src/mapi/glapi/gen/GL4x.xml index 8efef0b..848316e 100644 --- a/src/mapi/glapi/gen/GL4x.xml +++ b/src/mapi/glapi/gen/GL4x.xml @@ -12,6 +12,32 @@ function name=MinSampleShading offset=assign param name=value type=GLfloat/ /function + + function name=BlendFunci static_dispatch=false alias=BlendFunciARB +param name=buf type=GLuint/ +param name=sfactor type=GLenum/ +param name=dfactor type=GLenum/ + /function + + function name=BlendFuncSeparatei static_dispatch=false alias=BlendFuncSeparateiARB +param name=buf type=GLuint/ +param name=sfactorRGB type=GLenum/ +param name=dfactorRGB type=GLenum/ +param name=sfactorAlpha type=GLenum/ +param name=dfactorAlpha type=GLenum/ + /function + + function name=BlendEquationi static_dispatch=false alias=BlendEquationiARB +param name=buf type=GLuint/ +param name=mode type=GLenum/ + /function + + function name=BlendEquationSeparatei static_dispatch=false alias=BlendEquationSeparateiARB +param name=buf type=GLuint/ +param name=modeRGB type=GLenum/ +param name=modeA type=GLenum/ + /function + /category category name=4.3 diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index 0e57653..1a2c4d0 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -542,11 +542,11 @@ const struct function gl_core_functions_possible[] = { { glVertexAttribDivisor, 33, -1 }, /* GL 4.0 */ - { glMinSampleShading, 40, -1 },// XXX: Add to xml -// { glBlendEquationi, 40, -1 }, // XXX: Add to xml -// { glBlendEquationSeparatei, 40, -1 }, // XXX: Add to xml -// { glBlendFunci, 40, -1 }, // XXX: Add to xml -// { glBlendFuncSeparatei, 40, -1 }, // XXX: Add to xml + { glMinSampleShading, 40, -1 }, + { glBlendEquationi, 40, -1 }, + { glBlendEquationSeparatei, 40, -1 }, + { glBlendFunci, 40, -1 }, + { glBlendFuncSeparatei, 40, -1 }, /* GL 4.3 */ { glIsRenderbuffer, 43, -1 }, -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Release-candidate branch for upcoming 10.2.4
Hi folks, I've pushed out an update to the 10.2 branch and I need some specific testing in the next three days. I've tested the branch on Intel (Haswell) as well as both swrast and Gallium softpipe and found no piglit regressions compared to the 10.2.3 release. The branch includes a few patches to nouveau and radeonsi which I have not been able to test. If someone will test one of these drivers with piglit and let me know that all looks good, I'll be happy to include the patches in the release. Otherwise, I'll drop any untested patches before making the final release on Friday. Also, there's still time in the next three days for someone to nominate further driver-specific changes. I'll just need positive piglit test results for any such patches, (on top of the branch as it stands now), before I'll accept them. Thanks, all. -Carl -- carl.d.wo...@intel.com pgpn1miAvjfae.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 81174] Gallium: GL_LINE_LOOP broken with more than 512 points
https://bugs.freedesktop.org/show_bug.cgi?id=81174 --- Comment #10 from Florian Link florianl...@gmail.com --- What I don't understand yet is why the effect is not stable (if my example is repainted several times, the wrap appears at different positions over time). -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev