Re: [Mesa-dev] [PATCH] i965/gen9: use an unreserved surface alignment value
On Wed, Jun 24, 2015 at 05:57:13PM -0700, Anuj Phogat wrote: On Wed, Jun 24, 2015 at 3:51 PM, Nanley Chery nanleych...@gmail.com wrote: From: Nanley Chery nanley.g.ch...@intel.com Although the horizontal and vertical alignment fields are ignored here, 0 is a reserved value for them and may cause undefined behavior. Change the default value to an abitrary valid one. Signed-off-by: Nanley Chery nanley.g.ch...@intel.com --- src/mesa/drivers/dri/i965/gen8_surface_state.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c index b2d1a57..22ae960 100644 --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c @@ -93,7 +93,7 @@ vertical_alignment(const struct brw_context *brw, if (brw-gen 8 (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE || surf_type == BRW_SURFACE_1D)) - return 0; + return GEN8_SURFACE_VALIGN_4; switch (mt-align_h) { case 4: @@ -118,7 +118,7 @@ horizontal_alignment(const struct brw_context *brw, if (brw-gen 8 (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE || gen9_use_linear_1d_layout(brw, mt))) - return 0; + return GEN8_SURFACE_HALIGN_4; switch (mt-align_w) { case 4: -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev Good find Nanley. We had no known issues with value 0 but it's always nice to avoid undefined behavior :). Right, I thought about this when I reviewed the original. The spec says it is ignored in these cases and hence the reserved value seemed fine. Now that we put something meaningful there, somebody is going to compare the spec and wonder why we set it to 4. If we added also a comment here that says this is just an arbitrary (non-reserved) value and really ignored by the hardware, it would prevent misunderstandings. What do you guys think? Reviewed-by: Anuj Phogat anuj.pho...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] glsls: Modify exec_list to avoid strict-aliasing violations
On 25/06/15 01:13, Dave Airlie wrote: -fno-strict-aliasing:with strict aliasing: libGL.so 699188 699188(no change) *_dri.so 9575876 9563104(-2772) Use the size command to get the actual text segment size, otherwise debugging symbols can drown changes. Dave. I had stripped debugging symbols. size gives me: text databssdechexfilename 590628 6540 1636 598804 92314 libGL.so.1.2.0 text databssdechexfilename 590628 6540 1636 598804 92314 libGL.so.1.2.0 text databssdechexfilename 8477420 265925197861610721961 a39aa9r600_dri.so text databssdechexfilename 8464211 265925197861610708752 a36710r600_dri.so I.e. about a 10k difference in the text size of the dri modules. Davin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/4] i965: use SSA values when we can
Before, we were using a hack where when we converted out of SSA, we set a parent_instr field of the nir_register to indicate that the register was actually an SSA value. But in the future, we want to handle SSA values directly, and right now we're creating an extra nir_register for everything, even if it's not involved in a phi node. This series removes that hack for i965 and gets us using SSA values directly in most cases. The only other user of nir_convert_from_ssa() is vc4, which I believed I changed correctly, and it doesn't seem to use nir_register::parent_instr, based on my grepping. I tried to compile-test it, but it assumed I was using the simulator and died, so it would be nice to at least compile-test it. The changes are also available at: git://people.freedesktop.org/~cwabbott0/mesa i965-use-ssa Connor Abbott (4): nir/from_ssa: add a flag to not convert everything to SSA i965/fs: use SSA values directly nir: remove nir_src_get_parent_instr() nir: remove parent_instr from nir_register src/gallium/drivers/vc4/vc4_program.c | 2 +- src/glsl/nir/nir.c | 1 - src/glsl/nir/nir.h | 25 ++-- src/glsl/nir/nir_from_ssa.c| 33 +- src/mesa/drivers/dri/i965/brw_fs.h | 5 ++ src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 73 ++ src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 1 + src/mesa/drivers/dri/i965/brw_nir.c| 2 +- .../dri/i965/brw_nir_analyze_boolean_resolves.c| 12 ++-- 9 files changed, 84 insertions(+), 70 deletions(-) -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] nir: remove nir_src_get_parent_instr()
It's now unused. Signed-off-by: Connor Abbott cwabbo...@gmail.com --- src/glsl/nir/nir.h | 10 -- 1 file changed, 10 deletions(-) diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index 2116f60..b33c9c5 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -565,16 +565,6 @@ nir_src_for_reg(nir_register *reg) return src; } -static inline nir_instr * -nir_src_get_parent_instr(const nir_src *src) -{ - if (src-is_ssa) { - return src-ssa-parent_instr; - } else { - return src-reg.reg-parent_instr; - } -} - static inline nir_dest nir_dest_for_reg(nir_register *reg) { -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/17] i965/fs: Make better use of the builder in shader_time
On Tue, Jun 23, 2015 at 2:09 AM, Pohjolainen, Topi topi.pohjolai...@intel.com wrote: On Thu, Jun 18, 2015 at 05:51:37PM -0700, Jason Ekstrand wrote: Previously, we were just depending on register widths to ensure that various things were exec_size of 1 etc. Now, we do so explicitly using the builder. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index c13ac7d..740b51d 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -557,7 +557,7 @@ fs_visitor::get_timestamp(const fs_builder bld) /* We want to read the 3 fields we care about even if it's not enabled in * the dispatch. */ - bld.exec_all().MOV(dst, ts); + bld.group(4, 0).exec_all().MOV(dst, ts); Just to make sure I understand correctly, we want SIMD4 in order to read wide enough to get all the mentioned 3 fields? I believe so, yes. /* The caller wants the low 32 bits of the timestamp. Since it's running * at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds, @@ -637,17 +637,19 @@ fs_visitor::emit_shader_time_end() start.negate = true; fs_reg diff = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 1); diff.set_smear(0); - ibld.ADD(diff, start, shader_end_time); + + const fs_builder cbld = ibld.group(1, 0); + cbld.group(1, 0).ADD(diff, start, shader_end_time); /* If there were no instructions between the two timestamp gets, the diff * is 2 cycles. Remove that overhead, so I can forget about that when * trying to determine the time taken for single instructions. */ - ibld.ADD(diff, diff, fs_reg(-2u)); - SHADER_TIME_ADD(ibld, type, diff); - SHADER_TIME_ADD(ibld, written_type, fs_reg(1u)); + cbld.ADD(diff, diff, fs_reg(-2u)); + SHADER_TIME_ADD(cbld, type, diff); + SHADER_TIME_ADD(cbld, written_type, fs_reg(1u)); ibld.emit(BRW_OPCODE_ELSE); - SHADER_TIME_ADD(ibld, reset_type, fs_reg(1u)); + SHADER_TIME_ADD(cbld, reset_type, fs_reg(1u)); ibld.emit(BRW_OPCODE_ENDIF); } -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4 3/6] mesa/es3.1: enable GL_ARB_texture_multisample for GLES 3.1
On Thu, Jun 25, 2015 at 5:08 AM, Marta Lofstedt marta.lofst...@linux.intel.com wrote: From: Marta Lofstedt marta.lofst...@intel.com v4 : only expose GL_ARB_texture_multisample enums for gles 3.1 and desktop GL. I was suspicious of this logic. Based on my reading of the code, what your ARB_texture_multisample_es31 thing does is expose those enums when *either* the driver enables ARB_texture_multisample *or* the current context is ES3.1. ARB_draw_indirect_es31 has the same problem, btw. I could have misread the get.c extra_ext() logic, but I don't think I have. As far as I can tell there's no way to (generically) AND these things. What you really need to do is create a whole new [GL, GL_CORE, ES31] section in get_hash_params and update get_hash_generator.py accordingly. Cheers, -ilia Signed-off-by: Marta Lofstedt marta.lofst...@intel.com --- src/mesa/main/get.c | 6 ++ src/mesa/main/get_hash_params.py | 18 +- 2 files changed, 15 insertions(+), 9 deletions(-) diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c index 4a0537d..6148349 100644 --- a/src/mesa/main/get.c +++ b/src/mesa/main/get.c @@ -367,6 +367,12 @@ static const int extra_ARB_shader_atomic_counters_es31[] = { EXTRA_END }; +static const int extra_ARB_texture_multisample_es31[] = { + EXT(ARB_texture_multisample), + EXTRA_API_ES31, + EXTRA_END +}; + EXTRA_EXT(ARB_texture_cube_map); EXTRA_EXT(EXT_texture_array); EXTRA_EXT(NV_fog_distance); diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py index e25dbe1..0026000 100644 --- a/src/mesa/main/get_hash_params.py +++ b/src/mesa/main/get_hash_params.py @@ -426,6 +426,15 @@ descriptor=[ [ MAX_FRAGMENT_ATOMIC_COUNTERS, CONTEXT_INT(Const.Program[MESA_SHADER_FRAGMENT].MaxAtomicCounters), extra_ARB_shader_atomic_counters_es31 ], [ MAX_COMBINED_ATOMIC_COUNTER_BUFFERS, CONTEXT_INT(Const.MaxCombinedAtomicBuffers), extra_ARB_shader_atomic_counters_es31 ], [ MAX_COMBINED_ATOMIC_COUNTERS, CONTEXT_INT(Const.MaxCombinedAtomicCounters), extra_ARB_shader_atomic_counters_es31 ], + +# GL_ARB_texture_multisample / GLES 3.1 + [ TEXTURE_BINDING_2D_MULTISAMPLE, LOC_CUSTOM, TYPE_INT, TEXTURE_2D_MULTISAMPLE_INDEX, extra_ARB_texture_multisample_es31 ], + [ TEXTURE_BINDING_2D_MULTISAMPLE_ARRAY, LOC_CUSTOM, TYPE_INT, TEXTURE_2D_MULTISAMPLE_ARRAY_INDEX, extra_ARB_texture_multisample_es31 ], + [ MAX_COLOR_TEXTURE_SAMPLES, CONTEXT_INT(Const.MaxColorTextureSamples), extra_ARB_texture_multisample_es31 ], + [ MAX_DEPTH_TEXTURE_SAMPLES, CONTEXT_INT(Const.MaxDepthTextureSamples), extra_ARB_texture_multisample_es31 ], + [ MAX_INTEGER_SAMPLES, CONTEXT_INT(Const.MaxIntegerSamples), extra_ARB_texture_multisample_es31 ], + [ SAMPLE_MASK, CONTEXT_BOOL(Multisample.SampleMask), extra_ARB_texture_multisample_es31 ], + [ MAX_SAMPLE_MASK_WORDS, CONST(1), extra_ARB_texture_multisample_es31 ], ]}, # Enums in OpenGL Core profile and ES 3.1 @@ -717,15 +726,6 @@ descriptor=[ [ TEXTURE_BUFFER_FORMAT_ARB, LOC_CUSTOM, TYPE_INT, 0, extra_texture_buffer_object ], [ TEXTURE_BUFFER_ARB, LOC_CUSTOM, TYPE_INT, 0, extra_texture_buffer_object ], -# GL_ARB_texture_multisample / GL 3.2 - [ TEXTURE_BINDING_2D_MULTISAMPLE, LOC_CUSTOM, TYPE_INT, TEXTURE_2D_MULTISAMPLE_INDEX, extra_ARB_texture_multisample ], - [ TEXTURE_BINDING_2D_MULTISAMPLE_ARRAY, LOC_CUSTOM, TYPE_INT, TEXTURE_2D_MULTISAMPLE_ARRAY_INDEX, extra_ARB_texture_multisample ], - [ MAX_COLOR_TEXTURE_SAMPLES, CONTEXT_INT(Const.MaxColorTextureSamples), extra_ARB_texture_multisample ], - [ MAX_DEPTH_TEXTURE_SAMPLES, CONTEXT_INT(Const.MaxDepthTextureSamples), extra_ARB_texture_multisample ], - [ MAX_INTEGER_SAMPLES, CONTEXT_INT(Const.MaxIntegerSamples), extra_ARB_texture_multisample ], - [ SAMPLE_MASK, CONTEXT_BOOL(Multisample.SampleMask), extra_ARB_texture_multisample ], - [ MAX_SAMPLE_MASK_WORDS, CONST(1), extra_ARB_texture_multisample ], - # GL 3.0 [ CONTEXT_FLAGS, CONTEXT_INT(Const.ContextFlags), extra_version_30 ], -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 16/19] i965/fs: Use the builder dispatch_width for computing register offsets
Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index d4cc43d..d94a842 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -72,7 +72,7 @@ offset(fs_reg reg, const brw::fs_builder bld, unsigned delta) case MRF: case ATTR: return byte_offset(reg, - delta * MAX2(reg.width * reg.stride, 1) * + delta * bld.dispatch_width() * reg.stride * type_sz(reg.type)); case UNIFORM: reg.reg_offset += delta; -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radeonsi: add support for geometry shader invocations.
From: Dave Airlie airl...@redhat.com Signed-off-by: Dave Airlie airl...@redhat.com --- src/gallium/drivers/radeonsi/si_shader.c| 5 + src/gallium/drivers/radeonsi/si_shader.h| 1 + src/gallium/drivers/radeonsi/si_state.c | 1 - src/gallium/drivers/radeonsi/si_state_shaders.c | 7 +++ 4 files changed, 13 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 87608a1..665ce83 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -630,6 +630,11 @@ static void declare_system_value( SI_PARAM_BASE_VERTEX); break; + case TGSI_SEMANTIC_INVOCATIONID: + value = LLVMGetParam(radeon_bld-main_fn, +SI_PARAM_GS_INSTANCE_ID); + break; + case TGSI_SEMANTIC_SAMPLEID: value = get_sample_id(radeon_bld); break; diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h index 51055af..b4339ae 100644 --- a/src/gallium/drivers/radeonsi/si_shader.h +++ b/src/gallium/drivers/radeonsi/si_shader.h @@ -115,6 +115,7 @@ struct si_shader_selector { unsignedgs_output_prim; unsignedgs_max_out_vertices; + unsignedgs_num_invocations; uint64_tgs_used_inputs; /* mask of get_unique_index bits */ }; diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c index 752467b..0dd08a2 100644 --- a/src/gallium/drivers/radeonsi/si_state.c +++ b/src/gallium/drivers/radeonsi/si_state.c @@ -3078,7 +3078,6 @@ void si_init_config(struct si_context *sctx) si_pm4_set_reg(pm4, R_028B60_VGT_GS_VERT_ITEMSIZE_1, 0); si_pm4_set_reg(pm4, R_028B64_VGT_GS_VERT_ITEMSIZE_2, 0); si_pm4_set_reg(pm4, R_028B68_VGT_GS_VERT_ITEMSIZE_3, 0); - si_pm4_set_reg(pm4, R_028B90_VGT_GS_INSTANCE_CNT, 0); si_pm4_set_reg(pm4, R_028B98_VGT_STRMOUT_BUFFER_CONFIG, 0x0); si_pm4_set_reg(pm4, R_028AB4_VGT_REUSE_OFF, 0); diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c index 48128fa..eef3baa 100644 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c @@ -76,6 +76,7 @@ static void si_shader_gs(struct si_shader *shader) unsigned gs_vert_itemsize = shader-selector-info.num_outputs * (16 2); unsigned gs_max_vert_out = shader-selector-gs_max_out_vertices; unsigned gsvs_itemsize = gs_vert_itemsize * gs_max_vert_out; + unsigned gs_num_invocations = shader-selector-gs_num_invocations; unsigned cut_mode; struct si_pm4_state *pm4; unsigned num_sgprs, num_user_sgprs; @@ -118,6 +119,10 @@ static void si_shader_gs(struct si_shader *shader) si_pm4_set_reg(pm4, R_028B5C_VGT_GS_VERT_ITEMSIZE, gs_vert_itemsize); + si_pm4_set_reg(pm4, R_028B90_VGT_GS_INSTANCE_CNT, + S_028B90_CNT(MIN2(gs_num_invocations, 127)) | + S_028B90_ENABLE(gs_num_invocations 0)); + va = shader-bo-gpu_address; si_pm4_add_bo(pm4, shader-bo, RADEON_USAGE_READ, RADEON_PRIO_SHADER_DATA); si_pm4_set_reg(pm4, R_00B220_SPI_SHADER_PGM_LO_GS, va 8); @@ -490,6 +495,8 @@ static void *si_create_shader_state(struct pipe_context *ctx, sel-info.properties[TGSI_PROPERTY_GS_OUTPUT_PRIM]; sel-gs_max_out_vertices = sel-info.properties[TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES]; + sel-gs_num_invocations = + sel-info.properties[TGSI_PROPERTY_GS_INVOCATIONS]; for (i = 0; i sel-info.num_inputs; i++) { unsigned name = sel-info.input_semantic_name[i]; -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallium/hud: prevent NULL pointer dereference with pipe_query functions
On 06/25/2015 02:36 PM, Marek Olšák wrote: What's the point of drawing a HUD pane if a query cannot be created? With my series which adds support for global performance counters on NV50, query_create() may fail if we want to monitor *two* different query types with the HUD. This limitation is due to how the HUD uses the pipe_query interface and this doesn't fit well with the underlying interface exposed by Nouveau. In other words, with two different query types the scenario is as follows: CREATE Q1, BEGIN Q1, CREATE Q2, BEGIN Q2, END Q1, RESULT Q1, BEGIN Q1, END Q2, RESULT Q2, BEGIN Q2, END Q1, and so on. But, with nv50/nvc0 drivers I need to schedule hardware counters at query creation and this is going to be pretty hard without a really weird workaround. Hence, only one query type can be monitored simultaneously, and query_create() fails. A better scenario for nouveau drivers will be: CREATE Q1, CREATE Q2, BEGIN Q1, BEGIN Q2, END Q1, END Q2, RESULT Q1, RESULT Q2, BEGIN Q1, and so on. This could allow to introduce, for example, begin_all_queries() and end_all_queries() to be able to create/begin/end all queries in one shot *only*. My plan is to change this behaviour but it will require lot of changes in the HUD mainly because queries are collected by pane. Can we detect this during initialization? I'm not sure if we can detect this at initialization and if this is going to be easy to do. But, how can we handle the case where a driver will only fail one time to create a query? Do we need to remove the pane? Not sure. This is going to be hard to say, especially because nouveau drivers could fail if no hardware counters are available. Marek On Wed, Jun 24, 2015 at 9:26 PM, Samuel Pitoiset samuel.pitoi...@gmail.com wrote: The HUD doesn't check if query_create() fails and it calls other pipe_query functions with NULL pointer instead of a valid query object. Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com --- src/gallium/auxiliary/hud/hud_driver_query.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/src/gallium/auxiliary/hud/hud_driver_query.c b/src/gallium/auxiliary/hud/hud_driver_query.c index 603aba7..ee71678 100644 --- a/src/gallium/auxiliary/hud/hud_driver_query.c +++ b/src/gallium/auxiliary/hud/hud_driver_query.c @@ -62,7 +62,8 @@ query_new_value(struct hud_graph *gr) uint64_t now = os_time_get(); if (info-last_time) { - pipe-end_query(pipe, info-query[info-head]); + if (info-query[info-head]) + pipe-end_query(pipe, info-query[info-head]); /* read query results */ while (1) { @@ -70,7 +71,7 @@ query_new_value(struct hud_graph *gr) union pipe_query_result result; uint64_t *res64 = (uint64_t *)result; - if (pipe-get_query_result(pipe, query, FALSE, result)) { + if (query pipe-get_query_result(pipe, query, FALSE, result)) { info-results_cumulative += res64[info-result_index]; info-num_results++; @@ -88,7 +89,8 @@ query_new_value(struct hud_graph *gr) gallium_hud: all queries are busy after %i frames, can't add another query\n, NUM_QUERIES); - pipe-destroy_query(pipe, info-query[info-head]); + if (info-query[info-head]) + pipe-destroy_query(pipe, info-query[info-head]); info-query[info-head] = pipe-create_query(pipe, info-query_type, 0); } @@ -113,15 +115,15 @@ query_new_value(struct hud_graph *gr) info-results_cumulative = 0; info-num_results = 0; } - - pipe-begin_query(pipe, info-query[info-head]); } else { /* initialize */ info-last_time = now; info-query[info-head] = pipe-create_query(pipe, info-query_type, 0); - pipe-begin_query(pipe, info-query[info-head]); } + + if (info-query[info-head]) + pipe-begin_query(pipe, info-query[info-head]); } static void -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 90264] [Regression, bisected] Tooltip corruption in Chrome
https://bugs.freedesktop.org/show_bug.cgi?id=90264 --- Comment #23 from Furkan fal...@gmail.com --- Still there for me. I'm using Ubuntu 15.04, Chrome 43.0.2357.130, Linux 4.1, and Mesa 10.7~git1506250730.d1663c~gd~v from oibaf ppa. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/6] i965: Define HW-binding table and resource streamer control opcodes
v2: Use macros for HW binding table edits (Topi) v3: Add Broadwell support. v4: Make hardware binding table bit definitions even more clearer (Ken) Cc: kenn...@whitecape.org Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com Reviewed-by: Kenneth Graunke kenn...@whitecape.org Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com --- src/mesa/drivers/dri/i965/brw_defines.h | 30 ++ src/mesa/drivers/dri/i965/intel_reg.h | 3 +++ 2 files changed, 33 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 66b9abc..ca0064e 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -1640,6 +1640,36 @@ enum brw_message_target { #define _3DSTATE_BINDING_TABLE_POINTERS_GS 0x7829 /* GEN7+ */ #define _3DSTATE_BINDING_TABLE_POINTERS_PS 0x782A /* GEN7+ */ +#define _3DSTATE_BINDING_TABLE_POOL_ALLOC 0x7919 /* GEN7.5+ */ +#define BRW_HW_BINDING_TABLE_ENABLE (1 11) +#define GEN7_HW_BT_POOL_MOCS_SHIFT 7 +#define GEN7_HW_BT_POOL_MOCS_MASK INTEL_MASK(10, 7) +#define GEN8_HW_BT_POOL_MOCS_SHIFT 0 +#define GEN8_HW_BT_POOL_MOCS_MASK INTEL_MASK(6, 0) +/* Only required in HSW */ +#define HSW_BT_POOL_ALLOC_MUST_BE_ONE (3 5) + +#define _3DSTATE_BINDING_TABLE_EDIT_VS 0x7843 /* GEN7.5 */ +#define _3DSTATE_BINDING_TABLE_EDIT_GS 0x7844 /* GEN7.5 */ +#define _3DSTATE_BINDING_TABLE_EDIT_HS 0x7845 /* GEN7.5 */ +#define _3DSTATE_BINDING_TABLE_EDIT_DS 0x7846 /* GEN7.5 */ +#define _3DSTATE_BINDING_TABLE_EDIT_PS 0x7847 /* GEN7.5 */ +#define BRW_BINDING_TABLE_INDEX_SHIFT 16 +#define BRW_BINDING_TABLE_INDEX_MASKINTEL_MASK(23, 16) + +#define BRW_BINDING_TABLE_EDIT_TARGET_ALL 3 +#define BRW_BINDING_TABLE_EDIT_TARGET_CORE1 2 +#define BRW_BINDING_TABLE_EDIT_TARGET_CORE0 1 +/* In HSW, when editing binding table entries to surface state offsets, + * the surface state offset is a 16-bit value aligned to 32 bytes. But + * Surface State Pointer in dword 2 is [15:0]. Right shift surf_offset + * by 5 bits so it won't disturb bit 16 (which is used as the binding + * table index entry), otherwise it would hang the GPU. + */ +#define HSW_SURFACE_STATE_EDIT(value) (value 5) +/* Same as Haswell, but surface state offsets now aligned to 64 bytes.*/ +#define GEN8_SURFACE_STATE_EDIT(value) (value 6) + #define _3DSTATE_SAMPLER_STATE_POINTERS0x7802 /* GEN6+ */ # define PS_SAMPLER_STATE_CHANGE (1 12) # define GS_SAMPLER_STATE_CHANGE (1 9) diff --git a/src/mesa/drivers/dri/i965/intel_reg.h b/src/mesa/drivers/dri/i965/intel_reg.h index bd14e18..98adf45 100644 --- a/src/mesa/drivers/dri/i965/intel_reg.h +++ b/src/mesa/drivers/dri/i965/intel_reg.h @@ -47,6 +47,9 @@ /* Load a value from memory into a register. Only available on Gen7+. */ #define GEN7_MI_LOAD_REGISTER_MEM (CMD_MI | (0x29 23)) # define MI_LOAD_REGISTER_MEM_USE_GGTT (1 22) +/* Haswell RS control */ +#define MI_RS_CONTROL (CMD_MI | (0x6 23)) +#define MI_RS_STORE_DATA_IMM(CMD_MI | (0x2b 23)) /* Manipulate the predicate bit based on some register values. Only on Gen7+ */ #define GEN7_MI_PREDICATE (CMD_MI | (0xC 23)) -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/6] i965: Enable resource streamer for the batchbuffer
Check first if the hardware and kernel supports resource streamer. If this is allowed, tell the kernel to enable the resource streamer enable bit on MI_BATCHBUFFER_START by specifying I915_EXEC_RESOURCE_STREAMER execbuffer flags. v2: - Use new I915_PARAM_HAS_RESOURCE_STREAMER ioctl to check if kernel supports RS (Ken). - Add brw_device_info::has_resource_streamer and toggle it for Haswell, Broadwell, Cherryview, Skylake, and Broxton (Ken). v3: - Update I915_PARAM_HAS_RESOURCE_STREAMER to match updated kernel. v4: - Always inspect the getparam.value (Chris Wilson). Cc: kenn...@whitecape.org Cc: ch...@chris-wilson.co.uk Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com --- src/mesa/drivers/dri/i965/brw_context.c | 5 + src/mesa/drivers/dri/i965/brw_context.h | 1 + src/mesa/drivers/dri/i965/brw_device_info.c | 5 - src/mesa/drivers/dri/i965/brw_device_info.h | 1 + src/mesa/drivers/dri/i965/intel_batchbuffer.c | 8 +++- src/mesa/drivers/dri/i965/intel_screen.c | 14 ++ src/mesa/drivers/dri/i965/intel_screen.h | 5 + 7 files changed, 37 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 4b51fe5..6b08216 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -865,6 +865,11 @@ brwCreateContext(gl_api api, brw-predicate.state = BRW_PREDICATE_STATE_RENDER; + brw-use_resource_streamer = devinfo-has_resource_streamer + screen-has_resource_streamer + (brw_env_var_as_boolean(INTEL_USE_HW_BT, false) || + brw_env_var_as_boolean(INTEL_USE_GATHER, false)); + ctx-VertexProgram._MaintainTnlProgram = true; ctx-FragmentProgram._MaintainTexEnvProgram = true; diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 3553f6e..10d8f1e 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1130,6 +1130,7 @@ struct brw_context bool has_pln; bool no_simd8; bool use_rep_send; + bool use_resource_streamer; /** * Some versions of Gen hardware don't do centroid interpolation correctly diff --git a/src/mesa/drivers/dri/i965/brw_device_info.c b/src/mesa/drivers/dri/i965/brw_device_info.c index 342e566..51a91b6 100644 --- a/src/mesa/drivers/dri/i965/brw_device_info.c +++ b/src/mesa/drivers/dri/i965/brw_device_info.c @@ -170,7 +170,8 @@ static const struct brw_device_info brw_device_info_byt = { #define HSW_FEATURES \ GEN7_FEATURES,\ .is_haswell = true, \ - .supports_simd16_3src = true + .supports_simd16_3src = true, \ + .has_resource_streamer = true static const struct brw_device_info brw_device_info_hsw_gt1 = { HSW_FEATURES, .gt = 1, @@ -229,6 +230,7 @@ static const struct brw_device_info brw_device_info_hsw_gt3 = { #define GEN8_FEATURES \ .gen = 8,\ .has_hiz_and_separate_stencil = true,\ + .has_resource_streamer = true, \ .must_use_separate_stencil = true, \ .has_llc = true, \ .has_pln = true, \ @@ -301,6 +303,7 @@ static const struct brw_device_info brw_device_info_chv = { #define GEN9_FEATURES \ .gen = 9,\ .has_hiz_and_separate_stencil = true,\ + .has_resource_streamer = true, \ .must_use_separate_stencil = true, \ .has_llc = true, \ .has_pln = true, \ diff --git a/src/mesa/drivers/dri/i965/brw_device_info.h b/src/mesa/drivers/dri/i965/brw_device_info.h index 7b7a1fc..2a73e93 100644 --- a/src/mesa/drivers/dri/i965/brw_device_info.h +++ b/src/mesa/drivers/dri/i965/brw_device_info.h @@ -46,6 +46,7 @@ struct brw_device_info bool has_compr4; bool has_surface_tile_offset; bool supports_simd16_3src; + bool has_resource_streamer; /** * Quirks: diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c index 54081a1..4e82003 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c @@ -267,6 +267,11 @@ throttle(struct brw_context *brw) } } +/* Drop when RS headers get pulled to libdrm */ +#ifndef I915_EXEC_RESOURCE_STREAMER +#define I915_EXEC_RESOURCE_STREAMER (115) +#endif + /* TODO: Push this whole function into bufmgr. */ static int @@ -293,7 +298,8 @@ do_flush_locked(struct brw_context *brw) if (brw-gen = 6 batch-ring == BLT_RING) { flags = I915_EXEC_BLT; } else { - flags = I915_EXEC_RENDER; + flags = I915_EXEC_RENDER | +
[Mesa-dev] [PATCH 5/6] i965: Upload binding tables in hw-generated binding table format.
When hardware-generated binding tables are enabled, use the hw-generated binding table format when uploading binding table state. Normally, the CS will will just consume the binding table pointer commands as pipelined state. When the RS is enabled however, the RS flushes whatever edited surface state entries of our on-chip binding table to the binding table pool before passing the command on to the CS. Note that the the binding table pointer offset is relative to the binding table pool base address when resource streamer instead of the surface state base address. v2: Fix possible buffer overflow when allocating a chunk out of the hw-binding table pool (Ken). Cc: kenn...@whitecape.org Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com --- src/mesa/drivers/dri/i965/brw_binding_tables.c | 73 -- 1 file changed, 57 insertions(+), 16 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c b/src/mesa/drivers/dri/i965/brw_binding_tables.c index a0bc2f3..a438d10 100644 --- a/src/mesa/drivers/dri/i965/brw_binding_tables.c +++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c @@ -50,6 +50,26 @@ static const GLuint stage_to_bt_edit[MESA_SHADER_FRAGMENT + 1] = { _3DSTATE_BINDING_TABLE_EDIT_PS, }; +static uint32_t +reserve_hw_bt_space(struct brw_context *brw, unsigned bytes) +{ + if (brw-hw_bt_pool.next_offset + bytes = brw-hw_bt_pool.bo-size - 128) { + gen7_reset_hw_bt_pool_offsets(brw); + } + + uint32_t offset = brw-hw_bt_pool.next_offset; + + /* From the Haswell PRM, Volume 2b: Command Reference: Instructions, +* 3DSTATE_BINDING_TABLE_POINTERS_xS: +* +* If HW Binding Table is enabled, the offset is relative to the +* Binding Table Pool Base Address and the alignment is 64 bytes. +*/ + brw-hw_bt_pool.next_offset += ALIGN(bytes, 64); + + return offset; +} + /** * Upload a shader stage's binding table as indirect state. * @@ -70,30 +90,51 @@ brw_upload_binding_table(struct brw_context *brw, stage_state-bind_bo_offset = 0; } else { - /* Upload a new binding table. */ - if (INTEL_DEBUG DEBUG_SHADER_TIME) { - brw-vtbl.emit_buffer_surface_state( -brw, stage_state-surf_offset[ -prog_data-binding_table.shader_time_start], -brw-shader_time.bo, 0, BRW_SURFACEFORMAT_RAW, -brw-shader_time.bo-size, 1, true); + /* When RS is enabled use hw-binding table uploads, otherwise fallback to + * software-uploads. + */ + if (brw-use_resource_streamer) { + gen7_update_binding_table_from_array(brw, stage_state-stage, + stage_state-surf_offset, + prog_data-binding_table + .size_bytes / 4); + } else { + /* Upload a new binding table. */ + if (INTEL_DEBUG DEBUG_SHADER_TIME) { +brw-vtbl.emit_buffer_surface_state( + brw, stage_state-surf_offset[ + prog_data-binding_table.shader_time_start], + brw-shader_time.bo, 0, BRW_SURFACEFORMAT_RAW, + brw-shader_time.bo-size, 1, true); + } + + uint32_t *bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE, + prog_data-binding_table.size_bytes, + 32, + stage_state-bind_bo_offset); + + /* BRW_NEW_SURFACES and BRW_NEW_*_CONSTBUF */ + memcpy(bind, stage_state-surf_offset, +prog_data-binding_table.size_bytes); } - - uint32_t *bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE, - prog_data-binding_table.size_bytes, 32, - stage_state-bind_bo_offset); - - /* BRW_NEW_SURFACES and BRW_NEW_*_CONSTBUF */ - memcpy(bind, stage_state-surf_offset, - prog_data-binding_table.size_bytes); } brw-ctx.NewDriverState |= brw_new_binding_table; if (brw-gen = 7) { + + if (brw-use_resource_streamer) + stage_state-bind_bo_offset = +reserve_hw_bt_space(brw, prog_data-binding_table.size_bytes); + BEGIN_BATCH(2); OUT_BATCH(packet_name 16 | (2 - 2)); - OUT_BATCH(stage_state-bind_bo_offset); + /* Align SurfaceStateOffset[16:6] format to [15:5] PS Binding Table field + * when hw-generated binding table is enabled. + */ + OUT_BATCH(brw-use_resource_streamer ? +(stage_state-bind_bo_offset 1) : +stage_state-bind_bo_offset); ADVANCE_BATCH(); } } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/6] i965: Implement interface to edit binding table entries
Unlike normal software binding tables where the driver has to manually generate and fill a binding table array which are then uploaded to the hardware, the resource streamer instead presents the driver with an option to fill out slots for individual binding table indices. The hardware accumulates the state for these combined edits which it then automatically flushes to a binding table pool when the binding table pointer state command is invoked. v2: Clarify binding table edit bit aligment (Topi). v3: Make comments and function names more clearer (Ken). Reviewed-by: Kenneth Graunke kenn...@whitecape.org Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com --- src/mesa/drivers/dri/i965/brw_binding_tables.c | 55 ++ src/mesa/drivers/dri/i965/brw_state.h | 9 + 2 files changed, 64 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c b/src/mesa/drivers/dri/i965/brw_binding_tables.c index 6bc540f..a0bc2f3 100644 --- a/src/mesa/drivers/dri/i965/brw_binding_tables.c +++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c @@ -44,6 +44,12 @@ #include brw_state.h #include intel_batchbuffer.h +static const GLuint stage_to_bt_edit[MESA_SHADER_FRAGMENT + 1] = { + _3DSTATE_BINDING_TABLE_EDIT_VS, + _3DSTATE_BINDING_TABLE_EDIT_GS, + _3DSTATE_BINDING_TABLE_EDIT_PS, +}; + /** * Upload a shader stage's binding table as indirect state. * @@ -171,6 +177,55 @@ const struct brw_tracked_state brw_gs_binding_table = { }; /** + * Edit a single entry in a hardware-generated binding table + */ +void +gen7_edit_hw_binding_table_entry(struct brw_context *brw, + gl_shader_stage stage, + uint32_t index, + uint32_t surf_offset) +{ + assert(stage = MESA_SHADER_FRAGMENT); + + uint32_t dw2 = SET_FIELD(index, BRW_BINDING_TABLE_INDEX) | + (brw-gen = 8 ? GEN8_SURFACE_STATE_EDIT(surf_offset) : + HSW_SURFACE_STATE_EDIT(surf_offset)); + + BEGIN_BATCH(3); + OUT_BATCH(stage_to_bt_edit[stage] 16 | (3 - 2)); + OUT_BATCH(BRW_BINDING_TABLE_EDIT_TARGET_ALL); + OUT_BATCH(dw2); + ADVANCE_BATCH(); +} + +/** + * Upload a whole hardware binding table for the given stage. + * + * Takes an array of surface offsets and the number of binding table + * entries. + */ +void +gen7_update_binding_table_from_array(struct brw_context *brw, + gl_shader_stage stage, + const uint32_t* binding_table, + int num_surfaces) +{ + uint32_t dw2 = 0; + assert(stage = MESA_SHADER_FRAGMENT); + + BEGIN_BATCH(num_surfaces + 2); + OUT_BATCH(stage_to_bt_edit[stage] 16 | num_surfaces); + OUT_BATCH(BRW_BINDING_TABLE_EDIT_TARGET_ALL); + for (int i = 0; i num_surfaces; i++) { + dw2 = SET_FIELD(i, BRW_BINDING_TABLE_INDEX) | + (brw-gen = 8 ? GEN8_SURFACE_STATE_EDIT(binding_table[i]) : + HSW_SURFACE_STATE_EDIT(binding_table[i])); + OUT_BATCH(dw2); + } + ADVANCE_BATCH(); +} + +/** * Hardware-generated binding tables for the resource streamer */ void diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h index f8ef98f..2eff1b5 100644 --- a/src/mesa/drivers/dri/i965/brw_state.h +++ b/src/mesa/drivers/dri/i965/brw_state.h @@ -374,6 +374,15 @@ gen7_upload_constant_state(struct brw_context *brw, bool active, unsigned opcode); void gen7_rs_control(struct brw_context *brw, int enable); + +void gen7_edit_hw_binding_table_entry(struct brw_context *brw, + gl_shader_stage stage, + uint32_t index, + uint32_t surf_offset); +void gen7_update_binding_table_from_array(struct brw_context *brw, + gl_shader_stage stage, + const uint32_t* binding_table, + int num_surfaces); void gen7_enable_hw_binding_tables(struct brw_context *brw); void gen7_disable_hw_binding_tables(struct brw_context *brw); void gen7_reset_hw_bt_pool_offsets(struct brw_context *brw); -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/6] i965: Disable resource streamer in BLORP
Switch off hardware-generated binding tables and gather push constants in the blorp. Blorp requires only a minimal set of simple constants. There is no need for the extra complexity to program a gather table entry into the pipeline. Cc: kenn...@whitecape.org Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com --- src/mesa/drivers/dri/i965/gen7_blorp.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp b/src/mesa/drivers/dri/i965/gen7_blorp.cpp index abace6d..9822dc1 100644 --- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp +++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp @@ -794,6 +794,8 @@ gen7_blorp_exec(struct brw_context *brw, } depthstencil_offset = gen6_blorp_emit_depth_stencil_state(brw, params); gen7_blorp_emit_depth_stencil_state_pointers(brw, depthstencil_offset); + if (brw-use_resource_streamer) + gen7_disable_hw_binding_tables(brw); if (params-use_wm_prog) { uint32_t wm_surf_offset_renderbuffer; uint32_t wm_surf_offset_texture = 0; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/6] i965: Enable hardware-generated binding tables on render path.
This patch implements the binding table enable command which is also used to allocate a binding table pool where where hardware-generated binding table entries are flushed into. Each binding table offset in the binding table pool is unique per each shader stage that are enabled within a batch. Also insert the required brw_tracked_state objects to enable hw-generated binding tables in normal render path. v2: - Use MOCS in binding table pool alloc for GEN8 - Fix spurious offset when allocating binding table pool entry and start from zero instead. v3 - Include GEN8 fix for spurious offset above. Cc: kenn...@whitecape.org Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com --- src/mesa/drivers/dri/i965/brw_binding_tables.c | 87 ++ src/mesa/drivers/dri/i965/brw_context.c| 4 ++ src/mesa/drivers/dri/i965/brw_context.h| 6 ++ src/mesa/drivers/dri/i965/brw_state.h | 6 ++ src/mesa/drivers/dri/i965/brw_state_upload.c | 4 ++ src/mesa/drivers/dri/i965/gen7_disable.c | 4 +- src/mesa/drivers/dri/i965/gen8_disable.c | 4 +- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 4 ++ 8 files changed, 115 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c b/src/mesa/drivers/dri/i965/brw_binding_tables.c index 98ff0dd..6bc540f 100644 --- a/src/mesa/drivers/dri/i965/brw_binding_tables.c +++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c @@ -170,6 +170,93 @@ const struct brw_tracked_state brw_gs_binding_table = { .emit = brw_gs_upload_binding_table, }; +/** + * Hardware-generated binding tables for the resource streamer + */ +void +gen7_disable_hw_binding_tables(struct brw_context *brw) +{ + int pkt_len = brw-gen = 8 ? 4 : 3; + + BEGIN_BATCH(3); + OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC 16 | (pkt_len - 2)); + OUT_BATCH(brw-is_haswell ? HSW_BT_POOL_ALLOC_MUST_BE_ONE : 0); + OUT_BATCH(0); + ADVANCE_BATCH(); + + /* From the Haswell PRM, Volume 7: 3D Media GPGPU, +* 3DSTATE_BINDING_TABLE_POOL_ALLOC Programming Note: +* +* When switching between HW and SW binding table generation, SW must +* issue a state cache invalidate. +*/ + brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE); +} + +void +gen7_enable_hw_binding_tables(struct brw_context *brw) +{ + if (!brw-use_resource_streamer) + return; + + if (!brw-hw_bt_pool.bo) { + /* We use a single re-usable buffer object for the lifetime of the + * context and size it to maximum allowed binding tables that can be + * programmed per batch: + * + * From the Haswell PRM, Volume 7: 3D Media GPGPU, + * 3DSTATE_BINDING_TABLE_POOL_ALLOC Programming Note: + * A maximum of 16,383 Binding tables are allowed in any batch buffer + */ + static const int max_size = 16383 * 4; + brw-hw_bt_pool.bo = drm_intel_bo_alloc(brw-bufmgr, hw_bt, + max_size, 64); + brw-hw_bt_pool.next_offset = 0; + } + + int pkt_len = brw-gen = 8 ? 4 : 3; + uint32_t dw1 = BRW_HW_BINDING_TABLE_ENABLE; + if (brw-is_haswell) + dw1 |= SET_FIELD(GEN7_MOCS_L3, GEN7_HW_BT_POOL_MOCS) | + HSW_BT_POOL_ALLOC_MUST_BE_ONE; + else if (brw-gen = 8) + dw1 |= BDW_MOCS_WB; + + BEGIN_BATCH(3); + OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC 16 | (pkt_len - 2)); + if (brw-gen = 8) { + OUT_RELOC64(brw-hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, dw1); + OUT_BATCH(brw-hw_bt_pool.bo-size); + } else { + OUT_RELOC(brw-hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, dw1); + OUT_RELOC(brw-hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, + brw-hw_bt_pool.bo-size); + } + ADVANCE_BATCH(); + + /* From the Haswell PRM, Volume 7: 3D Media GPGPU, +* 3DSTATE_BINDING_TABLE_POOL_ALLOC Programming Note: +* +* When switching between HW and SW binding table generation, SW must +* issue a state cache invalidate. +*/ + brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE); +} + +void +gen7_reset_hw_bt_pool_offsets(struct brw_context *brw) +{ + brw-hw_bt_pool.next_offset = 0; +} + +const struct brw_tracked_state gen7_hw_binding_tables = { + .dirty = { + .mesa = 0, + .brw = BRW_NEW_BATCH, + }, + .emit = gen7_enable_hw_binding_tables +}; + /** @} */ /** diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 6b08216..f2bd2d3 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -936,6 +936,10 @@ intelDestroyContext(__DRIcontext * driContextPriv) if (brw-wm.base.scratch_bo) drm_intel_bo_unreference(brw-wm.base.scratch_bo); + gen7_reset_hw_bt_pool_offsets(brw); + drm_intel_bo_unreference(brw-hw_bt_pool.bo); + brw-hw_bt_pool.bo = NULL; + drm_intel_gem_context_destroy(brw-hw_ctx); if (ctx-swrast_context) { diff --git
[Mesa-dev] i965: hardware-generated binding tables
Time for another rebase, yet again. My previous patches were getting stale... Changes since last posting: - Always inspect the getparam.value in the I915_PARAM_HAS_RESOURCE_STREAMER ioctl as suggested by Chris Wilson. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa : NULL check InfoLog
-Original Message- From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of Matt Turner Sent: Tuesday, June 23, 2015 11:05 PM To: Marta Lofstedt Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH] mesa : NULL check InfoLog On Tue, Jun 23, 2015 at 4:03 AM, Marta Lofstedt marta.lofst...@linux.intel.com wrote: From: Marta Lofstedt marta.lofst...@intel.com When a program is compiled, but linking failed the sh-InfoLog could be NULL. This is expoloited by OpenGL ES 3.1 conformance tests. Signed-off-by: Marta Lofstedt marta.lofst...@intel.com --- src/mesa/main/shaderapi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c index a4296ad..c783c69 100644 --- a/src/mesa/main/shaderapi.c +++ b/src/mesa/main/shaderapi.c @@ -1920,8 +1920,8 @@ _mesa_create_shader_program(struct gl_context* ctx, GLboolean separate, } #endif } - -ralloc_strcat(shProg-InfoLog, sh-InfoLog); + if (sh-InfoLog) + ralloc_strcat(shProg-InfoLog, sh-InfoLog); Wrong indentation on this line. There is a strange mix of tabs and space indentation in this file. I hope it is Correct of me to use space indentation in my V2. Surely just not writing to the info log isn't the right fix? If it's null, shouldn't we instead ralloc_strdup() the string? I ralloc_strdup the shProg-InfoLog in my V2. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev -- Intel Sweden AB Registered Office: Knarrarnasgatan 15, 164 40 Kista, Stockholm, Sweden Registration Number: 556189-6027 This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] gallium/hud: fix issue w/ tgsi_to_nir
This documentation was only written for TEMPs. I agree that TEMP indexing doesn't necessarily need arrays to work, but INPUT and OUTPUT indexing always needs them. Marek On Thu, Jun 25, 2015 at 1:31 AM, Rob Clark robdcl...@gmail.com wrote: tgsi.rst currently says: If no ArrayID is specified with an indirect addressing operand the whole register file might be accessed by this operand. This is strongly discouraged and will prevent packing of scalar/vec2 arrays and effective alias analysis. I'd be pretty happy to remove it and replace it w/ something to the effect of indirect addressing where no ArrayID is specified is undefined :-) BR, -R On Wed, Jun 24, 2015 at 6:17 PM, Marek Olšák mar...@gmail.com wrote: Indirect addressing without the ArrayID is undefined in the general case. You need the ArrayID to be able to tell where the array declaration starts and what its semantic name is. Marek On Wed, Jun 24, 2015 at 8:35 PM, Rob Clark robdcl...@gmail.com wrote: On Wed, Jun 24, 2015 at 12:31 PM, Marek Olšák mar...@gmail.com wrote: Yes, ArrayID != 0 means it's an array, otherwise it's just a compact way to add declarations. ok.. hmm, I guess tgsi.rst needs an update then.. I recently added the array support for inputs and outputs, we just need to enable it on non-radeon non-swrast drivers: http://cgit.freedesktop.org/mesa/mesa/commit/?id=b6ebe7eabf54936a02acc0968e718e0c264a73f5 ok, in principle (after fixing ttn based on the assumption that we won't get indirect access if ArrayID==0, and now that tgsi f/e is dropped) freedreno should be ok for array in/out's.. I'll enable the cap and see what happens after fixing ttn and debugging some indirect issues (seems either there is something I don't understand about the hw yet, or maybe an a4xx bug.. right now shaders that I think should work aren't and I'm in android hell trying to get some more traces from blob :-/) BR, -R It would be nice to enable the arrays on all drivers instead of working around it indefinitely. Marek On Wed, Jun 24, 2015 at 1:31 PM, Rob Clark robdcl...@gmail.com wrote: Ok, I *thought* we didn't get ArrayID on IN/OUT, but only TEMP? If it is safe to assume that we always get ArrayID that makes it much easier. BR, -R On Wed, Jun 24, 2015 at 5:39 AM, Marek Olšák mar...@gmail.com wrote: It's not an array, because the ArrayID is 0. It's a valid non-array declaration. If any TGSI user doesn't understand it, that user should be fixed. Marek On Tue, Jun 23, 2015 at 3:20 PM, Rob Clark robdcl...@gmail.com wrote: From: Rob Clark robcl...@freedesktop.org Ok, so actually there is a ttn issue here to fix as well.. but it brought up a question in my mind. When ttn sees something like DCL IN[0..1] it will treat that as an array (which in the end will result in constraints about where the registers get allocated. Which is not really ideal. With glsl we don't actually get input arrays (but instead a bunch of MOV's to a TEMP array) currently. So I'm not quite sure how an actual input array should look. (But my preference would be IN[a..b] for arrays and IN[c] otherwise) --- src/gallium/auxiliary/hud/hud_context.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c index 6a124f7..2b6d3a7 100644 --- a/src/gallium/auxiliary/hud/hud_context.c +++ b/src/gallium/auxiliary/hud/hud_context.c @@ -1163,7 +1163,8 @@ hud_create(struct pipe_context *pipe, struct cso_context *cso) { static const char *vertex_shader_text = { VERT\n - DCL IN[0..1]\n + DCL IN[0]\n + DCL IN[1]\n DCL OUT[0], POSITION\n DCL OUT[1], COLOR[0]\n /* color */ DCL OUT[2], GENERIC[0]\n /* texcoord */ -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] glsls: Modify exec_list to avoid strict-aliasing violations
Hi, On 06/25/2015 02:48 AM, Davin McCall wrote: In terms of performance: (export LIBGL_ALWAYS_SOFTWARE=1; time glmark2) For Intel driver, INTEL_NO_HW=1 could be used. (Do other drivers have something similar?) -fno-strict-aliasing: glmark2 Score: 244 real5m34.707s user11m36.192s sys0m29.596s with strict aliasing: glmark2 Score: 247 real5m34.438s user11m29.904s sys0m29.556s Again, only a very small improvement when strict aliasing is enabled. With the above in mind it is reasonable to question whether this patch is worthwhile. IMHO to evaluate that, we would need data for more drivers, not just for SW rendering. However, it's done, and it seems to work, so I offer it for review. - Eero ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallium/hud: prevent NULL pointer dereference with pipe_query functions
What's the point of drawing a HUD pane if a query cannot be created? Can we detect this during initialization? Marek On Wed, Jun 24, 2015 at 9:26 PM, Samuel Pitoiset samuel.pitoi...@gmail.com wrote: The HUD doesn't check if query_create() fails and it calls other pipe_query functions with NULL pointer instead of a valid query object. Signed-off-by: Samuel Pitoiset samuel.pitoi...@gmail.com --- src/gallium/auxiliary/hud/hud_driver_query.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/src/gallium/auxiliary/hud/hud_driver_query.c b/src/gallium/auxiliary/hud/hud_driver_query.c index 603aba7..ee71678 100644 --- a/src/gallium/auxiliary/hud/hud_driver_query.c +++ b/src/gallium/auxiliary/hud/hud_driver_query.c @@ -62,7 +62,8 @@ query_new_value(struct hud_graph *gr) uint64_t now = os_time_get(); if (info-last_time) { - pipe-end_query(pipe, info-query[info-head]); + if (info-query[info-head]) + pipe-end_query(pipe, info-query[info-head]); /* read query results */ while (1) { @@ -70,7 +71,7 @@ query_new_value(struct hud_graph *gr) union pipe_query_result result; uint64_t *res64 = (uint64_t *)result; - if (pipe-get_query_result(pipe, query, FALSE, result)) { + if (query pipe-get_query_result(pipe, query, FALSE, result)) { info-results_cumulative += res64[info-result_index]; info-num_results++; @@ -88,7 +89,8 @@ query_new_value(struct hud_graph *gr) gallium_hud: all queries are busy after %i frames, can't add another query\n, NUM_QUERIES); - pipe-destroy_query(pipe, info-query[info-head]); + if (info-query[info-head]) + pipe-destroy_query(pipe, info-query[info-head]); info-query[info-head] = pipe-create_query(pipe, info-query_type, 0); } @@ -113,15 +115,15 @@ query_new_value(struct hud_graph *gr) info-results_cumulative = 0; info-num_results = 0; } - - pipe-begin_query(pipe, info-query[info-head]); } else { /* initialize */ info-last_time = now; info-query[info-head] = pipe-create_query(pipe, info-query_type, 0); - pipe-begin_query(pipe, info-query[info-head]); } + + if (info-query[info-head]) + pipe-begin_query(pipe, info-query[info-head]); } static void -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] glsls: Modify exec_list to avoid strict-aliasing violations
On 25/06/15 14:32, Eero Tamminen wrote: Hi, On 06/25/2015 03:53 PM, Davin McCall wrote: On 25/06/15 12:27, Eero Tamminen wrote: On 06/25/2015 02:48 AM, Davin McCall wrote: In terms of performance: (export LIBGL_ALWAYS_SOFTWARE=1; time glmark2) For Intel driver, INTEL_NO_HW=1 could be used. (Do other drivers have something similar?) Unfortunately I do not have an Intel display set up. If you can get libINTEL_DEVID_OVERRIDEdrm to use libdrm_intel, you can fake desired InteL HW to Mesa with INTEL_DEVID_OVERRIDE environment variable. Similarly to INTEL_NO_HW, it prevents batches from being submitted to GPU. Ok, thanks, I'll look into this shortly. Any pointers on how to get libdrm to use libdrm_intel? When testing 3D driver CPU side optimizations, one should either use test specifically written for testing driver CPU overhead (proprietary benchmarks have such tests) or force test-case to be CPU bound e.g. with INTEL_NO_HW. Understood. The 'user' time divided by the glmark2 score should however give a relative indication of the CPU processing required per frame, right? Davin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: remove unnecessary checks in _mesa_readpixels_needs_slow_path
Gallium should be alright. We'll let you know if we find a regression, but I don't think there will be any. Reviewed-by: Marek Olšák marek.ol...@amd.com Marek On Thu, Jun 25, 2015 at 2:45 AM, Anuj Phogat anuj.pho...@gmail.com wrote: On Tue, Jun 23, 2015 at 3:34 AM, Iago Toral Quiroga ito...@igalia.com wrote: readpixels_can_use_memcpy will later call _mesa_format_matches_format_and_type which does much tighter checks than these to decide if we can use memcpy for readpixels. Also, the checks do not seem to be extensive enough anyway, since we are checking for signed/unsigned conversion only when the framebuffer has integers, but the same checks could be done for other types anyway, since as long as there is a signed/unsigned conversion we can't memcpy. No regressions observed on i965/llvmpipe. --- The way gallium uses _mesa_format_matches_format_and_type and _mesa_readpixels_needs_slow_path is a bit different, they call these functions to decide if they want to fallback to Mesa's implementation of ReadPixels, not exactly to check if we can memcpy... so it is unclear to me if some combinations may still need these checks to make the right decision. I did not see any regressions with llvmpipe at least, so I am assuming that they are not needed, but maybe someone wants to give this patch a test run on nouveau or radeon, just in case. If they are needed I guess the right thing would be to move the checks to st_cb_readpixels.c. src/mesa/main/readpix.c | 16 1 file changed, 16 deletions(-) diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c index a3357cd..e256695 100644 --- a/src/mesa/main/readpix.c +++ b/src/mesa/main/readpix.c @@ -128,7 +128,6 @@ _mesa_readpixels_needs_slow_path(const struct gl_context *ctx, GLenum format, { struct gl_renderbuffer *rb = _mesa_get_read_renderbuffer_for_format(ctx, format); - GLenum srcType; assert(rb); @@ -153,21 +152,6 @@ _mesa_readpixels_needs_slow_path(const struct gl_context *ctx, GLenum format, return GL_TRUE; } - /* Conversion between signed and unsigned integers needs masking - * (it isn't just memcpy). */ - srcType = _mesa_get_format_datatype(rb-Format); - - if ((srcType == GL_INT - (type == GL_UNSIGNED_INT || -type == GL_UNSIGNED_SHORT || -type == GL_UNSIGNED_BYTE)) || - (srcType == GL_UNSIGNED_INT - (type == GL_INT || -type == GL_SHORT || -type == GL_BYTE))) { - return GL_TRUE; - } - /* And finally, see if there are any transfer ops. */ return get_readpixels_transfer_ops(ctx, rb-Format, format, type, uses_blit) != 0; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev Thanks for the patch Iago. Reviewed-by: Anuj Phogat anuj.pho...@gmail.com You might want to wait few days to hear any comments on the gallium usage of the function. If no one objects you can push it upstream. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 90264] [Regression, bisected] Tooltip corruption in Chrome
https://bugs.freedesktop.org/show_bug.cgi?id=90264 --- Comment #21 from Jose P. lbdkm...@sharklasers.com --- This seems to have been fixed, I can't see the corruption anymore. I'm using Chromium 43.0.2357.81 on Ubuntu 14.04 (64-bit) and Mesa 10.7.0-devel (git-20dca37 2015-06-23 trusty-oibaf-ppa). -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure: use $target_cpu, not $host_cpu when setting asm_arch
I've got 32-bit libs building on 64-bit Ubuntu and Fedora. But I've found a weird problem. On Fedora 22, for example, Mesa's make install creates a /usr/lib/libGL.la file which contains the line: dependency_libs=' -L/usr/lib -lexpat -L/usr/lib64 /usr/lib64/libglapi.la -lXext -lXdamage -lXfixes -lX11-xcb -lX11 -lxcb-glx -lxcb-dri2 -lxcb -lXxf86vm /usr/lib/libdrm.la -ludev -lm -lpthread -ldl' Note the -L/usr/lib/64 and /usr/lib64/libglapi.la parts. When I try to build 32-bit Mesa demos, libtools finds this libGL.la file and tries to link the 32-bit glxgears.o (for example) with /usr/lib64/libglapi.so. That fails, of course. Removing the /usr/lib/libGL.la file is a work-around but I'm interested in finding a better solution. -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] gallium/hud: fix issue w/ tgsi_to_nir
hmm, well I still think tgsi.rst should get some clarification, if for no other reason than I misunderstood it (and therefore sooner or later I guess someone else will too).. I can take a shot at a doc patch.. BR, -R On Thu, Jun 25, 2015 at 8:00 AM, Marek Olšák mar...@gmail.com wrote: This documentation was only written for TEMPs. I agree that TEMP indexing doesn't necessarily need arrays to work, but INPUT and OUTPUT indexing always needs them. Marek On Thu, Jun 25, 2015 at 1:31 AM, Rob Clark robdcl...@gmail.com wrote: tgsi.rst currently says: If no ArrayID is specified with an indirect addressing operand the whole register file might be accessed by this operand. This is strongly discouraged and will prevent packing of scalar/vec2 arrays and effective alias analysis. I'd be pretty happy to remove it and replace it w/ something to the effect of indirect addressing where no ArrayID is specified is undefined :-) BR, -R On Wed, Jun 24, 2015 at 6:17 PM, Marek Olšák mar...@gmail.com wrote: Indirect addressing without the ArrayID is undefined in the general case. You need the ArrayID to be able to tell where the array declaration starts and what its semantic name is. Marek On Wed, Jun 24, 2015 at 8:35 PM, Rob Clark robdcl...@gmail.com wrote: On Wed, Jun 24, 2015 at 12:31 PM, Marek Olšák mar...@gmail.com wrote: Yes, ArrayID != 0 means it's an array, otherwise it's just a compact way to add declarations. ok.. hmm, I guess tgsi.rst needs an update then.. I recently added the array support for inputs and outputs, we just need to enable it on non-radeon non-swrast drivers: http://cgit.freedesktop.org/mesa/mesa/commit/?id=b6ebe7eabf54936a02acc0968e718e0c264a73f5 ok, in principle (after fixing ttn based on the assumption that we won't get indirect access if ArrayID==0, and now that tgsi f/e is dropped) freedreno should be ok for array in/out's.. I'll enable the cap and see what happens after fixing ttn and debugging some indirect issues (seems either there is something I don't understand about the hw yet, or maybe an a4xx bug.. right now shaders that I think should work aren't and I'm in android hell trying to get some more traces from blob :-/) BR, -R It would be nice to enable the arrays on all drivers instead of working around it indefinitely. Marek On Wed, Jun 24, 2015 at 1:31 PM, Rob Clark robdcl...@gmail.com wrote: Ok, I *thought* we didn't get ArrayID on IN/OUT, but only TEMP? If it is safe to assume that we always get ArrayID that makes it much easier. BR, -R On Wed, Jun 24, 2015 at 5:39 AM, Marek Olšák mar...@gmail.com wrote: It's not an array, because the ArrayID is 0. It's a valid non-array declaration. If any TGSI user doesn't understand it, that user should be fixed. Marek On Tue, Jun 23, 2015 at 3:20 PM, Rob Clark robdcl...@gmail.com wrote: From: Rob Clark robcl...@freedesktop.org Ok, so actually there is a ttn issue here to fix as well.. but it brought up a question in my mind. When ttn sees something like DCL IN[0..1] it will treat that as an array (which in the end will result in constraints about where the registers get allocated. Which is not really ideal. With glsl we don't actually get input arrays (but instead a bunch of MOV's to a TEMP array) currently. So I'm not quite sure how an actual input array should look. (But my preference would be IN[a..b] for arrays and IN[c] otherwise) --- src/gallium/auxiliary/hud/hud_context.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c index 6a124f7..2b6d3a7 100644 --- a/src/gallium/auxiliary/hud/hud_context.c +++ b/src/gallium/auxiliary/hud/hud_context.c @@ -1163,7 +1163,8 @@ hud_create(struct pipe_context *pipe, struct cso_context *cso) { static const char *vertex_shader_text = { VERT\n - DCL IN[0..1]\n + DCL IN[0]\n + DCL IN[1]\n DCL OUT[0], POSITION\n DCL OUT[1], COLOR[0]\n /* color */ DCL OUT[2], GENERIC[0]\n /* texcoord */ -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] glsls: Modify exec_list to avoid strict-aliasing violations
Hi Eero, On 25/06/15 12:27, Eero Tamminen wrote: Hi, On 06/25/2015 02:48 AM, Davin McCall wrote: In terms of performance: (export LIBGL_ALWAYS_SOFTWARE=1; time glmark2) For Intel driver, INTEL_NO_HW=1 could be used. (Do other drivers have something similar?) Unfortunately I do not have an Intel display set up. What exactly would INTEL_NO_HW=1 do? Again, only a very small improvement when strict aliasing is enabled. With the above in mind it is reasonable to question whether this patch is worthwhile. IMHO to evaluate that, we would need data for more drivers, not just for SW rendering. Given that we are talking about using a compiler option when compiling Mesa itself, I do not imagine that we would see a more significant change in performance when using HW assisted rendering. The code that runs on the GPU is the same in either case. Using the R600 driver (and Radeon 6770 hardware) I get: With -fno-strict-aliasing: glmark2 Score: 1614 real5m31.729s user0m43.828s sys0m20.892s With strict aliasing: glmark2 Score: 1626 real5m31.725s user0m43.496s sys0m20.924s This is all on x86, 32-bit. It's possible that more of an improvement would be seen on less register-starved architectures (including Intel x86-64), since more register spills/loads can potentially be avoided. I don't have the capability to test that right now. It's entirely possible that compiling without -fno-strict-aliasing on x86-64 will not (at present, even with this patch applied) produce a working library. Davin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: Delete linked GLSL IR when using NIR.
Huh I see this went in already, I've noticed a problem and thought to share it. Currently program resource list (used by gl api shader queries) is generated in linker, before backend LinkShader call. What this means is that it relies on frontend optimization passes to get rid of dead inputs and outputs. But .. this does not seem to always happen, sometimes these get removed only during backend optimization passes. I have a bug on this as #90925. There's 2 possibilities to move with this, either move resource list creation to happen after LinkShader (which would mean we should not free IR during LinkShader) or try to fix frontend dead code removal to recognize the case in the bug. I will keep digging why the variable in question is not recognized by the frontend passes, just wanted to let you know! On 06/11/2015 07:40 PM, Jason Ekstrand wrote: On Thu, Jun 11, 2015 at 12:41 AM, Tapani Pälli tapani.pa...@intel.com wrote: This is based on Kenneth's patch to delete 'most of the IR'. Due to linker changes to clone variables, we can now free all of IR. Saves 58MB of memory when replaying a Dota 2 trace on Broadwell. I think we've saved ~50 MB 3 times now on that one dota trace. Good work guys! Signed-off-by: Tapani Pälli tapani.pa...@intel.com --- src/mesa/drivers/dri/i965/brw_shader.cpp | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index 76285f2..99de1cd 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -297,8 +297,11 @@ brw_link_shader(struct gl_context *ctx, struct gl_shader_program *shProg) brw_add_texrect_params(prog); - if (options-NirOptions) + if (options-NirOptions) { prog-nir = brw_create_nir(brw, shProg, prog, (gl_shader_stage) stage); + ralloc_free(shader-ir); + shader-ir = NULL; + } _mesa_reference_program(ctx, prog, NULL); } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] mesa : NULL check InfoLog
From: Marta Lofstedt marta.lofst...@intel.com When a program is compiled, but linking failed the sh-InfoLog could be NULL. This is expoloited by OpenGL ES 3.1 conformance tests. V2: ralloc_strdup shProg-InfoLog Signed-off-by: Marta Lofstedt marta.lofst...@intel.com --- src/mesa/main/shaderapi.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c index a4296ad..bc6625a 100644 --- a/src/mesa/main/shaderapi.c +++ b/src/mesa/main/shaderapi.c @@ -1921,7 +1921,10 @@ _mesa_create_shader_program(struct gl_context* ctx, GLboolean separate, #endif } -ralloc_strcat(shProg-InfoLog, sh-InfoLog); + if (sh-InfoLog) +ralloc_strcat(shProg-InfoLog, sh-InfoLog); + else +ralloc_strdup(ctx, shProg-InfoLog); } delete_shader(ctx, shader); -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] i965: Stop aux data compare preventing program binary re-use
Items in the program cache consist of three things: key, the data representing the instructions and auxiliary data representing uniform storage. The data consisting of instructions is stored into a drm buffer object while the key and the auxiliary data reside in malloced section. Now the cache uploading is equipped with a check that iterates over existing items and seeks to find a another item using identical instruction data than the one being just uploaded. If such is found there is no need to add another section into the drm buffer object holding identical copy of the existing one. The item just being uploaded should instead simply point to the same offset in the underlying drm buffer object. Unfortunately the check for the matching instruction data is coupled with a check for matching auxiliary data also. This effectively prevents the cache from ever containing two items that could share a section in the drm buffer object. The constraint for the instruction data and auxiliary data to match is, fortunately, unnecessary strong. When items are stored into the cache they will anyway contain their own copy of the auxiliary data (even if they matched - which they in real world never will). The only thing the items would be sharing is the instruction data and hence we should only check for that to match and nothing else. No piglit regression in jenkins. CC: Kenneth Graunke kenn...@whitecape.org Signed-off-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_state_cache.c | 52 +++-- 1 file changed, 20 insertions(+), 32 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c index d42b4b4..a0ec280 100644 --- a/src/mesa/drivers/dri/i965/brw_state_cache.c +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c @@ -200,36 +200,23 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t new_size) } /** - * Attempts to find an item in the cache with identical data and aux - * data to use + * Attempts to find an item in the cache with identical data. */ -static bool -brw_try_upload_using_copy(struct brw_cache *cache, - struct brw_cache_item *result_item, - const void *data, - const void *aux) +static const struct brw_cache_item * +brw_lookup_prog(const struct brw_cache *cache, +enum brw_cache_id cache_id, +const void *data, unsigned data_size) { - struct brw_context *brw = cache-brw; + const struct brw_context *brw = cache-brw; int i; - struct brw_cache_item *item; + const struct brw_cache_item *item; for (i = 0; i cache-size; i++) { for (item = cache-items[i]; item; item = item-next) { -const void *item_aux = item-key + item-key_size; int ret; -if (item-cache_id != result_item-cache_id || -item-size != result_item-size || -item-aux_size != result_item-aux_size) { - continue; -} - - if (cache-aux_compare[result_item-cache_id]) { -if (!cache-aux_compare[result_item-cache_id](item_aux, aux)) - continue; - } else if (memcmp(item_aux, aux, item-aux_size) != 0) { +if (item-cache_id != cache_id || item-size != data_size) continue; -} if (!brw-has_llc) drm_intel_bo_map(cache-bo, false); @@ -239,13 +226,11 @@ brw_try_upload_using_copy(struct brw_cache *cache, if (ret) continue; -result_item-offset = item-offset; - -return true; +return item; } } - return false; + return NULL; } static uint32_t @@ -294,6 +279,8 @@ brw_upload_cache(struct brw_cache *cache, { struct brw_context *brw = cache-brw; struct brw_cache_item *item = CALLOC_STRUCT(brw_cache_item); + const struct brw_cache_item *matching_data = + brw_lookup_prog(cache, cache_id, data, data_size); GLuint hash; void *tmp; @@ -305,14 +292,15 @@ brw_upload_cache(struct brw_cache *cache, hash = hash_key(item); item-hash = hash; - /* If we can find a matching prog/prog_data combo in the cache -* already, then reuse the existing stuff. This will mean not -* flagging CACHE_NEW_* when transitioning between the two -* equivalent hash keys. This is notably useful for programs -* generating shaders at runtime, where multiple shaders may -* compile to the thing in our backend. + /* If we can find a matching prog in the cache already, then reuse the +* existing stuff without creating new copy into the underlying buffer +* object. This is notably useful for programs generating shaders at +* runtime, where multiple shaders may compile to the same thing in our +* backend. */ - if (!brw_try_upload_using_copy(cache, item, data, aux)) { + if (matching_data) { + item-offset = matching_data-offset; + } else {
[Mesa-dev] [PATCH 2/3] i965: Only write program to cache when it doesn't exist yet
Current logic re-writes the same data when existing data is found. Not that this actually matters at the moment in practice, the contraint for finding matching data is too severe to ever allow data to be shared between two items in the cache. CC: Kenneth Graunke kenn...@whitecape.org Signed-off-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_state_cache.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c index 97a41b9..d42b4b4 100644 --- a/src/mesa/drivers/dri/i965/brw_state_cache.c +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c @@ -314,6 +314,13 @@ brw_upload_cache(struct brw_cache *cache, */ if (!brw_try_upload_using_copy(cache, item, data, aux)) { item-offset = brw_alloc_item_data(cache, data_size); + + /* Copy data to the buffer */ + if (brw-has_llc) { + memcpy((char *)cache-bo-virtual + item-offset, data, data_size); + } else { + drm_intel_bo_subdata(cache-bo, item-offset, data_size, data); + } } /* Set up the memory containing the key and aux_data */ @@ -332,13 +339,6 @@ brw_upload_cache(struct brw_cache *cache, cache-items[hash] = item; cache-n_items++; - /* Copy data to the buffer */ - if (brw-has_llc) { - memcpy((char *) cache-bo-virtual + item-offset, data, data_size); - } else { - drm_intel_bo_subdata(cache-bo, item-offset, data_size, data); - } - *out_offset = item-offset; *(void **)out_aux = (void *)((char *)item-key + item-key_size); cache-brw-ctx.NewDriverState |= 1 cache_id; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] i965: Rename brw_upload_item_data to brw_alloc_item_data
and simplify the interface to take directly the size and to return the offset. The routine does nothing more than allocate, it doesn't upload anything. CC: Kenneth Graunke kenn...@whitecape.org Signed-off-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_state_cache.c | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c index 157b33d..97a41b9 100644 --- a/src/mesa/drivers/dri/i965/brw_state_cache.c +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c @@ -248,18 +248,17 @@ brw_try_upload_using_copy(struct brw_cache *cache, return false; } -static void -brw_upload_item_data(struct brw_cache *cache, -struct brw_cache_item *item, -const void *data) +static uint32_t +brw_alloc_item_data(struct brw_cache *cache, uint32_t size) { + uint32_t offset; struct brw_context *brw = cache-brw; /* Allocate space in the cache BO for our new program. */ - if (cache-next_offset + item-size cache-bo-size) { + if (cache-next_offset + size cache-bo-size) { uint32_t new_size = cache-bo-size * 2; - while (cache-next_offset + item-size new_size) + while (cache-next_offset + size new_size) new_size *= 2; brw_cache_new_bo(cache, new_size); @@ -273,10 +272,12 @@ brw_upload_item_data(struct brw_cache *cache, brw_cache_new_bo(cache, cache-bo-size); } - item-offset = cache-next_offset; + offset = cache-next_offset; /* Programs are always 64-byte aligned, so set up the next one now */ - cache-next_offset = ALIGN(item-offset + item-size, 64); + cache-next_offset = ALIGN(offset + size, 64); + + return offset; } void @@ -312,7 +313,7 @@ brw_upload_cache(struct brw_cache *cache, * compile to the thing in our backend. */ if (!brw_try_upload_using_copy(cache, item, data, aux)) { - brw_upload_item_data(cache, item, data); + item-offset = brw_alloc_item_data(cache, data_size); } /* Set up the memory containing the key and aux_data */ -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 90264] [Regression, bisected] Tooltip corruption in Chrome
https://bugs.freedesktop.org/show_bug.cgi?id=90264 Boyan Ding stu_...@126.com changed: What|Removed |Added CC||stu_...@126.com --- Comment #22 from Boyan Ding stu_...@126.com --- This bug also occur on my computer with Intel graphics. It seems that it only occur with DRI3 enabled, tested with both xf86-video-intel and xf86-video-modesetting drivers. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] glsls: Modify exec_list to avoid strict-aliasing violations
Hi, On 06/25/2015 03:53 PM, Davin McCall wrote: On 25/06/15 12:27, Eero Tamminen wrote: On 06/25/2015 02:48 AM, Davin McCall wrote: In terms of performance: (export LIBGL_ALWAYS_SOFTWARE=1; time glmark2) For Intel driver, INTEL_NO_HW=1 could be used. (Do other drivers have something similar?) Unfortunately I do not have an Intel display set up. If you can get libINTEL_DEVID_OVERRIDEdrm to use libdrm_intel, you can fake desired InteL HW to Mesa with INTEL_DEVID_OVERRIDE environment variable. Similarly to INTEL_NO_HW, it prevents batches from being submitted to GPU. What exactly would INTEL_NO_HW=1 do? http://www.mesa3d.org/envvars.html INTEL_NO_HW - if set to 1, prevents batches from being submitted to the hardware. I.e. driver does only CPU / user-space side things and skips GPU / kernel side work. Again, only a very small improvement when strict aliasing is enabled. With the above in mind it is reasonable to question whether this patch is worthwhile. IMHO to evaluate that, we would need data for more drivers, not just for SW rendering. Given that we are talking about using a compiler option when compiling Mesa itself, I do not imagine that we would see a more significant change in performance when using HW assisted rendering. The code that runs on the GPU is the same in either case. When testing 3D driver CPU side optimizations, one should either use test specifically written for testing driver CPU overhead (proprietary benchmarks have such tests) or force test-case to be CPU bound e.g. with INTEL_NO_HW. Using the R600 driver (and Radeon 6770 hardware) I get: With -fno-strict-aliasing: glmark2 Score: 1614 real5m31.729s user0m43.828s sys0m20.892s With strict aliasing: glmark2 Score: 1626 real5m31.725s user0m43.496s sys0m20.924s This is all on x86, 32-bit. It's possible that more of an improvement would be seen on less register-starved architectures (including Intel x86-64), since more register spills/loads can potentially be avoided. I don't have the capability to test that right now. It's entirely possible that compiling without -fno-strict-aliasing on x86-64 will not (at present, even with this patch applied) produce a working library. - Eero ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 02/19] i965/fs: Actually set/use the mlen for gen7 uniform pull constant loads
Previously, we were allocating the payload with different sizes per gen and then figuring out the mlen in the generator based on gen. This meant, among other things, that the higher level passes knew nothing about it. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 19 --- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 9 +++-- 2 files changed, 15 insertions(+), 13 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 3ec8e6a..31dfb24 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2909,14 +2909,18 @@ fs_visitor::lower_uniform_pull_constant_loads() assert(const_offset_reg.file == IMM const_offset_reg.type == BRW_REGISTER_TYPE_UD); const_offset_reg.fixed_hw_reg.dw1.ud /= 4; - fs_reg payload = fs_reg(GRF, alloc.allocate(1)); - /* We have to use a message header on Skylake to get SIMD4x2 mode. - * Reserve space for the register. - */ + fs_reg payload, offset; if (devinfo-gen = 9) { -payload.reg_offset++; -alloc.sizes[payload.reg] = 2; +/* We have to use a message header on Skylake to get SIMD4x2 + * mode. Reserve space for the register. +*/ +offset = payload = fs_reg(GRF, alloc.allocate(2)); +offset.reg_offset++; +inst-mlen = 2; + } else { +offset = payload = fs_reg(GRF, alloc.allocate(1)); +inst-mlen = 1; } /* This is actually going to be a MOV, but since only the first dword @@ -2925,7 +2929,7 @@ fs_visitor::lower_uniform_pull_constant_loads() * by live variable analysis, or register allocation will explode. */ fs_inst *setup = new(mem_ctx) fs_inst(FS_OPCODE_SET_SIMD4X2_OFFSET, - 8, payload, const_offset_reg); + 8, offset, const_offset_reg); setup-force_writemask_all = true; setup-ir = inst-ir; @@ -2938,6 +2942,7 @@ fs_visitor::lower_uniform_pull_constant_loads() */ inst-opcode = FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7; inst-src[1] = payload; + inst-base_mrf = -1; invalidate_live_intervals(); } else { diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index 2ed0bac..8d821ab 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -1054,7 +1054,6 @@ fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst, struct brw_reg index, struct brw_reg offset) { - assert(inst-mlen == 0); assert(index.type == BRW_REGISTER_TYPE_UD); assert(offset.file == BRW_GENERAL_REGISTER_FILE); @@ -1069,12 +1068,10 @@ fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst, struct brw_reg src = offset; bool header_present = false; - int mlen = 1; if (devinfo-gen = 9) { /* Skylake requires a message header in order to use SIMD4x2 mode. */ - src = retype(brw_vec4_grf(offset.nr - 1, 0), BRW_REGISTER_TYPE_UD); - mlen = 2; + src = retype(brw_vec4_grf(offset.nr, 0), BRW_REGISTER_TYPE_UD); header_present = true; brw_push_insn_state(p); @@ -1105,7 +1102,7 @@ fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst, 0, /* LD message ignores sampler unit */ GEN5_SAMPLER_MESSAGE_SAMPLE_LD, 1, /* rlen */ - mlen, + inst-mlen, header_present, BRW_SAMPLER_SIMD_MODE_SIMD4X2, 0); @@ -1135,7 +1132,7 @@ fs_generator::generate_uniform_pull_constant_load_gen7(fs_inst *inst, 0, /* LD message ignores sampler unit */ GEN5_SAMPLER_MESSAGE_SAMPLE_LD, 1, /* rlen */ - mlen, + inst-mlen, header_present, BRW_SAMPLER_SIMD_MODE_SIMD4X2, 0); -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 05/19] i965/fs: Explicitly set the exec_size on the add(32) in interpolation setup
Soon we will start using the builder to explicitly set all the execution sizes. We could make a 32-wide builder, but the builder asserts that we never grow it which is usually a reasonable assumption. Sinc this one instruction is a bit of an odd-ball, we just set the exec_size explicitly. Reviewed-by: Iago Toral Quiroga ito...@igalia.com v2: Explicitly new the fs_inst instead of using the builder and setting exec_size after the fact. v3: Set force_writemask_all with the builder instead of directly. The builder over-writes it if we set it manually. Also, if we don't have force_writemask_all in the builder it will assert-fail on SIMD32. --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 9a4bad6..8976c25 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1357,10 +1357,12 @@ fs_visitor::emit_interpolation_setup_gen6() */ fs_reg int_pixel_xy(GRF, alloc.allocate(dispatch_width / 8), BRW_REGISTER_TYPE_UW, dispatch_width * 2); - abld.exec_all() - .ADD(int_pixel_xy, - fs_reg(stride(suboffset(g1_uw, 4), 1, 4, 0)), - fs_reg(brw_imm_v(0x11001010))); + fs_inst *add = + new (mem_ctx) fs_inst(BRW_OPCODE_ADD, dispatch_width * 2, + int_pixel_xy, + fs_reg(stride(suboffset(g1_uw, 4), 1, 4, 0)), + fs_reg(brw_imm_v(0x11001010))); + abld.exec_all().emit(add); this-pixel_x = vgrf(glsl_type::float_type); this-pixel_y = vgrf(glsl_type::float_type); -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 10/19] i965/fs: Make better use of the builder in shader_time
Previously, we were just depending on register widths to ensure that various things were exec_size of 1 etc. Now, we do so explicitly using the builder. Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_fs.cpp | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 9855bfb..8024fae 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -557,7 +557,7 @@ fs_visitor::get_timestamp(const fs_builder bld) /* We want to read the 3 fields we care about even if it's not enabled in * the dispatch. */ - bld.exec_all().MOV(dst, ts); + bld.group(4, 0).exec_all().MOV(dst, ts); /* The caller wants the low 32 bits of the timestamp. Since it's running * at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds, @@ -604,17 +604,19 @@ fs_visitor::emit_shader_time_end() start.negate = true; fs_reg diff = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 1); diff.set_smear(0); - ibld.ADD(diff, start, shader_end_time); + + const fs_builder cbld = ibld.group(1, 0); + cbld.group(1, 0).ADD(diff, start, shader_end_time); /* If there were no instructions between the two timestamp gets, the diff * is 2 cycles. Remove that overhead, so I can forget about that when * trying to determine the time taken for single instructions. */ - ibld.ADD(diff, diff, fs_reg(-2u)); - SHADER_TIME_ADD(ibld, 0, diff); - SHADER_TIME_ADD(ibld, 1, fs_reg(1u)); + cbld.ADD(diff, diff, fs_reg(-2u)); + SHADER_TIME_ADD(cbld, 0, diff); + SHADER_TIME_ADD(cbld, 1, fs_reg(1u)); ibld.emit(BRW_OPCODE_ELSE); - SHADER_TIME_ADD(ibld, 2, fs_reg(1u)); + SHADER_TIME_ADD(cbld, 2, fs_reg(1u)); ibld.emit(BRW_OPCODE_ENDIF); } -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 12/19] i965/fs: Use exec_size for determining regs read/written and partial writes
Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_fs.cpp | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index d1e253a..4f56865 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -101,7 +101,7 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg dst, case MRF: case ATTR: this-regs_written = - DIV_ROUND_UP(MAX2(dst.width * dst.stride, 1) * type_sz(dst.type), 32); + DIV_ROUND_UP(MAX2(exec_size * dst.stride, 1) * type_sz(dst.type), 32); break; case BAD_FILE: this-regs_written = 0; @@ -675,7 +675,7 @@ bool fs_inst::is_partial_write() const { return ((this-predicate this-opcode != BRW_OPCODE_SEL) || - (this-dst.width * type_sz(this-dst.type)) 32 || + (this-exec_size * type_sz(this-dst.type)) 32 || !this-dst.is_contiguous()); } @@ -729,8 +729,8 @@ fs_inst::regs_read(int arg) const if (src[arg].stride == 0) { return 1; } else { - int size = src[arg].width * src[arg].stride * type_sz(src[arg].type); - return (size + 31) / 32; + int size = this-exec_size * src[arg].stride * type_sz(src[arg].type); + return DIV_ROUND_UP(size, 32); } case MRF: unreachable(MRF registers are not allowed as sources); -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 04/19] i965/fs: Report the right value in fs_inst::regs_read() for PIXEL_X/Y
Reviewed-by: Iago Toral Quiroga ito...@igalia.com Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 589b74c..6cf9e96 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -726,6 +726,12 @@ fs_inst::regs_read(int arg) const return exec_size / 4; break; + case FS_OPCODE_PIXEL_X: + case FS_OPCODE_PIXEL_Y: + if (arg == 0) + return 2; + break; + default: if (is_tex() arg == 0 src[0].file == GRF) return mlen; -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 01/19] i965/fs: Use a switch statement in fs_inst::regs_read()
This makes things a little simpler, more efficient, and quite a bit more readable. Reviewed-by: Iago Toral Quiroga ito...@igalia.com Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_fs.cpp | 45 ++-- 1 file changed, 23 insertions(+), 22 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 4292aa6..3ec8e6a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -701,28 +701,29 @@ fs_inst::is_partial_write() const int fs_inst::regs_read(int arg) const { - if (is_tex() arg == 0 src[0].file == GRF) { - return mlen; - } else if (opcode == FS_OPCODE_FB_WRITE arg == 0) { - return mlen; - } else if (opcode == SHADER_OPCODE_URB_WRITE_SIMD8 arg == 0) { - return mlen; - } else if (opcode == SHADER_OPCODE_UNTYPED_ATOMIC arg == 0) { - return mlen; - } else if (opcode == SHADER_OPCODE_UNTYPED_SURFACE_READ arg == 0) { - return mlen; - } else if (opcode == SHADER_OPCODE_UNTYPED_SURFACE_WRITE arg == 0) { - return mlen; - } else if (opcode == SHADER_OPCODE_TYPED_ATOMIC arg == 0) { - return mlen; - } else if (opcode == SHADER_OPCODE_TYPED_SURFACE_READ arg == 0) { - return mlen; - } else if (opcode == SHADER_OPCODE_TYPED_SURFACE_WRITE arg == 0) { - return mlen; - } else if (opcode == FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET arg == 0) { - return mlen; - } else if (opcode == FS_OPCODE_LINTERP arg == 0) { - return exec_size / 4; + switch (opcode) { + case FS_OPCODE_FB_WRITE: + case SHADER_OPCODE_URB_WRITE_SIMD8: + case SHADER_OPCODE_UNTYPED_ATOMIC: + case SHADER_OPCODE_UNTYPED_SURFACE_READ: + case SHADER_OPCODE_UNTYPED_SURFACE_WRITE: + case SHADER_OPCODE_TYPED_ATOMIC: + case SHADER_OPCODE_TYPED_SURFACE_READ: + case SHADER_OPCODE_TYPED_SURFACE_WRITE: + case FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET: + if (arg == 0) + return mlen; + break; + + case FS_OPCODE_LINTERP: + if (arg == 0) + return exec_size / 4; + break; + + default: + if (is_tex() arg == 0 src[0].file == GRF) + return mlen; + break; } switch (src[arg].file) { -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 17/19] i965/fs: Use exec_size instead of dst.width for computing component size
There are a variety of places where we use dst.width / 8 to compute the size of a single logical channel. Instead, we should be using exec_size. Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_fs.cpp| 6 +++--- src/mesa/drivers/dri/i965/brw_fs_cse.cpp| 2 +- src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp | 2 +- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp| 2 +- src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp | 4 ++-- 5 files changed, 8 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index aeaa1c4..6e8d9e8 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2274,12 +2274,12 @@ fs_visitor::opt_register_renaming() if (depth == 0 inst-dst.file == GRF - alloc.sizes[inst-dst.reg] == inst-dst.width / 8 + alloc.sizes[inst-dst.reg] == inst-exec_size / 8 !inst-is_partial_write()) { if (remap[dst] == -1) { remap[dst] = dst; } else { -remap[dst] = alloc.allocate(inst-dst.width / 8); +remap[dst] = alloc.allocate(inst-exec_size / 8); inst-dst.reg = remap[dst]; progress = true; } @@ -2410,7 +2410,7 @@ fs_visitor::compute_to_mrf() /* Things returning more than one register would need us to * understand coalescing out more than one MOV at a time. */ -if (scan_inst-regs_written scan_inst-dst.width / 8) +if (scan_inst-regs_written scan_inst-exec_size / 8) break; /* SEND instructions can't have MRF as a destination. */ diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp index 29d1f2a..29b46b9 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp @@ -179,7 +179,7 @@ static void create_copy_instr(const fs_builder bld, fs_inst *inst, fs_reg src, bool negate) { int written = inst-regs_written; - int dst_width = inst-dst.width / 8; + int dst_width = inst-exec_size / 8; const fs_builder ubld = bld.group(inst-exec_size, inst-force_sechalf) .exec_all(inst-force_writemask_all); fs_inst *copy; diff --git a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp index 2ad7079..149c0f0 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp @@ -196,7 +196,7 @@ fs_visitor::register_coalesce() continue; } reg_to_offset[offset] = inst-dst.reg_offset; - if (inst-src[0].width == 16) + if (inst-exec_size == 16) reg_to_offset[offset + 1] = inst-dst.reg_offset + 1; mov[offset] = inst; channels_remaining -= inst-regs_written; diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 7651e96..df24fb9 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -912,7 +912,7 @@ fs_visitor::emit_texture(ir_texture_opcode op, bld.emit(SHADER_OPCODE_INT_QUOTIENT, fixed_depth, depth, fs_reg(6)); fs_reg *fixed_payload = ralloc_array(mem_ctx, fs_reg, inst-regs_written); - int components = inst-regs_written / (dst.width / 8); + int components = inst-regs_written / (inst-exec_size / 8); for (int i = 0; i components; i++) { if (i == 2) { fixed_payload[i] = fixed_depth; diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp index ee0add5..b49961f 100644 --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp @@ -1314,8 +1314,8 @@ fs_instruction_scheduler::choose_instruction_to_schedule() * single-result send is probably actually reducing register * pressure. */ - if (inst-regs_written = inst-dst.width / 8 - chosen_inst-regs_written chosen_inst-dst.width / 8) { + if (inst-regs_written = inst-exec_size / 8 + chosen_inst-regs_written chosen_inst-exec_size / 8) { chosen = n; continue; } else if (inst-regs_written chosen_inst-regs_written) { -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 15/19] i965/fs: Use the builder dispatch width instead of dst.width for pull constants
Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_fs.cpp | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 6e45fa7..aeaa1c4 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -188,7 +188,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder bld, bld.ADD(vec4_offset, varying_offset, fs_reg(const_offset ~3)); int scale = 1; - if (devinfo-gen == 4 dst.width == 8) { + if (devinfo-gen == 4 bld.dispatch_width() == 8) { /* Pre-gen5, we can either use a SIMD8 message that requires (header, * u, v, r) as parameters, or we can just use the SIMD16 message * consisting of (header, u). We choose the second, at the cost of a @@ -204,9 +204,9 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder bld, op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD; assert(dst.width % 8 == 0); - int regs_written = 4 * (dst.width / 8) * scale; + int regs_written = 4 * (bld.dispatch_width() / 8) * scale; fs_reg vec4_result = fs_reg(GRF, alloc.allocate(regs_written), - dst.type, dst.width); + dst.type, bld.dispatch_width()); fs_inst *inst = bld.emit(op, vec4_result, surf_index, vec4_offset); inst-regs_written = regs_written; @@ -216,7 +216,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder bld, if (devinfo-gen == 4) inst-mlen = 3; else - inst-mlen = 1 + dispatch_width / 8; + inst-mlen = 1 + bld.dispatch_width() / 8; } bld.MOV(dst, offset(vec4_result, bld, (const_offset 3) * scale)); -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 19/19] i965/fs: Remove the width field from fs_reg
As of now, the width field is no longer used for anything. The width field seemed like a good idea at the time but is actually entirely redundant with the instruction's execution size. Initially, it gave us the ability to easily set the instructions execution size based entirely on register widths. With the builder, we can easiliy set the sizes explicitly and the width field doesn't have as much purpose. At this point, it's just redundant information that can get out of sync so it really needs to go. Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_fs.cpp | 62 -- src/mesa/drivers/dri/i965/brw_fs_builder.h | 19 ++- .../drivers/dri/i965/brw_fs_copy_propagation.cpp | 4 -- src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 6 +-- src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 4 +- .../drivers/dri/i965/brw_fs_register_coalesce.cpp | 1 - src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 26 - src/mesa/drivers/dri/i965/brw_ir_fs.h | 15 +- 8 files changed, 30 insertions(+), 107 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 6e8d9e8..a96424e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -203,10 +203,8 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder bld, else op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD; - assert(dst.width % 8 == 0); int regs_written = 4 * (bld.dispatch_width() / 8) * scale; - fs_reg vec4_result = fs_reg(GRF, alloc.allocate(regs_written), - dst.type, bld.dispatch_width()); + fs_reg vec4_result = fs_reg(GRF, alloc.allocate(regs_written), dst.type); fs_inst *inst = bld.emit(op, vec4_result, surf_index, vec4_offset); inst-regs_written = regs_written; @@ -310,7 +308,6 @@ fs_inst::is_copy_payload(const brw::simple_allocator grf_alloc) const for (int i = 0; i this-sources; i++) { reg.type = this-src[i].type; - reg.width = this-src[i].width; if (!this-src[i].equals(reg)) return false; @@ -366,7 +363,6 @@ fs_reg::fs_reg(float f) this-file = IMM; this-type = BRW_REGISTER_TYPE_F; this-fixed_hw_reg.dw1.f = f; - this-width = 1; } /** Immediate value constructor. */ @@ -376,7 +372,6 @@ fs_reg::fs_reg(int32_t i) this-file = IMM; this-type = BRW_REGISTER_TYPE_D; this-fixed_hw_reg.dw1.d = i; - this-width = 1; } /** Immediate value constructor. */ @@ -386,7 +381,6 @@ fs_reg::fs_reg(uint32_t u) this-file = IMM; this-type = BRW_REGISTER_TYPE_UD; this-fixed_hw_reg.dw1.ud = u; - this-width = 1; } /** Vector float immediate value constructor. */ @@ -417,7 +411,6 @@ fs_reg::fs_reg(struct brw_reg fixed_hw_reg) this-file = HW_REG; this-fixed_hw_reg = fixed_hw_reg; this-type = fixed_hw_reg.type; - this-width = 1 fixed_hw_reg.width; } bool @@ -432,7 +425,6 @@ fs_reg::equals(const fs_reg r) const abs == r.abs !reladdr !r.reladdr memcmp(fixed_hw_reg, r.fixed_hw_reg, sizeof(fixed_hw_reg)) == 0 - width == r.width stride == r.stride); } @@ -504,7 +496,7 @@ fs_visitor::get_timestamp(const fs_builder bld) 0), BRW_REGISTER_TYPE_UD)); - fs_reg dst = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 4); + fs_reg dst = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD); /* We want to read the 3 fields we care about even if it's not enabled in * the dispatch. @@ -554,7 +546,7 @@ fs_visitor::emit_shader_time_end() fs_reg start = shader_start_time; start.negate = true; - fs_reg diff = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 1); + fs_reg diff = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD); diff.set_smear(0); const fs_builder cbld = ibld.group(1, 0); @@ -803,7 +795,7 @@ fs_visitor::vgrf(const glsl_type *const type) { int reg_width = dispatch_width / 8; return fs_reg(GRF, alloc.allocate(type_size(type) * reg_width), - brw_type_for_base_type(type), dispatch_width); + brw_type_for_base_type(type)); } /** Fixed HW reg constructor. */ @@ -813,14 +805,6 @@ fs_reg::fs_reg(enum register_file file, int reg) this-file = file; this-reg = reg; this-type = BRW_REGISTER_TYPE_F; - - switch (file) { - case UNIFORM: - this-width = 1; - break; - default: - this-width = 8; - } } /** Fixed HW reg constructor. */ @@ -830,25 +814,6 @@ fs_reg::fs_reg(enum register_file file, int reg, enum brw_reg_type type) this-file = file; this-reg = reg; this-type = type; - - switch (file) { - case UNIFORM: - this-width = 1; - break; - default: - this-width = 8; - } -} - -/** Fixed HW reg constructor. */ -fs_reg::fs_reg(enum
[Mesa-dev] [PATCH v2 14/19] i965/fs: Remove exec_size guessing from fs_inst::init()
Now that all of the non-explicit constructors are gone, we don't need to guess anymore. Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_fs.cpp | 22 -- 1 file changed, 22 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 4f56865..6e45fa7 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -68,28 +68,6 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg dst, assert(dst.file != IMM dst.file != UNIFORM); - /* If exec_size == 0, try to guess it from the registers. Since all -* manner of things may use hardware registers, we first try to guess -* based on GRF registers. If this fails, we will go ahead and take the -* width from the destination register. -*/ - if (this-exec_size == 0) { - if (dst.file == GRF) { - this-exec_size = dst.width; - } else { - for (unsigned i = 0; i sources; ++i) { -if (src[i].file != GRF src[i].file != ATTR) - continue; - -if (this-exec_size = 1) - this-exec_size = src[i].width; -assert(src[i].width == 1 || src[i].width == this-exec_size); - } - } - - if (this-exec_size == 0 dst.file != BAD_FILE) - this-exec_size = dst.width; - } assert(this-exec_size != 0); this-conditional_mod = BRW_CONDITIONAL_NONE; -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 18/19] i965/fs_generator: Use inst-exec_size for determining hardware reg widths
Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index 8d821ab..0a70bdc 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -48,7 +48,7 @@ static uint32_t brw_file_from_reg(fs_reg *reg) } static struct brw_reg -brw_reg_from_fs_reg(fs_reg *reg) +brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg) { struct brw_reg brw_reg; @@ -57,10 +57,10 @@ brw_reg_from_fs_reg(fs_reg *reg) case MRF: if (reg-stride == 0) { brw_reg = brw_vec1_reg(brw_file_from_reg(reg), reg-reg, 0); - } else if (reg-width 8) { + } else if (inst-exec_size 8) { brw_reg = brw_vec8_reg(brw_file_from_reg(reg), reg-reg, 0); - brw_reg = stride(brw_reg, reg-width * reg-stride, - reg-width, reg-stride); + brw_reg = stride(brw_reg, inst-exec_size * reg-stride, + inst-exec_size, reg-stride); } else { /* From the Haswell PRM: * @@ -414,7 +414,7 @@ fs_generator::generate_blorp_fb_write(fs_inst *inst) brw_fb_WRITE(p, 16 /* dispatch_width */, brw_message_reg(inst-base_mrf), -brw_reg_from_fs_reg(inst-src[0]), +brw_reg_from_fs_reg(inst, inst-src[0]), BRW_DATAPORT_RENDER_TARGET_WRITE_SIMD16_SINGLE_SOURCE, inst-target, inst-mlen, @@ -1560,7 +1560,7 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width) annotate(p-devinfo, annotation, cfg, inst, p-next_insn_offset); for (unsigned int i = 0; i inst-sources; i++) { -src[i] = brw_reg_from_fs_reg(inst-src[i]); +src[i] = brw_reg_from_fs_reg(inst, inst-src[i]); /* The accumulator result appears to get used for the * conditional modifier generation. When negating a UD @@ -1572,7 +1572,7 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width) inst-src[i].type != BRW_REGISTER_TYPE_UD || !inst-src[i].negate); } - dst = brw_reg_from_fs_reg(inst-dst); + dst = brw_reg_from_fs_reg(inst, inst-dst); brw_set_default_predicate_control(p, inst-predicate); brw_set_default_predicate_inverse(p, inst-predicate_inverse); -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 13/19] i965/fs_builder: Use the dispatch width for setting exec sizes
Previously we used dst.width but the two *should* be the same. Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_fs_builder.h | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h b/src/mesa/drivers/dri/i965/brw_fs_builder.h index c823190..8af16a0 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h @@ -235,7 +235,7 @@ namespace brw { instruction * emit(enum opcode opcode, const dst_reg dst) const { - return emit(instruction(opcode, dst.width, dst)); + return emit(instruction(opcode, dispatch_width(), dst)); } /** @@ -253,11 +253,11 @@ namespace brw { case SHADER_OPCODE_SIN: case SHADER_OPCODE_COS: return fix_math_instruction( - emit(instruction(opcode, dst.width, dst, + emit(instruction(opcode, dispatch_width(), dst, fix_math_operand(src0; default: -return emit(instruction(opcode, dst.width, dst, src0)); +return emit(instruction(opcode, dispatch_width(), dst, src0)); } } @@ -273,12 +273,12 @@ namespace brw { case SHADER_OPCODE_INT_QUOTIENT: case SHADER_OPCODE_INT_REMAINDER: return fix_math_instruction( - emit(instruction(opcode, dst.width, dst, + emit(instruction(opcode, dispatch_width(), dst, fix_math_operand(src0), fix_math_operand(src1; default: -return emit(instruction(opcode, dst.width, dst, src0, src1)); +return emit(instruction(opcode, dispatch_width(), dst, src0, src1)); } } @@ -295,13 +295,14 @@ namespace brw { case BRW_OPCODE_BFI2: case BRW_OPCODE_MAD: case BRW_OPCODE_LRP: -return emit(instruction(opcode, dst.width, dst, +return emit(instruction(opcode, dispatch_width(), dst, fix_3src_operand(src0), fix_3src_operand(src1), fix_3src_operand(src2))); default: -return emit(instruction(opcode, dst.width, dst, src0, src1, src2)); +return emit(instruction(opcode, dispatch_width(), dst, +src0, src1, src2)); } } @@ -517,7 +518,8 @@ namespace brw { { assert(dst.width % 8 == 0); instruction *inst = emit(instruction(SHADER_OPCODE_LOAD_PAYLOAD, - dst.width, dst, src, sources)); + dispatch_width(), dst, + src, sources)); inst-header_size = header_size; for (unsigned i = 0; i header_size; i++) @@ -528,7 +530,7 @@ namespace brw { for (unsigned i = header_size; i sources; ++i) assert(src[i].file != GRF || src[i].width == dst.width); - inst-regs_written += (sources - header_size) * (dst.width / 8); + inst-regs_written += (sources - header_size) * (dispatch_width() / 8); return inst; } -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: add support for viewport array (v2)
In that case, feel free to push. Reviewed-by: Marek Olšák marek.ol...@amd.com Marek On Thu, Jun 25, 2015 at 10:25 PM, Dave Airlie airl...@gmail.com wrote: On 26 June 2015 at 00:26, Marek Olšák mar...@gmail.com wrote: Hi Dave, The change in si_shader_io_get_unique_index can be dropped. The function is only used for shaders before GS. Ok okay I was hitting the assert in there for the layer/viewport index cases, but if the patch you pushed to master helps I'll drop it. This looks good, but I've had a different plan for this feature: Yeah I thought you might, I just wanted to hack something up to see it working since it seems new Dota 2 uses this feature. I'd like the states to be converted into 2 atoms: 1 r600_atom for all 16 viewports 1 r600_atom for all 16 scissors Each atom should have a bitmask saying which slots are dirty. (the same idea as resource slots) The emit functions should only emit dirty viewports/scissors. Also, the emit functions shouldn't emit non-zero viewports/scissors if the viewport index isn't written by the hardware VS stage (si_get_vs_info(sctx)-...). This should keep the same level of effectiveness as before. When a shader that writes the viewport index is bound *and* there are any dirty viewports or scissors, that's the right time to mark the atoms as dirty again, so that non-zero dirty viewports/scissors are finally emitted. I'm not sure if I'll get to doing more of it, I just had a day at home, and wanted to get up to speed on radeonsi a bit, so I can just leave that here, in case anyone wants to pick it up, Dave. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/4] nir: remove nir_src_get_parent_instr()
Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com On Thu, Jun 25, 2015 at 12:29 PM, Connor Abbott cwabbo...@gmail.com wrote: It's now unused. Signed-off-by: Connor Abbott cwabbo...@gmail.com --- src/glsl/nir/nir.h | 10 -- 1 file changed, 10 deletions(-) diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index 2116f60..b33c9c5 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -565,16 +565,6 @@ nir_src_for_reg(nir_register *reg) return src; } -static inline nir_instr * -nir_src_get_parent_instr(const nir_src *src) -{ - if (src-is_ssa) { - return src-ssa-parent_instr; - } else { - return src-reg.reg-parent_instr; - } -} - static inline nir_dest nir_dest_for_reg(nir_register *reg) { -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4 0/6] Continue enabling Open GL ES 3.1
Reviewed-by: Tapani Pälli tapani.pa...@intel.com (no v4 really needed for the little nitpicks but I forgot to say this) On 06/25/2015 12:08 PM, Marta Lofstedt wrote: This are for the V4s. Marta Lofstedt (6): mesa/es3.1: enable GL_ARB_shader_image_load_store for gles3.1 mesa/es3.1: enable GL_ARB_shader_atomic_counters for GLES 3.1 mesa/es3.1: enable GL_ARB_texture_multisample for GLES 3.1 mesa/es3.1: enable GL_ARB_texture_gather for GLES 3.1 mesa/es3.1: enable GL_ARB_compute_shader for GLES 3.1 mesa/es3.1: enable GL_ARB_explicit_uniform_location for GLES 3.1 src/mesa/main/get.c | 36 + src/mesa/main/get_hash_params.py | 86 +--- 2 files changed, 82 insertions(+), 40 deletions(-) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 00/19] i965/fs: Remove the width field from fs_reg
This is a re-send of the series I did a week or two ago to remove the width field from the fs_reg class. I really didn't want to do a re-send but there have been enough fixes since then that I thought it was worth re-sending. Most of these patches have already been reviewed but not all. 02: New. Needs to be reviewed by someone familiar with SKL 03: Needs re-review. This one is affected by 02. 05: Needs re-review. This one went through a lot of changes to actually get it right. It should be the way we want now. 08: New. It's just moving code around so it should be trivial. 09: New. This is a complete replacement of patch 07 from the previous series. Cc: Topi Pohjolainen topi.pohjolai...@intel.com Cc: Iago Toral Quiroga ito...@igalia.com Cc: Francisco Jerez curroje...@riseup.net Cc: Neil Roberts n...@linux.intel.com Jason Ekstrand (19): i965/fs: Use a switch statement in fs_inst::regs_read() i965/fs: Actually set/use the mlen for gen7 uniform pull constant loads i965/fs: Fix fs_inst::regs_read() for uniform pull constant loads i965/fs: Report the right value in fs_inst::regs_read() for PIXEL_X/Y i965/fs: Explicitly set the exec_size on the add(32) in interpolation setup i965/fs: Set the builder group for emitting FB-write stencil/AA alpha i965/blorp: Explicitly set execution sizes for new'd instructions i965/fs: Move offset(fs_reg, unsigned) to brw_fs.h i965/fs: Add a builder argument to offset() i965/fs: Make better use of the builder in shader_time i965/fs: Remove fs_inst constructors that don't take an explicit exec_size i965/fs: Use exec_size for determining regs read/written and partial writes i965/fs_builder: Use the dispatch width for setting exec sizes i965/fs: Remove exec_size guessing from fs_inst::init() i965/fs: Use the builder dispatch width instead of dst.width for pull constants i965/fs: Use the builder dispatch_width for computing register offsets i965/fs: Use exec_size instead of dst.width for computing component size i965/fs_generator: Use inst-exec_size for determining hardware reg widths i965/fs: Remove the width field from fs_reg src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp| 9 +- src/mesa/drivers/dri/i965/brw_fs.cpp | 266 - src/mesa/drivers/dri/i965/brw_fs.h | 21 ++ src/mesa/drivers/dri/i965/brw_fs_builder.h | 37 ++- .../drivers/dri/i965/brw_fs_copy_propagation.cpp | 4 - src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 10 +- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 23 +- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 64 ++--- src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 4 +- .../drivers/dri/i965/brw_fs_register_coalesce.cpp | 3 +- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 183 +++--- src/mesa/drivers/dri/i965/brw_ir_fs.h | 45 +--- .../drivers/dri/i965/brw_schedule_instructions.cpp | 4 +- 13 files changed, 287 insertions(+), 386 deletions(-) -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: add support for viewport array (v2)
On 26 June 2015 at 00:26, Marek Olšák mar...@gmail.com wrote: Hi Dave, The change in si_shader_io_get_unique_index can be dropped. The function is only used for shaders before GS. Ok okay I was hitting the assert in there for the layer/viewport index cases, but if the patch you pushed to master helps I'll drop it. This looks good, but I've had a different plan for this feature: Yeah I thought you might, I just wanted to hack something up to see it working since it seems new Dota 2 uses this feature. I'd like the states to be converted into 2 atoms: 1 r600_atom for all 16 viewports 1 r600_atom for all 16 scissors Each atom should have a bitmask saying which slots are dirty. (the same idea as resource slots) The emit functions should only emit dirty viewports/scissors. Also, the emit functions shouldn't emit non-zero viewports/scissors if the viewport index isn't written by the hardware VS stage (si_get_vs_info(sctx)-...). This should keep the same level of effectiveness as before. When a shader that writes the viewport index is bound *and* there are any dirty viewports or scissors, that's the right time to mark the atoms as dirty again, so that non-zero dirty viewports/scissors are finally emitted. I'm not sure if I'll get to doing more of it, I just had a day at home, and wanted to get up to speed on radeonsi a bit, so I can just leave that here, in case anyone wants to pick it up, Dave. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 11/19] i965/fs: Remove fs_inst constructors that don't take an explicit exec_size
Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_fs.cpp | 30 ++ src/mesa/drivers/dri/i965/brw_fs_builder.h | 2 +- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 6 -- src/mesa/drivers/dri/i965/brw_ir_fs.h | 9 + 4 files changed, 8 insertions(+), 39 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 8024fae..d1e253a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -126,9 +126,9 @@ fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size) init(opcode, exec_size, reg_undef, NULL, 0); } -fs_inst::fs_inst(enum opcode opcode, const fs_reg dst) +fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst) { - init(opcode, 0, dst, NULL, 0); + init(opcode, exec_size, dst, NULL, 0); } fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst, @@ -138,12 +138,6 @@ fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst, init(opcode, exec_size, dst, src, 1); } -fs_inst::fs_inst(enum opcode opcode, const fs_reg dst, const fs_reg src0) -{ - const fs_reg src[1] = { src0 }; - init(opcode, 0, dst, src, 1); -} - fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst, const fs_reg src0, const fs_reg src1) { @@ -151,13 +145,6 @@ fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst, init(opcode, exec_size, dst, src, 2); } -fs_inst::fs_inst(enum opcode opcode, const fs_reg dst, const fs_reg src0, - const fs_reg src1) -{ - const fs_reg src[2] = { src0, src1 }; - init(opcode, 0, dst, src, 2); -} - fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst, const fs_reg src0, const fs_reg src1, const fs_reg src2) { @@ -165,19 +152,6 @@ fs_inst::fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst, init(opcode, exec_size, dst, src, 3); } -fs_inst::fs_inst(enum opcode opcode, const fs_reg dst, const fs_reg src0, - const fs_reg src1, const fs_reg src2) -{ - const fs_reg src[3] = { src0, src1, src2 }; - init(opcode, 0, dst, src, 3); -} - -fs_inst::fs_inst(enum opcode opcode, const fs_reg dst, - const fs_reg src[], unsigned sources) -{ - init(opcode, 0, dst, src, sources); -} - fs_inst::fs_inst(enum opcode opcode, uint8_t exec_width, const fs_reg dst, const fs_reg src[], unsigned sources) { diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h b/src/mesa/drivers/dri/i965/brw_fs_builder.h index 58ac598..c823190 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h @@ -235,7 +235,7 @@ namespace brw { instruction * emit(enum opcode opcode, const dst_reg dst) const { - return emit(instruction(opcode, dst)); + return emit(instruction(opcode, dst.width, dst)); } /** diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 61eb904..50d6014 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -109,7 +109,8 @@ fs_visitor::nir_setup_inputs(nir_shader *shader) if (var-data.location == VARYING_SLOT_POS) { reg = *emit_fragcoord_interpolation(var-data.pixel_center_integer, var-data.origin_upper_left); -emit_percomp(bld, fs_inst(BRW_OPCODE_MOV, input, reg), 0xF); +emit_percomp(bld, fs_inst(BRW_OPCODE_MOV, bld.dispatch_width(), + input, reg), 0xF); } else { emit_general_interpolation(input, var-name, var-type, (glsl_interp_qualifier) var-data.interpolation, @@ -1743,7 +1744,8 @@ fs_visitor::nir_emit_texture(const fs_builder bld, nir_tex_instr *instr) fs_reg dest = get_nir_dest(instr-dest); dest.type = this-result.type; unsigned num_components = nir_tex_instr_dest_size(instr); - emit_percomp(bld, fs_inst(BRW_OPCODE_MOV, dest, this-result), + emit_percomp(bld, fs_inst(BRW_OPCODE_MOV, bld.dispatch_width(), + dest, this-result), (1 num_components) - 1); } diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h b/src/mesa/drivers/dri/i965/brw_ir_fs.h index 16b20be..d6b617a 100644 --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h @@ -189,20 +189,13 @@ public: fs_inst(); fs_inst(enum opcode opcode, uint8_t exec_size); - fs_inst(enum opcode opcode, const fs_reg dst); + fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst); fs_inst(enum opcode opcode, uint8_t exec_size, const fs_reg dst, const fs_reg src0); - fs_inst(enum opcode opcode, const fs_reg dst, const
[Mesa-dev] [PATCH v2 09/19] i965/fs: Add a builder argument to offset()
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 42 src/mesa/drivers/dri/i965/brw_fs.h | 2 +- src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 2 +- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 58 +-- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 143 ++- 5 files changed, 128 insertions(+), 119 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 6cf9e96..9855bfb 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -267,7 +267,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder bld, inst-mlen = 1 + dispatch_width / 8; } - bld.MOV(dst, offset(vec4_result, (const_offset 3) * scale)); + bld.MOV(dst, offset(vec4_result, bld, (const_offset 3) * scale)); } /** @@ -361,7 +361,12 @@ fs_inst::is_copy_payload(const brw::simple_allocator grf_alloc) const reg.width = this-src[i].width; if (!this-src[i].equals(reg)) return false; - reg = ::offset(reg, 1); + + if (i this-header_size) { + reg.reg_offset += 1; + } else { + reg.reg_offset += this-exec_size / 8; + } } return true; @@ -920,7 +925,7 @@ fs_visitor::emit_fragcoord_interpolation(bool pixel_center_integer, } else { bld.ADD(wpos, this-pixel_x, fs_reg(0.5f)); } - wpos = offset(wpos, 1); + wpos = offset(wpos, bld, 1); /* gl_FragCoord.y */ if (!flip pixel_center_integer) { @@ -936,7 +941,7 @@ fs_visitor::emit_fragcoord_interpolation(bool pixel_center_integer, bld.ADD(wpos, pixel_y, fs_reg(offset)); } - wpos = offset(wpos, 1); + wpos = offset(wpos, bld, 1); /* gl_FragCoord.z */ if (devinfo-gen = 6) { @@ -946,7 +951,7 @@ fs_visitor::emit_fragcoord_interpolation(bool pixel_center_integer, this-delta_xy[BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC], interp_reg(VARYING_SLOT_POS, 2)); } - wpos = offset(wpos, 1); + wpos = offset(wpos, bld, 1); /* gl_FragCoord.w: Already set up in emit_interpolation */ bld.MOV(wpos, this-wpos_w); @@ -1029,7 +1034,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, const char *name, /* If there's no incoming setup data for this slot, don't * emit interpolation for it. */ - attr = offset(attr, type-vector_elements); + attr = offset(attr, bld, type-vector_elements); location++; continue; } @@ -1044,7 +1049,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, const char *name, interp = suboffset(interp, 3); interp.type = attr.type; bld.emit(FS_OPCODE_CINTERP, attr, fs_reg(interp)); - attr = offset(attr, 1); + attr = offset(attr, bld, 1); } } else { /* Smooth/noperspective interpolation case. */ @@ -1082,7 +1087,7 @@ fs_visitor::emit_general_interpolation(fs_reg attr, const char *name, if (devinfo-gen 6 interpolation_mode == INTERP_QUALIFIER_SMOOTH) { bld.MUL(attr, attr, this-pixel_w); } - attr = offset(attr, 1); + attr = offset(attr, bld, 1); } } @@ -1190,7 +1195,7 @@ fs_visitor::emit_samplepos_setup() } /* Compute gl_SamplePosition.x */ compute_sample_position(pos, int_sample_x); - pos = offset(pos, 1); + pos = offset(pos, abld, 1); if (dispatch_width == 8) { abld.MOV(int_sample_y, fs_reg(suboffset(sample_pos_reg, 1))); } else { @@ -2980,10 +2985,6 @@ fs_visitor::lower_load_payload() assert(inst-dst.file == MRF || inst-dst.file == GRF); assert(inst-saturate == false); - - const fs_builder ibld = bld.group(inst-exec_size, inst-force_sechalf) - .exec_all(inst-force_writemask_all) - .at(block, inst); fs_reg dst = inst-dst; /* Get rid of COMPR4. We'll add it back in if we need it */ @@ -2991,17 +2992,23 @@ fs_visitor::lower_load_payload() dst.reg = dst.reg ~BRW_MRF_COMPR4; dst.width = 8; + const fs_builder hbld = bld.group(8, 0).exec_all().at(block, inst); + for (uint8_t i = 0; i inst-header_size; i++) { if (inst-src[i].file != BAD_FILE) { fs_reg mov_dst = retype(dst, BRW_REGISTER_TYPE_UD); fs_reg mov_src = retype(inst-src[i], BRW_REGISTER_TYPE_UD); mov_src.width = 8; -ibld.exec_all().MOV(mov_dst, mov_src); +hbld.MOV(mov_dst, mov_src); } - dst = offset(dst, 1); + dst = offset(dst, hbld, 1); } dst.width = inst-exec_size; + const fs_builder ibld = bld.group(inst-exec_size, inst-force_sechalf) + .exec_all(inst-force_writemask_all) +
[Mesa-dev] [PATCH v2 08/19] i965/fs: Move offset(fs_reg, unsigned) to brw_fs.h
Shortly, offset() will depend on the builder so we need it moved to some place where it has access to that. --- src/mesa/drivers/dri/i965/brw_fs.h| 21 + src/mesa/drivers/dri/i965/brw_ir_fs.h | 21 - 2 files changed, 21 insertions(+), 21 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 243baf6..c1819cc 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -62,6 +62,27 @@ namespace brw { class fs_live_variables; } +static inline fs_reg +offset(fs_reg reg, unsigned delta) +{ + switch (reg.file) { + case BAD_FILE: + break; + case GRF: + case MRF: + case ATTR: + return byte_offset(reg, + delta * MAX2(reg.width * reg.stride, 1) * + type_sz(reg.type)); + case UNIFORM: + reg.reg_offset += delta; + break; + default: + assert(delta == 0); + } + return reg; +} + /** * The fragment shader front-end. * diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h b/src/mesa/drivers/dri/i965/brw_ir_fs.h index 96dc20d..16b20be 100644 --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h @@ -129,27 +129,6 @@ horiz_offset(fs_reg reg, unsigned delta) } static inline fs_reg -offset(fs_reg reg, unsigned delta) -{ - switch (reg.file) { - case BAD_FILE: - break; - case GRF: - case MRF: - case ATTR: - return byte_offset(reg, - delta * MAX2(reg.width * reg.stride, 1) * - type_sz(reg.type)); - case UNIFORM: - reg.reg_offset += delta; - break; - default: - assert(delta == 0); - } - return reg; -} - -static inline fs_reg component(fs_reg reg, unsigned idx) { assert(reg.subreg_offset == 0); -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 06/19] i965/fs: Set the builder group for emitting FB-write stencil/AA alpha
Reviewed-by: Iago Toral Quiroga ito...@igalia.com Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 8976c25..2341d02 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1529,7 +1529,7 @@ fs_visitor::emit_single_fb_write(const fs_builder bld, if (payload.aa_dest_stencil_reg) { sources[length] = fs_reg(GRF, alloc.allocate(1)); - bld.exec_all().annotate(FB write stencil/AA alpha) + bld.group(8, 0).exec_all().annotate(FB write stencil/AA alpha) .MOV(sources[length], fs_reg(brw_vec8_grf(payload.aa_dest_stencil_reg, 0))); length++; -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 03/19] i965/fs: Fix fs_inst::regs_read() for uniform pull constant loads
Previously, fs_inst::regs_read() fell back to depending on the register width for the second source. This isn't really correct since it isn't a SIMD8 value at all, but a SIMD4x2 value. This commit changes it to explicitly be always one register. Reviewed-by: Iago Toral Quiroga ito...@igalia.com v2: Use mlen for determining the number of registers written --- src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 31dfb24..589b74c 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -715,6 +715,12 @@ fs_inst::regs_read(int arg) const return mlen; break; + case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7: + /* The payload is actually stored in src1 */ + if (arg == 1) + return mlen; + break; + case FS_OPCODE_LINTERP: if (arg == 0) return exec_size / 4; -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 07/19] i965/blorp: Explicitly set execution sizes for new'd instructions
This doesn't affect instructions allocated using the builder. Reviewed-by: Iago Toral Quiroga ito...@igalia.com Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp index 789520c..d458ad8 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp @@ -73,7 +73,7 @@ brw_blorp_eu_emitter::emit_kill_if_outside_rect(const struct brw_reg x, emit_cmp(BRW_CONDITIONAL_L, x, dst_x1)-predicate = BRW_PREDICATE_NORMAL; emit_cmp(BRW_CONDITIONAL_L, y, dst_y1)-predicate = BRW_PREDICATE_NORMAL; - fs_inst *inst = new (mem_ctx) fs_inst(BRW_OPCODE_AND, g1, f0, g1); + fs_inst *inst = new (mem_ctx) fs_inst(BRW_OPCODE_AND, 16, g1, f0, g1); inst-force_writemask_all = true; insts.push_tail(inst); } @@ -84,7 +84,7 @@ brw_blorp_eu_emitter::emit_texture_lookup(const struct brw_reg dst, unsigned base_mrf, unsigned msg_length) { - fs_inst *inst = new (mem_ctx) fs_inst(op, dst, brw_message_reg(base_mrf), + fs_inst *inst = new (mem_ctx) fs_inst(op, 16, dst, brw_message_reg(base_mrf), fs_reg(0u)); inst-base_mrf = base_mrf; @@ -119,7 +119,8 @@ brw_blorp_eu_emitter::emit_combine(enum opcode combine_opcode, { assert(combine_opcode == BRW_OPCODE_ADD || combine_opcode == BRW_OPCODE_AVG); - insts.push_tail(new (mem_ctx) fs_inst(combine_opcode, dst, src_1, src_2)); + insts.push_tail(new (mem_ctx) fs_inst(combine_opcode, 16, dst, + src_1, src_2)); } fs_inst * @@ -127,7 +128,7 @@ brw_blorp_eu_emitter::emit_cmp(enum brw_conditional_mod op, const struct brw_reg x, const struct brw_reg y) { - fs_inst *cmp = new (mem_ctx) fs_inst(BRW_OPCODE_CMP, + fs_inst *cmp = new (mem_ctx) fs_inst(BRW_OPCODE_CMP, 16, vec16(brw_null_reg()), x, y); cmp-conditional_mod = op; insts.push_tail(cmp); -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: add support for geometry shader invocations.
On Thu, Jun 25, 2015 at 4:26 PM, Dave Airlie airl...@gmail.com wrote: From: Dave Airlie airl...@redhat.com Signed-off-by: Dave Airlie airl...@redhat.com --- src/gallium/drivers/radeonsi/si_shader.c| 5 + src/gallium/drivers/radeonsi/si_shader.h| 1 + src/gallium/drivers/radeonsi/si_state.c | 1 - src/gallium/drivers/radeonsi/si_state_shaders.c | 7 +++ 4 files changed, 13 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 87608a1..665ce83 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -630,6 +630,11 @@ static void declare_system_value( SI_PARAM_BASE_VERTEX); break; + case TGSI_SEMANTIC_INVOCATIONID: + value = LLVMGetParam(radeon_bld-main_fn, +SI_PARAM_GS_INSTANCE_ID); + break; + case TGSI_SEMANTIC_SAMPLEID: value = get_sample_id(radeon_bld); break; diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h index 51055af..b4339ae 100644 --- a/src/gallium/drivers/radeonsi/si_shader.h +++ b/src/gallium/drivers/radeonsi/si_shader.h @@ -115,6 +115,7 @@ struct si_shader_selector { unsignedgs_output_prim; unsignedgs_max_out_vertices; + unsignedgs_num_invocations; uint64_tgs_used_inputs; /* mask of get_unique_index bits */ }; diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c index 752467b..0dd08a2 100644 --- a/src/gallium/drivers/radeonsi/si_state.c +++ b/src/gallium/drivers/radeonsi/si_state.c @@ -3078,7 +3078,6 @@ void si_init_config(struct si_context *sctx) si_pm4_set_reg(pm4, R_028B60_VGT_GS_VERT_ITEMSIZE_1, 0); si_pm4_set_reg(pm4, R_028B64_VGT_GS_VERT_ITEMSIZE_2, 0); si_pm4_set_reg(pm4, R_028B68_VGT_GS_VERT_ITEMSIZE_3, 0); - si_pm4_set_reg(pm4, R_028B90_VGT_GS_INSTANCE_CNT, 0); si_pm4_set_reg(pm4, R_028B98_VGT_STRMOUT_BUFFER_CONFIG, 0x0); si_pm4_set_reg(pm4, R_028AB4_VGT_REUSE_OFF, 0); diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c index 48128fa..eef3baa 100644 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c @@ -76,6 +76,7 @@ static void si_shader_gs(struct si_shader *shader) unsigned gs_vert_itemsize = shader-selector-info.num_outputs * (16 2); unsigned gs_max_vert_out = shader-selector-gs_max_out_vertices; unsigned gsvs_itemsize = gs_vert_itemsize * gs_max_vert_out; + unsigned gs_num_invocations = shader-selector-gs_num_invocations; unsigned cut_mode; struct si_pm4_state *pm4; unsigned num_sgprs, num_user_sgprs; @@ -118,6 +119,10 @@ static void si_shader_gs(struct si_shader *shader) si_pm4_set_reg(pm4, R_028B5C_VGT_GS_VERT_ITEMSIZE, gs_vert_itemsize); + si_pm4_set_reg(pm4, R_028B90_VGT_GS_INSTANCE_CNT, + S_028B90_CNT(MIN2(gs_num_invocations, 127)) | + S_028B90_ENABLE(gs_num_invocations 0)); Without knowing how radeonsi works, just want to point out that previously GS_INSTANCE_CNT was being always set to 0, even though a single invocation was still being run. Perhaps this really needs to be gs_num_invocations_minus_one? GS_INVOCATIONS will always be at least 1. + va = shader-bo-gpu_address; si_pm4_add_bo(pm4, shader-bo, RADEON_USAGE_READ, RADEON_PRIO_SHADER_DATA); si_pm4_set_reg(pm4, R_00B220_SPI_SHADER_PGM_LO_GS, va 8); @@ -490,6 +495,8 @@ static void *si_create_shader_state(struct pipe_context *ctx, sel-info.properties[TGSI_PROPERTY_GS_OUTPUT_PRIM]; sel-gs_max_out_vertices = sel-info.properties[TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES]; + sel-gs_num_invocations = + sel-info.properties[TGSI_PROPERTY_GS_INVOCATIONS]; for (i = 0; i sel-info.num_inputs; i++) { unsigned name = sel-info.input_semantic_name[i]; -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] nir/from_ssa: add a flag to not convert everything to SSA
On Thu, Jun 25, 2015 at 12:29 PM, Connor Abbott cwabbo...@gmail.com wrote: We already don't convert constants out of SSA, and in our backend we'd like to have only one way of saying something is still in SSA. The one tricky part about this is that we may now leave some undef instructions around if they aren't part of a phi-web, so we have to be more careful about deleting them. Signed-off-by: Connor Abbott cwabbo...@gmail.com --- src/gallium/drivers/vc4/vc4_program.c | 2 +- src/glsl/nir/nir.h| 7 ++- src/glsl/nir/nir_from_ssa.c | 25 ++--- src/mesa/drivers/dri/i965/brw_nir.c | 2 +- 4 files changed, 26 insertions(+), 10 deletions(-) diff --git a/src/gallium/drivers/vc4/vc4_program.c b/src/gallium/drivers/vc4/vc4_program.c index 2061631..1a550e1 100644 --- a/src/gallium/drivers/vc4/vc4_program.c +++ b/src/gallium/drivers/vc4/vc4_program.c @@ -2102,7 +2102,7 @@ vc4_shader_ntq(struct vc4_context *vc4, enum qstage stage, nir_remove_dead_variables(c-s); -nir_convert_from_ssa(c-s); +nir_convert_from_ssa(c-s, true); if (vc4_debug VC4_DEBUG_SHADERDB) { fprintf(stderr, SHADER-DB: %s prog %d/%d: %d NIR instructions\n, diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index 697d37e..2116f60 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -1676,7 +1676,12 @@ bool nir_ssa_defs_interfere(nir_ssa_def *a, nir_ssa_def *b); void nir_convert_to_ssa_impl(nir_function_impl *impl); void nir_convert_to_ssa(nir_shader *shader); -void nir_convert_from_ssa(nir_shader *shader); + +/* If convert_everything is true, convert all values (even those not involved + * in a phi node) to registers. If false, only convert SSA values involved in + * phi nodes to registers. + */ +void nir_convert_from_ssa(nir_shader *shader, bool convert_everything); I don't think convert everything is really what we want to call it. A better idea might be to flip the bool and call it phi_webs_only. With that changed, Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com bool nir_opt_algebraic(nir_shader *shader); bool nir_opt_algebraic_late(nir_shader *shader); diff --git a/src/glsl/nir/nir_from_ssa.c b/src/glsl/nir/nir_from_ssa.c index 67733e6..966c2fe 100644 --- a/src/glsl/nir/nir_from_ssa.c +++ b/src/glsl/nir/nir_from_ssa.c @@ -37,6 +37,7 @@ struct from_ssa_state { void *mem_ctx; void *dead_ctx; + bool convert_everything; struct hash_table *merge_node_table; nir_instr *instr; nir_function_impl *impl; @@ -482,6 +483,9 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state) reg = node-set-reg; } else { + if (!state-convert_everything) + return true; + /* We leave load_const SSA values alone. They act as immediates to * the backend. If it got coalesced into a phi, that's ok. */ @@ -505,8 +509,15 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state) nir_ssa_def_rewrite_uses(def, nir_src_for_reg(reg), state-mem_ctx); assert(list_empty(def-uses) list_empty(def-if_uses)); - if (def-parent_instr-type == nir_instr_type_ssa_undef) + if (def-parent_instr-type == nir_instr_type_ssa_undef) { + /* If it's an ssa_undef instruction, remove it since we know we just got + * rid of all its uses. + */ + nir_instr *parent_instr = def-parent_instr; + nir_instr_remove(parent_instr); + ralloc_steal(state-dead_ctx, parent_instr); return true; + } assert(def-parent_instr-type != nir_instr_type_load_const); @@ -523,7 +534,7 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state) } /* Resolves ssa definitions to registers. While we're at it, we also - * remove phi nodes and ssa_undef instructions + * remove phi nodes. */ static bool resolve_registers_block(nir_block *block, void *void_state) @@ -534,8 +545,7 @@ resolve_registers_block(nir_block *block, void *void_state) state-instr = instr; nir_foreach_ssa_def(instr, rewrite_ssa_def, state); - if (instr-type == nir_instr_type_ssa_undef || - instr-type == nir_instr_type_phi) { + if (instr-type == nir_instr_type_phi) { nir_instr_remove(instr); ralloc_steal(state-dead_ctx, instr); } @@ -765,13 +775,14 @@ resolve_parallel_copies_block(nir_block *block, void *void_state) } static void -nir_convert_from_ssa_impl(nir_function_impl *impl) +nir_convert_from_ssa_impl(nir_function_impl *impl, bool convert_everything) { struct from_ssa_state state; state.mem_ctx = ralloc_parent(impl); state.dead_ctx = ralloc_context(NULL); state.impl = impl; + state.convert_everything = convert_everything; state.merge_node_table = _mesa_hash_table_create(NULL, _mesa_hash_pointer, _mesa_key_pointer_equal); @@ -801,10 +812,10 @@
Re: [Mesa-dev] [PATCH 4/4] nir: remove parent_instr from nir_register
Yes, please! It was nice at the time, but it was always a hack. Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com On Thu, Jun 25, 2015 at 12:29 PM, Connor Abbott cwabbo...@gmail.com wrote: It's no longer used Signed-off-by: Connor Abbott cwabbo...@gmail.com --- src/glsl/nir/nir.c | 1 - src/glsl/nir/nir.h | 8 src/glsl/nir/nir_from_ssa.c | 8 3 files changed, 17 deletions(-) diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c index f03e80a..f661249 100644 --- a/src/glsl/nir/nir.c +++ b/src/glsl/nir/nir.c @@ -57,7 +57,6 @@ reg_create(void *mem_ctx, struct exec_list *list) { nir_register *reg = ralloc(mem_ctx, nir_register); - reg-parent_instr = NULL; list_inithead(reg-uses); list_inithead(reg-defs); list_inithead(reg-if_uses); diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index b33c9c5..e818acc 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -389,14 +389,6 @@ typedef struct { */ bool is_packed; - /** -* If this pointer is non-NULL then this register has exactly one -* definition and that definition dominates all of its uses. This is -* set by the out-of-SSA pass so that backends can get SSA-like -* information even once they have gone out of SSA. -*/ - struct nir_instr *parent_instr; - /** set of nir_instr's where this register is used (read from) */ struct list_head uses; diff --git a/src/glsl/nir/nir_from_ssa.c b/src/glsl/nir/nir_from_ssa.c index 966c2fe..57bbdde 100644 --- a/src/glsl/nir/nir_from_ssa.c +++ b/src/glsl/nir/nir_from_ssa.c @@ -496,14 +496,6 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state) reg-name = def-name; reg-num_components = def-num_components; reg-num_array_elems = 0; - - /* This register comes from an SSA definition that is defined and not - * part of a phi-web. Therefore, we know it has a single unique - * definition that dominates all of its uses; we can copy the - * parent_instr from the SSA def safely. - */ - if (def-parent_instr-type != nir_instr_type_ssa_undef) - reg-parent_instr = def-parent_instr; } nir_ssa_def_rewrite_uses(def, nir_src_for_reg(reg), state-mem_ctx); -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/gen9: use an unreserved surface alignment value
On Thu, Jun 25, 2015 at 08:40:33AM -0700, Nanley Chery wrote: On Thu, Jun 25, 2015 at 12:37 AM, Pohjolainen, Topi topi.pohjolai...@intel.com wrote: On Wed, Jun 24, 2015 at 05:57:13PM -0700, Anuj Phogat wrote: On Wed, Jun 24, 2015 at 3:51 PM, Nanley Chery nanleych...@gmail.com wrote: From: Nanley Chery nanley.g.ch...@intel.com Although the horizontal and vertical alignment fields are ignored here, 0 is a reserved value for them and may cause undefined behavior. Change the default value to an abitrary valid one. Signed-off-by: Nanley Chery nanley.g.ch...@intel.com --- src/mesa/drivers/dri/i965/gen8_surface_state.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c index b2d1a57..22ae960 100644 --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c @@ -93,7 +93,7 @@ vertical_alignment(const struct brw_context *brw, if (brw-gen 8 (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE || surf_type == BRW_SURFACE_1D)) - return 0; + return GEN8_SURFACE_VALIGN_4; switch (mt-align_h) { case 4: @@ -118,7 +118,7 @@ horizontal_alignment(const struct brw_context *brw, if (brw-gen 8 (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE || gen9_use_linear_1d_layout(brw, mt))) - return 0; + return GEN8_SURFACE_HALIGN_4; switch (mt-align_w) { case 4: -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev Good find Nanley. We had no known issues with value 0 but it's always nice to avoid undefined behavior :). Right, I thought about this when I reviewed the original. The spec says it is ignored in these cases and hence the reserved value seemed fine. Now that we put something meaningful there, somebody is going to compare the spec and wonder why we set it to 4. If we added also a comment here that says this is just an arbitrary (non-reserved) value and really ignored by the hardware, it would prevent misunderstandings. What do you guys think? There's enough space to insert Set to an arbitrary non-reserved value. in both of the comments preceding the conditional without adding an extra line. I wouldn't mind including it. Sounds great, thanks! ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 91111] [BYT regression] dEQP-GLES3.functional.shaders.texture_functions.texturelod.usamplercube_*
https://bugs.freedesktop.org/show_bug.cgi?id=9 Bug ID: 9 Summary: [BYT regression] dEQP-GLES3.functional.shaders.texture_functions.textur elod.usamplercube_* Product: Mesa Version: 10.6 Hardware: x86-64 (AMD64) OS: Linux (All) Status: NEW Severity: normal Priority: medium Component: Mesa core Assignee: mesa-dev@lists.freedesktop.org Reporter: brian.wil...@intel.com QA Contact: mesa-dev@lists.freedesktop.org Tested Mesa commit: c2a0600d5b0645533ba442b5ab879b23c2564a4d: i965: Don't set NirOptions for stages that will use the vec4 backend. Plus patches: - 10.6-i965-do-not-round-line-width-when-multisampling-or-a.patch - 10.6-mesa-add-GL_RED-GL_RG-support-for-floating-point-tex.patch - 10.6-i965-Momentarily-pretend-to-support-ARB_texture_sten.patch - 10.6-Revert-i965-Advertise-a-line-width-of-40.0-on-Cherry.patch Testing a Mesa update on ChromeOS, the following dEQP tests fail on BYT: - dEQP-GLES3.functional.shaders.texture_functions.texturelod.usamplercube_vertex - dEQP-GLES3.functional.shaders.texture_functions.texturelod.usamplercube_fragment Sample failure: Image comparison failed: difference = 450.979, threshold = 0.05 These tests pass on HSW, BDW and BSW, and passed on BYT with an older Mesa commit (November 18, 2014: 129178893b2260df22db96327c5ca9c2ce7db046 glsl: Generate unique names for each const array lowered to uniforms + ~10 patches) -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/4] i965/fs: use SSA values directly
On Thu, Jun 25, 2015 at 12:29 PM, Connor Abbott cwabbo...@gmail.com wrote: Before, we would use registers, but set a magical parent_instr field to indicate that it was actually purely an SSA value (i.e., it wasn't involved in any phi nodes). Instead, just use SSA values directly, which lets us get rid of the hack and reduces memory usage since we're not allocating a nir_register for every value. It also makes our handling of load_const more consistent compared to the other instructions. Signed-off-by: Connor Abbott cwabbo...@gmail.com --- src/mesa/drivers/dri/i965/brw_fs.h | 5 ++ src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 73 ++ src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 1 + src/mesa/drivers/dri/i965/brw_nir.c| 2 +- .../dri/i965/brw_nir_analyze_boolean_resolves.c| 12 ++-- 5 files changed, 59 insertions(+), 34 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 243baf6..a6fee35 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -249,6 +249,10 @@ public: void nir_emit_block(nir_block *block); void nir_emit_instr(nir_instr *instr); void nir_emit_alu(const brw::fs_builder bld, nir_alu_instr *instr); + void nir_emit_load_const(const brw::fs_builder bld, +nir_load_const_instr *instr); + void nir_emit_undef(const brw::fs_builder bld, + nir_ssa_undef_instr *instr); void nir_emit_intrinsic(const brw::fs_builder bld, nir_intrinsic_instr *instr); void nir_emit_texture(const brw::fs_builder bld, @@ -345,6 +349,7 @@ public: unsigned max_grf; fs_reg *nir_locals; + fs_reg *nir_ssa_values; fs_reg *nir_globals; fs_reg nir_inputs; fs_reg nir_outputs; diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 59081ea..58896d7 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -366,6 +366,9 @@ fs_visitor::nir_emit_impl(nir_function_impl *impl) nir_locals[reg-index] = bld.vgrf(BRW_REGISTER_TYPE_F, size); } + nir_ssa_values = reralloc(mem_ctx, nir_ssa_values, fs_reg, + impl-ssa_alloc); + nir_emit_cf_list(impl-body); } @@ -459,9 +462,11 @@ fs_visitor::nir_emit_instr(nir_instr *instr) break; case nir_instr_type_load_const: - /* We can hit these, but we do nothing now and use them as - * immediates later. - */ + nir_emit_load_const(abld, nir_instr_as_load_const(instr)); + break; + + case nir_instr_type_ssa_undef: + nir_emit_undef(abld, nir_instr_as_ssa_undef(instr)); break; case nir_instr_type_jump: @@ -495,17 +500,12 @@ bool fs_visitor::optimize_frontfacing_ternary(nir_alu_instr *instr, const fs_reg result) { - if (instr-src[0].src.is_ssa || - !instr-src[0].src.reg.reg || - !instr-src[0].src.reg.reg-parent_instr) - return false; - - if (instr-src[0].src.reg.reg-parent_instr-type != - nir_instr_type_intrinsic) + if (!instr-src[0].src.is_ssa || + instr-src[0].src.ssa-parent_instr-type != nir_instr_type_intrinsic) return false; nir_intrinsic_instr *src0 = - nir_instr_as_intrinsic(instr-src[0].src.reg.reg-parent_instr); + nir_instr_as_intrinsic(instr-src[0].src.ssa-parent_instr); if (src0-intrinsic != nir_intrinsic_load_front_face) return false; @@ -1146,6 +1146,25 @@ fs_visitor::nir_emit_alu(const fs_builder bld, nir_alu_instr *instr) } } +void +fs_visitor::nir_emit_load_const(const fs_builder bld, +nir_load_const_instr *instr) +{ + fs_reg reg = bld.vgrf(BRW_REGISTER_TYPE_D, instr-def.num_components); + + for (unsigned i = 0; i instr-def.num_components; i++) + bld.MOV(offset(reg, i), fs_reg(instr-value.i[i])); + + nir_ssa_values[instr-def.index] = reg; +} + +void +fs_visitor::nir_emit_undef(const fs_builder bld, nir_ssa_undef_instr *instr) +{ + nir_ssa_values[instr-def.index] = bld.vgrf(BRW_REGISTER_TYPE_D, + instr-def.num_components); +} + static fs_reg fs_reg_for_nir_reg(fs_visitor *v, nir_register *nir_reg, unsigned base_offset, nir_src *indirect) @@ -1171,30 +1190,30 @@ fs_reg_for_nir_reg(fs_visitor *v, nir_register *nir_reg, fs_reg fs_visitor::get_nir_src(nir_src src) { + fs_reg reg; if (src.is_ssa) { - assert(src.ssa-parent_instr-type == nir_instr_type_load_const); - nir_load_const_instr *load = nir_instr_as_load_const(src.ssa-parent_instr); - fs_reg reg = bld.vgrf(BRW_REGISTER_TYPE_D, src.ssa-num_components); - - for (unsigned i = 0; i
Re: [Mesa-dev] [PATCH 0/4] i965: use SSA values when we can
And, you got some shader-db stats: total instructions in shared programs: 6078991 - 6073118 (-0.10%) instructions in affected programs: 402221 - 396348 (-1.46%) helped:1527 HURT: 0 GAINED:8 LOST: 2 I'm not sure which commit it was that helped. I'm guessing one of our on-the-fly peepholes started working better. On Thu, Jun 25, 2015 at 12:29 PM, Connor Abbott cwabbo...@gmail.com wrote: Before, we were using a hack where when we converted out of SSA, we set a parent_instr field of the nir_register to indicate that the register was actually an SSA value. But in the future, we want to handle SSA values directly, and right now we're creating an extra nir_register for everything, even if it's not involved in a phi node. This series removes that hack for i965 and gets us using SSA values directly in most cases. The only other user of nir_convert_from_ssa() is vc4, which I believed I changed correctly, and it doesn't seem to use nir_register::parent_instr, based on my grepping. I tried to compile-test it, but it assumed I was using the simulator and died, so it would be nice to at least compile-test it. The changes are also available at: git://people.freedesktop.org/~cwabbott0/mesa i965-use-ssa Connor Abbott (4): nir/from_ssa: add a flag to not convert everything to SSA i965/fs: use SSA values directly nir: remove nir_src_get_parent_instr() nir: remove parent_instr from nir_register src/gallium/drivers/vc4/vc4_program.c | 2 +- src/glsl/nir/nir.c | 1 - src/glsl/nir/nir.h | 25 ++-- src/glsl/nir/nir_from_ssa.c| 33 +- src/mesa/drivers/dri/i965/brw_fs.h | 5 ++ src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 73 ++ src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 1 + src/mesa/drivers/dri/i965/brw_nir.c| 2 +- .../dri/i965/brw_nir_analyze_boolean_resolves.c| 12 ++-- 9 files changed, 84 insertions(+), 70 deletions(-) -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 5/7] util: port _mesa_strto[df] to C
_mesa_strtod and _mesa_strtof are only used from the GLSL compiler and the ARB_[vertex|fragment]_program code, meaning that the locale doesn't need to be initialized before the first OpenGL context gets initialized. So let's use explicit initialization from the one-time init code instead of depending on a C++ compiler to initialize at image-load time. Signed-off-by: Erik Faye-Lund kusmab...@gmail.com Reviewed-by: Matt Turner matts...@gmail.com --- src/glsl/glcpp/glcpp.c| 3 ++ src/glsl/main.cpp | 3 ++ src/mesa/main/context.c | 3 ++ src/util/Makefile.sources | 2 +- src/util/strtod.c | 78 +++ src/util/strtod.cpp | 75 - src/util/strtod.h | 3 ++ 7 files changed, 91 insertions(+), 76 deletions(-) create mode 100644 src/util/strtod.c delete mode 100644 src/util/strtod.cpp diff --git a/src/glsl/glcpp/glcpp.c b/src/glsl/glcpp/glcpp.c index 5144516..c62f4ef 100644 --- a/src/glsl/glcpp/glcpp.c +++ b/src/glsl/glcpp/glcpp.c @@ -29,6 +29,7 @@ #include glcpp.h #include main/mtypes.h #include main/shaderobj.h +#include util/strtod.h extern int glcpp_parser_debug; @@ -168,6 +169,8 @@ main (int argc, char *argv[]) if (shader == NULL) return 1; + _mesa_locale_init(); + ret = glcpp_preprocess(ctx, shader, info_log, NULL, gl_ctx); printf(%s, shader); diff --git a/src/glsl/main.cpp b/src/glsl/main.cpp index 2341298..58651df 100644 --- a/src/glsl/main.cpp +++ b/src/glsl/main.cpp @@ -38,6 +38,7 @@ #include program/hash_table.h #include loop_analysis.h #include standalone_scaffolding.h +#include util/strtod.h static int glsl_version = 330; @@ -46,6 +47,8 @@ initialize_context(struct gl_context *ctx, gl_api api) { initialize_context_to_defaults(ctx, api); + _mesa_locale_init(); + /* The standalone compiler needs to claim support for almost * everything in order to compile the built-in functions. */ diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c index c4af8ea..e68de68 100644 --- a/src/mesa/main/context.c +++ b/src/mesa/main/context.c @@ -120,6 +120,7 @@ #include shaderobj.h #include shaderimage.h #include util/simple_list.h +#include util/strtod.h #include state.h #include stencil.h #include texcompress_s3tc.h @@ -374,6 +375,8 @@ one_time_init( struct gl_context *ctx ) assert( sizeof(GLint) == 4 ); assert( sizeof(GLuint) == 4 ); + _mesa_locale_init(); + _mesa_one_time_init_extension_overrides(); _mesa_get_cpu_features(); diff --git a/src/util/Makefile.sources b/src/util/Makefile.sources index dc55939..82df3bc 100644 --- a/src/util/Makefile.sources +++ b/src/util/Makefile.sources @@ -19,7 +19,7 @@ MESA_UTIL_FILES :=\ set.c \ set.h \ simple_list.h \ - strtod.cpp \ + strtod.c \ strtod.h \ texcompress_rgtc_tmp.h \ u_atomic.h diff --git a/src/util/strtod.c b/src/util/strtod.c new file mode 100644 index 000..e5e6f76 --- /dev/null +++ b/src/util/strtod.c @@ -0,0 +1,78 @@ +/* + * Copyright 2010 VMware, Inc. + * All Rights Reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the + * Software), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sub license, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial portions + * of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. + * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + */ + + +#include stdlib.h + +#ifdef _GNU_SOURCE +#include locale.h +#ifdef HAVE_XLOCALE_H +#include xlocale.h +static locale_t loc; +static int initialized; +#endif +#endif + +#include strtod.h + + +void +_mesa_locale_init(void) +{ +#if defined(_GNU_SOURCE) defined(HAVE_XLOCALE_H) + loc = newlocale(LC_CTYPE_MASK, C, NULL); +#endif +} + +/** + * Wrapper around strtod which uses the C locale so the decimal + * point is always '.' + */ +double +_mesa_strtod(const char *s, char **end) +{ +#if defined(_GNU_SOURCE) defined(HAVE_XLOCALE_H) + return strtod_l(s, end, loc); +#else + return strtod(s, end); +#endif +} + + +/** + * Wrapper around strtof which uses
[Mesa-dev] [PATCH v2 6/7] mesa/main: free locale at exit
In order to save a small leak if mesa is continously loaded and unloaded, let's free the locale when the shared object is unloaded. Signed-off-by: Erik Faye-Lund kusmab...@gmail.com --- src/mesa/main/context.c | 12 +++- src/util/strtod.c | 8 src/util/strtod.h | 3 +++ 3 files changed, 22 insertions(+), 1 deletion(-) diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c index e68de68..dee1fa8 100644 --- a/src/mesa/main/context.c +++ b/src/mesa/main/context.c @@ -346,6 +346,16 @@ _mesa_destroy_visual( struct gl_config *vis ) mtx_t OneTimeLock = _MTX_INITIALIZER_NP; +/** + * Calls all the various one-time-fini functions in Mesa + */ + +static void +one_time_fini() +{ + _mesa_destroy_shader_compiler(); + _mesa_locale_fini(); +} /** * Calls all the various one-time-init functions in Mesa. @@ -385,7 +395,7 @@ one_time_init( struct gl_context *ctx ) _mesa_ubyte_to_float_color_tab[i] = (float) i / 255.0F; } - atexit(_mesa_destroy_shader_compiler); + atexit(one_time_fini); #if defined(DEBUG) defined(__DATE__) defined(__TIME__) if (MESA_VERBOSE != 0) { diff --git a/src/util/strtod.c b/src/util/strtod.c index e5e6f76..5c36b05 100644 --- a/src/util/strtod.c +++ b/src/util/strtod.c @@ -46,6 +46,14 @@ _mesa_locale_init(void) #endif } +void +_mesa_locale_fini(void) +{ +#if defined(_GNU_SOURCE) defined(HAVE_XLOCALE_H) + freelocale(loc); +#endif +} + /** * Wrapper around strtod which uses the C locale so the decimal * point is always '.' diff --git a/src/util/strtod.h b/src/util/strtod.h index b7e2beb..60e15cf 100644 --- a/src/util/strtod.h +++ b/src/util/strtod.h @@ -34,6 +34,9 @@ extern C { extern void _mesa_locale_init(void); +extern void +_mesa_locale_fini(void); + extern double _mesa_strtod(const char *s, char **end); -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 7/7] util: assert to verify that locale is initialized
Add an assert to Verify that the locale has been initialized when we call strtod. This might help some developers sleep better at night. Signed-off-by: Erik Faye-Lund kusmab...@gmail.com --- src/util/strtod.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/util/strtod.c b/src/util/strtod.c index 5c36b05..7a73041 100644 --- a/src/util/strtod.c +++ b/src/util/strtod.c @@ -25,6 +25,7 @@ #include stdlib.h +#include assert.h #ifdef _GNU_SOURCE #include locale.h @@ -43,6 +44,7 @@ _mesa_locale_init(void) { #if defined(_GNU_SOURCE) defined(HAVE_XLOCALE_H) loc = newlocale(LC_CTYPE_MASK, C, NULL); + initialized = 1; #endif } @@ -51,6 +53,7 @@ _mesa_locale_fini(void) { #if defined(_GNU_SOURCE) defined(HAVE_XLOCALE_H) freelocale(loc); + initialized = 0; #endif } @@ -62,6 +65,7 @@ double _mesa_strtod(const char *s, char **end) { #if defined(_GNU_SOURCE) defined(HAVE_XLOCALE_H) + assert(initialized); return strtod_l(s, end, loc); #else return strtod(s, end); @@ -77,6 +81,7 @@ float _mesa_strtof(const char *s, char **end) { #if defined(_GNU_SOURCE) defined(HAVE_XLOCALE_H) + assert(initialized); return strtof_l(s, end, loc); #elif defined(HAVE_STRTOF) return strtof(s, end); -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 4/7] glsl: No need to lock in _mesa_glsl_release_types
This function only gets called while mesa is unloading, so there's no potential of racing or multiple calls at the same time. So let's just get rid of the locking. Signed-off-by: Erik Faye-Lund kusmab...@gmail.com --- src/glsl/glsl_types.cpp | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp index f675e90..dbe2382 100644 --- a/src/glsl/glsl_types.cpp +++ b/src/glsl/glsl_types.cpp @@ -324,8 +324,10 @@ const glsl_type *glsl_type::get_scalar_type() const void _mesa_glsl_release_types(void) { - mtx_lock(glsl_type::mutex); - + /* should only be called during atexit (either when unloading shared +* object, or if process terminates), so no mutex-locking should be +* nessecary. +*/ if (glsl_type::array_types != NULL) { hash_table_dtor(glsl_type::array_types); glsl_type::array_types = NULL; @@ -335,8 +337,6 @@ _mesa_glsl_release_types(void) hash_table_dtor(glsl_type::record_types); glsl_type::record_types = NULL; } - - mtx_unlock(glsl_type::mutex); } -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 1/7] mesa/main: Get rid of outdated GDB-hack
All of these enums are now in use around in the code, so there's no need to explicitly use them here any more. Signed-off-by: Erik Faye-Lund kusmab...@gmail.com --- src/mesa/main/context.c | 27 --- 1 file changed, 27 deletions(-) diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c index 79fa018..265f98a 100644 --- a/src/mesa/main/context.c +++ b/src/mesa/main/context.c @@ -338,31 +338,6 @@ _mesa_destroy_visual( struct gl_config *vis ) /** - * This is lame. gdb only seems to recognize enum types that are - * actually used somewhere. We want to be able to print/use enum - * values such as TEXTURE_2D_INDEX in gdb. But we don't actually use - * the gl_texture_index type anywhere. Thus, this lame function. - */ -static void -dummy_enum_func(void) -{ - gl_buffer_index bi = BUFFER_FRONT_LEFT; - gl_face_index fi = FACE_POS_X; - gl_frag_result fr = FRAG_RESULT_DEPTH; - gl_texture_index ti = TEXTURE_2D_ARRAY_INDEX; - gl_vert_attrib va = VERT_ATTRIB_POS; - gl_varying_slot vs = VARYING_SLOT_POS; - - (void) bi; - (void) fi; - (void) fr; - (void) ti; - (void) va; - (void) vs; -} - - -/** * One-time initialization mutex lock. * * \sa Used by one_time_init(). @@ -434,8 +409,6 @@ one_time_init( struct gl_context *ctx ) * #ifdef tests here. */ atexit(_mesa_destroy_shader_compiler); - - dummy_enum_func(); } -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 2/7] dri: don't touch the shader compiler
This function is for deleting per-screen resources, and the shader compiler resources are not of such nature. Besides, dri shouldn't need to even know about the presence of a shader compiler. These resources will already be released when mesa gets unloaded, and that should be sufficient. Signed-off-by: Erik Faye-Lund kusmab...@gmail.com --- src/mesa/drivers/dri/common/dri_util.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/src/mesa/drivers/dri/common/dri_util.c b/src/mesa/drivers/dri/common/dri_util.c index e7ababe..ae4592c 100644 --- a/src/mesa/drivers/dri/common/dri_util.c +++ b/src/mesa/drivers/dri/common/dri_util.c @@ -46,7 +46,6 @@ #include dri_util.h #include utils.h #include xmlpool.h -#include ../glsl/glsl_parser_extras.h #include main/mtypes.h #include main/version.h #include main/errors.h @@ -238,8 +237,6 @@ static void driDestroyScreen(__DRIscreen *psp) * stream open to the X-server anymore. */ - _mesa_destroy_shader_compiler(); - psp-driver-DestroyScreen(psp); driDestroyOptionCache(psp-optionCache); -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 3/7] mesa/main: only call _mesa_destroy_shader_compiler once on exit
There's no point in calling _mesa_destroy_shader_compiler multiple times on exit; the resources will only be released once anyway. So let's move the atexit-call into the part that is only called once. Signed-off-by: Erik Faye-Lund kusmab...@gmail.com --- src/mesa/main/context.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c index 265f98a..c4af8ea 100644 --- a/src/mesa/main/context.c +++ b/src/mesa/main/context.c @@ -382,6 +382,8 @@ one_time_init( struct gl_context *ctx ) _mesa_ubyte_to_float_color_tab[i] = (float) i / 255.0F; } + atexit(_mesa_destroy_shader_compiler); + #if defined(DEBUG) defined(__DATE__) defined(__TIME__) if (MESA_VERBOSE != 0) { _mesa_debug(ctx, Mesa %s DEBUG build %s %s\n, @@ -404,11 +406,6 @@ one_time_init( struct gl_context *ctx ) api_init_mask |= 1 ctx-API; mtx_unlock(OneTimeLock); - - /* Hopefully atexit() is widely available. If not, we may need some -* #ifdef tests here. -*/ - atexit(_mesa_destroy_shader_compiler); } -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 0/7] port _mesa_strto[df] to C
Back in March[1], I sent a patch porting _mesa_strto[df] to C rather than C++. I fixed up the patch according to the criticism, but unfortunately I dropped the ball before I sent out the result. So here I am, picking it back up! This time I've taken a deeper dive into the whole init/deinit of Mesa, and cleaned up a bunch of stuff in that area. And as a result, this time we end up freeing the locale also. No Piglit regressions observed. [1]: 1426446329-23984-1-git-send-email-kusmab...@gmail.com Erik Faye-Lund (7): mesa/main: Get rid of outdated GDB-hack dri: don't touch the shader compiler mesa/main: only call _mesa_destroy_shader_compiler once on exit glsl: No need to lock in _mesa_glsl_release_types util: port _mesa_strto[df] to C mesa/main: free locale at exit util: assert to verify that locale is initialized src/glsl/glcpp/glcpp.c | 3 ++ src/glsl/glsl_types.cpp| 8 +-- src/glsl/main.cpp | 3 ++ src/mesa/drivers/dri/common/dri_util.c | 3 -- src/mesa/main/context.c| 47 ++ src/util/Makefile.sources | 2 +- src/util/strtod.c | 91 ++ src/util/strtod.cpp| 75 src/util/strtod.h | 6 +++ 9 files changed, 123 insertions(+), 115 deletions(-) create mode 100644 src/util/strtod.c delete mode 100644 src/util/strtod.cpp -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: add support for viewport array (v2)
Wait a moment, how did it fail with si_shader_io_get_unique_index? The function shouldn't be called for ES with the viewport index, because ES can't pass the output to GS. If it was called, ignoring the viewport index in si_llvm_emit_es_epilogue should fix it. Marek On Thu, Jun 25, 2015 at 10:29 PM, Marek Olšák mar...@gmail.com wrote: In that case, feel free to push. Reviewed-by: Marek Olšák marek.ol...@amd.com Marek On Thu, Jun 25, 2015 at 10:25 PM, Dave Airlie airl...@gmail.com wrote: On 26 June 2015 at 00:26, Marek Olšák mar...@gmail.com wrote: Hi Dave, The change in si_shader_io_get_unique_index can be dropped. The function is only used for shaders before GS. Ok okay I was hitting the assert in there for the layer/viewport index cases, but if the patch you pushed to master helps I'll drop it. This looks good, but I've had a different plan for this feature: Yeah I thought you might, I just wanted to hack something up to see it working since it seems new Dota 2 uses this feature. I'd like the states to be converted into 2 atoms: 1 r600_atom for all 16 viewports 1 r600_atom for all 16 scissors Each atom should have a bitmask saying which slots are dirty. (the same idea as resource slots) The emit functions should only emit dirty viewports/scissors. Also, the emit functions shouldn't emit non-zero viewports/scissors if the viewport index isn't written by the hardware VS stage (si_get_vs_info(sctx)-...). This should keep the same level of effectiveness as before. When a shader that writes the viewport index is bound *and* there are any dirty viewports or scissors, that's the right time to mark the atoms as dirty again, so that non-zero dirty viewports/scissors are finally emitted. I'm not sure if I'll get to doing more of it, I just had a day at home, and wanted to get up to speed on radeonsi a bit, so I can just leave that here, in case anyone wants to pick it up, Dave. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 91112] [HSW/BYT] dEQP-GLES3.functional.uniform_api.random.23 test timeout
https://bugs.freedesktop.org/show_bug.cgi?id=91112 Bug ID: 91112 Summary: [HSW/BYT] dEQP-GLES3.functional.uniform_api.random.23 test timeout Product: Mesa Version: 10.6 Hardware: x86-64 (AMD64) OS: Linux (All) Status: NEW Severity: normal Priority: medium Component: Mesa core Assignee: mesa-dev@lists.freedesktop.org Reporter: brian.wil...@intel.com QA Contact: mesa-dev@lists.freedesktop.org This isn't an outright test failure per-se, but dEQP is marking the test timeout as such. Timeout is on: dEQP-GLES3.functional.uniform_api.random.23 Mesa commit being tested: c2a0600d5b0645533ba442b5ab879b23c2564a4d: i965: Don't set NirOptions for stages that will use the vec4 backend. Plus patches: - 10.6-i965-do-not-round-line-width-when-multisampling-or-a.patch - 10.6-mesa-add-GL_RED-GL_RG-support-for-floating-point-tex.patch - 10.6-i965-Momentarily-pretend-to-support-ARB_texture_sten.patch - 10.6-Revert-i965-Advertise-a-line-width-of-40.0-on-Cherry.patch This test passes on BDW and BSW, and passed on HSW/BYT with an older Mesa commit (November 18, 2014: 129178893b2260df22db96327c5ca9c2ce7db046 glsl: Generate unique names for each const array lowered to uniforms + ~10 patches) -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: add support for geometry shader invocations.
On Thu, Jun 25, 2015 at 4:29 PM, Ilia Mirkin imir...@alum.mit.edu wrote: On Thu, Jun 25, 2015 at 4:26 PM, Dave Airlie airl...@gmail.com wrote: From: Dave Airlie airl...@redhat.com Signed-off-by: Dave Airlie airl...@redhat.com --- src/gallium/drivers/radeonsi/si_shader.c| 5 + src/gallium/drivers/radeonsi/si_shader.h| 1 + src/gallium/drivers/radeonsi/si_state.c | 1 - src/gallium/drivers/radeonsi/si_state_shaders.c | 7 +++ 4 files changed, 13 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 87608a1..665ce83 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -630,6 +630,11 @@ static void declare_system_value( SI_PARAM_BASE_VERTEX); break; + case TGSI_SEMANTIC_INVOCATIONID: + value = LLVMGetParam(radeon_bld-main_fn, +SI_PARAM_GS_INSTANCE_ID); + break; + case TGSI_SEMANTIC_SAMPLEID: value = get_sample_id(radeon_bld); break; diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h index 51055af..b4339ae 100644 --- a/src/gallium/drivers/radeonsi/si_shader.h +++ b/src/gallium/drivers/radeonsi/si_shader.h @@ -115,6 +115,7 @@ struct si_shader_selector { unsignedgs_output_prim; unsignedgs_max_out_vertices; + unsignedgs_num_invocations; uint64_tgs_used_inputs; /* mask of get_unique_index bits */ }; diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c index 752467b..0dd08a2 100644 --- a/src/gallium/drivers/radeonsi/si_state.c +++ b/src/gallium/drivers/radeonsi/si_state.c @@ -3078,7 +3078,6 @@ void si_init_config(struct si_context *sctx) si_pm4_set_reg(pm4, R_028B60_VGT_GS_VERT_ITEMSIZE_1, 0); si_pm4_set_reg(pm4, R_028B64_VGT_GS_VERT_ITEMSIZE_2, 0); si_pm4_set_reg(pm4, R_028B68_VGT_GS_VERT_ITEMSIZE_3, 0); - si_pm4_set_reg(pm4, R_028B90_VGT_GS_INSTANCE_CNT, 0); si_pm4_set_reg(pm4, R_028B98_VGT_STRMOUT_BUFFER_CONFIG, 0x0); si_pm4_set_reg(pm4, R_028AB4_VGT_REUSE_OFF, 0); diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c index 48128fa..eef3baa 100644 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c @@ -76,6 +76,7 @@ static void si_shader_gs(struct si_shader *shader) unsigned gs_vert_itemsize = shader-selector-info.num_outputs * (16 2); unsigned gs_max_vert_out = shader-selector-gs_max_out_vertices; unsigned gsvs_itemsize = gs_vert_itemsize * gs_max_vert_out; + unsigned gs_num_invocations = shader-selector-gs_num_invocations; unsigned cut_mode; struct si_pm4_state *pm4; unsigned num_sgprs, num_user_sgprs; @@ -118,6 +119,10 @@ static void si_shader_gs(struct si_shader *shader) si_pm4_set_reg(pm4, R_028B5C_VGT_GS_VERT_ITEMSIZE, gs_vert_itemsize); + si_pm4_set_reg(pm4, R_028B90_VGT_GS_INSTANCE_CNT, + S_028B90_CNT(MIN2(gs_num_invocations, 127)) | + S_028B90_ENABLE(gs_num_invocations 0)); Without knowing how radeonsi works, just want to point out that previously GS_INSTANCE_CNT was being always set to 0, even though a single invocation was still being run. Perhaps this really needs to be gs_num_invocations_minus_one? GS_INVOCATIONS will always be at least 1. OK, I looked at the docs. Looks like the enable is for instance id generation, and if invocations == 0, that's also treated as a disable. This may end up generating the instanceid unnecessarily, but that's probably OK. I might try changing that enable to be 1, but probably not a big deal either way. -ilia + va = shader-bo-gpu_address; si_pm4_add_bo(pm4, shader-bo, RADEON_USAGE_READ, RADEON_PRIO_SHADER_DATA); si_pm4_set_reg(pm4, R_00B220_SPI_SHADER_PGM_LO_GS, va 8); @@ -490,6 +495,8 @@ static void *si_create_shader_state(struct pipe_context *ctx, sel-info.properties[TGSI_PROPERTY_GS_OUTPUT_PRIM]; sel-gs_max_out_vertices = sel-info.properties[TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES]; + sel-gs_num_invocations = + sel-info.properties[TGSI_PROPERTY_GS_INVOCATIONS]; for (i = 0; i sel-info.num_inputs; i++) { unsigned name = sel-info.input_semantic_name[i]; -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 91106] glx: make check fails to build on osx
https://bugs.freedesktop.org/show_bug.cgi?id=91106 Julien Isorce julien.iso...@gmail.com changed: What|Removed |Added CC||bri...@vmware.com, ||emil.l.veli...@gmail.com, ||jerem...@freedesktop.org, ||jon.tur...@dronecode.org.uk ||, k...@bitplanet.net, ||matts...@gmail.com -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH mesa] i965/gen8+: bo in state base address must be in 32-bit address range
On 6/24/2015 4:51 AM, Ben Widawsky wrote: Hi. Feel free to Cc me on patches of this nature. I am far behind on mesa-dev, and no longer read intel-gfx. I'm probably one of the sensible people to look at this... On Tue, Jun 23, 2015 at 01:21:27PM +0100, Michel Thierry wrote: Gen8+ supports 48-bit virtual addresses, but some objects must always be allocated inside the 32-bit address range. In specific, any resource used with flat/heapless (0x-0xf000) General State Heap (GSH) or Intruction State Heap (ISH) must be in a 32-bit range, because the General State Offset and Instruction State Offset are limited to 32-bits. I don't think GSH, or ISH are well known terms that have every appeared anywhere. I'd just keep the bit after the final comma (...because ...) Set provided bo flag when the 4GB limit is not necessary, to be able to use the full address space. I'm glad you got around to this. We'd been putting it off for a long time. Cc: mesa-dev@lists.freedesktop.org Signed-off-by: Michel Thierry michel.thie...@intel.com --- src/mesa/drivers/dri/i965/gen8_misc_state.c | 6 +++--- src/mesa/drivers/dri/i965/intel_batchbuffer.h | 7 +++ 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_misc_state.c b/src/mesa/drivers/dri/i965/gen8_misc_state.c index b20038e..26531d0 100644 --- a/src/mesa/drivers/dri/i965/gen8_misc_state.c +++ b/src/mesa/drivers/dri/i965/gen8_misc_state.c @@ -41,17 +41,17 @@ void gen8_upload_state_base_address(struct brw_context *brw) OUT_BATCH(0); OUT_BATCH(mocs_wb 16); /* Surface state base address: */ - OUT_RELOC64(brw-batch.bo, I915_GEM_DOMAIN_SAMPLER, 0, + OUT_RELOC64_32BWA(brw-batch.bo, I915_GEM_DOMAIN_SAMPLER, 0, mocs_wb 4 | 1); /* Dynamic state base address: */ - OUT_RELOC64(brw-batch.bo, + OUT_RELOC64_32BWA(brw-batch.bo, I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION, 0, mocs_wb 4 | 1); /* Indirect object base address: MEDIA_OBJECT data */ OUT_BATCH(mocs_wb 4 | 1); OUT_BATCH(0); /* Instruction base address: shader kernels (incl. SIP) */ - OUT_RELOC64(brw-cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0, + OUT_RELOC64_32BWA(brw-cache.bo, I915_GEM_DOMAIN_INSTRUCTION, 0, mocs_wb 4 | 1); /* General state buffer size */ diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h b/src/mesa/drivers/dri/i965/intel_batchbuffer.h index 7bdd836..5aa741e 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h @@ -177,6 +177,13 @@ intel_batchbuffer_advance(struct brw_context *brw) /* Handle 48-bit address relocations for Gen8+ */ #define OUT_RELOC64(buf, read_domains, write_domain, delta) do { \ + drm_intel_bo_set_supports_48baddress(buf); \ + intel_batchbuffer_emit_reloc64(brw, buf, read_domains, write_domain, delta);\ +} while (0) + +/* Handle 48-bit address relocations for Gen8+, ask for 32-bit address */ +#define OUT_RELOC64_32BWA(buf, read_domains, write_domain, delta) do { \ + drm_intel_bo_clear_supports_48baddress(buf); \ intel_batchbuffer_emit_reloc64(brw, buf, read_domains, write_domain, delta); \ } while (0) First and least bikesheddy, you need to bump the required libdrm in the configure.ac to support this new libdrm function (maybe you did, but I don't see it on mesa-dev). I didn't bump the libdrm version either. I'll make sure to include this in the next version. More bikesheddy, and forgive me here because I haven't looked at any of the kernel interfaces or libdrm patches (you can Cc those to mesa-dev if they're relevant fwiw). Presumably at the end of the day it's drm_intel_bo_emit_reloc which needs to know about these limitations. Unfortunately we don't have a flags field there. The implementation here seems like a somewhat cumbersome workaround for that (it looks like the context execbuf which is pretty crappy - yes, I know who the author was). Have you already discussed adding a new emit_reloc? I suppose if people are opposed to a new emit reloc, the only I'd like to see different is have the functions which need the workaround just call OUT_RELOC, instead of OUT_RELOC64 (put a comment in the call sites), and make OUT_RELOC call the drm_intel_bo_clear_supports_48baddress() (which is obviously a nop on pre-gen8 platforms). The OUT_RELOC64 case should be left alone - we shouldn't need to tell libdrm that I want a 64bit relocation, and it can actually be 64b... Right, at the end I was only looking to flag the exec_objects that can be safely outside the first 4GBs (it was decided to be an opt-in to not break any other user). I can change it like you suggest, OUT_RELOC will mean that we need the wa, while OUT_RELOC64 means we don't. If you want to take a look, these are the libdrm and kernel changes: libdrm:
Re: [Mesa-dev] [Mesa-stable] [PATCH 2/6] clover: Call clBuildProgram() notification function when build completes v2
On 3 June 2015 at 14:27, Francisco Jerez curroje...@riseup.net wrote: Emil Velikov emil.l.veli...@gmail.com writes: Hi Tom, On 31 March 2015 at 15:29, Francisco Jerez curroje...@riseup.net wrote: Tom Stellard thomas.stell...@amd.com writes: v2: - Only call notification for build errors - Fix clCompileProgram() Cc: 10.5 10.4 mesa-sta...@lists.freedesktop.org --- src/gallium/state_trackers/clover/api/program.cpp | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/src/gallium/state_trackers/clover/api/program.cpp b/src/gallium/state_trackers/clover/api/program.cpp index 60184ed..5cd543c 100644 --- a/src/gallium/state_trackers/clover/api/program.cpp +++ b/src/gallium/state_trackers/clover/api/program.cpp @@ -180,12 +180,18 @@ clBuildProgram(cl_program d_prog, cl_uint num_devs, validate_build_program_common(prog, num_devs, d_devs, pfn_notify, user_data); prog.build(devs, opts); + if (pfn_notify) + pfn_notify(d_prog, user_data); Maybe leave blank lines around the conditional where preceded/followed by another statement? return CL_SUCCESS; +} catch (const build_error e) { + if (pfn_notify) + pfn_notify(d_prog, user_data); + if (e.get() == CL_COMPILE_PROGRAM_FAILURE) + return CL_BUILD_PROGRAM_FAILURE; + return e.get(); You can just return CL_BUILD_PROGRAM_FAILURE here unconditionally. } catch (error e) { if (e.get() == CL_INVALID_COMPILER_OPTIONS) return CL_INVALID_BUILD_OPTIONS; - if (e.get() == CL_COMPILE_PROGRAM_FAILURE) - return CL_BUILD_PROGRAM_FAILURE; return e.get(); } @@ -223,8 +229,14 @@ clCompileProgram(cl_program d_prog, cl_uint num_devs, objsallow_empty_tag(d_header_progs, num_headers)); prog.build(devs, opts, headers); + if (pfn_notify) + pfn_notify(d_prog, user_data); Same here. Did you had the chance to address Francisco's comments ? Did this patch fell through the cracks, or should I consider it obsolete/rejected ? I think the patch was good it just fell through the cracks. Tom? Humble reminder. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] [PATCH 4/4] glsl: validate sampler array indexing for 'constant-index-expression'
Hi gents, On 9 June 2015 at 14:09, Francisco Jerez curroje...@riseup.net wrote: Francisco Jerez curroje...@riseup.net writes: Tapani Pälli tapani.pa...@intel.com writes: Desktop GLSL 130 and GLSL ES 300 allow sampler array indexing where index can contain a loop induction variable. This extra check will warn during linking if some of the indexes could not be turned in to constant expressions. v2: warning instead of error for backends that did not enable UnrollSamplerArrayDynamicIndexing option (have dynamic indexing) Signed-off-by: Tapani Pälli tapani.pa...@intel.com Cc: 10.5 and 10.6 mesa-sta...@lists.freedesktop.org --- src/glsl/linker.cpp | 77 + 1 file changed, 77 insertions(+) diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp index 9978380..27d7c18 100644 --- a/src/glsl/linker.cpp +++ b/src/glsl/linker.cpp @@ -346,6 +346,39 @@ private: bool uses_non_zero_stream; }; +/* Class that finds array derefs and check if indexes are dynamic. */ +class dynamic_sampler_array_indexing_visitor : public ir_hierarchical_visitor +{ +public: + dynamic_sampler_array_indexing_visitor() : + dynamic_sampler_array_indexing(false) + { + } + + ir_visitor_status visit_enter(ir_dereference_array *ir) + { + if (!ir-variable_referenced()) + return visit_continue; + + if (!ir-variable_referenced()-type-contains_sampler()) + return visit_continue; + + if (!ir-array_index-constant_expression_value()) { + dynamic_sampler_array_indexing = true; + return visit_stop; + } + return visit_continue; + } + + bool uses_dynamic_sampler_array_indexing() + { + return dynamic_sampler_array_indexing; + } + +private: + bool dynamic_sampler_array_indexing; +}; + } /* anonymous namespace */ void @@ -2736,6 +2769,40 @@ build_program_resource_list(struct gl_context *ctx, */ } +/** + * This check is done to make sure we allow only constant expression + * indexing and constant-index-expression (indexing with an expression + * that includes loop induction variable). + */ +static bool +validate_sampler_array_indexing(struct gl_context *ctx, +struct gl_shader_program *prog) +{ + dynamic_sampler_array_indexing_visitor v; + for (unsigned i = 0; i MESA_SHADER_STAGES; i++) { + if (prog-_LinkedShaders[i] == NULL) + continue; + + bool no_dynamic_indexing = + ctx-Const.ShaderCompilerOptions[i].UnrollSamplerArrayDynamicIndexing; + + /* Search for array derefs in shader. */ + v.run(prog-_LinkedShaders[i]-ir); + if (v.uses_dynamic_sampler_array_indexing()) { + const char *msg = sampler arrays indexed with non-constant + expressions is forbidden in GLSL %s %u; For the sake of clarity, maybe add that it's sampler array indexing with *general* non-constant expressions what is forbidden, loop induction variables are allowed and they are technically a kind of non-constant expression. + /* Backend has indicated that it has no dynamic indexing support. */ + if (no_dynamic_indexing) { +linker_error(prog, msg, prog-IsES ? ES : , prog-Version); +return false; + } else { +linker_warning(prog, msg, prog-IsES ? ES : , prog-Version); It seems a bit mean to spam the user with another warning at link time, you've already warned in PATCH 01 that this feature will be removed in more recent GLSL versions. If you drop the warning: Er, nevermind, the warning here is indeed subtly different (you are doing a kind of indexing not considered under the constant-index-expression wording), disregard my comment about dropping the warning, it seems fine to warn the user twice. Reviewed-by: Francisco Jerez curroje...@riseup.net If I understand things correctly the series (patches 1/4 and 4/4 from stable pov) are reviewed but are not in master. Are there any obstacles/objections against doing so ? Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/gen9: use an unreserved surface alignment value
On Thu, Jun 25, 2015 at 12:37 AM, Pohjolainen, Topi topi.pohjolai...@intel.com wrote: On Wed, Jun 24, 2015 at 05:57:13PM -0700, Anuj Phogat wrote: On Wed, Jun 24, 2015 at 3:51 PM, Nanley Chery nanleych...@gmail.com wrote: From: Nanley Chery nanley.g.ch...@intel.com Although the horizontal and vertical alignment fields are ignored here, 0 is a reserved value for them and may cause undefined behavior. Change the default value to an abitrary valid one. Signed-off-by: Nanley Chery nanley.g.ch...@intel.com --- src/mesa/drivers/dri/i965/gen8_surface_state.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c index b2d1a57..22ae960 100644 --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c @@ -93,7 +93,7 @@ vertical_alignment(const struct brw_context *brw, if (brw-gen 8 (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE || surf_type == BRW_SURFACE_1D)) - return 0; + return GEN8_SURFACE_VALIGN_4; switch (mt-align_h) { case 4: @@ -118,7 +118,7 @@ horizontal_alignment(const struct brw_context *brw, if (brw-gen 8 (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE || gen9_use_linear_1d_layout(brw, mt))) - return 0; + return GEN8_SURFACE_HALIGN_4; switch (mt-align_w) { case 4: -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev Good find Nanley. We had no known issues with value 0 but it's always nice to avoid undefined behavior :). Right, I thought about this when I reviewed the original. The spec says it is ignored in these cases and hence the reserved value seemed fine. Now that we put something meaningful there, somebody is going to compare the spec and wonder why we set it to 4. If we added also a comment here that says this is just an arbitrary (non-reserved) value and really ignored by the hardware, it would prevent misunderstandings. What do you guys think? There's enough space to insert Set to an arbitrary non-reserved value. in both of the comments preceding the conditional without adding an extra line. I wouldn't mind including it. Thanks, Nanley ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: add support for viewport array (v2)
Hi Dave, The change in si_shader_io_get_unique_index can be dropped. The function is only used for shaders before GS. This looks good, but I've had a different plan for this feature: I'd like the states to be converted into 2 atoms: 1 r600_atom for all 16 viewports 1 r600_atom for all 16 scissors Each atom should have a bitmask saying which slots are dirty. (the same idea as resource slots) The emit functions should only emit dirty viewports/scissors. Also, the emit functions shouldn't emit non-zero viewports/scissors if the viewport index isn't written by the hardware VS stage (si_get_vs_info(sctx)-...). This should keep the same level of effectiveness as before. When a shader that writes the viewport index is bound *and* there are any dirty viewports or scissors, that's the right time to mark the atoms as dirty again, so that non-zero dirty viewports/scissors are finally emitted. Marek On Thu, Jun 25, 2015 at 6:38 AM, Dave Airlie airl...@gmail.com wrote: From: Dave Airlie airl...@redhat.com This isn't pretty and I'd suggest it the pm4 interface builder could be tweaked to do this more efficently, but I'd need guidance on how that would look. This seems to pass the few piglit tests I threw at it. v2: handle passing layer/viewport index to fragment shader. fix crash in blit changes, add support to io_get_unique_index for layer/viewport index update docs. Signed-off-by: Dave Airlie airl...@redhat.com --- docs/GL3.txt| 4 +- docs/relnotes/10.7.0.html | 3 ++ src/gallium/drivers/radeonsi/si_blit.c | 8 +-- src/gallium/drivers/radeonsi/si_pipe.c | 2 +- src/gallium/drivers/radeonsi/si_shader.c| 26 +++--- src/gallium/drivers/radeonsi/si_state.c | 66 +++-- src/gallium/drivers/radeonsi/si_state.h | 4 +- src/gallium/drivers/radeonsi/si_state_shaders.c | 2 - 8 files changed, 73 insertions(+), 42 deletions(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index 220bcc8..df913bd 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -128,7 +128,7 @@ GL 4.1, GLSL 4.10: GL_ARB_separate_shader_objects DONE (all drivers) GL_ARB_shader_precision started (Micah) GL_ARB_vertex_attrib_64bit DONE (nvc0, softpipe) - GL_ARB_viewport_arrayDONE (i965, nv50, nvc0, r600, llvmpipe) + GL_ARB_viewport_arrayDONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe) GL 4.2, GLSL 4.20: @@ -156,7 +156,7 @@ GL 4.3, GLSL 4.30: GL_ARB_copy_imageDONE (i965) (gallium - in progress, VMware) GL_KHR_debug DONE (all drivers) GL_ARB_explicit_uniform_location DONE (all drivers that support GLSL) - GL_ARB_fragment_layer_viewport DONE (nv50, nvc0, r600, llvmpipe) + GL_ARB_fragment_layer_viewport DONE (nv50, nvc0, r600, radeonsi, llvmpipe) GL_ARB_framebuffer_no_attachmentsDONE (i965) GL_ARB_internalformat_query2 not started GL_ARB_invalidate_subdataDONE (all drivers) diff --git a/docs/relnotes/10.7.0.html b/docs/relnotes/10.7.0.html index e089889..fcc5081 100644 --- a/docs/relnotes/10.7.0.html +++ b/docs/relnotes/10.7.0.html @@ -44,8 +44,11 @@ Note: some of the new features are only available with certain drivers. /p ul +liGL_AMD_vertex_shader_viewport_index on radeonsi/li liGL_ARB_framebuffer_no_attachments on i965/li liGL_ARB_shader_stencil_export on llvmpipe/li +liGL_ARB_viewport_array on radeonsi/li +liGL_ARB_fragment_layer_viewport on radeonsi/li /ul h2Bug fixes/h2 diff --git a/src/gallium/drivers/radeonsi/si_blit.c b/src/gallium/drivers/radeonsi/si_blit.c index 1f2c408..6c7b383 100644 --- a/src/gallium/drivers/radeonsi/si_blit.c +++ b/src/gallium/drivers/radeonsi/si_blit.c @@ -63,11 +63,11 @@ static void si_blitter_begin(struct pipe_context *ctx, enum si_blitter_op op) util_blitter_save_sample_mask(sctx-blitter, sctx-queued.named.sample_mask-sample_mask); } - if (sctx-queued.named.viewport) { - util_blitter_save_viewport(sctx-blitter, sctx-queued.named.viewport-viewport); + if (sctx-queued.named.viewport[0]) { + util_blitter_save_viewport(sctx-blitter, sctx-queued.named.viewport[0]-viewport); } - if (sctx-queued.named.scissor) { - util_blitter_save_scissor(sctx-blitter, sctx-queued.named.scissor-scissor); + if (sctx-queued.named.scissor[0]) { + util_blitter_save_scissor(sctx-blitter, sctx-queued.named.scissor[0]-scissor); }
Re: [Mesa-dev] [Mesa-stable] [PATCH 2/6] clover: Call clBuildProgram() notification function when build completes v2
On Thu, Jun 25, 2015 at 03:19:40PM +0100, Emil Velikov wrote: On 3 June 2015 at 14:27, Francisco Jerez curroje...@riseup.net wrote: Emil Velikov emil.l.veli...@gmail.com writes: Hi Tom, On 31 March 2015 at 15:29, Francisco Jerez curroje...@riseup.net wrote: Tom Stellard thomas.stell...@amd.com writes: v2: - Only call notification for build errors - Fix clCompileProgram() Cc: 10.5 10.4 mesa-sta...@lists.freedesktop.org --- src/gallium/state_trackers/clover/api/program.cpp | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/src/gallium/state_trackers/clover/api/program.cpp b/src/gallium/state_trackers/clover/api/program.cpp index 60184ed..5cd543c 100644 --- a/src/gallium/state_trackers/clover/api/program.cpp +++ b/src/gallium/state_trackers/clover/api/program.cpp @@ -180,12 +180,18 @@ clBuildProgram(cl_program d_prog, cl_uint num_devs, validate_build_program_common(prog, num_devs, d_devs, pfn_notify, user_data); prog.build(devs, opts); + if (pfn_notify) + pfn_notify(d_prog, user_data); Maybe leave blank lines around the conditional where preceded/followed by another statement? return CL_SUCCESS; +} catch (const build_error e) { + if (pfn_notify) + pfn_notify(d_prog, user_data); + if (e.get() == CL_COMPILE_PROGRAM_FAILURE) + return CL_BUILD_PROGRAM_FAILURE; + return e.get(); You can just return CL_BUILD_PROGRAM_FAILURE here unconditionally. } catch (error e) { if (e.get() == CL_INVALID_COMPILER_OPTIONS) return CL_INVALID_BUILD_OPTIONS; - if (e.get() == CL_COMPILE_PROGRAM_FAILURE) - return CL_BUILD_PROGRAM_FAILURE; return e.get(); } @@ -223,8 +229,14 @@ clCompileProgram(cl_program d_prog, cl_uint num_devs, objsallow_empty_tag(d_header_progs, num_headers)); prog.build(devs, opts, headers); + if (pfn_notify) + pfn_notify(d_prog, user_data); Same here. Did you had the chance to address Francisco's comments ? Did this patch fell through the cracks, or should I consider it obsolete/rejected ? I think the patch was good it just fell through the cracks. Tom? Humble reminder. Sorry, I haven't had a chance to update this patch, you can ignore it. -Tom -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] clover: fix event handling of buffer operations
On 2015-06-09 22:52, Francisco Jerez wrote: + + if (blocking) + hev().wait(); + hard_event::wait() may fail, so this should probably be done before the ret_object() call to avoid leaks. Alright... C++ exceptions are a minefield. :) Is there any reason you didn't make the same change in clEnqueueReadBuffer() and clEnqueueWriteBuffer()? Must be an oversight. I think I did that, or at least I intended to do so. Same comment as above. Also note that this is being more strict than the spec requires (which I believe is what Tom was referring to). From the CL 1.2 spec: | If blocking_write is CL_TRUE, the OpenCL implementation copies the data | referred to by ptr and enqueues the write operation in the | command-queue. The memory pointed to by ptr can be reused by the | application after the clEnqueueWriteBufferRect call returns. The spec is giving you no guarantee that the write to the actual memory object will be complete by the time the clEnqueueWriteBufferRect call returns -- Only that your data will have been buffered somewhere and the memory pointed to by the argument can be reused immediately by the application. The reason why I was reluctant to make this change last time it came up was that it's likely to hurt performance unnecessarily because the wait() call blocks until *all* previous commands in the same queue have completed execution, even though in the most common case the copy is performed synchronously using soft_copy_op(), so the wait() call is redundant even for blocking copies. OK, maybe we could drop the wait completely for all of the write calls. The case with blocking reads is similar, the copy is handled synchronously using soft_copy_op() when no user events are present in the list of dependencies, so calling wait() on the event is unnecessary to guarantee that the execution of the read has completed, and will cause a pipe_context flush and wait until the most recent fence is signalled. I think it's reasonable to expect that the event is ready for profile queries after a blocking read has finished. That was the initial motivation for this patch. Other implementations behave like that. I didn't expect wait() to completely flush everything. Won't that cause a lot of needless flushing with event wait lists? Ideally we would have a weaker variant of event::wait() (e.g. wait_signalled()) that doesn't flush and just waits for the associated action call-back to have been executed without giving any guarantees about the corresponding GPU command. The event interface doesn't expose such a functionality right now, I'm attaching two (completely untested) patches implementing it, you should be able to use them as starting point to fix blocking transfers. Thanks, I'll look into that later when I get some free time. Grigori ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/gen9: use an unreserved surface alignment value
On Thu, Jun 25, 2015 at 12:37 AM, Pohjolainen, Topi topi.pohjolai...@intel.com wrote: On Wed, Jun 24, 2015 at 05:57:13PM -0700, Anuj Phogat wrote: On Wed, Jun 24, 2015 at 3:51 PM, Nanley Chery nanleych...@gmail.com wrote: From: Nanley Chery nanley.g.ch...@intel.com Although the horizontal and vertical alignment fields are ignored here, 0 is a reserved value for them and may cause undefined behavior. Change the default value to an abitrary valid one. Signed-off-by: Nanley Chery nanley.g.ch...@intel.com --- src/mesa/drivers/dri/i965/gen8_surface_state.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c index b2d1a57..22ae960 100644 --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c @@ -93,7 +93,7 @@ vertical_alignment(const struct brw_context *brw, if (brw-gen 8 (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE || surf_type == BRW_SURFACE_1D)) - return 0; + return GEN8_SURFACE_VALIGN_4; switch (mt-align_h) { case 4: @@ -118,7 +118,7 @@ horizontal_alignment(const struct brw_context *brw, if (brw-gen 8 (mt-tr_mode != INTEL_MIPTREE_TRMODE_NONE || gen9_use_linear_1d_layout(brw, mt))) - return 0; + return GEN8_SURFACE_HALIGN_4; switch (mt-align_w) { case 4: -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev Good find Nanley. We had no known issues with value 0 but it's always nice to avoid undefined behavior :). Right, I thought about this when I reviewed the original. The spec says it is ignored in these cases and hence the reserved value seemed fine. Now that we put something meaningful there, somebody is going to compare the spec and wonder why we set it to 4. If we added also a comment here that says this is just an arbitrary (non-reserved) value and really ignored by the hardware, it would prevent misunderstandings. What do you guys think? That's a good suggestion Topi. Reviewed-by: Anuj Phogat anuj.pho...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] clover: implement CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE
On 2015-05-28 13:04, Grigori Goronzy wrote: Work-group size should always be aligned to subgroup size; this is a basic requirement, otherwise some work-items will be no-operation. It might make sense to refine the value according to a kernel's resource usage, but that's a possible optimization for the future. Ping? This is rather simple, but I'd like an Rb, if possible. That also goes for the Gallium support patch. Grigori --- src/gallium/state_trackers/clover/api/kernel.cpp | 2 +- src/gallium/state_trackers/clover/core/device.cpp | 5 + src/gallium/state_trackers/clover/core/device.hpp | 1 + 3 files changed, 7 insertions(+), 1 deletion(-) diff --git a/src/gallium/state_trackers/clover/api/kernel.cpp b/src/gallium/state_trackers/clover/api/kernel.cpp index 05cc392..857a152 100644 --- a/src/gallium/state_trackers/clover/api/kernel.cpp +++ b/src/gallium/state_trackers/clover/api/kernel.cpp @@ -169,7 +169,7 @@ clGetKernelWorkGroupInfo(cl_kernel d_kern, cl_device_id d_dev, break; case CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE: - buf.as_scalarsize_t() = 1; + buf.as_scalarsize_t() = dev.subgroup_size(); break; case CL_KERNEL_PRIVATE_MEM_SIZE: diff --git a/src/gallium/state_trackers/clover/core/device.cpp b/src/gallium/state_trackers/clover/core/device.cpp index 42b45b7..c42d1d2 100644 --- a/src/gallium/state_trackers/clover/core/device.cpp +++ b/src/gallium/state_trackers/clover/core/device.cpp @@ -185,6 +185,11 @@ device::max_block_size() const { return { v.begin(), v.end() }; } +cl_uint +device::subgroup_size() const { + return get_compute_paramuint32_t(pipe, PIPE_COMPUTE_CAP_SUBGROUP_SIZE)[0]; +} + std::string device::device_name() const { return pipe-get_name(pipe); diff --git a/src/gallium/state_trackers/clover/core/device.hpp b/src/gallium/state_trackers/clover/core/device.hpp index de5fc6b..2857847 100644 --- a/src/gallium/state_trackers/clover/core/device.hpp +++ b/src/gallium/state_trackers/clover/core/device.hpp @@ -67,6 +67,7 @@ namespace clover { bool has_doubles() const; std::vectorsize_t max_block_size() const; + cl_uint subgroup_size() const; std::string device_name() const; std::string vendor_name() const; enum pipe_shader_ir ir_format() const; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] ARB_arrays_of_arrays GLSL ES
On Thu, Jun 25, 2015 at 1:19 AM, Timothy Arceri t_arc...@yahoo.com.au wrote: On Wed, 2015-06-24 at 11:17 -0700, Jason Ekstrand wrote: On Sat, Jun 20, 2015 at 5:32 AM, Timothy Arceri t_arc...@yahoo.com.au wrote: Hi all, The restrictions in ES make the extension easier to implement so I thought I'd try get this stuff reviewed an committed before finishing up the full extension. The bits that I'm still working on for the desktop version are AoA inputs outputs, and interface blocks. The only thing I know is definatly missing in this series for ES is support for indirect indexing of samplers, but that didn't seem like something that should hold up the series. Once the SSBO series lands (with a patch that restricts unsized arrays) then all the AoA ES conformance tests will pass. There are already a bunch of piglit tests in git but I've just sent a series with all the patches still waiting review here: http://lists.freedesktop.org/archives/piglit/2015-June/016312.html I haven't made a patch marking this as done yet because currently the i965 backend takes a very long time trying to optimise some of the conformance tests. They still pass but they are taking 15-minutes+ just to compile so this really needs to be sorted out first. If someone with more knowledge in this area than me wants to take a look at this I would be greatful for being pointed in the right direction. Can you try it with this patch: http://lists.freedesktop.org/archives/mesa-dev/2015-June/086011.html Hi Jason, I tried that patch it didn't seem to make any difference, then I noticed its for fs. The slowdown is currently happening in vec4_live_variables. I've also noticed other large slowdowns with some piglit tests I've been working on this time in the register allocate code. I just sent an equivalent patch for vec4: http://patchwork.freedesktop.org/patch/52801/ I would love to know if it helps something so that I can put a good justification in the commit message beyond the same as for FS. --Jason Once I finish fixing up some bugs with my current patchset I'll start digging deeper, just thought since it's so noticeably slow it might be easy for someone with good knowledge of the backend to point out where I should start looking, or to say your doing this wrong. One thing the slow shaders have in common is the use of arrays of arrays in nested loops so maybe something funny is going on when they go through the optimisation paths, I haven't spent much time looking at any of these passes yet so its highly likely they don't work as expected for arrays of arrays. Thanks anyway, Tim If useful the series is in the 'gles4' branch of the repo here: https://github.com/tarceri/Mesa_arrays_of_arrays.git Thanks, Tim ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/vec4_live_variables: Do liveness analysis bottom-to-top
From Muchnick's Advanced Compiler Design and Implementation: To determine which variables are live at each point in a flowgraph, we perform a backward data-flow analysis Previously, we were walking the blocks forwards and updating the livein and then the liveout. However, the livein calculation depends on the liveout and the liveout depends on the successor blocks. The net result is that it takes one full iteration to go from liveout to livein and then another full iteration to propagate to the predecessors. This works out to an O(n^2) computation where n is the number of blocks. If we run things in the other order, it's O(nl) where l is the maximum loop depth which is practically bounded by 3. In b2c6ba0c4b21391dc35018e1c8c4f7f7d8952bea, we made this same change in the FS backend to great effect. Might as well keep it consistent and make the same change for vec4. --- .../drivers/dri/i965/brw_vec4_live_variables.cpp | 38 +++--- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp index 95b9d90..29b4a53 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp @@ -133,27 +133,9 @@ vec4_live_variables::compute_live_variables() while (cont) { cont = false; - foreach_block (block, cfg) { + foreach_block_reverse (block, cfg) { struct block_data *bd = block_data[block-num]; -/* Update livein */ -for (int i = 0; i bitset_words; i++) { -BITSET_WORD new_livein = (bd-use[i] | - (bd-liveout[i] - ~bd-def[i])); -if (new_livein ~bd-livein[i]) { - bd-livein[i] |= new_livein; - cont = true; - } -} - BITSET_WORD new_livein = (bd-flag_use[0] | - (bd-flag_liveout[0] -~bd-flag_def[0])); - if (new_livein ~bd-flag_livein[0]) { -bd-flag_livein[0] |= new_livein; -cont = true; - } - /* Update liveout */ foreach_list_typed(bblock_link, child_link, link, block-children) { struct block_data *child_bd = block_data[child_link-block-num]; @@ -173,6 +155,24 @@ vec4_live_variables::compute_live_variables() cont = true; } } + + /* Update livein */ + for (int i = 0; i bitset_words; i++) { +BITSET_WORD new_livein = (bd-use[i] | + (bd-liveout[i] + ~bd-def[i])); +if (new_livein ~bd-livein[i]) { + bd-livein[i] |= new_livein; + cont = true; +} + } + BITSET_WORD new_livein = (bd-flag_use[0] | + (bd-flag_liveout[0] +~bd-flag_def[0])); + if (new_livein ~bd-flag_livein[0]) { +bd-flag_livein[0] |= new_livein; +cont = true; + } } } } -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/skl: Use more compact hiz dimensions
Reviewed-by: Jordan Justen jordan.l.jus...@intel.com On 2015-06-24 20:07:54, Ben Widawsky wrote: gen8 had some special restrictions which don't seem to carry over to gen9. Quoting the spec for SKL: The Z_Height and Z_Width values must equal those present in 3DSTATE_DEPTH_BUFFER incremented by one. This fixes nothing in piglit (and regresses nothing). Cc: Jordan Justen jordan.l.jus...@intel.com Signed-off-by: Ben Widawsky b...@bwidawsk.net --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 32 ++- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index 6aa969a..432a47c 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -1550,21 +1550,23 @@ intel_gen8_hiz_buf_create(struct brw_context *brw, /* Gen7 PRM Volume 2, Part 1, 11.5.3 Hierarchical Depth Buffer documents * adjustments required for Z_Height and Z_Width based on multisampling. */ - switch (mt-num_samples) { - case 0: - case 1: - break; - case 2: - case 4: - z_width *= 2; - z_height *= 2; - break; - case 8: - z_width *= 4; - z_height *= 2; - break; - default: - unreachable(unsupported sample count); + if (brw-gen 9) { + switch (mt-num_samples) { + case 0: + case 1: + break; + case 2: + case 4: + z_width *= 2; + z_height *= 2; + break; + case 8: + z_width *= 4; + z_height *= 2; + break; + default: + unreachable(unsupported sample count); + } } const unsigned vertical_align = 8; /* 'j' in the docs */ -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] New stable-branch 10.6 candidate pushed
Hello list, The candidate for the Mesa 10.6.1 is now available. Currently we have: - 22 queued - 19 nominated (outstanding) - and 0 rejected (obsolete) patches The present queue consist of core mesa and glsl patches (amonst which a Dota2 Reborn bugfix) affecting all dri drivers, nouveau specific fixes and a selection of shared-glapi commits - from build fixes, to ones ensuring that the dri modules can be loaded. Take a look at section Mesa stable queue for more information. Testing --- The following results are against piglit 246791c51ec. Changes - classic i965(snb) --- None. Changes - swrast classic None. Changes - gallium softpipe -- None. Changes - gallium llvmpipe (LLVM 3.6) - None. Testing reports/general approval Any testing reports (or general approval of the state of the branch) will be greatly appreciated. Trivial merge conflicts --- None. The plan is to have 10.6.1 this Friday(26th of June). If you have any questions or comments that you would like to share before the release, please go ahead. Cheers, Emil Mesa stable queue - Nominated (19) == Anuj Phogat (8): mesa: Turn get_readpixels_transfer_ops() in to a global function meta: Fix transfer operations check in meta pbo path for readpixels mesa: Fix conditions to test signed, unsigned integer format mesa: Add a mesa utility function _mesa_need_signed_unsigned_int_conversion() meta: Abort meta pbo path if readpixels need signed-unsigned meta: Don't do fragment color clamping in case of ReadPixels mesa: Add a helper function _mesa_need_luminance_to_rgb_conversion() meta: Fix reading luminance texture as rgba in _mesa_meta_pbo_GetTexSubImage() Boyan Ding (1): i915: Add XRGB format to intel_screen_make_configs Brian Paul (1): configure: don't try to build gallium DRI drivers if --disable-dri is set Emil Velikov (1): bugzilla_mesa.sh: sort the bugs list by number Kenneth Graunke (1) i965/fs: Fix ir_txs in emit_texture_gen4_simd16(). Mario Kleiner(1): nouveau: Use dup fd as key in drm-winsys hash table to fix ZaphodHeads Matt Turner (1): i965/fs: Don't mess up stride for uniform integer multiplication. Tapani Pälli (2): glsl: Allow dynamic sampler array indexing with GLSL ES 3.00 glsl: validate sampler array indexing for 'constant-index-expression' Tom Stellard (3): clover: Call clBuildProgram() notification function when build completes v2 gallium/drivers: Add threadsafe wrappers for pipe_context and pipe_screen clover: Use threadsafe wrappers for pipe_screen and pipe_context Queued (22) === Anuj Phogat (4): mesa: Handle integer formats in need_rgb_to_luminance_conversion() mesa: Use helper function need_rgb_to_luminance_conversion() mesa: Turn need_rgb_to_luminance_conversion() in to a global function meta: Abort meta path if ReadPixels need rgb to luminance conversion Ben Widawsky (1): i965/gen9: Implement Push Constant Buffer workaround Boyan Ding (2): egl/x11: Set version of swrastLoader to 2 egl/x11: Remove duplicate call to dri2_x11_add_configs_for_visuals Emil Velikov (5): docs: Add sha256sums for the 10.6.0 release configure: warn about shared_glapi xlib-glx only when both are set configure: error out when building backend-less libEGL configure: error out when building libEGL without shared-glapi gbm: do not (over)link against libglapi.so Frank Henigman (1): gbm: dlopen libglapi so gbm_create_device works Ilia Mirkin (9): nvc0/ir: fix collection of first uses for texture barrier insertion nv50,nvc0: clamp uniform size to 64k nvc0/ir: can't have a join on a load with an indirect source glsl: handle conversions to double when comparing param matches glsl: add version checks to conditionals for builtin variable enablement mesa: add GL_PROGRAM_PIPELINE support in KHR_debug calls glsl: binding point is a texture unit, which is a combined space nvc0: always put all tfb bufs into bufctx nv50,nvc0: make sure to pushbuf_refn before putting bo into pushbuf_data ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Remove special case for layered drawbuffer attachments.
When binding a layered texture, the layer is already 0. There's no need to special case this. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/gen6_surface_state.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/gen6_surface_state.c b/src/mesa/drivers/dri/i965/gen6_surface_state.c index 03e913a..39de62f 100644 --- a/src/mesa/drivers/dri/i965/gen6_surface_state.c +++ b/src/mesa/drivers/dri/i965/gen6_surface_state.c @@ -88,7 +88,8 @@ gen6_update_renderbuffer_surface(struct brw_context *brw, break; } - const int min_array_element = layered ? 0 : irb-mt_layer; + const int min_array_element = irb-mt_layer; + assert(!layered || irb-mt_layer == 0); surf[0] = SET_FIELD(surftype, BRW_SURFACE_TYPE) | SET_FIELD(format, BRW_SURFACE_FORMAT); -- 2.4.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/skl: Use more compact hiz dimensions
On Wed, Jun 24, 2015 at 8:07 PM, Ben Widawsky benjamin.widaw...@intel.com wrote: gen8 had some special restrictions which don't seem to carry over to gen9. Quoting the spec for SKL: The Z_Height and Z_Width values must equal those present in 3DSTATE_DEPTH_BUFFER incremented by one. This fixes nothing in piglit (and regresses nothing). Cc: Jordan Justen jordan.l.jus...@intel.com Signed-off-by: Ben Widawsky b...@bwidawsk.net --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 32 ++- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index 6aa969a..432a47c 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -1550,21 +1550,23 @@ intel_gen8_hiz_buf_create(struct brw_context *brw, /* Gen7 PRM Volume 2, Part 1, 11.5.3 Hierarchical Depth Buffer documents * adjustments required for Z_Height and Z_Width based on multisampling. */ - switch (mt-num_samples) { - case 0: - case 1: - break; - case 2: - case 4: - z_width *= 2; - z_height *= 2; - break; - case 8: - z_width *= 4; - z_height *= 2; - break; - default: - unreachable(unsupported sample count); + if (brw-gen 9) { + switch (mt-num_samples) { + case 0: + case 1: + break; + case 2: + case 4: + z_width *= 2; + z_height *= 2; + break; + case 8: + z_width *= 4; + z_height *= 2; + break; + default: + unreachable(unsupported sample count); + } } const unsigned vertical_align = 8; /* 'j' in the docs */ -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev Reviewed-by: Anuj Phogat anuj.pho...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/gen6: Set up layer constraints properly for depth buffers.
This ports over Chris Forbes' equivalent fixes in gen7_misc_state.c from commit 77d55ef4819436ebbf9786a1e720ec00707bbb19. No Piglit changes on Sandybridge. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/gen6_depth_state.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/gen6_depth_state.c b/src/mesa/drivers/dri/i965/gen6_depth_state.c index 8f0d7dc..febd478 100644 --- a/src/mesa/drivers/dri/i965/gen6_depth_state.c +++ b/src/mesa/drivers/dri/i965/gen6_depth_state.c @@ -73,7 +73,7 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw, rb = (struct gl_renderbuffer*) irb; if (rb) { - depth = MAX2(rb-Depth, 1); + depth = MAX2(irb-layer_count, 1); if (rb-TexImage) gl_target = rb-TexImage-TexObject-Target; } @@ -89,6 +89,10 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw, surftype = BRW_SURFACE_2D; depth *= 6; break; + case GL_TEXTURE_3D: + assert(mt); + depth = MAX2(mt-logical_depth0, 1); + /* fallthrough */ default: surftype = translate_tex_target(gl_target); break; -- 2.4.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Remove special case for layered drawbuffer attachments.
On Thu, Jun 25, 2015 at 10:08 AM, Kenneth Graunke kenn...@whitecape.org wrote: When binding a layered texture, the layer is already 0. There's no need to special case this. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/gen6_surface_state.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/gen6_surface_state.c b/src/mesa/drivers/dri/i965/gen6_surface_state.c index 03e913a..39de62f 100644 --- a/src/mesa/drivers/dri/i965/gen6_surface_state.c +++ b/src/mesa/drivers/dri/i965/gen6_surface_state.c @@ -88,7 +88,8 @@ gen6_update_renderbuffer_surface(struct brw_context *brw, break; } - const int min_array_element = layered ? 0 : irb-mt_layer; + const int min_array_element = irb-mt_layer; + assert(!layered || irb-mt_layer == 0); surf[0] = SET_FIELD(surftype, BRW_SURFACE_TYPE) | SET_FIELD(format, BRW_SURFACE_FORMAT); -- 2.4.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev Reviewed-by: Anuj Phogat anuj.pho...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/cs: Initialize GPGPU Thread Count
On Thu, Jun 11, 2015 at 09:04:45PM -0700, Jordan Justen wrote: This field should always be set for gen8. In the bdw PRM, Volume 2d: Command Reference: Structures under INTERFACE_DESCRIPTOR_DATA, DWORD 6, Bits 9:0, Number of Threads in GPGPU Thread Group: This field should not be set to 0 even if the barrier is disabled, since an accurate value is needed for proper pre-emption. I am pretty skeptical that we actually need this. It's pretty clear this is a requirement for preemption, and do we ever plan to support gpgpu preemption with compute shaders? Since I did some research, here's why I think it's a requirement. BDW added supported of doing mid thread-group preemption. Doing this requires that when the workload is restored, it has a concept of how many more threads need to complete. So I don't see this patch as being a requirement, but it shouldn't hurt and probably makes looking a debug slightly easier. Also, if we do ever support preemption, it should work. One comment inline, and then it's Reviewed-by: Ben Widawsky b...@bwidawsk.net In the HSW PRM, the it doesn't mention that it must always be set, but it should not hurt. Reported-by: Kristian Høgsberg k...@bitplanet.net Signed-off-by: Jordan Justen jordan.l.jus...@intel.com Cc: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_cs.cpp| 19 +++ src/mesa/drivers/dri/i965/brw_defines.h | 5 + 2 files changed, 24 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_cs.cpp b/src/mesa/drivers/dri/i965/brw_cs.cpp index 1f2a9d2..44c76ba 100644 --- a/src/mesa/drivers/dri/i965/brw_cs.cpp +++ b/src/mesa/drivers/dri/i965/brw_cs.cpp @@ -284,6 +284,17 @@ brw_cs_precompile(struct gl_context *ctx, } +static unsigned +get_cs_thread_count(const struct brw_cs_prog_data *cs_prog_data) +{ + const unsigned simd_size = cs_prog_data-simd_size; + unsigned group_size = cs_prog_data-local_size[0] * + cs_prog_data-local_size[1] * cs_prog_data-local_size[2]; + + return (group_size + simd_size - 1) / simd_size; +} + + static void brw_upload_cs_state(struct brw_context *brw) { @@ -309,6 +320,8 @@ brw_upload_cs_state(struct brw_context *brw) prog_data-binding_table.size_bytes, 32, stage_state-bind_bo_offset); + unsigned threads = get_cs_thread_count(cs_prog_data); + uint32_t dwords = brw-gen 8 ? 8 : 9; BEGIN_BATCH(dwords); OUT_BATCH(MEDIA_VFE_STATE 16 | (dwords - 2)); @@ -358,6 +371,12 @@ brw_upload_cs_state(struct brw_context *brw) desc[dw++] = 0; desc[dw++] = 0; desc[dw++] = stage_state-bind_bo_offset; + desc[dw++] = 0; + const uint32_t media_threads = + brw-gen = 8 ? + SET_FIELD(threads, GEN8_MEDIA_GPGPU_THREAD_COUNT) : + SET_FIELD(threads, MEDIA_GPGPU_THREAD_COUNT); + desc[dw++] = media_threads; What's the deal with, The maximum value for global barriers is limited by the number of threads in the system, or by 511, Can we add an assert? BEGIN_BATCH(4); OUT_BATCH(MEDIA_INTERFACE_DESCRIPTOR_LOAD 16 | (4 - 2)); diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index f6da305..2a8f500 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -2495,6 +2495,11 @@ enum brw_wm_barycentric_interp_mode { # define MEDIA_VFE_STATE_CURBE_ALLOC_MASK INTEL_MASK(15, 0) #define MEDIA_INTERFACE_DESCRIPTOR_LOAD 0x7002 +/* GEN7 DW5, GEN8+ DW6 */ +# define MEDIA_GPGPU_THREAD_COUNT_SHIFT 0 +# define MEDIA_GPGPU_THREAD_COUNT_MASK INTEL_MASK(7, 0) +# define GEN8_MEDIA_GPGPU_THREAD_COUNT_SHIFT0 +# define GEN8_MEDIA_GPGPU_THREAD_COUNT_MASK INTEL_MASK(9, 0) #define MEDIA_STATE_FLUSH 0x7004 #define GPGPU_WALKER0x7105 /* GEN8+ DW2 */ -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/4] nir/from_ssa: add a flag to not convert everything to SSA
We already don't convert constants out of SSA, and in our backend we'd like to have only one way of saying something is still in SSA. The one tricky part about this is that we may now leave some undef instructions around if they aren't part of a phi-web, so we have to be more careful about deleting them. Signed-off-by: Connor Abbott cwabbo...@gmail.com --- src/gallium/drivers/vc4/vc4_program.c | 2 +- src/glsl/nir/nir.h| 7 ++- src/glsl/nir/nir_from_ssa.c | 25 ++--- src/mesa/drivers/dri/i965/brw_nir.c | 2 +- 4 files changed, 26 insertions(+), 10 deletions(-) diff --git a/src/gallium/drivers/vc4/vc4_program.c b/src/gallium/drivers/vc4/vc4_program.c index 2061631..1a550e1 100644 --- a/src/gallium/drivers/vc4/vc4_program.c +++ b/src/gallium/drivers/vc4/vc4_program.c @@ -2102,7 +2102,7 @@ vc4_shader_ntq(struct vc4_context *vc4, enum qstage stage, nir_remove_dead_variables(c-s); -nir_convert_from_ssa(c-s); +nir_convert_from_ssa(c-s, true); if (vc4_debug VC4_DEBUG_SHADERDB) { fprintf(stderr, SHADER-DB: %s prog %d/%d: %d NIR instructions\n, diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index 697d37e..2116f60 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -1676,7 +1676,12 @@ bool nir_ssa_defs_interfere(nir_ssa_def *a, nir_ssa_def *b); void nir_convert_to_ssa_impl(nir_function_impl *impl); void nir_convert_to_ssa(nir_shader *shader); -void nir_convert_from_ssa(nir_shader *shader); + +/* If convert_everything is true, convert all values (even those not involved + * in a phi node) to registers. If false, only convert SSA values involved in + * phi nodes to registers. + */ +void nir_convert_from_ssa(nir_shader *shader, bool convert_everything); bool nir_opt_algebraic(nir_shader *shader); bool nir_opt_algebraic_late(nir_shader *shader); diff --git a/src/glsl/nir/nir_from_ssa.c b/src/glsl/nir/nir_from_ssa.c index 67733e6..966c2fe 100644 --- a/src/glsl/nir/nir_from_ssa.c +++ b/src/glsl/nir/nir_from_ssa.c @@ -37,6 +37,7 @@ struct from_ssa_state { void *mem_ctx; void *dead_ctx; + bool convert_everything; struct hash_table *merge_node_table; nir_instr *instr; nir_function_impl *impl; @@ -482,6 +483,9 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state) reg = node-set-reg; } else { + if (!state-convert_everything) + return true; + /* We leave load_const SSA values alone. They act as immediates to * the backend. If it got coalesced into a phi, that's ok. */ @@ -505,8 +509,15 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state) nir_ssa_def_rewrite_uses(def, nir_src_for_reg(reg), state-mem_ctx); assert(list_empty(def-uses) list_empty(def-if_uses)); - if (def-parent_instr-type == nir_instr_type_ssa_undef) + if (def-parent_instr-type == nir_instr_type_ssa_undef) { + /* If it's an ssa_undef instruction, remove it since we know we just got + * rid of all its uses. + */ + nir_instr *parent_instr = def-parent_instr; + nir_instr_remove(parent_instr); + ralloc_steal(state-dead_ctx, parent_instr); return true; + } assert(def-parent_instr-type != nir_instr_type_load_const); @@ -523,7 +534,7 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state) } /* Resolves ssa definitions to registers. While we're at it, we also - * remove phi nodes and ssa_undef instructions + * remove phi nodes. */ static bool resolve_registers_block(nir_block *block, void *void_state) @@ -534,8 +545,7 @@ resolve_registers_block(nir_block *block, void *void_state) state-instr = instr; nir_foreach_ssa_def(instr, rewrite_ssa_def, state); - if (instr-type == nir_instr_type_ssa_undef || - instr-type == nir_instr_type_phi) { + if (instr-type == nir_instr_type_phi) { nir_instr_remove(instr); ralloc_steal(state-dead_ctx, instr); } @@ -765,13 +775,14 @@ resolve_parallel_copies_block(nir_block *block, void *void_state) } static void -nir_convert_from_ssa_impl(nir_function_impl *impl) +nir_convert_from_ssa_impl(nir_function_impl *impl, bool convert_everything) { struct from_ssa_state state; state.mem_ctx = ralloc_parent(impl); state.dead_ctx = ralloc_context(NULL); state.impl = impl; + state.convert_everything = convert_everything; state.merge_node_table = _mesa_hash_table_create(NULL, _mesa_hash_pointer, _mesa_key_pointer_equal); @@ -801,10 +812,10 @@ nir_convert_from_ssa_impl(nir_function_impl *impl) } void -nir_convert_from_ssa(nir_shader *shader) +nir_convert_from_ssa(nir_shader *shader, bool convert_everything) { nir_foreach_overload(shader, overload) { if (overload-impl) - nir_convert_from_ssa_impl(overload-impl); + nir_convert_from_ssa_impl(overload-impl, convert_everything); } } diff
[Mesa-dev] [PATCH 4/4] nir: remove parent_instr from nir_register
It's no longer used Signed-off-by: Connor Abbott cwabbo...@gmail.com --- src/glsl/nir/nir.c | 1 - src/glsl/nir/nir.h | 8 src/glsl/nir/nir_from_ssa.c | 8 3 files changed, 17 deletions(-) diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c index f03e80a..f661249 100644 --- a/src/glsl/nir/nir.c +++ b/src/glsl/nir/nir.c @@ -57,7 +57,6 @@ reg_create(void *mem_ctx, struct exec_list *list) { nir_register *reg = ralloc(mem_ctx, nir_register); - reg-parent_instr = NULL; list_inithead(reg-uses); list_inithead(reg-defs); list_inithead(reg-if_uses); diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index b33c9c5..e818acc 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -389,14 +389,6 @@ typedef struct { */ bool is_packed; - /** -* If this pointer is non-NULL then this register has exactly one -* definition and that definition dominates all of its uses. This is -* set by the out-of-SSA pass so that backends can get SSA-like -* information even once they have gone out of SSA. -*/ - struct nir_instr *parent_instr; - /** set of nir_instr's where this register is used (read from) */ struct list_head uses; diff --git a/src/glsl/nir/nir_from_ssa.c b/src/glsl/nir/nir_from_ssa.c index 966c2fe..57bbdde 100644 --- a/src/glsl/nir/nir_from_ssa.c +++ b/src/glsl/nir/nir_from_ssa.c @@ -496,14 +496,6 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state) reg-name = def-name; reg-num_components = def-num_components; reg-num_array_elems = 0; - - /* This register comes from an SSA definition that is defined and not - * part of a phi-web. Therefore, we know it has a single unique - * definition that dominates all of its uses; we can copy the - * parent_instr from the SSA def safely. - */ - if (def-parent_instr-type != nir_instr_type_ssa_undef) - reg-parent_instr = def-parent_instr; } nir_ssa_def_rewrite_uses(def, nir_src_for_reg(reg), state-mem_ctx); -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] i965/fs: use SSA values directly
Before, we would use registers, but set a magical parent_instr field to indicate that it was actually purely an SSA value (i.e., it wasn't involved in any phi nodes). Instead, just use SSA values directly, which lets us get rid of the hack and reduces memory usage since we're not allocating a nir_register for every value. It also makes our handling of load_const more consistent compared to the other instructions. Signed-off-by: Connor Abbott cwabbo...@gmail.com --- src/mesa/drivers/dri/i965/brw_fs.h | 5 ++ src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 73 ++ src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 1 + src/mesa/drivers/dri/i965/brw_nir.c| 2 +- .../dri/i965/brw_nir_analyze_boolean_resolves.c| 12 ++-- 5 files changed, 59 insertions(+), 34 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 243baf6..a6fee35 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -249,6 +249,10 @@ public: void nir_emit_block(nir_block *block); void nir_emit_instr(nir_instr *instr); void nir_emit_alu(const brw::fs_builder bld, nir_alu_instr *instr); + void nir_emit_load_const(const brw::fs_builder bld, +nir_load_const_instr *instr); + void nir_emit_undef(const brw::fs_builder bld, + nir_ssa_undef_instr *instr); void nir_emit_intrinsic(const brw::fs_builder bld, nir_intrinsic_instr *instr); void nir_emit_texture(const brw::fs_builder bld, @@ -345,6 +349,7 @@ public: unsigned max_grf; fs_reg *nir_locals; + fs_reg *nir_ssa_values; fs_reg *nir_globals; fs_reg nir_inputs; fs_reg nir_outputs; diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 59081ea..58896d7 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -366,6 +366,9 @@ fs_visitor::nir_emit_impl(nir_function_impl *impl) nir_locals[reg-index] = bld.vgrf(BRW_REGISTER_TYPE_F, size); } + nir_ssa_values = reralloc(mem_ctx, nir_ssa_values, fs_reg, + impl-ssa_alloc); + nir_emit_cf_list(impl-body); } @@ -459,9 +462,11 @@ fs_visitor::nir_emit_instr(nir_instr *instr) break; case nir_instr_type_load_const: - /* We can hit these, but we do nothing now and use them as - * immediates later. - */ + nir_emit_load_const(abld, nir_instr_as_load_const(instr)); + break; + + case nir_instr_type_ssa_undef: + nir_emit_undef(abld, nir_instr_as_ssa_undef(instr)); break; case nir_instr_type_jump: @@ -495,17 +500,12 @@ bool fs_visitor::optimize_frontfacing_ternary(nir_alu_instr *instr, const fs_reg result) { - if (instr-src[0].src.is_ssa || - !instr-src[0].src.reg.reg || - !instr-src[0].src.reg.reg-parent_instr) - return false; - - if (instr-src[0].src.reg.reg-parent_instr-type != - nir_instr_type_intrinsic) + if (!instr-src[0].src.is_ssa || + instr-src[0].src.ssa-parent_instr-type != nir_instr_type_intrinsic) return false; nir_intrinsic_instr *src0 = - nir_instr_as_intrinsic(instr-src[0].src.reg.reg-parent_instr); + nir_instr_as_intrinsic(instr-src[0].src.ssa-parent_instr); if (src0-intrinsic != nir_intrinsic_load_front_face) return false; @@ -1146,6 +1146,25 @@ fs_visitor::nir_emit_alu(const fs_builder bld, nir_alu_instr *instr) } } +void +fs_visitor::nir_emit_load_const(const fs_builder bld, +nir_load_const_instr *instr) +{ + fs_reg reg = bld.vgrf(BRW_REGISTER_TYPE_D, instr-def.num_components); + + for (unsigned i = 0; i instr-def.num_components; i++) + bld.MOV(offset(reg, i), fs_reg(instr-value.i[i])); + + nir_ssa_values[instr-def.index] = reg; +} + +void +fs_visitor::nir_emit_undef(const fs_builder bld, nir_ssa_undef_instr *instr) +{ + nir_ssa_values[instr-def.index] = bld.vgrf(BRW_REGISTER_TYPE_D, + instr-def.num_components); +} + static fs_reg fs_reg_for_nir_reg(fs_visitor *v, nir_register *nir_reg, unsigned base_offset, nir_src *indirect) @@ -1171,30 +1190,30 @@ fs_reg_for_nir_reg(fs_visitor *v, nir_register *nir_reg, fs_reg fs_visitor::get_nir_src(nir_src src) { + fs_reg reg; if (src.is_ssa) { - assert(src.ssa-parent_instr-type == nir_instr_type_load_const); - nir_load_const_instr *load = nir_instr_as_load_const(src.ssa-parent_instr); - fs_reg reg = bld.vgrf(BRW_REGISTER_TYPE_D, src.ssa-num_components); - - for (unsigned i = 0; i src.ssa-num_components; ++i) - bld.MOV(offset(reg, i), fs_reg(load-value.i[i])); - - return reg; + reg = nir_ssa_values[src.ssa-index]; } else { - fs_reg reg =
Re: [Mesa-dev] [PATCH v4 3/6] mesa/es3.1: enable GL_ARB_texture_multisample for GLES 3.1
On Thu, Jun 25, 2015 at 4:22 PM, Ilia Mirkin imir...@alum.mit.edu wrote: On Thu, Jun 25, 2015 at 5:08 AM, Marta Lofstedt marta.lofst...@linux.intel.com wrote: From: Marta Lofstedt marta.lofst...@intel.com v4 : only expose GL_ARB_texture_multisample enums for gles 3.1 and desktop GL. I was suspicious of this logic. Based on my reading of the code, what your ARB_texture_multisample_es31 thing does is expose those enums when *either* the driver enables ARB_texture_multisample *or* the current context is ES3.1. ARB_draw_indirect_es31 has the same problem, btw. I could have misread the get.c extra_ext() logic, but I don't think I have. As far as I can tell there's no way to (generically) AND these things. What you really need to do is create a whole new [GL, GL_CORE, ES31] section in get_hash_params and update get_hash_generator.py accordingly. An alternative, BTW, is to just say screw it and expose the enums in GL ES 3.0. In that case, just move the enums into a section that includes GLES3. I'm not sure how big of a deal it is. But your current patches are definitely wrong. -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 11/18] mesa/macros: add power-of-two assertions for alignment macros
How about if I create a patch which puts the greater than 0 check into is_power_of_two()? (value 0) (value (value - 1)) == 0); Thanks, Nanley On Wed, Jun 24, 2015 at 3:22 PM, Anuj Phogat anuj.pho...@gmail.com wrote: On Mon, Jun 22, 2015 at 4:02 PM, Nanley Chery nanleych...@gmail.com wrote: From: Nanley Chery nanley.g.ch...@intel.com ALIGN and ROUND_DOWN_TO both require that the alignment value passed into the macro be a power of two in the comments. Using software assertions verifies this to be the case. v2: use static inline functions instead of gcc-specific statement expressions. Signed-off-by: Nanley Chery nanley.g.ch...@intel.com --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 2 +- src/mesa/main/macros.h | 16 +--- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 59081ea..1a57784 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -134,7 +134,7 @@ fs_visitor::nir_setup_outputs(nir_shader *shader) : var-type-vector_elements; if (stage == MESA_SHADER_VERTEX) { - for (int i = 0; i ALIGN(type_size(var-type), 4) / 4; i++) { + for (unsigned int i = 0; i ALIGN(type_size(var-type), 4) / 4; i++) { int output = var-data.location + i; this-outputs[output] = offset(reg, 4 * i); this-output_components[output] = vector_elements; diff --git a/src/mesa/main/macros.h b/src/mesa/main/macros.h index 0608650..4a640ad 100644 --- a/src/mesa/main/macros.h +++ b/src/mesa/main/macros.h @@ -684,7 +684,7 @@ minify(unsigned value, unsigned levels) * Note that this considers 0 a power of two. */ static inline bool -is_power_of_two(unsigned value) +is_power_of_two(uintptr_t value) { return (value (value - 1)) == 0; } @@ -700,7 +700,12 @@ is_power_of_two(unsigned value) * * \sa ROUND_DOWN_TO() */ -#define ALIGN(value, alignment) (((value) + (alignment) - 1) ~((alignment) - 1)) +static inline uintptr_t +ALIGN(uintptr_t value, uintptr_t alignment) +{ + assert(is_power_of_two(alignment)); Also handle the 0 alignment. is_power_of_two() returns true for 0. Use assert(alignment 0 is_power_of_two(alignment))? + return (((value) + (alignment) - 1) ~((alignment) - 1)); +} /** * Align a value down to an alignment value @@ -713,7 +718,12 @@ is_power_of_two(unsigned value) * * \sa ALIGN() */ -#define ROUND_DOWN_TO(value, alignment) ((value) ~(alignment - 1)) +static inline uintptr_t +ROUND_DOWN_TO(uintptr_t value, uintptr_t alignment) +{ + assert(is_power_of_two(alignment)); Here as well. + return ((value) ~(alignment - 1)); +} /** Cross product of two 3-element vectors */ -- 2.4.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev With above changes and indentation fixes: Reviewed-by: Anuj Phogat anuj.pho...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 4/7] glsl: No need to lock in _mesa_glsl_release_types
On Thu, Jun 25, 2015 at 2:05 PM, Erik Faye-Lund kusmab...@gmail.com wrote: This function only gets called while mesa is unloading, so there's no potential of racing or multiple calls at the same time. So let's just get rid of the locking. Signed-off-by: Erik Faye-Lund kusmab...@gmail.com --- src/glsl/glsl_types.cpp | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp index f675e90..dbe2382 100644 --- a/src/glsl/glsl_types.cpp +++ b/src/glsl/glsl_types.cpp @@ -324,8 +324,10 @@ const glsl_type *glsl_type::get_scalar_type() const void _mesa_glsl_release_types(void) { - mtx_lock(glsl_type::mutex); - + /* should only be called during atexit (either when unloading shared Let's capitalize should +* object, or if process terminates), so no mutex-locking should be +* nessecary. typo: necessary ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev