Re: [Mesa-dev] [PATCH 01/22] glsl: Reorder optimization-passes
On Sat, Jan 17, 2015 at 8:31 AM, Thomas Helland thomashellan...@gmail.com wrote:
> 2015-01-03 22:48 GMT+01:00 Matt Turner matts...@gmail.com:
> > On Sat, Jan 3, 2015 at 11:18 AM, Thomas Helland thomashellan...@gmail.com wrote:
> > > This allows opt_algebraic to resolve open-coded saturates into
> > > ir_unop_saturate before we potentially mess it up by removing the
> > > min or max in min/max-pruning. Since we are now emitting more free
> > > saturates on i965 this gives us some decrease in instruction count.
> > >
> > > total instructions in shared programs: 1317459 -> 1317065 (-0.03%)
> > > instructions in affected programs:     4084 -> 3690 (-9.65%)
> > > GAINED: 0
> > > LOST:   0
> >
> > You're definitely onto something here. On our collection of shaders:
> >
> > total instructions in shared programs: 5876617 -> 5875919 (-0.01%)
> > instructions in affected programs:     9443 -> 8745 (-7.39%)
> >
> > with some fragment shaders hurt in Natural Selection 2 and Kerbal
> > Space Program. I'll investigate these.
>
> Hi Matt,
>
> Don't want to be a nuisance (if that is even the right word? English
> is not my native tongue), but did you find the time to look at these
> regressions?

Nuisance is indeed the right word, but you are not being one. :)

I'll definitely look into this. Sorry that I haven't had a chance yet.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
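As an illustration of the ordering issue discussed in this thread (not code from the patch or from the GLSL IR), here is a minimal C sketch of what an "open-coded saturate" is and how pruning a redundant min first can hide it. The helper names (min_f, max_f, sat, open_coded_sat, pruned_form) and the example expression are assumptions made up for this sketch.

```c
/* Small helpers so the sketch does not depend on libm. */
static float min_f(float a, float b) { return a < b ? a : b; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* What a saturate computes: clamp x to [0, 1]. */
static float sat(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* The open-coded form an algebraic pass can recognize as a saturate. */
static float open_coded_sat(float x)
{
    return min_f(max_f(x, 0.0f), 1.0f);
}

/* Hypothetical input: min(max(min(x, 0.5), 0), 1).
 * If min/max pruning runs first, the outer min with 1.0 is redundant
 * (the value is already <= 0.5) and gets dropped, leaving the form
 * below. The value is unchanged, but the min(max(y, 0), 1) shape that
 * would have become sat(y) is no longer there to be matched. */
static float pruned_form(float x)
{
    return max_f(min_f(x, 0.5f), 0.0f);
}
```

Both forms compute the same value; the only difference is whether the saturate pattern is still visible when the algebraic pass runs.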
Re: [Mesa-dev] [PATCH 2/5] nir: use Python to autogenerate opcode information
On Sat, Jan 17, 2015 at 11:42 AM, ahmad luig...@yandex.com wrote:
> Hi.
>
> "#! /usr/bin/env python" resolves to the Python 3.x series on some
> major distros (Arch, Fedora, ...) and to Python 2.x on others.
> Python 2.x and Python 3.x are not source compatible with each other.
> Python 3.x no longer contains the xrange function; range vs. xrange is
> only meaningful for Python 2.x.
>
> http://www.pythoncentral.io/how-to-use-pythons-xrange-and-range/
>
> Distros that still use the 2.x series as the default Python
> interpreter are moving to 3.x.
>
> Regards.

Yes... if you look at the part of the patch that modifies Makefile.am,
it's actually called with $(PYTHON), which will be python2 on distros
where python3 is the default. Unfortunately, on some distros there's no
python2, so "#!/usr/bin/env python2" won't work either... you can't
please everyone. So the line you mentioned is more customary than
anything else.
Re: [Mesa-dev] [PATCH 2/5] nir: use Python to autogenerate opcode information
On Saturday, January 17, 2015 01:09:45 PM Connor Abbott wrote:
> On Sat, Jan 17, 2015 at 11:42 AM, ahmad luig...@yandex.com wrote:
> > Hi.
> >
> > "#! /usr/bin/env python" resolves to the Python 3.x series on some
> > major distros (Arch, Fedora, ...) and to Python 2.x on others.
> > Python 2.x and Python 3.x are not source compatible with each other.
> > Python 3.x no longer contains the xrange function; range vs. xrange
> > is only meaningful for Python 2.x.
> >
> > http://www.pythoncentral.io/how-to-use-pythons-xrange-and-range/
> >
> > Distros that still use the 2.x series as the default Python
> > interpreter are moving to 3.x.
> >
> > Regards.
>
> Yes... if you look at the part of the patch that modifies Makefile.am,
> it's actually called with $(PYTHON), which will be python2 on distros
> where python3 is the default. Unfortunately, on some distros there's
> no python2, so "#!/usr/bin/env python2" won't work either... you can't
> please everyone. So the line you mentioned is more customary than
> anything else.

While I agree with you Connor, when I did a survey for piglit I found
that OSX was the only major OS that didn't provide a python2 symlink;
Arch, Gentoo, Debian, Fedora, and CentOS all did, and Windows doesn't
care about a shebang.

I would actually be in favor of using /usr/bin/python2 anyway, just
because it makes it clear we're using python2, but ultimately you're
right and it doesn't really matter.

It's also been on my todo list to get all of the python in mesa working
with both python2 and python3, but I have too many things to get done.

Dylan
[Mesa-dev] [PATCH 1/7] i965: Enable L3 caching of buffer surfaces.
And remove the mocs argument of the emit_buffer_surface_state vtbl
hook. Its semantics vary greatly from one generation to another, so it
kind of encourages the caller to pass 0 which is the only valid setting
across generations. After this commit the hardware-specific code
decides what the best cacheability settings are for buffer surfaces,
just like we do for textures.

This together with some additional changes coming is expected to
improve performance of pull constants, buffer textures, atomic counters
and image objects on Gen7 and up.
---
 src/mesa/drivers/dri/i965/brw_context.h           | 1 -
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c  | 4 +---
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 4 +---
 src/mesa/drivers/dri/i965/gen8_surface_state.c    | 3 +--
 4 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index a4b29fa..6195d3d 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -975,7 +975,6 @@ struct brw_context
                                         unsigned surface_format,
                                         unsigned buffer_size,
                                         unsigned pitch,
-                                        unsigned mocs,
                                         bool rw);

    /**
diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index 85a08d5..ece352b 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -221,7 +221,6 @@ gen4_emit_buffer_surface_state(struct brw_context *brw,
                                unsigned surface_format,
                                unsigned buffer_size,
                                unsigned pitch,
-                               unsigned mocs,
                                bool rw)
 {
    uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
@@ -279,7 +278,6 @@ brw_update_buffer_texture_surface(struct gl_context *ctx,
                                        brw_format,
                                        size / texel_size,
                                        texel_size,
-                                       0, /* mocs */
                                        false /* rw */);
 }

@@ -382,7 +380,7 @@ brw_create_constant_surface(struct brw_context *brw,

    brw->vtbl.emit_buffer_surface_state(brw, out_offset, bo, offset,
                                        BRW_SURFACEFORMAT_R32G32B32A32_FLOAT,
-                                       elements, stride, 0, false);
+                                       elements, stride, false);
 }

 /**
diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
index e2c347a..24547d9 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
@@ -225,7 +225,6 @@ gen7_emit_buffer_surface_state(struct brw_context *brw,
                                unsigned surface_format,
                                unsigned buffer_size,
                                unsigned pitch,
-                               unsigned mocs,
                                bool rw)
 {
    uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
@@ -241,7 +240,7 @@ gen7_emit_buffer_surface_state(struct brw_context *brw,
    surf[3] = SET_FIELD(((buffer_size - 1) >> 21) & 0x3f, BRW_SURFACE_DEPTH) |
              (pitch - 1);

-   surf[5] = SET_FIELD(mocs, GEN7_SURFACE_MOCS);
+   surf[5] = SET_FIELD(GEN7_MOCS_L3, GEN7_SURFACE_MOCS);

    if (brw->is_haswell) {
       surf[7] |= (SET_FIELD(HSW_SCS_RED, GEN7_SURFACE_SCS_R) |
@@ -385,7 +384,6 @@ gen7_create_raw_surface(struct brw_context *brw, drm_intel_bo *bo,
                                   BRW_SURFACEFORMAT_RAW,
                                   size,
                                   1,
-                                  0 /* mocs */,
                                   true /* rw */);
 }

diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index d1b095c..8d4e180 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -116,9 +116,9 @@ gen8_emit_buffer_surface_state(struct brw_context *brw,
                                unsigned surface_format,
                                unsigned buffer_size,
                                unsigned pitch,
-                               unsigned mocs,
                                bool rw)
 {
+   const unsigned mocs = brw->gen >= 9 ? SKL_MOCS_WB : BDW_MOCS_WB;
    uint32_t *surf = allocate_surface_state(brw, out_offset);

    surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
@@ -286,7 +286,6 @@ gen8_create_raw_surface(struct brw_context *brw, drm_intel_bo *bo,
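To summarize the design change in this patch, here is a minimal C sketch (not driver code) of the idea that the generation-specific code, rather than the caller, picks the cacheability setting for buffer surfaces. The SKETCH_* constants are placeholder values invented for this example and are not the real hardware encodings; buffer_surface_mocs() is a hypothetical helper that does not exist in Mesa.

```c
/* Placeholder encodings for illustration only, NOT the hardware values. */
enum sketch_mocs {
    SKETCH_NO_MOCS = 0,      /* Gen4-6 path: the patch only drops the argument */
    SKETCH_GEN7_MOCS_L3 = 1, /* stands in for GEN7_MOCS_L3 */
    SKETCH_BDW_MOCS_WB = 2,  /* stands in for BDW_MOCS_WB */
    SKETCH_SKL_MOCS_WB = 3   /* stands in for SKL_MOCS_WB */
};

/* Hypothetical helper collecting, in one place, the choices that the
 * patch makes inside the gen7 and gen8 emit functions. */
static enum sketch_mocs buffer_surface_mocs(int gen)
{
    if (gen >= 9)
        return SKETCH_SKL_MOCS_WB;
    if (gen == 8)
        return SKETCH_BDW_MOCS_WB;
    if (gen == 7)
        return SKETCH_GEN7_MOCS_L3;
    return SKETCH_NO_MOCS;
}
```

With this shape, callers no longer need to know that the meaning of the MOCS field differs between generations.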
[Mesa-dev] [PATCH 2/7] i965: Remove the create_raw_surface vtbl hook.
It's a wrapper around emit_buffer_surface_state with format=RAW,
pitch=1, rw=true and the remaining arguments ordered differently.
There's no point in having a separate vtbl pointer for that.
---
 src/mesa/drivers/dri/i965/brw_binding_tables.c    |  8 +++++---
 src/mesa/drivers/dri/i965/brw_context.h           |  6 ------
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c  |  6 +++---
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 19 -------------------
 src/mesa/drivers/dri/i965/gen8_surface_state.c    | 16 ----------------
 5 files changed, 8 insertions(+), 47 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index ea82e71..08e4191 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -68,9 +68,11 @@ brw_upload_binding_table(struct brw_context *brw,
    } else {
       /* Upload a new binding table. */
       if (INTEL_DEBUG & DEBUG_SHADER_TIME) {
-         brw->vtbl.create_raw_surface(
-            brw, brw->shader_time.bo, 0, brw->shader_time.bo->size,
-            &stage_state->surf_offset[prog_data->binding_table.shader_time_start], true);
+         brw->vtbl.emit_buffer_surface_state(
+            brw, &stage_state->surf_offset[
+                    prog_data->binding_table.shader_time_start],
+            brw->shader_time.bo, 0, BRW_SURFACEFORMAT_RAW,
+            brw->shader_time.bo->size, 1, true);
       }

       uint32_t *bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 6195d3d..d21e175 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -962,12 +962,6 @@ struct brw_context
       void (*update_null_renderbuffer_surface)(struct brw_context *brw,
                                                unsigned unit);

-      void (*create_raw_surface)(struct brw_context *brw,
-                                 drm_intel_bo *bo,
-                                 uint32_t offset,
-                                 uint32_t size,
-                                 uint32_t *out_offset,
-                                 bool rw);
       void (*emit_buffer_surface_state)(struct brw_context *brw,
                                         uint32_t *out_offset,
                                         drm_intel_bo *bo,
diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index ece352b..e5f2058 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -920,9 +920,9 @@ brw_upload_abo_surfaces(struct brw_context *brw,
          drm_intel_bo *bo = intel_bufferobj_buffer(
             brw, intel_bo, binding->Offset, intel_bo->Base.Size - binding->Offset);

-         brw->vtbl.create_raw_surface(brw, bo, binding->Offset,
-                                      bo->size - binding->Offset,
-                                      &surf_offsets[i], true);
+         brw->vtbl.emit_buffer_surface_state(brw, &surf_offsets[i], bo,
+                                             binding->Offset, BRW_SURFACEFORMAT_RAW,
+                                             bo->size - binding->Offset, 1, true);
       }

       if (prog->NumAtomicBuffers)
diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
index 24547d9..1421ac4 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
@@ -370,24 +370,6 @@ gen7_update_texture_surface(struct gl_context *ctx,
 }

 /**
- * Create a raw surface for untyped R/W access.
- */
-static void
-gen7_create_raw_surface(struct brw_context *brw, drm_intel_bo *bo,
-                        uint32_t offset, uint32_t size,
-                        uint32_t *out_offset, bool rw)
-{
-   gen7_emit_buffer_surface_state(brw,
-                                  out_offset,
-                                  bo,
-                                  offset,
-                                  BRW_SURFACEFORMAT_RAW,
-                                  size,
-                                  1,
-                                  true /* rw */);
-}
-
-/**
  * Creates a null renderbuffer surface.
  *
  * This is used when the shader doesn't write to any color output.  An FB
@@ -563,6 +545,5 @@ gen7_init_vtable_surface_functions(struct brw_context *brw)
    brw->vtbl.update_renderbuffer_surface = gen7_update_renderbuffer_surface;
    brw->vtbl.update_null_renderbuffer_surface =
       gen7_update_null_renderbuffer_surface;
-   brw->vtbl.create_raw_surface = gen7_create_raw_surface;
    brw->vtbl.emit_buffer_surface_state = gen7_emit_buffer_surface_state;
 }
diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index 8d4e180..9ddbbad 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
[Mesa-dev] [PATCH 5/7] i965/fs: Less broken handling of force_writemask_all in lower_load_payload().
It's perfectly fine to read the second half of a register written with
force_writemask_all from a first half MOV instruction or vice versa,
and lower_load_payload shouldn't mark the whole MOV as belonging to the
second half in that case. Replicate the same metadata to both halves of
the destination when writemasking is disabled.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 4a61943..d585a67 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3059,9 +3059,11 @@ fs_visitor::lower_load_payload()
       }

       if (inst->dst.file == MRF || inst->dst.file == GRF) {
-         bool force_sechalf = inst->force_sechalf;
+         bool force_sechalf = inst->force_sechalf &&
+                              !inst->force_writemask_all;
          bool toggle_sechalf = inst->dst.width == 16 &&
-                               type_sz(inst->dst.type) == 4;
+                               type_sz(inst->dst.type) == 4 &&
+                               !inst->force_writemask_all;
          for (int i = 0; i < inst->regs_written; ++i) {
             metadata[dst_reg + i].written = true;
             metadata[dst_reg + i].force_sechalf = force_sechalf;
@@ -3104,11 +3106,15 @@ fs_visitor::lower_load_payload()
                mov->force_writemask_all = metadata[src_reg].force_writemask_all;
                metadata[dst_reg] = metadata[src_reg];
                if (dst.width * type_sz(dst.type) > 32) {
-                  assert((!metadata[src_reg].written ||
-                          !metadata[src_reg].force_sechalf) &&
-                         (!metadata[src_reg + 1].written ||
-                          metadata[src_reg + 1].force_sechalf));
-                  metadata[dst_reg + 1] = metadata[src_reg + 1];
+                  if (metadata[src_reg].force_writemask_all) {
+                     metadata[dst_reg + 1] = metadata[src_reg];
+                  } else {
+                     assert((!metadata[src_reg].written ||
+                             !metadata[src_reg].force_sechalf) &&
+                            (!metadata[src_reg + 1].written ||
+                             metadata[src_reg + 1].force_sechalf));
+                     metadata[dst_reg + 1] = metadata[src_reg + 1];
+                  }
                }
             } else {
                metadata[dst_reg].force_writemask_all = false;
-- 
2.1.3
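The second hunk of this patch can be sketched in isolation as follows. This is a simplified stand-alone C model, not the fs_visitor code: struct reg_metadata and copy_wide_metadata() are hypothetical names, and only the three flags that appear in the diff are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified per-register bookkeeping, modelled on the fields the diff touches. */
struct reg_metadata {
    bool written;
    bool force_sechalf;
    bool force_writemask_all;
};

/* Propagate metadata for a copy that spans two registers.
 * With writemasking disabled, the first source register's metadata is
 * replicated to both destination halves; otherwise each half keeps its
 * own metadata, under the same consistency check the patch retains. */
static void copy_wide_metadata(struct reg_metadata dst[2],
                               const struct reg_metadata src[2])
{
    dst[0] = src[0];
    if (src[0].force_writemask_all) {
        dst[1] = src[0];
    } else {
        assert((!src[0].written || !src[0].force_sechalf) &&
               (!src[1].written || src[1].force_sechalf));
        dst[1] = src[1];
    }
}
```

The point of the replicated case is that a register written with force_writemask_all is valid for either half, so neither destination half should be tagged as second-half only.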
[Mesa-dev] [PATCH 3/7] i965: Let the caller of brw_set_dp_write/read_message control the target cache.
brw_set_dp_read_message already had a target_cache argument, but its
interpretation was rather contrived: On Gen7+ it was ignored and the
data cache was always used, on Gen6 the render cache was used if the
caller asked for it, otherwise the argument was ignored and the sampler
cache was used instead.

brw_set_dp_write_message used the data cache on Gen7+ except for
RENDER_TARGET_WRITE messages, in which case it would use the render
cache. On Gen6 the render cache was always used.

Makes no functional changes. Some of the nested ternary operators
introduced here will go away in a future commit.
---
 src/mesa/drivers/dri/i965/brw_eu.h               |  1 +
 src/mesa/drivers/dri/i965/brw_eu_emit.c          | 58
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 16 ++-
 3 files changed, 45 insertions(+), 30 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h
index 22d5a0a..60f6f69 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -225,6 +225,7 @@ void brw_set_dp_write_message(struct brw_compile *p,
                               unsigned binding_table_index,
                               unsigned msg_control,
                               unsigned msg_type,
+                              unsigned target_cache,
                               unsigned msg_length,
                               bool header_present,
                               unsigned last_render_target,
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 8f15db9..c2e490d 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -675,6 +675,7 @@ brw_set_dp_write_message(struct brw_compile *p,
                          unsigned binding_table_index,
                          unsigned msg_control,
                          unsigned msg_type,
+                         unsigned target_cache,
                          unsigned msg_length,
                          bool header_present,
                          unsigned last_render_target,
@@ -683,20 +684,8 @@ brw_set_dp_write_message(struct brw_compile *p,
                          unsigned send_commit_msg)
 {
    struct brw_context *brw = p->brw;
-   unsigned sfid;
-
-   if (brw->gen >= 7) {
-      /* Use the Render Cache for RT writes; otherwise use the Data Cache */
-      if (msg_type == GEN6_DATAPORT_WRITE_MESSAGE_RENDER_TARGET_WRITE)
-         sfid = GEN6_SFID_DATAPORT_RENDER_CACHE;
-      else
-         sfid = GEN7_SFID_DATAPORT_DATA_CACHE;
-   } else if (brw->gen == 6) {
-      /* Use the render cache for all write messages. */
-      sfid = GEN6_SFID_DATAPORT_RENDER_CACHE;
-   } else {
-      sfid = BRW_SFID_DATAPORT_WRITE;
-   }
+   const unsigned sfid = (brw->gen >= 6 ? target_cache :
+                          BRW_SFID_DATAPORT_WRITE);

    brw_set_message_descriptor(p, insn, sfid, msg_length, response_length,
                               header_present, end_of_thread);
@@ -722,18 +711,8 @@ brw_set_dp_read_message(struct brw_compile *p,
                         unsigned response_length)
 {
    struct brw_context *brw = p->brw;
-   unsigned sfid;
-
-   if (brw->gen >= 7) {
-      sfid = GEN7_SFID_DATAPORT_DATA_CACHE;
-   } else if (brw->gen == 6) {
-      if (target_cache == BRW_DATAPORT_READ_TARGET_RENDER_CACHE)
-         sfid = GEN6_SFID_DATAPORT_RENDER_CACHE;
-      else
-         sfid = GEN6_SFID_DATAPORT_SAMPLER_CACHE;
-   } else {
-      sfid = BRW_SFID_DATAPORT_READ;
-   }
+   const unsigned sfid = (brw->gen >= 6 ? target_cache :
+                          BRW_SFID_DATAPORT_READ);

    brw_set_message_descriptor(p, insn, sfid, msg_length, response_length,
                               header_present, false);
@@ -1989,6 +1968,10 @@ void brw_oword_block_write_scratch(struct brw_compile *p,
                                    unsigned offset)
 {
    struct brw_context *brw = p->brw;
+   const unsigned target_cache =
+      (brw->gen >= 7 ? GEN7_SFID_DATAPORT_DATA_CACHE :
+       brw->gen >= 6 ? GEN6_SFID_DATAPORT_RENDER_CACHE :
+       BRW_DATAPORT_READ_TARGET_RENDER_CACHE);
    uint32_t msg_control, msg_type;
    int mlen;

@@ -2077,6 +2060,7 @@ void brw_oword_block_write_scratch(struct brw_compile *p,
                             255, /* binding table index (255=stateless) */
                             msg_control,
                             msg_type,
+                            target_cache,
                             mlen,
                             true, /* header_present */
                             0, /* not a render target */
@@ -2102,6 +2086,10 @@ brw_oword_block_read_scratch(struct brw_compile *p,
                              unsigned offset)
 {
    struct brw_context *brw = p->brw;
+   const unsigned target_cache =
+      (brw->gen >= 7 ? GEN7_SFID_DATAPORT_DATA_CACHE :
+       brw->gen >= 6 ? GEN6_SFID_DATAPORT_RENDER_CACHE :
+       BRW_DATAPORT_READ_TARGET_RENDER_CACHE);
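To make the "no functional changes" claim for write messages concrete, here is a small stand-alone C model (not the driver code). The enum values are arbitrary stand-ins for the SFID macros, and caller_target_cache() is a hypothetical helper showing what a caller would have to pass for the result to match the removed selection logic.

```c
#include <stdbool.h>

/* Stand-ins for the SFID macros; the numeric values are arbitrary. */
enum sketch_sfid {
    SKETCH_SFID_LEGACY_WRITE,      /* BRW_SFID_DATAPORT_WRITE */
    SKETCH_SFID_GEN6_RENDER_CACHE, /* GEN6_SFID_DATAPORT_RENDER_CACHE */
    SKETCH_SFID_GEN7_DATA_CACHE    /* GEN7_SFID_DATAPORT_DATA_CACHE */
};

/* The selection logic the patch removes from brw_set_dp_write_message(). */
static enum sketch_sfid old_write_sfid(int gen, bool is_rt_write)
{
    if (gen >= 7)
        return is_rt_write ? SKETCH_SFID_GEN6_RENDER_CACHE
                           : SKETCH_SFID_GEN7_DATA_CACHE;
    else if (gen == 6)
        return SKETCH_SFID_GEN6_RENDER_CACHE;
    else
        return SKETCH_SFID_LEGACY_WRITE;
}

/* Hypothetical caller-side choice of target_cache. */
static enum sketch_sfid caller_target_cache(int gen, bool is_rt_write)
{
    return (gen >= 7 && !is_rt_write) ? SKETCH_SFID_GEN7_DATA_CACHE
                                      : SKETCH_SFID_GEN6_RENDER_CACHE;
}

/* The logic the patch leaves in brw_set_dp_write_message(). */
static enum sketch_sfid new_write_sfid(int gen, enum sketch_sfid target_cache)
{
    return gen >= 6 ? target_cache : SKETCH_SFID_LEGACY_WRITE;
}

/* True if old and new logic agree for Gen4..Gen9 and both message kinds. */
static bool write_sfid_unchanged(void)
{
    for (int gen = 4; gen <= 9; gen++) {
        for (int rt = 0; rt <= 1; rt++) {
            const bool is_rt = rt != 0;
            if (old_write_sfid(gen, is_rt) !=
                new_write_sfid(gen, caller_target_cache(gen, is_rt)))
                return false;
        }
    }
    return true;
}
```

Before Gen6 the target_cache argument is ignored, which is why the scratch helpers in the diff can pass any value there.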
[Mesa-dev] [PATCH 4/7] i965/fs: Switch to the constant cache for uniform pull constants.
This reverts to using the oword block read messages for uniform pull
constant loads, as used to be the case until
4c1fdae0a01b3f92ec03b61aac1d3df5. There are two important differences
though: Now the L3 cacheability bits are set up correctly, and we target
the constant cache instead of the data cache. The latter turns out to
get no L3 way allocation on boot on most platforms, so data cache
messages are currently *not* cached on L3 regardless of the MOCS bits,
which probably explains the apparent slowness of oword fetches back
then.

Constant cache loads seem to perform better than SIMD4x2 sampler loads
in a number of cases, they alleviate some of the cache thrashing caused
by the competition with textures for the L1/L2 sampler caches, and they
allow fetching up to 8 consecutive owords (128B) with just one message.

FPS deltas relative to master are shown below for all generations since
Gen6 and all oword block sizes from 1 to 8.

                        1oword  2oword  4oword  8oword
 OglShMapPcf   SNB         3%      3%      5%      6%
               IVB         9%     11%     28%     41%
               BYT         5%      7%     25%     42%
               HSW         9%      9%     20%     30%
               BDW         3%      5%     19%     33%
               BSW         3%      5%     25%     44%
 ubo-worst     SNB         2%      2%      2%      3%
               IVB       -85%    -71%    -71%    -71%
               BYT        14%     14%     14%     14%
               HSW         0%     -1%     -1%     -1%
               BDW       -40%      0%      0%      0%
               BSW         0%      0%      0%      0%
 ubo-best      SNB       191%    190%    205%    205%
               IVB       152%    350%    474%    563%
               BYT        83%    208%    292%    353%
               HSW       292%    464%    615%    726%
               BDW        38%    267%    546%    581%
               BSW       124%    721%   1135%    580%
 shader-db     HSW
  gained - lost             0      -3      -1      -1
  instruction delta in   3.44%   1.90%  -1.77%  -3.48%
  affected programs

OglShMapPcf is a PCF shadow mapping benchmark from SynMark that
exercises pull constants and texture sampling; other tests from the
Finnish benchmarking system show either a smaller improvement or no
significant change. ubo-worst and ubo-best are the worst- and best-case
scenarios of a simple microbenchmark that reads n constants (with n
between 1 and 128) from a UBO, accessing up to 2kB of memory per
invocation with alignment taken into account.
Typically the gap between master and my constant cache branch increases
with the amount of bandwidth used by the shader, with n=128 showing the
greatest improvement.

IVB's apparent worst-case regression deserves an explanation. After
some investigation it seems like it's caused by a hardware bug leading
to serialization of read requests to the L3 for the same cacheline as a
result of a (on IVB buggy) mechanism of the L3 to preserve coherency.
As read requests for matching cachelines from any L3 client are not
pipelined, throughput will decrease in cases where there are no
non-overlapping requests left in the queue that can be processed in
between. I suspect that this situation is relatively uncommon in
real-world applications, as the regression disappears completely from
my microbenchmark as soon as each individual shader invocation accesses
more than two non-overlapping cachelines from L3.

To make this situation less likely we should make sure that we don't
use the 1/2 oword messages at all if the shader intends to read from
any other location in the same cacheline at some other point. This is
generally a good idea anyway on all generations because using the 1 and
2 oword messages is expected to waste bandwidth, since the minimum L3
request size for the DC is exactly 4 owords, i.e. one cacheline (this
probably explains the negative result in the first column for BDW). A
future commit will have this effect.

I haven't been able to find any real-world example where this would
still result in a regression, but if someone happens to find one it
shouldn't be too difficult to add an IVB-specific heuristic that falls
back to using the sampler for pull constant loads when a shader uses
less than a certain amount of L3 bandwidth.
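For readers who want to check the size arithmetic used in this commit message, here is a minimal C sketch. It relies only on figures stated above (8 owords = 128B, a minimum L3 request of 4 owords, up to 2kB read per invocation); the helper names are made up for this example, and the model deliberately ignores caching and alignment effects.

```c
#include <assert.h>

enum {
    OWORD_BYTES = 16,          /* 8 owords = 128B, so 1 oword = 16B */
    L3_MIN_REQUEST_OWORDS = 4  /* minimum L3 request size for the DC */
};

/* Bytes returned by one oword block read of the given size (1, 2, 4 or 8). */
static unsigned block_bytes(unsigned block_owords)
{
    assert(block_owords == 1 || block_owords == 2 ||
           block_owords == 4 || block_owords == 8);
    return block_owords * OWORD_BYTES;
}

/* Messages needed to read total_bytes of contiguous, block-aligned data. */
static unsigned messages_needed(unsigned total_bytes, unsigned block_owords)
{
    const unsigned bytes = block_bytes(block_owords);
    return (total_bytes + bytes - 1) / bytes;
}

/* Bytes requested from L3 per message, given the 4-oword minimum. */
static unsigned l3_bytes_per_message(unsigned block_owords)
{
    const unsigned owords = block_owords < L3_MIN_REQUEST_OWORDS ?
                            L3_MIN_REQUEST_OWORDS : block_owords;
    return owords * OWORD_BYTES;
}
```

Under these assumptions, reading 2kB takes 128 one-oword messages but only 32 four-oword messages, and each one-oword message still costs a 64B request to L3, which is the bandwidth waste the message describes.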
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c        |  5 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp           | 34 ---
 src/mesa/drivers/dri/i965/brw_fs.h             |  2 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 84 --
 4 files changed, 40 insertions(+), 85 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index c2e490d..7829878 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2194,7 +2194,7 @@ gen7_block_read_scratch(struct brw_compile *p,
 }

 /**
- * Read a float[4] vector from the data port Data Cache (const buffer).
+ * Read a
[Mesa-dev] [PATCH 7/7] i965/fs: Remove the FS_OPCODE_SET_SIMD4X2_OFFSET virtual opcode.
Not used anymore. It was just a scalar MOV.
---
 src/mesa/drivers/dri/i965/brw_defines.h        |  1 -
 src/mesa/drivers/dri/i965/brw_fs.h             |  3 ---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 26 --------------------------
 src/mesa/drivers/dri/i965/brw_shader.cpp       |  2 --
 4 files changed, 32 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index f02a0b8..fe255cc 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -933,7 +933,6 @@ enum opcode {
    FS_OPCODE_DISCARD_JUMP,
    FS_OPCODE_SET_OMASK,
    FS_OPCODE_SET_SAMPLE_ID,
-   FS_OPCODE_SET_SIMD4X2_OFFSET,
    FS_OPCODE_PACK_HALF_2x16_SPLIT,
    FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X,
    FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y,
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 8349ad2..28a427e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -830,9 +830,6 @@ private:
                                          struct brw_reg src0,
                                          struct brw_reg src1);

-   void generate_set_simd4x2_offset(fs_inst *inst,
-                                    struct brw_reg dst,
-                                    struct brw_reg offset);
    void generate_discard_jump(fs_inst *inst);

    void generate_pack_half_2x16_split(fs_inst *inst,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index b1fca41..e9cd0d9 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1276,28 +1276,6 @@ fs_generator::generate_pixel_interpolator_query(fs_inst *inst,
                              inst->regs_written);
 }

-
-/**
- * Sets the first word of a vgrf for gen7+ simd4x2 uniform pull constant
- * sampler LD messages.
- *
- * We don't want to bake it into the send message's code generation because
- * that means we don't get a chance to schedule the instructions.
- */
-void
-fs_generator::generate_set_simd4x2_offset(fs_inst *inst,
-                                          struct brw_reg dst,
-                                          struct brw_reg value)
-{
-   assert(value.file == BRW_IMMEDIATE_VALUE);
-
-   brw_push_insn_state(p);
-   brw_set_default_compression_control(p, BRW_COMPRESSION_NONE);
-   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
-   brw_MOV(p, retype(brw_vec1_reg(dst.file, dst.nr, 0), value.type), value);
-   brw_pop_insn_state(p);
-}
-
 /* Sets vstride=16, width=8, hstride=2 or vstride=0, width=1, hstride=0
  * (when mask is passed as a uniform) of register mask before moving it
  * to register dst.
@@ -1947,10 +1925,6 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
          generate_untyped_surface_read(inst, dst, src[0], src[1]);
          break;

-      case FS_OPCODE_SET_SIMD4X2_OFFSET:
-         generate_set_simd4x2_offset(inst, dst, src[0]);
-         break;
-
       case FS_OPCODE_SET_OMASK:
          generate_set_omask(inst, dst, src[0]);
          break;
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index d76134b..f77c9a2 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -512,8 +512,6 @@ brw_instruction_name(enum opcode op)
       return "set_omask";
    case FS_OPCODE_SET_SAMPLE_ID:
       return "set_sample_id";
-   case FS_OPCODE_SET_SIMD4X2_OFFSET:
-      return "set_simd4x2_offset";
    case FS_OPCODE_PACK_HALF_2x16_SPLIT:
       return "pack_half_2x16_split";

-- 
2.1.3
[Mesa-dev] [PATCH 0/7] i965 L3 caching and pull constant improvements.
This is the first part of a series meant to improve our usage of the L3
cache. Currently it's far from ideal since the following objects aren't
taking any advantage of it:

 - Pull constants (i.e. UBOs and demoted uniforms)
 - Buffer textures
 - Shader scratch space (i.e. register spills and fills)
 - Atomic counters
 - (Soon) Images

This first series addresses the first two issues. Fixing the last three
is going to be a bit more difficult because we need to modify the
partitioning of the L3 cache in order to increase the number of ways
assigned to the DC, which happens to be zero on boot until Gen8. That's
likely to require kernel changes because we don't have any extremely
satisfactory API to change that from userspace right now.

The first patch in the series sets the MOCS L3 cacheability bit in the
surface state structure for buffers so the mentioned memory objects
(except the shader scratch space that gets its MOCS from elsewhere) have
a chance of getting cached in L3.

The fourth patch in the series switches to using the constant cache
(which, unlike the data cache that was used years ago before we started
using the sampler, is cached on L3 with the default partitioning on all
gens) for uniform pull constant loads. The overall performance numbers
I've collected are included in the commit message of the same patch for
future reference. Most of it points at the constant cache being faster
than the sampler in a number of cases (assuming the L3 caching settings
are correct), it's also likely to alleviate some cache thrashing caused
by the competition with textures for the L1/L2 sampler caches, and it
allows fetching up to eight consecutive owords (128B) with just one
message.

The sixth patch enables 4 oword loads because they're basically for free
and they avoid some of the shortcomings of the 1 and 2 oword messages
(see the commit message for more details). I'll have a look into
enabling 8 oword loads but it's going to require an analysis pass to
avoid wasting bandwidth and increasing the register pressure
unnecessarily when the shader doesn't actually need as many constants.

We could do something similar for non-uniform offset pull constant loads
and for both kinds of pull constant loads on the vec4 back-end, but I
don't have enough performance data to support that yet.

[PATCH 1/7] i965: Enable L3 caching of buffer surfaces.
[PATCH 2/7] i965: Remove the create_raw_surface vtbl hook.
[PATCH 3/7] i965: Let the caller of brw_set_dp_write/read_message control the target cache.
[PATCH 4/7] i965/fs: Switch to the constant cache for uniform pull constants.
[PATCH 5/7] i965/fs: Less broken handling of force_writemask_all in lower_load_payload().
[PATCH 6/7] i965/fs: Fetch one cacheline of pull constants at a time.
[PATCH 7/7] i965/fs: Remove the FS_OPCODE_SET_SIMD4X2_OFFSET virtual opcode.
[Mesa-dev] [PATCH 6/7] i965/fs: Fetch one cacheline of pull constants at a time.
Asking the DC for less than one cacheline (4 owords) of data for
uniform pull constants is suboptimal because the DC itself cannot
request less than that from L3, resulting in wasted bandwidth,
unnecessary message dispatch overhead and exacerbating the L3
serialization bug on IVB.

Improves performance of pull constants on all generations I've tested
so far.  On BDW and BSW the FPS of a microbenchmark increases up to
5-6x, see the third column of the table in "i965/fs: Switch to the
constant cache for uniform pull constants." for more detailed numbers.

Going up to 8 oword blocks would improve performance of pull constants
even more, but at the cost of some additional bandwidth and register
pressure, so I'd rather do that as a follow-up together with some
on-demand mechanism to calculate the block size based on the number of
constants actually used by the shader.

Currently untested on Gen4-5.
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c        | 10
 src/mesa/drivers/dri/i965/brw_fs.cpp           | 33 --
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 13 +-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp       | 25 +++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 27 -
 5 files changed, 62 insertions(+), 46 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 7829878..b30db88 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2194,7 +2194,7 @@ gen7_block_read_scratch(struct brw_compile *p,
 }
 
 /**
- * Read a float[4] vector from the data port constant cache.
+ * Read four float[4] vectors from the data port constant cache.
  * Location (in buffer) should be a multiple of 16.
  * Used for fetching shader constants.
  */
@@ -2231,8 +2231,8 @@ void brw_oword_block_read(struct brw_compile *p,
 
    brw_inst *insn = next_insn(p, BRW_OPCODE_SEND);
 
-   /* cast dest to a uword[8] vector */
-   dest = retype(vec8(dest), BRW_REGISTER_TYPE_UW);
+   /* cast dest to a dword[16] vector */
+   dest = retype(vec16(dest), BRW_REGISTER_TYPE_UD);
 
    brw_set_dest(p, insn, dest);
    if (brw->gen >= 6) {
@@ -2245,12 +2245,12 @@ void brw_oword_block_read(struct brw_compile *p,
    brw_set_dp_read_message(p,
                            insn,
                            bind_table_index,
-                           BRW_DATAPORT_OWORD_BLOCK_1_OWORDLOW,
+                           BRW_DATAPORT_OWORD_BLOCK_4_OWORDS,
                            BRW_DATAPORT_READ_MESSAGE_OWORD_BLOCK_READ,
                            target_cache,
                            1, /* msg_length */
                            true, /* header_present */
-                           1); /* response_length (1 reg, 2 owords!) */
+                           2); /* response_length (2 regs, 4 owords!) */
 
    brw_pop_insn_state(p);
 }
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index d585a67..3c41e01 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2261,29 +2261,40 @@ fs_visitor::demote_pull_constants()
          current_annotation = inst->annotation;
 
          fs_reg surf_index(stage_prog_data->binding_table.pull_constants_start);
-         fs_reg dst = fs_reg(this, glsl_type::float_type);
 
          /* Generate a pull load into dst. */
          if (inst->src[i].reladdr) {
+            const fs_reg dst = fs_reg(this, glsl_type::float_type);
             exec_list list = VARYING_PULL_CONSTANT_LOAD(dst, surf_index,
                                                         *inst->src[i].reladdr,
                                                         pull_index);
             inst->insert_before(block, list);
+
+            /* Rewrite the instruction to use the temporary VGRF. */
+            inst->src[i].file = GRF;
             inst->src[i].reladdr = NULL;
+            inst->src[i].reg = dst.reg;
+            inst->src[i].reg_offset = 0;
          } else {
-            fs_reg offset = fs_reg((unsigned)(pull_index * 4) & ~15);
+            const unsigned num_regs = 2; /* Fetch 4 owords at a time. */
+            const unsigned base = (pull_index * 4) & ~(32 * num_regs - 1);
+            const fs_reg dst(GRF, virtual_grf_alloc(num_regs),
+                             BRW_REGISTER_TYPE_F, dispatch_width);
             fs_inst *pull =
-               new(mem_ctx) fs_inst(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, 8,
-                                    dst, surf_index, offset);
+               new(mem_ctx) fs_inst(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
+                                    dst, surf_index, fs_reg(base));
+            pull->force_writemask_all = true;
+            pull->regs_written = num_regs;
             inst->insert_before(block, pull);
-            inst->src[i].set_smear(pull_index & 3);
+
+            /* Rewrite the instruction to
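For readers following the addressing change, the block arithmetic can be sketched in plain C. This is only an illustration under stated assumptions, not driver code: it assumes pull_index counts 32-bit components (as the pull_index * 4 byte offset in the patch suggests), and the helper names block_base, dword_in_block and pull_block_example are hypothetical.

```c
#include <assert.h>

/* Sizes taken from the patch: 32-byte registers, 2 registers (4 owords) per fetch. */
enum { REG_BYTES = 32, NUM_REGS = 2, BLOCK_BYTES = REG_BYTES * NUM_REGS };

/* Byte offset of the 64-byte block that contains dword number pull_index. */
static unsigned block_base(unsigned pull_index)
{
    return (pull_index * 4u) & ~(BLOCK_BYTES - 1u);
}

/* Position of that dword inside the fetched block, 0..15 (hypothetical helper). */
static unsigned dword_in_block(unsigned pull_index)
{
    return (pull_index * 4u - block_base(pull_index)) / 4u;
}

/* Usage: constants 0..15 share one fetch, constant 16 starts the next block. */
static void pull_block_example(void)
{
    assert(block_base(0) == 0 && block_base(15) == 0);
    assert(block_base(16) == 64);
    assert(block_base(37) == 128 && dword_in_block(37) == 5);
}
```

With a 64-byte block, sixteen consecutive float constants are served by a single message, which is where the saving over fetching less than a cacheline per message would come from.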
Re: [Mesa-dev] [PATCH 1/7] i965: Enable L3 caching of buffer surfaces.
On Sunday, January 18, 2015 01:04:03 AM Francisco Jerez wrote:
> And remove the mocs argument of the emit_buffer_surface_state vtbl
> hook.  Its semantics vary greatly from one generation to another, so
> it kind of encourages the caller to pass 0 which is the only valid
> setting across generations.  After this commit the hardware-specific
> code decides what the best cacheability settings are for buffer
> surfaces, just like we do for textures.
>
> This together with some additional changes coming is expected to
> improve performance of pull constants, buffer textures, atomic
> counters and image objects on Gen7 and up.

Thanks!  I had a version of this lying around, but never measured any
gain from it, so I never bothered to send it.

I definitely like removing the parameter, and we probably should set
it - we do everywhere else...

This patch is:
Reviewed-by: Kenneth Graunke <kenn...@whitecape.org>
Re: [Mesa-dev] [PATCH 2/5] nir: use Python to autogenerate opcode information
That makes sense.

Regards.
[Mesa-dev] [PATCH] i965: Work around mysterious Gen4 GPU hangs with minimal state changes.
Gen4 hardware appears to GPU hang frequently when using Chromium, and
also when running 'glmark2 -b ideas'.  Most of the error states contain
3DPRIMITIVE commands in quick succession, with very few state packets
between them - usually VERTEX_BUFFERS/ELEMENTS and CONSTANT_BUFFER.

I trimmed an apitrace of the glmark2 hang down to two draw calls with a
glUniformMatrix4fv call between the two.  Either draw by itself works
fine, but together, they hang the GPU.  Removing the glUniform call
makes the hangs disappear.  In the hardware state, this translates to
removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE
packets.

Flushing before emitting CONSTANT_BUFFER packets also appears to make
the hangs disappear.  I observed a slowdown in glxgears by doing it all
the time, so I've chosen to only do it when BRW_NEW_BATCH and
BRW_NEW_PSP are unset (i.e. we haven't done a CS_URB_STATE change or
already flushed the whole pipeline).

I'd much rather understand the problem, but at this point, I don't see
how we'd ever be able to track it down further.  We have no real tools,
and the hardware people moved on years ago.  I've analyzed 20+ error
states and read every scrap of documentation I could find.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80568
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85367
Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
Cc: "10.4 10.3" <mesa-sta...@lists.freedesktop.org>
---
 src/mesa/drivers/dri/i965/brw_curbe.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_curbe.c b/src/mesa/drivers/dri/i965/brw_curbe.c
index c3d3b9d..d0ec859 100644
--- a/src/mesa/drivers/dri/i965/brw_curbe.c
+++ b/src/mesa/drivers/dri/i965/brw_curbe.c
@@ -285,6 +285,19 @@ brw_upload_constant_buffer(struct brw_context *brw)
     */
 
 emit:
+   /* Work around mysterious 965 hangs that appear to happen if you do
+    * two 3DPRIMITIVEs with only a CONSTANT_BUFFER inbetween.  If we
+    * haven't already flushed for some other reason, explicitly do so.
+    *
+    * We've found no documented reason why this should be necessary.
+    */
+   if (brw->gen == 4 && !brw->is_g4x &&
+       (brw->state.dirty.brw & (BRW_NEW_BATCH | BRW_NEW_PSP)) == 0) {
+      BEGIN_BATCH(1);
+      OUT_BATCH(MI_FLUSH);
+      ADVANCE_BATCH();
+   }
+
    /* BRW_NEW_URB_FENCE: From the gen4 PRM, volume 1, section 3.9.8
     * (CONSTANT_BUFFER (CURBE Load)):
     *
-- 
2.2.2
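The condition guarding the extra flush can be read in isolation as a small predicate. The sketch below is illustrative only: the SKETCH_* bit values are placeholders (the real BRW_NEW_* flags are defined by the driver), and needs_flush_before_curbe is a hypothetical name, not a function from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values; the real BRW_NEW_* flags are defined by the driver. */
#define SKETCH_NEW_BATCH (1u << 0)
#define SKETCH_NEW_PSP   (1u << 1)

/* True when the workaround flush should be emitted: original Gen4 only
 * (not G4X), and neither "new batch" nor "new PSP" is already dirty.
 */
static bool needs_flush_before_curbe(int gen, bool is_g4x, uint32_t dirty)
{
    return gen == 4 && !is_g4x &&
           (dirty & (SKETCH_NEW_BATCH | SKETCH_NEW_PSP)) == 0;
}

/* Usage: only the "nothing else flushed" case on original Gen4 triggers it. */
static void flush_example(void)
{
    assert(needs_flush_before_curbe(4, false, 0));
    assert(!needs_flush_before_curbe(4, false, SKETCH_NEW_BATCH));
    assert(!needs_flush_before_curbe(4, true, 0));
    assert(!needs_flush_before_curbe(5, false, 0));
}
```

Skipping the flush when either bit is set matches the reasoning in the commit message: in those cases the pipeline is understood to have been flushed already, so the glxgears slowdown is avoided.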
Re: [Mesa-dev] [PATCH 1/2] i965/fs: Don't use backend_visitor::instructions after creating the CFG.
On Friday, January 16, 2015 11:55:33 PM Matt Turner wrote:
> On Fri, Jan 16, 2015 at 11:45 PM, Kenneth Graunke <kenn...@whitecape.org> wrote:
> > On Tuesday, January 13, 2015 03:35:57 PM Matt Turner wrote:
> > > This is a fix for a regression introduced in commit a9f8296d
> > > ("i965/fs: Preserve the CFG in a few more places.").
> > >
> > > The errata this code works around is described in a comment before
> > > the function:
> > >
> > >    "[DevBW, DevCL] Errata: A destination register from a send can
> > >    not be used as a destination register until after it has been
> > >    sourced by an instruction with a different destination register."
> > >
> > > The framebuffer write's sources must be in message registers, which
> > > SEND instructions cannot have as a destination. There's no way for
> > > this errata to affect anything at the end of the program. Just
> > > remove the code.
> >
> > I don't think that's the point. The idea is that code such as
> >
> >    SEND g10 ...sources... rlen 4
> >    MUL g10 ... ...
> >
> > needs a workaround - you can't write to the destination of a SEND
> > safely without reading them first. You'd have to do:
> >
> >    SEND g10 ...sources... rlen 4
> >    MOV null g10      pointless read of g10, any instruction will do
> >    MUL g10 ...
> >
> > Normally, the results of SEND instructions are actually used.
> > However, they aren't always - for example, depth texturing returns 4
> > values, but we only care about the .X channel.
>
> Right, and we throw up our hands and resolve all remaining
> dependencies when we see the end of the basic block because there's a
> subsequent basic block that may write the destination.
>
> At the end of the program though... we can't possibly need to resolve
> anything outstanding because we can't possibly overwrite it. Can we?

I agree, I think this should be safe. It sounds like the effects of the
bug are an undefined write ordering...probably not GPU hangs. If that's
true, then we're obviously fine - we never overwrite it.

On the completely paranoid side of things, there could be some bit in
the hardware that leaves the register stuck: "I'm not done with the
last write, I need to stall until it completes before doing this one."
And, it's possible it could persist between threads. Which would leave
us stalled forever, and we'd hang the GPU.

But I sincerely doubt that's the case, and I agree with you that this
should be fine.

I would like to see the commit message updated - instead of the bit
about MRFs, say that we think it's pointless to apply the workaround
for registers that are never written again, and that deleting the code
is an alternative to making it work in CFG-land.

With an updated commit message and Piglit passing (I'll test and let
you know),
Reviewed-by: Kenneth Graunke <kenn...@whitecape.org>
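The rule being discussed can be modelled on a toy instruction list. The following is a simplified sketch, not the i965 pass: struct insn, MAX_GRF and count_dummy_reads are hypothetical, each instruction has at most one source and one destination, and control flow is ignored. It counts how many dummy reads (such as the MOV to null above) a straight-line program would need, and resolves nothing at the end of the program, which is the point argued in this thread.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_GRF = 16 };

/* Toy instruction: at most one source and one destination, -1 means none. */
struct insn {
    bool is_send;
    int dst;
    int src;
};

/* Count the dummy reads needed so that no SEND destination is written again
 * before an instruction with a different destination has sourced it.
 * Registers still pending when the program ends need no fix, because
 * nothing can overwrite them.
 */
static int count_dummy_reads(const struct insn *prog, int n)
{
    bool pending[MAX_GRF] = { false };
    int fixes = 0;

    for (int i = 0; i < n; i++) {
        const struct insn *in = &prog[i];
        assert(in->dst < MAX_GRF && in->src < MAX_GRF);

        /* A read only resolves the hazard if it comes from an
         * instruction that writes some other register.
         */
        if (in->src >= 0 && in->src != in->dst)
            pending[in->src] = false;

        if (in->dst >= 0 && pending[in->dst]) {
            fixes++;                  /* a dummy read goes before this write */
            pending[in->dst] = false;
        }

        if (in->is_send && in->dst >= 0)
            pending[in->dst] = true;
    }
    return fixes;
}

/* Usage: the three situations described in the thread. */
static void send_hazard_example(void)
{
    const struct insn overwrite[] = { { true, 10, -1 }, { false, 10, 3 } };
    const struct insn read_first[] = {
        { true, 10, -1 }, { false, -1, 10 }, { false, 10, 3 }
    };
    const struct insn end_of_program[] = { { true, 10, -1 } };

    assert(count_dummy_reads(overwrite, 2) == 1);
    assert(count_dummy_reads(read_first, 3) == 0);
    assert(count_dummy_reads(end_of_program, 1) == 0);
}
```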
[Mesa-dev] [Bug 88523] sha1.c:37: error: 'SHA1_CTX' undeclared (first use in this function)
https://bugs.freedesktop.org/show_bug.cgi?id=88523

José Fonseca <jfons...@vmware.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jfons...@vmware.com

--- Comment #4 from José Fonseca <jfons...@vmware.com> ---
I think that for src/util either we:

- name headers as "prefix_foo.h" and include them as
  #include "prefix_foo.h"

- or we always include the directory name:
  #include "util/foo.h"

Naming headers as "foo.h" and including them as "foo.h" is bound to
cause conflicts.

I also think that "util" might not be a good prefix for this. I'd
suggest we rename src/util to, for example, src/cgr -- for "common
graphics runtime".
Re: [Mesa-dev] [PATCH 2/2] i965/fs: Only apply Gen4 work-arounds if regs_written > 1.
On Tuesday, January 13, 2015 03:40:32 PM Matt Turner wrote:
> On Tue, Jan 13, 2015 at 3:35 PM, Matt Turner <matts...@gmail.com> wrote:
> > Otherwise, we would have necessarily read the results or eliminated
> > the dead SEND. In either case, no work around is necessary.
> >
> > Noticed when debugging the problem the previous patch fixed that any
> > time we hit a math instruction, we'd walk through subsequent
> > instructions, and of course each time discover that its result was
> > in fact used.
> > ---
>
> I was thinking through the pre-send dependency work around:
>
> /**
>  * Implements this workaround for the original 965:
>  *
>  *     "[DevBW, DevCL] Implementation Restrictions: As the hardware does not
>  *      check for post destination dependencies on this instruction, software
>  *      must ensure that there is no destination hazard for the case of ‘write
>  *      followed by a posted write’ shown in the following example.
>  *
>  *      1. mov r3 0
>  *      2. send r3.xy <rest of send instruction>
>  *      3. mov r2 r3
>  *
>  *      Due to no post-destination dependency check on the ‘send’, the above
>  *      code sequence could have two instructions (1 and 2) in flight at the
>  *      same time that both consider ‘r3’ as the target of their final writes.
>  */
>
> While this is a hardware problem or something, isn't it impossible for
> us to hit? If the first MOV's results weren't read, we would have dead
> code eliminated it. If they were read (necessarily between it and the
> SEND), we would never have both instructions in flight at once.

It's definitely pretty rare, though I'm not certain I can say it never
happens. If you care to look into it further, I found the bug report
which spawned this code:

https://bugs.freedesktop.org/show_bug.cgi?id=58960

The attachment contains a sample application which I managed to compile
via:

$ for file in *.h; do moc-qt4 $file > moc-$(basename $file .h).cpp; done
$ g++ -Wall -g $(pkg-config --libs --cflags QtCore QtGui QtOpenGL gl) *.cpp

It would be great if we could make a Piglit test.

> Is there some case where we could realistically hit this problem?
> Maybe with control flow?
>
> I would like to mention that neither of these work arounds are
> implemented in the vec4 backend.

That's true, but they probably should be. We originally reproduced this
bug with texturing instructions, which at the time were only supported
in the FS.

--Ken
[Mesa-dev] [PATCH] gallium: add MULTISAMPLE_Z_RESOLVE cap
Resolving a multisampled depth texture into a single-sampled texture is
supported on >= SM4.1 hw. It is possible that some previous hw supports
it.

The ability was tested on radeonsi and nvc0. Apparently it is also
supported for radeon >= r700.

This patch adds the MULTISAMPLE_Z_RESOLVE cap and adds it to the
drivers. It is advertised for drivers for which it is sure the ability
is supported. A comment was added for drivers for which the feature is
probably supported.

Signed-off-by: Axel Davy <axel.d...@ens.fr>
---
This feature corresponds to the RESZ d3d9 hack. d3d9 hacks are
equivalent to GL extensions. RESZ is advertised under win by amd >= r700
and intel >= G45. Nv doesn't advertise the extension but allows a
similar feature in some Nv-specific API.

I am not sending the gallium Nine RESZ support patch right away, as I
want other patches to be merged first.

 src/gallium/docs/source/screen.rst               | 2 ++
 src/gallium/drivers/freedreno/freedreno_screen.c | 1 +
 src/gallium/drivers/i915/i915_screen.c           | 1 +
 src/gallium/drivers/ilo/ilo_screen.c             | 1 +
 src/gallium/drivers/llvmpipe/lp_screen.c         | 2 ++
 src/gallium/drivers/nouveau/nv30/nv30_screen.c   | 1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c   | 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   | 1 +
 src/gallium/drivers/r300/r300_screen.c           | 1 +
 src/gallium/drivers/r600/r600_pipe.c             | 2 ++
 src/gallium/drivers/radeonsi/si_pipe.c           | 1 +
 src/gallium/drivers/softpipe/sp_screen.c         | 2 ++
 src/gallium/drivers/svga/svga_screen.c           | 1 +
 src/gallium/drivers/vc4/vc4_screen.c             | 1 +
 src/gallium/include/pipe/p_defines.h             | 1 +
 15 files changed, 19 insertions(+)

diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
index 55d114c..b2485bc 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -241,6 +241,8 @@ The integer capabilities:
   semantics. Only relevant if geometry shaders are supported.
   (Currently not possible to query availability of these two semantics outside
   this, at least BASEVERTEX should be exposed separately too).
+* ``PIPE_CAP_MULTISAMPLE_Z_RESOLVE``: Whether the driver supports blitting
+  a multisampled depth buffer into a single-sampled texture (or depth buffer).
 
 
 .. _pipe_capf:
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c
index 084a0ec..bf8d4e9 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -229,6 +229,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_SAMPLER_VIEW_TARGET:
    case PIPE_CAP_CLIP_HALFZ:
    case PIPE_CAP_VERTEXID_NOBASE:
+   case PIPE_CAP_MULTISAMPLE_Z_RESOLVE:
       return 0;
 
    case PIPE_CAP_MAX_VIEWPORTS:
diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c
index 1277de3..1393e7e 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -227,6 +227,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap cap)
    case PIPE_CAP_CONDITIONAL_RENDER_INVERTED:
    case PIPE_CAP_CLIP_HALFZ:
    case PIPE_CAP_VERTEXID_NOBASE:
+   case PIPE_CAP_MULTISAMPLE_Z_RESOLVE:
       return 0;
 
    case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS:
diff --git a/src/gallium/drivers/ilo/ilo_screen.c b/src/gallium/drivers/ilo/ilo_screen.c
index 0c948f4..a4c9b03 100644
--- a/src/gallium/drivers/ilo/ilo_screen.c
+++ b/src/gallium/drivers/ilo/ilo_screen.c
@@ -470,6 +470,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap param)
    case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
    case PIPE_CAP_CONDITIONAL_RENDER_INVERTED:
    case PIPE_CAP_SAMPLER_VIEW_TARGET:
+   case PIPE_CAP_MULTISAMPLE_Z_RESOLVE: /* may be supported */
       return 0;
 
    case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c
index 0e4456a..f6e1e52 100644
--- a/src/gallium/drivers/llvmpipe/lp_screen.c
+++ b/src/gallium/drivers/llvmpipe/lp_screen.c
@@ -284,6 +284,8 @@ llvmpipe_get_param(struct pipe_screen *screen, enum pipe_cap param)
       return 1;
    case PIPE_CAP_VERTEXID_NOBASE:
       return 0;
+   case PIPE_CAP_MULTISAMPLE_Z_RESOLVE: /* may be supported */
+      return 0;
    }
    /* should only get here on unhandled cases */
    debug_printf("Unexpected PIPE_CAP %d query\n", param);
diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
index 46c21a1..f7809cb 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
@@ -158,6 +158,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_SAMPLER_VIEW_TARGET:
    case PIPE_CAP_CLIP_HALFZ:
    case
[Mesa-dev] [Bug 88534] include/c11/threads_posix.h PTHREAD_MUTEX_RECURSIVE_NP not defined
https://bugs.freedesktop.org/show_bug.cgi?id=88534

            Bug ID: 88534
           Summary: include/c11/threads_posix.h
                    PTHREAD_MUTEX_RECURSIVE_NP not defined
           Product: Mesa
           Version: git
          Hardware: Other
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Mesa core
          Assignee: mesa-dev@lists.freedesktop.org
          Reporter: felix.ja...@posteo.de

Created attachment 112394
  --> https://bugs.freedesktop.org/attachment.cgi?id=112394&action=edit
Proposed patch

The non-portable version of PTHREAD_MUTEX_RECURSIVE is used since older
glibc didn't have the POSIX version. The attached patch makes the code
only fall back to PTHREAD_MUTEX_RECURSIVE_NP if PTHREAD_MUTEX_RECURSIVE
is not defined. This fixes compilation with other libcs such as musl,
which don't have the nonstandard version.
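The attachment itself is not reproduced in this message, so the following is only a sketch of the kind of fallback the report describes, not the proposed patch. SKETCH_MUTEX_RECURSIVE, recursive_mutex_init and recursive_lock_example are hypothetical names; the pthread calls are standard POSIX. It assumes that a libc which does not expose PTHREAD_MUTEX_RECURSIVE as a macro still provides the _NP spelling.

```c
#include <assert.h>
#include <pthread.h>

/* Prefer the POSIX name; use the non-portable _NP spelling only when the
 * standard one is not visible to the preprocessor.
 */
#if defined(PTHREAD_MUTEX_RECURSIVE)
#define SKETCH_MUTEX_RECURSIVE PTHREAD_MUTEX_RECURSIVE
#else
#define SKETCH_MUTEX_RECURSIVE PTHREAD_MUTEX_RECURSIVE_NP
#endif

/* Initialise *m as a recursive mutex; returns 0 or a pthread error code. */
static int recursive_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int err;

    assert(m != NULL);

    err = pthread_mutexattr_init(&attr);
    if (err)
        return err;

    err = pthread_mutexattr_settype(&attr, SKETCH_MUTEX_RECURSIVE);
    if (!err)
        err = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return err;
}

/* Usage: a recursive mutex may be locked twice by the same thread. */
static void recursive_lock_example(void)
{
    pthread_mutex_t m;

    assert(recursive_mutex_init(&m) == 0);
    assert(pthread_mutex_lock(&m) == 0);
    assert(pthread_mutex_lock(&m) == 0);
    assert(pthread_mutex_unlock(&m) == 0);
    assert(pthread_mutex_unlock(&m) == 0);
    assert(pthread_mutex_destroy(&m) == 0);
}
```

The preprocessor test only sees macros, so on a libc that declares these names as enum constants the sketch takes the fallback branch; that is harmless as long as the _NP name exists there.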
[Mesa-dev] [Bug 64449] xorg hangs randomly with Radeon HD 7450A
https://bugs.freedesktop.org/show_bug.cgi?id=64449

Alberto Salvia Novella <es204904...@gmail.com> changed:

What: Priority
  Removed: medium
  Added:   highest

What: URL
  Removed:
  Added:   https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+bug/881526

What: CC
  Removed:
  Added:   es204904...@gmail.com

What: Component
  Removed: Drivers/Gallium/r600
  Added:   GLX

What: Assignee
  Removed: dri-devel@lists.freedesktop.org
  Added:   mesa-dev@lists.freedesktop.org

What: Summary
  Removed: AMD graphics hardware hangs with an homogeneous coloured
           screen or blank screen, and with chirp coming from the
           graphics card
  Added:   xorg hangs randomly with Radeon HD 7450A
Re: [Mesa-dev] [PATCH 05/22] glsl: Add sqrt, rsq, exp, exp2 to get_range
I see why you are worried, and I agree 100%.
This just reinforces my impression that expanding this pass does not
give adequate return on investment. If we had even better coverage we
just might get some advantage, but even then I have a bad feeling about
this.

Do you have any suggestions for operations apart from expressions and
constants that we can get a range of? If so I could work on it some
more to figure out if this is getting us anywhere at all. If I recall
correctly the z component of gl_Position is bound between 0 and 1?

2015-01-09 4:15 GMT+01:00 Connor Abbott <cwabbo...@gmail.com>:
> On Sat, Jan 3, 2015 at 2:18 PM, Thomas Helland <thomashellan...@gmail.com> wrote:
> > Also handle undefined behaviour for sqrt(x) where x < 0
> > and rsq(x) where x <= 0.
> >
> > This gives us some reduction in instruction count on three
> > Dungeon Defenders shaders as they are doing: max(exp(x), 0)
>
> So initially when you said that Dungeon Defenders was doing
> max(exp(x), 0), my thought was "wat?" but after thinking about it some
> more, I can see why it would do this. The GLSL spec doesn't guarantee
> that implementations of +, *, exp(), etc. will return NaN when one of
> the arguments is NaN, but it also doesn't guarantee that they *won't*;
> in other words, if for some strange reason you need the old-style
> never-return-NaN functionality, you need to do something like what
> this game is doing. For implementations that don't return NaN, this
> optimization is just fine, but if you remove it when the HW does
> return NaN, then whatever's using the result might get a NaN when it's
> not expecting it, leading to Bad Things happening. Maybe it isn't an
> issue with this particular game, but in order to be correct here it
> seems like we do have to take NaN's into account after all.
>
> There was a related thread (and other discussions) about the behavior
> of min/max wrt NaN's:
>
> http://lists.freedesktop.org/archives/mesa-dev/2014-December/073182.html
>
> My conclusion is that basically everyone that actually produces NaN's
> follows the IEEE/D3D behavior here, which I'm assuming the Dungeon
> Defenders developers were probably depending on.
>
> > v2: Change to use new IS_CONSTANT() macro
> >     Fix high unintenionally not being returned
> >     Add some air for readability
> >     Comment on the exploit of undefined behavior
> >     Constify mem_ctx
> > ---
> >  src/glsl/opt_minmax.cpp | 31 +++
> >  1 file changed, 31 insertions(+)
> >
> > diff --git a/src/glsl/opt_minmax.cpp b/src/glsl/opt_minmax.cpp
> > index 56805c0..2faa3c3 100644
> > --- a/src/glsl/opt_minmax.cpp
> > +++ b/src/glsl/opt_minmax.cpp
> > @@ -274,9 +274,40 @@ get_range(ir_rvalue *rval)
> >     minmax_range r0;
> >     minmax_range r1;
> >
> > +   void *const mem_ctx = ralloc_parent(rval);
> > +
> > +   ir_constant *low = NULL;
> > +   ir_constant *high = NULL;
> > +
> >     if (expr) {
> >        switch (expr->operation) {
> > +      case ir_unop_exp:
> > +      case ir_unop_exp2:
> > +      case ir_unop_sqrt:
> > +      case ir_unop_rsq:
> > +         r0 = get_range(expr->operands[0]);
> > +
> > +         /* The spec says sqrt is undefined if x < 0
> > +          * We can use this to set the range to whatever we want
> > +          */
> > +         if (expr->operation == ir_unop_sqrt &&
> > +             IS_CONSTANT(r0.high, <, 0.0f))
> > +            high = new(mem_ctx) ir_constant(0.0f);
> > +
> > +         /* The spec says rsq is undefined if x <= 0
> > +          * We can use this to set the range to whatever we want
> > +          */
> > +         if (expr->operation == ir_unop_rsq &&
> > +             IS_CONSTANT(r0.high, <=, 0.0f))
> > +            high = new(mem_ctx) ir_constant(0.0f);
> > +
> > +         /* TODO: If we know, i.e, the lower range of the operand
> > +          * we can calculate the lower range
> > +          */
> > +         low = new(mem_ctx) ir_constant(0.0f);
> > +         return minmax_range(low, high);
> > +
> >        case ir_binop_min:
> >        case ir_binop_max:
> >           r0 = get_range(expr->operands[0]);
> > --
> > 2.2.1
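The NaN concern raised in the review can be made concrete with a small C model. This is a sketch under stated assumptions, not the opt_minmax code: struct lower_bound, max_drop_nan, may_prune_max, before_pruning and after_pruning are hypothetical, max_drop_nan models a max() that drops a NaN operand (the IEEE/D3D-style behavior mentioned in the thread), and the float e stands in for whatever the hardware's exp(x) produced.

```c
#include <assert.h>
#include <math.h>     /* NAN and isnan() only; no libm calls are made */
#include <stdbool.h>

/* Hypothetical, simplified range record: only a lower bound is tracked. */
struct lower_bound {
    bool known;
    float value;
};

/* max() that drops a NaN operand and returns the other one. */
static float max_drop_nan(float a, float b)
{
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    return a > b ? a : b;
}

/* Range-based pruning rule: max(x, c) may be replaced by x when the
 * range of x is known to start at or above c.
 */
static bool may_prune_max(struct lower_bound x, float c)
{
    return x.known && x.value >= c;
}

/* What the shader evaluates before and after the max() is pruned. */
static float before_pruning(float e) { return max_drop_nan(e, 0.0f); }
static float after_pruning(float e)  { return e; }

/* Usage: the rule fires for a [0, inf) range, yet a NaN input changes the result. */
static void nan_pruning_example(void)
{
    const struct lower_bound exp_range = { true, 0.0f };

    assert(may_prune_max(exp_range, 0.0f));
    assert(before_pruning(2.5f) == after_pruning(2.5f));
    assert(before_pruning(NAN) == 0.0f);
    assert(isnan(after_pruning(NAN)));
}
```

For ordinary values the pruned and unpruned forms agree; they differ only when a NaN reaches the max(), which is the case the review asks the pass to account for.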
Re: [Mesa-dev] [PATCH 01/22] glsl: Reorder optimization-passes
2015-01-03 22:48 GMT+01:00 Matt Turner <matts...@gmail.com>:
> On Sat, Jan 3, 2015 at 11:18 AM, Thomas Helland <thomashellan...@gmail.com> wrote:
> > This allows opt_algebraic to resolve open-coded saturates into
> > ir_unop_saturate before we potentially mess it up by removing
> > the min or max in min/max-pruning. Since we are now emitting
> > more free saturates on i965 this gives us some decrease in
> > instruction count.
> >
> > total instructions in shared programs: 1317459 -> 1317065 (-0.03%)
> > instructions in affected programs:     4084 -> 3690 (-9.65%)
> > GAINED:                                0
> > LOST:                                  0
>
> You're definitely onto something here. On our collection of shaders:
>
> total instructions in shared programs: 5876617 -> 5875919 (-0.01%)
> instructions in affected programs:     9443 -> 8745 (-7.39%)
>
> with some fragment shaders hurt in Natural Selection 2 and Kerbal
> Space program. I'll investigate these.

Hi Matt,

Don't want to be a nuisance (if that is even the right word?
English is not my native tongue), but did you find the time
to look at these regressions?

If I had some information about what regressions you are seeing
I could try to work them out. Then this patch would be merge-material
I guess. The rest of the series I'm not that happy about.
Seems to me the return on investment is not adequate.
But I'll leave that up to other people to decide.
Re: [Mesa-dev] [PATCH 2/5] nir: use Python to autogenerate opcode information
Hi.

"#! /usr/bin/env python" resolves to the Python 3.x series on some
major distros (Arch, Fedora, ...) and to Python 2.x on others.
Python 2.x and Python 3.x are not source-compatible with each other.
Python 3.x no longer has the xrange function; range vs. xrange is only
meaningful for Python 2.x.

http://www.pythoncentral.io/how-to-use-pythons-xrange-and-range/

Distros that still use the 2.x series as the default Python interpreter
are moving to 3.x.

Regards.