Re: [Mesa-dev] [PATCH 04/11] st/mesa: Use Array._DrawVAO in st_atom_array.c.
Good Morning, Again sorry, but since I only work here in the spare time, I did not find enough to respond earlier. On Tuesday, 4 December 2018 10:35:58 CET Erik Faye-Lund wrote: > > Sounds to me like that, or even worse something with the > > supertuxkart. > > I have not yet understood what they are doing in detail with the > > VAO's. > > But I was slightly looking into the direction of mmapping the buffer > > objects > > and not flushing them correctly. That could potentially also lead to > > such failures. Especially since some people observe and some not. > > Not finally finished with investigating, but up to now I did not see > > something > > wrong there. > > > > One more breadcrumb: > > Gert informed me (through other channels) that he had isolated this > issue to only trigger for indirect draws. > > That might clear up a bit why this seems to happen in so few > applications; there's probably some combination of input layouts that > together with indirect draws become a very rare combination. That does not directly ring a bell for me, but At least that narrows down the issue somehow. Gert, do you know, indirect draws with the struct Draw{Arrays,Elements}IndirectCommand in user space memory or in a buffer object? > > > > Hmm, one question, on the mentioned setup on virgl. How does > > > > glxgears render on that setup? Or alternatively how do other > > > > OpenGL > > > > applications different from supertuxkart on that setup? > > > > > > glxgears renders just fine. We'er also passing pretty much all of > > > the > > > OpenGL 4.3 CTS and most of piglit. Generally speaking, this doesn't > > > trigger. > > > > That was my expectation as well as it took so long to find something. > > But still, not impossible that something is wrong. > > > > > I just got a notice that Serious Sam 3 has a similar problem (I > > > haven't > > > tested this myself)... So perhaps there's some pattern that can be > > > found? > > > > One observation that I saw with supertuxkart. > > They really have a VAO that ends up with two effective bindings used > > by 3 and 2 vertex attributes. That is the old gallium array > > translation code > > did produce 5 vertex_element struct entries and each of that has a > > vertex_buffer struct assigned. The minimal pipe_vertex_buffer > > translation > > only happened in the old code if it could be reduced to a single > > vertex buffer entry. > > Now the code produces that 3 pipe_vertex_element referencing 1 > > pipe_vertex_buffer and 2 pipe_vertex_element referencing an other > > pipe_vertex_buffer. Which should be more optimal now but is it > > possible > > that virgl somewhere down the road only handles the n elements to one > > buffer > > and the n element to n buffer case. So the question is is there a bug > > in the n elements > > to 1 < m < n buffer case? > > Do you know what I mean with the effective binding? > > I'm not quite sure I follow here. What's n and m in this case (I seem > to see three definitions of n, where two are similar, and none of m)? > > But looking through both virgl and virglrenderer, I can't spot anything > obviously wrong with the way inputs are being set up... Ok, thanks for looking. I meant with M:N a layout something like pipe_vertex_elements: { vertex_buffer_index = 0, src_offset = 0, ...}, { vertex_buffer_index = 0, src_offset = 12, ...}, { vertex_buffer_index = 1, src_offset = 0, ...}, { vertex_buffer_index = 1, src_offset = 8, ...} pipe_vertex_buffer: { buffer_offset = 0, buffer.resource = }, { buffer_offset = 0, buffer.resource = } Finally this is a 4:2 mapping. The previous code did only produce either a N:1 mapping like pipe_vertex_elements: { vertex_buffer_index = 0, src_offset = 0, ...}, { vertex_buffer_index = 0, src_offset = 12, ...}, { vertex_buffer_index = 0, src_offset = 24, ...}, { vertex_buffer_index = 0, src_offset = 36, ...} pipe_vertex_buffer: { buffer_offset = 0, buffer.resource = } this one, if there was a single buffer object used that allows this kind of layout. So there are *all* pipe_vertex_elements refering to a single pipe_vertex_buffer. Or a N:N mapping like pipe_vertex_elements: { vertex_buffer_index = 0, src_offset = 0, ...}, { vertex_buffer_index = 1, src_offset = 0, ...}, { vertex_buffer_index = 2, src_offset = 0, ...}, { vertex_buffer_index = 3, src_offset = 0, ...} pipe_vertex_buffer: { buffer_offset = 0, buffer.resource = }, { buffer_offset = 12, buffer.resource = }, { buffer_offset = 36, buffer.resource = }, { buffer_offset = 24, buffer.resource = } where you have one pipe_vertex_buffer per pipe_vertex_element. So, If the backing driver somehow 'knew' that we could only get N:1 or N:N and not something like the 4:2 example above, we could easily fail with the change you found. best Mathias ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/11] st/mesa: Use Array._DrawVAO in st_atom_array.c.
Good Morning, On Tuesday, 4 December 2018 13:33:29 CET Gert Wollny wrote: > Am Dienstag, den 04.12.2018, 10:35 +0100 schrieb Erik Faye-Lund: > > > > But looking through both virgl and virglrenderer, I can't spot > > anything obviously wrong with the way inputs are being set up... > > > > > May be you can observe the same type of VAO in Serious Sam 3? > > > > I don't have Serious Sam 3 running myself. Gert? > > I have not yet looked in detail, but disabling ARB_draw_indirect > doesn't help neither with SS3 nor with Shadow Warrior - at least the > latter renders fine on the r600 host, the SS3 doesn't properly start > currently, but the Talos Principle runs fine on the host (takes forever > to load in the guest so I didn't check with virgl) and this is more or > less the same engine. So, that means it's not just happening with indirect draws. Hmm ... best Mathias ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 100960] Special block from Minecraft mod rendered out of place
https://bugs.freedesktop.org/show_bug.cgi?id=100960 --- Comment #20 from Tapani Pälli --- (In reply to Fabian Maurer from comment #19) > Hey there, sorry for the late answer. > > The mod has fixed the issue in its latest version, so for me this issue > isn't relevant anymore. Do we still want a change in this, or do we consider > this a "NOTOURBUG"? > > Sidenote, why i965_dri.so? I though I'm using radeonsi. IMO NOTOURBUG seems the correct resolution here -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 3/4] nir: rewrite varying component packing
Sorry please ignore this for now. I've realised there is a bug here where we could end up packing components in only one of the shaders but not the other. For example if we have an array on one side but just a bunch of individual varyings on the other (which is legal I believe). I'll send a version 3 to fix this. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] travis: radeonsi and radv require LLVM 7.0
Fixes: 3fbdcd942fe ("amd: remove support for LLVM 6.0") Cc: Samuel Pitoiset Cc: Marek Olšák Cc: Emil Velikov Cc: Jan Vesely Cc: Andres Gomez Cc: Dylan Baker Signed-off-by: Rhys Kidd --- .travis.yml | 29 - 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/.travis.yml b/.travis.yml index 6b50d49e143..8eceb4e86f5 100644 --- a/.travis.yml +++ b/.travis.yml @@ -38,17 +38,18 @@ matrix: - DRI_DRIVERS="" - GALLIUM_DRIVERS="" - VULKAN_DRIVERS="intel,amd" -- LLVM_VERSION=6.0 +- LLVM_VERSION=7 - LLVM_CONFIG="llvm-config-${LLVM_VERSION}" addons: apt: sources: -- llvm-toolchain-trusty-6.0 -# llvm-6 requires libstdc++4.9 which is not in main repo +- sourceline: 'deb http://apt.llvm.org/trusty/ llvm-toolchain-trusty-7 main' + key_url: https://apt.llvm.org/llvm-snapshot.gpg.key +# llvm-7 requires libstdc++4.9 which is not in main repo - ubuntu-toolchain-r-test packages: # From sources above -- llvm-6.0-dev +- llvm-7-dev # Common - xz-utils - libexpat1-dev @@ -131,7 +132,7 @@ matrix: - BUILD=make - MAKEFLAGS="-j4" - MAKE_CHECK_COMMAND="true" -- LLVM_VERSION=6.0 +- LLVM_VERSION=7 - LLVM_CONFIG="llvm-config-${LLVM_VERSION}" - DRI_LOADERS="--disable-glx --disable-gbm --disable-egl" - DRI_DRIVERS="" @@ -142,12 +143,13 @@ matrix: addons: apt: sources: -- llvm-toolchain-trusty-6.0 -# llvm-6 requires libstdc++4.9 which is not in main repo +- sourceline: 'deb http://apt.llvm.org/trusty/ llvm-toolchain-trusty-7 main' + key_url: https://apt.llvm.org/llvm-snapshot.gpg.key +# llvm-7 requires libstdc++4.9 which is not in main repo - ubuntu-toolchain-r-test packages: # From sources above -- llvm-6.0-dev +- llvm-7-dev # Common - xz-utils - libexpat1-dev @@ -305,7 +307,7 @@ matrix: - DRI_LOADERS="--disable-glx --disable-gbm --disable-egl" - DRI_DRIVERS="" - GALLIUM_ST="--disable-dri --enable-opencl --enable-opencl-icd --enable-llvm --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa" -- GALLIUM_DRIVERS="r600,radeonsi" +- GALLIUM_DRIVERS="r600" - VULKAN_DRIVERS="" - LIBUNWIND_FLAGS="--enable-libunwind" addons: @@ -400,7 +402,7 @@ matrix: - BUILD=make - MAKEFLAGS="-j4" - MAKE_CHECK_COMMAND="make -C src/gtest check && make -C src/intel check" -- LLVM_VERSION=6.0 +- LLVM_VERSION=7 - LLVM_CONFIG="llvm-config-${LLVM_VERSION}" - DRI_LOADERS="--disable-glx --disable-gbm --disable-egl --with-platforms=x11,wayland" - DRI_DRIVERS="" @@ -411,12 +413,13 @@ matrix: addons: apt: sources: -- llvm-toolchain-trusty-6.0 -# llvm-6 requires libstdc++4.9 which is not in main repo +- sourceline: 'deb http://apt.llvm.org/trusty/ llvm-toolchain-trusty-7 main' + key_url: https://apt.llvm.org/llvm-snapshot.gpg.key +# llvm-7 requires libstdc++4.9 which is not in main repo - ubuntu-toolchain-r-test packages: # From sources above -- llvm-6.0-dev +- llvm-7-dev # Common - xz-utils - libexpat1-dev -- 2.19.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 77449] Tracker bug for all bugs related to Steam titles
https://bugs.freedesktop.org/show_bug.cgi?id=77449 Bug 77449 depends on bug 101978, which changed state. Bug 101978 Summary: [bisected] war thunder performance reduced by ~28% https://bugs.freedesktop.org/show_bug.cgi?id=101978 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |INVALID -- You are receiving this mail because: You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] nir: fixed some missed varying compaction opportunities
I'd much rather land the first 3 patches from this series if possible. https://patchwork.freedesktop.org/series/53800/ I've confirmed it packs the shaders you were looking at as expected once you patch 2 is applied. The series makes this code much more flexible (for future improvements) and easier to follow. On 9/12/18 5:28 am, Rob Clark wrote: Previously, if we had a .z or .w component that could be compacted to .y, we'd could overlook that opportunity. Signed-off-by: Rob Clark --- src/compiler/nir/nir_linking_helpers.c | 30 -- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/src/compiler/nir/nir_linking_helpers.c b/src/compiler/nir/nir_linking_helpers.c index 1ab9c095657..ce368a3c132 100644 --- a/src/compiler/nir/nir_linking_helpers.c +++ b/src/compiler/nir/nir_linking_helpers.c @@ -431,12 +431,30 @@ compact_components(nir_shader *producer, nir_shader *consumer, uint8_t *comps, uint8_t interp = get_interp_type(var, type, default_to_smooth_interp); for (; cursor[interp] < 32; cursor[interp]++) { uint8_t cursor_used_comps = comps[cursor[interp]]; +uint8_t unused_comps = ~cursor_used_comps; -/* We couldn't find anywhere to pack the varying continue on. */ -if (cursor[interp] == location && -(var->data.location_frac == 0 || - cursor_used_comps & ((1 << (var->data.location_frac)) - 1))) - break; +/* Don't search beyond our current location, we are just trying + * to pack later varyings to lower positions: + */ +if (cursor[interp] == location) { + if (var->data.location_frac == 0) + break; + + /* If not already aligned to slot, see if we can shift it up. +* Note that if we get this far it is a scalar so we know that +* shifting this var to any other open position won't conflict +* with it's current position. +*/ + unsigned p = ffs(unused_comps & 0xf); + if (!p) + break; + + /* ffs returns 1 for bit zero: */ + p--; + + if (p >= var->data.location_frac) + break; +} /* We can only pack varyings with matching interpolation types */ if (interp_type[cursor[interp]] != interp) @@ -460,8 +478,6 @@ compact_components(nir_shader *producer, nir_shader *consumer, uint8_t *comps, if (!cursor_used_comps) continue; -uint8_t unused_comps = ~cursor_used_comps; - for (unsigned i = 0; i < 4; i++) { uint8_t new_var_comps = 1 << i; if (unused_comps & new_var_comps) { ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC v2 4/4] nir: add crude live range analysis to nir_compact_varyings()
Using robs packing fix for the st I'm actually getting results for radeonsi now but they are pretty mixed for this patch: Totals from affected shaders: SGPRS: 35992 -> 35520 (-1.31 %) VGPRS: 20688 -> 20808 (0.58 %) Spilled SGPRs: 1926 -> 1996 (3.63 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 1053168 -> 1055452 (0.22 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 4636 -> 4616 (-0.43 %) Wait states: 0 -> 0 (0.00 %) On 10/12/18 11:31 am, Timothy Arceri wrote: vkpipeline-db results RADV (VEGA): Totals from affected shaders: SGPRS: 27168 -> 27872 (2.59 %) VGPRS: 24180 -> 24056 (-0.51 %) Spilled SGPRs: 28 -> 24 (-14.29 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 1584936 -> 1585552 (0.04 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 3804 -> 3824 (0.53 %) Wait states: 0 -> 0 (0.00 %) --- src/compiler/nir/nir_linking_helpers.c | 45 ++ 1 file changed, 45 insertions(+) diff --git a/src/compiler/nir/nir_linking_helpers.c b/src/compiler/nir/nir_linking_helpers.c index 8bd4acc2ee..badda80979 100644 --- a/src/compiler/nir/nir_linking_helpers.c +++ b/src/compiler/nir/nir_linking_helpers.c @@ -415,6 +415,8 @@ remap_slots_and_components(struct exec_list *var_list, gl_shader_stage stage, struct varying_component { nir_variable *var; + unsigned first_block_use; + unsigned last_block_use; uint8_t interp_type; uint8_t interp_loc; bool is_patch; @@ -441,10 +443,36 @@ cmp_varying_component(const void *comp1_v, const void *comp2_v) if (comp1->interp_loc != comp2->interp_loc) return comp1->interp_loc - comp2->interp_loc; + /* Attempt to reduce register pressure with crude live range analysis */ + if (comp1->first_block_use != comp2->first_block_use) + return comp1->first_block_use - comp2->first_block_use; + if (comp1->last_block_use != comp2->last_block_use) + return comp1->last_block_use - comp2->last_block_use; + /* If everything else matches just use the original location to sort */ return comp1->var->data.location - comp2->var->data.location; } +static void +set_block_use(struct varying_component *vc_info, nir_src *src, + bool is_if_condition) +{ + nir_block *blk + = nir_cursor_current_block(nir_before_src(src, is_if_condition)); + + if (vc_info->initialised) { + if (vc_info->first_block_use > blk->index) + vc_info->first_block_use = blk->index; + + if (vc_info->last_block_use < blk->index) + vc_info->last_block_use = blk->index; + } else { + vc_info->first_block_use = blk->index; + vc_info->last_block_use = blk->index; + vc_info->initialised = true; + } +} + static void gather_varying_component_info(nir_shader *consumer, struct varying_component **varying_comp_info, @@ -533,6 +561,14 @@ gather_varying_component_info(nir_shader *consumer, vc_info->interp_loc = get_interp_loc(in_var); vc_info->is_patch = in_var->data.patch; } + + nir_foreach_use(src, &intr->dest.ssa) { +set_block_use(vc_info, src, false); + } + + nir_foreach_if_use(src, &intr->dest.ssa) { +set_block_use(vc_info, src, true); + } } } } @@ -651,6 +687,12 @@ void nir_compact_varyings(nir_shader *producer, nir_shader *consumer, bool default_to_smooth_interp) { + nir_function_impl *p_impl = nir_shader_get_entrypoint(producer); + nir_function_impl *c_impl = nir_shader_get_entrypoint(consumer); + + nir_metadata_require(p_impl, nir_metadata_block_index); + nir_metadata_require(c_impl, nir_metadata_block_index); + assert(producer->info.stage != MESA_SHADER_FRAGMENT); assert(consumer->info.stage != MESA_SHADER_VERTEX); @@ -665,6 +707,9 @@ nir_compact_varyings(nir_shader *producer, nir_shader *consumer, compact_components(producer, consumer, unmoveable_comps, default_to_smooth_interp); + + nir_metadata_preserve(p_impl, nir_metadata_block_index); + nir_metadata_preserve(c_impl, nir_metadata_block_index); } /* ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 3/4] nir: rewrite varying component packing
There are three reasons for the rewrite. 1. Adding support for packing tess patch varyings in a sane way. 2. Making use of qsort allowing the code to be much easier to follow. 3. Allowing us to add a crude live range analysis for deciding which components should be packed together. This support will be added in a future patch. v2: pack moveable components with the unmoveable components. The new pass is now functionally the same as the old pass besides the new support for packing patches. --- src/compiler/nir/nir_linking_helpers.c | 305 - 1 file changed, 196 insertions(+), 109 deletions(-) diff --git a/src/compiler/nir/nir_linking_helpers.c b/src/compiler/nir/nir_linking_helpers.c index c26582ddec..8bd4acc2ee 100644 --- a/src/compiler/nir/nir_linking_helpers.c +++ b/src/compiler/nir/nir_linking_helpers.c @@ -247,22 +247,20 @@ is_packing_supported_for_type(const struct glsl_type *type) return true; } +/* Packing arrays and dual slot varyings is difficult so to avoid complex + * algorithms this function marks these components as unmoveable. + */ static void -get_slot_component_masks_and_interp_types(struct exec_list *var_list, - uint8_t *comps, - uint8_t *interp_type, - uint8_t *interp_loc, - gl_shader_stage stage, - bool default_to_smooth_interp) +get_unmoveable_components_masks(struct exec_list *var_list, uint8_t *comps, +gl_shader_stage stage, +bool default_to_smooth_interp) { nir_foreach_variable_safe(var, var_list) { assert(var->data.location >= 0); - /* Only remap things that aren't built-ins. - * TODO: add TES patch support. - */ + /* Only remap things that aren't built-ins. */ if (var->data.location >= VARYING_SLOT_VAR0 && - var->data.location - VARYING_SLOT_VAR0 < 32) { + var->data.location - VARYING_SLOT_VAR0 < MAX_VARYINGS_INCL_PATCH) { const struct glsl_type *type = var->type; if (nir_is_per_vertex_io(var, stage)) { @@ -270,6 +268,12 @@ get_slot_component_masks_and_interp_types(struct exec_list *var_list, type = glsl_get_array_element(type); } + /* If we can pack this varying then don't mark the components as + * used. + */ + if (is_packing_supported_for_type(type)) +continue; + unsigned location = var->data.location - VARYING_SLOT_VAR0; unsigned elements = glsl_get_vector_elements(glsl_without_array(type)); @@ -278,10 +282,6 @@ get_slot_component_masks_and_interp_types(struct exec_list *var_list, unsigned slots = glsl_count_attribute_slots(type, false); unsigned comps_slot2 = 0; for (unsigned i = 0; i < slots; i++) { -interp_type[location + i] = - get_interp_type(var, type, default_to_smooth_interp); -interp_loc[location + i] = get_interp_loc(var); - if (dual_slot) { if (i & 1) { comps[location + i] |= ((1 << comps_slot2) - 1); @@ -413,32 +413,55 @@ remap_slots_and_components(struct exec_list *var_list, gl_shader_stage stage, *p_out_slots_read = out_slots_read_tmp[1]; } -/* If there are empty components in the slot compact the remaining components - * as close to component 0 as possible. This will make it easier to fill the - * empty components with components from a different slot in a following pass. - */ -static void -compact_components(nir_shader *producer, nir_shader *consumer, uint8_t *comps, - uint8_t *interp_type, uint8_t *interp_loc, - bool default_to_smooth_interp) +struct varying_component { + nir_variable *var; + uint8_t interp_type; + uint8_t interp_loc; + bool is_patch; + bool initialised; +}; + +static int +cmp_varying_component(const void *comp1_v, const void *comp2_v) { - struct exec_list *input_list = &consumer->inputs; - struct exec_list *output_list = &producer->outputs; - struct varying_loc remap[MAX_VARYINGS_INCL_PATCH][4] = {{{0}, {0}}}; + struct varying_component *comp1 = (struct varying_component *) comp1_v; + struct varying_component *comp2 = (struct varying_component *) comp2_v; - /* Create a cursor for each interpolation type */ - unsigned cursor[4] = {0}; + /* We want patches to be order at the end of the array */ + if (comp1->is_patch != comp2->is_patch) + return comp1->is_patch ? 1 : -1; - /* We only need to pass over one stage and we choose the consumer as it seems -* to cause a larger reduction in instruction counts (tested on i965). + /* We can only pack varyings with matching interpolation types so group +* them together. */ - nir_foreach_var
[Mesa-dev] [RFC v2 4/4] nir: add crude live range analysis to nir_compact_varyings()
vkpipeline-db results RADV (VEGA): Totals from affected shaders: SGPRS: 27168 -> 27872 (2.59 %) VGPRS: 24180 -> 24056 (-0.51 %) Spilled SGPRs: 28 -> 24 (-14.29 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 1584936 -> 1585552 (0.04 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 3804 -> 3824 (0.53 %) Wait states: 0 -> 0 (0.00 %) --- src/compiler/nir/nir_linking_helpers.c | 45 ++ 1 file changed, 45 insertions(+) diff --git a/src/compiler/nir/nir_linking_helpers.c b/src/compiler/nir/nir_linking_helpers.c index 8bd4acc2ee..badda80979 100644 --- a/src/compiler/nir/nir_linking_helpers.c +++ b/src/compiler/nir/nir_linking_helpers.c @@ -415,6 +415,8 @@ remap_slots_and_components(struct exec_list *var_list, gl_shader_stage stage, struct varying_component { nir_variable *var; + unsigned first_block_use; + unsigned last_block_use; uint8_t interp_type; uint8_t interp_loc; bool is_patch; @@ -441,10 +443,36 @@ cmp_varying_component(const void *comp1_v, const void *comp2_v) if (comp1->interp_loc != comp2->interp_loc) return comp1->interp_loc - comp2->interp_loc; + /* Attempt to reduce register pressure with crude live range analysis */ + if (comp1->first_block_use != comp2->first_block_use) + return comp1->first_block_use - comp2->first_block_use; + if (comp1->last_block_use != comp2->last_block_use) + return comp1->last_block_use - comp2->last_block_use; + /* If everything else matches just use the original location to sort */ return comp1->var->data.location - comp2->var->data.location; } +static void +set_block_use(struct varying_component *vc_info, nir_src *src, + bool is_if_condition) +{ + nir_block *blk + = nir_cursor_current_block(nir_before_src(src, is_if_condition)); + + if (vc_info->initialised) { + if (vc_info->first_block_use > blk->index) + vc_info->first_block_use = blk->index; + + if (vc_info->last_block_use < blk->index) + vc_info->last_block_use = blk->index; + } else { + vc_info->first_block_use = blk->index; + vc_info->last_block_use = blk->index; + vc_info->initialised = true; + } +} + static void gather_varying_component_info(nir_shader *consumer, struct varying_component **varying_comp_info, @@ -533,6 +561,14 @@ gather_varying_component_info(nir_shader *consumer, vc_info->interp_loc = get_interp_loc(in_var); vc_info->is_patch = in_var->data.patch; } + + nir_foreach_use(src, &intr->dest.ssa) { +set_block_use(vc_info, src, false); + } + + nir_foreach_if_use(src, &intr->dest.ssa) { +set_block_use(vc_info, src, true); + } } } } @@ -651,6 +687,12 @@ void nir_compact_varyings(nir_shader *producer, nir_shader *consumer, bool default_to_smooth_interp) { + nir_function_impl *p_impl = nir_shader_get_entrypoint(producer); + nir_function_impl *c_impl = nir_shader_get_entrypoint(consumer); + + nir_metadata_require(p_impl, nir_metadata_block_index); + nir_metadata_require(c_impl, nir_metadata_block_index); + assert(producer->info.stage != MESA_SHADER_FRAGMENT); assert(consumer->info.stage != MESA_SHADER_VERTEX); @@ -665,6 +707,9 @@ nir_compact_varyings(nir_shader *producer, nir_shader *consumer, compact_components(producer, consumer, unmoveable_comps, default_to_smooth_interp); + + nir_metadata_preserve(p_impl, nir_metadata_block_index); + nir_metadata_preserve(c_impl, nir_metadata_block_index); } /* -- 2.19.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 1/4] nir: add support for marking used patches when packing varyings
This adds support needed for marking the varyings as used but we don't actually support packing patches in this patch. --- src/compiler/nir/nir_linking_helpers.c | 73 ++ 1 file changed, 51 insertions(+), 22 deletions(-) diff --git a/src/compiler/nir/nir_linking_helpers.c b/src/compiler/nir/nir_linking_helpers.c index a05890ada4..845aba5c87 100644 --- a/src/compiler/nir/nir_linking_helpers.c +++ b/src/compiler/nir/nir_linking_helpers.c @@ -289,15 +289,35 @@ struct varying_loc uint32_t location; }; +static void +mark_all_slots_used(nir_variable *var, uint64_t *slots_used, +uint64_t slots_used_mask, unsigned num_slots) +{ + unsigned loc_offset = var->data.patch ? VARYING_SLOT_PATCH0 : 0; + + slots_used[var->data.patch ? 1 : 0] |= slots_used_mask & + (((uint64_t)1 << num_slots) - 1) << (var->data.location - loc_offset); +} + +static void +mark_used_slots(nir_variable *var, uint64_t *slots_used, unsigned offset) +{ + unsigned loc_offset = offset - (var->data.patch ? VARYING_SLOT_PATCH0 : 0); + + slots_used[var->data.patch ? 1 : 0] |= (uint64_t)1 << (var->data.location + loc_offset); +} + static void remap_slots_and_components(struct exec_list *var_list, gl_shader_stage stage, struct varying_loc (*remap)[4], - uint64_t *slots_used, uint64_t *out_slots_read) + uint64_t *slots_used, uint64_t *out_slots_read, + uint32_t *p_slots_used, uint32_t *p_out_slots_read) { - uint64_t out_slots_read_tmp = 0; + uint64_t out_slots_read_tmp[2] = {0}; + uint64_t slots_used_tmp[2] = {0}; /* We don't touch builtins so just copy the bitmask */ - uint64_t slots_used_tmp = + slots_used_tmp[0] = *slots_used & (((uint64_t)1 << (VARYING_SLOT_VAR0 - 1)) - 1); nir_foreach_variable(var, var_list) { @@ -305,8 +325,8 @@ remap_slots_and_components(struct exec_list *var_list, gl_shader_stage stage, /* Only remap things that aren't built-ins */ if (var->data.location >= VARYING_SLOT_VAR0 && - var->data.location - VARYING_SLOT_VAR0 < 32) { - assert(var->data.location - VARYING_SLOT_VAR0 < 32); + var->data.location - VARYING_SLOT_VAR0 < MAX_VARYINGS_INCL_PATCH) { + assert(var->data.location - VARYING_SLOT_VAR0 < MAX_VARYINGS_INCL_PATCH); const struct glsl_type *type = var->type; if (nir_is_per_vertex_io(var, stage)) { @@ -321,11 +341,17 @@ remap_slots_and_components(struct exec_list *var_list, gl_shader_stage stage, unsigned location = var->data.location - VARYING_SLOT_VAR0; struct varying_loc *new_loc = &remap[location][var->data.location_frac]; - uint64_t slots = (((uint64_t)1 << num_slots) - 1) << var->data.location; - if (slots & *slots_used) + unsigned loc_offset = var->data.patch ? VARYING_SLOT_PATCH0 : 0; + uint64_t used = var->data.patch ? *p_slots_used : *slots_used; + uint64_t outs_used = +var->data.patch ? *p_out_slots_read : *out_slots_read; + uint64_t slots = +(((uint64_t)1 << num_slots) - 1) << (var->data.location - loc_offset); + + if (slots & used) used_across_stages = true; - if (slots & *out_slots_read) + if (slots & outs_used) outputs_read = true; if (new_loc->location) { @@ -339,30 +365,29 @@ remap_slots_and_components(struct exec_list *var_list, gl_shader_stage stage, * otherwise we will mess up the mask for things like partially * marked arrays. */ -if (used_across_stages) { - slots_used_tmp |= - *slots_used & (((uint64_t)1 << num_slots) - 1) << var->data.location; -} +if (used_across_stages) + mark_all_slots_used(var, slots_used_tmp, used, num_slots); if (outputs_read) { - out_slots_read_tmp |= - *out_slots_read & (((uint64_t)1 << num_slots) - 1) << var->data.location; + mark_all_slots_used(var, out_slots_read_tmp, outs_used, + num_slots); } - } else { for (unsigned i = 0; i < num_slots; i++) { if (used_across_stages) - slots_used_tmp |= (uint64_t)1 << (var->data.location + i); + mark_used_slots(var, slots_used_tmp, i); if (outputs_read) - out_slots_read_tmp |= (uint64_t)1 << (var->data.location + i); + mark_used_slots(var, out_slots_read_tmp, i); } } } } - *slots_used = slots_used_tmp; - *out_slots_read = out_slots_read_tmp; + *slots_used = slots_used_tmp[0]; + *out_slots_read = out_slots_read_tmp[0]; + *p_slots_used = slots_used_tmp[1]; + *p_out_slots_read = out_slots_read_tmp[1]; }
[Mesa-dev] [PATCH v2 2/4] nir: add is_packing_supported_for_type() helper
This will be used in the following patches to determine if we support packing the components of a varying. --- src/compiler/nir/nir_linking_helpers.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/src/compiler/nir/nir_linking_helpers.c b/src/compiler/nir/nir_linking_helpers.c index 845aba5c87..c26582ddec 100644 --- a/src/compiler/nir/nir_linking_helpers.c +++ b/src/compiler/nir/nir_linking_helpers.c @@ -224,6 +224,29 @@ get_interp_loc(nir_variable *var) return INTERPOLATE_LOC_CENTER; } +static bool +is_packing_supported_for_type(const struct glsl_type *type) +{ + /* Skip types that require more complex packing handling. +* TODO: add support for these types? +*/ + if (glsl_type_is_array(type) || + glsl_type_is_dual_slot(type) || + glsl_type_is_matrix(type) || + glsl_type_is_struct(type) || + glsl_type_is_64bit(type)) + return false; + + /* We ignore complex types above and all other vector types should +* have been split into scalar variables by the lower_io_to_scalar +* pass. The only exeption should by OpenGL xfb varyings. +*/ + if (glsl_get_vector_elements(type) != 1) + return false; + + return true; +} + static void get_slot_component_masks_and_interp_types(struct exec_list *var_list, uint8_t *comps, -- 2.19.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] mesa/st/nir: fix missing nir_compact_varyings
On 9/12/18 5:28 am, Rob Clark wrote: Not entirely sure when this changed, but it seem like LinkedTransformFeedback is (usually?) populated, Yeah it looks like this code was wrong when introduced. I also recall somebody complaining the performance dropped in Shadow of Mordor with Eric's fix, which makes a little more sense now. Looking over the code in link_varying.cpp what happens is we always create LinkedTransformFeedback for the last vertex stage. Which means we have not been packing the frgament shader inputs since this fix was introduced :( Maybe update the commit message to make this a little clearer. Also please add: Fixes: dbd52585fa9f ("st/nir: Disable varying packing when doing transform feedback.") With those changes patches 1-2 are: Reviewed-by: Timothy Arceri Thanks for looking into this. even if NumVaryings is zero. So make the check about whether it is safe to nir_compact_varyings() a bit more complete. Signed-off-by: Rob Clark --- src/mesa/state_tracker/st_glsl_to_nir.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp b/src/mesa/state_tracker/st_glsl_to_nir.cpp index d0475fb538a..7406e26e2f8 100644 --- a/src/mesa/state_tracker/st_glsl_to_nir.cpp +++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp @@ -758,7 +758,8 @@ st_link_nir(struct gl_context *ctx, * the pipe_stream_output->output_register field is based on the * pre-compacted driver_locations. */ - if (!prev_shader->sh.LinkedTransformFeedback) + if (!(prev_shader->sh.LinkedTransformFeedback && + prev_shader->sh.LinkedTransformFeedback->NumVarying > 0)) nir_compact_varyings(shader_program->_LinkedShaders[prev]->Program->nir, nir, ctx->API != API_OPENGL_COMPAT); } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 108996] Leaks buffer when closing wayland window
https://bugs.freedesktop.org/show_bug.cgi?id=108996 Bug ID: 108996 Summary: Leaks buffer when closing wayland window Product: Mesa Version: git Hardware: Other OS: All Status: NEW Severity: normal Priority: medium Component: EGL/Wayland Assignee: wayland-b...@lists.freedesktop.org Reporter: sjo...@luon.net QA Contact: mesa-dev@lists.freedesktop.org Hey, When closing a EGL using window on wayland it seems there is always a (color) buffer leaked until the process exits; We first noticed this with Qt using apps on an etnaviv based system, however the same can be reproduce on Intel gpu's To reproduce I created a modified weston-simple-egl[0] which simply every 60 frames will destroy the current surface and create a new one[1]. As mentioned the issue both occurs on both etnaviv and intel systems where there is seemingly one dmabuf leaked every time the window gets closed (as per /sys/kernel/debug/dma_buf/bufinfo). It can also be reproduce by using software rendering (LIBGL_ALWAYS_SOFTWARE=1) in which case memory grows every time the window is recreated (presumably for the same reason). While tracing it down as far as i could it seems that the `current` buffer of the relevant dri2_egl_surface has extra references on it when ->destroyImage gets called on match color_buffer in `dri2_wl_destroy_surface` (while the other allocated buffers/driImages don't). Why these extra references (presumably for the next render?) aren't cleared up is unfortunately a bit beyond my understanding of mesa. This is reproducable both with mesa 18.2.6 and git master [0] https://gitlab.freedesktop.org/sjoerd/weston/tree/mesa-leak [1] https://gitlab.freedesktop.org/sjoerd/weston/commit/934e195c1b4d5fdd7ec006c8f7048ad30df8738f -- You are receiving this mail because: You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 108578] RADV reports wrong hardcoded Vulkan API Version
https://bugs.freedesktop.org/show_bug.cgi?id=108578 --- Comment #5 from Shmerl --- (In reply to Samuel Pitoiset from comment #4) > This is not a bug. We should be able to bump the patch version but that > requires to look at the changelog since 1.1.70. Do you mean it's a review issue, i.e. someone needs to go over it and confirm that all needed features are supported? -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 108933] Unreal Tournament (UT99) segfault on opengl init
https://bugs.freedesktop.org/show_bug.cgi?id=108933 --- Comment #13 from network...@rkmail.ru --- (In reply to Gustaw Smolarczyk from comment #12) > On a slightly unrelated topic, UT99 seems to work fine (at least for me) > while run on wine (or Steam's proton). You might want to try this path as a > work-around. On my system, UT stops working in wine after you run it few time, until next reboot. This seems to be unrelated to mesa, as it behaves like this with any renderer, including software and third-party d3d9 one paired with nine. (Just a side note, I don't really care about playing it. But I think that checking if older native programs still work is generally a good idea) -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 02/20] nir: clarify some nit_loop_info member names
Den lör 8 dec. 2018 00:10 skrev Jason Ekstrand : > Replacing min with max without changing any real code always looks a biit > weird but it does make sense. :-) > > Reviewed-by: Jason Ekstrand > > On Thu, Dec 6, 2018 at 9:08 PM Timothy Arceri > wrote: > >> Following commits will introduce additional fields such as >> guessed_trip_count. Renaming these will help avoid confusion >> as our unrolling feature set grows. >> >> Reviewed-by: Thomas Helland >> --- >> src/compiler/nir/nir.h | 8 +--- >> src/compiler/nir/nir_loop_analyze.c| 14 +++--- >> src/compiler/nir/nir_opt_loop_unroll.c | 14 +++--- >> 3 files changed, 19 insertions(+), 17 deletions(-) >> >> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h >> index db935c8496..ce4a81fbe1 100644 >> --- a/src/compiler/nir/nir.h >> +++ b/src/compiler/nir/nir.h >> @@ -1886,9 +1886,11 @@ typedef struct { >> /* Number of instructions in the loop */ >> unsigned num_instructions; >> >> - /* How many times the loop is run (if known) */ >> - unsigned trip_count; >> - bool is_trip_count_known; >> + /* Maximum number of times the loop is run (if known) */ >> + unsigned max_trip_count; >> + >> + /* Do we know the exact number of times the loop will be run */ >> + bool exact_trip_count_known; >> >> /* Unroll the loop regardless of its size */ >> bool force_unroll; >> diff --git a/src/compiler/nir/nir_loop_analyze.c >> b/src/compiler/nir/nir_loop_analyze.c >> index c779383b36..700d1fe552 100644 >> --- a/src/compiler/nir/nir_loop_analyze.c >> +++ b/src/compiler/nir/nir_loop_analyze.c >> @@ -527,7 +527,7 @@ find_trip_count(loop_info_state *state) >> { >> bool trip_count_known = true; >> nir_loop_terminator *limiting_terminator = NULL; >> - int min_trip_count = -1; >> + int max_trip_count = -1; >> >> list_for_each_entry(nir_loop_terminator, terminator, >> &state->loop->info->loop_terminator_list, >> @@ -606,8 +606,8 @@ find_trip_count(loop_info_state *state) >>* iterations than previously (we have identified a more >> limiting >>* terminator) set the trip count and limiting terminator. >>*/ >> - if (min_trip_count == -1 || iterations < min_trip_count) { >> -min_trip_count = iterations; >> + if (max_trip_count == -1 || iterations < max_trip_count) { >> +max_trip_count = iterations; >> limiting_terminator = terminator; >> } >> break; >> @@ -617,9 +617,9 @@ find_trip_count(loop_info_state *state) >>} >> } >> >> - state->loop->info->is_trip_count_known = trip_count_known; >> - if (min_trip_count > -1) >> - state->loop->info->trip_count = min_trip_count; >> + state->loop->info->exact_trip_count_known = trip_count_known; >> + if (max_trip_count > -1) >> + state->loop->info->max_trip_count = max_trip_count; >> state->loop->info->limiting_terminator = limiting_terminator; >> } >> >> @@ -639,7 +639,7 @@ force_unroll_array_access(loop_info_state *state, >> nir_deref_instr *deref) >>nir_deref_instr *parent = nir_deref_instr_parent(d); >>assert(glsl_type_is_array(parent->type) || >> glsl_type_is_matrix(parent->type)); >> - if (glsl_get_length(parent->type) == state->loop->info->trip_count) >> + if (glsl_get_length(parent->type) == >> state->loop->info->max_trip_count) >> return true; >> >>if (deref->mode & state->indirect_mask) >> diff --git a/src/compiler/nir/nir_opt_loop_unroll.c >> b/src/compiler/nir/nir_opt_loop_unroll.c >> index ea2012e292..0e9966320b 100644 >> --- a/src/compiler/nir/nir_opt_loop_unroll.c >> +++ b/src/compiler/nir/nir_opt_loop_unroll.c >> @@ -181,7 +181,7 @@ simple_unroll(nir_loop *loop) >> nir_cf_list unrolled_lp_body; >> >> /* Clone loop header and append to the loop body */ >> - for (unsigned i = 0; i < loop->info->trip_count; i++) { >> + for (unsigned i = 0; i < loop->info->max_trip_count; i++) { >>/* Clone loop body */ >>nir_cf_list_clone(&unrolled_lp_body, &loop_body, >> loop->cf_node.parent, >> remap_table); >> @@ -340,7 +340,7 @@ complex_unroll(nir_loop *loop, nir_loop_terminator >> *unlimit_term, >> * trip count == 1 we execute the code above the break twice and >> the >> * code below it once so we need clone things twice and so on. >> */ >> - num_times_to_clone = loop->info->trip_count + 1; >> + num_times_to_clone = loop->info->max_trip_count + 1; >> } else { >>/* Pluck out the loop header */ >>nir_cf_extract(&lp_header, nir_before_block(header_blk), >> @@ -368,7 +368,7 @@ complex_unroll(nir_loop *loop, nir_loop_terminator >> *unlimit_term, >> >>nir_cf_node_remove(&limiting_term->nif->cf_node); >> >> - num_times_to_clone = loop->info->trip_count; >> + num_times_to_clone = loop->info->max_trip_count; >> } >>