On 20/02/17 21:31, Francisco Jerez wrote: > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > >> On Mon, 2017-02-20 at 08:58 +0100, Samuel Iglesias Gonsálvez wrote: >>> On Sat, 2017-02-18 at 18:58 -0800, Francisco Jerez wrote: >>>> Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: >>>> >>>>> The lowered BSW/BXT indirect move instructions had incorrect >>>>> source types, which luckily wasn't causing incorrect assembly to >>>>> be >>>>> generated due to the bug fixed in the next patch, but would have >>>>> confused the remaining back-end IR infrastructure due to the >>>>> mismatch >>>>> between the IR source types and the emitted machine code. >>>>> >>>>> v2: >>>>> - Improve commit log (Curro) >>>>> - Fix read_size (Curro) >>>>> - Fix DF uniform array detection in assign_constant_locations() >>>>> when >>>>> it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT. >>>>> >>>>> Signed-off-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com> >>>>> Cc: "17.0" <mesa-sta...@lists.freedesktop.org> >>>>> --- >>>>> src/mesa/drivers/dri/i965/brw_fs.cpp | 11 ++++++++- >>>>> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 41 ++++++++++++++++-- >>>>> -------------- >>>>> 2 files changed, 30 insertions(+), 22 deletions(-) >>>>> >>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >>>>> b/src/mesa/drivers/dri/i965/brw_fs.cpp >>>>> index c348bc7138d..93ab84b5845 100644 >>>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >>>>> @@ -1944,10 +1944,19 @@ fs_visitor::assign_constant_locations() >>>>> unsigned last = constant_nr + (inst->src[2].ud / 4) >>>>> - >>>>> 1; >>>>> assert(last < uniforms); >>>>> >>>>> + bool supports_64bit_indirects = >>>>> + !devinfo->is_cherryview && !devinfo->is_broxton; >>>>> + /* Detect if this is as a result 64-bit MOV >>>>> INDIRECT. >>>>> In case >>>>> + * of BSW/BXT, we substitute each DF MOV INDIRECT by >>>>> two 32-bit MOV >>>>> + * INDIRECT. >>>>> + */ >>>>> + bool mov_indirect_64bit = (type_sz(inst- >>>>>> src[i].type) >>>>> == 8) || >>>>> + (!supports_64bit_indirects && inst->dst.type == >>>>> BRW_REGISTER_TYPE_UD && >>>>> + inst->src[0].type == BRW_REGISTER_TYPE_UD && >>>>> inst- >>>>>> dst.stride == 2); >>>> >>>> This seems kind of fragile, I don't think the optimizer gives you >>>> any >>>> guarantees that the stride of a lowered 64-bit indirect move will >>>> remain >>>> equal to two, or that the destination stride of an actual 32-bit >>>> indirect uniform load will never end up being two as well. That >>>> said, >>>> because you access these with 32-bit indirect moves, I don't see >>>> why >>>> you'd need to treat them as 64-bit uniforms here, the usual >>>> alignment >>>> requirements for 64-bit uniforms no longer apply, so you can treat >>>> them >>>> as regular 32-bit uniforms AFAICT. Why did you add this hunk? >>>> >>> >>> I added it because of this case: if we access to one DF uniform array >>> element with a normal MOV and the rest with MOV INDIRECT, we will >>> mark >>> the former as a live 64bit variable. Then we have the array scattered >>> as part of it is uploaded as a 64-bits uniform and the other as 32- >>> bits. Even if we avoid this by uploading everything together as 32- >>> bits, then the access to that DF could not be aligned to 64-bits. >>> >>> So my idea was to find a way to identify somehow those MOV INDIRECT >>> in >>> BSW to mark all the array as a 64-bit one. >>> >> >> Mmm, maybe I can fix this case without the hack I did. I can add the >> following code after marking all live variables accessed by the >> instructions. >> >> It is very similar to the one to detect live variables but it is fixing >> the case where any MOV INDIRECT in BSW is accessing to an uniform array >> of DF elements where one of these elements is directly accessed by >> another instruction. >> >> What do you think? >> > > Looks somewhat better, but I don't think this is correct if you have > multiple overlapping indirect loads of the same uniform array and only > one of them overlaps with a direct 64-bit load.
In that case, I think this is correct. The 2 32-bit MOV INDIRECTs where emitted as a "lowering" of an unsupported 64-bit MOV INDIRECT. They both keep the 'read_size' of the original one, so they both overlap to any other direct 64-bit load to that array like with the original instruction. If none of them overlap to the direct 64-bit access, then I think they can be handled as non-contiguous to the latter without any issue. > Apparently > assign_constant_locations() already has code to attempt to push > indirectly accessed uniform sections contiguously, but the logic seems > broken in multiple ways, even on platforms other than CHV/BXT... The > first is_live_64bit location assignment loop doesn't consider non-64bit > uniforms even if they're marked as contiguous with a 64-bit uniform, > because the contiguous block tracking is done from within > set_push_pull_constant_loc() which may not be called at all if the > continue condition evaluates to true at the top of the loop. The idea for that block was to push all 64-bit uniforms first and together so they are aligned to 64-bit, there are no gaps between them and assuming, of course, they are not contiguous to non-64-bit data. They cannot be contiguous to non-64-bit data due to a declaration in a GLSL shader because it is not allowed to have two uniforms with the same location, i.e. we cannot have two "views" of the same uniform variable (for view, I mean read the same uniform as 32-bits type in one, and as 64-bits type in other). See "4.4.3 Uniform Variable Layout Qualifiers" section in GLSL 4.50 spec. And if we have two consecutive arrays, we can safely upload them in different locations of the push constant buffer. So, this can only happen with instructions accessing the same area, with a different bit-size and generated by the driver itself. I am going to assume here than we are talking about a uniform variable defined with a type of X-bit size and we access to it as Y-bit size, be X > Y. If the vice-versa is possible (I am not aware of any case though), then the solution would be more complex as we need to take into account alignments. Knowing that, we can do one of these solutions: 1) Fix case by case. This is what I have proposed in BSW. The idea is to see if any overlapped area is accessed with different bit-sizes and mark the whole area with the bigger bit-size and upload it in the proper place in the push constant buffer. With BSW, we know this is happening due to a lowered DF-MOV INDIRECT so we can safely mark the whole area as DF in case of mixed accesses. 2) Generalize 1) to all bit-sizes and generations. > Also > AFAICT this could potentially lead to a very large amount of data being > considered as contiguous if you have a contiguous 64-bit block, then > random unrelated 32-bit data, and then another contiguous 64-bit block, > because the very last element of a MOV_INDIRECT block doesn't seem to be > marked is_live_64bit, which is also the one that would terminate the > 64-bit contiguous block and actually cause it to be assigned a location. Right. Marking the last element as is_live_64_bit is a trivial fix, I will do it too. > I don't think the current code would even guarantee proper alignment for > subsequent 64-blocks in that case because it may be pushing unrelated > 32-bit data in between. > Worse, if the UNIFORM space ends with a 64-bit > contiguous uniform block it may be considered contiguous with whatever > unrelated data is handled first by the next 32-bit uniform assignment > pass. > I don't see how this can happen. 'last' is marking the end of the area and its position is not marked as contiguous except with overlapped accesses, where we know are done in purpose by the driver (e.g BSW's MOV INDIRECT). The whole contiguous area is marked with the same bit-size, after the aforementioned solutions. What do you think about looping on contiguous[], look for contiguous uniforms and assert if they all have the same bit-size, just before setting pull/push constant locations? Then we can easily know if we are wrong before uploading anything. > Sigh... Seems like this was only scratching the surface... > :-( As I might be missing something, I am open to suggestions/opinions... Sam >> Sam >> >> + /* Detect if we accessed to an array of DF uniforms in both direct >> and >> + * indirect ways on BSW/BXT. If this is the case, then we need >> + * to take into account in order to push all the elements together. >> + */ >> + if (devinfo->is_cherryview || devinfo->is_broxton) { >> + foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { >> + for (int i = 0 ; i < inst->sources; i++) { >> + if (inst->src[i].file != UNIFORM) >> + continue; >> + >> + int constant_nr = inst->src[i].nr + inst->src[i].offset / >> 4; >> + >> + if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) >> { >> + assert(inst->src[2].ud % 4 == 0); >> + unsigned last = constant_nr + (inst->src[2].ud / 4) - >> 1; >> + assert(last < uniforms); >> + >> + bool is_df_array = false; >> + for (unsigned j = constant_nr; j < last; j++) { >> + if (is_live_64bit[j]) { >> + is_df_array = true; >> + break; >> + } >> + } >> + >> + if (is_df_array) { >> + /* If part of it is accessed with direct addressing >> as a DF >> + * then we mark the whole array as DF. >> + */ >> + for (unsigned j = constant_nr; j < last; j++) >> + is_live_64bit[j] = true; >> + } >> + } >> + } >> + } >> + } >> + >> >>>> Other than that patch looks good to me. >>>> >>> >>> OK, thanks. >>> >>> Sam >>> >>>>> for (unsigned j = constant_nr; j < last; j++) { >>>>> is_live[j] = true; >>>>> contiguous[j] = true; >>>>> - if (type_sz(inst->src[i].type) == 8) { >>>>> + if (mov_indirect_64bit) { >>>>> is_live_64bit[j] = true; >>>>> } >>>>> } >>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>>>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>>>> index 991c20fad62..4aaf84964e9 100644 >>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>>>> @@ -3878,31 +3878,30 @@ fs_visitor::nir_emit_intrinsic(const >>>>> fs_builder &bld, nir_intrinsic_instr *instr >>>>> unsigned read_size = instr->const_index[1] - >>>>> (instr->num_components - 1) * type_sz(dest.type); >>>>> >>>>> - fs_reg indirect_chv_high_32bit; >>>>> - bool is_chv_bxt_64bit = >>>>> - (devinfo->is_cherryview || devinfo->is_broxton) && >>>>> - type_sz(dest.type) == 8; >>>>> - if (is_chv_bxt_64bit) { >>>>> - indirect_chv_high_32bit = >>>>> vgrf(glsl_type::uint_type); >>>>> - /* Calculate indirect address to read high 32 bits >>>>> */ >>>>> - bld.ADD(indirect_chv_high_32bit, indirect, >>>>> brw_imm_ud(4)); >>>>> - } >>>>> + bool supports_64bit_indirects = >>>>> + !devinfo->is_cherryview && !devinfo->is_broxton; >>>>> >>>>> - for (unsigned j = 0; j < instr->num_components; j++) { >>>>> - if (!is_chv_bxt_64bit) { >>>>> + if (type_sz(dest.type) != 8 || >>>>> supports_64bit_indirects) >>>>> { >>>>> + for (unsigned j = 0; j < instr->num_components; j++) >>>>> { >>>>> bld.emit(SHADER_OPCODE_MOV_INDIRECT, >>>>> offset(dest, bld, j), offset(src, bld, >>>>> j), >>>>> indirect, brw_imm_ud(read_size)); >>>>> - } else { >>>>> - bld.emit(SHADER_OPCODE_MOV_INDIRECT, >>>>> - subscript(offset(dest, bld, j), >>>>> BRW_REGISTER_TYPE_UD, 0), >>>>> - offset(src, bld, j), >>>>> - indirect, brw_imm_ud(read_size)); >>>>> - >>>>> - bld.emit(SHADER_OPCODE_MOV_INDIRECT, >>>>> - subscript(offset(dest, bld, j), >>>>> BRW_REGISTER_TYPE_UD, 1), >>>>> - offset(src, bld, j), >>>>> - indirect_chv_high_32bit, >>>>> brw_imm_ud(read_size)); >>>>> + } >>>>> + } else { >>>>> + const unsigned num_mov_indirects = >>>>> + type_sz(dest.type) / >>>>> type_sz(BRW_REGISTER_TYPE_UD); >>>>> + /* We read a little bit less per MOV INDIRECT, as >>>>> they >>>>> are now >>>>> + * 32-bits ones instead of 64-bit. Fix read_size >>>>> then. >>>>> + */ >>>>> + const unsigned read_size_32bit = read_size - >>>>> + (num_mov_indirects - 1) * >>>>> type_sz(BRW_REGISTER_TYPE_UD); >>>>> + for (unsigned j = 0; j < instr->num_components; j++) >>>>> { >>>>> + for (unsigned i = 0; i < num_mov_indirects; i++) >>>>> { >>>>> + bld.emit(SHADER_OPCODE_MOV_INDIRECT, >>>>> + subscript(offset(dest, bld, j), >>>>> BRW_REGISTER_TYPE_UD, i), >>>>> + subscript(offset(src, bld, j), >>>>> BRW_REGISTER_TYPE_UD, i), >>>>> + indirect, >>>>> brw_imm_ud(read_size_32bit)); >>>>> + } >>>>> } >>>>> } >>>>> } >>>>> -- >>>>> 2.11.0 >>> >>> _______________________________________________ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: OpenPGP digital signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev