Jason Ekstrand <[email protected]> writes: > On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez <[email protected]> > wrote: > >> This is required in combination with the following commit, because >> otherwise if a source region with an extended 8+ stride is present in >> the instruction (which we're about to declare legal) we'll end up >> emitting code that attempts to write to such a region, even though >> strides greater than four are still illegal for the destination. >> --- >> src/intel/compiler/brw_fs_lower_regioning.cpp | 20 ++++++++++++++----- >> 1 file changed, 15 insertions(+), 5 deletions(-) >> >> diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp >> b/src/intel/compiler/brw_fs_lower_regioning.cpp >> index 6a3c23892b4..b86e95ed9eb 100644 >> --- a/src/intel/compiler/brw_fs_lower_regioning.cpp >> +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp >> @@ -71,15 +71,25 @@ namespace { >> !is_byte_raw_mov(inst)) { >> return get_exec_type_size(inst); >> } else { >> - unsigned stride = inst->dst.stride * type_sz(inst->dst.type); >> + /* Calculate the maximum byte stride and the minimum type size >> across >> + * all source and destination operands. >> + */ >> + unsigned max_stride = inst->dst.stride * type_sz(inst->dst.type); >> + unsigned min_size = type_sz(inst->dst.type); >> >> for (unsigned i = 0; i < inst->sources; i++) { >> - if (!is_uniform(inst->src[i]) && !inst->is_control_source(i)) >> - stride = MAX2(stride, inst->src[i].stride * >> - type_sz(inst->src[i].type)); >> + if (!is_uniform(inst->src[i]) && !inst->is_control_source(i)) >> { >> + max_stride = MAX2(max_stride, inst->src[i].stride * >> + type_sz(inst->src[i].type)); >> + min_size = MIN2(min_size, type_sz(inst->src[i].type)); >> + } >> } >> >> - return stride; >> + /* Attempt to use the largest byte stride among all present >> operands, >> + * but never exceed a stride of 4 since that would lead to >> illegal >> + * destination regions during lowering. >> + */ >> + return MIN2(max_stride, 4 * min_size); >> > > Why not just fall back to tightly packed in this case? I think I can > answer my own question: Because using something that's equal to one of the > strides reduces the liklihood that we'll need a temporary. If that's the > correct answer, then maybe what we want is the maximum of all strides with > stride_in_bytes <= 4 * type_sz? >
We also want the result to be greater than or equal to the size of the largest non-uniform, non-control source type, since packing a vector of such a type into a temporary of lower byte stride than its size is impossible. This patch guarantees that as long as max_size <= 4 * min_size, which is necessary for the lowering code that calls this function to work at all. It would be possible to preserve this guarantee while attempting to pick one of the strides of the pre-existing sources as you say -- I would be happy to review that change as a follow-up micro-optimization patch, but there are some corner cases to consider I don't necessarily want to bother with in the patch doing the functional change, for the sake of bisectability. It may make sense to add an assert here that max_size <= 4 * min_size for the case such an instruction doesn't blow up already at the EU validator (it doesn't look like the validator is currently enforcing the lack of conversions between 8 and 64 bit types?), it will just involve calculating max_size in addition to max_stride and min_size above. > --Jason > > >> } >> } >> >> -- >> 2.19.2 >> >> _______________________________________________ >> mesa-dev mailing list >> [email protected] >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >>
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
