On Thursday, September 28, 2017 1:24:21 AM PDT Iago Toral Quiroga wrote: > Triggering the push model when 64-bit inputs are involved is not easy due to > the constrains on the maximum number of registers that we allow for this mode, > however, for GS with 'points' primitive type and just a couple of double > varyings we can trigger this and it just doesn't work because the > implementation is not 64-bit aware at all. For now, let's make sure that we > don't attempt this model whith 64-bit inputs and we always fall back to pull > model for them. > > Also, don't enable the VUE handles in the thread payload on the fly when we > find an input for which we need the pull model, this is not safe: if we need > to resort to the pull model we need to account for that when we setup the > thread payload so we compute the first non-payload register properly. If we > didn't do that correctly and we enable it on-the-fly here then we will end up > VUE handles on the first non-payload register which will probably lead to > GPU hangs. Instead, always enable the VUE handles for the pull model so we > can safely use them when needed. The GS is going to resort to pull model > almost in every situation anyway, so this shouldn't make a significant > difference and it makes things easier and safer. > > v2: Always enable the VUE handles for pull model, this is easier and safer > and the GS is going to fallback to pull model almost always anyway (Ken) > > v3: Only clamp the URB read length if we are over the maximum reserved for > push inputs as we were doing in the original code (Ken). > --- > src/intel/compiler/brw_fs.cpp | 16 +++++++++++----- > src/intel/compiler/brw_fs_nir.cpp | 4 +++- > 2 files changed, 14 insertions(+), 6 deletions(-) > > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp > index eb9b4c3890..b13b56cfba 100644 > --- a/src/intel/compiler/brw_fs.cpp > +++ b/src/intel/compiler/brw_fs.cpp > @@ -5602,6 +5602,17 @@ fs_visitor::setup_gs_payload() > payload.num_regs++; > } > > + /* Always enable VUE handles so we can safely use pull model if needed. > + * > + * The push model for a GS uses a ton of register space even for trivial > + * scenarios with just a few inputs, so just make things easier and a bit > + * safer by always having pull model available. > + */ > + gs_prog_data->base.include_vue_handles = true; > + > + /* R3..RN: ICP Handles for each incoming vertex (when using pull model) */ > + payload.num_regs += nir->info.gs.vertices_in; > + > /* Use a maximum of 24 registers for push-model inputs. */ > const unsigned max_push_components = 24; > > @@ -5613,11 +5624,6 @@ fs_visitor::setup_gs_payload() > */ > if (8 * vue_prog_data->urb_read_length * nir->info.gs.vertices_in > > max_push_components || gs_prog_data->invocations > 1) {
One last change...sorry! It doesn't make much sense to clamp to the maximum number of push components just because invocations > 1. We only needed to consider that in order to include the VUE handles, which we now do unconditionally. Let's change this to: if (8 * vue_prog_data->urb_read_length * nir->info.gs.vertices_in > max_push_components) { With that change, Reviewed-by: Kenneth Graunke <kenn...@whitecape.org> Looking at this more closely has really shown all the ways I made a mess of this...so, thanks again for cleaning this all up! > - gs_prog_data->base.include_vue_handles = true; > - > - /* R3..RN: ICP Handles for each incoming vertex (when using pull > model) */ > - payload.num_regs += nir->info.gs.vertices_in; > - > vue_prog_data->urb_read_length = > ROUND_DOWN_TO(max_push_components / nir->info.gs.vertices_in, 8) / > 8; > } > diff --git a/src/intel/compiler/brw_fs_nir.cpp > b/src/intel/compiler/brw_fs_nir.cpp > index d760946e62..aa57bb9d78 100644 > --- a/src/intel/compiler/brw_fs_nir.cpp > +++ b/src/intel/compiler/brw_fs_nir.cpp > @@ -1915,7 +1915,9 @@ fs_visitor::emit_gs_input_load(const fs_reg &dst, > const unsigned push_reg_count = gs_prog_data->base.urb_read_length * 8; > > /* TODO: figure out push input layout for invocations == 1 */ > + /* TODO: make this work with 64-bit inputs */ > if (gs_prog_data->invocations == 1 && > + type_sz(dst.type) <= 4 && > offset_const != NULL && vertex_const != NULL && > 4 * (base_offset + offset_const->u32[0]) < push_reg_count) { > int imm_offset = (base_offset + offset_const->u32[0]) * 4 + > @@ -1928,7 +1930,7 @@ fs_visitor::emit_gs_input_load(const fs_reg &dst, > } > > /* Resort to the pull model. Ensure the VUE handles are provided. */ > - gs_prog_data->base.include_vue_handles = true; > + assert(gs_prog_data->base.include_vue_handles); > > unsigned first_icp_handle = gs_prog_data->include_primitive_id ? 3 : 2; > fs_reg icp_handle = bld.vgrf(BRW_REGISTER_TYPE_UD, 1); >
signature.asc
Description: This is a digitally signed message part.
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev