Hi Chris, thanks for looking into that!
On Saturday, 2 June 2018 09:58:14 CEST Chris Wilson wrote:
> Quoting mathias.froehl...@gmx.net (2018-05-17 07:38:26)
> > From: Mathias Fröhlich <mathias.froehl...@web.de>
> >
> > Avoid looping over all VARYING_SLOT_MAX urb_setup array
> > entries from genX_upload_sbe. Prepare an array indirection
> > to the active entries of urb_setup already in the compile
> > step. On upload only walk the active arrays.
> >
> > Signed-off-by: Mathias Fröhlich <mathias.froehl...@web.de>
> > ---
> >  src/intel/compiler/brw_compiler.h             |  7 +++++++
> >  src/intel/compiler/brw_fs.cpp                 | 23 +++++++++++++++++++++++
> >  src/intel/compiler/brw_fs.h                   |  2 ++
> >  src/intel/compiler/brw_fs_visitor.cpp         |  1 +
> >  src/mesa/drivers/dri/i965/genX_state_upload.c |  7 +++----
> >  5 files changed, 36 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/intel/compiler/brw_compiler.h
> > b/src/intel/compiler/brw_compiler.h
> > index 8b4e6fe2e2..a9df45e00d 100644
> > --- a/src/intel/compiler/brw_compiler.h
> > +++ b/src/intel/compiler/brw_compiler.h
> > @@ -743,6 +743,13 @@ struct brw_wm_prog_data {
> >      * For varying slots that are not used by the FS, the value is -1.
> >      */
> >     int urb_setup[VARYING_SLOT_MAX];
> > +   /**
> > +    * Cache structure into the urb_setup array above that contains the
> > +    * attribute numbers of active varyings out of urb_setup.
> > +    * The actual count is stored in urb_setup_attribs_count.
> > +    */
> > +   int urb_setup_attribs[VARYING_SLOT_MAX];
> > +   int urb_setup_attribs_count;
> >  };
> >
> >  struct brw_push_const_block {
> >
> > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> > index b21996c168..6930414067 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -1507,6 +1507,25 @@ fs_visitor::assign_curb_setup()
> >     this->first_non_payload_grf = payload.num_regs +
> >        prog_data->curb_read_length;
> >  }
> >
> > +/*
> > + * Build up an array of indices into the urb_setup array that
> > + * references the active entries of the urb_setup array.
> > + * Used to accelerate walking the active entries of the urb_setup array
> > + * on each upload.
> > + */
> > +void
> > +brw_compute_urb_setup_index(struct brw_wm_prog_data *wm_prog_data)
> > +{
> > +   int index = 0;
> > +   for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
> > +      int input_index = wm_prog_data->urb_setup[attr];
> > +      if (input_index < 0)
> > +         continue;
> > +      wm_prog_data->urb_setup_attribs[index++] = attr;
>
> So far the only consumer of this wants the input_index again.
> Does that change, or is it worth including both to avoid the trawl?

Hmm, I don't know too much about the internal requirements of the hardware
in this regard. But one property of the current code is that it orders the
varying slots in urb_setup[] by ascending attribute index. So if we
collapse the urb_setup[] and urb_setup_index[] arrays into something like

struct {
   uint8_t attrib;
   uint8_t index;
} urb_setup[VARYING_SLOT_MAX];
uint8_t urb_setup_count;

do we need to sort that afterwards by attribute before we can use it in
genX_upload_sbe?

> Is uint8_t (with a STATIC_ASSERT) good enough?

Sure, I was just matching the neighboring declaration to stick with the
surrounding coding style. That's changed here in a v2 version.
We could even reach a smaller cache footprint by using a single uint64_t
and iterating it using u_bit_scan64().
But I received some general headwind lately from someone who did not like
these bitmask loops. So I apply the bitmask iteration only in places where
the bitmasks really provide a larger improvement than just a smaller cache
footprint.

What do you think?

best

Mathias

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev