El 14/06/18 a las 02:46, Jason Ekstrand escribió: > On Wed, Jun 13, 2018 at 5:07 PM, Chema Casanova <jmcasan...@igalia.com > <mailto:jmcasan...@igalia.com>> wrote: > > On 13/06/18 22:46, Jason Ekstrand wrote: > > On Sat, Jun 9, 2018 at 4:13 AM, Jose Maria Casanova Crespo > > <jmcasan...@igalia.com <mailto:jmcasan...@igalia.com> > <mailto:jmcasan...@igalia.com <mailto:jmcasan...@igalia.com>>> wrote: > > > > This new function takes care of shuffle/unshuffle components of a > > particular bit-size in components with a different bit-size. > > > > If source type size is smaller than destination type size the > operation > > needed is a component shuffle. The opposite case would be an > unshuffle. > > > > The operation allows to skip first_component number of > components from > > the source. > > > > Shuffle MOVs are retyped using integer types avoiding problems > with > > denorms > > and float types. This allows to simplify uses of shuffle functions > > that are > > dealing with these retypes individually. > > > > Now there is a new restriction so source and destination can > not overlap > > anymore when calling this suffle function. Following patches that > > migrate > > to use this new function will take care individually of > avoiding source > > and destination overlaps. > > --- > > src/intel/compiler/brw_fs_nir.cpp | 92 > +++++++++++++++++++++++++++++++ > > 1 file changed, 92 insertions(+) > > > > diff --git a/src/intel/compiler/brw_fs_nir.cpp > > b/src/intel/compiler/brw_fs_nir.cpp > > index 166da0aa6d7..1a9d3c41d1d 100644 > > --- a/src/intel/compiler/brw_fs_nir.cpp > > +++ b/src/intel/compiler/brw_fs_nir.cpp > > @@ -5362,6 +5362,98 @@ shuffle_16bit_data_for_32bit_write(const > > fs_builder &bld, > > } > > } > > > > +/* > > + * This helper takes a source register and un/shuffles it > into the > > destination > > + * register. > > + * > > + * If source type size is smaller than destination type size the > > operation > > + * needed is a component shuffle. The opposite case would be an > > unshuffle. If > > + * source/destination type size is equal a shuffle is done that > > would be > > + * equivalent to a simple MOV. > > > > > > There's a sticky bit here if we want this to work with 64-bit types on > > gen7 and earlier because we only have DF there and not Q so the > > brw_reg_type_from_bit_size below doesn't work. If we care about that > > case (and I'm not convinced we do), it should be easy enough to add a > > type_sz(src.type) == type_sz(dst.type) case which just does MOVs from > > source to dest. > > At this moment, current uses of this function are to read from 32-bits > or to write to 32-bit. But I think that for completeness if would be > nice to have all cases covered. The option of doing the MOVs in the case > of equality (that would be quite normal) saves us to do the shuffle > calculus for the simple case. So I'm going for it. > > > + * > > + * For example, if source is a 16-bit type and destination is > > 32-bit. A 3 > > + * components .xyz 16-bit vector on SIMD8 would be. > > + * > > + * |x1|x2|x3|x4|x5|x6|x7|x8|y1|y2|y3|y4|y5|y6|y7|y8| > > + * |z1|z2|z3|z4|z5|z6|z7|z8| | | | | | | | | > > + * > > + * This helper will return the following 2 32-bit components with > > the 16-bit > > + * values shuffled: > > + * > > + * |x1 y1|x2 y2|x3 y3|x4 y4|x5 y5|x6 y6|x7 y7|x8 y8| > > + * |z1 |z2 |z3 |z4 |z5 |z6 |z7 |z8 | > > + * > > + * For unshuffle, the example would be the opposite, a 64-bit > type > > source > > + * and a 32-bit destination. A 2 component .xy 64-bit vector > on SIMD8 > > + * would be: > > + * > > + * | x1l x1h | x2l x2h | x3l x3h | x4l x4h | > > + * | x5l x5h | x6l x6h | x7l x7h | x8l x8h | > > + * | y1l y1h | y2l y2h | y3l y3h | y4l y4h | > > + * | y5l y5h | y6l y6h | y7l y7h | y8l y8h | > > + * > > + * The returned result would be the following 4 32-bit components > > unshuffled: > > + * > > + * | x1l | x2l | x3l | x4l | x5l | x6l | x7l | x8l | > > + * | x1h | x2h | x3h | x4h | x5h | x6h | x7h | x8h | > > + * | y1l | y2l | y3l | y4l | y5l | y6l | y7l | y8l | > > + * | y1h | y2h | y3h | y4h | y5h | y6h | y7h | y8h | > > + * > > + * - Source and destination register must not be overlapped. > > + * - first_component parameter allows skipping source components. > > + */ > > +void > > +shuffle_src_to_dst(const fs_builder &bld, > > + const fs_reg &dst, > > + const fs_reg &src, > > + uint32_t first_component, > > + uint32_t components) > > +{ > > + if (type_sz(src.type) <= type_sz(dst.type)) { > > + /* Source is shuffled into destination */ > > + unsigned size_ratio = type_sz(dst.type) / > type_sz(src.type); > > +#ifndef NDEBUG > > + boolean src_dst_overlap = regions_overlap(dst, > > + type_sz(dst.type) * bld.dispatch_width() * components, > > + offset(src, bld, first_component * size_ratio), > > > > > > Why do you need to multiply first_component by size_ratio? It's > already > > in units of source components. > > Yes, that's wrong. I forgot to update this assertion with my lasts > refactorings of this series, and as nowadays there are no overlaps I > didn't realize of the error. Both calls to regions_overlap have all > parameters wrong. > > I should also include in the comment than the "component" parameter is > measured in terms of the smaller type between source and destination. As > we are un/shuffling the smaller components from/into a bigger one.
> Yes, please. Is that really what we want though? Do we have cases > where we are going from a big type to a small type and want a small > component offset? I suppose I could see it happening. It's ok if it is > what we want; I just want to make sure. :-) Maybe it would make sense to have first_component_src and first_component_dst. In this series it would simplify some case where we do an offset of the destination register like in: shuffle_from_32bit_read(bld, offset(orig_dest, bld, iter * 2), retype(dest, BRW_REGISTER_TYPE_D), 0, num_components); } But in my local version over this series of VK_KHR_16bit_storage input/output at store_output I have a more complex case that could take advantage if we had a new function shuffle_for_32bit_write_with_register() where my store_output code: ================================= if (type_size != 4) { src = shuffle_for_32bit_write(bld, src, 0, num_components); num_components = DIV_ROUND_UP(num_components * 2, 8 / type_size); } fs_reg new_dest = retype(offset(outputs[instr->const_index[0]], bld, 4 * const_offset->u32[0]), src.type); for (unsigned j = 0; j < num_components; j++) { bld.MOV(offset(new_dest, bld, j + first_component), offset(src, bld, j)); } ================================= Could become: ======================== fs_reg new_dest = retype(offset(outputs[instr->const_index[0]], bld, 4 * const_offset->u32[0]), BRW_REGISTER_TYPE_D); shuffle_for_32bit_write_with_register (new_dest, first_component, src, 0 /* src_first_component */ num_components); =========================== If you think this is useful, I can prepare do a follow up. > > + type_sz(src.type) * bld.dispatch_width() * components * > > size_ratio); > > +#endif > > + assert(!src_dst_overlap); > > > > > > If the only thing you're doing with src_dst_overlap is to assert on it, > > you may as well put the regions_overlap call inside the assert and drop > > the #ifndef. > > Ok, I just thought it was too long function. I'll change it. > > > + > > + brw_reg_type shuffle_type = > > + brw_reg_type_from_bit_size(8 * type_sz(src.type), > > + BRW_REGISTER_TYPE_D); > > + for (unsigned i = 0; i < components; i++) { > > + fs_reg shuffle_component_i = > > + subscript(offset(dst, bld, i / size_ratio), > > + shuffle_type, i % size_ratio); > > + bld.MOV(shuffle_component_i, > > + retype(offset(src, bld, i + first_component), > > shuffle_type)); > > + } > > + } else { > > + /* Source is unshuffled into destination */ > > + unsigned size_ratio = type_sz(src.type) / type_sz(dst.type); > > +#ifndef NDEBUG > > + boolean src_dst_overlap = regions_overlap(dst, > > + type_sz(dst.type) * bld.dispatch_width() * components, > > + offset(src, bld, first_component / size_ratio), > > > > > > Again, I don't think we want to divide here. > > Sure. > > > > > > > + type_sz(src.type) * bld.dispatch_width() * > > + DIV_ROUND_UP(components + first_component % size_ratio, > > size_ratio)); > > +#endif > > + assert(!src_dst_overlap); > > > > > > As I said above, everything can go in the assert. > > Ok. > > > Other than that, this looks great. > > Thanks for the review. I'm sending tomorrow a v2 with the fixes. > > > --Jason > > > > > > > > + brw_reg_type shuffle_type = > > + brw_reg_type_from_bit_size(8 * type_sz(dst.type), > > + BRW_REGISTER_TYPE_D); > > + for (unsigned i = 0; i < components; i++) { > > + fs_reg shuffle_component_i = > > + subscript(offset(src, bld, (first_component + i) / > > size_ratio), > > + shuffle_type, (first_component + i) % > > size_ratio); > > + bld.MOV(retype(offset(dst, bld, i), shuffle_type), > > + shuffle_component_i); > > + } > > + } > > +} > > + > > fs_reg > > setup_imm_df(const fs_builder &bld, double v) > > { > > -- > > 2.17.1 > > > > _______________________________________________ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > <mailto:mesa-dev@lists.freedesktop.org> > <mailto:mesa-dev@lists.freedesktop.org > <mailto:mesa-dev@lists.freedesktop.org>> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > <https://lists.freedesktop.org/mailman/listinfo/mesa-dev> > > <https://lists.freedesktop.org/mailman/listinfo/mesa-dev > <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>> > > > > > > > > > > _______________________________________________ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org <mailto:mesa-dev@lists.freedesktop.org> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > <https://lists.freedesktop.org/mailman/listinfo/mesa-dev> > > > > > > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev