64-bit shuffle_src_to_dst function

Chema Casanova Thu, 14 Jun 2018 06:15:29 -0700

El 14/06/18 a las 02:46, Jason Ekstrand escribió:
> On Wed, Jun 13, 2018 at 5:07 PM, Chema Casanova <jmcasan...@igalia.com
> <mailto:jmcasan...@igalia.com>> wrote:
> 
>     On 13/06/18 22:46, Jason Ekstrand wrote:
>     > On Sat, Jun 9, 2018 at 4:13 AM, Jose Maria Casanova Crespo
>     > <jmcasan...@igalia.com <mailto:jmcasan...@igalia.com>
>     <mailto:jmcasan...@igalia.com <mailto:jmcasan...@igalia.com>>> wrote:
>     >
>     >     This new function takes care of shuffle/unshuffle components of a
>     >     particular bit-size in components with a different bit-size.
>     >
>     >     If source type size is smaller than destination type size the
>     operation
>     >     needed is a component shuffle. The opposite case would be an
>     unshuffle.
>     >
>     >     The operation allows to skip first_component number of
>     components from
>     >     the source.
>     >
>     >     Shuffle MOVs are retyped using integer types avoiding problems
>     with
>     >     denorms
>     >     and float types. This allows to simplify uses of shuffle functions
>     >     that are
>     >     dealing with these retypes individually.
>     >
>     >     Now there is a new restriction so source and destination can
>     not overlap
>     >     anymore when calling this suffle function. Following patches that
>     >     migrate
>     >     to use this new function will take care individually of
>     avoiding source
>     >     and destination overlaps.
>     >     ---
>     >      src/intel/compiler/brw_fs_nir.cpp | 92
>     +++++++++++++++++++++++++++++++
>     >      1 file changed, 92 insertions(+)
>     >
>     >     diff --git a/src/intel/compiler/brw_fs_nir.cpp
>     >     b/src/intel/compiler/brw_fs_nir.cpp
>     >     index 166da0aa6d7..1a9d3c41d1d 100644
>     >     --- a/src/intel/compiler/brw_fs_nir.cpp
>     >     +++ b/src/intel/compiler/brw_fs_nir.cpp
>     >     @@ -5362,6 +5362,98 @@ shuffle_16bit_data_for_32bit_write(const
>     >     fs_builder &bld,
>     >         }
>     >      }
>     >
>     >     +/*
>     >     + * This helper takes a source register and un/shuffles it
>     into the
>     >     destination
>     >     + * register.
>     >     + *
>     >     + * If source type size is smaller than destination type size the
>     >     operation
>     >     + * needed is a component shuffle. The opposite case would be an
>     >     unshuffle. If
>     >     + * source/destination type size is equal a shuffle is done that
>     >     would be
>     >     + * equivalent to a simple MOV.
>     >
>     >
>     > There's a sticky bit here if we want this to work with 64-bit types on
>     > gen7 and earlier because we only have DF there and not Q so the
>     > brw_reg_type_from_bit_size below doesn't work.  If we care about that
>     > case (and I'm not convinced we do), it should be easy enough to add a
>     > type_sz(src.type) == type_sz(dst.type) case which just does MOVs from
>     > source to dest.
> 
>     At this moment, current uses of this function are to read from 32-bits
>     or to write to 32-bit. But I think that for completeness if would be
>     nice to have all cases covered. The option of doing the MOVs in the case
>     of equality (that would be quite normal) saves us to do the shuffle
>     calculus for the simple case. So I'm going for it.
> 
>     >     + *
>     >     + * For example, if source is a 16-bit type and destination is
>     >     32-bit. A 3
>     >     + * components .xyz 16-bit vector on SIMD8 would be.
>     >     + *
>     >     + *    |x1|x2|x3|x4|x5|x6|x7|x8|y1|y2|y3|y4|y5|y6|y7|y8|
>     >     + *    |z1|z2|z3|z4|z5|z6|z7|z8|  |  |  |  |  |  |  |  |
>     >     + *
>     >     + * This helper will return the following 2 32-bit components with
>     >     the 16-bit
>     >     + * values shuffled:
>     >     + *
>     >     + *    |x1 y1|x2 y2|x3 y3|x4 y4|x5 y5|x6 y6|x7 y7|x8 y8|
>     >     + *    |z1   |z2   |z3   |z4   |z5   |z6   |z7   |z8   |
>     >     + *
>     >     + * For unshuffle, the example would be the opposite, a 64-bit
>     type
>     >     source
>     >     + * and a 32-bit destination. A 2 component .xy 64-bit vector
>     on SIMD8
>     >     + * would be:
>     >     + *
>     >     + *    | x1l   x1h | x2l   x2h | x3l   x3h | x4l   x4h |
>     >     + *    | x5l   x5h | x6l   x6h | x7l   x7h | x8l   x8h |
>     >     + *    | y1l   y1h | y2l   y2h | y3l   y3h | y4l   y4h |
>     >     + *    | y5l   y5h | y6l   y6h | y7l   y7h | y8l   y8h |
>     >     + *
>     >     + * The returned result would be the following 4 32-bit components
>     >     unshuffled:
>     >     + *
>     >     + *    | x1l | x2l | x3l | x4l | x5l | x6l | x7l | x8l |
>     >     + *    | x1h | x2h | x3h | x4h | x5h | x6h | x7h | x8h |
>     >     + *    | y1l | y2l | y3l | y4l | y5l | y6l | y7l | y8l |
>     >     + *    | y1h | y2h | y3h | y4h | y5h | y6h | y7h | y8h |
>     >     + *
>     >     + * - Source and destination register must not be overlapped.
>     >     + * - first_component parameter allows skipping source components.
>     >     + */
>     >     +void
>     >     +shuffle_src_to_dst(const fs_builder &bld,
>     >     +                   const fs_reg &dst,
>     >     +                   const fs_reg &src,
>     >     +                   uint32_t first_component,
>     >     +                   uint32_t components)
>     >     +{
>     >     +   if (type_sz(src.type) <= type_sz(dst.type)) {
>     >     +      /* Source is shuffled into destination */
>     >     +      unsigned size_ratio = type_sz(dst.type) /
>     type_sz(src.type);
>     >     +#ifndef NDEBUG
>     >     +      boolean src_dst_overlap = regions_overlap(dst,
>     >     +         type_sz(dst.type) * bld.dispatch_width() * components,
>     >     +         offset(src, bld, first_component * size_ratio),
>     >
>     >
>     > Why do you need to multiply first_component by size_ratio?  It's
>     already
>     > in units of source components.
> 
>     Yes, that's wrong. I forgot to update this assertion with my lasts
>     refactorings of this series, and as nowadays there are no overlaps I
>     didn't realize of the error. Both calls to regions_overlap have all
>     parameters wrong.
> 
>     I should also include in the comment than the "component" parameter is
>     measured in terms of the smaller type between source and destination. As
>     we are un/shuffling the smaller components from/into a bigger one.


> Yes, please.  Is that really what we want though?  Do we have cases
> where we are going from a big type to a small type and want a small
> component offset?  I suppose I could see it happening.  It's ok if it is
> what we want; I just want to make sure. :-)

Maybe it would make sense to have first_component_src and
first_component_dst. In this series it would simplify some case where we
do an offset of the destination register like in:

  shuffle_from_32bit_read(bld,
     offset(orig_dest, bld, iter * 2),
            retype(dest, BRW_REGISTER_TYPE_D),
            0, num_components);
  }

But in my local version over this series of VK_KHR_16bit_storage
input/output at store_output I have a more complex case that could take
advantage if we had a new function
shuffle_for_32bit_write_with_register() where my store_output code:

=================================
if (type_size != 4) {
    src = shuffle_for_32bit_write(bld, src, 0, num_components);
    num_components = DIV_ROUND_UP(num_components * 2, 8 / type_size);
}

fs_reg new_dest = retype(offset(outputs[instr->const_index[0]], bld,
                                4 * const_offset->u32[0]), src.type);

for (unsigned j = 0; j < num_components; j++) {
   bld.MOV(offset(new_dest, bld, j + first_component),
           offset(src, bld, j));
}
=================================

Could become:

========================
fs_reg new_dest = retype(offset(outputs[instr->const_index[0]], bld,
                                4 * const_offset->u32[0]),
                                BRW_REGISTER_TYPE_D);

shuffle_for_32bit_write_with_register (new_dest, first_component,
                                       src, 0  /* src_first_component */
                                       num_components);
===========================

If you think this is useful, I can prepare do a follow up.

>     >     +         type_sz(src.type) * bld.dispatch_width() * components *
>     >     size_ratio);
>     >     +#endif
>     >     +      assert(!src_dst_overlap);
>     > 
>     > 
>     > If the only thing you're doing with src_dst_overlap is to assert on it,
>     > you may as well put the regions_overlap call inside the assert and drop
>     > the #ifndef.
> 
>     Ok, I just thought it was too long function. I'll change it.
> 
>     >     +
>     >     +      brw_reg_type shuffle_type =
>     >     +         brw_reg_type_from_bit_size(8 * type_sz(src.type),
>     >     +                                    BRW_REGISTER_TYPE_D);
>     >     +      for (unsigned i = 0; i < components; i++) {
>     >     +         fs_reg shuffle_component_i =
>     >     +            subscript(offset(dst, bld, i / size_ratio),
>     >     +                      shuffle_type, i % size_ratio);
>     >     +         bld.MOV(shuffle_component_i,
>     >     +                 retype(offset(src, bld, i + first_component),
>     >     shuffle_type));
>     >     +      }
>     >     +   } else {
>     >     +      /* Source is unshuffled into destination */
>     >     +      unsigned size_ratio = type_sz(src.type) / type_sz(dst.type);
>     >     +#ifndef NDEBUG
>     >     +      boolean src_dst_overlap = regions_overlap(dst,
>     >     +         type_sz(dst.type) * bld.dispatch_width() * components,
>     >     +         offset(src, bld, first_component / size_ratio),
>     > 
>     > 
>     > Again, I don't think we want to divide here.
> 
>     Sure.
> 
>     >  
>     > 
>     >     +         type_sz(src.type) * bld.dispatch_width() *
>     >     +         DIV_ROUND_UP(components + first_component % size_ratio,
>     >     size_ratio));
>     >     +#endif
>     >     +      assert(!src_dst_overlap);
>     > 
>     > 
>     > As I said above, everything can go in the assert.
> 
>     Ok.
> 
>     > Other than that, this looks great.
> 
>     Thanks for the review. I'm sending tomorrow a v2 with the fixes.
> 
>     > --Jason
>     > 
>     >  
>     > 
>     >     +      brw_reg_type shuffle_type =
>     >     +         brw_reg_type_from_bit_size(8 * type_sz(dst.type),
>     >     +                                    BRW_REGISTER_TYPE_D);
>     >     +      for (unsigned i = 0; i < components; i++) {
>     >     +         fs_reg shuffle_component_i =
>     >     +            subscript(offset(src, bld, (first_component + i) /
>     >     size_ratio),
>     >     +                      shuffle_type, (first_component + i) %
>     >     size_ratio);
>     >     +         bld.MOV(retype(offset(dst, bld, i), shuffle_type),
>     >     +                 shuffle_component_i);
>     >     +      }
>     >     +   }
>     >     +}
>     >     +
>     >      fs_reg
>     >      setup_imm_df(const fs_builder &bld, double v)
>     >      {
>     >     -- 
>     >     2.17.1
>     > 
>     >     _______________________________________________
>     >     mesa-dev mailing list
>     >     mesa-dev@lists.freedesktop.org
>     <mailto:mesa-dev@lists.freedesktop.org>
>     <mailto:mesa-dev@lists.freedesktop.org
>     <mailto:mesa-dev@lists.freedesktop.org>>
>     >     https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>     <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>
>     >     <https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>     <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>>
>     >
>     >
>     >
>     >
>     > _______________________________________________
>     > mesa-dev mailing list
>     > mesa-dev@lists.freedesktop.org <mailto:mesa-dev@lists.freedesktop.org>
>     > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>     <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>
>     >
> 
> 
> 
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 01/14] intel/compiler: general 8/16/32/64-bit shuffle_src_to_dst function

Reply via email to