> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Monday, July 29, 2024 9:43 PM
> To: Richard Biener <rguent...@suse.de>
> Cc: Prathamesh Kulkarni <prathame...@nvidia.com>; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> External email: Use caution opening links or attachments
> 
> 
> Richard Biener <rguent...@suse.de> writes:
> > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:
> >
> >> Hi Richard,
> >> Thanks for your suggestions on RFC email, the attached patch adds
> support for streaming of poly_int when it's degree <= accel's
> NUM_POLY_INT_COEFFS.
> >> The patch changes streaming of poly_int as follows:
> >>
> >> Streaming out poly_int:
> >>
> >> degree = poly_int.degree();
> >> stream out degree;
> >> for (i = 0; i < degree; i++)
> >>   stream out poly_int.coeffs[i];
> >>
> >> Streaming in poly_int:
> >>
> >> stream in degree;
> >> if (degree > NUM_POLY_INT_COEFFS)
> >>   fatal_error();
> >> stream in coeffs;
> >> // Set remaining coeffs to zero in case degree < accel's
> >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
> >>   poly_int.coeffs[i] = 0;
> >>
> >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-
> linux-gnu.
> >> LTO bootstrap+test on x86_64-linux-gnu in progress.
> >>
> >> I am not quite sure how to test it for offloading since currently
> it's (entirely) broken for aarch64->nvptx.
> >> I can give a try with x86_64->nvptx offloading if required (altho I
> >> guess LTO bootstrap should test streaming changes ?)
> >
> > +  unsigned degree
> > +    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned
> > HOST_WIDE_INT));
> >
> > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
> > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
> > I'd say we should constrain it to a reasonable upper bound, like 2?
> > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS
> in
> > poly-int.h and constrain NUM_POLY_INT_COEFFS.
> >
> > The patch looks reasonable over all, but Richard S. should have a
> say
> > about the abstraction you chose and the poly-int adjustment.
> 
> Sorry if this has been discussed already, but could we instead stream
> NUM_POLY_INT_COEFFS once per file, rather than once per poly_int?
> It's a target invariant, and poly_int has wormed its way into lots of
> things by now :)
Hi Richard,
The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of poly_int 
(and streams-out coeffs only up to degree, ignoring the higher zero coeffs).
During streaming-in, it reads back the degree (and streamed coeffs upto degree) 
and issues an error if degree > accel's NUM_POLY_INT_COEFFS, since we can't
(as-is) represent a degree-N poly_int on accel with NUM_POLY_INT_COEFFS < N. If 
degree < accel's NUM_POLY_INT_COEFFS, the remaining coeffs are set to 0
(similar to zero-extension). I posted more details in RFC: 
https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html

The attached patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h to 
represent number of bits needed for max value of NUM_POLY_INT_COEFFS defined by 
any target,
and uses that for packing/unpacking degree of poly_int to/from bitstream, which 
should make it independent of the type used for representing 
NUM_POLY_INT_COEFFS by
the target.

Bootstrap+test and LTO bootstrap+test in progress on aarch64-linux-gnu.
Does the patch look OK ?

Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com>

Thanks,
Prathamesh
> 
> Thanks,
> Richard
Partially support streaming of poly_int for offloading.

Support streaming of poly_int for offloading when it's degree doesn't exceed
accel's NUM_POLY_INT_COEFFS.

The patch changes streaming of poly_int as follows:

Streaming out poly_int:

degree = poly_int.degree();
stream out degree;
for (i = 0; i < degree; i++)
  stream out poly_int.coeffs[i];

Streaming in poly_int (for accelerator):

stream in degree;
if (degree > NUM_POLY_INT_COEFFS)
  fatal_error();
stream in coeffs;
// Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS
for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
  poly_int.coeffs[i] = 0;

The patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h to represent
number of bits needed to represent value of max NUM_POLY_INT_COEFFS for any
target, which should make packing/unpacking degree of poly_int to/from
bitstream independent of the type used to represent NUM_POLY_INT_COEFF by the
target.

gcc/ChangeLog:

        * data-streamer-in.cc (streamer_read_poly_uint64): Stream in poly_int
        degree and call poly_int_read_common. 
        (streamer_read_poly_int64): Likewise.
        * data-streamer-out.cc (streamer_write_poly_uint64): Stream out poly_int
        degree.
        (streamer_write_poly_int64): Likewise.
        * data-streamer.h (bp_pack_poly_value): Likewise.
        (poly_int_read_common): New function template.
        (bp_unpack_poly_value): Stream in poly_int degree and call
        poly_int_read_common.
        * poly-int.h (poly_int::degree): New method.
        (MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
        * tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Stream in
        POLY_INT_CST degree, issue a fatal_error if degree is greater than
        NUM_POLY_INT_COEFFS, and zero out remaining coeffs. 
        * tree-streamer-out.cc (write_ts_poly_tree_pointers): Calculate and
        stream out POLY_INT_CST degree.

Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com>

diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
index 7dce2928ef0..91cece39b05 100644
--- a/gcc/data-streamer-in.cc
+++ b/gcc/data-streamer-in.cc
@@ -182,10 +182,11 @@ streamer_read_hwi (class lto_input_block *ib)
 poly_uint64
 streamer_read_poly_uint64 (class lto_input_block *ib)
 {
+  unsigned degree = streamer_read_uhwi (ib);
   poly_uint64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  for (unsigned int i = 0; i < degree; ++i)
     res.coeffs[i] = streamer_read_uhwi (ib);
-  return res;
+  return poly_int_read_common (res, degree);
 }
 
 /* Read a poly_int64 from IB.  */
@@ -193,10 +194,11 @@ streamer_read_poly_uint64 (class lto_input_block *ib)
 poly_int64
 streamer_read_poly_int64 (class lto_input_block *ib)
 {
+  unsigned degree = streamer_read_uhwi (ib);
   poly_int64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  for (unsigned int i = 0; i < degree; ++i)
     res.coeffs[i] = streamer_read_hwi (ib);
-  return res;
+  return poly_int_read_common (res, degree);
 }
 
 /* Read gcov_type value from IB.  */
diff --git a/gcc/data-streamer-out.cc b/gcc/data-streamer-out.cc
index c237e30f704..b0fb9dedb24 100644
--- a/gcc/data-streamer-out.cc
+++ b/gcc/data-streamer-out.cc
@@ -227,7 +227,9 @@ streamer_write_hwi (struct output_block *ob, HOST_WIDE_INT 
work)
 void
 streamer_write_poly_uint64 (struct output_block *ob, poly_uint64 work)
 {
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = work.degree ();
+  streamer_write_uhwi_stream (ob->main_stream, degree);
+  for (unsigned i = 0; i < degree; ++i)
     streamer_write_uhwi_stream (ob->main_stream, work.coeffs[i]);
 }
 
@@ -236,7 +238,9 @@ streamer_write_poly_uint64 (struct output_block *ob, 
poly_uint64 work)
 void
 streamer_write_poly_int64 (struct output_block *ob, poly_int64 work)
 {
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = work.degree ();
+  streamer_write_uhwi_stream (ob->main_stream, degree);
+  for (unsigned i = 0; i < degree; ++i)
     streamer_write_hwi_stream (ob->main_stream, work.coeffs[i]);
 }
 
diff --git a/gcc/data-streamer.h b/gcc/data-streamer.h
index 6a2596134ce..ad676cc9287 100644
--- a/gcc/data-streamer.h
+++ b/gcc/data-streamer.h
@@ -142,7 +142,9 @@ bp_pack_poly_value (struct bitpack_d *bp,
                    const poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> &val,
                    unsigned nbits)
 {
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = val.degree ();
+  bp_pack_value (bp, degree, MAX_NUM_POLY_INT_COEFFS_BITS);
+  for (unsigned i = 0; i < degree; ++i)
     bp_pack_value (bp, val.coeffs[i], nbits);
 }
 
@@ -194,15 +196,33 @@ bp_unpack_value (struct bitpack_d *bp, unsigned nbits)
   return val & mask;
 }
 
+template<unsigned N, typename C>
+inline poly_int<N, C>
+poly_int_read_common (poly_int<N, C> x, unsigned degree)
+{
+  if (degree > NUM_POLY_INT_COEFFS)
+    fatal_error (input_location,
+                "%<poly_int%> degree (%u) exceeds value of "
+                "%<NUM_POLY_INT_COEFFS%> (%u)", degree,
+                NUM_POLY_INT_COEFFS);
+  for (unsigned i = degree; i < NUM_POLY_INT_COEFFS; i++)
+    x.coeffs[i] = 0;
+  return x;
+}
+
 /* Unpacks a polynomial value from the bit-packing context BP in which each
    coefficient has NBITS bits.  */
 inline poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t>
 bp_unpack_poly_value (struct bitpack_d *bp, unsigned nbits)
 {
+  unsigned degree
+    = bp_unpack_value (bp, MAX_NUM_POLY_INT_COEFFS_BITS);
+
   poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> x;
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  for (unsigned i = 0; i < degree; i++)
     x.coeffs[i] = bp_unpack_value (bp, nbits);
-  return x;
+
+  return poly_int_read_common (x, degree);
 }
 
 
diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index e3f8d4df716..5f2272313ed 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -354,6 +354,10 @@ struct poly_result<T1, T2, 2>
    ? (void) ((RES).coeffs[I] = VALUE) \
    : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
 
+/* Number of bits needed to represent maximum value of
+   NUM_POLY_INT_COEFFS defined by any target.  */
+#define MAX_NUM_POLY_INT_COEFFS_BITS (2)
+
 /* poly_int_full and poly_int_hungry are used internally within poly_int
    for delegated initializers.  poly_int_full indicates that a parameter
    pack has enough elements to initialize every coefficient.  poly_int_hungry
@@ -422,6 +426,8 @@ public:
   poly_int<N, HOST_WIDE_INT> force_shwi () const;
   poly_int<N, unsigned HOST_WIDE_INT> force_uhwi () const;
 
+  unsigned degree (void) const;
+
 #if POLY_INT_CONVERSION
   operator C () const;
 #endif
@@ -678,6 +684,18 @@ poly_int<N, C>::force_uhwi () const
   return r;
 }
 
+/* Find leading non-zero coeff. In case all coeffs are zero,
+   treat it as degree-1 poly_int.  */
+
+template<unsigned N, typename C>
+inline unsigned poly_int<N, C>::degree () const
+{
+  unsigned i;
+  for (i = NUM_POLY_INT_COEFFS - 1; i > 0 && !this->coeffs[i]; i--)
+    ;
+  return i + 1;
+}
+
 #if POLY_INT_CONVERSION
 /* Provide a conversion operator to constants.  */
 
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index c248a74f7a1..2394ac209f5 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -671,8 +671,20 @@ static void
 lto_input_ts_poly_tree_pointers (class lto_input_block *ib,
                                 class data_in *data_in, tree expr)
 {
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = streamer_read_uhwi (ib);
+  if (degree > NUM_POLY_INT_COEFFS)
+    fatal_error (input_location,
+                "%<poly_int%> degree (%u) exceeds value of "
+                "%<NUM_POLY_INT_COEFFS%> (%u)", degree,
+                NUM_POLY_INT_COEFFS);
+
+  unsigned i;
+  for (i = 0; i < degree; ++i)
     POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in);
+
+  tree coeff_type = TREE_TYPE (POLY_INT_CST_COEFF (expr, 0));
+  for (; i < NUM_POLY_INT_COEFFS; ++i)
+    POLY_INT_CST_COEFF (expr, i) = build_zero_cst (coeff_type);
 }
 
 
diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
index b7205287ffb..e28616b9a7a 100644
--- a/gcc/tree-streamer-out.cc
+++ b/gcc/tree-streamer-out.cc
@@ -576,7 +576,14 @@ write_ts_vector_tree_pointers (struct output_block *ob, 
tree expr)
 static void
 write_ts_poly_tree_pointers (struct output_block *ob, tree expr)
 {
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned i;
+  for (i = NUM_POLY_INT_COEFFS - 1;
+       i > 0 && integer_zerop (POLY_INT_CST_COEFF (expr, i));
+       i--)
+    ;
+  unsigned degree = i + 1;
+  streamer_write_uhwi_stream (ob->main_stream, degree);
+  for (i = 0; i < degree; ++i)
     stream_write_tree_ref (ob, POLY_INT_CST_COEFF (expr, i));
 }
 

Reply via email to