> -----Original Message----- > From: Richard Sandiford <richard.sandif...@arm.com> > Sent: Monday, July 29, 2024 9:43 PM > To: Richard Biener <rguent...@suse.de> > Cc: Prathamesh Kulkarni <prathame...@nvidia.com>; gcc- > patc...@gcc.gnu.org > Subject: Re: Support streaming of poly_int for offloading when it's > degree <= accel's NUM_POLY_INT_COEFFS > > External email: Use caution opening links or attachments > > > Richard Biener <rguent...@suse.de> writes: > > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote: > > > >> Hi Richard, > >> Thanks for your suggestions on RFC email, the attached patch adds > support for streaming of poly_int when it's degree <= accel's > NUM_POLY_INT_COEFFS. > >> The patch changes streaming of poly_int as follows: > >> > >> Streaming out poly_int: > >> > >> degree = poly_int.degree(); > >> stream out degree; > >> for (i = 0; i < degree; i++) > >> stream out poly_int.coeffs[i]; > >> > >> Streaming in poly_int: > >> > >> stream in degree; > >> if (degree > NUM_POLY_INT_COEFFS) > >> fatal_error(); > >> stream in coeffs; > >> // Set remaining coeffs to zero in case degree < accel's > >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++) > >> poly_int.coeffs[i] = 0; > >> > >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64- > linux-gnu. > >> LTO bootstrap+test on x86_64-linux-gnu in progress. > >> > >> I am not quite sure how to test it for offloading since currently > it's (entirely) broken for aarch64->nvptx. > >> I can give a try with x86_64->nvptx offloading if required (altho I > >> guess LTO bootstrap should test streaming changes ?) > > > > + unsigned degree > > + = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned > > HOST_WIDE_INT)); > > > > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained > > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary. > > I'd say we should constrain it to a reasonable upper bound, like 2? > > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS > in > > poly-int.h and constrain NUM_POLY_INT_COEFFS. > > > > The patch looks reasonable over all, but Richard S. should have a > say > > about the abstraction you chose and the poly-int adjustment. > > Sorry if this has been discussed already, but could we instead stream > NUM_POLY_INT_COEFFS once per file, rather than once per poly_int? > It's a target invariant, and poly_int has wormed its way into lots of > things by now :) Hi Richard, The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of poly_int (and streams-out coeffs only up to degree, ignoring the higher zero coeffs). During streaming-in, it reads back the degree (and streamed coeffs upto degree) and issues an error if degree > accel's NUM_POLY_INT_COEFFS, since we can't (as-is) represent a degree-N poly_int on accel with NUM_POLY_INT_COEFFS < N. If degree < accel's NUM_POLY_INT_COEFFS, the remaining coeffs are set to 0 (similar to zero-extension). I posted more details in RFC: https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html
The attached patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h to represent number of bits needed for max value of NUM_POLY_INT_COEFFS defined by any target, and uses that for packing/unpacking degree of poly_int to/from bitstream, which should make it independent of the type used for representing NUM_POLY_INT_COEFFS by the target. Bootstrap+test and LTO bootstrap+test in progress on aarch64-linux-gnu. Does the patch look OK ? Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com> Thanks, Prathamesh > > Thanks, > Richard
Partially support streaming of poly_int for offloading. Support streaming of poly_int for offloading when it's degree doesn't exceed accel's NUM_POLY_INT_COEFFS. The patch changes streaming of poly_int as follows: Streaming out poly_int: degree = poly_int.degree(); stream out degree; for (i = 0; i < degree; i++) stream out poly_int.coeffs[i]; Streaming in poly_int (for accelerator): stream in degree; if (degree > NUM_POLY_INT_COEFFS) fatal_error(); stream in coeffs; // Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++) poly_int.coeffs[i] = 0; The patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h to represent number of bits needed to represent value of max NUM_POLY_INT_COEFFS for any target, which should make packing/unpacking degree of poly_int to/from bitstream independent of the type used to represent NUM_POLY_INT_COEFF by the target. gcc/ChangeLog: * data-streamer-in.cc (streamer_read_poly_uint64): Stream in poly_int degree and call poly_int_read_common. (streamer_read_poly_int64): Likewise. * data-streamer-out.cc (streamer_write_poly_uint64): Stream out poly_int degree. (streamer_write_poly_int64): Likewise. * data-streamer.h (bp_pack_poly_value): Likewise. (poly_int_read_common): New function template. (bp_unpack_poly_value): Stream in poly_int degree and call poly_int_read_common. * poly-int.h (poly_int::degree): New method. (MAX_NUM_POLY_INT_COEFFS_BITS): New macro. * tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Stream in POLY_INT_CST degree, issue a fatal_error if degree is greater than NUM_POLY_INT_COEFFS, and zero out remaining coeffs. * tree-streamer-out.cc (write_ts_poly_tree_pointers): Calculate and stream out POLY_INT_CST degree. Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com> diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc index 7dce2928ef0..91cece39b05 100644 --- a/gcc/data-streamer-in.cc +++ b/gcc/data-streamer-in.cc @@ -182,10 +182,11 @@ streamer_read_hwi (class lto_input_block *ib) poly_uint64 streamer_read_poly_uint64 (class lto_input_block *ib) { + unsigned degree = streamer_read_uhwi (ib); poly_uint64 res; - for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i) + for (unsigned int i = 0; i < degree; ++i) res.coeffs[i] = streamer_read_uhwi (ib); - return res; + return poly_int_read_common (res, degree); } /* Read a poly_int64 from IB. */ @@ -193,10 +194,11 @@ streamer_read_poly_uint64 (class lto_input_block *ib) poly_int64 streamer_read_poly_int64 (class lto_input_block *ib) { + unsigned degree = streamer_read_uhwi (ib); poly_int64 res; - for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i) + for (unsigned int i = 0; i < degree; ++i) res.coeffs[i] = streamer_read_hwi (ib); - return res; + return poly_int_read_common (res, degree); } /* Read gcov_type value from IB. */ diff --git a/gcc/data-streamer-out.cc b/gcc/data-streamer-out.cc index c237e30f704..b0fb9dedb24 100644 --- a/gcc/data-streamer-out.cc +++ b/gcc/data-streamer-out.cc @@ -227,7 +227,9 @@ streamer_write_hwi (struct output_block *ob, HOST_WIDE_INT work) void streamer_write_poly_uint64 (struct output_block *ob, poly_uint64 work) { - for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i) + unsigned degree = work.degree (); + streamer_write_uhwi_stream (ob->main_stream, degree); + for (unsigned i = 0; i < degree; ++i) streamer_write_uhwi_stream (ob->main_stream, work.coeffs[i]); } @@ -236,7 +238,9 @@ streamer_write_poly_uint64 (struct output_block *ob, poly_uint64 work) void streamer_write_poly_int64 (struct output_block *ob, poly_int64 work) { - for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i) + unsigned degree = work.degree (); + streamer_write_uhwi_stream (ob->main_stream, degree); + for (unsigned i = 0; i < degree; ++i) streamer_write_hwi_stream (ob->main_stream, work.coeffs[i]); } diff --git a/gcc/data-streamer.h b/gcc/data-streamer.h index 6a2596134ce..ad676cc9287 100644 --- a/gcc/data-streamer.h +++ b/gcc/data-streamer.h @@ -142,7 +142,9 @@ bp_pack_poly_value (struct bitpack_d *bp, const poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> &val, unsigned nbits) { - for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i) + unsigned degree = val.degree (); + bp_pack_value (bp, degree, MAX_NUM_POLY_INT_COEFFS_BITS); + for (unsigned i = 0; i < degree; ++i) bp_pack_value (bp, val.coeffs[i], nbits); } @@ -194,15 +196,33 @@ bp_unpack_value (struct bitpack_d *bp, unsigned nbits) return val & mask; } +template<unsigned N, typename C> +inline poly_int<N, C> +poly_int_read_common (poly_int<N, C> x, unsigned degree) +{ + if (degree > NUM_POLY_INT_COEFFS) + fatal_error (input_location, + "%<poly_int%> degree (%u) exceeds value of " + "%<NUM_POLY_INT_COEFFS%> (%u)", degree, + NUM_POLY_INT_COEFFS); + for (unsigned i = degree; i < NUM_POLY_INT_COEFFS; i++) + x.coeffs[i] = 0; + return x; +} + /* Unpacks a polynomial value from the bit-packing context BP in which each coefficient has NBITS bits. */ inline poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> bp_unpack_poly_value (struct bitpack_d *bp, unsigned nbits) { + unsigned degree + = bp_unpack_value (bp, MAX_NUM_POLY_INT_COEFFS_BITS); + poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> x; - for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i) + for (unsigned i = 0; i < degree; i++) x.coeffs[i] = bp_unpack_value (bp, nbits); - return x; + + return poly_int_read_common (x, degree); } diff --git a/gcc/poly-int.h b/gcc/poly-int.h index e3f8d4df716..5f2272313ed 100644 --- a/gcc/poly-int.h +++ b/gcc/poly-int.h @@ -354,6 +354,10 @@ struct poly_result<T1, T2, 2> ? (void) ((RES).coeffs[I] = VALUE) \ : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE))) +/* Number of bits needed to represent maximum value of + NUM_POLY_INT_COEFFS defined by any target. */ +#define MAX_NUM_POLY_INT_COEFFS_BITS (2) + /* poly_int_full and poly_int_hungry are used internally within poly_int for delegated initializers. poly_int_full indicates that a parameter pack has enough elements to initialize every coefficient. poly_int_hungry @@ -422,6 +426,8 @@ public: poly_int<N, HOST_WIDE_INT> force_shwi () const; poly_int<N, unsigned HOST_WIDE_INT> force_uhwi () const; + unsigned degree (void) const; + #if POLY_INT_CONVERSION operator C () const; #endif @@ -678,6 +684,18 @@ poly_int<N, C>::force_uhwi () const return r; } +/* Find leading non-zero coeff. In case all coeffs are zero, + treat it as degree-1 poly_int. */ + +template<unsigned N, typename C> +inline unsigned poly_int<N, C>::degree () const +{ + unsigned i; + for (i = NUM_POLY_INT_COEFFS - 1; i > 0 && !this->coeffs[i]; i--) + ; + return i + 1; +} + #if POLY_INT_CONVERSION /* Provide a conversion operator to constants. */ diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc index c248a74f7a1..2394ac209f5 100644 --- a/gcc/tree-streamer-in.cc +++ b/gcc/tree-streamer-in.cc @@ -671,8 +671,20 @@ static void lto_input_ts_poly_tree_pointers (class lto_input_block *ib, class data_in *data_in, tree expr) { - for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i) + unsigned degree = streamer_read_uhwi (ib); + if (degree > NUM_POLY_INT_COEFFS) + fatal_error (input_location, + "%<poly_int%> degree (%u) exceeds value of " + "%<NUM_POLY_INT_COEFFS%> (%u)", degree, + NUM_POLY_INT_COEFFS); + + unsigned i; + for (i = 0; i < degree; ++i) POLY_INT_CST_COEFF (expr, i) = stream_read_tree_ref (ib, data_in); + + tree coeff_type = TREE_TYPE (POLY_INT_CST_COEFF (expr, 0)); + for (; i < NUM_POLY_INT_COEFFS; ++i) + POLY_INT_CST_COEFF (expr, i) = build_zero_cst (coeff_type); } diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc index b7205287ffb..e28616b9a7a 100644 --- a/gcc/tree-streamer-out.cc +++ b/gcc/tree-streamer-out.cc @@ -576,7 +576,14 @@ write_ts_vector_tree_pointers (struct output_block *ob, tree expr) static void write_ts_poly_tree_pointers (struct output_block *ob, tree expr) { - for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i) + unsigned i; + for (i = NUM_POLY_INT_COEFFS - 1; + i > 0 && integer_zerop (POLY_INT_CST_COEFF (expr, i)); + i--) + ; + unsigned degree = i + 1; + streamer_write_uhwi_stream (ob->main_stream, degree); + for (i = 0; i < degree; ++i) stream_write_tree_ref (ob, POLY_INT_CST_COEFF (expr, i)); }