On Tue, 30 Jul 2024, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Tue, 30 Jul 2024, Richard Sandiford wrote:
> >
> >> Richard Biener <rguent...@suse.de> writes:
> >> > On Tue, 30 Jul 2024, Prathamesh Kulkarni wrote:
> >> >
> >> >> 
> >> >> 
> >> >> > -----Original Message-----
> >> >> > From: Richard Sandiford <richard.sandif...@arm.com>
> >> >> > Sent: Monday, July 29, 2024 9:43 PM
> >> >> > To: Richard Biener <rguent...@suse.de>
> >> >> > Cc: Prathamesh Kulkarni <prathame...@nvidia.com>; gcc-
> >> >> > patc...@gcc.gnu.org
> >> >> > Subject: Re: Support streaming of poly_int for offloading when it's
> >> >> > degree <= accel's NUM_POLY_INT_COEFFS
> >> >> > 
> >> >> > External email: Use caution opening links or attachments
> >> >> > 
> >> >> > 
> >> >> > Richard Biener <rguent...@suse.de> writes:
> >> >> > > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:
> >> >> > >
> >> >> > >> Hi Richard,
> >> >> > >> Thanks for your suggestions on RFC email, the attached patch adds
> >> >> > support for streaming of poly_int when it's degree <= accel's
> >> >> > NUM_POLY_INT_COEFFS.
> >> >> > >> The patch changes streaming of poly_int as follows:
> >> >> > >>
> >> >> > >> Streaming out poly_int:
> >> >> > >>
> >> >> > >> degree = poly_int.degree();
> >> >> > >> stream out degree;
> >> >> > >> for (i = 0; i < degree; i++)
> >> >> > >>   stream out poly_int.coeffs[i];
> >> >> > >>
> >> >> > >> Streaming in poly_int:
> >> >> > >>
> >> >> > >> stream in degree;
> >> >> > >> if (degree > NUM_POLY_INT_COEFFS)
> >> >> > >>   fatal_error();
> >> >> > >> stream in coeffs;
> >> >> > >> // Set remaining coeffs to zero in case degree < accel's
> >> >> > >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
> >> >> > >>   poly_int.coeffs[i] = 0;
> >> >> > >>
> >> >> > >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-
> >> >> > linux-gnu.
> >> >> > >> LTO bootstrap+test on x86_64-linux-gnu in progress.
> >> >> > >>
> >> >> > >> I am not quite sure how to test it for offloading since currently
> >> >> > it's (entirely) broken for aarch64->nvptx.
> >> >> > >> I can give a try with x86_64->nvptx offloading if required (altho I
> >> >> > >> guess LTO bootstrap should test streaming changes ?)
> >> >> > >
> >> >> > > +  unsigned degree
> >> >> > > +    = bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned
> >> >> > > HOST_WIDE_INT));
> >> >> > >
> >> >> > > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
> >> >> > > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
> >> >> > > I'd say we should constrain it to a reasonable upper bound, like 2?
> >> >> > > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS
> >> >> > in
> >> >> > > poly-int.h and constrain NUM_POLY_INT_COEFFS.
> >> >> > >
> >> >> > > The patch looks reasonable over all, but Richard S. should have a
> >> >> > say
> >> >> > > about the abstraction you chose and the poly-int adjustment.
> >> >> > 
> >> >> > Sorry if this has been discussed already, but could we instead stream
> >> >> > NUM_POLY_INT_COEFFS once per file, rather than once per poly_int?
> >> >> > It's a target invariant, and poly_int has wormed its way into lots of
> >> >> > things by now :)
> >> >> Hi Richard,
> >> >> The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of 
> >> >> poly_int (and streams-out coeffs only up to degree, ignoring the higher 
> >> >> zero coeffs).
> >> >> During streaming-in, it reads back the degree (and streamed coeffs upto 
> >> >> degree) and issues an error if degree > accel's NUM_POLY_INT_COEFFS, 
> >> >> since we can't
> >> >> (as-is) represent a degree-N poly_int on accel with NUM_POLY_INT_COEFFS 
> >> >> < N. If degree < accel's NUM_POLY_INT_COEFFS, the remaining coeffs are 
> >> >> set to 0
> >> >> (similar to zero-extension). I posted more details in RFC: 
> >> >> https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html
> >> 
> >> It's not clear to me what the plan is for VLA host + VLS offloading.
> >> Is the streamed data guaranteed to be "clean" of any host-only
> >> VLA stuff?  E.g. if code does:
> >> 
> >>   #include <arm_sve.h>
> >> 
> >>   svint32_t *ptr:
> >>   void foo(svint32_t);
> >> 
> >>   #pragma GCC target "+nosve"
> >> 
> >>   ...offloading...
> >> 
> >> is there a guarantee that the offload target won't see the definition
> >> of ptr and foo?
> >
> > No.  If it sees any unsupported poly-* the offload compilation will fail.
> 
> Could it fail even if the offloading code doesn't refer to ptr and foo
> directly?  Or is only "relevant" stuff streamed?

Only "relevant" stuff should be streamed - the offload code and all
trees refered to.

> > I think all current issues are because of poly-* leaking in for cases
> > where a non-poly would have worked fine, but I have not had a look
> > myself.
> 
> One of the cases that Prathamesh mentions is streaming the mode sizes.
> Are those modes "offload target modes" or "host modes"?  It seems like
> it shouldn't be an error for the host to have VLA modes per se.  It's
> just that those modes can't be used in the host/offload interface.

There's a requirement that a mode mapping exists from the host to
target enum machine_mode.  I don't remember exactly how we compute
that mapping and whether streaming of some data (and thus poly-int)
are part of this.

Richard.

> Thanks,
> Richard
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to