On Tue, Jul 30, 2024 at 11:25:42AM +0200, Richard Biener wrote:
> Only "relevant" stuff should be streamed - the offload code and all
> trees referred to.

Yeah.

> > > I think all current issues are because of poly-* leaking in for cases
> > > where a non-poly would have worked fine, but I have not had a look
> > > myself.
> > 
> > One of the cases that Prathamesh mentions is streaming the mode sizes.
> > Are those modes "offload target modes" or "host modes"?  It seems like
> > it shouldn't be an error for the host to have VLA modes per se.  It's
> > just that those modes can't be used in the host/offload interface.
> 
> There's a requirement that a mode mapping exists from the host to
> target enum machine_mode.  I don't remember exactly how we compute
> that mapping and whether streaming of some data (and thus poly-int)
> is part of this.

During streaming out, the code records what machine modes are being streamed
(in streamer_mode_table).
For those modes (and their inner modes) then lto_write_mode_table
should stream a table with mode details like class, bits, size, inner mode,
nunits, real mode format if any, etc.
That table is then streamed in by the offloading compiler, which attempts to
find corresponding modes (and emits fatal_error if there is no such mode;
consider say x86_64 long double with XFmode being used in offloading code
which doesn't have XFmode support).
Now, because Richard S. changed GET_MODE_SIZE etc. to give poly_int rather
than int, this has been changed to use bp_pack_poly_value; but that relies
on the same number of coefficients for poly_int, which is not the case when
e.g. offloading aarch64 to gcn or nvptx.
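
To make the failure mode concrete, here is a minimal standalone sketch; it is
not the GCC code.  toy_stream, toy_poly, toy_pack_poly and toy_unpack_poly
are hypothetical stand-ins for the bitpack stream, poly_int and
bp_pack_poly_value/bp_unpack_poly_value, and the choice of fields (size,
then nunits) is only illustrative.  The writer is built with two coefficients
and the reader with one, so the reader's second read lands on the writer's
second coefficient of the first value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a bitpack stream: a flat sequence of words.
struct toy_stream {
  std::vector<std::uint64_t> words;
  std::size_t pos = 0;

  void put(std::uint64_t v) { words.push_back(v); }
  std::uint64_t get() {
    assert(pos < words.size());  // reading past the end is a bug
    return words[pos++];
  }
};

// Toy stand-in for poly_int: N coefficients, N fixed at build time.
template <unsigned N>
struct toy_poly {
  std::uint64_t coeffs[N];
};

// Writes exactly N coefficients, with no record of N in the stream.
template <unsigned N>
void toy_pack_poly(toy_stream &s, const toy_poly<N> &p) {
  for (unsigned i = 0; i < N; ++i)
    s.put(p.coeffs[i]);
}

// Reads exactly N coefficients, trusting that the writer used the same N.
template <unsigned N>
toy_poly<N> toy_unpack_poly(toy_stream &s) {
  toy_poly<N> p{};
  for (unsigned i = 0; i < N; ++i)
    p.coeffs[i] = s.get();
  return p;
}

// Host side uses 2 coefficients, offload side uses 1.
std::uint64_t misread_second_field() {
  toy_stream s;
  toy_poly<2> size = {{16, 0}};   // constant 16, no variable part
  toy_poly<2> nunits = {{4, 0}};  // constant 4, no variable part
  toy_pack_poly(s, size);
  toy_pack_poly(s, nunits);

  toy_poly<1> got_size = toy_unpack_poly<1>(s);    // reads 16, as intended
  toy_poly<1> got_nunits = toy_unpack_poly<1>(s);  // reads size's 2nd coeff
  assert(got_size.coeffs[0] == 16);
  return got_nunits.coeffs[0];
}
```

Here misread_second_field() returns 0 rather than 4, and everything read
after that point is shifted as well.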

From what I can see, this mode table handling is the only use of
bp_pack_poly_value.  So the options are either to stream at the start of the
mode table the NUM_POLY_INT_COEFFS value and in bp_unpack_poly_value pass to
it what we've read and fill in any remaining coeffs with zeros, or in each
bp_pack_poly_value stream the number of coefficients and then stream that
back in and fill in remaining ones (and diagnose if it would try to read
non-zero coefficient which isn't stored).
I think streaming NUM_POLY_INT_COEFFS once would be more compact (at least
for non-aarch64/riscv targets).
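
A rough standalone sketch of the first option, under the same caveat that
none of this is the actual GCC code: word_stream, poly_n, write_poly_table
and read_poly_table are hypothetical names, and throwing an exception merely
stands in for whatever diagnostic the real reader would emit.  The writer
records its coefficient count once at the start of the table; the reader
zero-fills coefficients it has but the writer lacked, and rejects a non-zero
coefficient it has no room for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy stand-in for a bitpack stream: a flat sequence of words.
struct word_stream {
  std::vector<std::uint64_t> words;
  std::size_t pos = 0;

  void put(std::uint64_t v) { words.push_back(v); }
  std::uint64_t get() {
    assert(pos < words.size());  // reading past the end is a bug
    return words[pos++];
  }
};

// Toy stand-in for poly_int: N coefficients, N fixed at build time.
template <unsigned N>
struct poly_n {
  std::uint64_t coeffs[N];
};

// Writer: the coefficient count goes out once, ahead of all values.
template <unsigned N>
void write_poly_table(word_stream &s, const std::vector<poly_n<N>> &vals) {
  s.put(N);
  s.put(vals.size());
  for (const poly_n<N> &p : vals)
    for (unsigned i = 0; i < N; ++i)
      s.put(p.coeffs[i]);
}

// Reader: consumes as many coefficients per value as the writer stored.
template <unsigned N>
std::vector<poly_n<N>> read_poly_table(word_stream &s) {
  unsigned stored = static_cast<unsigned>(s.get());
  std::size_t count = static_cast<std::size_t>(s.get());
  std::vector<poly_n<N>> out;
  for (std::size_t k = 0; k < count; ++k) {
    poly_n<N> p{};  // coefficients the writer did not store stay zero
    for (unsigned i = 0; i < stored; ++i) {
      std::uint64_t c = s.get();
      if (i < N)
        p.coeffs[i] = c;
      else if (c != 0)
        throw std::runtime_error("value needs more coefficients than reader has");
    }
    out.push_back(p);
  }
  return out;
}

// 2-coefficient writer, 1-coefficient reader, constant values only.
std::uint64_t roundtrip_2_to_1() {
  word_stream s;
  std::vector<poly_n<2>> host = {poly_n<2>{{16, 0}}, poly_n<2>{{4, 0}}};
  write_poly_table(s, host);
  return read_poly_table<1>(s)[1].coeffs[0];
}

// 1-coefficient writer, 2-coefficient reader: extra coefficient is zero.
std::uint64_t roundtrip_1_to_2_extra_coeff() {
  word_stream s;
  std::vector<poly_n<1>> host = {poly_n<1>{{16}}};
  write_poly_table(s, host);
  return read_poly_table<2>(s)[0].coeffs[1];
}

// A genuinely variable-length value cannot be narrowed and is diagnosed.
bool rejects_variable_value() {
  word_stream s;
  std::vector<poly_n<2>> host = {poly_n<2>{{16, 16}}};
  write_poly_table(s, host);
  try {
    read_poly_table<1>(s);
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```

The per-table header costs one extra word in this toy, whereas the second
option would add a count in front of every value.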

        Jakub
