> -----Original Message----- > From: Richard Biener <rguent...@suse.de> > Sent: Tuesday, August 13, 2024 12:52 PM > To: Andrew Pinski <pins...@gmail.com> > Cc: Prathamesh Kulkarni <prathame...@nvidia.com>; gcc- > patc...@gcc.gnu.org; Thomas Schwinge <tschwi...@baylibre.com> > Subject: Re: [optc-save-gen.awk] Fix streaming of command line options > for offloading > > External email: Use caution opening links or attachments > > > > Am 13.08.2024 um 08:37 schrieb Andrew Pinski <pins...@gmail.com>: > > > > On Mon, Aug 12, 2024 at 10:36 PM Prathamesh Kulkarni > > <prathame...@nvidia.com> wrote: > >> > >> Hi, > >> As mentioned in: > >> https://gcc.gnu.org/pipermail/gcc/2024-August/244581.html > >> > >> AArch64 cl_optimization_stream_out streams out target-specific > >> optimization options like flag_aarch64_early_ldp_fusion, > aarch64_early_ra etc, which breaks AArch64/nvptx offloading, since > nvptx cl_optimization_stream_in doesn't have corresponding stream-in > for these options and ends up setting invalid values for ptr- > >explicit_mask (and subsequent data structures). > >> > >> This makes even a trivial test like the following to cause ICE in > lto_read_decls with -O3 -fopenmp -foffload=nvptx-none: > >> > >> int main() > >> { > >> int x; > >> #pragma omp target map(x) > >> x; > >> } > >> > >> The attached patch modifies optc-save-gen.awk to generate if > >> (!lto_stream_offload_p) check before streaming out target-specific > opt in cl_optimization_stream_out, which fixes the issue. > cl_optimization_stream_out after patch (last few entries): > >> > >> bp_pack_var_len_int (bp, ptr->x_flag_wrapv_pointer); > >> bp_pack_var_len_int (bp, ptr->x_debug_nonbind_markers_p); if > >> (!lto_stream_offload_p) bp_pack_var_len_int (bp, > >> ptr->x_flag_aarch64_early_ldp_fusion); > >> if (!lto_stream_offload_p) > >> bp_pack_var_len_int (bp, ptr->x_aarch64_early_ra); if > >> (!lto_stream_offload_p) bp_pack_var_len_int (bp, > >> ptr->x_flag_aarch64_late_ldp_fusion); > >> if (!lto_stream_offload_p) > >> bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_div); if > >> (!lto_stream_offload_p) bp_pack_var_len_int (bp, > >> ptr->x_flag_mrecip_low_precision_sqrt); > >> if (!lto_stream_offload_p) > >> bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_sqrt); for > >> (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++) > >> bp_pack_value (bp, ptr->explicit_mask[i], 64); > >> > >> For target-specific options, streaming out is gated on > !lto_stream_offload_p check. > >> > >> The patch also fixes failures due to same issue with x86_64->nvptx > offloading for target-print-1.f90 (and couple more). > >> Does the patch look OK ? > > > > I think it seems to be on the right track. One thing that is also > > going to be an issue is streaming in, there could be a target option > > on the offload side that is marked as Optimization that would might > > also cause issues. We should check to make sure that also gets fixed > > here too. Or error out for offloading targets can't have target > > options with Optimization on them during the build. Thanks for the suggestions. The attached patch modifies optc-save-gen.awk to emit an error if accel backend marks target specific option with Optimization. AFAIU, currently neither nvptx nor gcn have target-specific options marked with Optimization, so this is mostly a safeguard against future additions.
cl_optimization_stream_in after patch for target-specifc optimization options: #ifdef ACCEL_COMPILER #error accel compiler cannot define Optimization attribute for target-specific option x_flag_aarch64_early_ldp_fusion #else ptr->x_flag_aarch64_early_ldp_fusion = (signed char ) bp_unpack_var_len_int (bp); #endif To test if this works, I added -mfoo to nvptx.opt and marked it with both Target and Optimization, which resulted in the following build error for nvptx: options-save.cc:13548:2: error: #error accel compiler cannot define Optimization attribute for target-specifc option x_flag_nvptx_foo 13548 | #error accel compiler cannot define Optimization attribute for target-specific option x_flag_nvptx_foo | ^~~~~ > > It may have been misguided to mark target specific flags as > Optimization. It might be required to merge those (from all targets) > into the common optimize enum, like we do for tree codes. Language > specific options marked as Optimization possibly have the same issue > when mixing with other languages and LTO. Can you assess the > situation a bit more? AFAIK, only c-family/c.opt marks few options with Optimization flag. I tried marking fortran's -ffrontend-optimize with Optimization and verified that Optimization options are combined for c-family languages and fortran in cl_optimization_stream_{out,in}. cl_optimization_stream_out shows: ... bp_pack_var_len_int (bp, ptr->x_flag_frontend_optimize); ... bp_pack_var_len_int (bp, ptr->x_flag_nothrow_opt); and likewise has corresponding entries for cl_optimization_stream_in. So I guess this shouldn't be an issue with lang specific Optimization opts ? Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com> Thanks, Prathamesh > > I think the proposed fix looks reasonable but the problem might be > more widespread and warrant a more global solution or at least > revisiting documentation? > > Thanks, > Richard > > > Thanks, > > Andrew Pinski > > > >> > >> Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com> > >> > >> Thanks, > >> Prathamesh
[optc-save-gen.awk] Fix streaming of command line options for offloading. The patch modifies optc-save-gen.awk to generate if (!lto_stream_offload_p) check before streaming out target-specific opt in cl_optimization_stream_out, when offloading is enabled. gcc/ChangeLog: * gcc/optc-save-gen.awk: New array var_target_opt. Use it to generate if (!lto_stream_offload_p) check in cl_optimization_stream_out, and generate a diagnostic with #error if accelerator backend uses Optimization for target-specifc options in cl_optimization_stream_in. Signed-off-by: Prathamesh Kulkarni <prathame...@nvidia.com> diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk index a3af88e3776..b1289c281e7 100644 --- a/gcc/optc-save-gen.awk +++ b/gcc/optc-save-gen.awk @@ -1307,6 +1307,11 @@ for (i = 0; i < n_opts; i++) { var_opt_optimize_init[n_opt_val] = init; } + # Mark options that are annotated with both Optimization and + # Target so we can avoid streaming out target-specific opts when + # offloading is enabled. + if (flag_set_p("Target", flags[i])) + var_target_opt[n_opt_val] = 1; n_opt_val++; } } @@ -1384,6 +1389,10 @@ for (i = 0; i < n_opt_val; i++) { } else { sgn = "int"; } + # Do not stream out target-specific opts if offloading is + # enabled. + if (var_target_opt[i]) + print " if (!lto_stream_offload_p)" # If applicable, encode the streamed value. if (var_opt_optimize_init[i]) { print " if (" var_opt_optimize_init[i] " > (" var_opt_val_type[i] ") 10)"; @@ -1408,6 +1417,11 @@ print " struct cl_optimization *ptr ATTRIBUTE_UNUSED)" print "{"; for (i = 0; i < n_opt_val; i++) { name = var_opt_val[i] + if (var_target_opt[i]) { + print "#ifdef ACCEL_COMPILER" + print "#error accel compiler cannot define Optimization attribute for target-specific option " name; + print "#else" + } otype = var_opt_val_type[i]; if (otype ~ "^const char \\**$") { print " ptr->" name" = bp_unpack_string (data_in, bp);"; @@ -1427,6 +1441,8 @@ for (i = 0; i < n_opt_val; i++) { print " ptr->" name" ^= " var_opt_optimize_init[i] ";"; } } + if (var_target_opt[i]) + print "#endif" } print " for (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)"; print " ptr->explicit_mask[i] = bp_unpack_value (bp, 64);";