[Bug tree-optimization/92029] [10 Regression] 'libgomp.fortran/pr90779.f90' ICE for nvptx offloading

2020-03-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92029

Richard Biener  changed:

               What|Removed     |Added
-------------------+------------+-----------
           Priority|P3          |P2

--- Comment #7 from Richard Biener  ---
More-or-less a latent issue.


2020-01-14 Thread tschwinge at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92029

Thomas Schwinge  changed:

               What|Removed     |Added
-------------------+------------+-----------
             Status|UNCONFIRMED |NEW
   Last reconfirmed|            |2020-01-14
     Ever confirmed|0           |1

--- Comment #6 from Thomas Schwinge  ---
(I can't look into this in more detail right now, but wanted to dump the
following information in case that's useful in any way.)

To restore PASSing of this test case, until recently it was sufficient to
revert r273435 "Support folding from array ctor spanning multiple elements" and
(before reverting that one) r274114 "Fix tree-optimization/91169" (PR91169) as
a prerequisite, but as of recently, r280141 "Optimize reads from multiple elts
in fold_ctor_reference (PR tree-optimization/93210)" has to be reverted too, or
the same ICE shows up again.


2019-10-09 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92029

--- Comment #5 from rguenther at suse dot de  ---
On Wed, 9 Oct 2019, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92029
> 
> --- Comment #4 from Jakub Jelinek  ---
> Just a general comment, what we need to ensure is that the tables contain the
> same variables in the same order between what we emit on the host and in LTO
> for the offloading targets.  It is fine if we are able to remove some variable
> as unused before the LTO streaming, as long as we manage to update whatever
> needs to be to ensure the tables are the same.  After the streaming, we can't
> remove anything mentioned in the tables.

OK, so ideally we'd compute the table contents at the time we stream
then, using IPA-REF info in case that is readily available.


2019-10-09 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92029

--- Comment #4 from Jakub Jelinek  ---
Just a general comment, what we need to ensure is that the tables contain the
same variables in the same order between what we emit on the host and in LTO
for the offloading targets.  It is fine if we are able to remove some variable
as unused before the LTO streaming, as long as we manage to update whatever
needs to be to ensure the tables are the same.  After the streaming, we can't
remove anything mentioned in the tables.
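
The invariant described above can be sketched with a toy model. This is not GCC code: `OffloadTable`, `tables_agree`, `first_mismatch` and `paired_entry` are hypothetical names, and the only assumption taken from the comment is that entry i on the host pairs with entry i on the offload side.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in: each side keeps its own table of variable names,
// and entry i on the host is assumed to pair with entry i on the device.
using OffloadTable = std::vector<std::string>;

// Returns true when both tables list the same variables in the same order.
bool tables_agree(const OffloadTable &host, const OffloadTable &device)
{
  return host == device;
}

// Returns the index of the first entry that differs, or the common length
// if one table is a prefix of the other (or both are equal).
std::size_t first_mismatch(const OffloadTable &host, const OffloadTable &device)
{
  std::size_t n = host.size() < device.size() ? host.size() : device.size();
  for (std::size_t i = 0; i < n; ++i)
    if (host[i] != device[i])
      return i;
  return n;
}

// Looks up the device entry paired with host entry i (same index).
const std::string &paired_entry(const OffloadTable &device, std::size_t i)
{
  assert(i < device.size());  // a shorter device table cannot be paired
  return device[i];
}
```

If a variable is dropped from only one of the two tables, every later entry shifts by one, so the pairing by index silently picks the wrong variable from that point on.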


2019-10-09 Thread burnus at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92029

--- Comment #3 from Tobias Burnus  ---
(In reply to Richard Biener from comment #2)
> another fix would be to _not_ stream it into the offload section in
> output_offload_tables if the variable was removed:
> 
> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
…
> +  if (varpool_node::get ((*offload_vars)[i]))

At least that's not sufficient as it then fails at:

#0  0x0063b6e8 in vec::operator[]
(this=0x0, ix=0) at gcc-mainline/gcc/vec.h:859
859   gcc_checking_assert (ix < m_vecpfx.m_num);

Note the "this=0x0"; ix is 0.

Called via:
lto-streamer.h:1242| DEFINE_DECL_STREAM_FUNCS (VAR_DECL, var_decl)
lto-streamer.h:124 | if (varpool_node::get ((*offload_vars)[i]))
lto-cgraph.c:1781  |  lto_file_decl_data_get_var_decl (file_data, decl_index);
lto-common.c:2797  | input_offload_tables (!flag_ltrans);

lto_file_decl_data_get_var_decl uses:
  struct lto_in_decl_state *state = data->current_decl_state; \
   return (*state->streams[LTO_DECL_STREAM_## UPPER_NAME])[idx]; \
and we have "idx = 0" and:
 *file_data->current_decl_state
$8 = {streams = {0x0, 0x0, 0x773eb500, 0x0, 0x0, 0x0, 0x0}, fn_decl = 0x0,
compressed = true}
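
One possible reading of the "idx = 0" observation, offered as a guess rather than a diagnosis: in the diff from comment #2, the LTO_symtab_variable tag is still written unconditionally, while the index is only written when the node exists. The toy stream below shows how a reader that expects a tag to be followed by an index would then consume the terminating 0 as an index. The tag value, the word-based layout and both function names are invented for this sketch and are not GCC's streamer format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: values and layout are assumptions made for illustration.
constexpr std::uint64_t kVarTag = 7;  // hypothetical "variable follows" tag
constexpr std::uint64_t kEnd = 0;     // terminator word

// Writer that emits the tag for every entry but the index only for entries
// that are still live, mirroring the shape of the proposed diff.
std::vector<std::uint64_t> write_table(const std::vector<bool> &live)
{
  std::vector<std::uint64_t> out;
  std::uint64_t next_index = 0;
  for (bool is_live : live)
    {
      out.push_back(kVarTag);
      if (is_live)
        out.push_back(next_index++);
    }
  out.push_back(kEnd);
  return out;
}

// Reader that expects every tag to be followed by an index word.
// Returns the indices it believes it read.
std::vector<std::uint64_t> read_table(const std::vector<std::uint64_t> &in)
{
  std::vector<std::uint64_t> indices;
  std::size_t pos = 0;
  while (pos < in.size() && in[pos] == kVarTag)
    {
      ++pos;                         // consume the tag
      if (pos >= in.size())
        break;                       // truncated input
      indices.push_back(in[pos++]);  // consume whatever word comes next
    }
  return indices;
}
```

In this model a table with one removed variable reads back as "index 0", exactly like a table with one live variable, although no declaration was streamed for it.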


2019-10-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92029

Richard Biener  changed:

               What|Removed     |Added
-------------------+------------+-----------
   Target Milestone|---         |10.0

--- Comment #2 from Richard Biener  ---
It means the variable is gone.  I guess it's too late to mark the variable as
force_output where it is currently done.  Maybe do it in
varpool_node::get_create, where we also push it to the offload_vars vector?
I'm not sure if at that point the variable is used in the IL but only
"meta-wise", thus another fix would be to _not_ stream it into the offload
section in output_offload_tables if the variable was removed:

diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 147975ba869..481b4a5b45c 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1076,8 +1076,9 @@ output_offload_tables (void)
 {
   streamer_write_enum (ob->main_stream, LTO_symtab_tags,
   LTO_symtab_last_tag, LTO_symtab_variable);
-  lto_output_var_decl_index (ob->decl_state, ob->main_stream,
-(*offload_vars)[i]);
+  if (varpool_node::get ((*offload_vars)[i]))
+   lto_output_var_decl_index (ob->decl_state, ob->main_stream,
+  (*offload_vars)[i]);
 }

   streamer_write_uhwi_stream (ob->main_stream, 0);

I'm also curious why we need to populate offload_vars during varpool node
creation time rather than when streaming the offload portion.  IPA
references should allow identifying those that need streaming?

But that's a larger re-org.

That said, the change just exposed a latent issue - a variable being optimized
away between putting it into offload_vars and streaming out.
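
The latent issue can be sketched in miniature. `MiniSymtab` and its members are hypothetical and only model the ordering problem (record the name at creation, remove the node later, look it up at stream time); they are not the GCC data structures.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical miniature: names are recorded in a list when a node is
// created, while the symbol table can drop nodes later.
struct MiniSymtab
{
  std::map<std::string, bool> nodes;      // name -> force_output flag
  std::vector<std::string> offload_vars;  // recorded at creation time

  void create(const std::string &name)
  {
    nodes[name] = false;
    offload_vars.push_back(name);
  }

  // Models the optimizer removing a variable it considers unused.
  void remove_unused(const std::string &name) { nodes.erase(name); }

  // Returns a pointer to the flag, or nullptr if the node is gone.
  bool *get(const std::string &name)
  {
    auto it = nodes.find(name);
    return it == nodes.end() ? nullptr : &it->second;
  }

  // Names still listed in offload_vars whose node no longer exists.
  std::vector<std::string> stale_entries()
  {
    std::vector<std::string> stale;
    for (const std::string &name : offload_vars)
      if (get(name) == nullptr)
        stale.push_back(name);
    return stale;
  }
};
```

Setting the flag through `get()` without a null check is the analogue of the failing "varpool_node::get (var_decl)->force_output = 1;" from comment #1: the list still names the variable, but the lookup returns nothing.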


2019-10-09 Thread burnus at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92029

--- Comment #1 from Tobias Burnus  ---
In particular, it happens in nvptx-none/lto1 for:

 > initial
>


In symtab_node::get (), as called in "varpool_node::get
(var_decl)->force_output = 1;", one has:
437 return decl->decl_with_vis.symtab_node;
and this returns NULL.

The core Fortran code is (with: "integer :: v(4), i"):
  !$omp target map(from:v)
v(:) = (/ (i, i=1,4) /)
  !$omp end target

omplower dump has:

.omp_data_arr.6.v = 
#pragma omp target num_teams(1) thread_limit(0) map(from:v [len: 16])
[child fn: MAIN__._omp_fn.0 (.omp_data_arr.6, .omp_data_sizes.7,
.omp_data_kinds.8)]
  {
.omp_data_i = (const struct .omp_data_t.5 & restrict)
&.omp_data_arr.6;
{
  static integer(kind=4) A.0[4] = {1, 2, 3, 4};

  D.3933 = .omp_data_i->v;
  D.3934 = D.3933;
  D.3935 = MEM  [(c_char * {ref-all})];
  MEM  [(c_char * {ref-all})D.3934] = D.3935;
}
#pragma omp return
  }
.omp_data_arr.6 = {CLOBBER};


In the optimized dump for MAIN__._omp_fn.0, one has – with -O0:
integer(kind=4) A.0[4];
…
_3 = .omp_data_i_2(D)->v;
_4 = _3;
_5 = MEM  [(c_char * {ref-all})];
MEM  [(c_char * {ref-all})_4] = _5;

And with -Og (which ICEs):
   _3 = .omp_data_i_2(D)->v;
   MEM  [(c_char * {ref-all})_3] = 0x4000300020001;
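
As an arithmetic check on the constant in the -Og dump: read as four 16-bit fields, lowest first, 0x4000300020001 gives 1, 2, 3, 4, the same values as the A.0 initializer. The 16-bit lane width is an assumption made for this check only; the dump as quoted does not show the access type. `split_lanes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Splits a 64-bit value into `lanes` fields of `bits` bits each,
// returning the lowest field first.
std::vector<std::uint64_t> split_lanes(std::uint64_t value, unsigned bits,
                                       unsigned lanes)
{
  assert(bits > 0 && bits < 64 && bits * lanes <= 64);
  const std::uint64_t mask = (std::uint64_t{1} << bits) - 1;
  std::vector<std::uint64_t> out;
  for (unsigned i = 0; i < lanes; ++i)
    out.push_back((value >> (bits * i)) & mask);
  return out;
}
```

Calling `split_lanes(0x4000300020001ULL, 16, 4)` returns the fields 1, 2, 3, 4.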