[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-04-08 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #26 from rguenther at suse dot de  ---
On Mon, 8 Apr 2024, douglas.boffey at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480
> 
> --- Comment #25 from Douglas Boffey  ---
> (In reply to rguent...@suse.de from comment #24)
> > dumpbin /headers executable.exe
> 
> ...
>   C0 size of stack reserve

OK, so that's the expected 12MB we adjust the stack reserve to.  It's
odd that you run into stack exhaustion (or maybe you don't and instead
run out of other memory).

I've now tried GCC 11.4 as you reportedly used on x86_64-linux and
can compile the testcase successfully with that using 2GB memory
and 75s compile-time.  Stack usage itself isn't measured but 8MB
were enough.

GCC 13 runs out of (8MB) stack for me.

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-04-08 Thread douglas.boffey at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #25 from Douglas Boffey  ---
(In reply to rguent...@suse.de from comment #24)
> dumpbin /headers executable.exe

...
  C0 size of stack reserve
1000 size of stack commit
...

Hope this helps.

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-04-08 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #24 from rguenther at suse dot de  ---
On Mon, 8 Apr 2024, douglas.boffey at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480
> 
> --- Comment #23 from Douglas Boffey  ---
> (In reply to Richard Biener from comment #22)
> > Note we're using -Wl,--stack,12582912 when linking the GCC executables, I
> > wonder if the reporter can verify the segfaulting executables have the
> > correct stack size set?
> 
> How do I find that out?

googling tells me

dumpbin /headers executable.exe

which should show a "size of stack reserve".  Alternatively if you
have objdump from binutils that might also show this info.  Basically
it's encoded in the binary (you want to check cc1plus.exe here).

There's also

editbin /stack:N executable.exe

where you can alter the stack reserve of the binary to N

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-04-08 Thread douglas.boffey at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #23 from Douglas Boffey  ---
(In reply to Richard Biener from comment #22)
> Note we're using -Wl,--stack,12582912 when linking the GCC executables, I
> wonder if the reporter can verify the segfaulting executables have the
> correct stack size set?

How do I find that out?

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-04-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #22 from Richard Biener  ---
Note we're using -Wl,--stack,12582912 when linking the GCC executables; I wonder
if the reporter can verify the segfaulting executables have the correct stack
size set?

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-04-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #21 from Alexander Monakov  ---
It is possible to reduce gcc_qsort workload by improving the presorted-ness of
the array, but of course avoiding quadratic behavior would be much better.

With the following change, we go from

   261,250,628,954  cycles:u
   533,040,964,437  instructions:u    #  2.04  insn per cycle
   114,415,857,214  branches:u
       395,327,966  branch-misses:u   #  0.35% of all branches

to

   256,620,517,403  cycles:u
   526,337,243,809  instructions:u    #  2.05  insn per cycle
   113,447,583,099  branches:u
       383,121,251  branch-misses:u   #  0.34% of all branches

diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc
index d12a4a97f6..621793f7f4 100644
--- a/gcc/tree-into-ssa.cc
+++ b/gcc/tree-into-ssa.cc
@@ -805,21 +805,22 @@ prune_unused_phi_nodes (bitmap phis, bitmap kills, bitmap uses)
      locate the nearest dominating def in logarithmic time by binary search.  */
   bitmap_ior (to_remove, kills, phis);
   n_defs = bitmap_count_bits (to_remove);
+  adef = 2 * n_defs + 1;
   defs = XNEWVEC (struct dom_dfsnum, 2 * n_defs + 1);
   defs[0].bb_index = 1;
   defs[0].dfs_num = 0;
-  adef = 1;
+  struct dom_dfsnum *head = defs + 1, *tail = defs + adef;
   EXECUTE_IF_SET_IN_BITMAP (to_remove, 0, i, bi)
     {
       def_bb = BASIC_BLOCK_FOR_FN (cfun, i);
-      defs[adef].bb_index = i;
-      defs[adef].dfs_num = bb_dom_dfs_in (CDI_DOMINATORS, def_bb);
-      defs[adef + 1].bb_index = i;
-      defs[adef + 1].dfs_num = bb_dom_dfs_out (CDI_DOMINATORS, def_bb);
-      adef += 2;
+      head->bb_index = i;
+      head->dfs_num = bb_dom_dfs_in (CDI_DOMINATORS, def_bb);
+      head++, tail--;
+      tail->bb_index = i;
+      tail->dfs_num = bb_dom_dfs_out (CDI_DOMINATORS, def_bb);
     }
+  gcc_assert (head == tail);
   BITMAP_FREE (to_remove);
-  gcc_assert (adef == 2 * n_defs + 1);
   qsort (defs, adef, sizeof (struct dom_dfsnum), cmp_dfsnum);
   gcc_assert (defs[0].bb_index == 1);
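The effect of the patch can be modelled outside GCC. A hypothetical Python sketch (names invented; struct dom_dfsnum reduced to plain integers): writing the DFS-in numbers forward from the head and the DFS-out numbers backward from the tail leaves the array as two ascending runs, so gcc_qsort's mergesort sees far more presortedness than with the interleaved layout; for a chain of nested dominator intervals the result is even fully sorted.

```python
def fill_sequential(pairs):
    """Old layout: in/out numbers interleaved per block."""
    out = []
    for dfs_in, dfs_out in pairs:
        out += [dfs_in, dfs_out]
    return out

def fill_two_ended(pairs):
    """Patched layout: ins from the head, outs from the tail."""
    n = len(pairs)
    buf = [0] * (2 * n)
    head, tail = 0, 2 * n
    for dfs_in, dfs_out in pairs:
        buf[head] = dfs_in
        head += 1
        tail -= 1
        buf[tail] = dfs_out
    return buf

def ascending_runs(a):
    """Number of maximal ascending runs; fewer means more presorted."""
    return 1 + sum(a[i] > a[i + 1] for i in range(len(a) - 1))

# Nested (in, out) intervals, e.g. a chain of blocks each dominating the next.
pairs = [(1, 8), (2, 7), (3, 6), (4, 5)]
print(ascending_runs(fill_sequential(pairs)))  # 4 runs
print(ascending_runs(fill_two_ended(pairs)))   # 1 run: fully sorted here
```

Both layouts contain the same multiset of numbers; only the initial order handed to the sort changes.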

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-04-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #20 from Alexander Monakov  ---
(note that if you uninclude the testcase and compile with -fno-exceptions it's
much faster)

On the smaller testcase from comment 14, prune_unused_phi_nodes invokes
gcc_qsort 53386 times. There are two distinct phases.

In the first phase, the count of struct dom_dfsnum to sort grows in a roughly
linear fashion up to 23437 on the 12294th invocation. Hence this first phase
is quadratic in the overall number of processed dom_dfsnums.

In the second phase, it almost always sorts exactly 7 elements for the
remaining ~41000 invocations.

The number of pairwise comparisons performed by gcc_qsort is approximately
(n+1)*(log_2(n)-1), which results in 1.8e9 comparisons overall for the 53386
invocations. perf shows 10.9e9 cycles spent under gcc_qsort, i.e. 6 cycles per
comparison, which looks about right. It's possible to reduce that further by
switching from a classic to a bidirectional merge, and by using cmovs instead
of bitwise arithmetic for branchless selects.
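The 1.8e9 figure can be reproduced with a small back-of-the-envelope script (a sketch only; the linear-growth model of phase 1 and the constant size of phase 2 are taken from the description above):

```python
import math

def est_comparisons(n):
    """Approximate gcc_qsort pairwise comparisons: (n+1)*(log2(n)-1)."""
    return (n + 1) * (math.log2(n) - 1) if n > 2 else 0

# Phase 1: array size grows roughly linearly up to 23437 over 12294 calls.
phase1 = sum(est_comparisons(23437 * k / 12294) for k in range(1, 12295))
# Phase 2: ~41000 further calls sorting exactly 7 elements each.
phase2 = 41000 * est_comparisons(7)
total = phase1 + phase2
print(f"{total:.2e}")  # on the order of 1.8e9, dominated by phase 1
```

Phase 2 contributes well under a million comparisons; essentially all of the work comes from the quadratic first phase.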

> I'll note the swapping of 8 bytes is a bit odd and it seems to be
> if-converted, thus always doing a write.

That is not a swap. That's the merge step of a mergesort: we are taking the
smaller element from the heads of two arrays and moving it to the tail of a
third array.

Basically there's quadratic behavior in tree-into-ssa, gcc_qsort shows
relatively higher on the profile because the log_2(n) factor becomes
noticeable.

Hope that helps!

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-04-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

Richard Biener  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #19 from Richard Biener  ---
Alexander - the testcase at -O1 shows curiously high

   3.16%  9840  cc1plus  cc1plus [.] mergesort

which is attributed (by callgrind) to

  if (sizeof (size_t) == 8 && LIKELY (c->size == 8))
--> MERGE_ELTSIZE (8);

and the caller in tree-into-ssa.cc:prune_unused_phi_nodes doing

  qsort (defs, adef, sizeof (struct dom_dfsnum), cmp_dfsnum);

I'm not sure why callgrind pins it this way, but perf somewhat agrees:

Samples │        MERGE_ELTSIZE (8);
      1 │ 2d0:   mov    %r9,%rsi
      8 │        mov    %r9,0x8(%rsp)
    528 │        mov    %r12,%rdi
     31 │        call   *0x0(%r13)
    236 │        mov    0x8(%rsp),%r9
      2 │        sar    $0x1f,%eax
    244 │        mov    %r12,%rcx
        │        movslq %eax,%rdx
    531 │        and    $0x8,%eax
     62 │        add    $0x8,%rbx
        │        cltq
    725 │        xor    %r9,%rcx
    914 │        add    %rax,%r12
      1 │        and    %rdx,%rcx
        │        xor    %r9,%rcx
      3 │        mov    (%rcx),%rcx
   2155 │        mov    %rcx,-0x8(%rbx)
     29 │        cmp    %r12,%rbx
        │        je     1d7

I'll note the swapping of 8 bytes is a bit odd and it seems to be
if-converted, thus always doing a write.

I'm of course questioning what prune_unused_phi_nodes does, though I have no
idea whether it's sensible at all; it seems slow for this testcase, and
the sorting is the slowest part of it.

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-04-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #18 from Richard Biener  ---
Btw, clang is quite quick with -O0 (8s, 1GB ram) but with -O1 uses 18GB ram and
8 minutes compile-time.

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-04-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #17 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:e7b7188b1cf8c174f0e890d4ac279ff480b51043

commit r14-9767-ge7b7188b1cf8c174f0e890d4ac279ff480b51043
Author: Richard Biener 
Date:   Tue Apr 2 12:31:04 2024 +0200

tree-optimization/114557 - reduce ehcleanup peak memory use

The following reduces peak memory use for the PR114480 testcase at -O1
which is almost exclusively spent by the ehcleanup pass in allocating
PHI nodes.  The free_phinodes cache we maintain isn't very effective
since it has effectively two slots, one for 4 and one for 9 argument
PHIs; it is only ever used for allocations up to 9 arguments but
we put all larger PHIs in the 9 argument bucket.  This proves
ineffective, resulting in much garbage being kept when incrementally
growing PHI nodes by edge redirection.

The mitigation is to rely on the GC freelist for larger sizes and
thus immediately return all larger bucket sized PHIs to it via ggc_free.

This reduces the peak memory use from 19.8GB to 11.3GB and compile-time
from 359s to 168s.

PR tree-optimization/114557
PR tree-optimization/114480
* tree-phinodes.cc (release_phi_node): Return PHIs from
allocation buckets not covered by free_phinodes to GC.
(remove_phi_node): Release the PHI LHS before freeing the
PHI node.
* tree-vect-loop.cc (vectorizable_live_operation): Get PHI lhs
before releasing it.
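The idea of the fix can be sketched in a few lines (a model only: the bucket limit follows the commit message, while the class and method names are invented for illustration):

```python
class PhiFreeList:
    """Model of the tree-phinodes.cc cache after the fix: small PHIs are
    recycled from per-capacity buckets, while anything larger than the
    biggest bucket is handed straight back to the GC freelist instead of
    being hoarded in a catch-all bucket it can rarely be reused from."""

    MAX_CACHED_CAPACITY = 9          # largest bucket kept locally

    def __init__(self, ggc_free):
        self.buckets = {}            # capacity -> list of free PHIs
        self.ggc_free = ggc_free     # models ggc_free ()

    def release_phi(self, phi, capacity):
        if capacity <= self.MAX_CACHED_CAPACITY:
            self.buckets.setdefault(capacity, []).append(phi)
        else:
            self.ggc_free(phi)       # large PHI: return to GC immediately

    def allocate_phi(self, capacity):
        bucket = self.buckets.get(capacity)
        return bucket.pop() if bucket else None   # None = fresh allocation
```

Before the fix, PHIs grown past the largest bucket accumulated in a slot they could effectively never be served from again, which is where the garbage came from.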

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-04-02 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #16 from Richard Biener  ---
Created attachment 57849
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57849&action=edit
patch for expand

Interestingly this patch for RTL expansion, more specifically
add_scope_conflicts, only slows things down even though it should improve
cache locality.  In particular, doing the 2nd stage in between makes things
worse.

Maybe this is due to the high in-degree of some blocks in the CFG.

I'll note there's no "change" in the 2nd iteration for SCCs, so optimizing
to only IOR in from (changed) backedges might "work" to reduce work.

Still it's odd the apparent locality improvement doesn't actually help
(as said it might be skewed by a high indegree of an SCC).

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #15 from Richard Biener  ---
(In reply to Richard Biener from comment #14)
> Created attachment 57829 [details]
> smaller testcase
> 
> Smaller testcase, shows the same compile-time issue at -O0.  At -O1 it's a lot
> less bad but memory usage is better (8GB), so the slowness of the full testcase
> is likely memory bandwidth related.
> 
> -O1 is then
> 
>  tree PTA    :  20.59 ( 21%)
>  expand vars :   9.19 (  9%)
>  expand      :  14.26 ( 15%)

The memory use goes into RTXen created during RTL expansion.  The compile-time
part is add_scope_conflicts.  There's the possibility to do what var-tracking
does and use rev_post_order_and_mark_dfs_back_seme, avoiding iteration
for non-loops and getting better cache locality.
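The var-tracking-style idea can be illustrated with a generic bitset-dataflow sketch (hypothetical code, not the actual add_scope_conflicts implementation): sweeping blocks in reverse post-order reaches the fixpoint in a single pass on an acyclic CFG, so re-iteration is only ever needed for blocks that sit in cycles.

```python
def ior_dataflow(rpo, preds, gen):
    """Forward 'IOR from predecessors' dataflow over bitsets (ints).
    Returns the solution and the number of sweeps performed."""
    sol = {b: 0 for b in rpo}
    sweeps = 0
    changed = True
    while changed:
        changed = False
        sweeps += 1
        for b in rpo:              # reverse post-order: preds come first
            new = gen.get(b, 0)
            for p in preds.get(b, []):
                new |= sol[p]
            if new != sol[b]:
                sol[b] = new
                changed = True
    return sol, sweeps

# Diamond CFG 1 -> {2, 3} -> 4: acyclic, so sweep 1 computes the
# fixpoint and sweep 2 only confirms that nothing changed.
sol, sweeps = ior_dataflow([1, 2, 3, 4],
                           {2: [1], 3: [1], 4: [2, 3]},
                           {1: 0b001, 2: 0b010, 3: 0b100})
```

With back edges marked up front, the second confirmation sweep could be restricted to the blocks inside SCCs rather than the whole function.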

We have half of the profile hits on ggc_internal_alloc and it's

     17 | d8:  mov    %r14,%rax
        |      mov    (%r14),%r14
   1440 |      test   %r14,%r14
      4 |      je     530
        |          if (p->bytes == entry_size)
        | e7:  cmp    0x10(%r14),%r12
  65582 |      jne    d8

which is the linear walk

  /* Check the list of free pages for one we can use.  */
  for (pp = &G.free_pages, p = *pp; p; pp = &p->next, p = *pp)
    if (p->bytes == entry_size)
      break;

so we seem to have many free pages for some reason but the free pages
pool is global and not per order?!
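The quoted loop walks a single global list of free pages looking for a size match, so allocation cost grows with the number of free pages of *other* sizes. A hypothetical sketch of the per-size alternative hinted at above (not GCC's actual allocator; all names invented):

```python
from collections import defaultdict

class PagePool:
    """Free pages keyed by entry size: reuse becomes an O(1) pop
    instead of a linear walk over every free page in the pool."""

    def __init__(self):
        self.free_by_size = defaultdict(list)   # entry_size -> free pages

    def release(self, page, entry_size):
        self.free_by_size[entry_size].append(page)

    def acquire(self, entry_size):
        pages = self.free_by_size[entry_size]
        if pages:
            return pages.pop()        # cached page of the right size
        return ("fresh", entry_size)  # fall back to a new OS allocation
```

The trade-off is that pages cached under one size are no longer available to satisfy requests of another size without an explicit recycling step.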

Samples: 299K of event 'cycles', Event count (approx.): 338413178083
Overhead       Samples  Command  Shared Object  Symbol
  23.16%         67756  cc1plus  cc1plus  [.] ggc_internal_alloc
   6.98%         21637  cc1plus  cc1plus  [.] bitmap_tree_splay
   6.89%         20413  cc1plus  cc1plus  [.] bitmap_ior_into
   4.05%         11989  cc1plus  cc1plus  [.] bitmap_elt_ior
   3.16%          9840  cc1plus  cc1plus  [.] mergesort
   2.90%          8860  cc1plus  cc1plus  [.] bitmap_set_bit
   2.76%          8281  cc1plus  cc1plus  [.] get_ref_base_and_extent
   1.37%          4071  cc1plus  cc1plus  [.] stmt_may_clobber_ref_p_1
   1.32%          4095  cc1plus  cc1plus  [.] dominated_by_p
   1.16%          3597  cc1plus  cc1plus  [.] bitmap_tree_unlink_element
   1.06%          3128  cc1plus  cc1plus  [.] walk_aliased_vdefs_1

The bitmap_tree_splay time is from compute_idf; refactoring that some more,
avoiding the duplicate processing, and doing away with the bitmap
for the workset might help a bit there (merely not using tree view just
makes set-bit go up with no overall positive change).

I will look into the above things more (but not the RA slowness at -O0).

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #14 from Richard Biener  ---
Created attachment 57829
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57829&action=edit
smaller testcase

Smaller testcase, shows the same compile-time issue at -O0.  At -O1 it's a lot
less bad but memory usage is better (8GB), so the slowness of the full testcase
is likely memory bandwidth related.

-O1 is then

 tree PTA    :  20.59 ( 21%)
 expand vars :   9.19 (  9%)
 expand      :  14.26 ( 15%)

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #13 from Richard Biener  ---
So this remains an interesting testcase for register allocation at -O0 where
we now have on trunk with release checking:

 integrated RA          : 220.01 ( 60%)
 LRA create live ranges :  91.86 ( 25%)
 TOTAL                  : 367.27

With -O1 it's now

 tree PTA               : 212.64 ( 21%)
 expand vars            : 182.37 ( 18%)
 expand                 : 212.10 ( 21%)
 TOTAL                  : 990.88

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-28 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:0bad303944a1d2311c07d59912b4dfa7bff988c8

commit r14-9701-g0bad303944a1d2311c07d59912b4dfa7bff988c8
Author: Richard Biener 
Date:   Wed Mar 27 16:19:01 2024 +0100

middle-end/114480 - IDF compute is slow

The testcase in this PR shows very slow IDF compute:

  tree SSA rewrite   :  76.99 ( 31%)
  24.78%  243663  cc1plus  cc1plus  [.] compute_idf

which can be mitigated to some extent by refactoring the bitmap
operations to simpler variants.  With the patch below this becomes

  tree SSA rewrite   :  15.23 (  8%)

when not optimizing and in addition to that

  tree SSA incremental   : 181.52 ( 30%)

to

  tree SSA incremental   :  24.09 (  6%)

when optimizing.

PR middle-end/114480
* cfganal.cc (compute_idf): Use simpler bitmap iteration,
touch work_set only when phi_insertion_points changed.

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-27 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #11 from Vladimir Makarov  ---
My finding is that RA is not a problem for GCC speed with -O1 and up.

RA at -O0 really does consume a big portion of GCC compile time.  The
biggest part of RA at -O0 is actually spent in life analysis.  It is
difficult to implement a modest RA w/o life analysis as it will
result in huge stack slot generation (not knowing pseudo lifetimes
basically means allocating a stack slot for each pseudo).

The problem with the test is a huge number of pseudos (or IRA
objects).  This results in a big sparse set (which can hardly be
placed in L3 cache) and bad cache behaviour.

I tried to use a bitmap instead of a sparse set, but GCC crashed after
allocating 48GB of memory.  Sbitmap works better and improves IRA time by
12%, but it works worse for other, more frequent use cases.

So I don't think that RA behaviour can be improved for this case.

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
  Known to fail||14.0
   Keywords||ra
 Ever confirmed|0   |1
 CC||vmakarov at gcc dot gnu.org
   Last reconfirmed|2024-03-26 00:00:00 |2024-03-27

--- Comment #10 from Richard Biener  ---
I see on x86_64-linux w/ release checking

 tree SSA rewrite       :  76.99 ( 31%)   0.09 (  5%)  77.11 ( 31%)    96M (  9%)
 integrated RA          :  92.31 ( 37%)   0.15 (  8%)  92.49 ( 37%)   105M ( 10%)
 LRA create live ranges :  54.01 ( 22%)   0.00 (  0%)  54.02 ( 22%)   885k (  0%)
 TOTAL                  : 246.34          1.88         248.43        1039M
246.34user 2.02system 4:08.92elapsed 99%CPU (0avgtext+0avgdata 3287072maxresident)k
70416inputs+0outputs (110major+1229628minor)pagefaults 0swaps

tree SSA rewrite is interesting, probably bitmap slowness and cache dependent.

With -O1:

 tree PTA               :  85.65 ( 14%)   0.21 (  3%)  85.89 ( 14%)   348M (  2%)
 tree SSA rewrite       :  76.05 ( 13%)   0.10 (  1%)  76.14 ( 12%)    96M (  1%)
 tree SSA incremental   : 181.52 ( 30%)   0.03 (  0%) 181.50 ( 30%) 10031k (  0%)
 expand vars            :  66.72 ( 11%)   0.00 (  0%)  66.74 ( 11%)  6132k (  0%)
 expand                 :  64.33 ( 11%)   0.02 (  0%)  64.39 ( 11%)   172M (  1%)
 TOTAL                  : 603.55          7.72         611.61        19327M
603.55user 7.83system 10:11.78elapsed 99%CPU (0avgtext+0avgdata 19809792maxresident)k
21520inputs+0outputs (48major+5102514minor)pagefaults 0swaps

definitely "interesting" testcase.

The profile for -O0 shows IDF compute (that's SSA rewrite, a usual suspect)
and other bits that might be interesting for the RA part.

Samples: 1M of event 'cycles:u', Event count (approx.): 1332096582355
Overhead       Samples  Command  Shared Object  Symbol
  24.78%        243663  cc1plus  cc1plus  [.] compute_idf
  11.29%        115134  cc1plus  cc1plus  [.] make_hard_regno_dead
  10.29%        104126  cc1plus  cc1plus  [.] process_bb_node_lives
   5.29%         53680  cc1plus  cc1plus  [.] mark_pseudo_regno_live
   4.95%         50051  cc1plus  cc1plus  [.] mark_ref_dead
   3.95%         40075  cc1plus  cc1plus  [.] update_allocno_pressure
   2.73%         27977  cc1plus  cc1plus  [.] lra_create_live_ranges_
   2.48%         25136  cc1plus  cc1plus  [.] inc_register_pressure
   2.37%         24268  cc1plus  cc1plus  [.] update_pseudo_point
   2.23%         21976  cc1plus  cc1plus  [.] mergesort
   2.19%         22208  cc1plus  cc1plus  [.] make_object_dead
   2.09%         21316  cc1plus  cc1plus  [.] sparseset_clear_bit
   1.99%         20181  cc1plus  cc1plus  [.] bitmap_set_bit

I'll note this was all tested on trunk; GCC 11 might behave even worse, and
quite a few deep recursion issues have been fixed in newer releases.

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-26 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #9 from Jonathan Wakely  ---
It compiles OK with GCC 11.4.0 on x86_64-pc-linux-gnu; it just takes a very
long time. I think you probably just ran out of memory or stack space.

-ftime-report shows:

Time variable                                   usr            sys           wall           GGC
 phase setup            :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  1562k (  0%)
 phase parsing          :   1.85 (  1%)   0.84 ( 44%)   2.69 (  1%)   231M ( 20%)
 phase lang. deferred   :   0.44 (  0%)   0.09 (  5%)   0.54 (  0%)    46M (  4%)
 phase opt and generate : 238.73 ( 99%)   0.96 ( 51%) 240.34 ( 99%)   852M ( 75%)
 |name lookup           :   0.24 (  0%)   0.07 (  4%)   0.24 (  0%)  2189k (  0%)
 |overload resolution   :   0.97 (  0%)   0.20 ( 11%)   1.25 (  1%)   151M ( 13%)
 garbage collection     :   0.33 (  0%)   0.01 (  1%)   0.34 (  0%)     0  (  0%)
 dump files             :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 callgraph construction :   0.20 (  0%)   0.01 (  1%)   0.20 (  0%)    45M (  4%)
 callgraph optimization :   0.07 (  0%)   0.00 (  0%)   0.07 (  0%)     0  (  0%)
 callgraph ipa passes   :   1.50 (  1%)   0.36 ( 19%)   1.87 (  1%)   103M (  9%)
 ipa dead code removal  :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 ipa inlining heuristics:   0.05 (  0%)   0.00 (  0%)   0.04 (  0%)     0  (  0%)
 cfg construction       :   0.03 (  0%)   0.00 (  0%)   0.03 (  0%)  3959k (  0%)
 cfg cleanup            :   0.17 (  0%)   0.00 (  0%)   0.17 (  0%)   928  (  0%)
 trivially dead code    :   0.16 (  0%)   0.00 (  0%)   0.17 (  0%)     0  (  0%)
 df scan insns          :   0.57 (  0%)   0.15 (  8%)   0.73 (  0%)    14k (  0%)
 df live regs           :   0.41 (  0%)   0.04 (  2%)   0.47 (  0%)     0  (  0%)
 df reg dead/unused notes :   0.50 (  0%)   0.00 (  0%)   0.50 (  0%)    31M (  3%)
 register information   :   1.55 (  1%)   0.00 (  0%)   1.55 (  1%)     0  (  0%)
 alias analysis         :   0.08 (  0%)   0.01 (  1%)   0.09 (  0%)    10M (  1%)
 alias stmt walking     :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)    36k (  0%)
 rebuild jump labels    :   0.11 (  0%)   0.00 (  0%)   0.11 (  0%)     0  (  0%)
 preprocessing          :   0.23 (  0%)   0.20 ( 11%)   0.40 (  0%)  1219k (  0%)
 parser (global)        :   1.42 (  1%)   0.52 ( 28%)   1.91 (  1%)   213M ( 19%)
 parser struct body     :   0.06 (  0%)   0.02 (  1%)   0.08 (  0%)  8434k (  1%)
 parser enumerator list :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)    51k (  0%)
 parser function body   :   0.07 (  0%)   0.04 (  2%)   0.06 (  0%)  3008k (  0%)
 parser inl. func. body :   0.03 (  0%)   0.02 (  1%)   0.02 (  0%)  2894k (  0%)
 parser inl. meth. body :   0.06 (  0%)   0.01 (  1%)   0.04 (  0%)  4000k (  0%)
 template instantiation :   0.17 (  0%)   0.07 (  4%)   0.28 (  0%)    21M (  2%)
 constant expression evaluation :   0.25 (  0%)   0.04 (  2%)   0.39 (  0%)    17M (  2%)
 inline parameters      :   0.16 (  0%)   0.00 (  0%)   0.17 (  0%)  8247k (  1%)
 tree gimplify          :   0.95 (  0%)   0.03 (  2%)   0.99 (  0%)    83M (  7%)
 tree eh                :   8.63 (  4%)   0.04 (  2%)   8.67 (  4%)    86M (  8%)
 tree CFG construction  :   0.08 (  0%)   0.00 (  0%)   0.07 (  0%)    31M (  3%)
 tree CFG cleanup       :   0.14 (  0%)   0.00 (  0%)   0.14 (  0%)   720  (  0%)
 tree PHI insertion     :   0.50 (  0%)   0.00 (  0%)   0.50 (  0%)  7901k (  1%)
 tree SSA rewrite       :   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)    40M (  4%)
 tree SSA other         :   0.21 (  0%)   0.15 (  8%)   0.31 (  0%)    45k (  0%)
 tree SSA incremental   :   0.08 (  0%)   0.00 (  0%)   0.07 (  0%)     0  (  0%)
 tree operand scan      :   0.32 (  0%)   0.19 ( 10%)   0.55 (  0%)    18M (  2%)
 dominance frontiers    :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 dominance computation  :   0.14 (  0%)   0.00 (  0%)   0.16 (  0%)     0  (  0%)
 out of ssa             :   0.06 (  0%)   0.00 (  0%)   0.08 (  0%)  1057k (  0%)
 expand vars            :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)    10M (  1%)
 expand                 :   0.64 (  0%)   0.08 (  4%)   0.73 (  0%)   226M ( 20%)
 post expand cleanups   :   0.11 (  0%)   0.00 (  0%)   0.10 (  0%)    16M (  1%)
 varconst               :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  9184  (  0%)
 loop init

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-26 Thread douglas.boffey at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #8 from Douglas Boffey  ---
(In reply to Andrew Pinski from comment #6)
> Just to check: what options are you passing to gcc?

Using the default options:
  g++ -o test-poly a-test-poly.ii

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #7 from Andrew Pinski  ---
The code does compile for x86_64-linux-gnu on the trunk (though very slowly).

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #6 from Andrew Pinski  ---
Just to check: what options are you passing to gcc?

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #5 from Andrew Pinski  ---
remove_unreachable_eh_regions_worker has deep recursion which could cause
issues on hosts with limited stack space.
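A generic illustration of the problem and the usual remedy (hypothetical code, unrelated to the actual GCC function): a recursive walker needs one host stack frame per nesting level, while a worklist version only grows heap memory.

```python
import sys

def count_nested_recursive(children, node):
    """One stack frame per nesting level: overflows on deep input."""
    return 1 + sum(count_nested_recursive(children, c)
                   for c in children.get(node, []))

def count_nested_iterative(children, root):
    """Explicit worklist: nesting depth only costs heap memory."""
    count, stack = 0, [root]
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(children.get(node, []))
    return count

# A chain nested far deeper than the default Python recursion limit:
# the recursive walker raises RecursionError, the iterative one is fine.
depth = 5 * sys.getrecursionlimit()
chain = {i: [i + 1] for i in range(depth)}
print(count_nested_iterative(chain, 0))  # visits depth + 1 nodes
```

The same trade-off applies to a C++ compiler process: deep recursion consumes the (fixed-size) host stack, which is exactly what a small stack reserve on the host makes fatal.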

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|1   |0
 Status|WAITING |UNCONFIRMED
   Keywords||compile-time-hog,
   ||memory-hog

--- Comment #4 from Andrew Pinski  ---
Hmm, this file contains a huge initialization like:
{{0, 1, 2, 3, 4}, {0, 24}, {fpToT, f, fpHoT, foT}},
repeated in a similar pattern a lot ...

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-26 Thread douglas.boffey at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

Douglas Boffey  changed:

   What|Removed |Added

 CC||douglas.boffey at gmail dot com

--- Comment #3 from Douglas Boffey  ---
Created attachment 57815
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57815&action=edit
Zipped preprocessed file

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2024-03-26
 Ever confirmed|0   |1
 Status|UNCONFIRMED |WAITING

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #2 from Andrew Pinski  ---
(In reply to Douglas Boffey from comment #1)
> Unable to add attachment.

try compressing it first.

[Bug c++/114480] g++: internal compiler error: Segmentation fault signal terminated program cc1plus

2024-03-26 Thread douglas.boffey at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

--- Comment #1 from Douglas Boffey  ---
Unable to add attachment.