https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076
--- Comment #10 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I can re-confirm the 16% compile-time regression. I went through some comparisons.

$ wc -l *.ssa
299231 tramp3d-v4.ii.015t.ssa
$ wc -l ../5/*.ssa
331115 ../5/tramp3d-v4.ii.018t.ssa

So, as a crude comparison, we already start with about 10% more statements. Now einline:

$ wc -l *.einline
692812 tramp3d-v4.ii.018t.einline
$ wc -l ../5/*.einline
724090 ../5/tramp3d-v4.ii.026t.einline

After einline we seem to have 4% more statements, while we do about the same number of inlines:

$ grep Inlining tramp3d-v4.ii.*einline | wc -l
28003
$ grep Inlining ../5/tramp3d-v4.ii.*einline | wc -l
28685

At release_ssa we still have about 4% more:

$ wc -l *release_ssa*
348378 tramp3d-v4.ii.036t.release_ssa
$ wc -l ../5/*release_ssa*
365689 ../5/tramp3d-v4.ii.043t.release_ssa

There is no difference in the number of functions between the ssa and release_ssa dumps. What makes the functions bigger in GCC 5?

$ grep "^ .* = " *.release_ssa | wc -l
65028
$ grep "^ .* = " ../5/*.release_ssa | wc -l
72636

The number of statements is about the same. During the actual inlining, GCC 4.9 reports:

  Unit growth for small function inlining: 88536->114049 (28%)

while GCC 5 reports:

  Unit growth for small function inlining: 87943->97699 (11%)

The statement count difference seems to remain about 7% in the .optimized dumps. So perhaps the slowdown is not caused that much by the IPA passes; rather, we somehow manage to produce more code out of the C++ FE. I looked for interesting differences in the SSA dump.
Here are a few:

-;; Function int __gthread_active_p() (_ZL18__gthread_active_pv, funcdef_no=312, decl_uid=8436, symbol_order=127)
+;; Function int __gthread_active_p() (_ZL18__gthread_active_pv, funcdef_no=312, decl_uid=8537, cgraph_uid=127, symbol_order=127)

 int __gthread_active_p() ()
 {
-  bool _1;
-  int _2;
+  static void * const __gthread_active_ptr = (void *) __gthrw_pthread_cancel;
+  void * __gthread_active_ptr.111_2;
+  bool _3;
+  int _4;

   <bb 2>:
-  _1 = __gthrw_pthread_cancel != 0B;
-  _2 = (int) _1;
-  return _2;
+  __gthread_active_ptr.111_2 = __gthread_active_ptr;
+  _3 = __gthread_active_ptr.111_2 != 0B;
+  _4 = (int) _3;
+  return _4;
 }

... this looks like a header change, perhaps ...

 ObserverEvent::~ObserverEvent() (struct ObserverEvent * const this)
 {
-  int _6;
+  int (*__vtbl_ptr_type) () * _2;
+  int _7;

   <bb 2>:
-  this_3(D)->_vptr.ObserverEvent = &MEM[(void *)&_ZTV13ObserverEvent + 16B];
-  *this_3(D) ={v} {CLOBBER};
-  _6 = 0;
-  if (_6 != 0)
+  _2 = &_ZTV13ObserverEvent + 16;
+  this_4(D)->_vptr.ObserverEvent = _2;
+  MEM[(struct &)this_4(D)] ={v} {CLOBBER};
+  _7 = 0;
+  if (_7 != 0)

... an extra temporary initializing the vtbl pointer. This is repeated many times ...
-;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv, funcdef_no=3030, decl_uid=51649, symbol_order=884)
+;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv, funcdef_no=3030, decl_uid=51730, cgraph_uid=883, symbol_order=884)

 static Unique::Value_t Unique::get() ()
 {
   Value_t retval;
-  long int next_s.83_2;
-  long int next_s.84_3;
-  long int next_s.85_4;
-  Value_t _7;
+  long int next_s.83_3;
+  long int next_s.84_4;
+  long int next_s.85_5;
+  Value_t _9;

   <bb 2>:
-  Pooma::DummyMutex::_ZN5Pooma10DummyMutex4lockEv.isra.26 ();
-  next_s.83_2 = next_s;
-  next_s.84_3 = next_s.83_2;
-  next_s.85_4 = next_s.84_3 + 1;
-  next_s = next_s.85_4;
-  retval_6 = next_s.84_3;
-  Pooma::DummyMutex::_ZN5Pooma10DummyMutex6unlockEv.isra.27 ();
-  _7 = retval_6;
-  return _7;
+  Pooma::DummyMutex::lock (&mutex_s);
+  next_s.83_3 = next_s;
+  next_s.84_4 = next_s.83_3;
+  next_s.85_5 = next_s.84_4 + 1;
+  next_s = next_s.85_5;
+  retval_7 = next_s.84_4;
+  Pooma::DummyMutex::unlock (&mutex_s);
+  _9 = retval_7;
+  return _9;
 }

... here we give up on ISRA ...

And we have about twice as much EH:

$ grep "resx " tramp3d-v4.ii.*\.ssa | wc -l
4816
$ grep "resx " ../5/tramp3d-v4.ii.*\.ssa | wc -l
8671

which, however, is optimized out by the time of release_ssa.
Another thing we may consider cleaning up in the next stage1 is getting rid of dead stores:

-  MEM[(struct new_allocator *)&D.561702] ={v} {CLOBBER};
-  D.561702 ={v} {CLOBBER};
-  D.561702 ={v} {CLOBBER};
-  MEM[(struct new_allocator *)_2] ={v} {CLOBBER};
-  MEM[(struct allocator *)_2] ={v} {CLOBBER};
-  MEM[(struct _Alloc_hider *)_2] ={v} {CLOBBER};
-  MEM[(struct basic_string *)_2] ={v} {CLOBBER};
-  *_2 ={v} {CLOBBER};
-  *this_1(D) ={v} {CLOBBER};
+  MEM[(struct &)&D.570046] ={v} {CLOBBER};
+  MEM[(struct &)&D.570046] ={v} {CLOBBER};
+  D.570046 ={v} {CLOBBER};
+  MEM[(struct &)_2] ={v} {CLOBBER};
+  MEM[(struct &)_2] ={v} {CLOBBER};
+  MEM[(struct &)_2] ={v} {CLOBBER};
+  MEM[(struct &)_2] ={v} {CLOBBER};
+  MEM[(struct &)_2] ={v} {CLOBBER};
+  MEM[(struct &)this_1(D)] ={v} {CLOBBER};

Clobbers are dangerously common: there are 18K clobbers in the release_ssa dump out of 65K assignments, which makes them 29% of all the code. The number of clobbers only seems to go down in the tramp3d-v4.ii.166t.ehcleanup dump, and we still get a lot of redundancies:

<bb 32>:
D.581063 ={v} {CLOBBER};
D.581063 ={v} {CLOBBER};
D.164155 ={v} {CLOBBER};
D.164155 ={v} {CLOBBER};
operator delete [] (begbuf_18);

Why are those not considered dead stores and DCEd out earlier?