https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103585

--- Comment #9 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Hacking around the logic in ipa-sra that disables the transform and adding
-fwhole-program I get down to:


 Performance counter stats for './a.out-bad3':

         24,946.66 msec task-clock                #    0.999 CPUs utilized      
             1,078      context-switches          #   43.212 /sec               
                42      cpu-migrations            #    1.684 /sec               
                71      page-faults               #    2.846 /sec               
    96,144,941,575      cycles                    #    3.854 GHz                
       151,439,200      stalled-cycles-frontend   #    0.16% frontend cycles idle
    68,072,941,085      stalled-cycles-backend    #   70.80% backend cycles idle
   210,675,636,303      instructions              #    2.19  insn per cycle     
                                                  #    0.32  stalled cycles per insn
     9,128,994,716      branches                  #  365.941 M/sec              
        24,781,891      branch-misses             #    0.27% of all branches    

      24.982117481 seconds time elapsed

      24.909903000 seconds user
       0.036031000 seconds sys

which is not bad.  I think we have two ipa-sra issues:
 1) ipa-sra is overly paranoid about not adding a dereference.  First, I think
it is safe for parameters that are REFERENCE_TYPE rather than POINTER_TYPE.
Second, it should do propagation from callers to callees: it is quite easy to
figure out that a given param contains data packed into a structure only to
make the callee happy.
 2) Since ipa-sra is run before ipa-cp, it won't simplify ipa-cp (or other)
clones even if they are static symbols.
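
To illustrate point 1, here is a minimal hand-written C++ sketch of the kind of rewrite being discussed. It is not taken from the testcase in this PR, and all names (Desc, sum_by_ref, sum_scalarized, caller_before, caller_after) are hypothetical. The caller packs two values into a structure only because the callee takes it by reference; the "after" form shows roughly what the source would look like if the aggregate were split into scalar parameters and the loads were moved to the caller.

```cpp
// Hypothetical descriptor-like aggregate (illustrative only).
struct Desc {
    const double *data;
    long n;
};

// Before: the callee takes the aggregate by reference and only reads fields.
static double sum_by_ref(const Desc &d) {
    double s = 0.0;
    for (long i = 0; i < d.n; ++i)
        s += d.data[i];
    return s;
}

// After: hand-written equivalent with the fields passed as scalars, so the
// callee no longer dereferences the parameter at all.
static double sum_scalarized(const double *data, long n) {
    double s = 0.0;
    for (long i = 0; i < n; ++i)
        s += data[i];
    return s;
}

// The caller builds Desc only to make the callee happy.
double caller_before(const double *v, long n) {
    Desc d{v, n};
    return sum_by_ref(d);
}

// With scalar parameters the temporary aggregate disappears entirely.
double caller_after(const double *v, long n) {
    return sum_scalarized(v, n);
}
```

Whether the real pass can prove this safe depends on its own analysis; the sketch only shows the intended before/after shape.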

Also I think ipa-sra may consider packing multiple structures together.  If
array descriptors are passed by reference and built in the caller, it seems
pointless to pass each as a separate struct.  Not sure what kind of benefits to
expect here though.
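
A rough sketch of what packing descriptors together could mean at source level, again in hand-written C++ with hypothetical names (ArrayDesc, ArrayDescPair, dot_separate, dot_packed). It only shows the shape of the idea: two by-reference descriptors built in the caller become one combined object. It makes no claim about the resulting speedup.

```cpp
// Hypothetical stand-in for an array descriptor (illustrative only).
struct ArrayDesc {
    const double *base;
    long extent;
};

// Separate form: each descriptor is passed by reference on its own.
static double dot_separate(const ArrayDesc &a, const ArrayDesc &b) {
    double s = 0.0;
    for (long i = 0; i < a.extent && i < b.extent; ++i)
        s += a.base[i] * b.base[i];
    return s;
}

// Packed form: both descriptors live in one object, one pointer is passed.
struct ArrayDescPair {
    ArrayDesc a;
    ArrayDesc b;
};

static double dot_packed(const ArrayDescPair &p) {
    double s = 0.0;
    for (long i = 0; i < p.a.extent && i < p.b.extent; ++i)
        s += p.a.base[i] * p.b.base[i];
    return s;
}

// Caller building two descriptors and passing them separately.
double caller_separate(const double *x, const double *y, long n) {
    ArrayDesc a{x, n};
    ArrayDesc b{y, n};
    return dot_separate(a, b);
}

// Caller building one combined object instead.
double caller_packed(const double *x, const double *y, long n) {
    ArrayDescPair p{{x, n}, {y, n}};
    return dot_packed(p);
}
```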
