------- Comment #11 from hubicka at gcc dot gnu dot org  2008-01-16 16:46 -------
Last time I looked into it, the cause was code alignment affected by inlining
in the string matching loop (longest_match). This code is very atypical: the
inner loop comparing strings is hand unrolled, but it almost never rolls,
since the compressed strings tend to be all different. GCC mispredicts this,
moving some stuff out of the loop, and bb-reorder aligns the code in a way
that the default path, which does not execute the loop, jumps pretty far.
That hurts decode bandwidth on K8, especially because the jumps are hard to
predict.
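
For readers who have not seen the loop in question, here is a rough sketch of
the shape described above. It is not the gzip source: the function name
match_length and the unroll factor of four are assumptions made only for
illustration. The point is that with mostly different inputs the very first
comparison usually fails, so the unrolled body is rarely entered.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the inner comparison loop of
   longest_match; NOT the actual gzip code. */
size_t match_length(const unsigned char *scan,
                    const unsigned char *match,
                    size_t max_len)
{
    size_t n = 0;

    /* Hand-unrolled part: four byte comparisons per iteration.  On
       mostly different strings the first test fails immediately, so
       this loop almost never iterates. */
    while (n + 4 <= max_len
           && scan[n] == match[n]
           && scan[n + 1] == match[n + 1]
           && scan[n + 2] == match[n + 2]
           && scan[n + 3] == match[n + 3])
        n += 4;

    /* Tail: compare the remaining bytes one at a time. */
    while (n < max_len && scan[n] == match[n])
        n++;

    return n;
}
```

A compiler that assumes the first loop is hot will optimize its layout for
the iterating case, which is the opposite of what this input distribution
needs.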

I don't see anything directly in the code that heuristics could use to realize
that the loop is not rolling, except for special-casing this particular
benchmark.

FDO scores of gzip are not that bad, but there is still a gap relative to ICC
(even an archaic version of it running 32-bit compared to 64-bit GCC):
http://www.suse.de/~gcctest/SPEC-britten/CINT/sandbox-britten-FDO/index.html
It would be nice to convince the gzip/zlibc/bzip2 people to use profiling by
default in the build process; those packages are ideal targets.
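
As a rough illustration of what profiling in the build process means, the
sketch below uses GCC's standard -fprofile-generate and -fprofile-use flags.
The file name demo.c, the sample strings, and the functions common_prefix and
training_run are made up for this example; a real package would train on
representative input instead.

```c
/* Assumed build steps (demo.c is a placeholder name, and a small main()
   that calls training_run() is assumed to exist):
     gcc -O2 -fprofile-generate demo.c -o demo
     ./demo        (the training run writes profile data, .gcda)
     gcc -O2 -fprofile-use demo.c -o demo
*/
#include <stddef.h>

/* Hypothetical comparison loop that rarely iterates on this input. */
size_t common_prefix(const char *a, const char *b, size_t max_len)
{
    size_t n = 0;
    while (n < max_len && a[n] == b[n])
        n++;
    return n;
}

/* Made-up training data: mostly different strings, each at least five
   characters long. */
static const char *const samples[] = {
    "alpha", "bravo", "charlie", "delta", "alpine"
};

/* Hypothetical training workload: compare every pair of samples so the
   recorded profile shows that the loop body is seldom executed. */
size_t training_run(void)
{
    const size_t count = sizeof samples / sizeof samples[0];
    size_t total = 0;

    for (size_t i = 0; i < count; i++)
        for (size_t j = i + 1; j < count; j++)
            total += common_prefix(samples[i], samples[j], 5);

    return total;
}
```

With the profile available, the compiler no longer has to guess whether the
loop iterates; it can use the measured branch counts instead.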

But since Core is not as sensitive to code alignment and the number of jumps
as K8, perhaps there are additional problems demonstrated by this.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
