https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99386

--- Comment #4 from Milian Wolff <mail at milianw dot de> ---
Ah, but LTO only helps with the variant that contains a single type. The
variant with two types remains very slow:


Variant with a single type:
```
 Performance counter stats for './variant 1' (5 runs):

            264.14 msec task-clock                #    0.999 CPUs utilized              ( +-  0.13% )
                 0      context-switches          #    0.001 K/sec                      ( +-100.00% )
                 0      cpu-migrations            #    0.000 K/sec
               380      page-faults               #    0.001 M/sec                      ( +-  0.13% )
     1,182,582,454      cycles                    #    4.477 GHz                        ( +-  0.06% )  (62.52%)
           634,015      stalled-cycles-frontend   #    0.05% frontend cycles idle       ( +-  3.72% )  (62.52%)
     1,044,218,220      stalled-cycles-backend    #   88.30% backend cycles idle        ( +-  0.16% )  (62.52%)
     1,187,317,899      instructions              #    1.00  insn per cycle
                                                  #    0.88  stalled cycles per insn    ( +-  0.11% )  (62.52%)
       132,470,519      branches                  #  501.512 M/sec                      ( +-  0.09% )  (62.53%)
             2,967      branch-misses             #    0.00% of all branches            ( +-  7.80% )  (62.47%)
       788,740,131      L1-dcache-loads           # 2986.044 M/sec                      ( +-  0.16% )  (62.47%)
        16,466,669      L1-dcache-load-misses     #    2.09% of all L1-dcache accesses  ( +-  0.16% )  (62.46%)
   <not supported>      LLC-loads
   <not supported>      LLC-load-misses

          0.264412 +- 0.000379 seconds time elapsed  ( +-  0.14% )

```

The above measurements are in the same ballpark as the no-variant baseline
without LTO. But check out the following for a variant with two types:
```
 Performance counter stats for './variant 2' (5 runs):

          1,807.01 msec task-clock                #    1.000 CPUs utilized              ( +-  0.04% )
                 4      context-switches          #    0.002 K/sec                      ( +- 11.59% )
                 0      cpu-migrations            #    0.000 K/sec                      ( +- 61.24% )
               383      page-faults               #    0.212 K/sec                      ( +-  0.27% )
     8,093,139,812      cycles                    #    4.479 GHz                        ( +-  0.01% )  (62.35%)
         1,393,308      stalled-cycles-frontend   #    0.02% frontend cycles idle       ( +-  5.84% )  (62.52%)
     7,257,955,665      stalled-cycles-backend    #   89.68% backend cycles idle        ( +-  0.08% )  (62.62%)
     4,728,542,717      instructions              #    0.58  insn per cycle
                                                  #    1.53  stalled cycles per insn    ( +-  0.02% )  (62.65%)
       395,189,246      branches                  #  218.698 M/sec                      ( +-  0.02% )  (62.65%)
            17,570      branch-misses             #    0.00% of all branches            ( +- 12.38% )  (62.55%)
     3,806,321,294      L1-dcache-loads           # 2106.424 M/sec                      ( +-  0.02% )  (62.39%)
        16,753,910      L1-dcache-load-misses     #    0.44% of all L1-dcache accesses  ( +-  0.11% )  (62.28%)
   <not supported>      LLC-loads
   <not supported>      LLC-load-misses

          1.807335 +- 0.000776 seconds time elapsed  ( +-  0.04% )

```

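Again, performance suffers dramatically: 1.81 s versus 0.26 s elapsed, i.e.
nearly 7x slower, with about 4x as many instructions retired and IPC dropping
from 1.00 to 0.58.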
