https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

--- Comment #7 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
X265
GCC 9:
y4m  [info]: 1920x1080 fps 30/1 i420p8 frames 0 - 599 of 600
raw  [info]: output file: /dev/null
x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9
x265 [info]: build info [Linux][GCC 9.3.1][64 bit][noasm] 8bit
x265 [info]: using cpu capabilities: none!
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 2 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing lslices=6 deblock sao
x265 [info]: frame I:      3, Avg QP:27.57  kb/s: 14018.64                      
x265 [info]: frame P:    146, Avg QP:28.84  kb/s: 4313.98 
x265 [info]: frame B:    451, Avg QP:35.29  kb/s: 204.06  
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
x265 [info]: consecutive B-frames: 0.7% 0.0% 0.0% 94.6% 4.7% 

encoded 600 frames in 279.98s (2.14 fps), 1273.22 kb/s, Avg QP:33.68
1056.04user 1.31system 4:40.01elapsed 377%CPU (0avgtext+0avgdata 432688maxresident)k
0inputs+0outputs (0major+102385minor)pagefaults 0swaps


GCC 10:
y4m  [info]: 1920x1080 fps 30/1 i420p8 frames 0 - 599 of 600
raw  [info]: output file: /dev/null
x265 [info]: HEVC encoder version 3.1.2+1-76650bab70f9
x265 [info]: build info [Linux][GCC 10.1.1][64 bit][noasm] 8bit
x265 [info]: using cpu capabilities: none!
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 2 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 25 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / off / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 early-skip rskip signhide tmvp b-intra
x265 [info]: tools: strong-intra-smoothing lslices=6 deblock sao
x265 [info]: frame I:      3, Avg QP:27.57  kb/s: 14018.64                      
x265 [info]: frame P:    146, Avg QP:28.84  kb/s: 4313.98 
x265 [info]: frame B:    451, Avg QP:35.29  kb/s: 204.06  
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
x265 [info]: consecutive B-frames: 0.7% 0.0% 0.0% 94.6% 4.7% 

encoded 600 frames in 292.63s (2.05 fps), 1273.22 kb/s, Avg QP:33.68
1079.80user 1.76system 4:52.65elapsed 369%CPU (0avgtext+0avgdata 427464maxresident)k
0inputs+0outputs (0major+73644minor)pagefaults 0swaps
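
For reference, the relative slowdown can be worked out directly from the two
runs above. This is only a small sketch: the numbers are copied from the logs,
while the helper name slowdown_percent is made up for illustration.

```python
def slowdown_percent(baseline_s: float, candidate_s: float) -> float:
    """Return how much slower the candidate is than the baseline, in percent."""
    return (candidate_s / baseline_s - 1.0) * 100.0


# Wall-clock encode times reported by x265 ("encoded 600 frames in ...s").
gcc9_wall, gcc10_wall = 279.98, 292.63
# User CPU times reported by time(1).
gcc9_user, gcc10_user = 1056.04, 1079.80

wall = slowdown_percent(gcc9_wall, gcc10_wall)
user = slowdown_percent(gcc9_user, gcc10_user)

print(f"wall-clock slowdown: {wall:.1f}%")
print(f"user CPU slowdown:   {user:.1f}%")
```

The wall-clock figure comes out at roughly 4.5%, and the user CPU figure at a
bit over 2%.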

So the difference is about 5%, not 50%. This is a codebase that I would build
with -O3 anyway. The perf reports show a difference in inlining.

GCC 9:
   8.74%  x265     libx265.so.176       [.] (anonymous namespace)::satd_8x4
   5.67%  x265     libx265.so.176       [.] (anonymous namespace)::filterVertical_sp_c<8>
   4.44%  x265     libx265.so.176       [.] (anonymous namespace)::pixelavg_pp<8, 8>
   4.11%  x265     libx265.so.176       [.] (anonymous namespace)::psyCost_pp<3>
   3.81%  x265     libx265.so.176       [.] (anonymous namespace)::interp_horiz_ps_c<8, 64, 64>
   3.33%  x265     libx265.so.176       [.] (anonymous namespace)::sad<8, 8>
   3.29%  x265     libx265.so.176       [.] partialButterfly32

GCC 10:
   9.17%  x265     libx265.so.176       [.] (anonymous namespace)::_sa8d_8x8
   8.70%  x265     libx265.so.176       [.] (anonymous namespace)::satd_8x4
   5.80%  x265     libx265.so.176       [.] (anonymous namespace)::pixelavg_pp<8, 8>
   5.55%  x265     libx265.so.176       [.] (anonymous namespace)::filterVertical_sp_c<8>
   3.90%  x265     libx265.so.176       [.] (anonymous namespace)::sad<8, 8>
   3.71%  x265     libx265.so.176       [.] (anonymous namespace)::interp_horiz_ps_c<8, 64, 64>
   3.48%  x265     libx265.so.176       [.] (anonymous namespace)::sad_x4<8, 8>

I built with
cmake ../source/ -DCMAKE_CXX_FLAGS=-O2 -DCMAKE_CXX_FLAGS_RELEASE=-DNDEBUG -DCMAKE_CXX_COMPILER=g++-9
I think Phoronix may be missing the release flag override (CMake's default
release flags for GCC include -O3), so he may be testing an -O3 build.

GCC 9 inlines _sa8d_8x8 while GCC 10 does not. The inliner estimates its size
at 159 insns, so this is indeed caused by the change to --param
max-inline-insns-single, which dropped from 200 to 70 for -O2. The default of
200 did not make much sense for -O2, since the inline keyword is abused by C++
codebases (this was the main point of the retuning).
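
To make the effect of the limit concrete, here is a deliberately simplified
model of the size check, using the numbers quoted above. It is not how the GCC
inliner is implemented (the real heuristics also weigh hints, expected time
savings and overall unit growth); the function name and the plain comparison
are assumptions made purely for illustration.

```python
def passes_single_size_limit(estimated_insns: int, limit: int) -> bool:
    """Toy check: is the callee's estimated size within the per-function limit?"""
    return estimated_insns <= limit


SA8D_ESTIMATE = 159    # inliner's size estimate for _sa8d_8x8, from the comment
OLD_O2_LIMIT = 200     # limit in effect for the GCC 9 -O2 build
NEW_O2_LIMIT = 70      # limit in effect for the GCC 10 -O2 build

for compiler, limit in (("GCC 9", OLD_O2_LIMIT), ("GCC 10", NEW_O2_LIMIT)):
    verdict = "inlined" if passes_single_size_limit(SA8D_ESTIMATE, limit) else "not inlined"
    print(f"{compiler} at -O2 (limit {limit}): _sa8d_8x8 {verdict}")
```

Under this model the function fits under the old limit but is more than twice
the new one, which matches _sa8d_8x8 showing up as a separate symbol only in
the GCC 10 profile.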
