Re: Question about information from -fdump-rtl-sched2 on M1 Max

Andrew Pinski via Gcc Mon, 29 Apr 2024 16:40:13 -0700

On Mon, Apr 29, 2024 at 4:26 PM Lucier, Bradley J via Gcc
<gcc@gcc.gnu.org> wrote:
>
> The question: How to interpret scheduling info with the compiler listed below.
>
> Specifically, a tight loop that was reported to be scheduled in 23 cycles (as 
> I understand it) actually executes in a little over 2 cycles per loop, as I 
> interpret two separate experiments.
>
> Am I misinterpreting something here?


Yes, the scheduling model in use here is the cortex-a53 one,
as evidenced by "cortex_a53_slot_" in the dump.
Most aarch64 cores don't have a scheduling model associated with them,
especially cores whose support has not been upstreamed
directly by the company that produces them.
The default scheduling model is cortex-a53 anyway. And you used neither
-mtune= nor -mcpu=, only -march=native, which just changes the arch
features and not the tuning or scheduler model.
So the 23-cycle total is computed against the cortex-a53 model, not
against the M1 Max.
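
As a rough illustration of the check described above, the following Python
sketch scans a sched2 dump for reservation-unit names and reports the prefix
they share. The function names, the regular expression, and the assumption
that unit names are lowercase identifiers joined by underscores are all mine,
not part of GCC; the sample text is copied from the dump quoted below.

```python
import re
from collections import Counter

# Assumption: reservation units look like lowercase words joined by
# underscores (e.g. cortex_a53_slot_any). Register names such as x4 or
# cc contain no underscore, so they are not matched.
UNIT_RE = re.compile(r"\b[a-z][a-z0-9]*(?:_[a-z0-9]+)+\b")


def reservation_units(dump_text):
    """Count reservation-unit names on the schedule lines of a dump."""
    units = Counter()
    for line in dump_text.splitlines():
        line = line.lstrip("> \t")  # drop email quote markers, if any
        # Schedule lines contain "-->"; in wrapped mail the reservation
        # may sit alone on a continuation line that starts with ":".
        if "-->" in line or line.startswith(":"):
            units.update(UNIT_RE.findall(line))
    return units


def common_model_prefix(units):
    """Return the leading '_'-separated components shared by all names."""
    parts = [name.split("_") for name in units]
    prefix = []
    for column in zip(*parts):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return "_".join(prefix)


SAMPLE = """\
> ;;        0--> b  0: i  39 x4=x2+x7
> :cortex_a53_slot_any
> ;;        0--> b  0: i  46 x1=zxn([sxn(x2)*0x4+x8])
> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
"""

units = reservation_units(SAMPLE)
print("model prefix:", common_model_prefix(units))
```

If only a single distinct unit name appears in the input, the shared prefix
is that whole name, so this is best run over a full basic block or a full
dump file.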

Thanks,
Andrew Pinski

>
> Thanks.
>
> Brad
>
> The compiler:
>
> [MacBook-Pro:~/programs/gambit/gambit-feeley] lucier% gcc-13 -v
> Using built-in specs.
> COLLECT_GCC=gcc-13
> COLLECT_LTO_WRAPPER=/opt/homebrew/Cellar/gcc/13.2.0/bin/../libexec/gcc/aarch64-apple-darwin23/13/lto-wrapper
> Target: aarch64-apple-darwin23
> Configured with: ../configure --prefix=/opt/homebrew/opt/gcc 
> --libdir=/opt/homebrew/opt/gcc/lib/gcc/current --disable-nls 
> --enable-checking=release --with-gcc-major-version-only 
> --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=-13 
> --with-gmp=/opt/homebrew/opt/gmp --with-mpfr=/opt/homebrew/opt/mpfr 
> --with-mpc=/opt/homebrew/opt/libmpc --with-isl=/opt/homebrew/opt/isl 
> --with-zstd=/opt/homebrew/opt/zstd --with-pkgversion='Homebrew GCC 13.2.0' 
> --with-bugurl=https://github.com/Homebrew/homebrew-core/issues 
> --with-system-zlib --build=aarch64-apple-darwin23 
> --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk 
> --with-ld=/Library/Developer/CommandLineTools/usr/bin/ld-classic
> Thread model: posix
> Supported LTO compression algorithms: zlib zstd
> gcc version 13.2.0 (Homebrew GCC 13.2.0)
>
> (so perhaps not the standard gcc).
>
> The command line (cut down a bit) is
>
> gcc-13 -save-temps -fverbose-asm -fdump-rtl-sched2 -O1 
> -fexpensive-optimizations -fno-gcse -Wno-unused -Wno-write-strings 
> -Wdisabled-optimization -fwrapv -fno-strict-aliasing -fno-trapping-math 
> -fno-math-errno -fschedule-insns2 -foptimize-sibling-calls 
> -fomit-frame-pointer -fipa-ra -fmove-loop-invariants -march=native -fPIC 
> -fno-common   -I"../include" -c -o _num.o -I. _num.c -D___LIBRARY
>
> The scheduling report for the loop is
>
> ;;   ======================================================
> ;;   -- basic block 10 from 39 to 70 -- after reload
> ;;   ======================================================
>
> ;;        0--> b  0: i  39 x4=x2+x7                                :cortex_a53_slot_any
> ;;        0--> b  0: i  46 x1=zxn([sxn(x2)*0x4+x8])                :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
> ;;        3--> b  0: i  45 x9=zxn([sxn(x4)*0x4+x3])                :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
> ;;        7--> b  0: i  47 x1=zxn(x6)*zxn(x1)+x9                   :(cortex_a53_slot_any+cortex_a53_imul)
> ;;        9--> b  0: i  48 x1=x1+x5                                :cortex_a53_slot_any
> ;;        9--> b  0: i  53 x5=x12+x2                               :cortex_a53_slot_any
> ;;       10--> b  0: i  50 [sxn(x4)*0x4+x3]=x1                     :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store
> ;;       10--> b  0: i  57 x4=x2+0x1                               :cortex_a53_slot_any
> ;;       11--> b  0: i  67 x2=x2+0x2                               :cortex_a53_slot_any
> ;;       12--> b  0: i  60 x9=zxn([sxn(x5)*0x4+x3])                :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
> ;;       13--> b  0: i  61 x4=zxn([sxn(x4)*0x4+x8])                :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
> ;;       17--> b  0: i  62 x4=zxn(x6)*zxn(x4)+x9                   :(cortex_a53_slot_any+cortex_a53_imul)
> ;;       20--> b  0: i  63 x1=x1 0>>0x20+x4                        :cortex_a53_slot_any
> ;;       20--> b  0: i  65 [sxn(x5)*0x4+x3]=x1                     :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store
> ;;       22--> b  0: i  66 x5=x1 0>>0x20                           :cortex_a53_slot_any
> ;;       22--> b  0: i  69 cc=cmp(x11,x2)                          :cortex_a53_slot_any
> ;;       23--> b  0: i  70 pc={(cc>0)?L68:pc}                      :(cortex_a53_slot_any+cortex_a53_branch)
> ;;      Ready list (final):
> ;;   total time = 23
> ;;   new head = 39
> ;;   new tail = 70
>
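
To read a report like the one quoted above, it can help to pull out the
issue cycle of each instruction and see where the modelled pipeline sits
idle. The Python sketch below does that for lines of the form
"N--> b 0: i UID insn". The regular expression and helper names are my own
assumptions about the dump layout, inferred only from the lines in this
message, and the cycle numbers it prints are those of whatever scheduling
model produced the dump.

```python
import re

# Assumed layout of a schedule line: "<cycle>--> b <block>: i <uid> <insn>"
ISSUE_RE = re.compile(r"(\d+)-->\s+b\s+\d+:\s+i\s+(\d+)\s+(.*)")


def issue_cycles(dump_text):
    """Return (cycle, insn_uid, insn_text) for each scheduled instruction."""
    entries = []
    for line in dump_text.splitlines():
        match = ISSUE_RE.search(line.lstrip("> \t"))
        if match is None:
            continue  # headers, wrapped reservation lines, summaries
        cycle, uid, rest = match.groups()
        # Drop a trailing " :reservation" if the line was not wrapped.
        insn = re.split(r"\s+:", rest, maxsplit=1)[0].strip()
        entries.append((int(cycle), int(uid), insn))
    return entries


def idle_gaps(entries):
    """Return (from_cycle, to_cycle) pairs with no issue in between."""
    cycles = sorted({cycle for cycle, _, _ in entries})
    return [(a, b) for a, b in zip(cycles, cycles[1:]) if b - a > 1]


SAMPLE = """\
> ;;        0--> b  0: i  39 x4=x2+x7
> :cortex_a53_slot_any
> ;;        0--> b  0: i  46 x1=zxn([sxn(x2)*0x4+x8])
> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
> ;;        3--> b  0: i  45 x9=zxn([sxn(x4)*0x4+x3])
> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
> ;;        7--> b  0: i  47 x1=zxn(x6)*zxn(x1)+x9
> :(cortex_a53_slot_any+cortex_a53_imul)
"""

entries = issue_cycles(SAMPLE)
for cycle, uid, insn in entries:
    print(f"cycle {cycle:2d}: insn {uid:3d}  {insn}")
print("idle gaps:", idle_gaps(entries))
```

In the full block quoted above, the last instruction issues in cycle 23,
which matches the "total time = 23" line of that report.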
