Richard,

This is what my job script looks like:
#!/bin/bash
#SBATCH -N 16
#SBATCH -C knl,quad,flat
#SBATCH -p regular
#SBATCH -J knlflat1024
#SBATCH -L SCRATCH
#SBATCH -o knlflat1024.o%j
#SBATCH --mail-type=ALL
#SBATCH [email protected]
#SBATCH -t 00:20:00

#run the application:
cd $SCRATCH/Icesheet
sbcast --compress=lz4 ./ex48cori /tmp/ex48cori
srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine 1

According to the NERSC documentation, "numactl" should be added when using flat mode. Previously I tried cache mode, but the performance was unaffected. I also compared 256 Haswell nodes against 256 KNL nodes, and Haswell is nearly 4-5x faster. However, I suspect this drastic difference has much to do with the initial coarse grid size now being extremely small.

I'll give the COPTFLAGS a try and see what happens.

Thanks,
Justin

On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills <[email protected]> wrote:
> Hi Justin,
>
> How is the MCDRAM (on-package "high-bandwidth memory") configured for your
> KNL runs? And if it is in "flat" mode, what are you doing to ensure that
> you use the MCDRAM? Doing this wrong seems to be one of the most common
> reasons for unexpected poor performance on KNL.
>
> I'm not that familiar with the environment on Cori, but I think that if
> you are building for KNL, you should add "-xMIC-AVX512" to your compiler
> flags to explicitly instruct the compiler to use the AVX512 instruction
> set. I usually use something along the lines of
>
> 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512'
>
> (The "-g" just adds symbols, which make the output from performance
> profiling tools much more useful.)
>
> That said, I think that if you are comparing 1024 Haswell cores vs. 1024
> KNL cores (so double the number of Haswell nodes), I'm not surprised that
> the simulations are almost twice as fast using the Haswell nodes. Keep in
> mind that individual KNL cores are much less powerful than individual
> Haswell cores. You are also using roughly twice the power footprint (a
> dual-socket Haswell node should be roughly equivalent to a KNL node, I
> believe). How do things look when you compare equal numbers of nodes?
>
> Cheers,
> Richard
>
> On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang <[email protected]> wrote:
>
>> Hi all,
>>
>> On NERSC's Cori I have the following configure options for PETSc:
>>
>> ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0
>> --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn
>> --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1
>> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt
>>
>> Here I swapped out the default Intel programming environment for that
>> of Cray (i.e., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I
>> want to document the performance difference between Cori's Haswell and KNL
>> processors.
>>
>> When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and
>> 16 KNL nodes), the simulations are almost twice as fast on Haswell nodes.
>> This leads me to suspect that I am not doing something right for KNL. Does
>> anyone know what some "optimal" configure options for running PETSc on
>> KNL are?
>>
>> Thanks,
>> Justin
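The `-c 4` in the job script's srun line can be sanity-checked with a little arithmetic. The sketch below assumes Cori-like node shapes (68 physical cores with 4 hardware threads per KNL node, 32 physical cores with 2 hardware threads per Haswell node). It also assumes the rule of thumb of giving each rank floor(cores / ranks per node) whole cores, expressed in logical CPUs for `-c`. The function name `cpus_per_task` is hypothetical.

```shell
#!/bin/bash
# Sketch: compute an srun -c (logical CPUs per task) value so that each MPI
# rank gets whole physical cores.
# Assumed node shapes (Cori-like, not queried from the system):
#   KNL:     68 physical cores x 4 hardware threads
#   Haswell: 32 physical cores x 2 hardware threads
cpus_per_task() {
  local cores=$1 threads=$2 ranks=$3 nodes=$4
  local per_node=$(( ranks / nodes ))
  # Reject layouts with no ranks per node or more ranks than physical cores
  if (( per_node < 1 || per_node > cores )); then
    echo "cpus_per_task: $per_node ranks per node does not fit on $cores cores" >&2
    return 1
  fi
  # Integer division floors the whole cores per rank; scale to logical CPUs
  echo $(( (cores / per_node) * threads ))
}

# 1024 ranks on 16 KNL nodes -> 64 ranks per node
echo "KNL:     srun -n 1024 -c $(cpus_per_task 68 4 1024 16)"
# 1024 ranks on 32 Haswell nodes -> 32 ranks per node
echo "Haswell: srun -n 1024 -c $(cpus_per_task 32 2 1024 32)"
```

With 1024 ranks this gives `-c 4` on 16 KNL nodes and `-c 2` on 32 Haswell nodes. So the KNL setting in the job script is consistent with one rank per physical core, leaving 4 of the 68 cores per node idle.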
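Arithmetic can partly answer Richard's question about whether the MCDRAM is actually being used. In quad,flat mode the MCDRAM appears as a separate NUMA node (node 1 in the `numactl -p 1` call). `-p` (preferred) falls back to DDR once that node is full, whereas `-m` (membind) makes further allocations fail instead. The sketch assumes 16 GiB of MCDRAM per node, as on Cori's KNL nodes, and a per-rank memory footprint that would have to be measured separately. The function names and the 300 MiB example footprint are hypothetical.

```shell
#!/bin/bash
# Sketch: per-rank MCDRAM budget in flat mode.
# Assumption: 16 GiB (16384 MiB) of MCDRAM per KNL node, exposed as
# NUMA node 1 when the node is booted in quad,flat mode.
MCDRAM_MIB=16384

# MCDRAM available to each rank if the node's ranks share it evenly
mcdram_budget_mib() {
  local ranks_per_node=$1
  echo $(( MCDRAM_MIB / ranks_per_node ))
}

# Succeeds if a per-rank footprint (in MiB, measured separately) fits
fits_in_mcdram() {
  local ranks_per_node=$1 footprint_mib=$2
  (( footprint_mib <= $(mcdram_budget_mib "$ranks_per_node") ))
}

budget=$(mcdram_budget_mib 64)
echo "64 ranks/node -> ${budget} MiB of MCDRAM per rank"
if fits_in_mcdram 64 300; then
  echo "300 MiB/rank fits; 'numactl -m 1' would also work"
else
  echo "300 MiB/rank spills: 'numactl -p 1' falls back to DDR, 'numactl -m 1' would fail allocations"
fi
```

At 64 ranks per node the budget is 256 MiB per rank. Under `-p 1`, a rank that needs more than that would be served partly from the slower DDR, without any error being reported.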
