Hello Chuck,

Thanks for running and posting those performance numbers. Sadly, it seems that a 1:1 thread-to-core ratio is usually the most efficient use of CPU cycles.
It's interesting to see how this architecture scales with a large number of processes, when each core seems designed for 8 lighter-weight threads. I was hoping to run a similar performance test on lhcp-rh6, which has 80 virtual cores (4 sockets, each with 10 cores plus hyper-threading). Unfortunately, I need to get more familiar with ninja, as my timing results appear to come from cached compilations rather than from actually running the compiler (a sketch of the kind of clean, timed run I have in mind is at the end of this message). I hope we are able to improve ITK threading performance with this system. But given how finicky this type of performance tuning is, and since I don't have direct access to the system to easily run performance analysis, I am a little unclear on how best to utilize it.

Thanks!
Brad

On Apr 23, 2015, at 6:21 PM, Chuck Atkins <[email protected]> wrote:

> In case anybody's interested, here's the "spread_numa.sh" script I use to
> evenly distribute across NUMA domains and bind to CPU cores:
>
> ----------BEGIN spread_numa.sh----------
> #!/bin/bash
>
> # Evenly spread a command across NUMA domains for a given number of CPU cores
> function spread()
> {
>     NUM_CORES=$1
>     shift
>
>     # Use this wicked awk script to parse the numactl hardware layout and
>     # select an equal number of cores from each NUMA domain, evenly spaced
>     # across each domain
>     SPREAD="$(numactl -H | sed -n 's|.*cpus: \(.*\)|\1|p' | \
>         awk -v NC=${NUM_CORES} -v ND=${NUMA_DOMAINS} \
>             'BEGIN{CPD=NC/ND} {S=NF/CPD; for(C=0;C<CPD;C++){F0=C*S; F1=(F0==int(F0)?F0:int(F0)+1)+1; printf("%d", $F1); if(!(NR==ND && C==CPD-1)){printf(",")} } }')"
>
>     echo Executing: numactl --physcpubind=${SPREAD} "$@"
>     numactl --physcpubind=${SPREAD} "$@"
> }
>
> # Check command arguments
> if [ $# -lt 2 ]
> then
>     echo "Usage: $0 [NUM_CORES_TO_USE] [cmd [arg1] ... [argn]]"
>     exit 1
> fi
>
> # Determine the total number of CPU cores
> MAX_CORES=$(numactl -s | sed -n 's|physcpubind: \(.*\)|\1|p' | wc -w)
>
> # Determine the total number of NUMA domains
> NUMA_DOMAINS=$(numactl -H | sed -n 's|available: \([0-9]*\).*|\1|p')
>
> # Verify the number of cores is sane
> NUM_CORES=$1
> shift
> if [ $NUM_CORES -gt $MAX_CORES ]
> then
>     echo "WARNING: $NUM_CORES cores is out of bounds. Setting to $MAX_CORES cores."
>     NUM_CORES=$MAX_CORES
> fi
> if [ $((NUM_CORES%NUMA_DOMAINS)) -ne 0 ]
> then
>     TMP=$(( ((NUM_CORES/NUMA_DOMAINS) + 1) * NUMA_DOMAINS ))
>     echo "WARNING: $NUM_CORES core(s) are not evenly divided across $NUMA_DOMAINS NUMA domains. Setting to $TMP."
>     NUM_CORES=$TMP
> fi
>
> echo "Using ${NUM_CORES}/${MAX_CORES} cores across ${NUMA_DOMAINS} NUMA domains"
>
> spread ${NUM_CORES} "$@"
> ----------END spread_numa.sh----------
>
>
> - Chuck
>
> On Thu, Apr 23, 2015 at 4:57 PM, Chuck Atkins <[email protected]> wrote:
> (re-sent for the rest of the dev list)
> Hi Bradley,
>
> It's pretty fast. The interesting numbers are for 20, 40, 80, and 160, which
> align with 1:1, 2:1, 4:1, and 8:1 thread-to-core ratios. Starting from the
> already configured ITKLinuxPOWER8 currently being built, I did a ninja clean
> and then "time ninja -jN". Watching the CPU load for 20, 40, and 80 cores,
> though, I see a fair amount of both process migration and unbalanced thread
> distribution; i.e., for -j20 I'll often see 2 cores with 6 or 8 threads and
> the rest with only 1 or 2. So in addition to the -jN settings, I also ran
> 20, 40, and 80 threads using numactl with fixed binding to physical CPU cores
> to evenly distribute the threads across cores and prevent thread migration.
> See timings below in seconds:
>
>   Threads            Real          User         Sys    Total CPU Time
>   20               1037.097    19866.685     429.796        20296.481
>   20 (NUMA bind)    915.910    16290.589     319.017        16609.606
>   40                713.772    26953.663     556.960        27510.623
>   40 (NUMA bind)    641.924    22442.685     432.379        22875.064
>   80                588.357    40970.439     822.944        41793.383
>   80 (NUMA bind)    538.801    35366.297     637.922        36004.219
>   160               572.492    62542.901    1289.864        63832.765
>   160 (NUMA bind)   549.742    61864.666    1242.975        63107.641
>
> So it seems like core binding gives us roughly a 10% performance increase
> for the 20-, 40-, and 80-thread configurations (closer to 4% at 160). And
> while clearly the core-locked 4:1 gave us the best wall-clock time, looking
> at the total CPU time (user + sys), the 1:1 configuration looks to be the
> most efficient in actual cycles used.
>
> It's interesting to watch how the whole system gets used for most of the
> build, but everything periodically gets gated on a handful of linker
> processes. And of course, it's always cool to see a screen cap of htop with
> a whole boatload of cores at 100%.
>
>
> - Chuck
>
> On Thu, Apr 23, 2015 at 10:01 AM, Bradley Lowekamp <[email protected]> wrote:
> Matt,
>
> I'd love to explore the build performance of this system.
>
> Any chance you could run clean builds of ITK on this system with
> 20, 40, 60, 80, 100, 120, 140, and 160 processes and record the timings?
>
> I am very curious how this unique system scales with multiple heavyweight
> processes, as its design appears to be uniquely suited to lighter-weight
> multi-threading.
>
> Thanks,
> Brad
>
> On Apr 22, 2015, at 11:51 PM, Matt McCormick <[email protected]> wrote:
>
> > Hi folks,
> >
> > With thanks to Chuck Atkins and FSF France, we have a new build on the
> > dashboard [1] for the IBM POWER8 [2] system. This is a PowerPC64
> > system with 20 cores and 8 threads per core -- a great system where we
> > can test and improve ITK parallel computing performance!
> >
> > To generate a test build on Gerrit, add
> >
> >   request build: power8
> >
> > in a review's comments.
> >
> > There are currently some build warnings and test failures that should
> > be addressed before we will be able to use the system effectively. Any
> > help here is appreciated.
> >
> > Thanks,
> > Matt
> >
> > [1] https://open.cdash.org/index.php?project=Insight&date=2015-04-22&filtercount=1&showfilters=1&field1=site/string&compare1=63&value1=gcc112
> >
> > [2] https://en.wikipedia.org/wiki/POWER8
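
P.S. For reference, here is a sketch of the kind of clean, timed run I have in mind, combining "ninja clean" with Chuck's spread_numa.sh for core binding. The build directory, script location, and -j value below are placeholders rather than something I have actually run on this machine:

    # start from an already-configured ITK Ninja build tree (path is hypothetical)
    cd ~/build/ITK-power8

    # wipe previous build outputs so the timings reflect real compilation,
    # not cached results
    ninja clean

    # unbound run: 20 parallel jobs (1:1 with the POWER8's 20 cores)
    time ninja -j20

    # bound run: let spread_numa.sh pin the same 20 jobs evenly across NUMA domains
    ninja clean
    time ./spread_numa.sh 20 ninja -j20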
