* Srikar Dronamraju <sri...@linux.vnet.ibm.com> wrote:

> * Rik van Riel <r...@redhat.com> [2015-06-16 10:39:13]:
>
> > On 06/16/2015 07:56 AM, Srikar Dronamraju wrote:
> > > This is consistent with all other load balancing instances where we
> > > absorb unfairness upto env->imbalance_pct. Absorbing unfairness upto
> > > env->imbalance_pct allows to pull and retain task to their preferred
> > > nodes.
> > >
> > > Signed-off-by: Srikar Dronamraju <sri...@linux.vnet.ibm.com>
> >
> > How does this work with other workloads, eg.
> > single instance SPECjbb2005, or two SPECjbb2005
> > instances on a four node system?
> >
> > Is the load still balanced evenly between nodes
> > with this patch?
> >
>
> Yes, I have looked at mpstat logs while running SPECjbb2005 for 1 JVM per
> System, 2 JVMs per System and 4 JVMs per System and observed that the
> load spreading was similar with and without this patch.
>
> Also I have visualized using htop when running 0.5X (i.e 48 threads on
> 96 cpu system) cpu stress workloads to see that the spread is similar
> before and after the patch.
>
> Please let me know if there are any better ways to observe the
> spread. [...]

There are. I see you are using prehistoric tooling, but see the various NUMA
convergence latency measurement utilities in 'perf bench numa':

  triton:~/tip> perf bench numa mem -h
  # Running 'numa/mem' benchmark:

   # Running main, "perf bench numa numa-mem -h"

   usage: perf bench numa <options>

      -p, --nr_proc <n>            number of processes
      -t, --nr_threads <n>         number of threads per process
      -G, --mb_global <MB>         global memory (MBs)
      -P, --mb_proc <MB>           process memory (MBs)
      -L, --mb_proc_locked <MB>    process serialized/locked memory access (MBs), <= process_memory
      -T, --mb_thread <MB>         thread memory (MBs)
      -l, --nr_loops <n>           max number of loops to run
      -s, --nr_secs <n>            max number of seconds to run
      -u, --usleep <n>             usecs to sleep per loop iteration
      -R, --data_reads             access the data via reads (can be mixed with -W)
      -W, --data_writes            access the data via writes (can be mixed with -R)
      -B, --data_backwards         access the data backwards as well
      -Z, --data_zero_memset       access the data via glibc bzero only
      -r, --data_rand_walk         access the data with random (32bit LFSR) walk
      -z, --init_zero              bzero the initial allocations
      -I, --init_random            randomize the contents of the initial allocations
      -0, --init_cpu0              do the initial allocations on CPU#0
      -x, --perturb_secs <n>       perturb thread 0/0 every X secs, to test convergence stability
      -d, --show_details           Show details
      -a, --all                    Run all tests in the suite
      -H, --thp <n>                MADV_NOHUGEPAGE < 0 < MADV_HUGEPAGE
      -c, --show_convergence       show convergence details
      -m, --measure_convergence    measure convergence latency
      -q, --quiet                  quiet mode
      -S, --serialize-startup      serialize thread startup
      -C, --cpus <cpu[,cpu2,...cpuN]>
                                   bind the first N tasks to these specific cpus (the rest is unbound)
      -M, --memnodes <node[,node2,...nodeN]>
                                   bind the first N tasks to these specific memory nodes (the rest is unbound)

'-m' will measure convergence.
'-c' will visualize it.
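
As a rough sketch of how '-m' and '-c' combine with the sizing options
listed above: the helper below only assembles a command line from flags
that appear in the help output. The function name 'numa_conv_cmd', the RUN
switch and all numeric values are made up for illustration, they are not
taken from any existing script.

```shell
#!/bin/bash
# Sketch only: build a 'perf bench numa mem' command line that measures
# (-m) and visualizes (-c) convergence. Flag names come from the help
# output; the helper name and the values passed in are assumptions.

numa_conv_cmd() {
	# $1: number of processes           (-p)
	# $2: threads per process           (-t)
	# $3: per-process memory in MB      (-P)
	# $4: max runtime in seconds        (-s)
	# $5: hugepage hint for --thp       (< 0: MADV_NOHUGEPAGE, > 0: MADV_HUGEPAGE)
	printf 'perf bench numa mem -p %s -t %s -P %s -s %s --thp=%s -m -c\n' \
		"$1" "$2" "$3" "$4" "$5"
}

# Dry run by default: print the command instead of executing it.
cmd=$(numa_conv_cmd 2 16 1024 60 1)
echo "$cmd"

# Set RUN=1 to actually execute it (needs a perf binary with 'bench numa').
if [ "${RUN:-0}" = 1 ]; then
	$cmd
fi
```

Without RUN=1 this just prints the command, so it is safe to try on a box
that has no perf installed. Passing a negative last argument flips the run
to MADV_NOHUGEPAGE, which makes before/after THP comparisons a one-argument
change.
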
'--thp' can be used to turn hugepages on/off.

For example you can create a 'numa02' work-alike by doing:

  vega:~> cat numa02
  #!/bin/bash

  perf bench numa mem --no-data_rand_walk -p 1 -t 32 -G 0 -P 0 -T 32 -l 800 -zZ0c "$@"

This perf bench numa command mimics numa02 pretty exactly on a 32 CPU system.

This will run it in a loop:

  vega:~> cat numa02-loop

  while :; do
    ./numa02 2>&1 | grep runtime-max/thread
    sleep 1
  done

Or here are various numa01 work-alikes:

  vega:~> cat numa01
  perf bench numa mem --no-data_rand_walk -p 2 -t 16 -G 0 -P 3072 -T 0 -l 50 -zZ0c "$@"

  vega:~> cat numa01-hard-bind
  ./numa01 --cpus=0-16_16x16#16 --memnodes=0x16,2x16

or numa01-thread-alloc:

  vega:~> cat numa01-THREAD_ALLOC
  perf bench numa mem --no-data_rand_walk -p 2 -t 16 -G 0 -P 0 -T 192 -l 1000 -zZ0c "$@"

You can generate very flexible setups of NUMA access patterns, and measure
their behavior accurately.

It's all so much more capable and more flexible than autonumabench ...

Also, when you are trying to report numbers for multiple runs, please use
something like:

  perf stat --null --repeat 3 ...

This will run the workload 3 times (doing only time measurement) and report
the stddev in a human readable form.

Thanks,

	Ingo