On 2021/1/25 17:04, Mel Gorman wrote:
> On Mon, Jan 25, 2021 at 12:29:47PM +0800, Li, Aubrey wrote:
>>>>> hackbench -l 2560 -g 1 on 8 cores arm64
>>>>> v5.11-rc4         : 1.355 (+/- 7.96)
>>>>> + sis improvement : 1.923 (+/- 25%)
>>>>> + the patch below : 1.332 (+/- 4.95)
>>>>>
>>>>> hackbench -l 2560 -g 256 on 8 cores arm64
>>>>> v5.11-rc4         : 2.116 (+/- 4.62%)
>>>>> + sis improvement : 2.216 (+/- 3.84%)
>>>>> + the patch below : 2.113 (+/- 3.01%)
>>>>>
>>
>> Results from 4 benchmarks came in over the weekend, run with patch 3 on an
>> x86 4-socket system with 24 cores per socket and 2 HT per core, 192 CPUs
>> in total.
>>
>> It looks like mid-load shows notable changes on my side:
>> - netperf with 50% of the threads in TCP mode improved by 27.25%
>> - tbench with 50% of the threads regressed by 9.52%
>>
>
> It's interesting that patch 3 would make any difference on x64 given that
> it's SMT2. The scan depth should have been similar. It's somewhat expected
> that it will not be a universal win, particularly once the utilisation
> is high enough to spill over in sched domains (25%, 50%, 75% utilisation
> being interesting on 4-socket systems). In such cases, double scanning can
> still show improvements for workloads that idle rapidly like tbench and
> hackbench even though it's expensive. The extra scanning gives more time
> for a CPU to go idle enough to be selected which can improve throughput
> but at the cost of wake-up latency,

Aha, sorry for the confusion. Since you and Vincent discussed dropping patch 3,
I just meant to point out that I tested all 5 patches including patch 3, not
patch 3 alone.

>
> Hopefully v4 can be tested as well which is now just a single scan.
>

Sure. May I know the baseline for v4?

Thanks,
-Aubrey