[PATCH v4 00/10] Coregroup support on Powerpc

2020-07-26, Srikar Dronamraju
Changelog v3 -> v4:
v3: 
https://lore.kernel.org/lkml/20200723085116.4731-1-sri...@linux.vnet.ibm.com/t/#u

powerpc/smp: Create coregroup domain
If coregroup_support doesn't exist, update the MC mask to the next
smaller domain mask (a sketch follows below).
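
As a side note for reviewers, the fallback can be pictured with the
minimal sketch below. It is illustrative only: has_coregroup_support(),
fixup_topology() and the MC_IDX position are assumed names for this
sketch, not necessarily the identifiers used in the patch.

#include <linux/init.h>
#include <linux/types.h>
#include <linux/sched/topology.h>

/* Assumed names for this sketch, not necessarily the patch's own. */
extern bool has_coregroup_support(void);
extern struct sched_domain_topology_level powerpc_topology[];
#define MC_IDX 2	/* assumed position of the MC entry */

static void __init fixup_topology(void)
{
	/*
	 * Without coregroup support, reuse the next smaller level's
	 * mask for MC; the scheduler then degenerates the duplicate
	 * domain instead of building a meaningless one.
	 */
	if (!has_coregroup_support())
		powerpc_topology[MC_IDX].mask = powerpc_topology[MC_IDX - 1].mask;
}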

Changelog v2 -> v3:
v2: 
https://lore.kernel.org/linuxppc-dev/20200721113814.32284-1-sri...@linux.vnet.ibm.com/t/#u

powerpc/smp: Cache node for reuse
Removed the node caching part. Rewrote the commit msg (Michael Ellerman)
Renamed to powerpc/smp: Fix a warning under !NEED_MULTIPLE_NODES

powerpc/smp: Enable small core scheduling sooner
Rewrote changelog (Gautham)
Renamed to powerpc/smp: Move topology fixups into a new function

powerpc/smp: Create coregroup domain
Add optimization for mask updates under coregroup_support

Changelog v1 -> v2:
v1: 
https://lore.kernel.org/linuxppc-dev/20200714043624.5648-1-sri...@linux.vnet.ibm.com/t/#u

powerpc/smp: Merge Power9 topology with Power topology
Replaced a reference to cpu_smt_mask with per_cpu(cpu_sibling_map, cpu)
since cpu_smt_mask is only defined under CONFIG_SCHED_SMT

powerpc/smp: Enable small core scheduling sooner
Restored the previous info msg (Jordan)
Moved big core topology fixup to fixup_topology (Gautham)

powerpc/smp: Dont assume l2-cache to be superset of sibling
Set cpumask after verifying l2-cache. (Gautham)

powerpc/smp: Generalize 2nd sched domain
Moved shared_cache topology fixup to fixup_topology (Gautham)

Powerpc/numa: Detect support for coregroup
Explained Coregroup in commit msg (Michael Ellerman)

Powerpc/smp: Create coregroup domain
Moved coregroup topology fixup to fixup_topology (Gautham)

powerpc/smp: Implement cpu_to_coregroup_id
Moved the coregroup_enabled check before getting the associativity
(Gautham); a sketch follows after this changelog.

powerpc/smp: Provide an ability to disable coregroup
Patch dropped (Michael Ellerman)
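
As context for the cpu_to_coregroup_id item above, here is a rough,
illustrative sketch of the shape such a lookup can take. It is not the
patch's exact code: coregroup_enabled, vphn_get_associativity() and the
associativity index used are assumptions for this sketch.

#include <linux/cpumask.h>
#include <linux/of.h>

/* Assumed for this sketch. */
extern bool coregroup_enabled;
extern long vphn_get_associativity(unsigned long cpu, __be32 *associativity);

int cpu_to_coregroup_id(int cpu)
{
	__be32 associativity[VPHN_ASSOC_BUFSIZE] = { 0 };

	if (cpu < 0 || cpu >= nr_cpu_ids)
		return -1;

	/* Check coregroup_enabled before the (more expensive)
	 * associativity lookup -- the reordering noted above. */
	if (!coregroup_enabled)
		goto out;

	if (vphn_get_associativity(cpu, associativity))
		goto out;

	/* Assumed: the coregroup id sits at a fixed level. */
	return of_read_number(&associativity[1], 1);

out:
	return cpu_to_core_id(cpu);
}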

This series cleans up the existing powerpc topologies and adds coregroup
support on powerpc. A coregroup is a group (a subset) of the cores of a
DIE that share a resource.
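
Concretely, the end state can be pictured as a new MC level slotted
between the cache and DIE levels of the powerpc topology table. The
sketch below is illustrative: cpu_mc_mask(), cpu_coregroup_mask(),
shared_cache_mask() and the flag callbacks are assumed names standing
in for whatever the series actually defines.

#include <linux/sched/topology.h>

/* Assumed/illustrative helpers. */
extern const struct cpumask *cpu_coregroup_mask(int cpu);
extern const struct cpumask *shared_cache_mask(int cpu);
extern int powerpc_smt_flags(void);
extern int powerpc_shared_cache_flags(void);

/* CPUs in @cpu's coregroup: a resource-sharing subset of a DIE. */
static const struct cpumask *cpu_mc_mask(int cpu)
{
	return cpu_coregroup_mask(cpu);
}

static struct sched_domain_topology_level powerpc_topology[] = {
#ifdef CONFIG_SCHED_SMT
	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
#endif
	{ shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
	{ cpu_mc_mask, SD_INIT_NAME(MC) },
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ NULL, },
};

On systems without coregroup support the MC mask collapses into the
level below it, which is why the Power 8 domain listings later in this
mail look identical before and after the patchset.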

Patch 7 of this series, "Powerpc/numa: Detect support for coregroup",
depends on
https://lore.kernel.org/linuxppc-dev/20200707140644.7241-1-sri...@linux.vnet.ibm.com/t/#u
However, it should be easy to rebase the patch without the one above.

This patch series is based on top of the current powerpc/next tree plus
the above patch.

On Power 8 Systems
------------------
$ tail /proc/cpuinfo
processor   : 255
cpu : POWER8 (architected), altivec supported
clock   : 3724.00MHz
revision: 2.1 (pvr 004b 0201)

timebase: 512000000
platform: pSeries
model   : IBM,8408-E8E
machine : CHRP IBM,8408-E8E
MMU : Hash

Before the patchset
-------------------
$ cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
SMT
DIE
NUMA
NUMA
$ head /proc/schedstat
version 15
timestamp 4295534931
cpu0 0 0 0 0 0 0 41389823338 17682779896 14117
domain0 ,,,,,,,00ff 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 ,,,,,,, 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 ,,,,,,, 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 ,,,,,,, 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 27087859050 152273672 10396
domain0 ,,,,,,,00ff 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 ,,,,,,, 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

After the patchset
------------------
$ cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
SMT
DIE
NUMA
NUMA
$ head /proc/schedstat
version 15
timestamp 4295534931
cpu0 0 0 0 0 0 0 41389823338 17682779896 14117
domain0 ,,,,,,,00ff 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 ,,,,,,, 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 ,,,,,,, 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 ,,,,,,, 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 27087859050 152273672 10396
domain0 ,,,,,,,00ff 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 ,0


Re: [PATCH v4 00/10] Coregroup support on Powerpc

2020-07-26, Srikar Dronamraju
* Srikar Dronamraju  [2020-07-27 10:47:55]:


> Changelog v3 ->v4:
> v3: 
> https://lore.kernel.org/lkml/20200723085116.4731-1-sri...@linux.vnet.ibm.com/t/#u
> 
> powerpc/smp: Create coregroup domain
>   if coregroup_support doesn't exist, update MC mask to the next
>   smaller domain mask.
> 

Sorry for the double post of v4.
Please follow the other thread.

http://lore.kernel.org/lkml/20200727053230.19753-1-sri...@linux.vnet.ibm.com/t/#u

> 

-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH v4 00/10] Coregroup support on Powerpc

2020-07-30, Srikar Dronamraju
* Srikar Dronamraju  [2020-07-27 11:02:20]:

> Changelog v3 ->v4:
> v3: 
> https://lore.kernel.org/lkml/20200723085116.4731-1-sri...@linux.vnet.ibm.com/t/#u
>

Here is a summary of some of the testing done with the coregroup v4
patchset. It includes ebizzy, schbench, perf bench sched pipe and
topology verification. On the left side are results from the
powerpc/next tree, and on the right are the results with the patchset
applied. Topological verification clearly shows that there is no change
in topology with and without the patches on all three classes of
systems that were tested.

(Left column: powerpc/next. Right column: powerpc/next + Coregroup
Support v4 patchset.)

Power 9 PowerNV (2 Node / 160 CPU System)
-----------------------------------------
ebizzy (throughput over 100 iterations of 30 seconds each; higher is better)

            powerpc/next    + coregroup v4
N                    100               100
Min               993884            910470
Max              1276090           1279820
Median           1173476           1171095
Avg              1165914           1162091
Stddev         54867.201          67363.28

schbench (latency, hence lower is better)

Latency percentiles (usec)    powerpc/next    + coregroup v4
50.0th:                                455               454
75.0th:                                533               543
90.0th:                                683               701
95.0th:                                743               737
*99.0th:                               815               805
99.5th:                                839               835
99.9th:                                913               893
min, max:                          0, 1011           0, 2833

perf bench sched pipe (less time and higher ops/sec is better)
(1000000 pipe operations between two processes)

                      powerpc/next    + coregroup v4
Total time [sec]             6.083             6.303
usecs/op                  6.083576          6.303318
ops/sec                     164377            158646


Power 9 LPAR (2 Node / 128 CPU System)
--------------------------------------
ebizzy (throughput over 100 iterations of 30 seconds each; higher is better)

            powerpc/next    + coregroup v4
N                    100               100
Min              1058029            943264
Max              1295393           1287619
Median           1200414           1180522
Avg            1188306.7         1168473.2
Stddev         56786.538         64469.955

schbench (latency, hence lower is better)

Latency percentiles (usec)    powerpc/next    + coregroup v4
50.0000th:                              34                39
75.0000th:                              46                52
90.0000th:                              53                68
95.0000th:                              56                77
*99.0000th:                             61                89
99.5000th:                              63                94
99.9000th:                              81               169
min, max:                           0, 8405          0, 23674

perf bench sched pipe (less time and higher ops/sec is better)
(1000000 pipe operations between two processes)

                      powerpc/next    + coregroup v4
Total time [sec]             8.768             5.217
usecs/op                  8.768400          5.217625
ops/sec                     114045            191658

Power 8 LPAR (8 Node / 256 CPU System)
--------------------------------------
ebizzy (throughput over 100 iterations of 30 seconds each; higher is better)

            powerpc/next    + coregroup v4
N                    100               100
Min              1267615           1175357
Max              1965234           1924262
Median           1707423           1691104
Avg            1689137.6         1664792.1
Stddev         144363.29          145876.4

schbench (latency, hence lower is better)