Re: [PATCH] Doc: admin-guide: Add entry for kvm_cma_resv_ratio kernel param

2020-09-16 Thread Satheesh Rajendran
Hi Randy,

Thanks for the comments, will send a V2 fixing them.

On Tue, Sep 15, 2020 at 11:18:52PM -0700, Randy Dunlap wrote:
> On 9/15/20 11:11 PM, sathn...@linux.vnet.ibm.com wrote:
> > From: Satheesh Rajendran 
> > 
> > Add document entry for kvm_cma_resv_ratio kernel param which
> > is used to alter the KVM contiguous memory allocation percentage
> > for hash pagetable allocation used by hash mode PowerPC KVM guests.
> > 
> > Cc: linux-kernel@vger.kernel.org
> > Cc: kvm-...@vger.kernel.org
> > Cc: linuxppc-...@lists.ozlabs.org
> > Cc: Paul Mackerras 
> > Cc: Michael Ellerman 
> > Cc: Jonathan Corbet   
> > Signed-off-by: Satheesh Rajendran 
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt | 9 +
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> > b/Documentation/admin-guide/kernel-parameters.txt
> > index a1068742a6df..9cb126573c71 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -599,6 +599,15 @@
> > altogether. For more information, see
> > include/linux/dma-contiguous.h
> >  
> > +kvm_cma_resv_ratio=n
> > +[PPC]
> 
> You can put [PPC] on the line above.
> 
sure
> > +Reserves given percentage from system memory area 
> > for
> > +contiguous memory allocation for KVM hash pagetable
> > +allocation.
> > +Bydefault it reserves 5% of total system memory.
> 
>  By default
> 
> > +Format: 
> > +Default: 5
> > +
> 
> and please use tabs for indentation, not all spaces.
> 
sure
> > cmo_free_hint=  [PPC] Format: { yes | no }
> > Specify whether pages are marked as being inactive
> > when they are freed.  This is used in CMO environments
> > 
> 
> Entries in kernel-parameters.txt should be sorted into dictionary order,
> so please put that with the other kvm parameters.
> 
sure.
> thanks.
> -- 
> ~Randy
> 

Thanks!
-Satheesh.
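For reference, an entry that folds in all three of Randy's comments ([PPC] moved onto the parameter line, "By default" fixed, tab indentation, sorted among the other kvm_* parameters) might look like the sketch below. This is a guess at what V2 could contain, not the posted V2; the "Format: <integer>" value in particular is an assumption, since the archive stripped the original Format line.

```
	kvm_cma_resv_ratio=n [PPC]
			Reserves given percentage from system memory area
			for contiguous memory allocation for KVM hash
			pagetable allocation.
			By default it reserves 5% of total system memory.
			Format: <integer>
			Default: 5
```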


Re: [PATCH v5 05/10] powerpc/smp: Dont assume l2-cache to be superset of sibling

2020-09-13 Thread Satheesh Rajendran
On Fri, Sep 11, 2020 at 09:55:23PM +1000, Michael Ellerman wrote:
> Srikar Dronamraju  writes:
> > Current code assumes that cpumask of cpus sharing a l2-cache mask will
> > always be a superset of cpu_sibling_mask.
> >
> > Lets stop that assumption. cpu_l2_cache_mask is a superset of
> > cpu_sibling_mask if and only if shared_caches is set.
> 
> I'm seeing oopses with this:
> 
> [0.117392][T1] smp: Bringing up secondary CPUs ...
> [0.156515][T1] smp: Brought up 2 nodes, 2 CPUs
> [0.158265][T1] numa: Node 0 CPUs: 0
> [0.158520][T1] numa: Node 1 CPUs: 1
> [0.167453][T1] BUG: Unable to handle kernel data access on read at 
> 0x800041228298
> [0.167992][T1] Faulting instruction address: 0xc018c128
> [0.168817][T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.168964][T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [0.169417][T1] Modules linked in:
> [0.170047][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.9.0-rc2-00095-g7430ad5aa700 #209
> [0.170305][T1] NIP:  c018c128 LR: c018c0cc CTR: 
> c004dce0
> [0.170498][T1] REGS: c0007e343880 TRAP: 0380   Not tainted  
> (5.9.0-rc2-00095-g7430ad5aa700)
> [0.170602][T1] MSR:  82009033   CR: 
> 4400  XER: 
> [0.170985][T1] CFAR: c018c288 IRQMASK: 0
> [0.170985][T1] GPR00:  c0007e343b10 
> c173e400 4000
> [0.170985][T1] GPR04:  0800 
> 0800 
> [0.170985][T1] GPR08:  c122c298 
> c0003fffc000 c0007fd05ce8
> [0.170985][T1] GPR12: c0007e0119f8 c193 
> 8ade 
> [0.170985][T1] GPR16: c0007e3c0640 0917 
> c0007e3c0658 0008
> [0.170985][T1] GPR20: c15d0bb8 8ade 
> c0f57400 c1817c28
> [0.170985][T1] GPR24: c176dc80 c0007e3c0890 
> c0007e3cfe00 
> [0.170985][T1] GPR28: c1772310 c0007e011900 
> c0007e3c0800 0001
> [0.172750][T1] NIP [c018c128] build_sched_domains+0x808/0x14b0
> [0.172900][T1] LR [c018c0cc] build_sched_domains+0x7ac/0x14b0
> [0.173186][T1] Call Trace:
> [0.173484][T1] [c0007e343b10] [c018bfe8] 
> build_sched_domains+0x6c8/0x14b0 (unreliable)
> [0.173821][T1] [c0007e343c50] [c018dcdc] 
> sched_init_domains+0xec/0x130
> [0.174037][T1] [c0007e343ca0] [c10d59d8] 
> sched_init_smp+0x50/0xc4
> [0.174207][T1] [c0007e343cd0] [c10b45c4] 
> kernel_init_freeable+0x1b4/0x378
> [0.174378][T1] [c0007e343db0] [c00129fc] 
> kernel_init+0x24/0x158
> [0.174740][T1] [c0007e343e20] [c000d9d0] 
> ret_from_kernel_thread+0x5c/0x6c
> [0.175050][T1] Instruction dump:
> [0.175626][T1] 554905ee 71480040 7d2907b4 4182016c 2c29 3920006e 
> 913e002c 41820034
> [0.175841][T1] 7c6307b4 e9300020 78631f24 7d58182a <7d2a482a> 
> f93e0080 7d404828 314a0001
> [0.178340][T1] ---[ end trace 6876b88dd1d4b3bb ]---
> [0.178512][T1]
> [1.180458][T1] Kernel panic - not syncing: Attempted to kill init! 
> exitcode=0x000b
> 
> That's qemu:
> 
> qemu-system-ppc64 -nographic -vga none -M pseries -cpu POWER8 \
>   -kernel build~/vmlinux \
>   -m 2G,slots=2,maxmem=4G \
>   -object memory-backend-ram,size=1G,id=m0 \
>   -object memory-backend-ram,size=1G,id=m1 \
>   -numa node,nodeid=0,memdev=m0 \
>   -numa node,nodeid=1,memdev=m1 \
>   -smp 2,sockets=2,maxcpus=2  \

PowerKVM guest vCPUs do not yet have L2 and L3 cache elements in the device tree.
I raised a bug about this some time ago; probably related?
https://bugs.launchpad.net/qemu/+bug/1774605


Regards,
-Satheesh.
> 
> 
> On mambo I get:
> 
> [0.005069][T1] smp: Bringing up secondary CPUs ...
> [0.011656][T1] smp: Brought up 2 nodes, 8 CPUs
> [0.011682][T1] numa: Node 0 CPUs: 0-3
> [0.011709][T1] numa: Node 1 CPUs: 4-7
> [0.012015][T1] BUG: arch topology borken
> [0.012040][T1]  the SMT domain not a subset of the CACHE domain
> [0.012107][T1] BUG: Unable to handle kernel data access on read at 
> 0x8001012e7398
> [0.012142][T1] Faulting instruction address: 0xc01aa4f0
> [0.012174][T1] Oops: Kernel access of bad area, sig: 11 [#1]
> [0.012206][T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
> [0.012236][T1] Modules linked in:
> [0.012264][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.9.0-rc2-00095-g7430ad5aa700 #1
> [0.012304][T1] NIP:  c01aa4f0 LR: c01aa498 CTR: 
> 
> [0.012341][T1] REGS: c000ef583880 TRAP: 0380   Not tainted  
> 

Re: [PATCH 0/7] Optimization to improve cpu online/offline on Powerpc

2020-07-29 Thread Satheesh Rajendran
On Mon, Jul 27, 2020 at 01:25:25PM +0530, Srikar Dronamraju wrote:
> Anton reported that his 4096 cpu (1024 cores in a socket) was taking too
> long to boot. He also analyzed that most of the time was being spent on
> updating cpu_core_mask.
> 
> Here are some optimizations and fixes to make ppc64_cpu --smt=8/ppc64_cpu
> --smt=1 run faster and hence boot the kernel also faster.
> 
> Its based on top of my v4 coregroup support patchset.
> http://lore.kernel.org/lkml/20200727053230.19753-1-sri...@linux.vnet.ibm.com/t/#u
> 
> The first two patches should solve Anton's immediate problem.
> On the unofficial patches, Anton reported that the boot time came from 30
> mins to 6 seconds. (Basically a high core count in a single socket
> configuration). Satheesh also reported similar numbers.
> 
> The rest are simple cleanups/optimizations.
> 
> Since cpu_core_mask is an exported symbol for a long duration, lets retain
> as a snapshot of cpumask_of_node.

Boot tested on a P9 KVM guest.

without this series:
# dmesg|grep smp
[0.066624] smp: Bringing up secondary CPUs ...
[  347.521264] smp: Brought up 1 node, 2048 CPUs

with this series:
# dmesg|grep smp
[0.067744] smp: Bringing up secondary CPUs ...
[    5.416910] smp: Brought up 1 node, 2048 CPUs

Tested-by: Satheesh Rajendran 

Regards,
-Satheesh
> 
> Architecture:ppc64le
> Byte Order:  Little Endian
> CPU(s):  160
> On-line CPU(s) list: 0-159
> Thread(s) per core:  4
> Core(s) per socket:  20
> Socket(s):   2
> NUMA node(s):2
> Model:   2.2 (pvr 004e 1202)
> Model name:  POWER9, altivec supported
> CPU max MHz: 3800.
> CPU min MHz: 2166.
> L1d cache:   32K
> L1i cache:   32K
> L2 cache:512K
> L3 cache:10240K
> NUMA node0 CPU(s):   0-79
> NUMA node8 CPU(s):   80-159
> 
> without patch (powerpc/next)
> [0.099347] smp: Bringing up secondary CPUs ...
> [0.832513] smp: Brought up 2 nodes, 160 CPUs
> 
> with powerpc/next + coregroup support patchset
> [0.099241] smp: Bringing up secondary CPUs ...
> [0.835627] smp: Brought up 2 nodes, 160 CPUs
> 
> with powerpc/next + coregroup + this patchset
> [0.097232] smp: Bringing up secondary CPUs ...
> [0.528457] smp: Brought up 2 nodes, 160 CPUs
> 
> x ppc64_cpu --smt=1
> + ppc64_cpu --smt=4
> 
> without patch
> N   Min   MaxMedian   AvgStddev
> x 100 11.82 17.06 14.01 14.05 1.2665247
> + 100 12.25 16.59 13.86   14.1143  1.164293
> 
> with patch
> N   Min   MaxMedian   AvgStddev
> x 100 12.68 16.15 14.2414.2380.75489246
> + 100 12.93 15.85 14.35   14.28970.60041813
> 
> Cc: linuxppc-dev 
> Cc: LKML 
> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Cc: Anton Blanchard 
> Cc: Oliver O'Halloran 
> Cc: Nathan Lynch 
> Cc: Michael Neuling 
> Cc: Gautham R Shenoy 
> Cc: Satheesh Rajendran 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Valentin Schneider 
> 
> Srikar Dronamraju (7):
>   powerpc/topology: Update topology_core_cpumask
>   powerpc/smp: Stop updating cpu_core_mask
>   powerpc/smp: Remove get_physical_package_id
>   powerpc/smp: Optimize remove_cpu_from_masks
>   powerpc/smp: Limit cpus traversed to within a node.
>   powerpc/smp: Stop passing mask to update_mask_by_l2
>   powerpc/smp: Depend on cpu_l1_cache_map when adding cpus
> 
>  arch/powerpc/include/asm/smp.h  |  5 --
>  arch/powerpc/include/asm/topology.h |  7 +--
>  arch/powerpc/kernel/smp.c   | 79 +
>  3 files changed, 24 insertions(+), 67 deletions(-)
> 
> -- 
> 2.17.1
> 


[PATCH V2] powerpc/pseries/svm: Remove unwanted check for shared_lppaca_size

2020-06-19 Thread Satheesh Rajendran
Early secure guest boot hits the below crash when booting with a vCPU count
that aligns the shared LPPACA area exactly to a page boundary (e.g. 64, 128,
etc. vCPUs with a 64k PAGE size and 1k LPPACA size), due to the BUG_ON assert
firing when shared_lppaca_size equals shared_lppaca_total_size:

 [0.00] Partition configured for 64 cpus.
 [0.00] CPU maps initialized for 1 thread per core
 [0.00] [ cut here ]
 [0.00] kernel BUG at arch/powerpc/kernel/paca.c:89!
 [0.00] Oops: Exception in kernel mode, sig: 5 [#1]
 [0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries

The strict check is not necessary for this legitimate case; let's relax it to
catch only true overflow.

Fixes: bd104e6db6f0 ("powerpc/pseries/svm: Use shared memory for LPPACA 
structures")
Cc: linux-kernel@vger.kernel.org
Cc: Michael Ellerman 
Cc: Thiago Jung Bauermann 
Cc: Ram Pai 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Reviewed-by: Laurent Dufour 
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Satheesh Rajendran 
---

V2:
Added Reviewed-by tags from Thiago and Laurent.
Added Fixes tag, as suggested by Thiago.

V1: 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200609105731.14032-1-sathn...@linux.vnet.ibm.com/
 
---
 arch/powerpc/kernel/paca.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 2168372b792d..74da65aacbc9 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -87,7 +87,7 @@ static void *__init alloc_shared_lppaca(unsigned long size, 
unsigned long align,
 * This is very early in boot, so no harm done if the kernel crashes at
 * this point.
 */
-   BUG_ON(shared_lppaca_size >= shared_lppaca_total_size);
+   BUG_ON(shared_lppaca_size > shared_lppaca_total_size);
 
return ptr;
 }
-- 
2.26.2



[PATCH V2] powerpc/pseries/svm: Drop unused align argument in alloc_shared_lppaca() function

2020-06-12 Thread Satheesh Rajendran
Argument "align" in alloc_shared_lppaca() was unused inside the function.
Let's drop it and update the code comment about page alignment.

Cc: linux-kernel@vger.kernel.org
Cc: Thiago Jung Bauermann 
Cc: Ram Pai 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Cc: Michael Ellerman 
Reviewed-by: Thiago Jung Bauermann 
Signed-off-by: Satheesh Rajendran 
---

V2:
Added Reviewed-by from Thiago.
Dropped the align argument, as suggested by Michael.
Reworded the commit message.

V1: 
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200609113909.17236-1-sathn...@linux.vnet.ibm.com/
---
 arch/powerpc/kernel/paca.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 8d96169c597e..a174d64d9b4d 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -57,8 +57,8 @@ static void *__init alloc_paca_data(unsigned long size, 
unsigned long align,
 
 #define LPPACA_SIZE 0x400
 
-static void *__init alloc_shared_lppaca(unsigned long size, unsigned long 
align,
-   unsigned long limit, int cpu)
+static void *__init alloc_shared_lppaca(unsigned long size, unsigned long 
limit,
+   int cpu)
 {
size_t shared_lppaca_total_size = PAGE_ALIGN(nr_cpu_ids * LPPACA_SIZE);
static unsigned long shared_lppaca_size;
@@ -68,6 +68,12 @@ static void *__init alloc_shared_lppaca(unsigned long size, 
unsigned long align,
if (!shared_lppaca) {
memblock_set_bottom_up(true);
 
+   /* See Documentation/powerpc/ultravisor.rst for more details
+*
+* UV/HV data sharing is at PAGE granularity. In order to
+* minimize the number of pages shared and maximize the
+* use of each page, let's page-align the allocation.
+*/
shared_lppaca =
memblock_alloc_try_nid(shared_lppaca_total_size,
   PAGE_SIZE, MEMBLOCK_LOW_LIMIT,
@@ -122,7 +128,7 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned 
long limit)
return NULL;
 
if (is_secure_guest())
-   lp = alloc_shared_lppaca(LPPACA_SIZE, 0x400, limit, cpu);
+   lp = alloc_shared_lppaca(LPPACA_SIZE, limit, cpu);
else
lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu);
 
-- 
2.26.2



[PATCH] powerpc/pseries/svm: Fixup align argument in alloc_shared_lppaca() function

2020-06-09 Thread Satheesh Rajendran
The "align" argument to alloc_shared_lppaca() was unused inside the
function. Let's fix that and update the code comment.

Cc: linux-kernel@vger.kernel.org
Cc: Thiago Jung Bauermann 
Cc: Ram Pai 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Signed-off-by: Satheesh Rajendran 
---
 arch/powerpc/kernel/paca.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 8d96169c597e..9088e107fb43 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -70,7 +70,7 @@ static void *__init alloc_shared_lppaca(unsigned long size, 
unsigned long align,
 
shared_lppaca =
memblock_alloc_try_nid(shared_lppaca_total_size,
-  PAGE_SIZE, MEMBLOCK_LOW_LIMIT,
+  align, MEMBLOCK_LOW_LIMIT,
   limit, NUMA_NO_NODE);
if (!shared_lppaca)
panic("cannot allocate shared data");
@@ -122,7 +122,14 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned 
long limit)
return NULL;
 
if (is_secure_guest())
-   lp = alloc_shared_lppaca(LPPACA_SIZE, 0x400, limit, cpu);
+   /*
+* See Documentation/powerpc/ultravisor.rst for more details
+*
+* UV/HV data sharing is at PAGE granularity. In order to minimize
+* the number of pages shared and maximize the use of each page,
+* let's page-align the allocation.
+*/
+   lp = alloc_shared_lppaca(LPPACA_SIZE, PAGE_SIZE, limit, cpu);
else
lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu);
 
-- 
2.26.2



[PATCH] powerpc/pseries/svm: Remove unwanted check for shared_lppaca_size

2020-06-09 Thread Satheesh Rajendran
Early secure guest boot hits the below crash when booting with a vCPU count
that aligns the shared LPPACA area exactly to a page boundary (e.g. 64, 128,
etc. vCPUs with a 64k PAGE size and 1k LPPACA size), due to the BUG_ON assert
firing when shared_lppaca_size equals shared_lppaca_total_size:

 [0.00] Partition configured for 64 cpus.
 [0.00] CPU maps initialized for 1 thread per core
 [0.00] [ cut here ]
 [0.00] kernel BUG at arch/powerpc/kernel/paca.c:89!
 [0.00] Oops: Exception in kernel mode, sig: 5 [#1]
 [0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries

The strict check is not necessary for this legitimate case; let's relax it to
catch only true overflow.

Cc: linux-kernel@vger.kernel.org
Cc: Thiago Jung Bauermann 
Cc: Ram Pai 
Cc: Sukadev Bhattiprolu 
Cc: Laurent Dufour 
Signed-off-by: Satheesh Rajendran 
---
 arch/powerpc/kernel/paca.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 949eceb25..10b7c54a7 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -86,7 +86,7 @@ static void *__init alloc_shared_lppaca(unsigned long size, 
unsigned long align,
 * This is very early in boot, so no harm done if the kernel crashes at
 * this point.
 */
-   BUG_ON(shared_lppaca_size >= shared_lppaca_total_size);
+   BUG_ON(shared_lppaca_size > shared_lppaca_total_size);
 
return ptr;
 }
-- 
2.26.2



Re: [mainline][Oops][bisected 2ba3e6 ] 5.7.0 boot fails with kernel panic on powerpc

2020-06-03 Thread Satheesh Rajendran
On Wed, Jun 03, 2020 at 03:32:57PM +0200, Joerg Roedel wrote:
> On Wed, Jun 03, 2020 at 04:20:57PM +0530, Abdul Haleem wrote:
> > @Joerg, Could you please have a look?
> 
> Can you please try the attached patch?

Hi Joerg,

I hit a similar boot failure on a Power9 bare-metal box (call trace in the
Note below), and your patch below resolved it for my environment;
the system now boots fine.

...
Fedora 31 (Thirty One)
Kernel 5.7.0-gd6f9469a0-dirty on an ppc64le (hvc0)

 login:


Tested-by: Satheesh Rajendran 

Note: for the record, here is the boot failure call trace.

[0.023555] mempolicy: Enabling automatic NUMA balancing. Configure with 
numa_balancing= or the kernel.numa_balancing sysctl
[0.023582] pid_max: default: 163840 minimum: 1280
[0.035014] BUG: Unable to handle kernel data access on read at 
0xc060
[0.035058] Faulting instruction address: 0xc0382304
[0.035074] Oops: Kernel access of bad area, sig: 11 [#1]
[0.035097] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[0.035113] Modules linked in:
[0.035136] CPU: 24 PID: 0 Comm: swapper/24 Not tainted 5.7.0-gd6f9469a0 #1
[0.035161] NIP:  c0382304 LR: c038407c CTR: 
[0.035197] REGS: c167f930 TRAP: 0300   Not tainted  
(5.7.0-gd6f9469a0)
[0.035241] MSR:  92009033   CR: 
42022422  XER: 
[0.035294] CFAR: c03822fc DAR: c060 DSISR: 4000 
IRQMASK: 0 
[0.035294] GPR00: c038407c c167fbc0 c168090[  
150.252645597,5] OPAL: Reboot request...
[  150.252928266,5] RESET: Initiating fast reboot 1...
0 c008 
[0.035294] GPR04:  01ff c008001f 
0060 
[0.035294] GPR08: 6000 0005 c060 
c0080020 
[0.035294] GPR12: 22022422 c187 c000 
c008 
[0.035294] GPR16: c00807ff c0080020  
c060 
[0.035294] GPR20: c0080800 c0080800 c00807ff 
c00807ff 
[0.035294] GPR24: c163f7c8 c172d0c0 0001 
0001 
[0.035294] GPR28: c1708000 c172d0c8  
c0080800 
[0.035622] NIP [c0382304] map_kernel_range_noflush+0x274/0x510
[0.035657] LR [c038407c] __vmalloc_node_range+0x2ec/0x3a0
[0.035690] Call Trace:
[0.035709] [c167fbc0] [c038d848] 
__alloc_pages_nodemask+0x158/0x3f0 (unreliable)
[0.035750] [c167fc90] [c038407c] 
__vmalloc_node_range+0x2ec/0x3a0
[0.035787] [c167fd40] [c0384268] __vmalloc+0x58/0x70
[0.035823] [c167fdb0] [c1056db8] 
alloc_large_system_hash+0x204/0x304
[0.035870] [c167fe60] [c105c1f0] vfs_caches_init+0xd8/0x138
[0.035916] [c167fee0] [c10242a0] start_kernel+0x644/0x6ec
[0.035960] [c167ff90] [c000ca9c] 
start_here_common+0x1c/0x400
[0.036004] Instruction dump:
[0.036016] 3af4 6000 6000 38c90010 7f663036 7d667a14 7cc600d0 
7d713038 
[0.036038] 38d1 7c373040 41810008 7e91a378  2c25 418201b4 
7f464830 
[0.036083] ---[ end trace c7e72029dfacc217 ]---
[0.036114] 
[1.036223] Kernel panic - not syncing: Attempted to kill the idle task!
[1.036858] Rebooting in 10 seconds..


Regards,
-Satheesh.

> 
> diff --git a/include/asm-generic/5level-fixup.h 
> b/include/asm-generic/5level-fixup.h
> index 58046ddc08d0..afbab31fbd7e 100644
> --- a/include/asm-generic/5level-fixup.h
> +++ b/include/asm-generic/5level-fixup.h
> @@ -17,6 +17,11 @@
>   ((unlikely(pgd_none(*(p4d))) && __pud_alloc(mm, p4d, address)) ? \
>   NULL : pud_offset(p4d, address))
> 
> +#define pud_alloc_track(mm, p4d, address, mask)  
> \
> + ((unlikely(pgd_none(*(p4d))) && 
> \
> +   (__pud_alloc(mm, p4d, address) || 
> ({*(mask)|=PGTBL_P4D_MODIFIED;0;})))?   \
> +   NULL : pud_offset(p4d, address))
> +
>  #define p4d_alloc(mm, pgd, address)  (pgd)
>  #define p4d_alloc_track(mm, pgd, address, mask)  (pgd)
>  #define p4d_offset(pgd, start)   (pgd)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7e07f4f490cb..d46bf03b804f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2088,35 +2088,35 @@ static inline pud_t *pud_alloc(struct mm_struct *mm, 
> p4d_t *p4d,
>   NULL : pud_offset(p4d, address);
>  }
> 
> -static inline p4d_t *p4d_alloc_track(struct mm_struct *mm, pgd_t *pgd,
> +static inline pud_t *pud_alloc_track(struct mm_struct *mm, p4d_t *p4d,
>   

Re: [PATCH] recordmcount: Fix spurious mcount entries on powerpc

2019-06-27 Thread Satheesh Rajendran
On Thu, Jun 27, 2019 at 12:08:01AM +0530, Naveen N. Rao wrote:
> The recent change enabling HAVE_C_RECORDMCOUNT on powerpc started
> showing the following issue:
> 
>   # modprobe kprobe_example
>ftrace-powerpc: Not expected bl: opcode is 3c4c0001
>WARNING: CPU: 0 PID: 227 at kernel/trace/ftrace.c:2001 
> ftrace_bug+0x90/0x318
>Modules linked in:
>CPU: 0 PID: 227 Comm: modprobe Not tainted 5.2.0-rc6-00678-g1c329100b942 #2
>NIP:  c0264318 LR: c025d694 CTR: c0f5cd30
>REGS: c1f2b7b0 TRAP: 0700   Not tainted  
> (5.2.0-rc6-00678-g1c329100b942)
>MSR:  90010282b033   CR: 
> 28228222  XER: 
>CFAR: c02642fc IRQMASK: 0
>
>NIP [c0264318] ftrace_bug+0x90/0x318
>LR [c025d694] ftrace_process_locs+0x4f4/0x5e0
>Call Trace:
>[c1f2ba40] [0004] 0x4 (unreliable)
>[c1f2bad0] [c025d694] ftrace_process_locs+0x4f4/0x5e0
>[c1f2bb90] [c020ff10] load_module+0x25b0/0x30c0
>[c1f2bd00] [c0210cb0] sys_finit_module+0xc0/0x130
>[c1f2be20] [c000bda4] system_call+0x5c/0x70
>Instruction dump:
>419e0018 2f83 419e00bc 2f83ffea 409e00cc 481c 0fe0 3c62ff96
>3901 3940 386386d0 48c4 <0fe0> 3ce20003 3901 3c62ff96
>---[ end trace 4c438d5cebf78381 ]---
>ftrace failed to modify
>[] 0xc008012a0008
> actual:   01:00:4c:3c
>Initializing ftrace call sites
>ftrace record flags: 200
> (0)
> expected tramp: c006af4c
> 
> Looking at the relocation records in __mcount_loc showed a few spurious
> entries:
>   RELOCATION RECORDS FOR [__mcount_loc]:
>   OFFSET   TYPE  VALUE
>    R_PPC64_ADDR64.text.unlikely+0x0008
>   0008 R_PPC64_ADDR64.text.unlikely+0x0014
>   0010 R_PPC64_ADDR64.text.unlikely+0x0060
>   0018 R_PPC64_ADDR64.text.unlikely+0x00b4
>   0020 R_PPC64_ADDR64.init.text+0x0008
>   0028 R_PPC64_ADDR64.init.text+0x0014
> 
> The first entry in each section is incorrect. Looking at the relocation
> records, the spurious entries correspond to the R_PPC64_ENTRY records:
>   RELOCATION RECORDS FOR [.text.unlikely]:
>   OFFSET   TYPE  VALUE
>    R_PPC64_REL64 .TOC.-0x0008
>   0008 R_PPC64_ENTRY *ABS*
>   0014 R_PPC64_REL24 _mcount
>   
> 
> The problem is that we are not validating the return value from
> get_mcountsym() in sift_rel_mcount(). With this entry, mcountsym is 0,
> but Elf_r_sym(relp) also ends up being 0. Fix this by ensuring mcountsym
> is valid before processing the entry.
> 
> Fixes: c7d64b560ce80 ("powerpc/ftrace: Enable C Version of recordmcount")
> Signed-off-by: Naveen N. Rao 
> ---
>  scripts/recordmcount.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
Hi,

The latest commit on the linuxppc merge branch, 850f6274c5b5, was failing to
boot on an IBM Power8 box, and the failure is resolved with this patch.
https://github.com/linuxppc/issues/issues/254

# git log -2 --oneline
0e0f55b31ea8 (HEAD -> issue_254) recordmcount: Fix spurious mcount entries on 
powerpc
850f6274c5b5 (origin/merge, merge) Automatic merge of branches 'master', 'next' 
and 'fixes' into merge
# uname -r
5.2.0-rc6-00123-g0e0f55b31ea8
 
Tested-by: Satheesh Rajendran 

Regards,
-Satheesh
> 
> diff --git a/scripts/recordmcount.h b/scripts/recordmcount.h
> index 13c5e6c8829c..47fca2c69a73 100644
> --- a/scripts/recordmcount.h
> +++ b/scripts/recordmcount.h
> @@ -325,7 +325,8 @@ static uint_t *sift_rel_mcount(uint_t *mlocp,
>   if (!mcountsym)
>   mcountsym = get_mcountsym(sym0, relp, str0);
> 
> - if (mcountsym == Elf_r_sym(relp) && !is_fake_mcount(relp)) {
> + if (mcountsym && mcountsym == Elf_r_sym(relp) &&
> + !is_fake_mcount(relp)) {
>   uint_t const addend =
>   _w(_w(relp->r_offset) - recval + mcount_adjust);
>   mrelp->r_offset = _w(offbase
> -- 
> 2.22.0
> 



Re: [PATCH v5 0/5] powerpc: system call table generation support

2018-12-17 Thread Satheesh Rajendran
Hi Firoz,

On Thu, Dec 13, 2018 at 02:32:45PM +0530, Firoz Khan wrote:
> The purpose of this patch series is, we can easily
> add/modify/delete system call table support by cha-
> nging entry in syscall.tbl file instead of manually
> changing many files. The other goal is to unify the 
> system call table generation support implementation 
> across all the architectures. 
> 
> The system call tables are in different format in 
> all architecture. It will be difficult to manually
> add, modify or delete the system calls in the resp-
> ective files manually. To make it easy by keeping a 
> script and which'll generate uapi header file and 
> syscall table file.
> 
> syscall.tbl contains the list of available system 
> calls along with system call number and correspond-
> ing entry point. Add a new system call in this arch-
> itecture will be possible by adding new entry in 
> the syscall.tbl file.
> 
> Adding a new table entry consisting of:
> - System call number.
> - ABI.
> - System call name.
> - Entry point name.
>   - Compat entry name, if required.
>   - spu entry name, if required.
> 
> ARM, s390 and x86 architecuture does exist the sim-
> ilar support. I leverage their implementation to 
> come up with a generic solution.
> 
> I have done the same support for work for alpha, 
> ia64, m68k, microblaze, mips, parisc, sh, sparc, 
> and xtensa. Below mentioned git repository contains
> more details about the workflow.
> 
> https://github.com/frzkhn/system_call_table_generator/
> 
> Finally, this is the ground work to solve the Y2038
> issue. We need to add two dozen of system calls to 
> solve Y2038 issue. So this patch series will help to
> add new system calls easily by adding new entry in the
> syscall.tbl.
> 
> Changes since v4:
>  - DOTSYM macro removed for ppc32, which was causing
>the compilation error.
> 
> Changes since v3:
>  - split compat syscall table out from native table.
>  - modified the script to add new line in the generated
>file.
> 
> Changes since v2:
>  - modified/optimized the syscall.tbl to avoid duplicate
>for the spu entries.
>  - updated the syscalltbl.sh to meet the above point.
> 
> Changes since v1:
>  - optimized/updated the syscall table generation 
>scripts.
>  - fixed all mixed indentation issues in syscall.tbl.
>  - added "comments" in syscall_*.tbl.
>  - changed from generic-y to generated-y in Kbuild.
> 
> Firoz Khan (5):
>   powerpc: add __NR_syscalls along with NR_syscalls
>   powerpc: move macro definition from asm/systbl.h
>   powerpc: add system call table generation support
>   powerpc: split compat syscall table out from native table
>   powerpc: generate uapi header and system call table files

I tried to apply the series on Linus' "master" branch and on the linuxppc-dev
(https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git)
"merge" branch;
it failed to apply on both.

# git am mbox
Applying: powerpc: add __NR_syscalls along with NR_syscalls
Applying: powerpc: move macro definition from asm/systbl.h
Applying: powerpc: add system call table generation support
Applying: powerpc: split compat syscall table out from native table
Applying: powerpc: generate uapi header and system call table files
error: patch failed: arch/powerpc/include/uapi/asm/Kbuild:1
error: arch/powerpc/include/uapi/asm/Kbuild: patch does not apply
Patch failed at 0005 powerpc: generate uapi header and system call table files
Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

I then tried the linuxppc-dev
(https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git)
"next" branch;
the series applied, compiled with ppc64le_defconfig, and booted on an IBM Power8 box.

# uname -r
4.20.0-rc2-gdd2690d2c

Looks like the patch series needs a rebase against the latest kernel trees.


Thanks,
-Satheesh.

> 
>  arch/powerpc/Makefile   |   3 +
>  arch/powerpc/include/asm/Kbuild |   4 +
>  arch/powerpc/include/asm/syscall.h  |   3 +-
>  arch/powerpc/include/asm/systbl.h   | 396 --
>  arch/powerpc/include/asm/unistd.h   |   3 +-
>  arch/powerpc/include/uapi/asm/Kbuild|   2 +
>  arch/powerpc/include/uapi/asm/unistd.h  | 389 +
>  arch/powerpc/kernel/Makefile|  10 -
>  arch/powerpc/kernel/entry_64.S  |   7 +-
>  arch/powerpc/kernel/syscalls/Makefile   |  63 
>  arch/powerpc/kernel/syscalls/syscall.tbl| 427 
> 
>  arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
>  arch/powerpc/kernel/syscalls/syscalltbl.sh  |  36 +++
>  arch/powerpc/kernel/systbl.S|  40 ++-
>  arch/powerpc/kernel/systbl_chk.c|  60 
>  arch/powerpc/kernel/vdso.c  | 

Re: [PATCH v4 0/5] powerpc: system call table generation support

2018-12-06 Thread Satheesh Rajendran
On Fri, Dec 07, 2018 at 11:41:35AM +0530, Firoz Khan wrote:
> The purpose of this patch series is, we can easily
> add/modify/delete system call table support by cha-
> nging entry in syscall.tbl file instead of manually
> changing many files. The other goal is to unify the 
> system call table generation support implementation 
> across all the architectures. 
> 
> The system call tables are in different format in 
> all architecture. It will be difficult to manually
> add, modify or delete the system calls in the resp-
> ective files manually. To make it easy by keeping a 
> script and which'll generate uapi header file and 
> syscall table file.
> 
> syscall.tbl contains the list of available system 
> calls along with system call number and correspond-
> ing entry point. Add a new system call in this arch-
> itecture will be possible by adding new entry in 
> the syscall.tbl file.
> 
> Adding a new table entry consisting of:
> - System call number.
> - ABI.
> - System call name.
> - Entry point name.
>   - Compat entry name, if required.
>   - spu entry name, if required.
> 
> ARM, s390 and x86 architecuture does exist the sim-
> ilar support. I leverage their implementation to 
> come up with a generic solution.
> 
> I have done the same support for work for alpha, 
> ia64, m68k, microblaze, mips, parisc, sh, sparc, 
> and xtensa. Below mentioned git repository contains
> more details about the workflow.
> 
> https://github.com/frzkhn/system_call_table_generator/
> 
> Finally, this is the ground work to solve the Y2038
> issue. We need to add two dozen of system calls to 
> solve Y2038 issue. So this patch series will help to
> add new system calls easily by adding new entry in the
> syscall.tbl.
> 
> Changes since v3:
>  - split compat syscall table out from native table.
>  - modified the script to add new line in the generated
>file.
Hi Firoz,

This version (v4) boots fine on an IBM Power8 box.
Compiled with 'ppc64le_defconfig' against
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=merge=a26b21082959cee3075b3edb7ef23071c7e0b09a

For reference, the v3 failure:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-November/182110.html

Regards,
-Satheesh.
> 
> Changes since v2:
>  - modified/optimized the syscall.tbl to avoid duplicate
>for the spu entries.
>  - updated the syscalltbl.sh to meet the above point.
> 
> Changes since v1:
>  - optimized/updated the syscall table generation 
>scripts.
>  - fixed all mixed indentation issues in syscall.tbl.
>  - added "comments" in syscall_*.tbl.
>  - changed from generic-y to generated-y in Kbuild.
> 
> Firoz Khan (5):
>   powerpc: add __NR_syscalls along with NR_syscalls
>   powerpc: move macro definition from asm/systbl.h
>   powerpc: add system call table generation support
>   powerpc: split compat syscall table out from native table
>   powerpc: generate uapi header and system call table files
> 
>  arch/powerpc/Makefile   |   3 +
>  arch/powerpc/include/asm/Kbuild |   4 +
>  arch/powerpc/include/asm/syscall.h  |   3 +-
>  arch/powerpc/include/asm/systbl.h   | 396 --
>  arch/powerpc/include/asm/unistd.h   |   3 +-
>  arch/powerpc/include/uapi/asm/Kbuild|   2 +
>  arch/powerpc/include/uapi/asm/unistd.h  | 389 +
>  arch/powerpc/kernel/Makefile|  10 -
>  arch/powerpc/kernel/entry_64.S  |   7 +-
>  arch/powerpc/kernel/syscalls/Makefile   |  63 
>  arch/powerpc/kernel/syscalls/syscall.tbl| 427 
> 
>  arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
>  arch/powerpc/kernel/syscalls/syscalltbl.sh  |  36 +++
>  arch/powerpc/kernel/systbl.S|  40 ++-
>  arch/powerpc/kernel/systbl_chk.c|  60 
>  arch/powerpc/kernel/vdso.c  |   7 +-
>  arch/powerpc/platforms/cell/spu_callbacks.c |  17 +-
>  17 files changed, 606 insertions(+), 898 deletions(-)
>  delete mode 100644 arch/powerpc/include/asm/systbl.h
>  create mode 100644 arch/powerpc/kernel/syscalls/Makefile
>  create mode 100644 arch/powerpc/kernel/syscalls/syscall.tbl
>  create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
>  create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh
>  delete mode 100644 arch/powerpc/kernel/systbl_chk.c
> 
> -- 
> 1.9.1
> 
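
As a non-authoritative illustration of the table-entry fields described in the quoted cover letter (the call names below are made up, not real powerpc system calls), each syscall.tbl row is one whitespace-separated line per call:

```
# <number> <ABI> <name> <entry point> [<compat entry>] [<spu entry>]
# illustrative entries only:
100	common	foo_call	sys_foo_call
101	common	bar_call	sys_bar_call	compat_sys_bar_call
```

The generation scripts (syscallhdr.sh / syscalltbl.sh) then turn rows like these into the uapi unistd.h numbers and the in-kernel dispatch table.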



Re: System not booting since dm changes? (was Linux 4.20-rc1)

2018-11-05 Thread Satheesh Rajendran
On Mon, Nov 05, 2018 at 08:51:57AM -0500, Mike Snitzer wrote:
> On Mon, Nov 05 2018 at  5:25am -0500,
> Michael Ellerman  wrote:
> 
> > Linus Torvalds  writes:
> > ...
> > > Mike Snitzer (1):
> > > device mapper updates
> > 
> > Hi Mike,
> > 
> > Replying here because I can't find the device-mapper pull or the patch
> > in question on LKML. I guess I should be subscribed to dm-devel.
> > 
> > We have a box that doesn't boot any more, bisect points at one of:
> > 
> >   cef6f55a9fb4 Mike Snitzer   dm table: require that request-based DM 
> > be layered on blk-mq devices 
> >   953923c09fe8 Mike Snitzer   dm: rename DM_TYPE_MQ_REQUEST_BASED to 
> > DM_TYPE_REQUEST_BASED 
> >   6a23e05c2fe3 Jens Axboe dm: remove legacy request-based IO path 
> > 
> > 
> > It's a Power8 system running Rawhide, it does have multipath, but I'm
> > told it was set up by the Fedora installer, i.e. nothing fancy.
> > 
> > The symptom is the system can't find its root filesystem and drops into
> > the initramfs shell. The dmesg includes a bunch of errors like below:
> > 
> >   [   43.263460] localhost multipathd[1344]: sdb: fail to get serial
> >   [   43.268762] localhost multipathd[1344]: mpatha: failed in domap for 
> > addition of new path sdb
> >   [   43.268762] localhost multipathd[1344]: uevent trigger error
> >   [   43.282065] localhost kernel: device-mapper: table: table load 
> > rejected: not all devices are blk-mq request-stackable
> ...
> >
> > Any ideas what's going wrong here?
> 
> "table load rejected: not all devices are blk-mq request-stackable"
> speaks to the fact that you aren't using blk-mq for scsi (aka scsi-mq).
> 
> You need to use scsi_mod.use_blk_mq=Y on the kernel commandline (or set
> CONFIG_SCSI_MQ_DEFAULT in your kernel config)

Thanks Mike! The above solution worked and the system boots fine now :-)

# uname -r
4.20.0-rc1+
# cat /proc/cmdline 
root=/dev/mapper/fedora_ltc--test--ci2-root ro 
rd.lvm.lv=fedora_ltc-test-ci2/root rd.lvm.lv=fedora_ltc-test-ci2/swap 
scsi_mod.use_blk_mq=Y

CONFIG_SCSI_MQ_DEFAULT was not set in my kernel config; I will set it in 
future runs.
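
For reference, the change amounts to appending one command-line token; a minimal shell sketch of the append logic (the root= value here is illustrative, and on a real system the result would go into the bootloader configuration, e.g. via grub, rather than a shell variable):

```shell
# Build a local copy of a kernel command line and append
# scsi_mod.use_blk_mq=Y only if the parameter is not already present.
cmdline="root=/dev/mapper/vg-root ro"

case " $cmdline " in
  *" scsi_mod.use_blk_mq="*) ;;                      # already set, leave as-is
  *) cmdline="$cmdline scsi_mod.use_blk_mq=Y" ;;
esac

echo "$cmdline"
```

Alternatively, building with CONFIG_SCSI_MQ_DEFAULT=y makes the parameter unnecessary.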

Thanks Michael!

Regards,
-Satheesh.

> 
> Mike
> 



[tip:perf/core] perf bench numa: Fixup discontiguous/sparse numa nodes

2017-11-28 Thread tip-bot for Satheesh Rajendran
Commit-ID:  321a7c35c90cc834851ceda18a8ee18f1d032b92
Gitweb: https://git.kernel.org/tip/321a7c35c90cc834851ceda18a8ee18f1d032b92
Author: Satheesh Rajendran <sathn...@linux.vnet.ibm.com>
AuthorDate: Wed, 22 Nov 2017 22:13:53 +0530
Committer:  Arnaldo Carvalho de Melo <a...@redhat.com>
CommitDate: Tue, 28 Nov 2017 14:28:10 -0300

perf bench numa: Fixup discontiguous/sparse numa nodes

Certain systems are designed to have sparse/discontiguous nodes.  On
such systems, 'perf bench numa' hangs, shows wrong number of nodes and
shows values for non-existent nodes. Handle this by only taking nodes
that are exposed by kernel to userspace.

Signed-off-by: Satheesh Rajendran <sathn...@linux.vnet.ibm.com>
Reviewed-by: Srikar Dronamraju <sri...@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n@linux.vnet.ibm.com>
Link: 
http://lkml.kernel.org/r/1edbcd353c009e109e93d78f2f46381930c340fe.1511368645.git.sathn...@linux.vnet.ibm.com
Signed-off-by: Balamuruhan S <bal...@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
---
 tools/perf/bench/numa.c | 56 -
 1 file changed, 51 insertions(+), 5 deletions(-)

diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
index d95fdcc..944070e 100644
--- a/tools/perf/bench/numa.c
+++ b/tools/perf/bench/numa.c
@@ -216,6 +216,47 @@ static const char * const numa_usage[] = {
NULL
 };
 
+/*
+ * To get number of numa nodes present.
+ */
+static int nr_numa_nodes(void)
+{
+   int i, nr_nodes = 0;
+
+   for (i = 0; i < g->p.nr_nodes; i++) {
+   if (numa_bitmask_isbitset(numa_nodes_ptr, i))
+   nr_nodes++;
+   }
+
+   return nr_nodes;
+}
+
+/*
+ * To check if given numa node is present.
+ */
+static int is_node_present(int node)
+{
+   return numa_bitmask_isbitset(numa_nodes_ptr, node);
+}
+
+/*
+ * To check given numa node has cpus.
+ */
+static bool node_has_cpus(int node)
+{
+   struct bitmask *cpu = numa_allocate_cpumask();
+   unsigned int i;
+
+   if (cpu && !numa_node_to_cpus(node, cpu)) {
+   for (i = 0; i < cpu->size; i++) {
+   if (numa_bitmask_isbitset(cpu, i))
+   return true;
+   }
+   }
+
+   return false; /* lets fall back to nocpus safely */
+}
+
 static cpu_set_t bind_to_cpu(int target_cpu)
 {
cpu_set_t orig_mask, mask;
@@ -244,12 +285,12 @@ static cpu_set_t bind_to_cpu(int target_cpu)
 
 static cpu_set_t bind_to_node(int target_node)
 {
-   int cpus_per_node = g->p.nr_cpus/g->p.nr_nodes;
+   int cpus_per_node = g->p.nr_cpus / nr_numa_nodes();
cpu_set_t orig_mask, mask;
int cpu;
int ret;
 
-   BUG_ON(cpus_per_node*g->p.nr_nodes != g->p.nr_cpus);
+   BUG_ON(cpus_per_node * nr_numa_nodes() != g->p.nr_cpus);
BUG_ON(!cpus_per_node);
 
ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
@@ -649,7 +690,7 @@ static int parse_setup_node_list(void)
int i;
 
for (i = 0; i < mul; i++) {
-   if (t >= g->p.nr_tasks) {
+   if (t >= g->p.nr_tasks || 
!node_has_cpus(bind_node)) {
printf("\n# NOTE: ignoring bind NODEs 
starting at NODE#%d\n", bind_node);
goto out;
}
@@ -964,6 +1005,8 @@ static void calc_convergence(double runtime_ns_max, double 
*convergence)
sum = 0;
 
for (node = 0; node < g->p.nr_nodes; node++) {
+   if (!is_node_present(node))
+   continue;
nr = nodes[node];
nr_min = min(nr, nr_min);
nr_max = max(nr, nr_max);
@@ -984,8 +1027,11 @@ static void calc_convergence(double runtime_ns_max, 
double *convergence)
process_groups = 0;
 
for (node = 0; node < g->p.nr_nodes; node++) {
-   int processes = count_node_processes(node);
+   int processes;
 
+   if (!is_node_present(node))
+   continue;
+   processes = count_node_processes(node);
nr = nodes[node];
tprintf(" %2d/%-2d", nr, processes);
 
@@ -1291,7 +1337,7 @@ static void print_summary(void)
 
printf("\n ###\n");
printf(" # %d %s will execute (on %d nodes, %d CPUs):\n",
-   g->p.nr_tasks, g->p.nr_tasks == 1 ? "task" : "tasks", 
g->p.nr_nodes, g->p.nr_cpus);
+   g->p.nr_tasks, g->p.nr_tasks == 1 ? "task" : "tasks", 
nr_numa_nodes(), g->p.nr_cpus);
printf(" #  %5dx %5ldMB global  shared mem operations\n",
g->p.nr_loops, g->p.bytes_global/1024/1024);
printf(" #  %5dx %5ldMB process shared mem operations\n",


Re: [PATCH v4 1/1] perf/bench/numa: Fixup discontiguous/sparse numa nodes

2017-11-22 Thread Satheesh Rajendran
On Wed, 2017-11-22 at 21:39 +0530, Naveen N. Rao wrote:
> >   sum += nr;
> >   }
> >   BUG_ON(nr_min > nr_max);
> > -
> >   BUG_ON(sum > g->p.nr_tasks);
> 
> Looks like that change to remove a blank line did slip in, but that's
> a 
> small nit. Apart from that, the patch looks good to me.
> Acked-by: Naveen N. Rao 
> 
Hi Naveen,

Thanks, have sent v5 with that addressed :-)

Regards,
-Satheesh
> - Naveen



Re: [PATCH v3 2/2] perf/bench/numa: Handle discontiguous/sparse numa nodes

2017-11-15 Thread Satheesh Rajendran
Hi Arnaldo,

Please find my reply inline.

On Tue, 2017-10-31 at 12:26 -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Oct 31, 2017 at 08:46:58PM +0530, Naveen N. Rao escreveu:
> > 
> > On 2017/08/21 10:17AM, sathn...@linux.vnet.ibm.com wrote:
> > > 
> > > From: Satheesh Rajendran <sathn...@linux.vnet.ibm.com>
> > > 
> > > Certain systems are designed to have sparse/discontiguous nodes.
> > > On such systems, perf bench numa hangs, shows wrong number of
> > > nodes
> > > and shows values for non-existent nodes. Handle this by only
> > > taking nodes that are exposed by kernel to userspace.
> > > 
> > > Cc: Arnaldo Carvalho de Melo <a...@kernel.org>
> > > Reviewed-by: Srikar Dronamraju <sri...@linux.vnet.ibm.com>
> > > Signed-off-by: Satheesh Rajendran <sathn...@linux.vnet.ibm.com>
> > > Signed-off-by: Balamuruhan S <bal...@linux.vnet.ibm.com>
> > > ---
> > >  tools/perf/bench/numa.c | 17 ++---
> > >  1 file changed, 10 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> > > index 2483174..d44 100644
> > > --- a/tools/perf/bench/numa.c
> > > +++ b/tools/perf/bench/numa.c
> > > @@ -287,12 +287,12 @@ static cpu_set_t bind_to_cpu(int
> > > target_cpu)
> > > 
> > >  static cpu_set_t bind_to_node(int target_node)
> > >  {
> > > - int cpus_per_node = g->p.nr_cpus/g->p.nr_nodes;
> > > + int cpus_per_node = g->p.nr_cpus/nr_numa_nodes();
> > >   cpu_set_t orig_mask, mask;
> > >   int cpu;
> > >   int ret;
> > > 
> > > - BUG_ON(cpus_per_node*g->p.nr_nodes != g->p.nr_cpus);
> > > + BUG_ON(cpus_per_node*nr_numa_nodes() != g->p.nr_cpus);
> > >   BUG_ON(!cpus_per_node);
> > > 
> > >   ret = sched_getaffinity(0, sizeof(orig_mask), &orig_mask);
> > > @@ -692,7 +692,7 @@ static int parse_setup_node_list(void)
> > >   int i;
> > > 
> > >   for (i = 0; i < mul; i++) {
> > > - if (t >= g->p.nr_tasks) {
> > > + if (t >= g->p.nr_tasks ||
> > > !node_has_cpus(bind_node)) {
> > >   printf("\n# NOTE:
> > > ignoring bind NODEs starting at NODE#%d\n", bind_node);
> > >   goto out;
> > >   }
> > > @@ -973,6 +973,7 @@ static void calc_convergence(double
> > > runtime_ns_max, double *convergence)
> > >   int node;
> > >   int cpu;
> > >   int t;
> > > + int processes;
> > > 
> > >   if (!g->p.show_convergence && !g->p.measure_convergence)
> > >   return;
> > > @@ -1007,13 +1008,14 @@ static void calc_convergence(double
> > > runtime_ns_max, double *convergence)
> > >   sum = 0;
> > > 
> > >   for (node = 0; node < g->p.nr_nodes; node++) {
> > > + if (!is_node_present(node))
> > > + continue;
> > >   nr = nodes[node];
> > >   nr_min = min(nr, nr_min);
> > >   nr_max = max(nr, nr_max);
> > >   sum += nr;
> > >   }
> > >   BUG_ON(nr_min > nr_max);
> > > -
> > Looks like an un-necessary change there.
> Right, and I would leave the 'int processes' declaration where it is,
> as
> it is not used outside that loop.
> 
I had hit this compilation error, so I had to move the
declaration above.

  CC   bench/numa.o
bench/numa.c: In function ‘calc_convergence’:
bench/numa.c:1035:3: error: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]
   int processes = count_node_processes(node);
   ^
cc1: all warnings being treated as errors
mv: cannot stat ‘bench/.numa.o.tmp’: No such file or directory
make[4]: *** [bench/numa.o] Error 1
make[3]: *** [bench] Error 2
make[2]: *** [perf-in.o] Error 2
make[1]: *** [sub-make] Error 2
make: *** [all] Error 2

> The move of that declaration to the top of the calc_convergence()
> function made me spend some cycles trying to figure out why that was
> done, only to realize that it was an unnecessary change :-\
> 

Agreed, I would have kept it in the same scope; I will keep it as below:

@@ -984,8 +1026,11 @@ static void calc_convergence(double runtime_ns_max, 
double *convergence)
process_groups = 0;
 
for (node = 0; node < g->p.nr_nodes; node++) {
-   int processes = count_node_processes(node);
+   int processes;
 
+   if (!is_node_present(node))
+   continue;
+   processes = count_node_processes(node);


Re: [PATCH v3 1/2] perf/bench/numa: Add functions to detect sparse numa nodes

2017-11-14 Thread Satheesh Rajendran
Hi Naveen,

Thanks for the detailed review, my comments are inline.

On Tue, 2017-10-31 at 20:44 +0530, Naveen N. Rao wrote:
> Hi Satheesh,
> 
> On 2017/08/21 10:15AM, sathn...@linux.vnet.ibm.com wrote:
> > 
> > From: Satheesh Rajendran <sathn...@linux.vnet.ibm.com>
> > 
> > Added functions 1) to get a count of all nodes that are exposed to
> > userspace. These nodes could be memoryless cpu nodes or cpuless
> > memory
> > nodes, 2) to check given node is present and 3) to check given
> > node has cpus
> > 
> > This information can be used to handle sparse/discontiguous nodes.
> > 
> > Cc: Arnaldo Carvalho de Melo <a...@kernel.org>
> > Reviewed-by: Srikar Dronamraju <sri...@linux.vnet.ibm.com>
> > Signed-off-by: Satheesh Rajendran <sathn...@linux.vnet.ibm.com>
> > Signed-off-by: Balamuruhan S <bal...@linux.vnet.ibm.com>
> > ---
> >  tools/perf/bench/numa.c | 44
> > 
> >  1 file changed, 44 insertions(+)
> > 
> > diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> > index 469d65b..2483174 100644
> > --- a/tools/perf/bench/numa.c
> > +++ b/tools/perf/bench/numa.c
> > @@ -215,6 +215,50 @@ static const char * const numa_usage[] = {
> >     NULL
> >  };
> > 
> > +/*
> > + * To get number of numa nodes present.
> > + */
> > +static int nr_numa_nodes(void)
> > +{
> > +   int i, nr_nodes = 0;
> > +
> > +   for (i = 0; i < g->p.nr_nodes; i++) {
> > +   if (numa_bitmask_isbitset(numa_nodes_ptr, i))
> > +   nr_nodes++;
> > +   }
> > +
> > +   return nr_nodes;
> > +}
> > +
> > +/* 
> Please run patches through scripts/checkpatch.pl. There is a
> trailing 
> whitespace above...
> 
Sure
> > 
> > + * To check if given numa node is present.
> > + */
> > +static int is_node_present(int node)
> > +{
> > +   return numa_bitmask_isbitset(numa_nodes_ptr, node);
> > +}
> > +
> > +/*
> > + * To check given numa node has cpus.
> > + */
> > +static bool node_has_cpus(int node)
> > +{
> > +   struct bitmask *cpu = numa_allocate_cpumask();
> > +   unsigned int i;
> > +
> > +   if (cpu == NULL)
> > +   return false; /* lets fall back to nocpus safely
> > */
> > +
> > +   if (numa_node_to_cpus(node, cpu) == 0) {
> This can be simplified to:
>   if (cpu && !numa_node_to_cpus(node, cpu)) {
> 
> > 
> > +   for (i = 0; i < cpu->size; i++) {
> > +   if (numa_bitmask_isbitset(cpu, i))
> > +   return true;
> > +   }
> > +   }
> The indentation on those brackets look to be wrong.
> 
Sure
> > 
> > +
> > +   return false;
> > +}
> > +
> More importantly, you've introduced few functions in this patch, but 
> none of those are being used. This is not a useful way to split your 
> patches. In fact, this hurts bisect since trying to build perf with
> just 
> this patch applied throws errors.
> 
Sure, this can be merged into a single patch; I will do it in the next version.

> You seem to be addressing a few different issues related to perf
> bench 
> numa. You might want to split your patch based on the specific
> issue(s) 
> each change fixes.
> 
> 
> - Naveen
> 
> 
Regards,
-Satheesh.
> > 
> >  static cpu_set_t bind_to_cpu(int target_cpu)
> >  {
> >     cpu_set_t orig_mask, mask;



Re: [PATCH v3 2/2] perf/bench/numa: Handle discontiguous/sparse numa nodes

2017-11-14 Thread Satheesh Rajendran
Hi Naveen,


On Tue, 2017-10-31 at 20:46 +0530, Naveen N. Rao wrote:
> 
> >     }
> >     BUG_ON(nr_min > nr_max);
> > -
> Looks like an un-necessary change there.
> 
> - Naveen
> 
I had hit this compilation error, so I had to move the
declaration above.
Please advise. Thanks.

  CC   bench/numa.o
bench/numa.c: In function ‘calc_convergence’:
bench/numa.c:1035:3: error: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]
   int processes = count_node_processes(node);
   ^
cc1: all warnings being treated as errors
mv: cannot stat ‘bench/.numa.o.tmp’: No such file or directory
make[4]: *** [bench/numa.o] Error 1
make[3]: *** [bench] Error 2
make[2]: *** [perf-in.o] Error 2
make[1]: *** [sub-make] Error 2
make: *** [all] Error 2

Regards,
-Satheesh.
> > 
> >     BUG_ON(sum > g->p.nr_tasks);
> > 
> >     if (0 && (sum < g->p.nr_tasks))
@@ -1027,8 +1029,9 @@ static void calc_convergence(double 



Re: [PATCH v3 0/2] Fixup for discontiguous/sparse numa nodes

2017-09-19 Thread Satheesh Rajendran
Hi Arnaldo,

Please let me know if there are any further comments.
Thanks in advance :-)

Regards,
-Satheesh.

On Mon, 2017-08-21 at 15:45 +0530, sathn...@linux.vnet.ibm.com wrote:
> From: Satheesh Rajendran <sathn...@linux.vnet.ibm.com>
> 
> Certain systems have sparse/discontiguous
> NUMA nodes.
> perf bench numa doesn't work well on such nodes:
> 1. It shows wrong values.
> 2. It can hang.
> 3. It shows redundant information for non-existent nodes.
> 
>  #numactl -H
> available: 2 nodes (0,8)
> node 0 cpus: 0 1 2 3 4 5 6 7
> node 0 size: 61352 MB
> node 0 free: 57168 MB
> node 8 cpus: 8 9 10 11 12 13 14 15
> node 8 size: 65416 MB
> node 8 free: 36593 MB
> node distances:
> node   0   8
>   0:  10  40
>   8:  40  10
> 
> Scenario 1:
> 
> Before Fix:
>  # perf bench numa mem --no-data_rand_walk -p 2 -t 20 -G 0 -P 3072 -T
> 0 -l 50 -c -s 1000
> ...
> ...
>  # 40 tasks will execute (on 9 nodes, 16 CPUs): > Wrong number of
> nodes
> ...
>  #2.0%  [0.2
> mins]  1/1   0/0   0/0   0/0   0/0   0/0   0/0   0/0   4/1  [ 4/2 ]
> l:  0-0   (  0) > Shows info on non-existent nodes.
> 
> After Fix:
>  # ./perf bench numa mem --no-data_rand_walk -p 2 -t 20 -G 0 -P 3072
> -T 0 -l 50 -c -s 1000
> ...
> ...
>  # 40 tasks will execute (on 2 nodes, 16 CPUs):
> ... 
>  #2.0%  [0.2 mins]  9/1   0/0  [ 9/1 ] l:  0-0   (  0)
>  #4.0%  [0.4 mins] 21/2  19/1  [ 2/3 ] l:  0-1   (  1) {1-2}
> 
> Scenario 2:
> 
> Before Fix:
>  # perf bench numa all
>  # Running numa/mem benchmark...
> 
> ...
>  # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0
> -M 1 -s 20 -zZq --thp  1 --no-data_rand_walk"
> perf: bench/numa.c:306: bind_to_memnode: Assertion `!(ret)' failed.
> > Got hung
> 
> After Fix:
>  # ./perf bench numa all
>  # Running numa/mem benchmark...
> 
> ...
>  # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0
> -M 1 -s 20 -zZq --thp  1 --no-data_rand_walk"
> 
>  # NOTE: ignoring bind NODEs starting at NODE#1
>  # NOTE: 0 tasks mem-bound, 1 tasks unbound
>  20.017 secs slowest (max) thread-runtime
>  20.000 secs fastest (min) thread-runtime
>  20.006 secs average thread-runtime
>   0.043 % difference between max/avg runtime
> 413.794 GB data processed, per thread
> 413.794 GB data processed, total
>       0.048 nsecs/byte/thread runtime
>  20.672 GB/sec/thread speed
>  20.672 GB/sec total speed
> 
> Changes in v2:
> Fixed review comments for function names and alloc failure handle
> 
> Changes in v3:
> Coding Style fixes.
> 
> 
> Satheesh Rajendran (2):
>   perf/bench/numa: Add functions to detect sparse numa nodes
>   perf/bench/numa: Handle discontiguous/sparse numa nodes
> 
>  tools/perf/bench/numa.c | 61
> +++--
>  1 file changed, 54 insertions(+), 7 deletions(-)
> 
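The wrong "9 nodes" count above comes from treating the highest node ID (8) as if node IDs were contiguous; what matters on sparse systems is how many bits are set in the mask of present nodes. A hedged sketch of that counting logic, with a plain `unsigned long` standing in for libnuma's `numa_nodes_ptr` bitmask so the example is self-contained:

```c
#include <assert.h>

/* With nodes {0, 8} present (as in the numactl -H output above), the
 * mask has 2 bits set even though the highest node ID is 8.  Counting
 * set bits, rather than using the maximum ID, gives the real node count. */
static int nr_present_nodes(unsigned long node_mask, int nr_node_ids)
{
	int i, nr_nodes = 0;

	for (i = 0; i < nr_node_ids; i++)
		if (node_mask & (1UL << i))
			nr_nodes++;
	return nr_nodes;
}
```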



Re: [PATCH 1/2] perf/bench/numa: Add functions to detect sparse numa nodes

2017-08-21 Thread Satheesh Rajendran
Thanks Arnaldo for the detailed review :-)
Will address them and send across v2.

On Thu, 2017-08-10 at 16:22 -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Aug 10, 2017 at 12:58:49PM +0530, sathn...@linux.vnet.ibm.com
>  escreveu:
> > 
> > From: Satheesh Rajendran <sathn...@linux.vnet.ibm.com>
> > 
> > Added functions: 1) to get a count of all nodes that are exposed to
> > userspace (these nodes could be memoryless CPU nodes or cpuless
> > memory nodes), 2) to check whether a given node is present, and
> > 3) to check whether a given node has CPUs.
> > 
> > This information can be used to handle sparse/discontiguous nodes.
> > 
> > Reviewed-by: Srikar Dronamraju <sri...@linux.vnet.ibm.com>
> > Signed-off-by: Satheesh Rajendran <sathn...@linux.vnet.ibm.com>
> > Signed-off-by: Balamuruhan S <bal...@linux.vnet.ibm.com>
> > ---
> >  tools/perf/bench/numa.c | 35 +++
> >  1 file changed, 35 insertions(+)
> > 
> > diff --git a/tools/perf/bench/numa.c b/tools/perf/bench/numa.c
> > index 469d65b..efd7595 100644
> > --- a/tools/perf/bench/numa.c
> > +++ b/tools/perf/bench/numa.c
> > @@ -215,6 +215,41 @@ static const char * const numa_usage[] = {
> >     NULL
> >  };
> >  
> > +static int nr_numa_nodes(void)
> > +{
> > +   int node = 0, i;
> > +
> > +   for (i = 0; i < g->p.nr_nodes; i++) {
> > +   if (numa_bitmask_isbitset(numa_nodes_ptr, i))
> > +   node++;
> > +   }
> > +   return node;
> Humm, can you rename 'node' to 'nr_nodes'?
> 
Sure, will change it.
> > 
> > +}
> > +
> > +static bool is_node_present(int node)
> > +{
> > +   if (numa_bitmask_isbitset(numa_nodes_ptr, node))
> > +   return true;
> > +   else
> > +   return false;
> > +}
> Why four lines instead of just one? Isn't this equivalent:
> 
Sure.
> static bool is_node_present(int node)
> {
>   return numa_bitmask_isbitset(numa_nodes_ptr, node);
> }
> 
> ?
> 
> > +
> > +static bool is_node_hascpu(int node)
> Can you rename this function, the name is confusing :-\
> 
> Based on the documentation for this function, which you left only in the
> changelog (please put it just before the function, as a comment), I think
> it should be named node_has_cpus()?
> 
Makes sense, will change it.
> > 
> > +{
> > +   struct bitmask *cpu;
> > +   unsigned int i;
> > +
> > +   cpu = numa_allocate_cpumask();
> Please put the line with the initialization together with the
> declaration, making it:
> 
> struct bitmask *cpu = numa_allocate_cpumask();
> 
> Also, this is an "alloc" function, I bet it can fail? If so, check it
> and return something useful if it fails, which probably will be difficult
> since this function returns bool?
> 
Sure, will check it and return false as a failsafe.
> 
> > 
> > +   if (numa_node_to_cpus(node, cpu) == 0) {
> > +   for (i = 0; i < cpu->size; i++) {
> > +   if (numa_bitmask_isbitset(cpu, i))
> > +   return true;
> > +   }
> > +   } else
> > +   return false; // lets fall back to nocpus safely
> > +   return false;
> > +}
> > +
> >  static cpu_set_t bind_to_cpu(int target_cpu)
> >  {
> >     cpu_set_t orig_mask, mask;


