[PATCH v3 01/15] cpu: Add new {add,remove}_cpu() functions

2020-02-23 Thread Qais Yousef
The new functions use device_{online,offline}() which are userspace
safe.

This is in preparation to move cpu_{up, down} kernel users to use
a safer interface that is not racy with userspace.

Suggested-by: "Paul E. McKenney" 
Signed-off-by: Qais Yousef 
CC: Thomas Gleixner 
CC: "Paul E. McKenney" 
CC: Helge Deller 
CC: Michael Ellerman 
CC: "David S. Miller" 
CC: Juergen Gross 
CC: Mark Rutland 
CC: Lorenzo Pieralisi 
CC: xen-de...@lists.xenproject.org
CC: linux-par...@vger.kernel.org
CC: sparcli...@vger.kernel.org
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-arm-ker...@lists.infradead.org
CC: x...@kernel.org
CC: linux-ker...@vger.kernel.org
---
 include/linux/cpu.h |  2 ++
 kernel/cpu.c| 24 
 2 files changed, 26 insertions(+)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 1ca2baf817ed..cf8cf38dca43 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -89,6 +89,7 @@ extern ssize_t arch_cpu_release(const char *, size_t);
 #ifdef CONFIG_SMP
 extern bool cpuhp_tasks_frozen;
 int cpu_up(unsigned int cpu);
+int add_cpu(unsigned int cpu);
 void notify_cpu_starting(unsigned int cpu);
 extern void cpu_maps_update_begin(void);
 extern void cpu_maps_update_done(void);
@@ -118,6 +119,7 @@ extern void cpu_hotplug_disable(void);
 extern void cpu_hotplug_enable(void);
 void clear_tasks_mm_cpumask(int cpu);
 int cpu_down(unsigned int cpu);
+int remove_cpu(unsigned int cpu);
 
 #else /* CONFIG_HOTPLUG_CPU */
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 9c706af713fb..069802f7010f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1057,6 +1057,18 @@ int cpu_down(unsigned int cpu)
 }
 EXPORT_SYMBOL(cpu_down);
 
+int remove_cpu(unsigned int cpu)
+{
+   int ret;
+
+   lock_device_hotplug();
+   ret = device_offline(get_cpu_device(cpu));
+   unlock_device_hotplug();
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(remove_cpu);
+
 #else
 #define takedown_cpu   NULL
 #endif /*CONFIG_HOTPLUG_CPU*/
@@ -1209,6 +1221,18 @@ int cpu_up(unsigned int cpu)
 }
 EXPORT_SYMBOL_GPL(cpu_up);
 
+int add_cpu(unsigned int cpu)
+{
+   int ret;
+
+   lock_device_hotplug();
+   ret = device_online(get_cpu_device(cpu));
+   unlock_device_hotplug();
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(add_cpu);
+
 #ifdef CONFIG_PM_SLEEP_SMP
 static cpumask_var_t frozen_cpus;
 
-- 
2.17.1



[PATCH v3 00/15] Convert cpu_up/down to device_online/offline

2020-02-23 Thread Qais Yousef
Changes in v3:
* Fixup smp_shutdown_nonboot_cpus() to hold the right lock as suggested
  by Russel King.
* Split the combined arm/arm64 patch into 2 separate patches.
* Add new add/remove_cpu() functions that wraps lock,
  device_online/offline, unlock as suggested by Paul McKenney
* Use the new add/remove instead of device_online/offline where
  appropriate
* Dropped a patch that exported device_online/offline() since
  the new add/remove_cpu() are exported and used by torture test which
  could build as a module
* Rebsed on top 5.6-rc1 from linus/master


git clone git://linux-arm.org/linux-qy.git -b cpu-hp-cleanup-v3


Older post can be found here:

https://lore.kernel.org/linuxppc-dev/20191125112754.25223-1-qais.you...@arm.com/
https://lore.kernel.org/lkml/20191125112754.25223-2-qais.you...@arm.com/


Using cpu_up/down directly to bring cpus online/offline loses synchronization
with sysfs and could suffer from a race similar to what is described in
commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and serialization
during LPM").

cpu_up/down seem to be more of a internal implementation detail for the cpu
subsystem to use to boot up cpus, perform suspend/resume and low level hotplug
operations. Users outside of the cpu subsystem would be better using the device
core API to bring a cpu online/offline which is the interface used to hotplug
memory and other system devices.

Several users have already migrated to use the device core API, this series
converts the remaining users and hides cpu_up/down from internal users at the
end.

I noticed this problem while working on a hack to disable offlining
a particular CPU but noticed that setting the offline_disabled attribute in the
device struct isn't enough because users can easily bypass the device core.
While my hack isn't a valid use case but it did highlight the inconsistency in
the way cpus are being onlined/offlined and this attempt hopefully improves on
this.

The first 9 patches fix arch users.

The remaining 6 patches fix generic code users. Particularly creating a new
special exported API for the device core to use instead of cpu_up/down.

The last patch removes cpu_up/down from cpu.h and unexport the functions.

In some cases where the use of cpu_up/down seemed legitimate, I encapsulated
the logic in a higher level - special purposed function; and converted the code
to use that instead.

I did re-run the rcu torture, lock torture and psci checker tests and no
problem was noticed. I did perform build tests on all arch affected except for
parisc.

Hopefully I got the CC list right for all the patches. Apologies in advance if
some people were omitted from some patches but they should have been CCed.

CC: Armijn Hemel 
CC: Benjamin Herrenschmidt 
CC: Bjorn Helgaas 
CC: Borislav Petkov 
CC: Boris Ostrovsky 
CC: Catalin Marinas 
CC: Christophe Leroy 
CC: Daniel Lezcano 
CC: Davidlohr Bueso 
CC: "David S. Miller" 
CC: Eiichi Tsukata 
CC: Enrico Weigelt 
CC: Fenghua Yu 
CC: Greg Kroah-Hartman 
CC: Helge Deller 
CC: "H. Peter Anvin" 
CC: Ingo Molnar 
CC: "James E.J. Bottomley" 
CC: James Morse 
CC: Jiri Kosina 
CC: Josh Poimboeuf 
CC: Josh Triplett 
CC: Juergen Gross 
CC: Lorenzo Pieralisi 
CC: Mark Rutland 
CC: Michael Ellerman 
CC: Nadav Amit 
CC: Nicholas Piggin 
CC: "Paul E. McKenney" 
CC: Paul Mackerras 
CC: Pavankumar Kondeti 
CC: "Peter Zijlstra (Intel)" 
CC: "Rafael J. Wysocki" 
CC: Ram Pai 
CC: Richard Fontana 
CC: Russell King 
CC: Sakari Ailus 
CC: Stefano Stabellini 
CC: Steve Capper 
CC: Thiago Jung Bauermann 
CC: Thomas Gleixner 
CC: Tony Luck 
CC: Will Deacon 
CC: Zhenzhong Duan 
CC: linux-arm-ker...@lists.infradead.org
CC: linux-i...@vger.kernel.org
CC: linux-ker...@vger.kernel.org
CC: linux-par...@vger.kernel.org
CC: linuxppc-dev@lists.ozlabs.org
CC: sparcli...@vger.kernel.org
CC: x...@kernel.org
CC: xen-de...@lists.xenproject.org


Qais Yousef (15):
  cpu: Add new {add,remove}_cpu() functions
  smp: Create a new function to shutdown nonboot cpus
  ia64: Replace cpu_down with smp_shutdown_nonboot_cpus()
  arm: Don't use disable_nonboot_cpus()
  arm64: Don't use disable_nonboot_cpus()
  arm64: hibernate.c: Create a new function to handle cpu_up(sleep_cpu)
  x86: Replace cpu_up/down with add/remove_cpu
  powerpc: Replace cpu_up/down with add/remove_cpu
  sparc: Replace cpu_up/down with add/remove_cpu
  parisc: Replace cpu_up/down with add/remove_cpu
  driver: xen: Replace cpu_up/down with device_online/offline
  firmware: psci: Replace cpu_up/down with add/remove_cpu
  torture: Replace cpu_up/down with add/remove_cpu
  smp: Create a new function to bringup nonboot cpus online
  cpu: Hide cpu_up/down

 arch/arm/kernel/reboot.c |   4 +-
 arch/arm64/kernel/hibernate.c|  13 +--
 arch/arm64/kernel/process.c  |   4 +-
 arch/ia64/kernel/process.c   |   8 +-
 arch/parisc/kernel/processor.c   |  

[PATCH v3 08/15] powerpc: Replace cpu_up/down with add/remove_cpu

2020-02-23 Thread Qais Yousef
The core device API performs extra housekeeping bits that are missing
from directly calling cpu_up/down.

See commit a6717c01ddc2 ("powerpc/rtas: use device model APIs and
serialization during LPM") for an example description of what might go
wrong.

This also prepares to make cpu_up/down a private interface for anything
but the cpu subsystem.

Acked-by: Michael Ellerman 
Signed-off-by: Qais Yousef 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Michael Ellerman 
CC: Enrico Weigelt 
CC: Ram Pai 
CC: Nicholas Piggin 
CC: Thiago Jung Bauermann 
CC: Christophe Leroy 
CC: Thomas Gleixner 
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-ker...@vger.kernel.org
---

Michael, this now uses add_cpu() which you should be CCed on. I wasn't sure if
I can keep your Ack or remove it in this case. Please let me know if you need
more clarification.

 arch/powerpc/kexec/core_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 04a7cba58eff..b4184092172a 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -212,7 +212,7 @@ static void wake_offline_cpus(void)
if (!cpu_online(cpu)) {
printk(KERN_INFO "kexec: Waking offline cpu %d.\n",
   cpu);
-   WARN_ON(cpu_up(cpu));
+   WARN_ON(add_cpu(cpu));
}
}
 }
-- 
2.17.1



Re: [PATCH] powerpc: Include .BTF section

2020-02-23 Thread Michael Ellerman
"Naveen N. Rao"  writes:
> Selecting CONFIG_DEBUG_INFO_BTF results in the below warning from ld:
>   ld: warning: orphan section `.BTF' from `.btf.vmlinux.bin.o' being placed 
> in section `.BTF'
>
> Include .BTF section in vmlinux explicitly to fix the same.

I don't see any other architectures doing this in their linker script.
Why are we special?

cheers

> diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
> b/arch/powerpc/kernel/vmlinux.lds.S
> index b4c89a1acebb..a32d478a7f41 100644
> --- a/arch/powerpc/kernel/vmlinux.lds.S
> +++ b/arch/powerpc/kernel/vmlinux.lds.S
> @@ -303,6 +303,12 @@ SECTIONS
>   *(.branch_lt)
>   }
>  
> +#ifdef CONFIG_DEBUG_INFO_BTF
> + .BTF : AT(ADDR(.BTF) - LOAD_OFFSET) {
> + *(.BTF)
> + }
> +#endif
> +
>   .opd : AT(ADDR(.opd) - LOAD_OFFSET) {
>   __start_opd = .;
>   KEEP(*(.opd))
> -- 
> 2.24.1


[PATCH v2 00/18] genirq: Remove setup_irq()

2020-02-23 Thread afzal mohammed
While trying to understand internals of irq handling, came across a
thread [1] in which tglx was referring to avoid usage of setup_irq().
The early boot setup_irq() invocations happen either via 'init_IRQ()'
or 'time_init()', while memory allocators are ready by 'mm_init()'.

Hence instances of setup_irq() are replaced by request_irq() &
setup_irq() [along with remove_irq()] definition deleted in the last
patch.

Seldom remove_irq() usage has been observed coupled with setup_irq(),
wherever that has been found, it too has been replaced by free_irq().

Build & boot tested on ARM & x86_64 platforms (ensured that on the
machines used for testing, modifications made in this series is being
exercised at runtime)

Much of the changes were created using Coccinelle with an intention
to learn it. But not everything could be automated.

Searching with 'git grep -n '\Wsetup_irq('' & avoiding the irrelevant
ones, 153 invocation's of setup_irq() were found. 112 could be replaced
w/ cocci, of which in a few files some desired hunks were missing or
not as expected, these were fixed up manually. Also the remaining 41
had to be done manually.

Although cocci could replace 112, because of line continue not
happening at paranthesis for request_irq(), around 80 had to be
manually aligned in the request_irq() statement.

So though many changes could be automated, there are a considerable
amount of manual changes, please review carefully especially mips &
alpha.

Usage of setup_percpu_irq() is untouched w/ this series.

There are 2 checkpatch warning about usage of BUG(), they were already
present w/ setup_irq(), status quo maintained.

[1] https://lkml.kernel.org/r/alpine.DEB.2.20.1710191609480.1971@nanos

Since changes from v1 are trivial as below, tags received has been
applied to the relevant patches, if any objections, please shout.

v2:
 * Replace pr_err("request_irq() on %s failed" by
   pr_err("%s: request_irq() failed"
 * m68k: remove now irrelevant comment separation comment lines
 * Commit message massage


afzal mohammed (18):
  alpha: replace setup_irq() by request_irq()
  ARM: replace setup_irq() by request_irq()
  c6x: replace setup_irq() by request_irq()
  hexagon: replace setup_irq() by request_irq()
  ia64: replace setup_irq() by request_irq()
  m68k: Replace setup_irq() by request_irq()
  microblaze: Replace setup_irq() by request_irq()
  MIPS: Replace setup_irq() by request_irq()
  parisc: Replace setup_irq() by request_irq()
  powerpc: Replace setup_irq() by request_irq()
  s390: replace setup_irq() by request_irq()
  sh: replace setup_irq() by request_irq()
  unicore32: replace setup_irq() by request_irq()
  x86: Replace setup_irq() by request_irq()
  xtensa: replace setup_irq() by request_irq()
  clocksource: Replace setup_irq() by request_irq()
  irqchip: Replace setup_irq() by request_irq()
  genirq: Remove setup_irq() and remove_irq()

 arch/alpha/kernel/irq_alpha.c | 29 ++---
 arch/alpha/kernel/irq_i8259.c |  8 +--
 arch/alpha/kernel/irq_impl.h  |  7 +--
 arch/alpha/kernel/irq_pyxis.c |  3 +-
 arch/alpha/kernel/sys_alcor.c |  3 +-
 arch/alpha/kernel/sys_cabriolet.c |  3 +-
 arch/alpha/kernel/sys_eb64p.c |  3 +-
 arch/alpha/kernel/sys_marvel.c|  2 +-
 arch/alpha/kernel/sys_miata.c |  6 +-
 arch/alpha/kernel/sys_ruffian.c   |  3 +-
 arch/alpha/kernel/sys_rx164.c |  3 +-
 arch/alpha/kernel/sys_sx164.c |  3 +-
 arch/alpha/kernel/sys_wildfire.c  |  7 +--
 arch/alpha/kernel/time.c  |  6 +-
 arch/arm/mach-cns3xxx/core.c  | 10 +---
 arch/arm/mach-ebsa110/core.c  | 10 +---
 arch/arm/mach-ep93xx/timer-ep93xx.c   | 12 ++--
 arch/arm/mach-footbridge/dc21285-timer.c  | 11 +---
 arch/arm/mach-footbridge/isa-irq.c|  8 +--
 arch/arm/mach-footbridge/isa-timer.c  | 11 +---
 arch/arm/mach-iop32x/time.c   | 12 ++--
 arch/arm/mach-mmp/time.c  | 11 +---
 arch/arm/mach-omap1/pm.c  | 22 ---
 arch/arm/mach-omap1/time.c| 10 +---
 arch/arm/mach-omap1/timer32k.c| 10 +---
 arch/arm/mach-omap2/timer.c   | 11 +---
 arch/arm/mach-rpc/time.c  |  8 +--
 arch/arm/mach-spear/time.c|  9 +--
 arch/arm/plat-orion/time.c| 10 +---
 arch/c6x/platforms/timer64.c  | 11 +---
 arch/hexagon/kernel/smp.c | 17 +++---
 arch/hexagon/kernel/time.c| 11 +---
 arch/ia64/kernel/irq_ia64.c   | 42 +
 arch/ia64/kernel/mca.c| 51 +---
 arch/m68k/68000/timers.c  | 11 +---
 arch/m68k/coldfire/pit.c  | 11 +---
 arch/m68k/coldfire/sltimers.c

[PATCH v2 10/18] powerpc: Replace setup_irq() by request_irq()

2020-02-23 Thread afzal mohammed
request_irq() is preferred over setup_irq(). The early boot setup_irq()
invocations happen either via 'init_IRQ()' or 'time_init()', while
memory allocators are ready by 'mm_init()'.

Per tglx[1], setup_irq() existed in olden days when allocators were not
ready by the time early interrupts were initialized.

Hence replace setup_irq() by request_irq().

Seldom remove_irq() usage has been observed coupled with setup_irq(),
wherever that has been found, it too has been replaced by free_irq().

[1] https://lkml.kernel.org/r/alpine.DEB.2.20.1710191609480.1971@nanos

Signed-off-by: afzal mohammed 
Tested-by: Christophe Leroy  # for 8xx parts
---

v2:
 * Replace pr_err("request_irq() on %s failed" by
   pr_err("%s: request_irq() failed"
 * Commit message massage

 arch/powerpc/platforms/85xx/mpc85xx_cds.c | 10 +++-
 arch/powerpc/platforms/8xx/cpm1.c |  9 ++-
 arch/powerpc/platforms/8xx/m8xx_setup.c   |  9 ++-
 arch/powerpc/platforms/chrp/setup.c   | 14 --
 arch/powerpc/platforms/powermac/pic.c | 31 ++-
 arch/powerpc/platforms/powermac/smp.c |  9 ++-
 6 files changed, 27 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/mpc85xx_cds.c 
b/arch/powerpc/platforms/85xx/mpc85xx_cds.c
index 6b1436abe9b1..1c5598877d70 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_cds.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_cds.c
@@ -218,12 +218,6 @@ static irqreturn_t mpc85xx_8259_cascade_action(int irq, 
void *dev_id)
 {
return IRQ_HANDLED;
 }
-
-static struct irqaction mpc85xxcds_8259_irqaction = {
-   .handler = mpc85xx_8259_cascade_action,
-   .flags = IRQF_SHARED | IRQF_NO_THREAD,
-   .name = "8259 cascade",
-};
 #endif /* PPC_I8259 */
 #endif /* CONFIG_PCI */
 
@@ -271,7 +265,9 @@ static int mpc85xx_cds_8259_attach(void)
 *  disabled when the last user of the shared IRQ line frees their
 *  interrupt.
 */
-   if ((ret = setup_irq(cascade_irq, &mpc85xxcds_8259_irqaction))) {
+   ret = request_irq(cascade_irq, mpc85xx_8259_cascade_action,
+ IRQF_SHARED | IRQF_NO_THREAD, "8259 cascade", NULL);
+   if (ret) {
printk(KERN_ERR "Failed to setup cascade interrupt\n");
return ret;
}
diff --git a/arch/powerpc/platforms/8xx/cpm1.c 
b/arch/powerpc/platforms/8xx/cpm1.c
index a43ee7d1ff85..4db4ca2e1222 100644
--- a/arch/powerpc/platforms/8xx/cpm1.c
+++ b/arch/powerpc/platforms/8xx/cpm1.c
@@ -120,12 +120,6 @@ static irqreturn_t cpm_error_interrupt(int irq, void *dev)
return IRQ_HANDLED;
 }
 
-static struct irqaction cpm_error_irqaction = {
-   .handler = cpm_error_interrupt,
-   .flags = IRQF_NO_THREAD,
-   .name = "error",
-};
-
 static const struct irq_domain_ops cpm_pic_host_ops = {
.map = cpm_pic_host_map,
 };
@@ -187,7 +181,8 @@ unsigned int __init cpm_pic_init(void)
if (!eirq)
goto end;
 
-   if (setup_irq(eirq, &cpm_error_irqaction))
+   if (request_irq(eirq, cpm_error_interrupt, IRQF_NO_THREAD, "error",
+   NULL))
printk(KERN_ERR "Could not allocate CPM error IRQ!");
 
setbits32(&cpic_reg->cpic_cicr, CICR_IEN);
diff --git a/arch/powerpc/platforms/8xx/m8xx_setup.c 
b/arch/powerpc/platforms/8xx/m8xx_setup.c
index f1c805c8adbc..df4d57d07f9a 100644
--- a/arch/powerpc/platforms/8xx/m8xx_setup.c
+++ b/arch/powerpc/platforms/8xx/m8xx_setup.c
@@ -39,12 +39,6 @@ static irqreturn_t timebase_interrupt(int irq, void *dev)
return IRQ_HANDLED;
 }
 
-static struct irqaction tbint_irqaction = {
-   .handler = timebase_interrupt,
-   .flags = IRQF_NO_THREAD,
-   .name = "tbint",
-};
-
 /* per-board overridable init_internal_rtc() function. */
 void __init __attribute__ ((weak))
 init_internal_rtc(void)
@@ -157,7 +151,8 @@ void __init mpc8xx_calibrate_decr(void)
(TBSCR_TBF | TBSCR_TBE));
immr_unmap(sys_tmr2);
 
-   if (setup_irq(virq, &tbint_irqaction))
+   if (request_irq(virq, timebase_interrupt, IRQF_NO_THREAD, "tbint",
+   NULL))
panic("Could not allocate timer IRQ!");
 }
 
diff --git a/arch/powerpc/platforms/chrp/setup.c 
b/arch/powerpc/platforms/chrp/setup.c
index fcf6f2342ef4..3f4e324f1d67 100644
--- a/arch/powerpc/platforms/chrp/setup.c
+++ b/arch/powerpc/platforms/chrp/setup.c
@@ -451,13 +451,6 @@ static void __init chrp_find_openpic(void)
of_node_put(np);
 }
 
-#if defined(CONFIG_VT) && defined(CONFIG_INPUT_ADBHID) && defined(CONFIG_XMON)
-static struct irqaction xmon_irqaction = {
-   .handler = xmon_irq,
-   .name = "XMON break",
-};
-#endif
-
 static void __init chrp_find_8259(void)
 {
struct device_node *np, *pic = NULL;
@@ -541,8 +534,11 @@ static void __init chrp_init_IRQ(void)
if (of_node_is_type(kbd->parent, "adb"))
break;
of_node_put(kbd)

Re: [PATCH V14] mm/debug: Add tests validating architecture page table helpers

2020-02-23 Thread Anshuman Khandual
On 02/17/2020 08:47 AM, Anshuman Khandual wrote:
> This adds a test validation for architecture exported page table helpers.
> Patch adds basic transformation tests at various levels of the page table.
> 
> This test was originally suggested by Catalin during arm64 THP migration
> RFC discussion earlier. Going forward it can include more specific tests
> with respect to various generic MM functions like THP, HugeTLB etc and
> platform specific tests.
> 
> https://lore.kernel.org/linux-mm/20190628102003.ga56...@arrakis.emea.arm.com/
> 
> Needs to be applied on linux V5.6-rc2
> 
> Changes in V14:
> 
> - Disabled DEBUG_VM_PGFLAGS for IA64 and ARM (32 Bit) per Andrew and 
> Christophe
> - Updated DEBUG_VM_PGFLAGS documentation wrt EXPERT and disabled platforms
> - Updated RANDOM_[OR|NZ]VALUE open encodings with GENMASK() per Catalin
> - Updated s390 constraint bits from 12 to 4 (S390_MASK_BITS) per Gerald
> - Updated in-code documentation for RANDOM_ORVALUE per Gerald
> - Updated pxx_basic_tests() to use invert functions first per Catalin
> - Dropped ARCH_HAS_4LEVEL_HACK check from pud_basic_tests()
> - Replaced __ARCH_HAS_[4|5]LEVEL_HACK with __PAGETABLE_[PUD|P4D]_FOLDED per 
> Catalin
> - Trimmed the CC list on the commit message per Catalin

Hello Andrew,

As there are no further comments on this patch from last week, wondering
if you would possibly consider this patch. But if you feel there is still
something which need to be taken care here, please do let me know.

Thank you.

- Anshuman


Re: [PATCH v3 05/27] ocxl: Address kernel doc errors & warnings

2020-02-23 Thread Andrew Donnellan

On 21/2/20 2:26 pm, Alastair D'Silva wrote:

From: Alastair D'Silva 

This patch addresses warnings and errors from the kernel doc scripts for
the OpenCAPI driver.

It also makes minor tweaks to make the docs more consistent.

Signed-off-by: Alastair D'Silva 


Looks good, fixes all the kerneldoc warnings I get.

Acked-by: Andrew Donnellan 


--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



Re: [RFC PATCH v2 00/12] Reduce ifdef mess in ptrace

2020-02-23 Thread Michael Neuling
Christophe,

> Le 28/06/2019 à 17:47, Christophe Leroy a écrit :
> > The purpose of this series is to reduce the amount of #ifdefs
> > in ptrace.c
> > 
> 
> Any feedback on this series which aims at fixing the issue you opened at 
> https://github.com/linuxppc/issues/issues/128 ?

Yeah, sorry my bad. You did all the hard work and I ignored it.

I like the approach and is a long the lines I was thinking. Putting it in a
ptrace subdir, splitting out adv_debug_regs, TM, SPE, Alitivec, VSX.
ppc_gethwdinfo() looks a lot nicer now too (that was some of the worst of it).

I've not gone through it with a fine tooth comb though. There is (rightly) a lot
of code moved around which could have introduced some issues.

It applies on v5.2 but are you planning on updating it to a newer base?

Mikey


Re: [PATCH 1/2] powerpc/powernv: Treat an empty reboot string as default

2020-02-23 Thread Alexey Kardashevskiy



On 17/02/2020 13:48, Oliver O'Halloran wrote:
> Treat an empty reboot cmd string the same as a NULL string. This squashes a
> spurious unsupported reboot message that sometimes gets out when using
> xmon.
> 
> Signed-off-by: Oliver O'Halloran 
> ---
>  arch/powerpc/platforms/powernv/setup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/setup.c 
> b/arch/powerpc/platforms/powernv/setup.c
> index 11fdae8..a8fe630 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -229,7 +229,7 @@ static void  __noreturn pnv_restart(char *cmd)
>   pnv_prepare_going_down();
>  
>   do {
> - if (!cmd)
> + if (!cmd || !strlen(cmd))


nit: this does not matter here in practice but

if (!cmd || cmd[0] == '\0')

is faster (you do not care about the length anyway) and safer (@cmd can
potentially be endless) ;)


>   rc = opal_cec_reboot();
>   else if (strcmp(cmd, "full") == 0)
>   rc = opal_cec_reboot2(OPAL_REBOOT_FULL_IPL, NULL);
> 

-- 
Alexey


Re: [PATCH v3 03/27] powerpc: Map & release OpenCAPI LPC memory

2020-02-23 Thread Andrew Donnellan

On 21/2/20 2:26 pm, Alastair D'Silva wrote:

From: Alastair D'Silva 

This patch adds platform support to map & release LPC memory.

Signed-off-by: Alastair D'Silva 


Nothing seems obviously wrong here.

Reviewed-by: Andrew Donnellan 


--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



Re: [PATCH 1/2] powerpc/powernv: Treat an empty reboot string as default

2020-02-23 Thread Andrew Donnellan

On 17/2/20 1:48 pm, Oliver O'Halloran wrote:

Treat an empty reboot cmd string the same as a NULL string. This squashes a
spurious unsupported reboot message that sometimes gets out when using
xmon.

Signed-off-by: Oliver O'Halloran 


Pretty sure I've seen that spurious reboot message a few times and never 
thought to check why... this makes sense.


Reviewed-by: Andrew Donnellan 


--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



Re: [PATCH 2/2] powerpc/powernv: Add explicit fast-reboot support

2020-02-23 Thread Andrew Donnellan

On 17/2/20 1:48 pm, Oliver O'Halloran wrote:

Add a way to manually invoke a fast-reboot rather than setting the NVRAM
flag. The idea is to allow userspace to invoke a fast-reboot using the
optional string argument to the reboot() system call, or using the xmon
zr command so we don't need to leave around a persistent changes on
a system to use the feature.

Signed-off-by: Oliver O'Halloran 


Both this and the previous patch have passed snowpatch with no warnings, 
and don't seem to have any obvious issues.


Reviewed-by: Andrew Donnellan 


--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



Re: [PATCH] powerpc: Include .BTF section

2020-02-23 Thread Naveen N. Rao

Michael Ellerman wrote:

"Naveen N. Rao"  writes:

Selecting CONFIG_DEBUG_INFO_BTF results in the below warning from ld:
  ld: warning: orphan section `.BTF' from `.btf.vmlinux.bin.o' being placed in 
section `.BTF'

Include .BTF section in vmlinux explicitly to fix the same.


I don't see any other architectures doing this in their linker script.
Why are we special?


I think this is due to commit 83a092cf95f28 ("powerpc: Link warning for 
orphan sections"):

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=83a092cf95f28

If your question was why I didn't include the .BTF section in .data, 
it's because libbpf seems to expect the .BTF section to be separate.



- Naveen

PS: I also see a linker warning for '.gnu.hash' if I enable 
CONFIG_RELOCATABLE, but I couldn't decipher what that was for, and if it 
should be retained.




Re: [PATCH v2 0/5] Track and expose idle PURR and SPURR ticks

2020-02-23 Thread Kamalesh Babulal
On 2/21/20 10:48 AM, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" 

[...]

> 
> Gautham R. Shenoy (5):
>   powerpc: Move idle_loop_prolog()/epilog() functions to header file
>   powerpc/idle: Add accessor function to always read latest idle PURR
>   powerpc/pseries: Account for SPURR ticks on idle CPUs
>   powerpc/sysfs: Show idle_purr and idle_spurr for every CPU
>   Documentation: Document sysfs interfaces purr, spurr, idle_purr,
> idle_spurr
> 
>  Documentation/ABI/testing/sysfs-devices-system-cpu | 39 ++
>  arch/powerpc/include/asm/idle.h| 88 
> ++
>  arch/powerpc/kernel/sysfs.c| 54 -
>  arch/powerpc/platforms/pseries/setup.c |  8 +-
>  drivers/cpuidle/cpuidle-pseries.c  | 39 ++
>  5 files changed, 191 insertions(+), 37 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/idle.h
> 

For the whole series:

Tested-by: Kamalesh Babulal 

-- 
Kamalesh



Re: [PATCH v3 00/27] Add support for OpenCAPI Persistent Memory devices

2020-02-23 Thread Alastair D'Silva
On Fri, 2020-02-21 at 08:21 -0800, Dan Williams wrote:
> On Thu, Feb 20, 2020 at 7:28 PM Alastair D'Silva <
> alast...@au1.ibm.com> wrote:
> > From: Alastair D'Silva 
> > 
> > This series adds support for OpenCAPI Persistent Memory devices,
> > exposing
> > them as nvdimms so that we can make use of the existing
> > infrastructure.
> 
> A single sentence to introduce:
> 
> 24 files changed, 3029 insertions(+), 97 deletions(-)
> 
> ...is inadequate. What are OpenCAPI Persistent Memory devices? How do
> they compare, in terms relevant to libnvdimm, to other persistent
> memory devices? What challenges do they pose to the existing
> enabling?
> What is the overall approach taken with this 27 patch break down?
> What
> are the changes since v2, v1? If you incorporated someone's review
> feedback note it in the cover letter changelog, if you didn't
> incorporate someone's feedback note that too with an explanation.
> 
> In short, provide a bridge document for someone familiar with the
> upstream infrastructure, but not necessarily steeped in powernv /
> OpenCAPI platform details, to get started with this code.
> 
> For now, no need to resend the whole series, just reply to this
> message with a fleshed out cover letter and then incorporate it going
> forward for v4+.


Apologies, I was maintaining a changelog, and forgot to include it.
I'll flesh out the cover letter too:

This series adds support for OpenCAPI Persistent Memory devices on bare
metal (arch/powernv), exposing them as nvdimms so that we can make use
of the existing infrastructure. There already exists a driver for the
same devices abstracted through PowerVM (arch/pseries):
arch/powerpc/platforms/pseries/papr_scm.c

These devices are connected via OpenCAPI, and present as LPC (lowest
coherence point) memory to the system, practically, that means that
memory on these cards could be treated as conventional, cache-coherent
memory.

Since the devices are connected via OpenCAPI, they are not enumerated
via ACPI. Instead, OpenCAPI links present as pseudo-PCI bridges, with
devices below them.

This series introduces a driver that exposes the memory on these cards
as nvdimms, with each card getting it's own bus. This is somewhat
complicated by the fact that the cards do not have out of band
persistent storage for metadata, so 1 SECTION_SIZE's (see SPARSEMEM)
worth of storage is carved out of the top of the card storage to
implement the ndctl_config_* calls.

The driver is not responsible for configuring the NPU (NVLink
Processing Unit) BARs to map the LPC memory from the card into the
system's physical address space, instead, it requests this to be done
via OPAL calls (typically implemented by Skiboot).

The series is structured as follows:
 - Required infrastructure changes & cleanup
 - A minimal driver implementation
 - Implementing additional features within the driver

V3:
  - Rebase against next/next-20200220
  - Move driver to arch/powerpc/platforms/powernv, we now expect this
driver to go upstream via the powerpc tree
  - "nvdimm/ocxl: Implement the Read Error Log command"
- Fix bad header path
  - "nvdimm/ocxl: Read the capability registers & wait for device
ready"
- Fix overlapping masks between readiness_timeout &
memory_available_timeout
  - "nvdimm: Add driver for OpenCAPI Storage Class Memory"
- Address minor review comments from Jonathan Cameron
- Remove attributes
- Default to module if building LIBNVDIMM
- Propogate errors up from called functions in probe()
  - "nvdimm/ocxl: Expose SMART data via ndctl"
- Pack attributes in struct
- Support different size SMART buffers for compatibility with
newer
  ndctls that may want more SMART attribs than we provide
- Rework to to use ND_CMD_CALL instead of ND_CMD_SMART
  - drop "ocxl: Free detached contexts in ocxl_context_detach_all()"
  - "powerpc: Map & release OpenCAPI LPC memory"
- Remove 'extern'
- Only available with CONFIG_MEMORY_HOTPLUG_SPARSE
  - "ocxl: Tally up the LPC memory on a link & allow it to be mapped"
- Address minor review comments from Jonathan Cameron
  - "ocxl: Add functions to map/unmap LPC memory"
- Split detected memory message into a separate patch
- Address minor review comments from Jonathan Cameron
- Add a comment explaining why unmap_lpc_mem is in
deconfigure_afu
  - "nvdimm/ocxl: Add support for Admin commands"
- use sizeof(u64) rather than 0x08 when iterating u64s
  - "nvdimm/ocxl: Implement the heartbeat command"
- Fix typo in blurb
  - Address kernel doc issues
  - Ensure all uapi headers use C89 compatible comments
  - Drop patches for firmware update & overwrite, these will be
submitted later once patches are available for ndctl
  - Rename SCM to OpenCAPI Persistent Memory

V2:
  - "powerpc: Map & release OpenCAPI LPC memory"
  - Fix #if -> #ifdef
  - use pci_dev_id to get the bdfn
  - use __be64 to ho

Re: [PATCH v3 00/27] Add support for OpenCAPI Persistent Memory devices

2020-02-23 Thread Matthew Wilcox
On Mon, Feb 24, 2020 at 03:34:07PM +1100, Alastair D'Silva wrote:
> V3:
>   - Rebase against next/next-20200220
>   - Move driver to arch/powerpc/platforms/powernv, we now expect this
> driver to go upstream via the powerpc tree

That's rather the opposite direction of normal; mostly drivers live under
drivers/ and not in arch/.  It's easier for drivers to get overlooked
when doing tree-wide changes if they're hiding.



Re: [PATCH v3 00/27] Add support for OpenCAPI Persistent Memory devices

2020-02-23 Thread Alastair D'Silva
On Sun, 2020-02-23 at 20:37 -0800, Matthew Wilcox wrote:
> On Mon, Feb 24, 2020 at 03:34:07PM +1100, Alastair D'Silva wrote:
> > V3:
> >   - Rebase against next/next-20200220
> >   - Move driver to arch/powerpc/platforms/powernv, we now expect
> > this
> > driver to go upstream via the powerpc tree
> 
> That's rather the opposite direction of normal; mostly drivers live
> under
> drivers/ and not in arch/.  It's easier for drivers to get overlooked
> when doing tree-wide changes if they're hiding.

This is true, however, given that it was not all that desirable to have
it under drivers/nvdimm, it's sister driver (for the same hardware) is
also under arch, and that we don't expect this driver to be used on any
platform other than powernv, we think this was the most reasonable
place to put it.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH v2 1/5] powerpc: Move idle_loop_prolog()/epilog() functions to header file

2020-02-23 Thread Gautham R Shenoy
Hello Nathan,

On Fri, Feb 21, 2020 at 09:03:16AM -0600, Nathan Lynch wrote:
> "Gautham R. Shenoy"  writes:
> 
> > From: "Gautham R. Shenoy" 
> >
> > Currently prior to entering an idle state on a Linux Guest, the
> > pseries cpuidle driver implement an idle_loop_prolog() and
> > idle_loop_epilog() functions which ensure that idle_purr is correctly
> > computed, and the hypervisor is informed that the CPU cycles have been
> > donated.
> >
> > These prolog and epilog functions are also required in the default
> > idle call, i.e pseries_lpar_idle(). Hence move these accessor
> > functions to a common header file and call them from
> > pseries_lpar_idle(). Since the existing header files such as
> > asm/processor.h have enough clutter, create a new header file
> > asm/idle.h.
> >
> > Signed-off-by: Gautham R. Shenoy 
> > ---
> >  arch/powerpc/include/asm/idle.h| 27 +++
> >  arch/powerpc/platforms/pseries/setup.c |  7 +--
> >  drivers/cpuidle/cpuidle-pseries.c  | 24 +---
> >  3 files changed, 33 insertions(+), 25 deletions(-)
> >  create mode 100644 arch/powerpc/include/asm/idle.h
> >
> > diff --git a/arch/powerpc/include/asm/idle.h 
> > b/arch/powerpc/include/asm/idle.h
> > new file mode 100644
> > index 000..f32a7d8
> > --- /dev/null
> > +++ b/arch/powerpc/include/asm/idle.h
> > @@ -0,0 +1,27 @@
> > +#ifndef _ASM_POWERPC_IDLE_H
> > +#define _ASM_POWERPC_IDLE_H
> > +#include 
> > +
> > +static inline void idle_loop_prolog(unsigned long *in_purr)
> > +{
> > +   ppc64_runlatch_off();
> > +   *in_purr = mfspr(SPRN_PURR);
> > +   /*
> > +* Indicate to the HV that we are idle. Now would be
> > +* a good time to find other work to dispatch.
> > +*/
> > +   get_lppaca()->idle = 1;
> > +}
> > +
> > +static inline void idle_loop_epilog(unsigned long in_purr)
> > +{
> > +   u64 wait_cycles;
> > +
> > +   wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
> > +   wait_cycles += mfspr(SPRN_PURR) - in_purr;
> > +   get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
> > +   get_lppaca()->idle = 0;
> > +
> > +   ppc64_runlatch_on();
> > +}
> > +#endif
> 
> Looks fine and correct as a cleanup, but asm/include/idle.h and
> idle_loop_prolog, idle_loop_epilog, strike me as too generic for
> pseries-specific code.

Should it be prefixed with pseries , i.e pseries_idle_prolog()
and pseries_idle_epilog() ?

Also, I am planning another round of cleanup to move all the
idle-related declaration from asm/include/processor.h to
asm/include/idle.h




[PATCH V2 0/4] mm/vma: Use all available wrappers when possible

2020-02-23 Thread Anshuman Khandual
Apart from adding a VMA flag readable name for trace purpose, this series
does some open encoding replacements with availabe VMA specific wrappers.
This skips VM_HUGETLB check in vma_migratable() as its already being done
with another patch (https://patchwork.kernel.org/patch/11347831/) which
is yet to be merged.

This series applies on 5.6-rc3. This has been build tested on multiple
platforms, though boot and runtime testing was limited to arm64 and x86.

Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-m...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@vger.kernel.org
Cc: kvm-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org

Changes in V2:

- Dropped patch [PATCH 4/5] mm/vma: Replacevma_set_anonymous() per Kirril
- Dropped braces around is_vm_hugetlb_page() in kvmppc_e500_shadow_map()
- Replaced two open encodings in mm/mmap.c with vma_is_accessible()
- Added hugetlb headers to prevent build failures wrt is_vm_hugetlb_page()

Changes in V1: (https://patchwork.kernel.org/cover/11385219/)

Anshuman Khandual (4):
  mm/vma: Add missing VMA flag readable name for VM_SYNC
  mm/vma: Make vma_is_accessible() available for general use
  mm/vma: Replace all remaining open encodings with is_vm_hugetlb_page()
  mm/vma: Replace all remaining open encodings with vma_is_anonymous()

 arch/csky/mm/fault.c | 2 +-
 arch/m68k/mm/fault.c | 2 +-
 arch/mips/mm/fault.c | 2 +-
 arch/powerpc/kvm/e500_mmu_host.c | 2 +-
 arch/powerpc/mm/fault.c  | 2 +-
 arch/sh/mm/fault.c   | 2 +-
 arch/x86/mm/fault.c  | 2 +-
 fs/binfmt_elf.c  | 3 ++-
 include/asm-generic/tlb.h| 3 ++-
 include/linux/mm.h   | 5 +
 include/trace/events/mmflags.h   | 1 +
 kernel/events/core.c | 3 ++-
 kernel/sched/fair.c  | 2 +-
 mm/gup.c | 5 +++--
 mm/memory.c  | 5 -
 mm/mempolicy.c   | 3 +--
 mm/mmap.c| 5 ++---
 17 files changed, 26 insertions(+), 23 deletions(-)

-- 
2.20.1



[PATCH V2 2/4] mm/vma: Make vma_is_accessible() available for general use

2020-02-23 Thread Anshuman Khandual
Lets move vma_is_accessible() helper to include/linux/mm.h which makes it
available for general use. While here, this replaces all remaining open
encodings for VMA access check with vma_is_accessible().

Cc: Guo Ren 
Cc: Geert Uytterhoeven 
Cc: Ralf Baechle 
Cc: Paul Burton 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andrew Morton 
Cc: Steven Rostedt 
Cc: Mel Gorman 
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-m...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@vger.kernel.org
Cc: linux...@kvack.org
Acked-by: Geert Uytterhoeven 
Acked-by: Guo Ren 
Signed-off-by: Anshuman Khandual 
---
 arch/csky/mm/fault.c| 2 +-
 arch/m68k/mm/fault.c| 2 +-
 arch/mips/mm/fault.c| 2 +-
 arch/powerpc/mm/fault.c | 2 +-
 arch/sh/mm/fault.c  | 2 +-
 arch/x86/mm/fault.c | 2 +-
 include/linux/mm.h  | 5 +
 kernel/sched/fair.c | 2 +-
 mm/gup.c| 2 +-
 mm/memory.c | 5 -
 mm/mempolicy.c  | 3 +--
 mm/mmap.c   | 5 ++---
 12 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/arch/csky/mm/fault.c b/arch/csky/mm/fault.c
index f76618b630f9..4b3511b8298d 100644
--- a/arch/csky/mm/fault.c
+++ b/arch/csky/mm/fault.c
@@ -137,7 +137,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, 
unsigned long write,
if (!(vma->vm_flags & VM_WRITE))
goto bad_area;
} else {
-   if (!(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)))
+   if (!vma_is_accessible(vma))
goto bad_area;
}
 
diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index e9b1d7585b43..d5131ec5d923 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -125,7 +125,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long 
address,
case 1: /* read, present */
goto acc_err;
case 0: /* read, not present */
-   if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))
+   if (!vma_is_accessible(vma))
goto acc_err;
}
 
diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c
index 1e8d00793784..5b9f947bfa32 100644
--- a/arch/mips/mm/fault.c
+++ b/arch/mips/mm/fault.c
@@ -142,7 +142,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, 
unsigned long write,
goto bad_area;
}
} else {
-   if (!(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)))
+   if (!vma_is_accessible(vma))
goto bad_area;
}
}
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 8db0507619e2..71a3658c516b 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -314,7 +314,7 @@ static bool access_error(bool is_write, bool is_exec,
return false;
}
 
-   if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE
+   if (unlikely(!vma_is_accessible(vma)))
return true;
/*
 * We should ideally do the vma pkey access check here. But in the
diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c
index 5f51456f4fc7..a8c4253f37d7 100644
--- a/arch/sh/mm/fault.c
+++ b/arch/sh/mm/fault.c
@@ -355,7 +355,7 @@ static inline int access_error(int error_code, struct 
vm_area_struct *vma)
return 1;
 
/* read, not present: */
-   if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE
+   if (unlikely(!vma_is_accessible(vma)))
return 1;
 
return 0;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fa4ea09593ab..c461eaab0306 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1200,7 +1200,7 @@ access_error(unsigned long error_code, struct 
vm_area_struct *vma)
return 1;
 
/* read, not present: */
-   if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE
+   if (unlikely(!vma_is_accessible(vma)))
return 1;
 
return 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 52269e56c514..b0e53ef13ff1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -541,6 +541,11 @@ static inline bool vma_is_anonymous(struct vm_area_struct 
*vma)
return !vma->vm_ops;
 }
 
+static inline bool vma_is_accessible(struct vm_area_struct *vma)
+{
+   return vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
+}
+
 #ifdef CONFIG_SHMEM
 /*
  * The vma_is_shmem is not inline because it is used only by slow
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3c8a379c357e..bf15cc72695e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2573,7 +257

[PATCH V2 3/4] mm/vma: Replace all remaining open encodings with is_vm_hugetlb_page()

2020-02-23 Thread Anshuman Khandual
This replaces all remaining open encodings with is_vm_hugetlb_page().

Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Alexander Viro 
Cc: Will Deacon 
Cc: "Aneesh Kumar K.V" 
Cc: Andrew Morton 
Cc: Nick Piggin 
Cc: Peter Zijlstra 
Cc: Arnd Bergmann 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: kvm-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/kvm/e500_mmu_host.c | 2 +-
 fs/binfmt_elf.c  | 3 ++-
 include/asm-generic/tlb.h| 3 ++-
 kernel/events/core.c | 3 ++-
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 425d13806645..df9989cf7ba3 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -422,7 +422,7 @@ static inline int kvmppc_e500_shadow_map(struct 
kvmppc_vcpu_e500 *vcpu_e500,
break;
}
} else if (vma && hva >= vma->vm_start &&
-  (vma->vm_flags & VM_HUGETLB)) {
+  is_vm_hugetlb_page(vma)) {
unsigned long psize = vma_kernel_pagesize(vma);
 
tsize = (gtlbe->mas1 & MAS1_TSIZE_MASK) >>
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index f4713ea76e82..1eb63867e266 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1317,7 +1318,7 @@ static unsigned long vma_dump_size(struct vm_area_struct 
*vma,
}
 
/* Hugetlb memory check */
-   if (vma->vm_flags & VM_HUGETLB) {
+   if (is_vm_hugetlb_page(vma)) {
if ((vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_SHARED))
goto whole;
if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index f391f6b500b4..3f1649a8cf55 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -13,6 +13,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -398,7 +399,7 @@ tlb_update_vma_flags(struct mmu_gather *tlb, struct 
vm_area_struct *vma)
 * We rely on tlb_end_vma() to issue a flush, such that when we reset
 * these values the batch is empty.
 */
-   tlb->vma_huge = !!(vma->vm_flags & VM_HUGETLB);
+   tlb->vma_huge = is_vm_hugetlb_page(vma);
tlb->vma_exec = !!(vma->vm_flags & VM_EXEC);
 }
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e453589da97c..ef5be3ed0580 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -7693,7 +7694,7 @@ static void perf_event_mmap_event(struct perf_mmap_event 
*mmap_event)
flags |= MAP_EXECUTABLE;
if (vma->vm_flags & VM_LOCKED)
flags |= MAP_LOCKED;
-   if (vma->vm_flags & VM_HUGETLB)
+   if (is_vm_hugetlb_page(vma))
flags |= MAP_HUGETLB;
 
if (file) {
-- 
2.20.1



Re: [PATCH v2 3/5] powerpc/pseries: Account for SPURR ticks on idle CPUs

2020-02-23 Thread Gautham R Shenoy
Hello Nathan,

On Fri, Feb 21, 2020 at 10:47:41AM -0600, Nathan Lynch wrote:
> "Gautham R. Shenoy"  writes:
> > +static inline void snapshot_spurr_idle_entry(void)
> > +{
> > +   *this_cpu_ptr(&idle_entry_spurr_snap) = mfspr(SPRN_SPURR);
> > +}
> > +
> 
> [...]
> 
> > +static inline void update_idle_spurr_accounting(void)
> > +{
> > +   u64 *idle_spurr_cycles_ptr = this_cpu_ptr(&idle_spurr_cycles);
> > +   u64 in_spurr = *this_cpu_ptr(&idle_entry_spurr_snap);
> > +
> > +   *idle_spurr_cycles_ptr += mfspr(SPRN_SPURR) - in_spurr;
> > +}
> 
> [...]
> 
> > +static inline u64 read_this_idle_spurr(void)
> > +{
> > +   /*
> > +* If we are reading from an idle context, update the
> > +* idle-spurr cycles corresponding to the last idle period.
> > +* Since the idle context is not yet over, take a fresh
> > +* snapshot of the idle-spurr.
> > +*/
> > +   if (get_lppaca()->idle == 1) {
> > +   update_idle_spurr_accounting();
> > +   snapshot_spurr_idle_entry();
> 
> This samples spurr twice when it could do with just one. I don't know
> the performance implications, but will the results be coherent?


We would have taken the snapshot in idle_loop_prolog() just before
entering idle. That fact that the "if" condition is true above in
read_this_idle_spurr() implies that we are reading the idle_spurr
value from an interrupt context and since get_lppaca()->idle == 1, we
haven't yet called idle_loop_epilog(), where we would have updated the
idle_spurr ticks for the last idle period.

Hence, in this function, we first update the idle_spurr accounting
from the time of the last snapshot to now. We update the snapshot to
the current SPURR value so that when we eventually call
idle_loop_epilog(), we will account for the remaining idle duration,
i.e from the read_this_idle_spurr() call to idle_loop_epilog()

The results are therefore coherant, in that we do not perform double
accounting the second time we invoke update_idle_spurr_accounting()
from idle_loop_epilog(), but only add the spurr ticks from
read_this_idle_spurr() to idle_loop_epilog().

--
Thanks and Regards
gautham.


Re: [PATCH v2 4/5] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

2020-02-23 Thread Gautham R Shenoy
On Fri, Feb 21, 2020 at 10:50:12AM -0600, Nathan Lynch wrote:
> "Gautham R. Shenoy"  writes:
> > diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> > index 80a676d..5b4b450 100644
> > --- a/arch/powerpc/kernel/sysfs.c
> > +++ b/arch/powerpc/kernel/sysfs.c
> > @@ -19,6 +19,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  
> >  #include "cacheinfo.h"
> > @@ -733,6 +734,42 @@ static void create_svm_file(void)
> >  }
> >  #endif /* CONFIG_PPC_SVM */
> >  
> > +static void read_idle_purr(void *val)
> > +{
> > +   u64 *ret = (u64 *)val;
> 
> No cast from void* needed.

Will fix this. Thanks.

> 
> 
> > +
> > +   *ret = read_this_idle_purr();
> > +}
> > +
> > +static ssize_t idle_purr_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > +   struct cpu *cpu = container_of(dev, struct cpu, dev);
> > +   u64 val;
> > +
> > +   smp_call_function_single(cpu->dev.id, read_idle_purr, &val, 1);
> > +   return sprintf(buf, "%llx\n", val);
> > +}
> > +static DEVICE_ATTR(idle_purr, 0400, idle_purr_show, NULL);
> > +
> > +static void read_idle_spurr(void *val)
> > +{
> > +   u64 *ret = (u64 *)val;
> > +
> > +   *ret = read_this_idle_spurr();
> > +}
> > +
> > +static ssize_t idle_spurr_show(struct device *dev,
> > +  struct device_attribute *attr, char *buf)
> > +{
> > +   struct cpu *cpu = container_of(dev, struct cpu, dev);
> > +   u64 val;
> > +
> > +   smp_call_function_single(cpu->dev.id, read_idle_spurr, &val, 1);
> > +   return sprintf(buf, "%llx\n", val);
> > +}
> > +static DEVICE_ATTR(idle_spurr, 0400, idle_spurr_show, NULL);
> 
> It's regrettable that we have to wake up potentially idle CPUs in order
> to derive correct idle statistics for them, but I suppose the main user
> (lparstat) of these interfaces already is causing this to happen by
> polling the existing per-cpu purr and spurr attributes.
> 
> So now lparstat will incur at minimum four syscalls and four IPIs per
> CPU per polling interval -- one for each of purr, spurr, idle_purr and
> idle_spurr. Correct?

Yes, it is unforunate that we will end up making four syscalls and
generating IPI noise, and this is something that I discussed with
Naveen and Kamalesh. We have the following two constraints:

1) These values of PURR and SPURR required are per-cpu. Hence putting
them in lparcfg is not an option.

2) sysfs semantics encourages a single value per key, the key being
the sysfs-file. Something like the following would have made far more
sense.

cat /sys/devices/system/cpu/cpuX/purr_spurr_accounting
purr:A
idle_purr:B
spurr:C
idle_spurr:D

There are some sysfs files which allow something like this. Eg: 
/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state

Thoughts on any other alternatives?


> 
> At some point it's going to make sense to batch sampling of remote CPUs'
> SPRs.
> 
> 
> >  static int register_cpu_online(unsigned int cpu)
> >  {
> > struct cpu *c = &per_cpu(cpu_devices, cpu);
> > @@ -794,10 +831,15 @@ static int register_cpu_online(unsigned int cpu)
> > if (!firmware_has_feature(FW_FEATURE_LPAR))
> > add_write_permission_dev_attr(&dev_attr_purr);
> > device_create_file(s, &dev_attr_purr);
> > +   if (firmware_has_feature(FW_FEATURE_LPAR))
> > +   device_create_file(s, &dev_attr_idle_purr);
> > }
> >  
> > -   if (cpu_has_feature(CPU_FTR_SPURR))
> > +   if (cpu_has_feature(CPU_FTR_SPURR)) {
> > device_create_file(s, &dev_attr_spurr);
> > +   if (firmware_has_feature(FW_FEATURE_LPAR))
> > +   device_create_file(s, &dev_attr_idle_spurr);
> > +   }
> >  
> > if (cpu_has_feature(CPU_FTR_DSCR))
> > device_create_file(s, &dev_attr_dscr);
> > @@ -879,11 +921,17 @@ static int unregister_cpu_online(unsigned int cpu)
> > if (cpu_has_feature(CPU_FTR_MMCRA))
> > device_remove_file(s, &dev_attr_mmcra);
> >  
> > -   if (cpu_has_feature(CPU_FTR_PURR))
> > +   if (cpu_has_feature(CPU_FTR_PURR)) {
> > device_remove_file(s, &dev_attr_purr);
> > +   if (firmware_has_feature(FW_FEATURE_LPAR))
> > +   device_remove_file(s, &dev_attr_idle_purr);
> > +   }
> >  
> > -   if (cpu_has_feature(CPU_FTR_SPURR))
> > +   if (cpu_has_feature(CPU_FTR_SPURR)) {
> > device_remove_file(s, &dev_attr_spurr);
> > +   if (firmware_has_feature(FW_FEATURE_LPAR))
> > +   device_remove_file(s, &dev_attr_idle_spurr);
> > +   }
> >  
> > if (cpu_has_feature(CPU_FTR_DSCR))
> > device_remove_file(s, &dev_attr_dscr);
> 
> The cpu register/unregister stuff here looks correct.

Thanks for reviewing the patch.

--
Thanks and Regards
gautham.


Re: [PATCH v2 5/5] Documentation: Document sysfs interfaces purr, spurr, idle_purr, idle_spurr

2020-02-23 Thread Gautham R Shenoy
On Fri, Feb 21, 2020 at 10:55:07AM -0600, Nathan Lynch wrote:
> "Gautham R. Shenoy"  writes:
> > diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu 
> > b/Documentation/ABI/testing/sysfs-devices-system-cpu
> > index 2e0e3b4..799dc737a 100644
> > --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> > +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> > @@ -580,3 +580,42 @@ Description:   Secure Virtual Machine
> > If 1, it means the system is using the Protected Execution
> > Facility in POWER9 and newer processors. i.e., it is a Secure
> > Virtual Machine.
> > +
> > +What:  /sys/devices/system/cpu/cpuX/purr
> > +Date:  Apr 2005
> > +Contact:   Linux for PowerPC mailing list 
> > +Description:   PURR ticks for this CPU since the system boot.
> > +
> > +   The Processor Utilization Resources Register (PURR) is
> > +   a 64-bit counter which provides an estimate of the
> > +   resources used by the CPU thread. The contents of this
> > +   register increases monotonically. This sysfs interface
> > +   exposes the number of PURR ticks for cpuX.
> > +
> > +What:  /sys/devices/system/cpu/cpuX/spurr
> > +Date:  Dec 2006
> > +Contact:   Linux for PowerPC mailing list 
> > +Description:   SPURR ticks for this CPU since the system boot.
> > +
> > +   The Scaled Processor Utilization Resources Register
> > +   (SPURR) is a 64-bit counter that provides a frequency
> > +   invariant estimate of the resources used by the CPU
> > +   thread. The contents of this register increases
> > +   monotonically. This sysfs interface exposes the number
> > +   of SPURR ticks for cpuX.
> > +
> > +What:  /sys/devices/system/cpu/cpuX/idle_purr
> > +Date:  Nov 2019
> > +Contact:   Linux for PowerPC mailing list 
> > +Description:   PURR ticks for cpuX when it was idle.
> > +
> > +   This sysfs interface exposes the number of PURR ticks
> > +   for cpuX when it was idle.
> > +
> > +What:  /sys/devices/system/cpu/cpuX/spurr
> 
> Copy-paste error? This should be:

Yes, this should have been idle_spurr. Will fix it in the next
version.

> 
> /sys/devices/system/cpu/cpuX/idle_spurr
> 
> > +Date:  Nov 2019
> 
> And I suppose Nov 2019 is no longer accurate.

My bad. I will resend this with the updated date.


> 
> 
> > +Contact:   Linux for PowerPC mailing list 
> > +Description:   SPURR ticks for cpuX when it was idle.
> > +
> > +   This sysfs interface exposes the number of SPURR ticks
> > +   for cpuX when it was idle.
> > -- 
> > 1.9.4


Re: [PATCH v2 1/8] powerpc/perf/hv-24x7: Fix inconsistent output values incase multiple hv-24x7 events run

2020-02-23 Thread kajoljain



On 2/23/20 8:21 AM, Sukadev Bhattiprolu wrote:
> Kajol Jain [kj...@linux.ibm.com] wrote:
>> Commit 2b206ee6b0df ("powerpc/perf/hv-24x7: Display change in counter
>> values")' added to print _change_ in the counter value rather then raw
>> value for 24x7 counters. Incase of transactions, the event count
>> is set to 0 at the beginning of the transaction. It also sets
>> the event's prev_count to the raw value at the time of initialization.
>> Because of setting event count to 0, we are seeing some weird behaviour,
>> whenever we run multiple 24x7 events at a time.
> 
> Interesting. Are we taking delta of a delta and ending up with large
> negative values in the -I case?  However...
> 

Hi Sukadev,
   That's right, we are ending up in calculating delta of delta which may 
give us negative values
because of which we are getting these large values in -I case.

> 
> 
>>
>> Signed-off-by: Kajol Jain 
>> ---
>>  arch/powerpc/perf/hv-24x7.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
>> index 573e0b309c0c..6dbbf70232aa 100644
>> --- a/arch/powerpc/perf/hv-24x7.c
>> +++ b/arch/powerpc/perf/hv-24x7.c
>> @@ -1409,7 +1409,7 @@ static void h_24x7_event_read(struct perf_event *event)
>>   * that would require issuing a hcall, which would then
>>   * defeat the purpose of using the txn interface.
>>   */
>> -local64_set(&event->count, 0);
>> +local64_add(0, &event->count);
> 
> ... not sure, how adding zero to the count helps. Should we just remove the
> line (and the comment block above it)?  Or does it help to clear the event
> count in ->start_txn() rather than on read()?

Its not impacting much as we are just adding and not setting event count.I 
think we can remove that line
with the comment added. Will it be ok if I remove that whole part?

> 
> How does the change impact the counts when run without the -I?

There won't be much impact because from my understanding you did add 
`(void)local64_xchg(&event->hw.prev_count, ct);`, to print change value in your
Commit 2b206ee6b0df in function 'h_24x7_event_init()'.
And we will go through this path in case of 'without -I'. Please let me know if 
my
understanding is correct.

Thanks,
Kajol

> 
> Thanks for chasing this down.
> 
> Sukadev
> 


Re: [PATCH v3 06/27] ocxl: Tally up the LPC memory on a link & allow it to be mapped

2020-02-23 Thread Andrew Donnellan

On 21/2/20 2:26 pm, Alastair D'Silva wrote:

From: Alastair D'Silva 

Tally up the LPC memory on an OpenCAPI link & allow it to be mapped

Signed-off-by: Alastair D'Silva 


This commit message is a bit short and could do with some further 
explanation.


In particular - it's worth explaining why the tracking of available LPC 
memory needs to be done at a link level, because a single OpenCAPI card 
can have multiple PCI functions, each with multiple AFUs which define an 
amount of LPC memory they have, even if the common case is expected to 
be a single function with a single AFU and thus one LPC area per link.


Snowpatch has a few checkpatch issues to report:

https://openpower.xyz/job/snowpatch/job/snowpatch-linux-checkpatch/11800//artifact/linux/checkpatch.log

The code generally looks okay to me.


diff --git a/drivers/misc/ocxl/ocxl_internal.h 
b/drivers/misc/ocxl/ocxl_internal.h
index 198e4e4bc51d..d0c8c4838f42 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -142,4 +142,37 @@ int ocxl_irq_offset_to_id(struct ocxl_context *ctx, u64 
offset);
  u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id);
  void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
  
+/**

+ * ocxl_link_add_lpc_mem() - Increment the amount of memory required by an 
OpenCAPI link
+ *
+ * @link_handle: The OpenCAPI link handle
+ * @offset: The offset of the memory to add
+ * @size: The amount of memory to increment by
+ *
+ * Returns 0 on success, negative on overflow
+ */


I think "amount of memory required" isn't the best way to express this.

Might as well explicitly say -EINVAL on overflow.

--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



Re: [PATCH v3 06/27] ocxl: Tally up the LPC memory on a link & allow it to be mapped

2020-02-23 Thread Alastair D'Silva
On Mon, 2020-02-24 at 16:25 +1100, Andrew Donnellan wrote:
> On 21/2/20 2:26 pm, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > Tally up the LPC memory on an OpenCAPI link & allow it to be mapped
> > 
> > Signed-off-by: Alastair D'Silva 
> 
> This commit message is a bit short and could do with some further 
> explanation.
> 
> In particular - it's worth explaining why the tracking of available
> LPC 
> memory needs to be done at a link level, because a single OpenCAPI
> card 
> can have multiple PCI functions, each with multiple AFUs which define
> an 
> amount of LPC memory they have, even if the common case is expected
> to 
> be a single function with a single AFU and thus one LPC area per
> link.

Ok

> 
> Snowpatch has a few checkpatch issues to report:
> 
> https://openpower.xyz/job/snowpatch/job/snowpatch-linux-checkpatch/11800//artifact/linux/checkpatch.log
> 

Gah, I could have sworn I ran checkpatch against this :/

> The code generally looks okay to me.
> 
> > diff --git a/drivers/misc/ocxl/ocxl_internal.h
> > b/drivers/misc/ocxl/ocxl_internal.h
> > index 198e4e4bc51d..d0c8c4838f42 100644
> > --- a/drivers/misc/ocxl/ocxl_internal.h
> > +++ b/drivers/misc/ocxl/ocxl_internal.h
> > @@ -142,4 +142,37 @@ int ocxl_irq_offset_to_id(struct ocxl_context
> > *ctx, u64 offset);
> >   u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id);
> >   void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
> >   
> > +/**
> > + * ocxl_link_add_lpc_mem() - Increment the amount of memory
> > required by an OpenCAPI link
> > + *
> > + * @link_handle: The OpenCAPI link handle
> > + * @offset: The offset of the memory to add
> > + * @size: The amount of memory to increment by
> > + *
> > + * Returns 0 on success, negative on overflow
> > + */
> 
> I think "amount of memory required" isn't the best way to express
> this.
> 
> Might as well explicitly say -EINVAL on overflow.
> 

Ok

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH v3 01/27] powerpc: Add OPAL calls for LPC memory alloc/release

2020-02-23 Thread Andrew Donnellan

On 21/2/20 2:26 pm, Alastair D'Silva wrote:

From: Alastair D'Silva 

Add OPAL calls for LPC memory alloc/release

Signed-off-by: Alastair D'Silva 
Acked-by: Andrew Donnellan 
Acked-by: Frederic Barrat 


Summary line should be "powerpc/powernv".


--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



Re: [PATCH v3 03/27] powerpc: Map & release OpenCAPI LPC memory

2020-02-23 Thread Andrew Donnellan

On 24/2/20 1:51 pm, Andrew Donnellan wrote:

On 21/2/20 2:26 pm, Alastair D'Silva wrote:

From: Alastair D'Silva 

This patch adds platform support to map & release LPC memory.

Signed-off-by: Alastair D'Silva 


Nothing seems obviously wrong here.

Reviewed-by: Andrew Donnellan 


Oh, commit message nitpick :)

Summary should be powerpc/powernv. Commit message should explain that 
this is for the powernv platform and presents an interface that drivers 
can use to make use of the new OPAL calls.


--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



Re: [PATCH v3 01/27] powerpc: Add OPAL calls for LPC memory alloc/release

2020-02-23 Thread Alastair D'Silva
On Mon, 2020-02-24 at 16:49 +1100, Andrew Donnellan wrote:
> On 21/2/20 2:26 pm, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > Add OPAL calls for LPC memory alloc/release
> > 
> > Signed-off-by: Alastair D'Silva 
> > Acked-by: Andrew Donnellan 
> > Acked-by: Frederic Barrat 
> 
> Summary line should be "powerpc/powernv".
> 
> 

Ok

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [RFC PATCH v2 00/12] Reduce ifdef mess in ptrace

2020-02-23 Thread Christophe Leroy




Le 24/02/2020 à 03:15, Michael Neuling a écrit :

Christophe,


Le 28/06/2019 à 17:47, Christophe Leroy a écrit :

The purpose of this series is to reduce the amount of #ifdefs
in ptrace.c



Any feedback on this series which aims at fixing the issue you opened at
https://github.com/linuxppc/issues/issues/128 ?


Yeah, sorry my bad. You did all the hard work and I ignored it.

I like the approach and is a long the lines I was thinking. Putting it in a
ptrace subdir, splitting out adv_debug_regs, TM, SPE, Alitivec, VSX.
ppc_gethwdinfo() looks a lot nicer now too (that was some of the worst of it).

I've not gone through it with a fine tooth comb though. There is (rightly) a lot
of code moved around which could have introduced some issues.

It applies on v5.2 but are you planning on updating it to a newer base?



As you noticed there is a lot of code moved around, and rebasing 
produces a lot of conflicts. So I didn't want to spend hours to rebase 
and rebase without being sure it was the right approach.


Now that I got a positive feedback I'll consider rebasing it, hopping 
that Michael will pick it up.


Christophe


Re: [PATCH v3 07/27] ocxl: Add functions to map/unmap LPC memory

2020-02-23 Thread Andrew Donnellan

On 21/2/20 2:27 pm, Alastair D'Silva wrote:

From: Alastair D'Silva 

Add functions to map/unmap LPC memory

Signed-off-by: Alastair D'Silva 
---
  drivers/misc/ocxl/core.c  | 51 +++
  drivers/misc/ocxl/ocxl_internal.h |  3 ++
  include/misc/ocxl.h   | 21 +
  3 files changed, 75 insertions(+)

diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
index 2531c6cf19a0..75ff14e3882a 100644
--- a/drivers/misc/ocxl/core.c
+++ b/drivers/misc/ocxl/core.c
@@ -210,6 +210,56 @@ static void unmap_mmio_areas(struct ocxl_afu *afu)
release_fn_bar(afu->fn, afu->config.global_mmio_bar);
  }
  
+int ocxl_afu_map_lpc_mem(struct ocxl_afu *afu)

+{
+   struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
+
+   if ((afu->config.lpc_mem_size + afu->config.special_purpose_mem_size) 
== 0)
+   return 0;


I'd prefer the comparison here to be:

  afu->config.lpc_mem_size == 0 &&
afu->config.special_purpose_mem_size == 0

so a reader doesn't have to think about what this means.


+
+   afu->lpc_base_addr = ocxl_link_lpc_map(afu->fn->link, dev);
+   if (afu->lpc_base_addr == 0)
+   return -EINVAL;
+
+   if (afu->config.lpc_mem_size > 0) {
+   afu->lpc_res.start = afu->lpc_base_addr + 
afu->config.lpc_mem_offset;


Maybe not for this series - hmm, I wonder if we should print a warning 
somewhere (maybe in read_afu_lpc_memory_info()?) if we see the case 
where (lpc_mem_offset > 0 && lpc_mem_size == 0). Likewise for special 
purpose?



+   afu->lpc_res.end = afu->lpc_res.start + 
afu->config.lpc_mem_size - 1;
+   }
+
+   if (afu->config.special_purpose_mem_size > 0) {
+   afu->special_purpose_res.start = afu->lpc_base_addr +
+
afu->config.special_purpose_mem_offset;
+   afu->special_purpose_res.end = afu->special_purpose_res.start +
+  
afu->config.special_purpose_mem_size - 1;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(ocxl_afu_map_lpc_mem);
+
+struct resource *ocxl_afu_lpc_mem(struct ocxl_afu *afu)
+{
+   return &afu->lpc_res;
+}
+EXPORT_SYMBOL_GPL(ocxl_afu_lpc_mem);


What's the point of this function? A layer of indirection just in case 
we need it in future?



+
+static void unmap_lpc_mem(struct ocxl_afu *afu)
+{
+   struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
+
+   if (afu->lpc_res.start || afu->special_purpose_res.start) {
+   void *link = afu->fn->link;
+
+   // only release the link when the the last consumer calls 
release
+   ocxl_link_lpc_release(link, dev);
+
+   afu->lpc_res.start = 0;
+   afu->lpc_res.end = 0;
+   afu->special_purpose_res.start = 0;
+   afu->special_purpose_res.end = 0;
+   }
+}
+
  static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev 
*dev)
  {
int rc;
@@ -251,6 +301,7 @@ static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, 
struct pci_dev *dev)
  
  static void deconfigure_afu(struct ocxl_afu *afu)

  {
+   unmap_lpc_mem(afu);
unmap_mmio_areas(afu);
reclaim_afu_pasid(afu);
reclaim_afu_actag(afu);
diff --git a/drivers/misc/ocxl/ocxl_internal.h 
b/drivers/misc/ocxl/ocxl_internal.h
index d0c8c4838f42..ce0cac1da416 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -52,6 +52,9 @@ struct ocxl_afu {
void __iomem *global_mmio_ptr;
u64 pp_mmio_start;
void *private;
+   u64 lpc_base_addr; /* Covers both LPC & special purpose memory */
+   struct resource lpc_res;
+   struct resource special_purpose_res;
  };
  
  enum ocxl_context_status {

diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 357ef1aadbc0..d8b0b4d46bfb 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -203,6 +203,27 @@ int ocxl_irq_set_handler(struct ocxl_context *ctx, int 
irq_id,
  
  // AFU Metadata
  
+/**

+ * ocxl_afu_map_lpc_mem() - Map the LPC system & special purpose memory for an 
AFU
+ * Do not call this during device discovery, as there may me multiple


be


+ * devices on a link, and the memory is mapped for the whole link, not
+ * just one device. It should only be called after all devices have
+ * registered their memory on the link.
+ *
+ * @afu: The AFU that has the LPC memory to map
+ *
+ * Returns 0 on success, negative on failure
+ */
+int ocxl_afu_map_lpc_mem(struct ocxl_afu *afu);
+
+/**
+ * ocxl_afu_lpc_mem() - Get the physical address range of LPC memory for an AFU
+ * @afu: The AFU associated with the LPC memory
+ *
+ * Returns a pointer to the resource struct for the physical address range
+ */
+struct resource *ocxl_afu_lpc_mem(struct ocxl_afu *afu);
+
  /**
   * ocxl_afu_config() - Get a pointer to the config for an AFU
   * @afu: a pointer to the AFU to get the config 

Re: [PATCH v3 08/27] ocxl: Emit a log message showing how much LPC memory was detected

2020-02-23 Thread Andrew Donnellan

On 21/2/20 2:27 pm, Alastair D'Silva wrote:

From: Alastair D'Silva 

This patch emits a message showing how much LPC memory & special purpose
memory was detected on an OCXL device.

Signed-off-by: Alastair D'Silva 
---
  drivers/misc/ocxl/config.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index a62e3d7db2bf..701ae6216abf 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -568,6 +568,10 @@ static int read_afu_lpc_memory_info(struct pci_dev *dev,
afu->special_purpose_mem_size =
total_mem_size - lpc_mem_size;
}
+
+   dev_info(&dev->dev, "Probed LPC memory of %#llx bytes and special purpose 
memory of %#llx bytes\n",
+   afu->lpc_mem_size, afu->special_purpose_mem_size);
+


Printing this at info level for every single AFU seems a bit noisy. 
Perhaps we can print it only if LPC memory is > 0?


--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



Re: [PATCH v3 07/27] ocxl: Add functions to map/unmap LPC memory

2020-02-23 Thread Alastair D'Silva
On Mon, 2020-02-24 at 17:02 +1100, Andrew Donnellan wrote:
> On 21/2/20 2:27 pm, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > Add functions to map/unmap LPC memory
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >   drivers/misc/ocxl/core.c  | 51
> > +++
> >   drivers/misc/ocxl/ocxl_internal.h |  3 ++
> >   include/misc/ocxl.h   | 21 +
> >   3 files changed, 75 insertions(+)
> > 
> > diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
> > index 2531c6cf19a0..75ff14e3882a 100644
> > --- a/drivers/misc/ocxl/core.c
> > +++ b/drivers/misc/ocxl/core.c
> > @@ -210,6 +210,56 @@ static void unmap_mmio_areas(struct ocxl_afu
> > *afu)
> > release_fn_bar(afu->fn, afu->config.global_mmio_bar);
> >   }
> >   
> > +int ocxl_afu_map_lpc_mem(struct ocxl_afu *afu)
> > +{
> > +   struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
> > +
> > +   if ((afu->config.lpc_mem_size + afu-
> > >config.special_purpose_mem_size) == 0)
> > +   return 0;
> 
> I'd prefer the comparison here to be:
> 
>afu->config.lpc_mem_size == 0 &&
>  afu->config.special_purpose_mem_size == 0
> 
> so a reader doesn't have to think about what this means.
> 

Ok

> > +
> > +   afu->lpc_base_addr = ocxl_link_lpc_map(afu->fn->link, dev);
> > +   if (afu->lpc_base_addr == 0)
> > +   return -EINVAL;
> > +
> > +   if (afu->config.lpc_mem_size > 0) {
> > +   afu->lpc_res.start = afu->lpc_base_addr + afu-
> > >config.lpc_mem_offset;
> 
> Maybe not for this series - hmm, I wonder if we should print a
> warning 
> somewhere (maybe in read_afu_lpc_memory_info()?) if we see the case 
> where (lpc_mem_offset > 0 && lpc_mem_size == 0). Likewise for
> special 
> purpose?
> 

Sounds reasonable, might as well add it here since there are other LPC
changes.

> > +   afu->lpc_res.end = afu->lpc_res.start + afu-
> > >config.lpc_mem_size - 1;
> > +   }
> > +
> > +   if (afu->config.special_purpose_mem_size > 0) {
> > +   afu->special_purpose_res.start = afu->lpc_base_addr +
> > +afu-
> > >config.special_purpose_mem_offset;
> > +   afu->special_purpose_res.end = afu-
> > >special_purpose_res.start +
> > +  afu-
> > >config.special_purpose_mem_size - 1;
> > +   }
> > +
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(ocxl_afu_map_lpc_mem);
> > +
> > +struct resource *ocxl_afu_lpc_mem(struct ocxl_afu *afu)
> > +{
> > +   return &afu->lpc_res;
> > +}
> > +EXPORT_SYMBOL_GPL(ocxl_afu_lpc_mem);
> 
> What's the point of this function? A layer of indirection just in
> case 
> we need it in future?
> 

struct ocxl_afu is opaque outsite the ocxl driver.

> > +
> > +static void unmap_lpc_mem(struct ocxl_afu *afu)
> > +{
> > +   struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
> > +
> > +   if (afu->lpc_res.start || afu->special_purpose_res.start) {
> > +   void *link = afu->fn->link;
> > +
> > +   // only release the link when the the last consumer
> > calls release
> > +   ocxl_link_lpc_release(link, dev);
> > +
> > +   afu->lpc_res.start = 0;
> > +   afu->lpc_res.end = 0;
> > +   afu->special_purpose_res.start = 0;
> > +   afu->special_purpose_res.end = 0;
> > +   }
> > +}
> > +
> >   static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct
> > pci_dev *dev)
> >   {
> > int rc;
> > @@ -251,6 +301,7 @@ static int configure_afu(struct ocxl_afu *afu,
> > u8 afu_idx, struct pci_dev *dev)
> >   
> >   static void deconfigure_afu(struct ocxl_afu *afu)
> >   {
> > +   unmap_lpc_mem(afu);
> > unmap_mmio_areas(afu);
> > reclaim_afu_pasid(afu);
> > reclaim_afu_actag(afu);
> > diff --git a/drivers/misc/ocxl/ocxl_internal.h
> > b/drivers/misc/ocxl/ocxl_internal.h
> > index d0c8c4838f42..ce0cac1da416 100644
> > --- a/drivers/misc/ocxl/ocxl_internal.h
> > +++ b/drivers/misc/ocxl/ocxl_internal.h
> > @@ -52,6 +52,9 @@ struct ocxl_afu {
> > void __iomem *global_mmio_ptr;
> > u64 pp_mmio_start;
> > void *private;
> > +   u64 lpc_base_addr; /* Covers both LPC & special purpose memory
> > */
> > +   struct resource lpc_res;
> > +   struct resource special_purpose_res;
> >   };
> >   
> >   enum ocxl_context_status {
> > diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
> > index 357ef1aadbc0..d8b0b4d46bfb 100644
> > --- a/include/misc/ocxl.h
> > +++ b/include/misc/ocxl.h
> > @@ -203,6 +203,27 @@ int ocxl_irq_set_handler(struct ocxl_context
> > *ctx, int irq_id,
> >   
> >   // AFU Metadata
> >   
> > +/**
> > + * ocxl_afu_map_lpc_mem() - Map the LPC system & special purpose
> > memory for an AFU
> > + * Do not call this during device discovery, as there may me
> > multiple
> 
> be
> 
> > + * devices on a link, and the memory is mapped for the whole link,
> > not
> > + * just one device. It should only be called after all devices
> > have
> > + * registered their memor

Re: [PATCH v3 08/27] ocxl: Emit a log message showing how much LPC memory was detected

2020-02-23 Thread Alastair D'Silva
On Mon, 2020-02-24 at 17:06 +1100, Andrew Donnellan wrote:
> On 21/2/20 2:27 pm, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > This patch emits a message showing how much LPC memory & special
> > purpose
> > memory was detected on an OCXL device.
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >   drivers/misc/ocxl/config.c | 4 
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/misc/ocxl/config.c
> > b/drivers/misc/ocxl/config.c
> > index a62e3d7db2bf..701ae6216abf 100644
> > --- a/drivers/misc/ocxl/config.c
> > +++ b/drivers/misc/ocxl/config.c
> > @@ -568,6 +568,10 @@ static int read_afu_lpc_memory_info(struct
> > pci_dev *dev,
> > afu->special_purpose_mem_size =
> > total_mem_size - lpc_mem_size;
> > }
> > +
> > +   dev_info(&dev->dev, "Probed LPC memory of %#llx bytes and
> > special purpose memory of %#llx bytes\n",
> > +   afu->lpc_mem_size, afu->special_purpose_mem_size);
> > +
> 
> Printing this at info level for every single AFU seems a bit noisy. 
> Perhaps we can print it only if LPC memory is > 0?
> 

There is an early exit before this if there is no LPC memory.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH v3 08/27] ocxl: Emit a log message showing how much LPC memory was detected

2020-02-23 Thread Andrew Donnellan

On 24/2/20 5:10 pm, Alastair D'Silva wrote:

Printing this at info level for every single AFU seems a bit noisy.
Perhaps we can print it only if LPC memory is > 0?



There is an early exit before this if there is no LPC memory.



Noted, I'd missed that amidst all the early returns for errors.

In that case

Acked-by: Andrew Donnellan 

--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



Re: [PATCH v3 00/27] Add support for OpenCAPI Persistent Memory devices

2020-02-23 Thread Oliver O'Halloran
On Mon, Feb 24, 2020 at 3:43 PM Alastair D'Silva  wrote:
>
> On Sun, 2020-02-23 at 20:37 -0800, Matthew Wilcox wrote:
> > On Mon, Feb 24, 2020 at 03:34:07PM +1100, Alastair D'Silva wrote:
> > > V3:
> > >   - Rebase against next/next-20200220
> > >   - Move driver to arch/powerpc/platforms/powernv, we now expect
> > > this
> > > driver to go upstream via the powerpc tree
> >
> > That's rather the opposite direction of normal; mostly drivers live
> > under
> > drivers/ and not in arch/.  It's easier for drivers to get overlooked
> > when doing tree-wide changes if they're hiding.
>
> This is true, however, given that it was not all that desirable to have
> it under drivers/nvdimm, it's sister driver (for the same hardware) is
> also under arch, and that we don't expect this driver to be used on any
> platform other than powernv, we think this was the most reasonable
> place to put it.

Historically powernv specific platform drivers go in their respective
subsystem trees rather than in arch/ and I'd prefer we kept it that
way. When I added the papr_scm driver I put it in the pseries platform
directory because most of the pseries paravirt code lives there for
some reason; I don't know why. Luckily for me that followed the same
model that Dan used when he put the NFIT driver in drivers/acpi/ and
the libnvdimm core in drivers/nvdimm/ so we didn't have anything to
argue about. However, as Matthew pointed out, it is at odds with how
most subsystems operate. Is there any particular reason we're doing
things this way or should we think about moving libnvdimm users to
drivers/nvdimm/?

Oliver