Re: [PATCH 1/2] powerpc/xmon: support dumping software pagetables

2017-07-21 Thread Michael Ellerman
Nice !

Balbir Singh  writes:

> It would be nice to be able to dump page tables in a
> particular context
>
> Example use cases
>
> Dumping PTE contents to see the keys (useful for debugging)
>
> c000ba48c880 c000bab438b0   2677   2675 T  2 protection_keys

What is that ^ ?

> 0:mon> ds c000ba48c880 0x77f7
> translating tsk c000ba48c880, addr 77f7
> G: 0xb95b6400   U: 0xb6334000   M: 0xb6543000   PA: 0x012c, PTE: 
> 0xd480012c0504

Without reading the code I don't grok what G/U/M mean.

Feel free to use more than one line of output :)

> Dumping vmalloc space
>
> 0:mon> ds 0 d000

I suspect we will want to do that a lot. So I'd rather the arguments
were reversed, and the second (task) can be omitted.

So:
0:mon> ds x == translate x via init_mm
0:mon> ds x y   == translate x via &y->mm

I guess it's easier for folks to find a task rather than an mm directly?
Otherwise it could take an mm not a task.

> translating tsk   (null), addr d000

We should special case that to say "using kernel page tables" or similar.

> G: 0x3d450400   U: 0xbc184000   M: 0x3d46   PA: 0x7e01, PTE: 
> 0xc0807e01018e

> I did not replicate the complex code of dump_pagetable and have no support
> for bolted linear mapping, that's why I've called it software pagetable
> dumping support.

Not sure about that naming. On hash it makes sense, but not on radix or
on other platforms.

Maybe 'dv' for dump Virtual address?

It doesn't dump a PTE, it tries to translate an address into a PTE.
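The fallback Michael describes could be sketched like this (a self-contained userspace sketch; `mm_for_task`, the struct layouts, and `init_mm` are stand-ins for the kernel objects, not the actual xmon code):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's task_struct/mm_struct; illustrative only. */
struct mm_struct { int dummy; };
struct task_struct { struct mm_struct *mm; };

static struct mm_struct init_mm;

/*
 * With the argument order suggested above ("ds addr [task]"), the
 * command handler just falls back to init_mm when no task is given,
 * or when the task has no mm (a kernel thread).
 */
static struct mm_struct *mm_for_task(struct task_struct *tsk)
{
	if (tsk == NULL || tsk->mm == NULL)
		return &init_mm;	/* use the kernel page tables */
	return tsk->mm;
}
```

With this shape, `ds x` translates via init_mm and `ds x y` translates via the task's mm, matching the proposed usage.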

cheers


Re: [PATCH v2] powerpc/mm: Implemented default_hugepagesz verification for powerpc

2017-07-21 Thread Aneesh Kumar K.V
Victor Aoqui  writes:

> Implemented default hugepage size verification (default_hugepagesz=)
> in order to allow allocation of a defined number of pages (hugepages=)
> only for supported hugepage sizes.
>
> Signed-off-by: Victor Aoqui 
> ---
> v2:
>
> - Renamed default_hugepage_setup_sz function to hugetlb_default_size_setup;
>
> - Added powerpc string to error message.
>
>  arch/powerpc/mm/hugetlbpage.c | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index e1bf5ca..3a142fe 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -780,6 +780,21 @@ static int __init hugepage_setup_sz(char *str)
>  }
>  __setup("hugepagesz=", hugepage_setup_sz);
>
> +static int __init hugetlb_default_size_setup(char *str)
> +{
> + unsigned long long size;
> +
> + size = memparse(str, &str);
> +
> + if (add_huge_page_size(size) != 0) {
> + hugetlb_bad_size();
> + pr_err("Invalid powerpc default huge page size specified(%llu)\n", size);
> + }
> +
> + return 1;
> +}
> +__setup("default_hugepagesz=", hugetlb_default_size_setup);
> +

this means we don't call the core function hugetlb_default_setup().
Hence default_hstate_size stays 0, which means a different value
for default_hstate_idx. We use that for the hugetlbfs mount.
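A rough userspace sketch of the bookkeeping that gets skipped when the arch override does not call the core handler (names loosely follow the kernel's; the size-to-hstate mapping is invented for illustration):

```c
#include <assert.h>

/* Illustrative stand-ins for the generic hugetlb bookkeeping that a
 * validate-only arch override would skip. */
static unsigned long default_hstate_size;	/* stays 0 if never recorded */
static unsigned int default_hstate_idx;

static unsigned int size_to_hstate_idx(unsigned long size)
{
	/* invented mapping: 16MB -> hstate 1, 16GB -> hstate 2 */
	if (size == 16UL << 20)
		return 1;
	if (size == 16UL << 30)
		return 2;
	return 0;
}

/* What the core default_hugepagesz= path effectively records; if only
 * the size validation runs, both globals keep their zero defaults and
 * the hugetlbfs mount picks the wrong default hstate. */
static void record_default_size(unsigned long size)
{
	default_hstate_size = size;
	default_hstate_idx = size_to_hstate_idx(size);
}
```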

>  struct kmem_cache *hugepte_cache;
>  static int __init hugetlbpage_init(void)
>  {
> -- 
> 1.8.3.1



Re: [PATCH] powerpc/pseries: energy driver only print message when LPAR guest

2017-07-21 Thread Vaidyanathan Srinivasan
* Michael Ellerman  [2017-07-21 16:33:07]:

> Vaidyanathan Srinivasan  writes:
> > * Nicholas Piggin  [2017-07-21 11:16:44]:
> >> diff --git a/arch/powerpc/platforms/pseries/pseries_energy.c b/arch/powerpc/platforms/pseries/pseries_energy.c
> >> index 164a13d3998a..35c891aabef0 100644
> >> --- a/arch/powerpc/platforms/pseries/pseries_energy.c
> >> +++ b/arch/powerpc/platforms/pseries/pseries_energy.c
> >> @@ -229,10 +229,9 @@ static int __init pseries_energy_init(void)
> >>int cpu, err;
> >>struct device *cpu_dev;
> >> 
> >> -  if (!firmware_has_feature(FW_FEATURE_BEST_ENERGY)) {
> >> -  printk(KERN_INFO "Hypercall H_BEST_ENERGY not supported\n");
> >> -  return 0;
> >> -  }
> >> +  if (!firmware_has_feature(FW_FEATURE_BEST_ENERGY))
> >> +  return 0; /* H_BEST_ENERGY hcall not supported */
> >> +
> >
> > The first patch (!firmware_has_feature(FW_FEATURE_LPAR)) would be
> > ideal, but we do not have this in the KVM guest case either.
> 
> Yeah we do.
> 
> It should really be called FW_FEATURE_RUNNING_UNDER_PAPR_HYPERVISOR.
> 
> static int __init probe_fw_features(unsigned long node, const char *uname, int
>   depth, void *data)
> {
>   
>   if (!strcmp(uname, "rtas") || !strcmp(uname, "rtas@0")) {
>   prop = of_get_flat_dt_prop(node, "ibm,hypertas-functions", 
> &len);
>   if (prop) {
>   powerpc_firmware_features |= FW_FEATURE_LPAR;
> 
> 
> Qemu initialises that property unconditionally in spapr_dt_rtas().

oops... I meant that FW_FEATURE_BEST_ENERGY is not present on KVM, so
we will see the print needlessly.

If we have a check for phyp LPAR, then we can enable the print
"H_BEST_ENERGY hcall not supported"

Since FW_FEATURE_LPAR is common to all PAPR guests (both pHyp and
KVM), I agree that deleting the print is the right thing to do, since
we see it on both powernv and KVM where it is not supported and there
is no point reporting it.
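The distinction being made here boils down to a plain feature-bit test, mirroring how firmware_has_feature() works (the bit values below are illustrative, not the kernel's real ones):

```c
#include <assert.h>

/* Illustrative feature bits -- not the kernel's real values.
 * FW_FEATURE_LPAR is set for any PAPR guest (pHyp or KVM), while
 * FW_FEATURE_BEST_ENERGY comes from a pHyp-only hypertas string. */
#define FW_FEATURE_LPAR		(1UL << 0)
#define FW_FEATURE_BEST_ENERGY	(1UL << 1)

static unsigned long powerpc_firmware_features;

static int firmware_has_feature(unsigned long feature)
{
	return (powerpc_firmware_features & feature) != 0;
}
```

So gating the message on FW_FEATURE_LPAR alone would still print it on every KVM guest, which is why dropping the print entirely is the cleaner fix.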

--Vaidy



Re: [PATCH 1/2] powerpc/xmon: support dumping software pagetables

2017-07-21 Thread Balbir Singh
On Fri, 2017-07-21 at 16:59 +1000, Michael Ellerman wrote:
> Nice !
> 
> Balbir Singh  writes:
> 
> > It would be nice to be able to dump page tables in a
> > particular context
> > 
> > Example use cases
> > 
> > Dumping PTE contents to see the keys (useful for debugging)
> > 
> > c000ba48c880 c000bab438b0   2677   2675 T  2 protection_keys
> 
> What is that ^ ?


That's extra: the output of P (print all tasks); I used that for task selection.

> 
> > 0:mon> ds c000ba48c880 0x77f7
> > translating tsk c000ba48c880, addr 77f7
> > G: 0xb95b6400   U: 0xb6334000   M: 0xb6543000   PA: 0x012c, PTE: 
> > 0xd480012c0504
> 
> Without reading the code I don't grok what G/U/M mean.

PGD, PUD and PMD, I'll expand on them.

> 
> Feel free to use more than one line of output :)
> 
> > Dumping vmalloc space
> > 
> > 0:mon> ds 0 d000
> 
> I suspect we will want to do that a lot. So I'd rather the arguments
> were reversed, and the second (task) can be omitted.

I considered that as well, can do

> 
> So:
> 0:mon> ds x   == translate x via init_mm
> 0:mon> ds x y == translate x via &y->mm
> 
> I guess it's easier for folks to find a task rather than an mm directly?
> Otherwise it could take an mm not a task.
> 

We already have commands that take a task, via (P), which is why I thought
a task argument makes sense.

> > translating tsk   (null), addr d000
> 
> We should special case that to say "using kernel page tables" or similar.
> 
> > G: 0x3d450400   U: 0xbc184000   M: 0x3d46   PA: 0x7e01, PTE: 
> > 0xc0807e01018e
> > I did not replicate the complex code of dump_pagetable and have no support
> > for bolted linear mapping, that's why I've called it software pagetable
> > dumping support.
> 
> Not sure about that naming. On hash it makes sense, but not on radix or
> on other platforms.
> 
> Maybe 'dv' for dump Virtual address?
> 
> It doesn't dump a PTE, it tries to translate an address into a PTE.
>

Done, will do a v2

Thanks for the review!
Balbir Singh. 



Re: [PATCH 0/4] Allow non-legacy cards to be vgaarb default

2017-07-21 Thread Ard Biesheuvel
(+ Hans)

On 21 July 2017 at 00:52, Daniel Axtens  wrote:
> Hi Ard,
>
>> (+ Laszlo)
>>
>> On 19 July 2017 at 02:28, Daniel Axtens  wrote:
>>> Hi all,
>>>
>>> Previously I posted a patch that provided a quirk for a hibmc card
>>> behind a particular Huawei bridge that allowed it to be marked as the
>>> default device in the VGA arbiter.[0] This led to some discussion.[1]
>>> It was broadly suggested that a more generic solution would be better,
>>> something in the style of powerpc's fixup_vga() quirk.
>>>
>>> Here is my suggested solution:
>>>
>>>  - Create a Kconfig option ARCH_WANT_VGA_ARB_FALLBACK
>>>
>>>  - if an arch selects that option, install PCI_FIXUP_CLASS_ENABLE
>>>hook. This hook fires when a card is enabled, which will require
>>>that a driver has been bound.
>>>
>>>  - if there is no default device when the hook fires, and the device
>>>can control memory and I/O, mark it as default.
>>>
>>> The patches are as follows:
>>>
>>>  (1) powerpc: simplify and fix VGA default device behaviour
>>>
>>>  This cleans up some quirks in the powerpc implementation of the
>>>  vga_fixup. It should make the behaviour match the original
>>>  intention.
>>>
>>>  (2) vgaarb: allow non-legacy cards to be marked as default
>>>
>>>  Add the Kconfig option, and create the fixup in vgaarb.c gated
>>>  behind the option. Nothing happens at this stage because no arch
>>>  has selected the option yet.
>>>
>>>  (3) powerpc: replace vga_fixup() with generic code
>>>
>>>  Select the option on powerpc and remove the old code. The only
>>>  change is that it moves from being a final fixup to an enable
>>>  fixup.
>>>
>>>  (4) arm64: allow non-legacy VGA devices to be default
>>>
>>>  Select the option on arm64. This solves my problem with the D05,
>>>  but may cause other cards to be marked as default on other
>>>  boards. This shouldn't cause any real issues but is worth being
>>>  aware of.
>>>
>>
>> Hi Daniel,
>>
>> Given that the whole point of the VGA arbiter is the ability to share
>> the legacy mem+io ranges between different cards, why do we care about
>> the VGA arbiter in the first place on arm64?
>>
>> AFAIK, there have been some recent changes in Xorg to address the
>> auto-detection problem. I don't remember the exact details, but I have
>> added Laszlo, who was involved with this at the time.
>
> I haven't been able to locate those changes - I remember that the call
> to pci_device_is_boot_vga() in xf86pciBus.c [0] was critical and that is
> still there in the latest git.
>
> Indeed, the reason we care about the vga arbiter at all is because of
> that Xorg dependency on the boot VGA card. pci_device_is_boot_vga()
> reads a sysfs file, and that sysfs file is populated based on the
> vga_default_driver(), so it's very difficult to extricate ourselves from
> the vga arbiter and its concept of the default device.
>
> We could make this method an 'either/or' rather than a fallback - so
> platforms who didn't care about legacy resources didn't bother with
> those tests, but I'm not sure what benefit that would give and I find it
> harder to be confident of an absence of unexpected consequences.
>

I was referring to this commit

https://cgit.freedesktop.org/xorg/xserver/commit/?id=ca8d88e50310a0d440a127c22a0a383cc149f408

but reading the commit log, it may have less to do with this issue
than I thought originally.

But the fact remains that we are going about this the wrong way.
Whether a graphics card decodes legacy VGA ranges or not has *nothing*
to do with whether or not it is in fact the primary device on a
non-x86 system, and so I still think the VGA arbiter should be omitted
entirely for such platforms, and Xorg should be fixed instead.
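For reference, the enable-hook fallback Daniel describes in patch (2) boils down to roughly this (a standalone sketch; the struct and the hook wiring are heavily simplified stand-ins for the real PCI/vgaarb code):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the PCI pieces involved; the real hook
 * would be installed via a PCI_FIXUP_CLASS_ENABLE fixup. */
#define PCI_COMMAND_IO		0x1
#define PCI_COMMAND_MEMORY	0x2

struct pci_dev { unsigned short command; };

static struct pci_dev *vga_default;	/* NULL until something claims it */

/* Fires when a card is enabled (so a driver is bound): the first
 * device that can decode both mem and io becomes the default if no
 * default has been set yet. */
static void vga_arb_enable_fixup(struct pci_dev *pdev)
{
	if (vga_default)
		return;
	if ((pdev->command & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) ==
	    (PCI_COMMAND_IO | PCI_COMMAND_MEMORY))
		vga_default = pdev;
}
```

Ard's objection is that this "first capable device wins" rule is unrelated to legacy VGA decoding, which is the arbiter's actual job.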


[PATCH v2] powerpc/vdso64: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE

2017-07-21 Thread Santosh Sivaraj
The current vDSO64 implementation does not support the coarse
clocks (CLOCK_MONOTONIC_COARSE, CLOCK_REALTIME_COARSE), for which it
falls back to the system call. Below is a benchmark of the difference
in execution time with and without vDSO support.

(Non-coarse clocks are also included for completeness.)

Without vDSO support:

clock-gettime-realtime: syscall: 1547 nsec/call
clock-gettime-realtime:libc: 258 nsec/call
clock-gettime-realtime:vdso: 180 nsec/call

clock-gettime-monotonic: syscall: 1399 nsec/call
clock-gettime-monotonic:libc: 317 nsec/call
clock-gettime-monotonic:vdso: 249 nsec/call

clock-gettime-realtime-coarse: syscall: 1228 nsec/call
clock-gettime-realtime-coarse:libc: 1320 nsec/call
clock-gettime-realtime-coarse:vdso: 1330 nsec/call

clock-gettime-monotonic-coarse: syscall: 1263 nsec/call
clock-gettime-monotonic-coarse:libc: 1368 nsec/call
clock-gettime-monotonic-coarse:vdso: 1258 nsec/call

With vDSO support:

clock-gettime-realtime: syscall: 1660 nsec/call
clock-gettime-realtime:libc: 251 nsec/call
clock-gettime-realtime:vdso: 180 nsec/call

clock-gettime-monotonic: syscall: 1514 nsec/call
clock-gettime-monotonic:libc: 309 nsec/call
clock-gettime-monotonic:vdso: 239 nsec/call

clock-gettime-realtime-coarse: syscall: 1228 nsec/call
clock-gettime-realtime-coarse:libc: 172 nsec/call
clock-gettime-realtime-coarse:vdso: 101 nsec/call

clock-gettime-monotonic-coarse: syscall: 1347 nsec/call
clock-gettime-monotonic-coarse:libc: 187 nsec/call
clock-gettime-monotonic-coarse:vdso: 125 nsec/call

Used https://github.com/nlynch-mentor/vdsotest.git for the benchmarks.
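The reason the coarse clocks get so much cheaper is that the vDSO path just copies a kernel-maintained snapshot under an update-count retry loop, with no timebase scaling. A simplified userspace model (field names loosely follow the powerpc vdso_data; this is not the patch's code):

```c
#include <assert.h>

/* Simplified model of the vDSO data page the kernel keeps updated. */
struct vdso_data {
	unsigned int tb_update_count;	/* bumped when the kernel updates */
	long stamp_xtime_sec;
	long stamp_xtime_nsec;
};

/* A coarse clock read just copies the snapshot, retrying if the
 * update count changed mid-copy -- no get_tb()/mulhdu scaling is
 * needed, which is why the coarse vDSO path beats the syscall by an
 * order of magnitude in the numbers above. */
static void coarse_gettime(const struct vdso_data *vd, long *sec, long *nsec)
{
	unsigned int count;

	do {
		count = vd->tb_update_count;
		*sec = vd->stamp_xtime_sec;
		*nsec = vd->stamp_xtime_nsec;
	} while (count != vd->tb_update_count);
}
```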

CC: Benjamin Herrenschmidt 
Signed-off-by: Santosh Sivaraj 
---

V2 update: moved syscall fallback to assembly. Tested fallback by removing
REALTIME case from vDSO handler.

clock-gettime-realtime: syscall: 1467 nsec/call
clock-gettime-realtime:libc: 1678 nsec/call
clock-gettime-realtime:vdso: 1615 nsec/call
 
 arch/powerpc/include/asm/vdso.h   |   1 +
 arch/powerpc/kernel/vdso64/Makefile   |   2 +-
 arch/powerpc/kernel/vdso64/gettime.c  | 143 ++
 arch/powerpc/kernel/vdso64/gettimeofday.S |  86 --
 4 files changed, 161 insertions(+), 71 deletions(-)
 create mode 100644 arch/powerpc/kernel/vdso64/gettime.c

diff --git a/arch/powerpc/include/asm/vdso.h b/arch/powerpc/include/asm/vdso.h
index c53f5f6..721e4cf 100644
--- a/arch/powerpc/include/asm/vdso.h
+++ b/arch/powerpc/include/asm/vdso.h
@@ -23,6 +23,7 @@ extern unsigned long vdso32_sigtramp;
 extern unsigned long vdso32_rt_sigtramp;
 
 int vdso_getcpu_init(void);
+struct vdso_data *__get_datapage(void);
 
 #else /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index 31107bf..8958d87 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -1,6 +1,6 @@
 # List of files in the vdso, has to be asm only for now
 
-obj-vdso64 = sigtramp.o gettimeofday.o datapage.o cacheflush.o note.o getcpu.o
+obj-vdso64 = sigtramp.o gettimeofday.o datapage.o cacheflush.o note.o getcpu.o gettime.o
 
 # Build rules
 
diff --git a/arch/powerpc/kernel/vdso64/gettime.c b/arch/powerpc/kernel/vdso64/gettime.c
new file mode 100644
index 000..ef8f75c
--- /dev/null
+++ b/arch/powerpc/kernel/vdso64/gettime.c
@@ -0,0 +1,143 @@
+/*
+ * Userland implementation of gettimeofday() for 64 bits processes in a
+ * ppc64 kernel for use in the vDSO
+ *
+ * Copyright (C) 2017 Santosh Sivaraj (sant...@fossix.org), IBM.
+ *
+ * Originally implemented in assembly by:
+ *   Benjamin Herrenschmidt (b...@kernel.crashing.org),
+ *IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static notrace void kernel_get_tspec(struct timespec *tp,
+struct vdso_data *vdata, u32 *wtom_sec,
+u32 *wtom_nsec)
+{
+   u64 tb;
+   u32 update_count;
+
+   do {
+   /* check for update count & load values */
+   update_count = vdata->tb_update_count;
+
+   /* Get TB, offset it and scale result */
+   tb = mulhdu((get_tb() - vdata->tb_orig_stamp) << 12,
+   vdata->tb_to_xs) + vdata->stamp_sec_fraction;
+   tp->tv_sec = vdata->stamp_xtime.tv_sec;
+   if (wtom_sec)
+   *wtom_sec = vdata->wtom_clock_sec;
+   if (wtom_nsec)
+   *wtom_nsec = vdata->wtom_clock_nsec;
+   } while (update_count != vdata->tb_update_count);
+
+   tp->tv_nsec = ((u64)mulhw

[v3 PATCH 0/2] powerpc: powernv: Enable stop4 via cpuidle

2017-07-21 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Hi,

This is the third iteration of the patchset to enable exploitation of
stop4 idle state on POWER9 via cpuidle.

The earlier version can be found here :
[v2]: https://lkml.org/lkml/2017/7/19/152
[v1]: https://lkml.org/lkml/2017/7/18/691

The changes across the versions are as follows:
v2-->v3:
- Use a structure instead of an array for the stop sprs save area.
- Name the offsets into the paca->stop_sprs as STOP_XXX instead of PACA_XXX.
- Add comments in the assembly code explaining why saving/restoring
  is not needed on POWER8.
- Program the LPCR during platform idle entry/exit on both POWER8 and POWER9
  as suggested by Nicholas Piggin.

v1 --> v2:
- Move the LPCR manipulations for CPU-Hotplug into
arch/powerpc/platforms/powernv/idle.c as per Nicholas Piggin's
suggestion.

== Description ===
The stop4 idle state on POWER9 is a deep idle state which loses
hypervisor resources, but whose latency is low enough that it can be
exposed via cpuidle.

Until now, the deep idle states which lose hypervisor resources (eg:
winkle) were only exposed via CPU-Hotplug. Hence, currently on wakeup
from such states, barring a few SPRs which need to be restored to
their older values, the rest of the SPRs are reinitialized to their
boot-time values. When stop4 is used in the context
of cpuidle, we want these additional SPRs to be restored to their
older value, to ensure that the context on the CPU coming back from
idle is same as it was before going idle.

Additionally, the CPU which is in stop4 while idling can be woken up
by the decrementer interrupts. So we need to ensure that the LPCR is
programmed with PECE1 bit cleared via the stop-api only for the
CPU-Hotplug case and not for cpuidle.

The two patches in the series address this problem.

Gautham R. Shenoy (2):
  powernv/powerpc:Save/Restore additional SPRs for stop4 cpuidle
  powernv/powerpc: Clear PECE1 in LPCR via stop-api only on Hotplug

 arch/powerpc/include/asm/cpuidle.h| 11 ++
 arch/powerpc/include/asm/paca.h   |  7 
 arch/powerpc/kernel/asm-offsets.c |  8 +
 arch/powerpc/kernel/idle_book3s.S | 65 +--
 arch/powerpc/platforms/powernv/idle.c | 34 +-
 arch/powerpc/platforms/powernv/smp.c  | 10 --
 6 files changed, 122 insertions(+), 13 deletions(-)

-- 
1.9.4



[v3 PATCH 1/2] powernv/powerpc:Save/Restore additional SPRs for stop4 cpuidle

2017-07-21 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

The stop4 idle state on POWER9 is a deep idle state which loses
hypervisor resources, but whose latency is low enough that it can be
exposed via cpuidle.

Until now, the deep idle states which lose hypervisor resources (eg:
winkle) were only exposed via CPU-Hotplug. Hence, currently on wakeup
from such states, barring a few SPRs which need to be restored to
their older values, the rest of the SPRs are reinitialized to their
boot-time values.

When stop4 is used in the context of cpuidle, we want these additional
SPRs to be restored to their older value, to ensure that the context
on the CPU coming back from idle is same as it was before going idle.

In this patch, we define a SPR save area in PACA (since we have used
up the volatile register space in the stack) and on POWER9, we restore
SPRN_PID, SPRN_LDBAR, SPRN_FSCR, SPRN_HFSCR, SPRN_MMCRA, SPRN_MMCR1,
SPRN_MMCR2 to the values they had before entering stop.
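The save/restore pairing can be modelled in a few lines of userspace C (mfspr/mtspr are simulated with an array, and only two of the listed SPRs are shown; this is illustrative, not the patch's assembly):

```c
#include <assert.h>

/* Simulated SPRs; real code reads/writes them with mfspr/mtspr. */
enum { SPRN_PID, SPRN_LDBAR, NR_SPRS };

static unsigned long sprs[NR_SPRS];		/* the "hardware" registers */

/* Model of the save area carved out in the paca. */
struct stop_sprs { unsigned long pid, ldbar; };

/* Before entering a state-loss stop: stash the SPRs in the paca. */
static void save_additional_sprs(struct stop_sprs *s)
{
	s->pid = sprs[SPRN_PID];
	s->ldbar = sprs[SPRN_LDBAR];
}

/* On wakeup: put back the pre-idle values instead of boot defaults. */
static void restore_additional_sprs(const struct stop_sprs *s)
{
	sprs[SPRN_PID] = s->pid;
	sprs[SPRN_LDBAR] = s->ldbar;
}
```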

Signed-off-by: Gautham R. Shenoy 
---
v2-->v3:
- Use a structure instead of an array for the stop sprs save area.
- Name the offsets into the paca->stop_sprs as STOP_XXX instead of
  PACA_XXX.
- Add comments in the assembly code explaining why saving/restoring
  is not needed on POWER8.
v1-->v2:
No change

 arch/powerpc/include/asm/cpuidle.h | 11 +++
 arch/powerpc/include/asm/paca.h|  7 
 arch/powerpc/kernel/asm-offsets.c  |  8 +
 arch/powerpc/kernel/idle_book3s.S  | 65 --
 4 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index 52586f9..8a174cb 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -67,6 +67,17 @@
 #define ERR_DEEP_STATE_ESL_MISMATCH-2
 
 #ifndef __ASSEMBLY__
+/* Additional SPRs that need to be saved/restored during stop */
+struct stop_sprs {
+   u64 pid;
+   u64 ldbar;
+   u64 fscr;
+   u64 hfscr;
+   u64 mmcr1;
+   u64 mmcr2;
+   u64 mmcra;
+};
+
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
 
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index dc88a31..04b60af 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -31,6 +31,7 @@
 #endif
 #include 
 #include 
+#include 
 
 register struct paca_struct *local_paca asm("r13");
 
@@ -183,6 +184,12 @@ struct paca_struct {
struct paca_struct **thread_sibling_pacas;
/* The PSSCR value that the kernel requested before going to stop */
u64 requested_psscr;
+
+   /*
+* Save area for additional SPRs that need to be
+* saved/restored during cpuidle stop.
+*/
+   struct stop_sprs stop_sprs;
 #endif
 
 #ifdef CONFIG_PPC_STD_MMU_64
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index a7b5af3..e2a48df 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -743,6 +743,14 @@ int main(void)
OFFSET(PACA_SUBCORE_SIBLING_MASK, paca_struct, subcore_sibling_mask);
OFFSET(PACA_SIBLING_PACA_PTRS, paca_struct, thread_sibling_pacas);
OFFSET(PACA_REQ_PSSCR, paca_struct, requested_psscr);
+#define STOP_SPR(x, f) OFFSET(x, paca_struct, stop_sprs.f)
+   STOP_SPR(STOP_PID, pid);
+   STOP_SPR(STOP_LDBAR, ldbar);
+   STOP_SPR(STOP_FSCR, fscr);
+   STOP_SPR(STOP_HFSCR, hfscr);
+   STOP_SPR(STOP_MMCR1, mmcr1);
+   STOP_SPR(STOP_MMCR2, mmcr2);
+   STOP_SPR(STOP_MMCRA, mmcra);
 #endif
 
DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 5adb390e..5e6af97 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -84,7 +84,61 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
std r3,_WORT(r1)
mfspr   r3,SPRN_WORC
std r3,_WORC(r1)
+/*
+ * On POWER9, there are idle states such as stop4, invoked via cpuidle,
+ * that lose hypervisor resources. In such cases, we need to save
+ * additional SPRs before entering those idle states so that they can
+ * be restored to their older values on wakeup from the idle state.
+ *
+ * On POWER8, the only such deep idle state is winkle which is used
+ * only in the context of CPU-Hotplug, where these additional SPRs are
+ * reinitialized to a sane value. Hence there is no need to save/restore
+ * these SPRs.
+ */
+BEGIN_FTR_SECTION
+   blr
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
+
+power9_save_additional_sprs:
+   mfspr   r3, SPRN_PID
+   mfspr   r4, SPRN_LDBAR
+   std r3, STOP_PID(r13)
+   std r4, STOP_LDBAR(r13)
 
+   mfspr   r3, SPRN_FSCR
+   mfspr   r4, SPRN_HFSCR
+   std r3, STOP_FSCR(r13)
+   std r4, STOP_HFSCR(r13)
+
+   mfspr   r3, SPRN_MMCRA
+   mfspr   r4, SPRN_MMCR1
+   std 

[RESEND] [v3 PATCH 2/2] powernv/powerpc: Clear PECE1 in LPCR via stop-api only on Hotplug

2017-07-21 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

Currently we use the stop-api provided by the firmware to program the
SLW engine to restore the values of hypervisor resources that get lost
on deeper idle states (such as winkle). Since the deep states were
only used for CPU-Hotplug on POWER8 systems, we would program the LPCR
to have the PECE1 bit since Hotplugged CPUs shouldn't be spuriously
woken up by decrementer.

On POWER9, some of the deep platform idle states such as stop4 can be
used in cpuidle as well. In this case, we want the CPU in stop4 to be
woken up by the decrementer when some timer on the CPU expires.

In this patch, we program the stop-api for LPCR with PECE1
bit cleared only when we are offlining the CPU and set it
back once the CPU is online.
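The LPCR manipulation itself is just a bit clear on offline and a bit set on online (the bit position below is illustrative, not the architected LPCR_PECE1 value):

```c
#include <assert.h>

/* Illustrative bit position -- not the real architected value. */
#define LPCR_PECE1	(1UL << 14)

typedef unsigned long u64;

/* Offline: block decrementer wakeups so a hotplugged-out CPU is not
 * spuriously woken; online: allow them again so cpuidle stop states
 * can be woken by timer expiry. */
static u64 lpcr_for_offline(u64 lpcr)
{
	return lpcr & ~(u64)LPCR_PECE1;
}

static u64 lpcr_for_online(u64 lpcr)
{
	return lpcr | (u64)LPCR_PECE1;
}
```

In the patch, the computed value is both written with mtspr and pushed to the SLW engine via the stop-api, so the firmware restores the same LPCR on special wakeup.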

Signed-off-by: Gautham R. Shenoy 
---
v2 --> v3:
- Program the LPCR during platform idle entry/exit on both POWER8 and
  POWER9

v1 --> v2:
- Move the LPCR manipulations for CPU-Hotplug into idle.c

 arch/powerpc/platforms/powernv/idle.c | 34 +-
 arch/powerpc/platforms/powernv/smp.c  |  8 
 2 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 2abee07..a1296e7 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -68,7 +68,7 @@ static int pnv_save_sprs_for_deep_states(void)
 * all cpus at boot. Get these reg values of current cpu and use the
 * same across all cpus.
 */
-   uint64_t lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1;
+   uint64_t lpcr_val = mfspr(SPRN_LPCR);
uint64_t hid0_val = mfspr(SPRN_HID0);
uint64_t hid1_val = mfspr(SPRN_HID1);
uint64_t hid4_val = mfspr(SPRN_HID4);
@@ -355,6 +355,14 @@ void power9_idle(void)
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
+static void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 lpcr_val)
+{
+   u64 pir = get_hard_smp_processor_id(cpu);
+
+   mtspr(SPRN_LPCR, lpcr_val);
+   opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
+}
+
 /*
  * pnv_cpu_offline: A function that puts the CPU into the deepest
  * available platform idle state on a CPU-Offline.
@@ -364,6 +372,20 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
 {
unsigned long srr1;
u32 idle_states = pnv_get_supported_cpuidle_states();
+   u64 lpcr_val;
+
+   /*
+* We don't want to take decrementer interrupts while we are
+* offline, so clear LPCR:PECE1. We keep PECE2 (and
+* LPCR_PECE_HVEE on P9) enabled as to let IPIs in.
+*
+* If the CPU gets woken up by a special wakeup, ensure that
+* the SLW engine sets LPCR with decrementer bit cleared, else
+* the CPU will come back to the kernel due to a spurious
+* wakeup.
+*/
+   lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1;
+   pnv_program_cpu_hotplug_lpcr(cpu, lpcr_val);
 
__ppc64_runlatch_off();
 
@@ -394,6 +416,16 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
 
__ppc64_runlatch_on();
 
+   /*
+* Re-enable decrementer interrupts in LPCR.
+*
+* Further, we want stop states to be woken up by decrementer
+* for non-hotplug cases. So program the LPCR via stop api as
+* well.
+*/
+   lpcr_val = mfspr(SPRN_LPCR) | (u64)LPCR_PECE1;
+   pnv_program_cpu_hotplug_lpcr(cpu, lpcr_val);
+
return srr1;
 }
 #endif
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 40dae96..536b07b 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -164,12 +164,6 @@ static void pnv_smp_cpu_kill_self(void)
if (cpu_has_feature(CPU_FTR_ARCH_207S))
wmask = SRR1_WAKEMASK_P8;
 
-   /* We don't want to take decrementer interrupts while we are offline,
-* so clear LPCR:PECE1. We keep PECE2 (and LPCR_PECE_HVEE on P9)
-* enabled as to let IPIs in.
-*/
-   mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1);
-
while (!generic_check_cpu_restart(cpu)) {
/*
 * Clear IPI flag, since we don't handle IPIs while
@@ -219,8 +213,6 @@ static void pnv_smp_cpu_kill_self(void)
 
}
 
-   /* Re-enable decrementer interrupts */
-   mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_PECE1);
DBG("CPU%d coming online...\n", cpu);
 }
 
-- 
1.9.4



Re: POWER9 PMU interrupt after idle workaround

2017-07-21 Thread Michael Ellerman
On Mon, 2017-07-10 at 06:19:38 UTC, Nicholas Piggin wrote:
> POWER9 DD2 can see spurious PMU interrupts after state-loss idle in
> some conditions.
> 
> A solution is to save and reload MMCR0 over state-loss idle.
> 
> Signed-off-by: Nicholas Piggin 
> Acked-by: Madhavan Srinivasan 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/101dd590a7fa37954540cf3149a1c5

cheers


Re: [1/3] powerpc/mm/radix: Refactor radix__mark_rodata_ro()

2017-07-21 Thread Michael Ellerman
On Fri, 2017-07-14 at 06:51:21 UTC, Michael Ellerman wrote:
> Move the core logic into a helper, so we can use it for changing permissions
> other than _PAGE_WRITE.
> 
> Signed-off-by: Michael Ellerman 
> Reviewed-by: Balbir Singh 

Series applied to powerpc fixes.

https://git.kernel.org/powerpc/c/b134bd90286dc9f2952c35a91ab405

cheers


Re: powerpc/64s: Fix hypercall entry clobbering r12 input

2017-07-21 Thread Michael Ellerman
On Tue, 2017-07-18 at 05:32:44 UTC, Nicholas Piggin wrote:
> A previous optimisation incorrectly assumed the PAPR hcall does
> not use r12, and clobbers it upon entry. In fact it is used as
> an input. This can result in KVM guests crashing (observed with
> PR KVM).
> 
> Instead of using r12 to save r13, this patch saves r13 in ctr.
> This is more costly, but not as slow as using the SPRG.
> 
> Fixes: acd7d8cef0153 ("powerpc/64s: Optimize hypercall/syscall entry")
> Signed-off-by: Nicholas Piggin 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/76fc0cfcc5b0f520062ca6d7225b22

cheers


[GIT PULL] Please pull powerpc/linux.git powerpc-4.13-3 tag

2017-07-21 Thread Michael Ellerman
Hi Linus,

Please pull some more powerpc fixes for 4.13:

The following changes since commit 01e6a61aceb82e13bec29502a8eb70d9574f97ad:

  powerpc/64: Fix atomic64_inc_not_zero() to return an int (2017-07-12 21:49:55 
+1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-4.13-3

for you to fetch changes up to 029d9252b116fa52a95150819e62af1f6e420fe5:

  powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=y 
(2017-07-18 19:54:24 +1000)


powerpc fixes for 4.13 #3

A handful of fixes, mostly for new code.

Some reworking of the new STRICT_KERNEL_RWX support to make sure we also remove
executable permission from __init memory before it's freed.

A fix to some recent optimisations to the hypercall entry where we were
clobbering r12, this was breaking nested guests (PR KVM).

A fix for the recent patch to opal_configure_cores(). This could break booting
on bare metal Power8 boxes if the kernel was built without
CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG.

And finally a workaround for spurious PMU interrupts on Power9 DD2.

Thanks to:
  Nicholas Piggin, Anton Blanchard, Balbir Singh.


Michael Ellerman (4):
  powerpc/powernv: Fix boot on Power8 bare metal due to 
opal_configure_cores()
  powerpc/mm/radix: Refactor radix__mark_rodata_ro()
  powerpc/mm/hash: Refactor hash__mark_rodata_ro()
  powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=y

Nicholas Piggin (2):
  powerpc/perf: Avoid spurious PMU interrupts after idle
  powerpc/64s: Fix hypercall entry clobbering r12 input

 arch/powerpc/include/asm/book3s/64/hash.h|  1 +
 arch/powerpc/include/asm/book3s/64/pgtable.h |  1 +
 arch/powerpc/include/asm/book3s/64/radix.h   |  1 +
 arch/powerpc/include/asm/pgtable.h   |  7 +
 arch/powerpc/kernel/exceptions-64s.S | 28 +-
 arch/powerpc/kernel/idle_book3s.S| 15 +-
 arch/powerpc/mm/mem.c|  1 +
 arch/powerpc/mm/pgtable-hash64.c | 44 
 arch/powerpc/mm/pgtable-radix.c  | 28 ++
 arch/powerpc/mm/pgtable_64.c |  8 +
 arch/powerpc/platforms/powernv/opal.c|  2 +-
 11 files changed, 102 insertions(+), 34 deletions(-)


signature.asc
Description: PGP signature


Re: [PATCH] powerpc/44x/fsp2: correct dtb reg property for /sdhci@020c0000

2017-07-21 Thread Ivan Mikhaylov
Hi Ian,
>Building the split device-tree tree[0] highlighted that upstream commit
>9eec6cb142bd ("powerpc/44x/fsp2: Add device tree for FSP2 board") introduced
>this warning when building the device tree:
>
>$ make CROSS_COMPILE=powerpc-linux-gnu- ARCH=powerpc fsp2.dtb
>  CHK scripts/mod/devicetable-offsets.h
>  DTC arch/powerpc/boot/fsp2.dtb
>arch/powerpc/boot/fsp2.dtb: Warning (reg_format): "reg" property in 
>/sdhci@020c has invalid length (8 bytes) (#address-cells == 2, #size-cells 
>== 1)
>
>This commit adds the second address cell as zeroes to resolve the warning. Note:
>I have no access to or information about this platform so this is purely a
>guess as to the fix. An alternative would be to adjust #address-cells, but
>whether that is correct or not depends on the platform.

Yes, this problem exists on this tag, but it is already fixed and waiting for
review here: https://patchwork.kernel.org/patch/9819379/ . You can check it
if you want; anyway, it will go to the powerpc next branch first.

Thank you.



Re: [PATCH] powerpc/44x/fsp2: correct dtb reg property for /sdhci@020c0000

2017-07-21 Thread Ian Campbell
On Fri, 2017-07-21 at 15:54 +0300, Ivan Mikhaylov wrote:
> Hi Ian,
> > Building the split device-tree tree[0] highlighted that upstream commit
> > 9eec6cb142bd ("powerpc/44x/fsp2: Add device tree for FSP2 board") introduced
> > this warning when building the device tree:
> > 
> > $ make CROSS_COMPILE=powerpc-linux-gnu- ARCH=powerpc fsp2.dtb
> >  CHK scripts/mod/devicetable-offsets.h
> >  DTC arch/powerpc/boot/fsp2.dtb
> > > > arch/powerpc/boot/fsp2.dtb: Warning (reg_format): "reg" property in 
> > > > /sdhci@020c has invalid length (8 bytes) (#address-cells == 2, 
> > > > #size-cells == 1)
> > 
> > This commit adds the second address cell as zeroes to resolve the warning. 
> > Note:
> > I have no access to or information about this platform so this is purely a
> > guess as to the fix. An alternative would be to adjust #address-cells, but
> > whether that is correct or not depends on the platform.
> 
> Yes, this problem exists on this tag but it is already fixed and waiting for
> > review by this https://patchwork.kernel.org/patch/9819379/ . You can check 
> > it
> if you want, anyways it will go to powerpc next branch first.

Great, thanks for the info.




[PATCH] powerpc/pseries: Fix of_node_put() underflow during pseries remove

2017-07-21 Thread Laurent Vivier
As for commit 68baf692c435 ("powerpc/pseries: Fix of_node_put()
underflow during DLPAR remove"), the call to of_node_put()
must be removed from pSeries_reconfig_remove_node().

dlpar_detach_node() and pSeries_reconfig_remove_node() call
of_detach_node(), and thus the node should not be released
in this case too.

Signed-off-by: Laurent Vivier 
---
 arch/powerpc/platforms/pseries/reconfig.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/reconfig.c 
b/arch/powerpc/platforms/pseries/reconfig.c
index e5bf1e8..011ef21 100644
--- a/arch/powerpc/platforms/pseries/reconfig.c
+++ b/arch/powerpc/platforms/pseries/reconfig.c
@@ -82,7 +82,6 @@ static int pSeries_reconfig_remove_node(struct device_node 
*np)
 
of_detach_node(np);
of_node_put(parent);
-   of_node_put(np); /* Must decrement the refcount */
return 0;
 }
 
-- 
2.9.4



Re: [PATCH v1 2/2] 44x/fsp2: enable eMMC arasan for fsp2 platform

2017-07-21 Thread Rob Herring
On Fri, Jun 30, 2017 at 02:53:31PM +0300, Ivan Mikhaylov wrote:
> Add mmc0 changes for enabling arasan emmc and change
> defconfig appropriately.
> 
> Signed-off-by: Ivan Mikhaylov 
> ---
>  arch/powerpc/boot/dts/fsp2.dts  |   33 +-
>  arch/powerpc/configs/44x/fsp2_defconfig |2 +
>  2 files changed, 21 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/boot/dts/fsp2.dts b/arch/powerpc/boot/dts/fsp2.dts
> index 475953a..6a63026 100644
> --- a/arch/powerpc/boot/dts/fsp2.dts
> +++ b/arch/powerpc/boot/dts/fsp2.dts
> @@ -52,6 +52,7 @@
>   clocks {
>   mmc_clk: mmc_clk {
>   compatible = "fixed-clock";
> + #clock-cells = <0>;
>   clock-frequency = <5000>;
>   clock-output-names = "mmc_clk";
>   };
> @@ -359,20 +360,6 @@
>   interrupts = <31 0x4 15 0x84>;
>   };
>  
> - mmc0: sdhci@020c {
> - compatible  = "st,sdhci-stih407", "st,sdhci";
> - status  = "disabled";
> - reg = <0x020c 0x2>;
> - reg-names   = "mmc";
> - interrupt-parent = <&UIC1_3>;
> - interrupts  = <21 0x4 22 0x4>;
> - interrupt-names = "mmcirq";
> - pinctrl-names   = "default";
> - pinctrl-0   = <>;
> - clock-names = "mmc";
> - clocks  = <&mmc_clk>;
> - };
> -
>   plb6 {
>   compatible = "ibm,plb6";
>   #address-cells = <2>;
> @@ -501,6 +488,24 @@
>/*RXDE*/  4 &UIC1_2 13 0x4>;
>   };
>  
> + mmc0: sdhci@020c {

Use generic node names and drop leading 0: mmc@20c

Rob


Re: [RFC v6 27/62] powerpc: helper to validate key-access permissions of a pte

2017-07-21 Thread Ram Pai
On Fri, Jul 21, 2017 at 12:21:50PM +0530, Aneesh Kumar K.V wrote:
> Ram Pai  writes:
> 
> > On Thu, Jul 20, 2017 at 12:12:47PM +0530, Aneesh Kumar K.V wrote:
> >> Ram Pai  writes:
> >> 
> >> > helper function that checks if the read/write/execute is allowed
> >> > on the pte.
> >> >
> >> > Signed-off-by: Ram Pai 
> >> > ---
> >> >  arch/powerpc/include/asm/book3s/64/pgtable.h |4 +++
> >> >  arch/powerpc/include/asm/pkeys.h |   12 +
> >> >  arch/powerpc/mm/pkeys.c  |   33 
> >> > ++
> >> >  3 files changed, 49 insertions(+), 0 deletions(-)
> >> >
> >> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
> >> > b/arch/powerpc/include/asm/book3s/64/pgtable.h
> >> > index 30d7f55..0056e58 100644
> >> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> >> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> >> > @@ -472,6 +472,10 @@ static inline void write_uamor(u64 value)
> >> >  mtspr(SPRN_UAMOR, value);
> >> >  }
> >> >
> >> > +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> >> > +extern bool arch_pte_access_permitted(u64 pte, bool write, bool 
> >> > execute);
> >> > +#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
> >> > +
> >> >  #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
> >> >  static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
> >> > unsigned long addr, pte_t *ptep)
> >> > diff --git a/arch/powerpc/include/asm/pkeys.h 
> >> > b/arch/powerpc/include/asm/pkeys.h
> >> > index bbb5d85..7a9aade 100644
> >> > --- a/arch/powerpc/include/asm/pkeys.h
> >> > +++ b/arch/powerpc/include/asm/pkeys.h
> >> > @@ -53,6 +53,18 @@ static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
> >> >  ((pteflags & H_PAGE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 
> >> > 0x0UL));
> >> >  }
> >> >
> >> > +static inline u16 pte_to_pkey_bits(u64 pteflags)
> >> > +{
> >> > +if (!pkey_inited)
> >> > +return 0x0UL;
> >> 
> >> Do we really need that above check ? We should always find
> >> pkey_inited to be set.
> >
> > Yes. There are cases where pkey_inited is not enabled. 
> > a) if the MMU is radix.
> That should be a feature check
> 
> > b) if the PAGE size is 4k.
> 
> That is a kernel config change
> 
> > c) if the device tree says the feature is not available
> > d) if the CPU is of a older generation. P6 and older.
> 
> Both feature check.
> 
> how about doing something like
> 
> static inline u16 pte_to_pkey_bits(u64 pteflags)
> {
>   if (!(pteflags & H_PAGE_KEY_MASK))
>   return 0x0UL;

This check accomplishes the same thing as the return below.
When (pteflags & H_PAGE_KEY_MASK) is 0,
the code below returns the same 0x0UL. 



> 
>   return (((pteflags & H_PAGE_PKEY_BIT0) ? 0x10 : 0x0UL) |
>   ((pteflags & H_PAGE_PKEY_BIT1) ? 0x8 : 0x0UL) |
>   ((pteflags & H_PAGE_PKEY_BIT2) ? 0x4 : 0x0UL) |
>   ((pteflags & H_PAGE_PKEY_BIT3) ? 0x2 : 0x0UL) |
>   ((pteflags & H_PAGE_PKEY_BIT4) ? 0x1 : 0x0UL));
> }

The idea  behind
   if (!pkey_inited)
   return 0x0UL;

was to not interpret the ptebits if we knew they were not initialized
to begin with. 


-- 
Ram Pai



Re: [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion

2017-07-21 Thread LEROY Christophe

Benjamin Herrenschmidt  a écrit :


When hitting below a VM_GROWSDOWN vma (typically growing the stack),
we check whether it's a valid stack-growing instruction and we
check the distance to GPR1. This is largely open coded with lots
of comments, so move it out to a helper.


Did you have a look at the following patch ? It's been waiting for  
application for some weeks now.  
https://patchwork.ozlabs.org/patch/771869

It limits number of calls to get_user()
Can you have a look and merge it with your serie ?



While at it, make store_update_sp a boolean.


My patch requires a tristate here

Christophe



Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/mm/fault.c | 84  
-

 1 file changed, 48 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index a229fd2d82d6..c2720ebb6a62 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -71,15 +71,15 @@ static inline bool notify_page_fault(struct  
pt_regs *regs)

  * Check whether the instruction at regs->nip is a store using
  * an update addressing form which will update r1.
  */
-static int store_updates_sp(struct pt_regs *regs)
+static bool store_updates_sp(struct pt_regs *regs)
 {
unsigned int inst;

if (get_user(inst, (unsigned int __user *)regs->nip))
-   return 0;
+   return false;
/* check for 1 in the rA field */
if (((inst >> 16) & 0x1f) != 1)
-   return 0;
+   return false;
/* check major opcode */
switch (inst >> 26) {
case 37:/* stwu */
@@ -87,7 +87,7 @@ static int store_updates_sp(struct pt_regs *regs)
case 45:/* sthu */
case 53:/* stfsu */
case 55:/* stfdu */
-   return 1;
+   return true;
case 62:/* std or stdu */
return (inst & 3) == 1;
case 31:
@@ -99,10 +99,10 @@ static int store_updates_sp(struct pt_regs *regs)
case 439:   /* sthux */
case 695:   /* stfsux */
case 759:   /* stfdux */
-   return 1;
+   return true;
}
}
-   return 0;
+   return false;
 }
 /*
  * do_page_fault error handling helpers
@@ -222,6 +222,43 @@ static bool bad_kernel_fault(bool is_exec,  
unsigned long error_code,

return is_exec || (address >= TASK_SIZE);
 }

+static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
+   struct vm_area_struct *vma,
+   bool store_update_sp)
+{
+   /*
+* N.B. The POWER/Open ABI allows programs to access up to
+* 288 bytes below the stack pointer.
+* The kernel signal delivery code writes up to about 1.5kB
+* below the stack pointer (r1) before decrementing it.
+* The exec code can write slightly over 640kB to the stack
+* before setting the user r1.  Thus we allow the stack to
+* expand to 1MB without further checks.
+*/
+   if (address + 0x10 < vma->vm_end) {
+   /* get user regs even if this fault is in kernel mode */
+   struct pt_regs *uregs = current->thread.regs;
+   if (uregs == NULL)
+   return true;
+
+   /*
+* A user-mode access to an address a long way below
+* the stack pointer is only valid if the instruction
+* is one which would update the stack pointer to the
+* address accessed if the instruction completed,
+* i.e. either stwu rs,n(r1) or stwux rs,r1,rb
+* (or the byte, halfword, float or double forms).
+*
+* If we don't check this then any write to the area
+* between the last mapped region and the stack will
+* expand the stack rather than segfaulting.
+*/
+   if (address + 2048 < uregs->gpr[1] && !store_update_sp)
+   return true;
+   }
+   return false;
+}
+
 static bool access_error(bool is_write, bool is_exec,
 struct vm_area_struct *vma)
 {
@@ -350,7 +387,7 @@ static int __do_page_fault(struct pt_regs *regs,  
unsigned long address,

int is_user = user_mode(regs);
int is_write = page_fault_is_write(error_code);
int fault, major = 0;
-   int store_update_sp = 0;
+   bool store_update_sp = false;

 #ifdef CONFIG_PPC_ICSWX
/*
@@ -458,36 +495,11 @@ static int __do_page_fault(struct pt_regs  
*regs, unsigned long address,

if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
return bad_area(regs, address);

-   /*
-* N.B. The POWER/Open ABI allows programs to access up to
-* 288 bytes below the stack pointer.
-* 

Re: [PATCH 1/2] powerpc/xmon: support dumping software pagetables

2017-07-21 Thread Ram Pai
On Fri, Jul 21, 2017 at 03:24:05PM +1000, Balbir Singh wrote:
> It would be nice to be able to dump page tables in a
> particular context
> 
> Example use cases
> 
> Dumping PTE contents to see the keys (useful for debugging)
> 
> c000ba48c880 c000bab438b0   2677   2675 T  2 protection_keys
> 0:mon> ds c000ba48c880 0x77f7
> translating tsk c000ba48c880, addr 77f7
> G: 0xb95b6400   U: 0xb6334000   M: 0xb6543000   PA: 0x012c, PTE: 
> 0xd480012c0504
> 
> Dumping vmalloc space
> 
> 0:mon> ds 0 d000
> translating tsk   (null), addr d000
> G: 0x3d450400   U: 0xbc184000   M: 0x3d46   PA: 0x7e01, PTE: 
> 0xc0807e01018e
> 
> I did not replicate the complex code of dump_pagetable and have no support
> for bolted linear mapping, that's why I've called it software pagetable
> dumping support. The format of the PTE can be expanded to add more useful
> information about the flags in the PTE if required.

Yes, a nice way of dumping all the flags in the PTE would be handy;
especially, my favorite, the protection key values.

> 
> Signed-off-by: Balbir Singh 
> ---
>  arch/powerpc/xmon/xmon.c | 97 
> 
>  1 file changed, 97 insertions(+)
> 
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index 08e367e..8aedfff 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -126,6 +126,7 @@ static void byterev(unsigned char *, int);
>  static void memex(void);
>  static int bsesc(void);
>  static void dump(void);
> +static void show_pte(unsigned long);
>  static void prdump(unsigned long, long);
>  static int ppc_inst_dump(unsigned long, long, int);
>  static void dump_log_buf(void);
> @@ -233,6 +234,7 @@ Commands:\n\
>  #endif
>"\
>dr dump stream of raw bytes\n\
> +  ds dump software PTEs\n\
>dt dump the tracing buffers (uses printk)\n\
>  "
>  #ifdef CONFIG_PPC_POWERNV
> @@ -2528,6 +2530,9 @@ dump(void)
>   } else if (c == 't') {
>   ftrace_dump(DUMP_ALL);
>   tracing_on();
> + } else if (c == 's') {
> + /* dump software pte */
> + show_pte(adrs);
>   } else if (c == 'r') {
>   scanhex(&ndump);
>   if (ndump == 0)
> @@ -2860,7 +2865,99 @@ static void show_task(struct task_struct *tsk)
>   state, task_thread_info(tsk)->cpu,
>   tsk->comm);
>  }
> +void format_pte(unsigned long pte)
> +{
> + unsigned long pa = pte & PTE_RPN_MASK;
> +
> + printf("PA: 0x%08lx, PTE: 0x%08lx\n", pa, pte);
> +}
> +
> +static void show_pte(unsigned long tskv)
> +{
> + unsigned long addr = 0;
> + struct task_struct *tsk = NULL;
> + struct mm_struct *mm;
> + pgd_t *pgdp;
> + pud_t *pudp;
> + pmd_t *pmdp;
> + pte_t *ptep;
> +
> + tsk = (struct task_struct *)tskv;
> + if (tsk == NULL)
> + mm = &init_mm;
> + else
> + mm = tsk->active_mm;
> +
> + if (mm == NULL)
> + mm = &init_mm;
> +
> + if (!scanhex(&addr))
> + printf("need address to translate\n");
> +
> + if (setjmp(bus_error_jmp) != 0) {
> + catch_memory_errors = 0;
> + printf("*** Error dumping pte for task %p\n", tsk);
> + return;
> + }
> +
> + catch_memory_errors = 1;
> + sync();
> +
> + if (mm == &init_mm)
> + pgdp = pgd_offset_k(addr);
> + else
> + pgdp = pgd_offset(mm, addr);
> +
> + if (pgd_none(*pgdp)) {
> + printf("no linux page table for address\n");
> + return;
> + }
> 
> + if (pgd_huge(*pgdp)) {
> + format_pte(pgd_val(*pgdp));
> + return;
> + }
> + printf("G: 0x%8lx\t", pgd_val(*pgdp));
> +
> + pudp = pud_offset(pgdp, addr);
> +
> + if (pud_none(*pudp)) {
> + printf("No valid PUD\n");
> + return;
> + }
> +
> + if (pud_huge(*pudp)) {
> + format_pte(pud_val(*pudp));
> + return;
> + }
> + printf("U: 0x%8lx\t", pud_val(*pudp));
> +
> + pmdp = pmd_offset(pudp, addr);
> +
> + if (pmd_none(*pmdp)) {
> + printf("No valid PMD\n");
> + return;
> + }
> +
> + if (pmd_huge(*pmdp)) {
> + format_pte(pmd_val(*pmdp));
> + return;
> + }
> + printf("M: 0x%8lx\t", pmd_val(*pmdp));
> +
> + /* pte_offset_map is the same as pte_offset_kernel */
> + ptep = pte_offset_kernel(pmdp, addr);
> + if (pte_none(*ptep)) {
> + printf("no valid PTE\n");
> + return;
> + }
> +
> + format_pte(pte_val(*ptep));
> +

These two lines below always go together. A nice macro
#define sync_delay() { sync(); __delay(200); }
can help.

> + sync();
> + __delay(200);


> + catch_memory_errors = 0;


> +}
>  static void show_tasks(void)
>  {
>   unsigned long tskv;
> -- 
> 2.9.4

-- 
Ram Pai



[PATCH 0/6] powerpc: Free up RPAGE_RSV bits in 64K PTE

2017-07-21 Thread Ram Pai

RPAGE_RSV0..4 pte bits are currently used for hpte slot
tracking. We  need  these bits   for  memory-protection
keys. Luckily these  four bits   are  relatively easier 
to move among all the other candidate bits.

For  64K   linux-ptes   backed  by 4k hptes, these bits
are   used for tracking the  validity of the slot value
stored   in the second-part-of-the-pte. We devise a new
mechanism for  tracking   the   validity  without using
those bits. The mechanism  is explained in the first
patch.

For 64K  linux-pte  backed by 64K hptes, we simply move
the   slot  tracking bits to the second-part-of-the-pte.

The above  mechanism  is also used to free the bits for
hugetlb linux-ptes.


Testing:

Has survived  kernel  compilation on multiple platforms:
p8 powernv hash-mode, p9 powernv hash-mode,  p7 powervm,
p8-powervm, p8-kvm-guest.

Has survived git-bisect on p8  power-nv  with  64K page
and 4K page.

History:
---
This patchset  is  a  spin-off from the memkey patchset.

version v7:
(1) GIX bit reset change  moved  to  the second
patch  -- noticed by Aneesh.
(2) Separated this patches from memkey patchset
(3) merged a  bunch  of  patches, that used the
helper function, into one.
version v6:
(1) No changes related to pte.

version v5:
(1) No changes related to pte.

version v4:
(1) No changes related to pte.

version v3:
(1) split the patches into smaller consumable
patches.
(2) A bug fix while  invalidating a hpte slot
in __hash_page_4K()
-- noticed by Aneesh


version v2:
(1) fixed a bug in 4k hpte backed 64k pte
where page invalidation was not
done correctly, and initialization
of second-part-of-the-pte was not
done correctly if the pte was not
yet hashed with a hpte.
   --   Reported by Aneesh.


version v1: Initial version



Ram Pai (6):
  powerpc: Free up four 64K PTE bits in 4K backed HPTE pages
  powerpc: Free up four 64K PTE bits in 64K backed HPTE pages
  powerpc: capture the PTE format changes in the dump pte report
  powerpc: introduce pte_set_hash_slot() helper
  powerpc: introduce pte_get_hash_gslot() helper
  powerpc: use helper functions to get and set hash slots

 arch/powerpc/include/asm/book3s/64/hash-4k.h  |   20 
 arch/powerpc/include/asm/book3s/64/hash-64k.h |   60 
 arch/powerpc/include/asm/book3s/64/hash.h |7 +-
 arch/powerpc/mm/dump_linuxpagetables.c|3 +-
 arch/powerpc/mm/hash64_4k.c   |   14 +--
 arch/powerpc/mm/hash64_64k.c  |  124 +
 arch/powerpc/mm/hash_utils_64.c   |   35 +--
 arch/powerpc/mm/hugetlbpage-hash64.c  |   16 +--
 8 files changed, 165 insertions(+), 114 deletions(-)



[PATCH 1/6] powerpc: Free up four 64K PTE bits in 4K backed HPTE pages

2017-07-21 Thread Ram Pai
Rearrange 64K PTE bits to  free  up  bits 3, 4, 5  and  6
in the 4K backed HPTE pages. These bits continue to be used
for 64K backed HPTE pages in this patch, but will be freed
up in the next patch. The  bit  numbers are big-endian  as
defined in the ISA3.0.

The patch does the following change to the 4k hpte backed
64K PTE's format.

H_PAGE_BUSY moves from bit 3 to bit 9 (B bit in the figure
below)
V0 which occupied bit 4 is not used anymore.
V1 which occupied bit 5 is not used anymore.
V2 which occupied bit 6 is not used anymore.
V3 which occupied bit 7 is not used anymore.

Before the patch, the 4k backed 64k PTE format was as follows

 0 1 2 3 4  5  6  7  8 9 10...63
 : : : : :  :  :  :  : : ::
 v v v v v  v  v  v  v v vv

,-,-,-,-,--,--,--,--,-,-,-,-,-,--,-,-,-,
|x|x|x|B|V0|V1|V2|V3|x| | |x|x||x|x|x|x| <- primary pte
'_'_'_'_'__'__'__'__'_'_'_'_'_''_'_'_'_'
|S|G|I|X|S |G |I |X |S|G|I|X|..|S|G|I|X| <- secondary pte
'_'_'_'_'__'__'__'__'_'_'_'_'__'_'_'_'_'

After the patch, the 4k backed 64k PTE format is as follows

 0 1 2 3 4  5  6  7  8 9 10...63
 : : : : :  :  :  :  : : ::
 v v v v v  v  v  v  v v vv

,-,-,-,-,--,--,--,--,-,-,-,-,-,--,-,-,-,
|x|x|x| |  |  |  |  |x|B| |x|x||.|.|.|.| <- primary pte
'_'_'_'_'__'__'__'__'_'_'_'_'_''_'_'_'_'
|S|G|I|X|S |G |I |X |S|G|I|X|..|S|G|I|X| <- secondary pte
'_'_'_'_'__'__'__'__'_'_'_'_'__'_'_'_'_'

the four bits S,G,I,X (one quadruplet per 4k HPTE) that
cache the hash-bucket slot value, are initialized to
1,1,1,1 indicating -- an invalid slot. If a HPTE gets
cached in a 0xF slot (i.e. the 7th slot of the secondary
hash bucket), it is released immediately. In other
words, even though 0xF is a valid slot value in the hash
bucket, we consider it invalid and release the slot and
the HPTE. This gives us the opportunity to determine
the validity of the S,G,I,X bits based on their contents
and not on any of the bits V0, V1, V2 or V3 in the
primary PTE.

When we release a HPTE cached in the 0xF slot we also
release a legitimate slot in the primary hash bucket
and unmap its corresponding HPTE. This is to ensure
that we do get a HPTE cached in a slot of the primary
hash bucket the next time we retry.

Though treating the 0xF slot as invalid reduces the
number of available slots in the hash bucket and may
have an effect on the performance, the probability of
hitting a 0xF slot is extremely low.

Compared to the current scheme, the above described
scheme reduces the number of false hash table updates
significantly and has the added advantage of releasing
four valuable PTE bits for other purposes.

NOTE: even though bits 3, 4, 5, 6 and 7 are not used
when the 64K PTE is backed by a 4k HPTE, they continue
to be used if the PTE gets backed by a 64k HPTE. The
next patch will decouple that as well, and truly
release the bits.

This idea was jointly developed by Paul Mackerras,
Aneesh, Michael Ellerman and myself.

4K PTE format remains unchanged currently.

The patch does the following code changes
a) PTE flags are split between 64k and 4k  header files.
b) __hash_page_4K()  is  reimplemented   to reflect the
   above logic.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Ram Pai 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |2 +
 arch/powerpc/include/asm/book3s/64/hash-64k.h |8 +--
 arch/powerpc/include/asm/book3s/64/hash.h |1 -
 arch/powerpc/mm/hash64_64k.c  |   74 -
 arch/powerpc/mm/hash_utils_64.c   |4 +-
 5 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 0c4e470..f959c00 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -16,6 +16,8 @@
 #define H_PUD_TABLE_SIZE   (sizeof(pud_t) << H_PUD_INDEX_SIZE)
 #define H_PGD_TABLE_SIZE   (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
 
+#define H_PAGE_BUSY	_RPAGE_RSV1 /* software: PTE & hash are busy */
+
 /* PTE flags to conserve for HPTE identification */
 #define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | \
 H_PAGE_F_SECOND | H_PAGE_F_GIX)
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 9732837..62e580c 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -12,18 +12,14 @@
  */
 #define H_PAGE_COMBO   _RPAGE_RPN0 /* this is a combo 4k page */
 #define H_PAGE_4K_PFN  _RPAGE_RPN1 /* PFN is for a single 4k page */
+#define

[PATCH 2/6] powerpc: Free up four 64K PTE bits in 64K backed HPTE pages

2017-07-21 Thread Ram Pai
Rearrange 64K PTE bits to  free  up  bits 3, 4, 5  and  6
in the 64K backed HPTE pages. This along with the earlier
patch will  entirely free  up the four bits from 64K PTE.
The bit numbers are  big-endian as defined in the  ISA3.0.

This patch  does  the  following change to 64K PTE backed
by 64K HPTE.

H_PAGE_F_SECOND (S) which  occupied  bit  4  moves to the
second part of the pte to bit 60.
H_PAGE_F_GIX (G,I,X) which  occupied  bits 5, 6 and 7 also
moves  to  the  second part of the pte, to bits 61,
62 and 63 respectively.

Since bit 7 is now freed up, we move H_PAGE_BUSY (B) from
bit  9  to  bit  7.

The second part of the PTE will hold
(H_PAGE_F_SECOND|H_PAGE_F_GIX) at bit 60,61,62,63.
NOTE: none of the bits in the secondary PTE were in use
by the 64k-HPTE backed PTE.

Before the patch, the 64K HPTE backed 64k PTE format was
as follows

 0 1 2 3 4  5  6  7  8 9 10...63
 : : : : :  :  :  :  : : ::
 v v v v v  v  v  v  v v vv

,-,-,-,-,--,--,--,--,-,-,-,-,-,--,-,-,-,
|x|x|x| |S |G |I |X |x|B| |x|x||x|x|x|x| <- primary pte
'_'_'_'_'__'__'__'__'_'_'_'_'_''_'_'_'_'
| | | | |  |  |  |  | | | | |..| | | | | <- secondary pte
'_'_'_'_'__'__'__'__'_'_'_'_'__'_'_'_'_'

After the patch, the 64k HPTE backed 64k PTE format is
as follows

 0 1 2 3 4  5  6  7  8 9 10...63
 : : : : :  :  :  :  : : ::
 v v v v v  v  v  v  v v vv

,-,-,-,-,--,--,--,--,-,-,-,-,-,--,-,-,-,
|x|x|x| |  |  |  |B |x| | |x|x||.|.|.|.| <- primary pte
'_'_'_'_'__'__'__'__'_'_'_'_'_''_'_'_'_'
| | | | |  |  |  |  | | | | |..|S|G|I|X| <- secondary pte
'_'_'_'_'__'__'__'__'_'_'_'_'__'_'_'_'_'

The above PTE changes are applicable to hugetlb pages as well.

The patch does the following code changes:

a) moves  the  H_PAGE_F_SECOND and  H_PAGE_F_GIX to the 4k PTE
header,  since they are no longer needed by the 64k PTEs.
b) abstracts  out __real_pte() and __rpte_to_hidx() so the
caller  need not know the bit location of the slot.
c) moves the slot bits to the secondary pte.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Ram Pai 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |3 ++
 arch/powerpc/include/asm/book3s/64/hash-64k.h |   29 +---
 arch/powerpc/include/asm/book3s/64/hash.h |3 --
 arch/powerpc/mm/hash64_64k.c  |   34 +---
 arch/powerpc/mm/hugetlbpage-hash64.c  |   26 +++---
 5 files changed, 61 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index f959c00..d2cf949 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -16,6 +16,9 @@
 #define H_PUD_TABLE_SIZE   (sizeof(pud_t) << H_PUD_INDEX_SIZE)
 #define H_PGD_TABLE_SIZE   (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
 
+#define H_PAGE_F_GIX_SHIFT 56
+#define H_PAGE_F_SECOND	_RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
+#define H_PAGE_F_GIX	(_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
+#define H_PAGE_BUSY	_RPAGE_RSV1 /* software: PTE & hash are busy */
 
 /* PTE flags to conserve for HPTE identification */
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 62e580c..c281f18 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -12,7 +12,7 @@
  */
 #define H_PAGE_COMBO   _RPAGE_RPN0 /* this is a combo 4k page */
 #define H_PAGE_4K_PFN  _RPAGE_RPN1 /* PFN is for a single 4k page */
-#define H_PAGE_BUSY	_RPAGE_RPN42 /* software: PTE & hash are busy */
+#define H_PAGE_BUSY	_RPAGE_RPN44 /* software: PTE & hash are busy */
 
 /*
  * We need to differentiate between explicit huge page and THP huge
@@ -21,8 +21,7 @@
 #define H_PAGE_THP_HUGE  H_PAGE_4K_PFN
 
 /* PTE flags to conserve for HPTE identification */
-#define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_F_SECOND | \
-H_PAGE_F_GIX | H_PAGE_HASHPTE | H_PAGE_COMBO)
+#define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | H_PAGE_COMBO)
 /*
  * we support 16 fragments per PTE page of 64K size.
  */
@@ -50,24 +49,22 @@ static inline real_pte_t __real_pte(pte_t pte, pte_t *ptep)
unsigned long *hidxp;
 
rpte.pte = pte;
-   rpte.hidx = 0;
-   if (pte_val(pte) & H_PAGE_COMBO) {
-   /*
-* Make sure we order the hidx load against the H_PAGE_COMBO
-* check. The store side ordering is done in __hash_page_4K
-*/
-   smp_rmb();
-   hidxp = (unsigned long *)(ptep + PTRS_PER_PTE);
-   rpte.hidx = *hidxp;
- 

[PATCH 3/6] powerpc: capture the PTE format changes in the dump pte report

2017-07-21 Thread Ram Pai
The H_PAGE_F_SECOND and H_PAGE_F_GIX bits are no longer in the 64K main PTE.
Capture these changes in the pte dump report.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Ram Pai 
---
 arch/powerpc/mm/dump_linuxpagetables.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/mm/dump_linuxpagetables.c 
b/arch/powerpc/mm/dump_linuxpagetables.c
index 44fe483..5627edd 100644
--- a/arch/powerpc/mm/dump_linuxpagetables.c
+++ b/arch/powerpc/mm/dump_linuxpagetables.c
@@ -213,7 +213,7 @@ struct flag_info {
.val= H_PAGE_4K_PFN,
.set= "4K_pfn",
}, {
-#endif
+#else /* CONFIG_PPC_64K_PAGES */
.mask   = H_PAGE_F_GIX,
.val= H_PAGE_F_GIX,
.set= "f_gix",
@@ -224,6 +224,7 @@ struct flag_info {
.val= H_PAGE_F_SECOND,
.set= "f_second",
}, {
+#endif /* CONFIG_PPC_64K_PAGES */
 #endif
.mask   = _PAGE_SPECIAL,
.val= _PAGE_SPECIAL,
-- 
1.7.1



[PATCH 4/6] powerpc: introduce pte_set_hash_slot() helper

2017-07-21 Thread Ram Pai
Introduce pte_set_hash_slot(). It  sets the (H_PAGE_F_SECOND|H_PAGE_F_GIX)
bits at  the   appropriate   location   in   the   PTE  of  4K  PTE.  For
64K PTE, it  sets  the  bits  in  the  second  part  of  the  PTE. Though
the implementation  for the former just needs the slot parameter, it does
take some additional parameters to keep the prototype consistent.

This function  will  be  handy  as  we   work   towards  re-arranging the
bits in the later patches.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Ram Pai 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |   15 +++
 arch/powerpc/include/asm/book3s/64/hash-64k.h |   25 +
 2 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index d2cf949..dc153c6 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -53,6 +53,21 @@ static inline int hash__hugepd_ok(hugepd_t hpd)
 }
 #endif
 
+/*
+ * 4k pte format is  different  from  64k  pte  format.  Saving  the
+ * hash_slot is just a matter of returning the pte bits that need to
+ * be modified. On 64k pte, things are a  little  more  involved and
+ * hence  needs   many   more  parameters  to  accomplish  the  same.
+ * However we  want  to abstract this out from the caller by keeping
+ * the prototype consistent across the two formats.
+ */
+static inline unsigned long pte_set_hash_slot(pte_t *ptep, real_pte_t rpte,
+   unsigned int subpg_index, unsigned long slot)
+{
+   return (slot << H_PAGE_F_GIX_SHIFT) &
+   (H_PAGE_F_SECOND | H_PAGE_F_GIX);
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 
 static inline char *get_hpte_slot_array(pmd_t *pmdp)
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index c281f18..89ef5a9 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -67,6 +67,31 @@ static inline unsigned long __rpte_to_hidx(real_pte_t rpte, 
unsigned long index)
return ((rpte.hidx >> (index<<2)) & 0xfUL);
 }
 
+/*
+ * Commit the hash slot and return pte bits that needs to be modified.
+ * The caller is expected to modify the pte bits accordingly and
+ * commit the pte to memory.
+ */
+static inline unsigned long pte_set_hash_slot(pte_t *ptep, real_pte_t rpte,
+   unsigned int subpg_index, unsigned long slot)
+{
+   unsigned long *hidxp = (unsigned long *)(ptep + PTRS_PER_PTE);
+
+   rpte.hidx &= ~(0xfUL << (subpg_index << 2));
+   *hidxp = rpte.hidx  | (slot << (subpg_index << 2));
+   /*
+* Commit the hidx bits to memory before returning.
+* Anyone reading  pte  must  ensure hidx bits are
+* read  only  after  reading the pte by using the
+* read-side  barrier  smp_rmb(). __real_pte() can
+* help ensure that.
+*/
+   smp_wmb();
+
+   /* no pte bits to be modified, return 0x0UL */
+   return 0x0UL;
+}
+
 #define __rpte_to_pte(r)   ((r).pte)
 extern bool __rpte_sub_valid(real_pte_t rpte, unsigned long index);
 /*
-- 
1.7.1



[PATCH 5/6] powerpc: introduce pte_get_hash_gslot() helper

2017-07-21 Thread Ram Pai
Introduce pte_get_hash_gslot(), which returns the slot number of the
HPTE in the global hash table.

This function will come in handy as we work towards re-arranging the
PTE bits in the later patches.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Ram Pai 
---
 arch/powerpc/include/asm/book3s/64/hash.h |3 +++
 arch/powerpc/mm/hash_utils_64.c   |   18 ++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index d27f885..277158c 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -156,6 +156,9 @@ static inline int hash__pte_none(pte_t pte)
return (pte_val(pte) & ~H_PTE_NONE_MASK) == 0;
 }
 
+unsigned long pte_get_hash_gslot(unsigned long vpn, unsigned long shift,
+   int ssize, real_pte_t rpte, unsigned int subpg_index);
+
 /* This low level function performs the actual PTE insertion
  * Setting the PTE depends on the MMU type and other factors. It's
  * an horrible mess that I'm not going to try to clean up now but
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 1b494d0..d3604da 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1591,6 +1591,24 @@ static inline void tm_flush_hash_page(int local)
 }
 #endif
 
+/*
+ * return the global hash slot, corresponding to the given
+ * pte, which contains the hpte.
+ */
+unsigned long pte_get_hash_gslot(unsigned long vpn, unsigned long shift,
+   int ssize, real_pte_t rpte, unsigned int subpg_index)
+{
+   unsigned long hash, slot, hidx;
+
+   hash = hpt_hash(vpn, shift, ssize);
+   hidx = __rpte_to_hidx(rpte, subpg_index);
+   if (hidx & _PTEIDX_SECONDARY)
+   hash = ~hash;
+   slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+   slot += hidx & _PTEIDX_GROUP_IX;
+   return slot;
+}
+
 /* WARNING: This is called from hash_low_64.S, if you change this prototype,
  *  do not forget to update the assembly call site !
  */
-- 
1.7.1



[PATCH 6/6] powerpc: use helper functions to get and set hash slots

2017-07-21 Thread Ram Pai
Replace redundant code in __hash_page_64K(), __hash_page_huge(),
both __hash_page_4K() variants and flush_hash_page() with the
helper functions pte_get_hash_gslot() and pte_set_hash_slot().

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Ram Pai 
---
 arch/powerpc/mm/hash64_4k.c  |   14 +++-
 arch/powerpc/mm/hash64_64k.c |   58 +++--
 arch/powerpc/mm/hash_utils_64.c  |   13 ++-
 arch/powerpc/mm/hugetlbpage-hash64.c |   28 ++--
 4 files changed, 27 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/mm/hash64_4k.c b/arch/powerpc/mm/hash64_4k.c
index 6fa450c..a1eebc1 100644
--- a/arch/powerpc/mm/hash64_4k.c
+++ b/arch/powerpc/mm/hash64_4k.c
@@ -20,6 +20,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
   pte_t *ptep, unsigned long trap, unsigned long flags,
   int ssize, int subpg_prot)
 {
+   real_pte_t rpte;
unsigned long hpte_group;
unsigned long rflags, pa;
unsigned long old_pte, new_pte;
@@ -54,6 +55,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
 * need to add in 0x1 if it's a read-only user page
 */
rflags = htab_convert_pte_flags(new_pte);
+   rpte = __real_pte(__pte(old_pte), ptep);
 
if (cpu_has_feature(CPU_FTR_NOEXECUTE) &&
!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
@@ -64,13 +66,10 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
/*
 * There MIGHT be an HPTE for this pte
 */
-   hash = hpt_hash(vpn, shift, ssize);
-   if (old_pte & H_PAGE_F_SECOND)
-   hash = ~hash;
-   slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-   slot += (old_pte & H_PAGE_F_GIX) >> H_PAGE_F_GIX_SHIFT;
+   unsigned long gslot = pte_get_hash_gslot(vpn, shift,
+   ssize, rpte, 0);
 
-   if (mmu_hash_ops.hpte_updatepp(slot, rflags, vpn, MMU_PAGE_4K,
+   if (mmu_hash_ops.hpte_updatepp(gslot, rflags, vpn, MMU_PAGE_4K,
   MMU_PAGE_4K, ssize, flags) == -1)
old_pte &= ~_PAGE_HPTEFLAGS;
}
@@ -118,8 +117,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
return -1;
}
new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | H_PAGE_HASHPTE;
-   new_pte |= (slot << H_PAGE_F_GIX_SHIFT) &
-   (H_PAGE_F_SECOND | H_PAGE_F_GIX);
+   new_pte |= pte_set_hash_slot(ptep, rpte, 0, slot);
}
*ptep = __pte(new_pte & ~H_PAGE_BUSY);
return 0;
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index e922a70..6c1c87a 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -39,9 +39,8 @@ int __hash_page_4K(unsigned long ea, unsigned long access, 
unsigned long vsid,
 {
real_pte_t rpte;
unsigned long hpte_group;
-   unsigned long *hidxp;
unsigned int subpg_index;
-   unsigned long rflags, pa, hidx;
+   unsigned long rflags, pa;
unsigned long old_pte, new_pte, subpg_pte;
unsigned long vpn, hash, slot, gslot;
unsigned long shift = mmu_psize_defs[MMU_PAGE_4K].shift;
@@ -114,18 +113,13 @@ int __hash_page_4K(unsigned long ea, unsigned long 
access, unsigned long vsid,
if (__rpte_sub_valid(rpte, subpg_index)) {
int ret;
 
-   hash = hpt_hash(vpn, shift, ssize);
-   hidx = __rpte_to_hidx(rpte, subpg_index);
-   if (hidx & _PTEIDX_SECONDARY)
-   hash = ~hash;
-   slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-   slot += hidx & _PTEIDX_GROUP_IX;
-
-   ret = mmu_hash_ops.hpte_updatepp(slot, rflags, vpn,
+   gslot = pte_get_hash_gslot(vpn, shift, ssize, rpte,
+   subpg_index);
+   ret = mmu_hash_ops.hpte_updatepp(gslot, rflags, vpn,
 MMU_PAGE_4K, MMU_PAGE_4K,
 ssize, flags);
/*
-*if we failed because typically the HPTE wasn't really here
+* if we failed because typically the HPTE wasn't really here
 * we try an insertion.
 */
if (ret == -1)
@@ -221,20 +215,10 @@ int __hash_page_4K(unsigned long ea, unsigned long 
access, unsigned long vsid,
   MMU_PAGE_4K, MMU_PAGE_4K, old_pte);
return -1;
}
-   /*
-* Insert slot number & secondary bit in PTE second half,
-* clear H_PAGE_BUSY and set appropriate HPTE slot bit
-* Since we have H

[PATCH v2] sound: Convert to using %pOF instead of full_name

2017-07-21 Thread Rob Herring
Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each node.
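As a hedged illustration (not part of the patch itself), the conversion pattern is the same everywhere; `%pOF` lets the printf core format the node's path on demand from the `struct device_node` pointer instead of reading a stored full-path string:

```c
/* Illustrative kernel-style fragment only -- it assumes a
 * struct device_node *np and struct device *dev are in scope,
 * and is not a standalone program. */

/* Before: relies on the cached full-path string. */
dev_err(dev, "could not probe %s\n", np->full_name);

/* After: the printf core walks the node and prints its path. */
dev_err(dev, "could not probe %pOF\n", np);
```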

Signed-off-by: Rob Herring 
Cc: Johannes Berg 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: Timur Tabi 
Cc: Nicolin Chen 
Cc: Xiubo Li 
Cc: Fabio Estevam 
Cc: Liam Girdwood 
Cc: Mark Brown 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: alsa-de...@alsa-project.org
---
v2:
- Add printf specifier change for fsl_dma.c

 sound/aoa/codecs/tas.c |  4 ++--
 sound/soc/fsl/fsl_dma.c|  4 ++--
 sound/soc/fsl/imx-audmux.c | 16 
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/sound/aoa/codecs/tas.c b/sound/aoa/codecs/tas.c
index 733b6365dad6..15c05755d270 100644
--- a/sound/aoa/codecs/tas.c
+++ b/sound/aoa/codecs/tas.c
@@ -905,8 +905,8 @@ static int tas_i2c_probe(struct i2c_client *client,
goto fail;
}
printk(KERN_DEBUG
-  "snd-aoa-codec-tas: tas found, addr 0x%02x on %s\n",
-  (unsigned int)client->addr, node->full_name);
+  "snd-aoa-codec-tas: tas found, addr 0x%02x on %pOF\n",
+  (unsigned int)client->addr, node);
return 0;
  fail:
mutex_destroy(&tas->mtx);
diff --git a/sound/soc/fsl/fsl_dma.c b/sound/soc/fsl/fsl_dma.c
index ccadefceeff2..ed8ea002902d 100644
--- a/sound/soc/fsl/fsl_dma.c
+++ b/sound/soc/fsl/fsl_dma.c
@@ -897,8 +897,8 @@ static int fsl_soc_dma_probe(struct platform_device *pdev)
 
ret = of_address_to_resource(ssi_np, 0, &res);
if (ret) {
-   dev_err(&pdev->dev, "could not determine resources for %s\n",
-   ssi_np->full_name);
+   dev_err(&pdev->dev, "could not determine resources for %pOF\n",
+   ssi_np);
of_node_put(ssi_np);
return ret;
}
diff --git a/sound/soc/fsl/imx-audmux.c b/sound/soc/fsl/imx-audmux.c
index fc57da341d61..392d5eef356d 100644
--- a/sound/soc/fsl/imx-audmux.c
+++ b/sound/soc/fsl/imx-audmux.c
@@ -268,13 +268,13 @@ static int imx_audmux_parse_dt_defaults(struct 
platform_device *pdev,
 
ret = of_property_read_u32(child, "fsl,audmux-port", &port);
if (ret) {
-   dev_warn(&pdev->dev, "Failed to get fsl,audmux-port of 
child node \"%s\"\n",
-   child->full_name);
+   dev_warn(&pdev->dev, "Failed to get fsl,audmux-port of 
child node \"%pOF\"\n",
+   child);
continue;
}
if (!of_property_read_bool(child, "fsl,port-config")) {
-   dev_warn(&pdev->dev, "child node \"%s\" does not have 
property fsl,port-config\n",
-   child->full_name);
+   dev_warn(&pdev->dev, "child node \"%pOF\" does not have 
property fsl,port-config\n",
+   child);
continue;
}
 
@@ -292,15 +292,15 @@ static int imx_audmux_parse_dt_defaults(struct 
platform_device *pdev,
}
 
if (ret != -EOVERFLOW) {
-   dev_err(&pdev->dev, "Failed to read u32 at index %d of 
child %s\n",
-   i, child->full_name);
+   dev_err(&pdev->dev, "Failed to read u32 at index %d of 
child %pOF\n",
+   i, child);
continue;
}
 
if (audmux_type == IMX31_AUDMUX) {
if (i % 2) {
-   dev_err(&pdev->dev, "One pdcr value is missing 
in child node %s\n",
-   child->full_name);
+   dev_err(&pdev->dev, "One pdcr value is missing 
in child node %pOF\n",
+   child);
continue;
}
imx_audmux_v2_configure_port(port, ptcr, pdcr);
-- 
2.11.0



[PATCH v2] sound: fsl_dma: remove dma_object path member

2017-07-21 Thread Rob Herring
dma_object.path is unused, so rather than fix it to work with DT
full_name changes, just remove it.

Signed-off-by: Rob Herring 
Cc: Timur Tabi 
Cc: Nicolin Chen 
Cc: Xiubo Li 
Cc: Fabio Estevam 
Cc: Liam Girdwood 
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: alsa-de...@alsa-project.org
Cc: linuxppc-dev@lists.ozlabs.org
---
v2:
- Move printf specifier change to correct patch

 sound/soc/fsl/fsl_dma.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/sound/soc/fsl/fsl_dma.c b/sound/soc/fsl/fsl_dma.c
index ed8ea002902d..b38dd328c783 100644
--- a/sound/soc/fsl/fsl_dma.c
+++ b/sound/soc/fsl/fsl_dma.c
@@ -63,7 +63,6 @@ struct dma_object {
struct ccsr_dma_channel __iomem *channel;
unsigned int irq;
bool assigned;
-   char path[1];
 };
 
 /*
@@ -903,14 +902,13 @@ static int fsl_soc_dma_probe(struct platform_device *pdev)
return ret;
}
 
-   dma = kzalloc(sizeof(*dma) + strlen(np->full_name), GFP_KERNEL);
+   dma = kzalloc(sizeof(*dma), GFP_KERNEL);
if (!dma) {
dev_err(&pdev->dev, "could not allocate dma object\n");
of_node_put(ssi_np);
return -ENOMEM;
}
 
-   strcpy(dma->path, np->full_name);
dma->dai.ops = &fsl_dma_ops;
dma->dai.pcm_new = fsl_dma_new;
dma->dai.pcm_free = fsl_dma_free_dma_buffers;
-- 
2.11.0



Re: [PATCH v2] sound: Convert to using %pOF instead of full_name

2017-07-21 Thread Takashi Iwai
On Fri, 21 Jul 2017 21:28:34 +0200,
Rob Herring wrote:
> 
> Now that we have a custom printf format specifier, convert users of
> full_name to use %pOF instead. This is preparation to remove storing
> of the full path string for each node.
> 
> Signed-off-by: Rob Herring 
> Cc: Johannes Berg 
> Cc: Jaroslav Kysela 
> Cc: Takashi Iwai 
> Cc: Timur Tabi 
> Cc: Nicolin Chen 
> Cc: Xiubo Li 
> Cc: Fabio Estevam 
> Cc: Liam Girdwood 
> Cc: Mark Brown 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: alsa-de...@alsa-project.org
> ---
> v2:
> - Add printf specifier change for fsl_dma.c

Could you split this into one patch for sound/aoa/* and one for
sound/soc/fsl?  That makes it much easier to apply, as the former
goes directly to me while the latter goes through Mark's ASoC tree.


thanks,

Takashi


> 
>  sound/aoa/codecs/tas.c |  4 ++--
>  sound/soc/fsl/fsl_dma.c|  4 ++--
>  sound/soc/fsl/imx-audmux.c | 16 
>  3 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/sound/aoa/codecs/tas.c b/sound/aoa/codecs/tas.c
> index 733b6365dad6..15c05755d270 100644
> --- a/sound/aoa/codecs/tas.c
> +++ b/sound/aoa/codecs/tas.c
> @@ -905,8 +905,8 @@ static int tas_i2c_probe(struct i2c_client *client,
>   goto fail;
>   }
>   printk(KERN_DEBUG
> -"snd-aoa-codec-tas: tas found, addr 0x%02x on %s\n",
> -(unsigned int)client->addr, node->full_name);
> +"snd-aoa-codec-tas: tas found, addr 0x%02x on %pOF\n",
> +(unsigned int)client->addr, node);
>   return 0;
>   fail:
>   mutex_destroy(&tas->mtx);
> diff --git a/sound/soc/fsl/fsl_dma.c b/sound/soc/fsl/fsl_dma.c
> index ccadefceeff2..ed8ea002902d 100644
> --- a/sound/soc/fsl/fsl_dma.c
> +++ b/sound/soc/fsl/fsl_dma.c
> @@ -897,8 +897,8 @@ static int fsl_soc_dma_probe(struct platform_device *pdev)
>  
>   ret = of_address_to_resource(ssi_np, 0, &res);
>   if (ret) {
> - dev_err(&pdev->dev, "could not determine resources for %s\n",
> - ssi_np->full_name);
> + dev_err(&pdev->dev, "could not determine resources for %pOF\n",
> + ssi_np);
>   of_node_put(ssi_np);
>   return ret;
>   }
> diff --git a/sound/soc/fsl/imx-audmux.c b/sound/soc/fsl/imx-audmux.c
> index fc57da341d61..392d5eef356d 100644
> --- a/sound/soc/fsl/imx-audmux.c
> +++ b/sound/soc/fsl/imx-audmux.c
> @@ -268,13 +268,13 @@ static int imx_audmux_parse_dt_defaults(struct 
> platform_device *pdev,
>  
>   ret = of_property_read_u32(child, "fsl,audmux-port", &port);
>   if (ret) {
> - dev_warn(&pdev->dev, "Failed to get fsl,audmux-port of 
> child node \"%s\"\n",
> - child->full_name);
> + dev_warn(&pdev->dev, "Failed to get fsl,audmux-port of 
> child node \"%pOF\"\n",
> + child);
>   continue;
>   }
>   if (!of_property_read_bool(child, "fsl,port-config")) {
> - dev_warn(&pdev->dev, "child node \"%s\" does not have 
> property fsl,port-config\n",
> - child->full_name);
> + dev_warn(&pdev->dev, "child node \"%pOF\" does not have 
> property fsl,port-config\n",
> + child);
>   continue;
>   }
>  
> @@ -292,15 +292,15 @@ static int imx_audmux_parse_dt_defaults(struct 
> platform_device *pdev,
>   }
>  
>   if (ret != -EOVERFLOW) {
> - dev_err(&pdev->dev, "Failed to read u32 at index %d of 
> child %s\n",
> - i, child->full_name);
> + dev_err(&pdev->dev, "Failed to read u32 at index %d of 
> child %pOF\n",
> + i, child);
>   continue;
>   }
>  
>   if (audmux_type == IMX31_AUDMUX) {
>   if (i % 2) {
> - dev_err(&pdev->dev, "One pdcr value is missing 
> in child node %s\n",
> - child->full_name);
> + dev_err(&pdev->dev, "One pdcr value is missing 
> in child node %pOF\n",
> + child);
>   continue;
>   }
>   imx_audmux_v2_configure_port(port, ptcr, pdcr);
> -- 
> 2.11.0
> 
> 


[RFC PATCH 00/11] relaxing allocation constraints on radix and bare metal

2017-07-21 Thread Nicholas Piggin
Hi,

This series isn't greatly polished or well tested yet, but it's
conceptually simple so I'll just get some comments. Powernv does not
have any real mode access limitation, and ISA3 radix does not have
any SLB/TLB limitation on the kernel linear address. We also can
avoid some allocations when not running as a LPAR guest.

After lifting these limits, we're in a better position to make some
of our important structures node-local. At least on radix. Hash could
as well, if node > 0 CPUs were to also bolt an SLB entry corresponding to
their local memory (but that's left as an exercise for the reader).

Anyway, I'd be interested in comments.

Thanks,
Nick

Nicholas Piggin (11):
  powerpc/powernv: powernv platform is not constrained by RMA
  powerpc/powernv: Remove real mode access limit for early allocations
  powerpc/64s/radix: Remove SLB address limit for per-cpu stacks
  powerpc/64s: Relax PACA address limitations
  powerpc/64s/radix: Do not allocate SLB shadow structures
  powerpc/64s: do not allocate lppaca if we are not virtualized
  mm: make memblock_alloc_base_nid non-static
  powerpc/64: Allocate PACAs node-local if possible
  powerpc/64s: Allocate LPPACAs node-local if possible
  powerpc/64: allocate per-cpu stacks node-local if possible
  powerpc/64s/radix: allocate kernel page tables node-local if possible

 arch/powerpc/include/asm/book3s/64/hash.h|   2 +-
 arch/powerpc/include/asm/book3s/64/radix.h   |   2 +-
 arch/powerpc/include/asm/kvm_ppc.h   |   8 +-
 arch/powerpc/include/asm/lppaca.h|  15 +-
 arch/powerpc/include/asm/paca.h  |  12 +-
 arch/powerpc/include/asm/pmc.h   |  10 +-
 arch/powerpc/include/asm/smp.h   |   4 +-
 arch/powerpc/include/asm/sparsemem.h |   2 +-
 arch/powerpc/kernel/asm-offsets.c|   7 +
 arch/powerpc/kernel/crash.c  |   2 +-
 arch/powerpc/kernel/head_64.S|  12 +-
 arch/powerpc/kernel/machine_kexec_64.c   |  37 +++--
 arch/powerpc/kernel/paca.c   | 192 
 arch/powerpc/kernel/prom.c   |  10 +-
 arch/powerpc/kernel/setup_64.c   |  69 +
 arch/powerpc/kernel/smp.c|  10 +-
 arch/powerpc/kvm/book3s_hv.c |  21 +--
 arch/powerpc/kvm/book3s_hv_builtin.c |   2 +-
 arch/powerpc/kvm/book3s_hv_interrupts.S  |   3 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   5 +-
 arch/powerpc/mm/hash_utils_64.c  |  26 ++--
 arch/powerpc/mm/mem.c|   4 +-
 arch/powerpc/mm/numa.c   |  13 +-
 arch/powerpc/mm/pgtable-book3s64.c   |   6 +-
 arch/powerpc/mm/pgtable-radix.c  | 210 +--
 arch/powerpc/platforms/85xx/smp.c|   8 +-
 arch/powerpc/platforms/cell/smp.c|   4 +-
 arch/powerpc/platforms/powernv/idle.c|  13 +-
 arch/powerpc/platforms/powernv/opal.c|   7 +-
 arch/powerpc/platforms/powernv/setup.c   |   4 +-
 arch/powerpc/platforms/powernv/smp.c |   2 +-
 arch/powerpc/platforms/powernv/subcore.c |   2 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c |   2 +-
 arch/powerpc/platforms/pseries/lpar.c|   4 +-
 arch/powerpc/platforms/pseries/setup.c   |   2 +-
 arch/powerpc/platforms/pseries/smp.c |   4 +-
 arch/powerpc/sysdev/xics/icp-native.c|   2 +-
 arch/powerpc/xmon/xmon.c |   2 +-
 include/linux/memblock.h |   5 +-
 mm/memblock.c|   2 +-
 40 files changed, 465 insertions(+), 282 deletions(-)

-- 
2.11.0



[RFC PATCH 01/11] powerpc/powernv: powernv platform is not constrained by RMA

2017-07-21 Thread Nicholas Piggin
Remove incorrect comment about real mode address restrictions on
powernv (bare metal), and unnecessary clamping to ppc64_rma_size.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/platforms/powernv/opal.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 9b87abb178f0..bcdca4144362 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -162,12 +162,9 @@ int __init early_init_dt_scan_recoverable_ranges(unsigned 
long node,
sizeof(struct mcheck_recoverable_range);
 
/*
-* Allocate a buffer to hold the MC recoverable ranges. We would be
-* accessing them in real mode, hence it needs to be within
-* RMO region.
+* Allocate a buffer to hold the MC recoverable ranges.
 */
-   mc_recoverable_range =__va(memblock_alloc_base(size, __alignof__(u64),
-   ppc64_rma_size));
+   mc_recoverable_range =__va(memblock_alloc(size, __alignof__(u64)));
memset(mc_recoverable_range, 0, size);
 
for (i = 0; i < mc_recoverable_range_len; i++) {
-- 
2.11.0



[RFC PATCH 02/11] powerpc/powernv: Remove real mode access limit for early allocations

2017-07-21 Thread Nicholas Piggin
This removes the RMA limit on powernv platform, which constrains
early allocations such as PACAs and stacks. There are still other
restrictions that must be followed, such as bolted SLB limits, but
real mode addressing has no constraints.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/mm/hash_utils_64.c | 24 +++-
 arch/powerpc/mm/pgtable-radix.c | 33 +
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 7a20669c19e7..d3da19cc4867 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1824,16 +1824,22 @@ void hash__setup_initial_memory_limit(phys_addr_t 
first_memblock_base,
 */
BUG_ON(first_memblock_base != 0);
 
-   /* On LPAR systems, the first entry is our RMA region,
-* non-LPAR 64-bit hash MMU systems don't have a limitation
-* on real mode access, but using the first entry works well
-* enough. We also clamp it to 1G to avoid some funky things
-* such as RTAS bugs etc...
-*/
-   ppc64_rma_size = min_t(u64, first_memblock_size, 0x4000);
+   if (!early_cpu_has_feature(CPU_FTR_HVMODE)) {
+   /*
+* On virtualized systems, the first entry is our RMA region,
+* non-LPAR 64-bit hash MMU systems don't have a limitation
+* on real mode access.
+*
+* We also clamp it to 1G to avoid some funky things
+* such as RTAS bugs etc...
+*/
+   ppc64_rma_size = min_t(u64, first_memblock_size, 0x4000);
 
-   /* Finally limit subsequent allocations */
-   memblock_set_current_limit(ppc64_rma_size);
+   /* Finally limit subsequent allocations */
+   memblock_set_current_limit(ppc64_rma_size);
+   } else {
+   ppc64_rma_size = ULONG_MAX;
+   }
 }
 
 #ifdef CONFIG_DEBUG_FS
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 8c13e4282308..897655ed067e 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -548,22 +548,23 @@ void radix__setup_initial_memory_limit(phys_addr_t 
first_memblock_base,
 * physical on those processors
 */
BUG_ON(first_memblock_base != 0);
-   /*
-* We limit the allocation that depend on ppc64_rma_size
-* to first_memblock_size. We also clamp it to 1GB to
-* avoid some funky things such as RTAS bugs.
-*
-* On radix config we really don't have a limitation
-* on real mode access. But keeping it as above works
-* well enough.
-*/
-   ppc64_rma_size = min_t(u64, first_memblock_size, 0x4000);
-   /*
-* Finally limit subsequent allocations. We really don't want
-* to limit the memblock allocations to rma_size. FIXME!! should
-* we even limit at all ?
-*/
-   memblock_set_current_limit(first_memblock_base + first_memblock_size);
+
+   if (!early_cpu_has_feature(CPU_FTR_HVMODE)) {
+   /*
+* On virtualized systems, the first entry is our RMA region,
+* non-LPAR 64-bit hash MMU systems don't have a limitation
+* on real mode access.
+*
+* We also clamp it to 1G to avoid some funky things
+* such as RTAS bugs etc...
+*/
+   ppc64_rma_size = min_t(u64, first_memblock_size, 0x4000);
+
+   /* Finally limit subsequent allocations */
+   memblock_set_current_limit(ppc64_rma_size);
+   } else {
+   ppc64_rma_size = ULONG_MAX;
+   }
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-- 
2.11.0



[RFC PATCH 03/11] powerpc/64s/radix: Remove SLB address limit for per-cpu stacks

2017-07-21 Thread Nicholas Piggin
The radix MMU does not take SLB or TLB interrupts when accessing the
kernel linear address range. Remove this restriction for radix mode.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/setup_64.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index af23d4b576ec..4ecc4315b308 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -564,6 +564,9 @@ static __init u64 safe_stack_limit(void)
/* Other BookE, we assume the first GB is bolted */
return 1ul << 30;
 #else
+   if (early_radix_enabled())
+   return ULONG_MAX;
+
/* BookS, the first segment is bolted */
if (mmu_has_feature(MMU_FTR_1T_SEGMENT))
return 1UL << SID_SHIFT_1T;
-- 
2.11.0



[RFC PATCH 04/11] powerpc/64s: Relax PACA address limitations

2017-07-21 Thread Nicholas Piggin
Book3S radix-mode has no SLB interrupt limitation, and hash-mode has
a 1T limitation on modern CPUs.

Update the paca allocation limits to match the stack allocation. Book3E
still needs a look.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/paca.c | 37 +++--
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 8d63627e067f..fd7deb79110a 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -194,22 +194,39 @@ void setup_paca(struct paca_struct *new_paca)
 
 static int __initdata paca_size;
 
-void __init allocate_pacas(void)
+static __init unsigned long safe_paca_limit(void)
 {
-   u64 limit;
-   int cpu;
+   unsigned long limit = ULONG_MAX;
 
-   limit = ppc64_rma_size;
-
-#ifdef CONFIG_PPC_BOOK3S_64
/*
-* We can't take SLB misses on the paca, and we want to access them
-* in real mode, so allocate them within the RMA and also within
-* the first segment.
+* We access pacas in real mode, so allocate them within real mode
+* constraints.
 */
-   limit = min(0x1000ULL, limit);
+   limit = min((unsigned long)ppc64_rma_size, limit);
+
+#ifdef CONFIG_PPC_BOOK3S_64
+   if (!early_radix_enabled()) {
+   /*
+* We can't take SLB misses on the paca, so allocate them
+* within the first segment.
+*/
+   if (mmu_has_feature(MMU_FTR_1T_SEGMENT))
+   limit = min(1UL << SID_SHIFT_1T, limit);
+   else
+   limit = min(1UL << SID_SHIFT, limit);
+   }
 #endif
 
+   /* XXX: what about Book3E? (e.g., see safe_stack_limit) */
+
+   return limit;
+}
+
+void __init allocate_pacas(void)
+{
+   unsigned long limit = safe_paca_limit();
+   int cpu;
+
paca_size = PAGE_ALIGN(sizeof(struct paca_struct) * nr_cpu_ids);
 
paca = __va(memblock_alloc_base(paca_size, PAGE_SIZE, limit));
-- 
2.11.0



[RFC PATCH 05/11] powerpc/64s/radix: Do not allocate SLB shadow structures

2017-07-21 Thread Nicholas Piggin
These are unused in radix mode.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/paca.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index fd7deb79110a..032b073c2c30 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -104,13 +104,22 @@ static struct slb_shadow *slb_shadow;
 static void __init allocate_slb_shadows(int nr_cpus, int limit)
 {
int size = PAGE_ALIGN(sizeof(struct slb_shadow) * nr_cpus);
+
+   if (early_radix_enabled())
+   return;
+
slb_shadow = __va(memblock_alloc_base(size, PAGE_SIZE, limit));
memset(slb_shadow, 0, size);
 }
 
 static struct slb_shadow * __init init_slb_shadow(int cpu)
 {
-   struct slb_shadow *s = &slb_shadow[cpu];
+   struct slb_shadow *s;
+
+   if (early_radix_enabled())
+   return NULL;
+
+   s = &slb_shadow[cpu];
 
/*
 * When we come through here to initialise boot_paca, the slb_shadow
-- 
2.11.0



[RFC PATCH 06/11] powerpc/64s: do not allocate lppaca if we are not virtualized

2017-07-21 Thread Nicholas Piggin
The "lppaca" is a structure registered with the hypervisor. This
is unnecessary when running on non-virtualised platforms. One field
from the lppaca is also used by the host, so move that out into the
paca.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/paca.h |  8 ++--
 arch/powerpc/include/asm/pmc.h  | 10 +-
 arch/powerpc/kernel/asm-offsets.c   |  7 +++
 arch/powerpc/kernel/paca.c  | 13 ++---
 arch/powerpc/kernel/prom.c  | 10 +++---
 arch/powerpc/kvm/book3s_hv_interrupts.S |  3 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  5 ++---
 7 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index dc88a31cc79a..de47c5a4f132 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -57,7 +57,7 @@ struct task_struct;
  * processor.
  */
 struct paca_struct {
-#ifdef CONFIG_PPC_BOOK3S
+#ifdef CONFIG_PPC_PSERIES
/*
 * Because hw_cpu_id, unlike other paca fields, is accessed
 * routinely from other CPUs (from the IRQ code), we stick to
@@ -66,7 +66,8 @@ struct paca_struct {
 */
 
struct lppaca *lppaca_ptr;  /* Pointer to LpPaca for PLIC */
-#endif /* CONFIG_PPC_BOOK3S */
+#endif /* CONFIG_PPC_PSERIES */
+
/*
 * MAGIC: the spinlock functions in arch/powerpc/lib/locks.c 
 * load lock_token and paca_index with a single lwz
@@ -158,6 +159,9 @@ struct paca_struct {
u64 saved_r1;   /* r1 save for RTAS calls or PM */
u64 saved_msr;  /* MSR saved here by enter_rtas */
u16 trap_save;  /* Used when bad stack is encountered */
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   u8 pmcregs_in_use;  /* pseries puts this in lppaca */
+#endif
u8 soft_enabled;/* irq soft-enable flag */
u8 irq_happened;/* irq happened while soft-disabled */
u8 io_sync; /* writel() needs spin_unlock sync */
diff --git a/arch/powerpc/include/asm/pmc.h b/arch/powerpc/include/asm/pmc.h
index 5a9ede4962cb..7b672a72cb0b 100644
--- a/arch/powerpc/include/asm/pmc.h
+++ b/arch/powerpc/include/asm/pmc.h
@@ -31,10 +31,18 @@ void ppc_enable_pmcs(void);
 
 #ifdef CONFIG_PPC_BOOK3S_64
 #include 
+#include 
 
 static inline void ppc_set_pmu_inuse(int inuse)
 {
-   get_lppaca()->pmcregs_in_use = inuse;
+#ifdef CONFIG_PPC_PSERIES
+   if (firmware_has_feature(FW_FEATURE_LPAR))
+   get_lppaca()->pmcregs_in_use = inuse;
+#endif
+
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   get_paca()->pmcregs_in_use = inuse;
+#endif
 }
 
 extern void power4_enable_pmcs(void);
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 6e95c2c19a7e..831b277c91c7 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -221,12 +221,19 @@ int main(void)
OFFSET(PACA_EXMC, paca_struct, exmc);
OFFSET(PACA_EXSLB, paca_struct, exslb);
OFFSET(PACA_EXNMI, paca_struct, exnmi);
+#ifdef CONFIG_PPC_PSERIES
OFFSET(PACALPPACAPTR, paca_struct, lppaca_ptr);
+#endif
OFFSET(PACA_SLBSHADOWPTR, paca_struct, slb_shadow_ptr);
OFFSET(SLBSHADOW_STACKVSID, slb_shadow, save_area[SLB_NUM_BOLTED - 
1].vsid);
OFFSET(SLBSHADOW_STACKESID, slb_shadow, save_area[SLB_NUM_BOLTED - 
1].esid);
OFFSET(SLBSHADOW_SAVEAREA, slb_shadow, save_area);
+#ifdef CONFIG_PPC_PSERIES
OFFSET(LPPACA_PMCINUSE, lppaca, pmcregs_in_use);
+#endif
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   OFFSET(PACA_PMCINUSE, paca_struct, pmcregs_in_use);
+#endif
OFFSET(LPPACA_DTLIDX, lppaca, dtl_idx);
OFFSET(LPPACA_YIELDCOUNT, lppaca, yield_count);
OFFSET(PACA_DTL_RIDX, paca_struct, dtl_ridx);
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 032b073c2c30..801ce2e9c9ac 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -18,7 +18,7 @@
 #include 
 #include 
 
-#ifdef CONFIG_PPC_BOOK3S
+#ifdef CONFIG_PPC_PSERIES
 
 /*
  * The structure which the hypervisor knows about - this structure
@@ -45,6 +45,9 @@ static long __initdata lppaca_size;
 
 static void __init allocate_lppacas(int nr_cpus, unsigned long limit)
 {
+   if (!firmware_has_feature(FW_FEATURE_LPAR))
+   return;
+
if (nr_cpus <= NR_LPPACAS)
return;
 
@@ -58,6 +61,9 @@ static struct lppaca * __init new_lppaca(int cpu)
 {
struct lppaca *lp;
 
+   if (!firmware_has_feature(FW_FEATURE_LPAR))
+   return NULL;
+
if (cpu < NR_LPPACAS)
return &lppaca[cpu];
 
@@ -155,9 +161,10 @@ EXPORT_SYMBOL(paca);
 
 void __init initialise_paca(struct paca_struct *new_paca, int cpu)
 {
-#ifdef CONFIG_PPC_BOOK3S
+#ifdef CONFIG_PPC_PSERIES
new_paca->lppaca_ptr = new_lppaca(cpu);
-#else
+#endif
+

[RFC PATCH 07/11] mm: make memblock_alloc_base_nid non-static

2017-07-21 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 include/linux/memblock.h | 5 -
 mm/memblock.c| 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 77d427974f57..03731a1ffa76 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -298,9 +298,12 @@ static inline bool memblock_bottom_up(void)
 #define MEMBLOCK_ALLOC_ANYWHERE(~(phys_addr_t)0)
 #define MEMBLOCK_ALLOC_ACCESSIBLE  0
 
-phys_addr_t __init memblock_alloc_range(phys_addr_t size, phys_addr_t align,
+phys_addr_t memblock_alloc_range(phys_addr_t size, phys_addr_t align,
phys_addr_t start, phys_addr_t end,
ulong flags);
+phys_addr_t memblock_alloc_base_nid(phys_addr_t size,
+   phys_addr_t align, phys_addr_t max_addr,
+   int nid, ulong flags);
 phys_addr_t memblock_alloc_base(phys_addr_t size, phys_addr_t align,
phys_addr_t max_addr);
 phys_addr_t __memblock_alloc_base(phys_addr_t size, phys_addr_t align,
diff --git a/mm/memblock.c b/mm/memblock.c
index 2cb25fe4452c..29c6b483581f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1194,7 +1194,7 @@ phys_addr_t __init memblock_alloc_range(phys_addr_t size, 
phys_addr_t align,
flags);
 }
 
-static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
+phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
phys_addr_t align, phys_addr_t max_addr,
int nid, ulong flags)
 {
-- 
2.11.0



[RFC PATCH 08/11] powerpc/64: Allocate PACAs node-local if possible

2017-07-21 Thread Nicholas Piggin
Change the paca array into an array of pointers to pacas. Allocate
pacas individually per CPU. Try to allocate node-local if possible.

Book3E not yet compiled.

Hash mode won't be able to get per-node allocations, but in theory
on node > 0 CPUs we could bolt an SLB at the bottom of their node
memory to provide SLB-safe node-local memory. For now just get it to
work with radix.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/kvm_ppc.h   |  8 ++--
 arch/powerpc/include/asm/lppaca.h|  2 +-
 arch/powerpc/include/asm/paca.h  |  4 +-
 arch/powerpc/include/asm/smp.h   |  4 +-
 arch/powerpc/kernel/crash.c  |  2 +-
 arch/powerpc/kernel/head_64.S| 12 +++--
 arch/powerpc/kernel/machine_kexec_64.c   | 22 -
 arch/powerpc/kernel/paca.c   | 68 
 arch/powerpc/kernel/setup_64.c   | 18 
 arch/powerpc/kernel/smp.c| 10 ++--
 arch/powerpc/kvm/book3s_hv.c | 21 +
 arch/powerpc/kvm/book3s_hv_builtin.c |  2 +-
 arch/powerpc/platforms/85xx/smp.c|  8 ++--
 arch/powerpc/platforms/cell/smp.c|  4 +-
 arch/powerpc/platforms/powernv/idle.c| 13 +++---
 arch/powerpc/platforms/powernv/setup.c   |  4 +-
 arch/powerpc/platforms/powernv/smp.c |  2 +-
 arch/powerpc/platforms/powernv/subcore.c |  2 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c |  2 +-
 arch/powerpc/platforms/pseries/lpar.c|  4 +-
 arch/powerpc/platforms/pseries/setup.c   |  2 +-
 arch/powerpc/platforms/pseries/smp.c |  4 +-
 arch/powerpc/sysdev/xics/icp-native.c|  2 +-
 arch/powerpc/xmon/xmon.c |  2 +-
 24 files changed, 128 insertions(+), 94 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index ba5fadd6f3c9..49da5d47c693 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -428,15 +428,15 @@ struct openpic;
 extern void kvm_cma_reserve(void) __init;
 static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
 {
-   paca[cpu].kvm_hstate.xics_phys = (void __iomem *)addr;
+   paca_ptrs[cpu]->kvm_hstate.xics_phys = (void __iomem *)addr;
 }
 
 static inline void kvmppc_set_xive_tima(int cpu,
unsigned long phys_addr,
void __iomem *virt_addr)
 {
-   paca[cpu].kvm_hstate.xive_tima_phys = (void __iomem *)phys_addr;
-   paca[cpu].kvm_hstate.xive_tima_virt = virt_addr;
+   paca_ptrs[cpu]->kvm_hstate.xive_tima_phys = (void __iomem *)phys_addr;
+   paca_ptrs[cpu]->kvm_hstate.xive_tima_virt = virt_addr;
 }
 
 static inline u32 kvmppc_get_xics_latch(void)
@@ -450,7 +450,7 @@ static inline u32 kvmppc_get_xics_latch(void)
 
 static inline void kvmppc_set_host_ipi(int cpu, u8 host_ipi)
 {
-   paca[cpu].kvm_hstate.host_ipi = host_ipi;
+   paca_ptrs[cpu]->kvm_hstate.host_ipi = host_ipi;
 }
 
 static inline void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index d0a2a2f99564..6e4589eee2da 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -103,7 +103,7 @@ struct lppaca {
 
 extern struct lppaca lppaca[];
 
-#define lppaca_of(cpu) (*paca[cpu].lppaca_ptr)
+#define lppaca_of(cpu) (*paca_ptrs[cpu]->lppaca_ptr)
 
 /*
  * We are using a non architected field to determine if a partition is
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index de47c5a4f132..f332f92996ab 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -228,10 +228,10 @@ struct paca_struct {
struct sibling_subcore_state *sibling_subcore_state;
 #endif
 #endif
-};
+} cacheline_aligned;
 
 extern void copy_mm_to_paca(struct mm_struct *mm);
-extern struct paca_struct *paca;
+extern struct paca_struct **paca_ptrs;
 extern void initialise_paca(struct paca_struct *new_paca, int cpu);
 extern void setup_paca(struct paca_struct *new_paca);
 extern void allocate_pacas(void);
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 8ea98504f900..1100574bcccd 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -164,12 +164,12 @@ static inline const struct cpumask *cpu_sibling_mask(int cpu)
 #ifdef CONFIG_PPC64
 static inline int get_hard_smp_processor_id(int cpu)
 {
-   return paca[cpu].hw_cpu_id;
+   return paca_ptrs[cpu]->hw_cpu_id;
 }
 
 static inline void set_hard_smp_processor_id(int cpu, int phys)
 {
-   paca[cpu].hw_cpu_id = phys;
+   paca_ptrs[cpu]->hw_cpu_id = phys;
 }
 #else
 /* 32-bit */
diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
index cbabb5adccd9..99eb8fd87d6f 100644
--- a/arch/powerpc/kernel/crash.c
+++ b/arch/power

[RFC PATCH 09/11] powerpc/64s: Allocate LPPACAs node-local if possible

2017-07-21 Thread Nicholas Piggin
Similarly to the previous patch, allocate LPPACAs on a per-CPU basis,
attempting to get node-local memory.
---
 arch/powerpc/include/asm/lppaca.h  | 13 ++-
 arch/powerpc/kernel/machine_kexec_64.c | 15 ++--
 arch/powerpc/kernel/paca.c | 65 +++---
 arch/powerpc/mm/numa.c |  4 +--
 4 files changed, 52 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index 6e4589eee2da..78f171f298b7 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -36,14 +36,7 @@
 #include 
 
 /*
- * We only have to have statically allocated lppaca structs on
- * legacy iSeries, which supports at most 64 cpus.
- */
-#define NR_LPPACAS 1
-
-/*
- * The Hypervisor barfs if the lppaca crosses a page boundary.  A 1k
- * alignment is sufficient to prevent this
+ * The Hypervisor barfs if the lppaca crosses a page boundary.
  */
 struct lppaca {
/* cacheline 1 contains read-only data */
@@ -99,9 +92,7 @@ struct lppaca {
u8  reserved11[148];
	volatile __be64 dtl_idx;/* Dispatch Trace Log head index */
u8  reserved12[96];
-} __attribute__((__aligned__(0x400)));
-
-extern struct lppaca lppaca[];
+} cacheline_aligned;
 
 #define lppaca_of(cpu) (*paca_ptrs[cpu]->lppaca_ptr)
 
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 700cd25fbd28..c439277e0cf8 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -286,6 +286,10 @@ static union thread_union kexec_stack __init_task_data =
  * static PACA; we switch to kexec_paca.
  */
 struct paca_struct kexec_paca;
+#ifdef CONFIG_PPC_PSERIES
+/* align lppaca to 1K to avoid crossing page boundary */
+struct lppaca kexec_lppaca __attribute__((aligned(0x400)));
+#endif
 
 /* Our assembly helper, in misc_64.S */
 extern void kexec_sequence(void *newstack, unsigned long start,
@@ -329,11 +333,16 @@ void default_machine_kexec(struct kimage *image)
memcpy(&kexec_paca, get_paca(), sizeof(struct paca_struct));
kexec_paca.data_offset = 0xedeaddeadeeeUL;
paca_ptrs[kexec_paca.paca_index] = &kexec_paca;
+
+#ifdef CONFIG_PPC_PSERIES
+   if (firmware_has_feature(FW_FEATURE_LPAR)) {
+   memcpy(&kexec_lppaca, get_lppaca(), sizeof(struct lppaca));
+   kexec_paca.lppaca_ptr = &kexec_lppaca;
+   }
+#endif
+
setup_paca(&kexec_paca);
 
-   /* XXX: If anyone does 'dynamic lppacas' this will also need to be
-* switched to a static version!
-*/
/*
 * On Book3S, the copy must happen with the MMU off if we are either
 * using Radix page tables or we are not in an LPAR since we can
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index bf5f5820a3e4..d929d146b977 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -18,6 +18,8 @@
 #include 
 #include 
 
+static int __initdata paca_nr_cpu_ids;
+
 #ifdef CONFIG_PPC_PSERIES
 
 /*
@@ -29,32 +31,42 @@
  * change since the hypervisor knows its layout, so a 1kB alignment
  * will suffice to ensure that it doesn't cross a page boundary.
  */
-struct lppaca lppaca[] = {
-   [0 ... (NR_LPPACAS-1)] = {
+static inline void init_lppaca(struct lppaca *lppaca)
+{
+   *lppaca = (struct lppaca) {
.desc = cpu_to_be32(0xd397d781),/* "LpPa" */
.size = cpu_to_be16(sizeof(struct lppaca)),
.fpregs_in_use = 1,
.slb_count = cpu_to_be16(64),
.vmxregs_in_use = 0,
-   .page_ins = 0,
-   },
+   .page_ins = 0, };
 };
 
-static struct lppaca *extra_lppacas;
-static long __initdata lppaca_size;
+static struct lppaca ** __initdata lppaca_ptrs;
+
+static long __initdata lppaca_ptrs_size;
 
 static void __init allocate_lppacas(int nr_cpus, unsigned long limit)
 {
+   int cpu;
+
if (!firmware_has_feature(FW_FEATURE_LPAR))
return;
 
-   if (nr_cpus <= NR_LPPACAS)
-   return;
+   lppaca_ptrs_size = sizeof(struct lppaca *) * nr_cpu_ids;
+   lppaca_ptrs = __va(memblock_alloc_base(lppaca_ptrs_size, 0, limit));
+
+   for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
+   unsigned long pa;
 
-   lppaca_size = PAGE_ALIGN(sizeof(struct lppaca) *
-(nr_cpus - NR_LPPACAS));
-   extra_lppacas = __va(memblock_alloc_base(lppaca_size,
-PAGE_SIZE, limit));
+   pa = memblock_alloc_base_nid(sizeof(struct lppaca), 0x400,
+   limit, early_cpu_to_node(cpu),
+   MEMBLOCK_NONE);
+   if (!pa)
+   pa = memblock_alloc_base(sizeof(struct lppaca), 0x400,
+   

[RFC PATCH 10/11] powerpc/64: allocate per-cpu stacks node-local if possible

2017-07-21 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/setup_64.c | 51 ++
 1 file changed, 32 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index d3c506a93b0b..5c89b771ac81 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -574,6 +574,21 @@ static __init u64 safe_stack_limit(void)
 #endif
 }
 
+static void *__init alloc_stack(unsigned long limit, int cpu)
+{
+   unsigned long pa;
+
+   pa = memblock_alloc_base_nid(THREAD_SIZE, THREAD_SIZE, limit,
+   early_cpu_to_node(cpu), MEMBLOCK_NONE);
+   if (!pa) {
+   pa = memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit);
+   if (!pa)
+   panic("cannot allocate stacks");
+   }
+
+   return __va(pa);
+}
+
 void __init irqstack_early_init(void)
 {
u64 limit = safe_stack_limit();
@@ -584,12 +599,8 @@ void __init irqstack_early_init(void)
 * cannot afford to take SLB misses on them.
 */
for_each_possible_cpu(i) {
-   softirq_ctx[i] = (struct thread_info *)
-   __va(memblock_alloc_base(THREAD_SIZE,
-   THREAD_SIZE, limit));
-   hardirq_ctx[i] = (struct thread_info *)
-   __va(memblock_alloc_base(THREAD_SIZE,
-   THREAD_SIZE, limit));
+   softirq_ctx[i] = alloc_stack(limit, i);
+   hardirq_ctx[i] = alloc_stack(limit, i);
}
 }
 
@@ -597,20 +608,21 @@ void __init irqstack_early_init(void)
 void __init exc_lvl_early_init(void)
 {
unsigned int i;
-   unsigned long sp;
 
for_each_possible_cpu(i) {
-   sp = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
-   critirq_ctx[i] = (struct thread_info *)__va(sp);
-   paca_ptrs[i]->crit_kstack = __va(sp + THREAD_SIZE);
+   void *sp;
+
+   sp = alloc_stack(ULONG_MAX, i);
+   critirq_ctx[i] = sp;
+   paca_ptrs[i]->crit_kstack = sp + THREAD_SIZE;
 
-   sp = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
-   dbgirq_ctx[i] = (struct thread_info *)__va(sp);
-   paca_ptrs[i]->dbg_kstack = __va(sp + THREAD_SIZE);
+   sp = alloc_stack(ULONG_MAX, i);
+   dbgirq_ctx[i] = sp;
+   paca_ptrs[i]->dbg_kstack = sp + THREAD_SIZE;
 
-   sp = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
-   mcheckirq_ctx[i] = (struct thread_info *)__va(sp);
-   paca_ptrs[i]->mc_kstack = __va(sp + THREAD_SIZE);
+   sp = alloc_stack(ULONG_MAX, i);
+   mcheckirq_ctx[i] = sp;
+   paca_ptrs[i]->mc_kstack = sp + THREAD_SIZE;
}
 
if (cpu_has_feature(CPU_FTR_DEBUG_LVL_EXC))
@@ -663,20 +675,21 @@ void __init emergency_stack_init(void)
 
for_each_possible_cpu(i) {
struct thread_info *ti;
-   ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
+
+   ti = alloc_stack(limit, i);
memset(ti, 0, THREAD_SIZE);
emerg_stack_init_thread_info(ti, i);
paca_ptrs[i]->emergency_sp = (void *)ti + THREAD_SIZE;
 
 #ifdef CONFIG_PPC_BOOK3S_64
/* emergency stack for NMI exception handling. */
-   ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
+   ti = alloc_stack(limit, i);
memset(ti, 0, THREAD_SIZE);
emerg_stack_init_thread_info(ti, i);
paca_ptrs[i]->nmi_emergency_sp = (void *)ti + THREAD_SIZE;
 
/* emergency stack for machine check exception handling. */
-   ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
+   ti = alloc_stack(limit, i);
memset(ti, 0, THREAD_SIZE);
emerg_stack_init_thread_info(ti, i);
paca_ptrs[i]->mc_emergency_sp = (void *)ti + THREAD_SIZE;
-- 
2.11.0



[RFC PATCH 11/11] powerpc/64s/radix: allocate kernel page tables node-local if possible

2017-07-21 Thread Nicholas Piggin
Try to allocate kernel page tables according to the node of
the memory they will map.
---
 arch/powerpc/include/asm/book3s/64/hash.h  |   2 +-
 arch/powerpc/include/asm/book3s/64/radix.h |   2 +-
 arch/powerpc/include/asm/sparsemem.h   |   2 +-
 arch/powerpc/kernel/setup_64.c |   3 +
 arch/powerpc/mm/hash_utils_64.c|   2 +-
 arch/powerpc/mm/mem.c  |   4 +-
 arch/powerpc/mm/numa.c |   9 +-
 arch/powerpc/mm/pgtable-book3s64.c |   6 +-
 arch/powerpc/mm/pgtable-radix.c| 177 +++--
 9 files changed, 136 insertions(+), 71 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index 0ce513f2926f..99ca49b0a801 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -199,7 +199,7 @@ extern int __meminit hash__vmemmap_create_mapping(unsigned long start,
 extern void hash__vmemmap_remove_mapping(unsigned long start,
 unsigned long page_size);
 
-int hash__create_section_mapping(unsigned long start, unsigned long end);
+int hash__create_section_mapping(unsigned long start, unsigned long end, int nid);
 int hash__remove_section_mapping(unsigned long start, unsigned long end);
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index 487709ff6875..d9770ce79ebf 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -315,7 +315,7 @@ static inline unsigned long radix__get_tree_size(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int radix__create_section_mapping(unsigned long start, unsigned long end);
+int radix__create_section_mapping(unsigned long start, unsigned long end, int nid);
 int radix__remove_section_mapping(unsigned long start, unsigned long end);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
index c88930c9db7f..5411557d7c1f 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -16,7 +16,7 @@
 #endif /* CONFIG_SPARSEMEM */
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-extern int create_section_mapping(unsigned long start, unsigned long end);
+extern int create_section_mapping(unsigned long start, unsigned long end, int nid);
 extern int remove_section_mapping(unsigned long start, unsigned long end);
 
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 5c89b771ac81..5520fad59cf4 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -269,6 +269,7 @@ static void cpu_ready_for_interrupts(void)
  * device-tree is not accessible via normal means at this point.
  */
 
+void __init early_initmem_init(void);
 void __init early_setup(unsigned long dt_ptr)
 {
static __initdata struct paca_struct boot_paca;
@@ -313,6 +314,8 @@ void __init early_setup(unsigned long dt_ptr)
apply_feature_fixups();
setup_feature_keys();
 
+   early_initmem_init();
+
/* Initialize the hash table or TLB handling */
early_init_mmu();
 
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index d3da19cc4867..97bfb356d91d 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -777,7 +777,7 @@ void resize_hpt_for_hotplug(unsigned long new_mem_size)
}
 }
 
-int hash__create_section_mapping(unsigned long start, unsigned long end)
+int hash__create_section_mapping(unsigned long start, unsigned long end, int nid)
 {
int rc = htab_bolt_mapping(start, end, __pa(start),
   pgprot_val(PAGE_KERNEL), mmu_linear_psize,
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 8541f18694a4..0542b5f48123 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -117,7 +117,7 @@ int memory_add_physaddr_to_nid(u64 start)
 }
 #endif
 
-int __weak create_section_mapping(unsigned long start, unsigned long end)
+int __weak create_section_mapping(unsigned long start, unsigned long end, int nid)
 {
return -ENODEV;
 }
@@ -136,7 +136,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
resize_hpt_for_hotplug(memblock_phys_mem_size());
 
start = (unsigned long)__va(start);
-   rc = create_section_mapping(start, start + size);
+   rc = create_section_mapping(start, start + size, nid);
if (rc) {
pr_warning(
"Unable to create mapping for hot added memory 0x%llx..0x%llx: %d\n",
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 55e3fa5fcfb0..4660cf5da6d3 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -892,6 +892,12 @@ static void __init setup_node_data(int nid, u64 start_

Re: [PATCH V2 6/6] crypto/nx: Add P9 NX support for 842 compression engine

2017-07-21 Thread Haren Myneni
On 07/17/2017 11:53 PM, Ram Pai wrote:
> On Mon, Jul 17, 2017 at 04:50:38PM -0700, Haren Myneni wrote:
>>
>> This patch adds P9 NX support for 842 compression engine. Virtual
>> Accelerator Switchboard (VAS) is used to access 842 engine on P9.
>>
>> For each NX engine per chip, setup receive window using
>> vas_rx_win_open() which configures RxFIFO with FIFO address, lpid,
>> pid and tid values. This unique (lpid, pid, tid) combination will
>> be used to identify the target engine.
>>
>> For crypto open request, open send window on the NX engine for
>> the corresponding chip / cpu where the open request is executed.
>> This send window will be closed upon crypto close request.
>>
>> NX provides high and normal priority FIFOs. For compression /
>> decompression requests, we use only high priority FIFOs in the kernel.
>>
>> Each NX request will be communicated to VAS using copy/paste
>> instructions with vas_copy_crb() / vas_paste_crb() functions.
>>
>> Signed-off-by: Haren Myneni 
>> ---
>>  drivers/crypto/nx/Kconfig  |   1 +
>>  drivers/crypto/nx/nx-842-powernv.c | 369 
>> -
>>  drivers/crypto/nx/nx-842.c |   2 +-
>>  3 files changed, 365 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
>> index ad7552a6998c..cd5dda9c48f4 100644
>> --- a/drivers/crypto/nx/Kconfig
>> +++ b/drivers/crypto/nx/Kconfig
>> @@ -38,6 +38,7 @@ config CRYPTO_DEV_NX_COMPRESS_PSERIES
>>  config CRYPTO_DEV_NX_COMPRESS_POWERNV
>>  tristate "Compression acceleration support on PowerNV platform"
>>  depends on PPC_POWERNV
>> +depends on PPC_VAS
>>  default y
>>  help
>>Support for PowerPC Nest (NX) compression acceleration. This
>> diff --git a/drivers/crypto/nx/nx-842-powernv.c b/drivers/crypto/nx/nx-842-powernv.c
>> index c0dd4c7e17d3..8d9d21420144 100644
>> --- a/drivers/crypto/nx/nx-842-powernv.c
>> +++ b/drivers/crypto/nx/nx-842-powernv.c
>> @@ -23,6 +23,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  MODULE_LICENSE("GPL");
>>  MODULE_AUTHOR("Dan Streetman ");
>> @@ -32,6 +33,9 @@ MODULE_ALIAS_CRYPTO("842-nx");
>>
>>  #define WORKMEM_ALIGN   (CRB_ALIGN)
>>  #define CSB_WAIT_MAX(5000) /* ms */
>> +#define VAS_RETRIES (10)
>> +/* # of requests allowed per RxFIFO at a time. 0 for unlimited */
>> +#define MAX_CREDITS_PER_RXFIFO  (64)
>>
>>  struct nx842_workmem {
>>  /* Below fields must be properly aligned */
>> @@ -42,16 +46,27 @@ struct nx842_workmem {
>>
>>  ktime_t start;
>>
>> +struct vas_window *txwin;   /* Used with VAS function */
>>  char padding[WORKMEM_ALIGN]; /* unused, to allow alignment */
>>  } __packed __aligned(WORKMEM_ALIGN);
>>
>>  struct nx842_coproc {
>>  unsigned int chip_id;
>>  unsigned int ct;
>> -unsigned int ci;
>> +unsigned int ci;/* Coprocessor instance, used with icswx */
>> +struct {
>> +struct vas_window *rxwin;
>> +int id;
>> +} vas;
> 
> ci and vas are mutually exclusive. a few bytes could be saved by unionizing 
> them?

We will have only a few coproc entries - one NX engine per chip. 
> 
>>  struct list_head list;
>>  };
>>
>> +/*
>> + * Send the request to NX engine on the chip for the corresponding CPU
>> + * where the process is executing. Use with VAS function.
>> + */
>> +static DEFINE_PER_CPU(struct nx842_coproc *, coproc_inst);
>> +
>>  /* no cpu hotplug on powernv, so this list never changes after init */
>>  static LIST_HEAD(nx842_coprocs);
>>  static unsigned int nx842_ct;   /* used in icswx function */
>> @@ -513,6 +528,108 @@ static int nx842_exec_icswx(const unsigned char *in, unsigned int inlen,
>>  }
>>
>>  /**
>> + * nx842_exec_vas - compress/decompress data using the 842 algorithm
>> + *
>> + * (De)compression provided by the NX842 coprocessor on IBM PowerNV systems.
>> + * This compresses or decompresses the provided input buffer into the provided
>> + * output buffer.
>> + *
>> + * Upon return from this function @outlen contains the length of the
>> + * output data.  If there is an error then @outlen will be 0 and an
>> + * error will be specified by the return code from this function.
>> + *
>> + * The @workmem buffer should only be used by one function call at a time.
>> + *
>> + * @in: input buffer pointer
>> + * @inlen: input buffer size
>> + * @out: output buffer pointer
>> + * @outlenp: output buffer size pointer
>> + * @workmem: working memory buffer pointer, size determined by
>> + *   nx842_powernv_driver.workmem_size
>> + * @fc: function code, see CCW Function Codes in nx-842.h
>> + *
>> + * Returns:
>> + *   0  Success, output of length @outlenp stored in the buffer at @out
>> + *   -ENODEV    Hardware unavailable
>> + *   -ENOSPC    Output buffer is too small
>> + *   -EMSGSIZE  Input buffer too large
>> + *   -EINVAL    buffer constraints do not fit nx842_constraints
>> + 

[PATCH V3 0/6] Enable NX 842 compression engine on Power9

2017-07-21 Thread Haren Myneni

P9 introduces the Virtual Accelerator Switchboard (VAS) to communicate
with the NX 842 engine; previously the icswx instruction was used to
access NX. On PowerNV systems, the NX-842 driver invokes VAS functions
to configure an RxFIFO (receive window) for each NX engine. VAS uses
this FIFO to communicate requests to NX. The kernel opens a send
window which is used to transfer compression/decompression requests
to VAS. It maps the send window to the corresponding RxFIFO.
copy/paste instructions are used to pass the CRB to VAS.

This patch series adds P9 NX support for the 842 compression engine.
The first 4 patches reorganize the current code so that the VAS
functions can be added.
- nx842_powernv_function points to VAS function if VAS feature is
  available. Otherwise icswx function is used.
- Move CRB configuration code into nx842_config_crb().
- In addition to freeing co-processor structs on initialization
  failures and at exit, both send and receive windows have to be
  closed for VAS.
- Move updating coprocessor info list to nx842_add_coprocs_list().

The last 2 patches add support for configuring and invoking VAS, and
for checking P9 NX specific errors that are reported in the
co-processor status block (CSB) on failures.

Patches have been tested on a P9 DD1 system with the VAS changes and
on P8 hardware to make sure there is no regression.

This patchset depends on VAS kernel changes:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-May/158178.html

Thanks to Sukadev Bhattiprolu for his review, input and testing with
VAS changes. Also thanks to Michael Ellerman and Benjamin Herrenschmidt
for their valuable guidance and comments.

Changelog[V3]
- Disable preemption for copy/paste as Nicholas Piggin suggested.
- PTR_ALIGN the workmem buffer based on Ram Pai's comment.
 
Changelog[v2]
- Open/close send windows in nx842_powernv_crypto_init/exit_vas().
- Changes for the new device-tree NX properties such as priority
  and compatible properties.
- Incorporated review comments from Michael Ellerman.
- Other minor issues found during HW testing.

Haren Myneni (6):
  crypto/nx842: Rename nx842_powernv_function as icswx function
  crypto/nx: Create nx842_configure_crb function
  crypto/nx: Create nx842_delete_coprocs function
  crypto/nx: Add nx842_add_coprocs_list function
  crypto/nx: Add P9 NX specific error codes for 842 engine
  crypto/nx: Add P9 NX support for 842 compression engine.

 arch/powerpc/include/asm/icswx.h   |   3 +
 drivers/crypto/nx/Kconfig  |   1 +
 drivers/crypto/nx/nx-842-powernv.c | 499 +
 drivers/crypto/nx/nx-842.c |   2 +-
 drivers/crypto/nx/nx-842.h |   8 +
 5 files changed, 465 insertions(+), 48 deletions(-)

-- 
2.11.0





[PATCH V3 1/6] crypto/nx842: Rename nx842_powernv_function as icswx function

2017-07-21 Thread Haren Myneni

Rename nx842_powernv_function to nx842_powernv_exec.
nx842_powernv_exec points to nx842_exec_icswx and
will later point to the VAS exec function that will be added
for P9 NX support.

Signed-off-by: Haren Myneni 
---
 drivers/crypto/nx/nx-842-powernv.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/nx/nx-842-powernv.c b/drivers/crypto/nx/nx-842-powernv.c
index 3abb045cdba7..161987698bbc 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -54,7 +54,11 @@ struct nx842_coproc {
 
 /* no cpu hotplug on powernv, so this list never changes after init */
 static LIST_HEAD(nx842_coprocs);
-static unsigned int nx842_ct;
+static unsigned int nx842_ct;  /* used in icswx function */
+
+static int (*nx842_powernv_exec)(const unsigned char *in,
+   unsigned int inlen, unsigned char *out,
+   unsigned int *outlenp, void *workmem, int fc);
 
 /**
  * setup_indirect_dde - Setup an indirect DDE
@@ -355,7 +359,7 @@ static int wait_for_csb(struct nx842_workmem *wmem,
 }
 
 /**
- * nx842_powernv_function - compress/decompress data using the 842 algorithm
+ * nx842_exec_icswx - compress/decompress data using the 842 algorithm
  *
  * (De)compression provided by the NX842 coprocessor on IBM PowerNV systems.
  * This compresses or decompresses the provided input buffer into the provided
@@ -385,7 +389,7 @@ static int wait_for_csb(struct nx842_workmem *wmem,
  *   -ETIMEDOUThardware did not complete operation in reasonable time
  *   -EINTRoperation was aborted
  */
-static int nx842_powernv_function(const unsigned char *in, unsigned int inlen,
+static int nx842_exec_icswx(const unsigned char *in, unsigned int inlen,
  unsigned char *out, unsigned int *outlenp,
  void *workmem, int fc)
 {
@@ -489,13 +493,13 @@ static int nx842_powernv_function(const unsigned char *in, unsigned int inlen,
  * @workmem: working memory buffer pointer, size determined by
  *   nx842_powernv_driver.workmem_size
  *
- * Returns: see @nx842_powernv_function()
+ * Returns: see @nx842_powernv_exec()
  */
 static int nx842_powernv_compress(const unsigned char *in, unsigned int inlen,
  unsigned char *out, unsigned int *outlenp,
  void *wmem)
 {
-   return nx842_powernv_function(in, inlen, out, outlenp,
+   return nx842_powernv_exec(in, inlen, out, outlenp,
  wmem, CCW_FC_842_COMP_CRC);
 }
 
@@ -517,13 +521,13 @@ static int nx842_powernv_compress(const unsigned char *in, unsigned int inlen,
  * @workmem: working memory buffer pointer, size determined by
  *   nx842_powernv_driver.workmem_size
  *
- * Returns: see @nx842_powernv_function()
+ * Returns: see @nx842_powernv_exec()
  */
 static int nx842_powernv_decompress(const unsigned char *in, unsigned int inlen,
unsigned char *out, unsigned int *outlenp,
void *wmem)
 {
-   return nx842_powernv_function(in, inlen, out, outlenp,
+   return nx842_powernv_exec(in, inlen, out, outlenp,
  wmem, CCW_FC_842_DECOMP_CRC);
 }
 
@@ -625,6 +629,8 @@ static __init int nx842_powernv_init(void)
if (!nx842_ct)
return -ENODEV;
 
+   nx842_powernv_exec = nx842_exec_icswx;
+
ret = crypto_register_alg(&nx842_powernv_alg);
if (ret) {
struct nx842_coproc *coproc, *n;
-- 
2.11.0





[PATCH V3 2/6] crypto/nx: Create nx842_configure_crb function

2017-07-21 Thread Haren Myneni

CRB configuration is moved to nx842_config_crb() so that it can
be used by both the icswx and VAS exec functions. The VAS function
will be added later with P9 support.

Signed-off-by: Haren Myneni 
---
 drivers/crypto/nx/nx-842-powernv.c | 57 +-
 1 file changed, 38 insertions(+), 19 deletions(-)

diff --git a/drivers/crypto/nx/nx-842-powernv.c b/drivers/crypto/nx/nx-842-powernv.c
index 161987698bbc..1bd19e03eb7d 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -358,6 +358,40 @@ static int wait_for_csb(struct nx842_workmem *wmem,
return 0;
 }
 
+static int nx842_config_crb(const unsigned char *in, unsigned int inlen,
+   unsigned char *out, unsigned int outlen,
+   struct nx842_workmem *wmem)
+{
+   struct coprocessor_request_block *crb;
+   struct coprocessor_status_block *csb;
+   u64 csb_addr;
+   int ret;
+
+   crb = &wmem->crb;
+   csb = &crb->csb;
+
+   /* Clear any previous values */
+   memset(crb, 0, sizeof(*crb));
+
+   /* set up DDLs */
+   ret = setup_ddl(&crb->source, wmem->ddl_in,
+   (unsigned char *)in, inlen, true);
+   if (ret)
+   return ret;
+
+   ret = setup_ddl(&crb->target, wmem->ddl_out,
+   out, outlen, false);
+   if (ret)
+   return ret;
+
+   /* set up CRB's CSB addr */
+   csb_addr = nx842_get_pa(csb) & CRB_CSB_ADDRESS;
+   csb_addr |= CRB_CSB_AT; /* Addrs are phys */
+   crb->csb_addr = cpu_to_be64(csb_addr);
+
+   return 0;
+}
+
 /**
  * nx842_exec_icswx - compress/decompress data using the 842 algorithm
  *
@@ -397,7 +431,6 @@ static int nx842_exec_icswx(const unsigned char *in, unsigned int inlen,
struct coprocessor_status_block *csb;
struct nx842_workmem *wmem;
int ret;
-   u64 csb_addr;
u32 ccw;
unsigned int outlen = *outlenp;
 
@@ -411,33 +444,19 @@ static int nx842_exec_icswx(const unsigned char *in, unsigned int inlen,
return -ENODEV;
}
 
-   crb = &wmem->crb;
-   csb = &crb->csb;
-
-   /* Clear any previous values */
-   memset(crb, 0, sizeof(*crb));
-
-   /* set up DDLs */
-   ret = setup_ddl(&crb->source, wmem->ddl_in,
-   (unsigned char *)in, inlen, true);
-   if (ret)
-   return ret;
-   ret = setup_ddl(&crb->target, wmem->ddl_out,
-   out, outlen, false);
+   ret = nx842_config_crb(in, inlen, out, outlen, wmem);
if (ret)
return ret;
 
+   crb = &wmem->crb;
+   csb = &crb->csb;
+
/* set up CCW */
ccw = 0;
ccw = SET_FIELD(CCW_CT, ccw, nx842_ct);
ccw = SET_FIELD(CCW_CI_842, ccw, 0); /* use 0 for hw auto-selection */
ccw = SET_FIELD(CCW_FC_842, ccw, fc);
 
-   /* set up CRB's CSB addr */
-   csb_addr = nx842_get_pa(csb) & CRB_CSB_ADDRESS;
-   csb_addr |= CRB_CSB_AT; /* Addrs are phys */
-   crb->csb_addr = cpu_to_be64(csb_addr);
-
wmem->start = ktime_get();
 
/* do ICSWX */
-- 
2.11.0





[PATCH V3 3/6] crypto/nx: Create nx842_delete_coprocs function

2017-07-21 Thread Haren Myneni

Move the code that deletes coprocessor info on exit or failure
into nx842_delete_coprocs().

Signed-off-by: Haren Myneni 
---
 drivers/crypto/nx/nx-842-powernv.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/crypto/nx/nx-842-powernv.c b/drivers/crypto/nx/nx-842-powernv.c
index 1bd19e03eb7d..67dc06f9b557 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -593,6 +593,16 @@ static int __init nx842_powernv_probe(struct device_node *dn)
return 0;
 }
 
+static void nx842_delete_coprocs(void)
+{
+   struct nx842_coproc *coproc, *n;
+
+   list_for_each_entry_safe(coproc, n, &nx842_coprocs, list) {
+   list_del(&coproc->list);
+   kfree(coproc);
+   }
+}
+
 static struct nx842_constraints nx842_powernv_constraints = {
.alignment =DDE_BUFFER_ALIGN,
.multiple = DDE_BUFFER_LAST_MULT,
@@ -652,13 +662,7 @@ static __init int nx842_powernv_init(void)
 
ret = crypto_register_alg(&nx842_powernv_alg);
if (ret) {
-   struct nx842_coproc *coproc, *n;
-
-   list_for_each_entry_safe(coproc, n, &nx842_coprocs, list) {
-   list_del(&coproc->list);
-   kfree(coproc);
-   }
-
+   nx842_delete_coprocs();
return ret;
}
 
@@ -668,13 +672,8 @@ module_init(nx842_powernv_init);
 
 static void __exit nx842_powernv_exit(void)
 {
-   struct nx842_coproc *coproc, *n;
-
crypto_unregister_alg(&nx842_powernv_alg);
 
-   list_for_each_entry_safe(coproc, n, &nx842_coprocs, list) {
-   list_del(&coproc->list);
-   kfree(coproc);
-   }
+   nx842_delete_coprocs();
 }
 module_exit(nx842_powernv_exit);
-- 
2.11.0





[PATCH V3 4/6] crypto/nx: Add nx842_add_coprocs_list function

2017-07-21 Thread Haren Myneni

Updating the coprocessor list is moved into nx842_add_coprocs_list().
This function will be used by both the icswx and VAS code paths.

Signed-off-by: Haren Myneni 
---
 drivers/crypto/nx/nx-842-powernv.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/nx/nx-842-powernv.c 
b/drivers/crypto/nx/nx-842-powernv.c
index 67dc06f9b557..829b5cad0043 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -550,6 +550,14 @@ static int nx842_powernv_decompress(const unsigned char 
*in, unsigned int inlen,
  wmem, CCW_FC_842_DECOMP_CRC);
 }
 
+static inline void nx842_add_coprocs_list(struct nx842_coproc *coproc,
+   int chipid)
+{
+   coproc->chip_id = chipid;
+   INIT_LIST_HEAD(&coproc->list);
+   list_add(&coproc->list, &nx842_coprocs);
+}
+
 static int __init nx842_powernv_probe(struct device_node *dn)
 {
struct nx842_coproc *coproc;
@@ -576,11 +584,9 @@ static int __init nx842_powernv_probe(struct device_node 
*dn)
if (!coproc)
return -ENOMEM;
 
-   coproc->chip_id = chip_id;
coproc->ct = ct;
coproc->ci = ci;
-   INIT_LIST_HEAD(&coproc->list);
-   list_add(&coproc->list, &nx842_coprocs);
+   nx842_add_coprocs_list(coproc, chip_id);
 
pr_info("coprocessor found on chip %d, CT %d CI %d\n", chip_id, ct, ci);
 
-- 
2.11.0





[PATCH V3 5/6] crypto/nx: Add P9 NX specific error codes for 842 engine

2017-07-21 Thread Haren Myneni

This patch adds changes for checking P9 specific 842 engine
error codes. These errors are reported in the coprocessor status
block (CSB) on failures.

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/icswx.h   |  3 +++
 drivers/crypto/nx/nx-842-powernv.c | 18 ++
 drivers/crypto/nx/nx-842.h |  8 
 3 files changed, 29 insertions(+)

diff --git a/arch/powerpc/include/asm/icswx.h b/arch/powerpc/include/asm/icswx.h
index 27e588f6c72e..6a2c87577541 100644
--- a/arch/powerpc/include/asm/icswx.h
+++ b/arch/powerpc/include/asm/icswx.h
@@ -69,7 +69,10 @@ struct coprocessor_completion_block {
 #define CSB_CC_WR_PROTECTION   (16)
 #define CSB_CC_UNKNOWN_CODE(17)
 #define CSB_CC_ABORT   (18)
+#define CSB_CC_EXCEED_BYTE_COUNT   (19)/* P9 or later */
 #define CSB_CC_TRANSPORT   (20)
+#define CSB_CC_INVALID_CRB (21)/* P9 or later */
+#define CSB_CC_INVALID_DDE (30)/* P9 or later */
 #define CSB_CC_SEGMENTED_DDL   (31)
 #define CSB_CC_PROGRESS_POINT  (32)
 #define CSB_CC_DDE_OVERFLOW(33)
diff --git a/drivers/crypto/nx/nx-842-powernv.c 
b/drivers/crypto/nx/nx-842-powernv.c
index 829b5cad0043..c0dd4c7e17d3 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -243,6 +243,13 @@ static int wait_for_csb(struct nx842_workmem *wmem,
case CSB_CC_TEMPL_OVERFLOW:
CSB_ERR(csb, "Compressed data template shows data past end");
return -EINVAL;
+   case CSB_CC_EXCEED_BYTE_COUNT:  /* P9 or later */
+   /*
+* DDE byte count exceeds the limit specified in Maximum
+* byte count register.
+*/
+   CSB_ERR(csb, "DDE byte count exceeds the limit");
+   return -EINVAL;
 
/* these should not happen */
case CSB_CC_INVALID_ALIGN:
@@ -284,9 +291,17 @@ static int wait_for_csb(struct nx842_workmem *wmem,
CSB_ERR(csb, "Too many DDEs in DDL");
return -EINVAL;
case CSB_CC_TRANSPORT:
+   case CSB_CC_INVALID_CRB:/* P9 or later */
/* shouldn't happen, we setup CRB correctly */
CSB_ERR(csb, "Invalid CRB");
return -EINVAL;
+   case CSB_CC_INVALID_DDE:/* P9 or later */
+   /*
+* shouldn't happen, setup_direct/indirect_dde creates
+* DDE right
+*/
+   CSB_ERR(csb, "Invalid DDE");
+   return -EINVAL;
case CSB_CC_SEGMENTED_DDL:
/* shouldn't happen, setup_ddl creates DDL right */
CSB_ERR(csb, "Segmented DDL error");
@@ -330,6 +345,9 @@ static int wait_for_csb(struct nx842_workmem *wmem,
case CSB_CC_HW:
CSB_ERR(csb, "Correctable hardware error");
return -EPROTO;
+   case CSB_CC_HW_EXPIRED_TIMER:   /* P9 or later */
+   CSB_ERR(csb, "Job did not finish within allowed time");
+   return -EPROTO;
 
default:
CSB_ERR(csb, "Invalid CC %d", csb->cc);
diff --git a/drivers/crypto/nx/nx-842.h b/drivers/crypto/nx/nx-842.h
index 30929bd7d1a9..bb2f31792683 100644
--- a/drivers/crypto/nx/nx-842.h
+++ b/drivers/crypto/nx/nx-842.h
@@ -76,9 +76,17 @@
 #define CSB_CC_DECRYPT_OVERFLOW(64)
 /* asym crypt codes */
 #define CSB_CC_MINV_OVERFLOW   (128)
+/*
+ * HW error - Job did not finish in the maximum time allowed.
+ * Job terminated.
+ */
+#define CSB_CC_HW_EXPIRED_TIMER(224)
 /* These are reserved for hypervisor use */
 #define CSB_CC_HYP_RESERVE_START   (240)
 #define CSB_CC_HYP_RESERVE_END (253)
+#define CSB_CC_HYP_RESERVE_P9_END  (251)
+/* No valid interrupt server (P9 or later). */
+#define CSB_CC_HYP_RESERVE_NO_INTR_SERVER  (252)
 #define CSB_CC_HYP_NO_HW   (254)
 #define CSB_CC_HYP_HANG_ABORTED(255)
 
-- 
2.11.0





[PATCH V3 6/6] crypto/nx: Add P9 NX support for 842 compression engine

2017-07-21 Thread Haren Myneni

This patch adds P9 NX support for the 842 compression engine. The
Virtual Accelerator Switchboard (VAS) is used to access the 842
engine on P9.

For each NX engine per chip, set up a receive window using
vas_rx_win_open(), which configures the RxFIFO with the FIFO address,
lpid, pid and tid values. This unique (lpid, pid, tid) combination
will be used to identify the target engine.

For a crypto open request, open a send window on the NX engine for
the corresponding chip / CPU where the open request is executed.
This send window will be closed upon the crypto close request.

NX provides high and normal priority FIFOs. For compression /
decompression requests, we use only high priority FIFOs in the kernel.

Each NX request will be communicated to VAS using copy/paste
instructions with vas_copy_crb() / vas_paste_crb() functions.

Signed-off-by: Haren Myneni 
---
 drivers/crypto/nx/Kconfig  |   1 +
 drivers/crypto/nx/nx-842-powernv.c | 375 -
 drivers/crypto/nx/nx-842.c |   2 +-
 3 files changed, 371 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
index ad7552a6998c..cd5dda9c48f4 100644
--- a/drivers/crypto/nx/Kconfig
+++ b/drivers/crypto/nx/Kconfig
@@ -38,6 +38,7 @@ config CRYPTO_DEV_NX_COMPRESS_PSERIES
 config CRYPTO_DEV_NX_COMPRESS_POWERNV
tristate "Compression acceleration support on PowerNV platform"
depends on PPC_POWERNV
+   depends on PPC_VAS
default y
help
  Support for PowerPC Nest (NX) compression acceleration. This
diff --git a/drivers/crypto/nx/nx-842-powernv.c 
b/drivers/crypto/nx/nx-842-powernv.c
index c0dd4c7e17d3..13089a0b9dfa 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Dan Streetman ");
@@ -32,6 +33,9 @@ MODULE_ALIAS_CRYPTO("842-nx");
 
 #define WORKMEM_ALIGN  (CRB_ALIGN)
 #define CSB_WAIT_MAX   (5000) /* ms */
+#define VAS_RETRIES(10)
+/* # of requests allowed per RxFIFO at a time. 0 for unlimited */
+#define MAX_CREDITS_PER_RXFIFO (1024)
 
 struct nx842_workmem {
/* Below fields must be properly aligned */
@@ -42,16 +46,27 @@ struct nx842_workmem {
 
ktime_t start;
 
+   struct vas_window *txwin;   /* Used with VAS function */
char padding[WORKMEM_ALIGN]; /* unused, to allow alignment */
 } __packed __aligned(WORKMEM_ALIGN);
 
 struct nx842_coproc {
unsigned int chip_id;
unsigned int ct;
-   unsigned int ci;
+   unsigned int ci;/* Coprocessor instance, used with icswx */
+   struct {
+   struct vas_window *rxwin;
+   int id;
+   } vas;
struct list_head list;
 };
 
+/*
+ * Send the request to NX engine on the chip for the corresponding CPU
+ * where the process is executing. Use with VAS function.
+ */
+static DEFINE_PER_CPU(struct nx842_coproc *, coproc_inst);
+
 /* no cpu hotplug on powernv, so this list never changes after init */
 static LIST_HEAD(nx842_coprocs);
 static unsigned int nx842_ct;  /* used in icswx function */
@@ -513,6 +528,105 @@ static int nx842_exec_icswx(const unsigned char *in, 
unsigned int inlen,
 }
 
 /**
+ * nx842_exec_vas - compress/decompress data using the 842 algorithm
+ *
+ * (De)compression provided by the NX842 coprocessor on IBM PowerNV systems.
+ * This compresses or decompresses the provided input buffer into the provided
+ * output buffer.
+ *
+ * Upon return from this function @outlen contains the length of the
+ * output data.  If there is an error then @outlen will be 0 and an
+ * error will be specified by the return code from this function.
+ *
+ * The @workmem buffer should only be used by one function call at a time.
+ *
+ * @in: input buffer pointer
+ * @inlen: input buffer size
+ * @out: output buffer pointer
+ * @outlenp: output buffer size pointer
+ * @workmem: working memory buffer pointer, size determined by
+ *   nx842_powernv_driver.workmem_size
+ * @fc: function code, see CCW Function Codes in nx-842.h
+ *
+ * Returns:
+ *   0 Success, output of length @outlenp stored in the buffer
+ * at @out
+ *   -ENODEV   Hardware unavailable
+ *   -ENOSPC   Output buffer is too small
+ *   -EMSGSIZE Input buffer too large
+ *   -EINVAL   buffer constraints do not fit nx842_constraints
+ *   -EPROTO   hardware error during operation
+ *   -ETIMEDOUT    hardware did not complete operation in reasonable time
+ *   -EINTRoperation was aborted
+ */
+static int nx842_exec_vas(const unsigned char *in, unsigned int inlen,
+ unsigned char *out, unsigned int *outlenp,
+ void *workmem, int fc)
+{
+   struct coprocessor_request_block *crb;
+   struct coprocessor_status_block *csb;
+   struct nx842_workmem *wmem;
+   struct vas_window *txwin;
+   int ret, i = 0;
+   u32 ccw;
+