Re: [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode

2019-02-05 Thread Cédric Le Goater
On 2/6/19 2:18 AM, David Gibson wrote:
> On Wed, Feb 06, 2019 at 09:13:15AM +1100, Paul Mackerras wrote:
>> On Tue, Feb 05, 2019 at 12:31:28PM +0100, Cédric Le Goater wrote:
>> As for nesting, I suggest for the foreseeable future we stick to XICS
>> emulation in nested guests.
>
> ok. so no kernel_irqchip at all. hmm. 
>>>
>>> I was confused with what Paul calls 'XICS emulation'. It's not the QEMU
>>> XICS emulated device but the XICS-over-XIVE KVM device, i.e. the KVM
>>> XICS device that KVM uses when running on a P9 processor.
>>
>> Actually there are two separate implementations of XICS emulation in
>> KVM.  The first (older) one is almost entirely a software emulation
>> but does have some cases where it accesses an underlying XICS device
>> in order to make some things faster (IPIs and pass-through of a device
>> interrupt to a guest).  The other, newer one is the XICS-on-XIVE
>> emulation that Ben wrote, which uses the XIVE hardware pretty heavily.
>> My patch was about making the older code work when there is no
>> XICS available to the host.
> 
> Ah, right.  To clarify my earlier statements in light of this:
> 
>  * We definitely want some sort of kernel-XICS available in a nested
>guest.  AIUI, this is now accomplished, so, Yay!
> 
>  * Implementing the L2 XICS in terms of L1's PAPR-XIVE would be a
>bonus, but it's a much lower priority.

Yes. In this case, the L1 KVM-HV should not advertise KVM_CAP_PPC_IRQ_XIVE
to QEMU, which will restrict CAS to the XICS-only interrupt mode.
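A minimal QEMU-side sketch of that check (kvm_vm_check_extension() is
existing QEMU API; the helper name and usage are illustrative, not from
this series):

static bool spapr_xive_kvm_available(void)
{
        /* Offer the XIVE exploitation mode at CAS only when the (L1)
         * kernel advertises the capability; otherwise fall back to the
         * XICS-only interrupt mode. */
        return kvm_enabled() &&
               kvm_vm_check_extension(kvm_state, KVM_CAP_PPC_IRQ_XIVE) > 0;
}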

C.




Re: [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device

2019-02-05 Thread Cédric Le Goater
On 2/6/19 2:23 AM, David Gibson wrote:
> On Tue, Feb 05, 2019 at 01:55:40PM +0100, Cédric Le Goater wrote:
>> On 2/5/19 6:28 AM, David Gibson wrote:
>>> On Mon, Feb 04, 2019 at 12:30:39PM +0100, Cédric Le Goater wrote:
 On 2/4/19 5:45 AM, David Gibson wrote:
> On Mon, Jan 07, 2019 at 07:43:18PM +0100, Cédric Le Goater wrote:
>> This will let the guest create a memory mapping to expose the ESB MMIO
>> regions used to control the interrupt sources, to trigger events, to
>> EOI or to turn off the sources.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/uapi/asm/kvm.h   |  4 ++
>>  arch/powerpc/kvm/book3s_xive_native.c | 97 +++
>>  2 files changed, 101 insertions(+)
>>
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
>> index 8c876c166ef2..6bb61ba141c2 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -675,4 +675,8 @@ struct kvm_ppc_cpu_char {
>>  #define  KVM_XICS_PRESENTED (1ULL << 43)
>>  #define  KVM_XICS_QUEUED(1ULL << 44)
>>  
>> +/* POWER9 XIVE Native Interrupt Controller */
>> +#define KVM_DEV_XIVE_GRP_CTRL   1
>> +#define   KVM_DEV_XIVE_GET_ESB_FD   1
>
> Introducing a new FD for ESB and TIMA seems overkill.  Can't you get
> to both with an mmap() directly on the xive device fd?  Using the
> offset to distinguish which one to map, obviously.

 The page offset would define some sort of user API. It seems feasible.
 But I am not sure this would be practical in the future if we need to 
 tune the length.
>>>
>>> Um.. why not?  I mean, yes the XIVE supports rather a lot of
>>> interrupts, but we have 64-bits of offset we can play with - we can
>>> leave room for billions of ESB slots and still have room for billions
>>> of VPs.
>>
>> So the first 4 pages could be the TIMA pages, and then would come
>> the pages for the interrupt ESBs. I think that we can have a different
>> vm_fault handler for each mapping.
> 
> Um.. no, I'm saying you don't need to tightly pack them.  So you could
> have the ESB pages at 0, the TIMA at, say offset 2^60.

Well, we know that the TIMA is 4 pages wide and is "directly" related
to the KVM interrupt device, so being at offset 0 seems a good idea.
The ESB segment, on the other hand, has a variable size depending on
the number of IRQs, so I think it can come after.
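As a rough sketch, assuming a single device fd with fixed offsets (the
offsets below are illustrative only; the ABI is exactly what is being
discussed here):

#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define XIVE_TIMA_OFFSET  0x0ULL         /* 4 TIMA pages first (illustrative) */
#define XIVE_ESB_OFFSET   (1ULL << 60)   /* ESB segment far away (illustrative) */

/* Error handling elided for brevity. */
static void *map_xive(int xive_fd, unsigned long nr_irqs, void **esb)
{
        long psize = sysconf(_SC_PAGESIZE);

        /* One ESB page pair (trigger + management) per interrupt source. */
        *esb = mmap(NULL, nr_irqs * 2 * psize, PROT_READ | PROT_WRITE,
                    MAP_SHARED, xive_fd, XIVE_ESB_OFFSET);

        /* The 4 thread-context (TIMA) pages, at offset 0. */
        return mmap(NULL, 4 * psize, PROT_READ | PROT_WRITE,
                    MAP_SHARED, xive_fd, XIVE_TIMA_OFFSET);
}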

>> I wonder how this will work out with pass-through. As Paul said in 
>> a previous email, it would be better to let QEMU request a new 
>> mapping to handle the ESB pages of the device being passed through.
>> I guess this is not a special case, just another offset and length.
> 
> Right, if we need multiple "chunks" of ESB pages we can given them
> each their own terabyte or several.  No need to be stingy with address
> space.

You cannot put them just anywhere, though. They should map the same
interrupt range of ESB pages, overlapping with the underlying segment
of IPI ESB pages.

C.


Re: [PATCH 15/19] KVM: PPC: Book3S HV: add get/set accessors for the source configuration

2019-02-05 Thread Cédric Le Goater
On 2/6/19 2:24 AM, David Gibson wrote:
> On Wed, Feb 06, 2019 at 12:23:29PM +1100, David Gibson wrote:
>> On Tue, Feb 05, 2019 at 02:03:11PM +0100, Cédric Le Goater wrote:
>>> On 2/5/19 6:32 AM, David Gibson wrote:
 On Mon, Feb 04, 2019 at 05:07:28PM +0100, Cédric Le Goater wrote:
> On 2/4/19 6:21 AM, David Gibson wrote:
>> On Mon, Jan 07, 2019 at 07:43:27PM +0100, Cédric Le Goater wrote:
>>> These are used to capture the XIVE EAS table of the KVM device, i.e.
>>> the configuration of the source targets.
>>>
>>> Signed-off-by: Cédric Le Goater 
>>> ---
>>>  arch/powerpc/include/uapi/asm/kvm.h   | 11 
>>>  arch/powerpc/kvm/book3s_xive_native.c | 87 +++
>>>  2 files changed, 98 insertions(+)
>>>
>>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
>>> index 1a8740629acf..faf024f39858 100644
>>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>>> @@ -683,9 +683,20 @@ struct kvm_ppc_cpu_char {
>>>  #define   KVM_DEV_XIVE_SAVE_EQ_PAGES   4
>>>  #define KVM_DEV_XIVE_GRP_SOURCES   2   /* 64-bit source attributes */
>>>  #define KVM_DEV_XIVE_GRP_SYNC  3   /* 64-bit source attributes */
>>> +#define KVM_DEV_XIVE_GRP_EAS   4   /* 64-bit eas attributes */
>>>  
>>>  /* Layout of 64-bit XIVE source attribute values */
>>>  #define KVM_XIVE_LEVEL_SENSITIVE   (1ULL << 0)
>>>  #define KVM_XIVE_LEVEL_ASSERTED(1ULL << 1)
>>>  
>>> +/* Layout of 64-bit eas attribute values */
>>> +#define KVM_XIVE_EAS_PRIORITY_SHIFT   0
>>> +#define KVM_XIVE_EAS_PRIORITY_MASK    0x7
>>> +#define KVM_XIVE_EAS_SERVER_SHIFT     3
>>> +#define KVM_XIVE_EAS_SERVER_MASK      0xfffffff8ULL
>>> +#define KVM_XIVE_EAS_MASK_SHIFT       32
>>> +#define KVM_XIVE_EAS_MASK_MASK        0x100000000ULL
>>> +#define KVM_XIVE_EAS_EISN_SHIFT       33
>>> +#define KVM_XIVE_EAS_EISN_MASK        0xfffffffe00000000ULL
>>> +
>>>  #endif /* __LINUX_KVM_POWERPC_H */
>>> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
>>> index f2de1bcf3b35..0468b605baa7 100644
>>> --- a/arch/powerpc/kvm/book3s_xive_native.c
>>> +++ b/arch/powerpc/kvm/book3s_xive_native.c
>>> @@ -525,6 +525,88 @@ static int kvmppc_xive_native_sync(struct kvmppc_xive *xive, long irq, u64 addr)
>>> return 0;
>>>  }
>>>  
>>> +static int kvmppc_xive_native_set_eas(struct kvmppc_xive *xive, long irq,
>>> + u64 addr)
>>
>> I'd prefer to avoid the name "EAS" here.  IIUC these aren't "raw" EAS
>> values, but rather essentially the "source config" in the terminology
>> of the PAPR hcalls.  Which, yes, is basically implemented by setting
>> the EAS, but since it's the PAPR architected state that we need to
>> preserve across migration, I'd prefer to stick as close as we can to
>> the PAPR terminology.
>
> But we don't have an equivalent name in the PAPR specs for the tuple
> (prio, server). We could maybe use the generic 'target' name, even
> if that usually refers to a CPU number.

 Um.. what?  That's about terminology for one of the fields in this
 thing, not about the name for the thing itself.

> Or, IVE (Interrupt Vector Entry)? which makes some sense.
> This was the former name in HW. I think we could recycle it for KVM.

 That's a terrible idea, which will make a confusing situation even
 more confusing.
>>>
>>> Let's use SOURCE_CONFIG and QUEUE_CONFIG. The KVM ioctls are very 
>>> similar to the hcalls anyhow.
>>
>> Yes, I think that's a good idea.
> 
> Actually... AIUI the SET_CONFIG hcalls shouldn't be a fast path.  

No, indeed. I have moved them to standard hcalls in the current version.

> Can
> we simplify things further by removing the hcall implementation from
> the kernel entirely, and have qemu implement them by basically just
> forwarding them to the appropriate SET_CONFIG ioctl()?

Yes. I think we could. 

The hcalls H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG and
the KVM ioctls to set the EQ and the source configuration have a
lot in common. I need to look at how we can plug the KVM ioctls
into the hcalls under QEMU.

We will have to convert the returned errors to respect the PAPR
specs, or have the ioctls return H_* errors.
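A minimal sketch of that plumbing, assuming a SOURCE_CONFIG-style device
group and the generic KVM_SET_DEVICE_ATTR ioctl (the group name and the
errno-to-H_* mapping below are illustrative):

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define H_SUCCESS    0
#define H_HARDWARE  -1
#define H_PARAMETER -4

static long h_int_set_source_config(int xive_fd, uint64_t lisn, uint64_t cfg)
{
        struct kvm_device_attr attr = {
                .group = KVM_DEV_XIVE_GRP_SOURCE_CONFIG,  /* illustrative name */
                .attr  = lisn,
                .addr  = (uint64_t)(uintptr_t)&cfg,
        };

        if (ioctl(xive_fd, KVM_SET_DEVICE_ATTR, &attr) < 0)
                /* Convert errno to a PAPR return code. */
                return (errno == EINVAL) ? H_PARAMETER : H_HARDWARE;

        return H_SUCCESS;
}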


Let's dig into that idea. If we choose that path, QEMU will have an
up-to-date EAT and so we won't need to synchronize its state anymore
for migration.
 
H_INT_GET_SOURCE_CONFIG can be implemented in QEMU without any KVM 
ioctl.

H_INT_GET_QUEUE_INFO could be implemented in QEMU. I need to check
how we return the address of the END ESB in sPAPR. We haven't paid
much attention to it.
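To make the bit layout concrete, a source configuration can be packed
into the 64-bit attribute value as follows (a sketch against the
KVM_XIVE_EAS_* layout quoted above, not code from the patch):

#include <stdbool.h>
#include <stdint.h>

static inline uint64_t xive_eas_attr_pack(uint8_t prio, uint32_t server,
                                          bool masked, uint32_t eisn)
{
        uint64_t val = 0;

        /* Bits 0-2: priority, bits 3-31: server (the PAPR source target). */
        val |= ((uint64_t)prio << KVM_XIVE_EAS_PRIORITY_SHIFT) &
               KVM_XIVE_EAS_PRIORITY_MASK;
        val |= ((uint64_t)server << KVM_XIVE_EAS_SERVER_SHIFT) &
               KVM_XIVE_EAS_SERVER_MASK;
        /* Bit 32: masked, bits 33-63: EISN. */
        if (masked)
                val |= (uint64_t)1 << KVM_XIVE_EAS_MASK_SHIFT;
        val |= ((uint64_t)eisn << KVM_XIVE_EAS_EISN_SHIFT) &
               KVM_XIVE_EAS_EISN_MASK;
        return val;
}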

[PATCH] powerpc/powernv/idle: Restore IAMR after idle

2019-02-05 Thread Russell Currey
Without restoring the IAMR after idle, execution prevention on POWER9
with Radix MMU is overwritten and the kernel can freely execute
userspace without faulting.

This is necessary when returning from any stop state that modifies user
state, as well as hypervisor state.

To test how this fails without this patch, load the lkdtm driver and
do the following:

   echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT

which won't fault, then boot the kernel with powersave=off, where it
will fault.  Applying this patch will fix this.

Fixes: 3b10d0095a1e ("powerpc/mm/radix: Prevent kernel execution of user space")
Cc: 
Signed-off-by: Russell Currey 
---
 arch/powerpc/include/asm/cpuidle.h |  1 +
 arch/powerpc/kernel/asm-offsets.c  |  1 +
 arch/powerpc/kernel/idle_book3s.S  | 20 
 3 files changed, 22 insertions(+)

diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index 43e5f31fe64d..ad67dbe59498 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -77,6 +77,7 @@ struct stop_sprs {
u64 mmcr1;
u64 mmcr2;
u64 mmcra;
+   u64 iamr;
 };
 
 #define PNV_IDLE_NAME_LEN16
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 9ffc72ded73a..10e0314c2b0d 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -774,6 +774,7 @@ int main(void)
STOP_SPR(STOP_MMCR1, mmcr1);
STOP_SPR(STOP_MMCR2, mmcr2);
STOP_SPR(STOP_MMCRA, mmcra);
+   STOP_SPR(STOP_IAMR, iamr);
 #endif
 
DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 7f5ac2e8581b..bb4f552f6c7e 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -200,6 +200,12 @@ pnv_powersave_common:
/* Continue saving state */
SAVE_GPR(2, r1)
SAVE_NVGPRS(r1)
+
+BEGIN_FTR_SECTION
+   mfspr   r5, SPRN_IAMR
+   std r5, STOP_IAMR(r13)
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
+
mfcrr5
std r5,_CCR(r1)
std r1,PACAR1(r13)
@@ -924,6 +930,13 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
REST_NVGPRS(r1)
REST_GPR(2, r1)
+
+BEGIN_FTR_SECTION
+   ld  r4, STOP_IAMR(r13)
+   mtspr   SPRN_IAMR, r4
+   isync
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
+
ld  r4,PACAKMSR(r13)
ld  r5,_LINK(r1)
ld  r6,_CCR(r1)
@@ -946,6 +959,13 @@ pnv_wakeup_noloss:
 BEGIN_FTR_SECTION
CHECK_HMI_INTERRUPT
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
+
+BEGIN_FTR_SECTION
+   ld  r4, STOP_IAMR(r13)
+   mtspr   SPRN_IAMR, r4
+   isync
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
+
ld  r4,PACAKMSR(r13)
ld  r5,_NIP(r1)
ld  r6,_CCR(r1)
-- 
2.20.1



[PATCH v2] powerpc/perf: Use PVR rather than oprofile field to determine CPU version

2019-02-05 Thread Rashmica Gupta
Currently the perf CPU backend drivers detect what CPU they're on using
cur_cpu_spec->oprofile_cpu_type.

Although that works, it's a bit crufty to be using oprofile related fields,
especially seeing as oprofile is more or less unused these days.

It also means perf is reliant on the fragile logic in setup_cpu_spec()
which detects when we're using a logical PVR and copies back the PMU
related fields from the raw CPU entry. So let's check the PVR directly.

Suggested-by: Michael Ellerman 
Signed-off-by: Rashmica Gupta 
---
v2: fixed misspelling of PVR_VER_E500V2

 arch/powerpc/perf/e500-pmu.c| 10 ++
 arch/powerpc/perf/e6500-pmu.c   |  5 +++--
 arch/powerpc/perf/hv-24x7.c |  6 +++---
 arch/powerpc/perf/mpc7450-pmu.c |  5 +++--
 arch/powerpc/perf/power5+-pmu.c |  6 +++---
 arch/powerpc/perf/power5-pmu.c  |  5 +++--
 arch/powerpc/perf/power6-pmu.c  |  5 +++--
 arch/powerpc/perf/power7-pmu.c  |  7 ---
 arch/powerpc/perf/power8-pmu.c  |  5 +++--
 arch/powerpc/perf/power9-pmu.c  |  4 +---
 arch/powerpc/perf/ppc970-pmu.c  |  8 +---
 11 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/perf/e500-pmu.c b/arch/powerpc/perf/e500-pmu.c
index fb664929f5da..e1a185a30928 100644
--- a/arch/powerpc/perf/e500-pmu.c
+++ b/arch/powerpc/perf/e500-pmu.c
@@ -122,12 +122,14 @@ static struct fsl_emb_pmu e500_pmu = {
 
 static int init_e500_pmu(void)
 {
-   if (!cur_cpu_spec->oprofile_cpu_type)
-   return -ENODEV;
+   unsigned int pvr = mfspr(SPRN_PVR);
 
-   if (!strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc/e500mc"))
+   /* e500mc */
+   if ((PVR_VER(pvr) == PVR_VER_E500MC) || (PVR_VER(pvr) == PVR_VER_E5500))
num_events = 256;
-   else if (strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc/e500"))
+   /* e500 */
+   else if ((PVR_VER(pvr) != PVR_VER_E500V1) &&
+   (PVR_VER(pvr) != PVR_VER_E500V2))
return -ENODEV;
 
return register_fsl_emb_pmu(&e500_pmu);
diff --git a/arch/powerpc/perf/e6500-pmu.c b/arch/powerpc/perf/e6500-pmu.c
index 3d877aa777b5..47c93d13da1a 100644
--- a/arch/powerpc/perf/e6500-pmu.c
+++ b/arch/powerpc/perf/e6500-pmu.c
@@ -111,8 +111,9 @@ static struct fsl_emb_pmu e6500_pmu = {
 
 static int init_e6500_pmu(void)
 {
-   if (!cur_cpu_spec->oprofile_cpu_type ||
-   strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc/e6500"))
+   unsigned int pvr = mfspr(SPRN_PVR);
+
+   if (PVR_VER(pvr) != PVR_VER_E6500)
return -ENODEV;
 
return register_fsl_emb_pmu(&e6500_pmu);
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 72238eedc360..30dd379ddcd3 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1583,16 +1583,16 @@ static int hv_24x7_init(void)
 {
int r;
unsigned long hret;
+   unsigned int pvr = mfspr(SPRN_PVR);
struct hv_perf_caps caps;
 
if (!firmware_has_feature(FW_FEATURE_LPAR)) {
pr_debug("not a virtualized system, not enabling\n");
return -ENODEV;
-   } else if (!cur_cpu_spec->oprofile_cpu_type)
-   return -ENODEV;
+   }
 
/* POWER8 only supports v1, while POWER9 only supports v2. */
-   if (!strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8"))
+   if (PVR_VER(pvr) == PVR_POWER8)
interface_version = 1;
else {
interface_version = 2;
diff --git a/arch/powerpc/perf/mpc7450-pmu.c b/arch/powerpc/perf/mpc7450-pmu.c
index d115c5635bf3..17e69cabbcac 100644
--- a/arch/powerpc/perf/mpc7450-pmu.c
+++ b/arch/powerpc/perf/mpc7450-pmu.c
@@ -413,8 +413,9 @@ struct power_pmu mpc7450_pmu = {
 
 static int __init init_mpc7450_pmu(void)
 {
-   if (!cur_cpu_spec->oprofile_cpu_type ||
-   strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc/7450"))
+   unsigned int pvr = mfspr(SPRN_PVR);
+
+   if (PVR_VER(pvr) != PVR_7450)
return -ENODEV;
 
return register_power_pmu(&mpc7450_pmu);
diff --git a/arch/powerpc/perf/power5+-pmu.c b/arch/powerpc/perf/power5+-pmu.c
index 0526dac66007..17a32e7ef234 100644
--- a/arch/powerpc/perf/power5+-pmu.c
+++ b/arch/powerpc/perf/power5+-pmu.c
@@ -679,9 +679,9 @@ static struct power_pmu power5p_pmu = {
 
 static int __init init_power5p_pmu(void)
 {
-   if (!cur_cpu_spec->oprofile_cpu_type ||
-   (strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power5+")
-&& strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power5++")))
+   unsigned int pvr = mfspr(SPRN_PVR);
+
+   if (PVR_VER(pvr) != PVR_POWER5p)
return -ENODEV;
 
return register_power_pmu(&power5p_pmu);
diff --git a/arch/powerpc/perf/power5-pmu.c b/arch/powerpc/perf/power5-pmu.c
index 4dc99f9f7962..844782e6d367 100644
--- a/arch/powerpc/perf/power5-pmu.c
+++ b/arch/powerpc/perf/power5-pmu.c
@@ -620,8 +620,9 @@ static struct power_pmu power5_pmu = {
 
 static int __init init_power5_pmu(vo

[PATCH] powerpc/perf: Use PVR rather than oprofile field to determine CPU version

2019-02-05 Thread Rashmica Gupta
Currently the perf CPU backend drivers detect what CPU they're on using
cur_cpu_spec->oprofile_cpu_type.

Although that works, it's a bit crufty to be using oprofile related fields,
especially seeing as oprofile is more or less unused these days.

It also means perf is reliant on the fragile logic in setup_cpu_spec()
which detects when we're using a logical PVR and copies back the PMU
related fields from the raw CPU entry. So let's check the PVR directly.

Suggested-by: Michael Ellerman 
Signed-off-by: Rashmica Gupta 
---
 arch/powerpc/perf/e500-pmu.c| 10 ++
 arch/powerpc/perf/e6500-pmu.c   |  5 +++--
 arch/powerpc/perf/hv-24x7.c |  6 +++---
 arch/powerpc/perf/mpc7450-pmu.c |  5 +++--
 arch/powerpc/perf/power5+-pmu.c |  6 +++---
 arch/powerpc/perf/power5-pmu.c  |  5 +++--
 arch/powerpc/perf/power6-pmu.c  |  5 +++--
 arch/powerpc/perf/power7-pmu.c  |  7 ---
 arch/powerpc/perf/power8-pmu.c  |  5 +++--
 arch/powerpc/perf/power9-pmu.c  |  4 +---
 arch/powerpc/perf/ppc970-pmu.c  |  8 +---
 11 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/perf/e500-pmu.c b/arch/powerpc/perf/e500-pmu.c
index fb664929f5da..f3a4179f46c6 100644
--- a/arch/powerpc/perf/e500-pmu.c
+++ b/arch/powerpc/perf/e500-pmu.c
@@ -122,12 +122,14 @@ static struct fsl_emb_pmu e500_pmu = {
 
 static int init_e500_pmu(void)
 {
-   if (!cur_cpu_spec->oprofile_cpu_type)
-   return -ENODEV;
+   unsigned int pvr = mfspr(SPRN_PVR);
 
-   if (!strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc/e500mc"))
+   /* e500mc */
+   if ((PVR_VER(pvr) == PVR_VER_E500MC) || (PVR_VER(pvr) == PVR_VER_E5500))
num_events = 256;
-   else if (strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc/e500"))
+   /* e500 */
+   else if ((PVR_VER(pvr) != PVR_VER_E500V1) &&
+   (PVR_VER(pvr) != PVR_VER_E50V2))
return -ENODEV;
 
return register_fsl_emb_pmu(&e500_pmu);
diff --git a/arch/powerpc/perf/e6500-pmu.c b/arch/powerpc/perf/e6500-pmu.c
index 3d877aa777b5..47c93d13da1a 100644
--- a/arch/powerpc/perf/e6500-pmu.c
+++ b/arch/powerpc/perf/e6500-pmu.c
@@ -111,8 +111,9 @@ static struct fsl_emb_pmu e6500_pmu = {
 
 static int init_e6500_pmu(void)
 {
-   if (!cur_cpu_spec->oprofile_cpu_type ||
-   strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc/e6500"))
+   unsigned int pvr = mfspr(SPRN_PVR);
+
+   if (PVR_VER(pvr) != PVR_VER_E6500)
return -ENODEV;
 
return register_fsl_emb_pmu(&e6500_pmu);
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 72238eedc360..30dd379ddcd3 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1583,16 +1583,16 @@ static int hv_24x7_init(void)
 {
int r;
unsigned long hret;
+   unsigned int pvr = mfspr(SPRN_PVR);
struct hv_perf_caps caps;
 
if (!firmware_has_feature(FW_FEATURE_LPAR)) {
pr_debug("not a virtualized system, not enabling\n");
return -ENODEV;
-   } else if (!cur_cpu_spec->oprofile_cpu_type)
-   return -ENODEV;
+   }
 
/* POWER8 only supports v1, while POWER9 only supports v2. */
-   if (!strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8"))
+   if (PVR_VER(pvr) == PVR_POWER8)
interface_version = 1;
else {
interface_version = 2;
diff --git a/arch/powerpc/perf/mpc7450-pmu.c b/arch/powerpc/perf/mpc7450-pmu.c
index d115c5635bf3..17e69cabbcac 100644
--- a/arch/powerpc/perf/mpc7450-pmu.c
+++ b/arch/powerpc/perf/mpc7450-pmu.c
@@ -413,8 +413,9 @@ struct power_pmu mpc7450_pmu = {
 
 static int __init init_mpc7450_pmu(void)
 {
-   if (!cur_cpu_spec->oprofile_cpu_type ||
-   strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc/7450"))
+   unsigned int pvr = mfspr(SPRN_PVR);
+
+   if (PVR_VER(pvr) != PVR_7450)
return -ENODEV;
 
return register_power_pmu(&mpc7450_pmu);
diff --git a/arch/powerpc/perf/power5+-pmu.c b/arch/powerpc/perf/power5+-pmu.c
index 0526dac66007..17a32e7ef234 100644
--- a/arch/powerpc/perf/power5+-pmu.c
+++ b/arch/powerpc/perf/power5+-pmu.c
@@ -679,9 +679,9 @@ static struct power_pmu power5p_pmu = {
 
 static int __init init_power5p_pmu(void)
 {
-   if (!cur_cpu_spec->oprofile_cpu_type ||
-   (strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power5+")
-&& strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power5++")))
+   unsigned int pvr = mfspr(SPRN_PVR);
+
+   if (PVR_VER(pvr) != PVR_POWER5p)
return -ENODEV;
 
return register_power_pmu(&power5p_pmu);
diff --git a/arch/powerpc/perf/power5-pmu.c b/arch/powerpc/perf/power5-pmu.c
index 4dc99f9f7962..844782e6d367 100644
--- a/arch/powerpc/perf/power5-pmu.c
+++ b/arch/powerpc/perf/power5-pmu.c
@@ -620,8 +620,9 @@ static struct power_pmu power5_pmu = {
 
 static int __init init_power5_pmu(void)
 {
-   if (!cur_cpu_spec->oprofile

linux-next: manual merge of the akpm-current tree with the powerpc-fixes tree

2019-02-05 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  arch/powerpc/mm/pgtable-book3s64.c

between commit:

  579b9239c1f3 ("powerpc/radix: Fix kernel crash with mremap()")

from the powerpc-fixes tree and commit:

  41bde21e85a7 ("arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/powerpc/mm/pgtable-book3s64.c
index ecd31569a120,9f154efed1ae..c11c60056669
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@@ -401,24 -398,27 +398,49 @@@ void arch_report_meminfo(struct seq_fil
  }
  #endif /* CONFIG_PROC_FS */
  
 +/*
 + * For hash translation mode, we use the deposited table to store hash slot
 + * information and they are stored at PTRS_PER_PMD offset from related pmd
 + * location. Hence a pmd move requires deposit and withdraw.
 + *
 + * For radix translation with split pmd ptl, we store the deposited table in the
 + * pmd page. Hence if we have different pmd page we need to withdraw during pmd
 + * move.
 + *
 + * With hash we use deposited table always irrespective of anon or not.
 + * With radix we use deposited table only for anonymous mapping.
 + */
 +int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
 + struct spinlock *old_pmd_ptl,
 + struct vm_area_struct *vma)
 +{
 +  if (radix_enabled())
 +  return (new_pmd_ptl != old_pmd_ptl) && vma_is_anonymous(vma);
 +
 +  return true;
 +}
++
+ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
+pte_t *ptep)
+ {
+   unsigned long pte_val;
+ 
+   /*
+* Clear the _PAGE_PRESENT so that no hardware parallel update is
+* possible. Also keep the pte_present true so that we don't take
+* wrong fault.
+*/
+   pte_val = pte_update(vma->vm_mm, addr, ptep, _PAGE_PRESENT, _PAGE_INVALID, 0);
+ 
+   return __pte(pte_val);
+ 
+ }
+ 
+ void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
+pte_t *ptep, pte_t old_pte, pte_t pte)
+ {
+   if (radix_enabled())
+   return radix__ptep_modify_prot_commit(vma, addr,
+ ptep, old_pte, pte);
+   set_pte_at(vma->vm_mm, addr, ptep, pte);
+ }




Re: [PATCH 1/4] powerpc/64s: Clear on-stack exception marker upon exception return

2019-02-05 Thread Michael Ellerman
Balbir Singh  writes:
> On Tue, Feb 5, 2019 at 10:24 PM Michael Ellerman  wrote:
>> Balbir Singh  writes:
>> > On Sat, Feb 2, 2019 at 12:14 PM Balbir Singh  wrote:
>> >> On Tue, Jan 22, 2019 at 10:57:21AM -0500, Joe Lawrence wrote:
>> >> > From: Nicolai Stange 
>> >> >
>> >> > The ppc64 specific implementation of the reliable stacktracer,
>> >> > save_stack_trace_tsk_reliable(), bails out and reports an "unreliable
>> >> > trace" whenever it finds an exception frame on the stack. Stack frames
>> >> > are classified as exception frames if the STACK_FRAME_REGS_MARKER magic,
>> >> > as written by exception prologues, is found at a particular location.
>> >> >
>> >> > However, as observed by Joe Lawrence, it is possible in practice that
>> >> > non-exception stack frames can alias with prior exception frames and 
>> >> > thus,
>> >> > that the reliable stacktracer can find a stale STACK_FRAME_REGS_MARKER 
>> >> > on
>> >> > the stack. It in turn falsely reports an unreliable stacktrace and 
>> >> > blocks
>> >> > any live patching transition to finish. Said condition lasts until the
>> >> > stack frame is overwritten/initialized by function call or other means.
>> >> >
>> >> > In principle, we could mitigate this by making the exception frame
>> >> > classification condition in save_stack_trace_tsk_reliable() stronger:
>> >> > in addition to testing for STACK_FRAME_REGS_MARKER, we could also take 
>> >> > into
>> >> > account that for all exceptions executing on the kernel stack
>> >> > - their stack frames's backlink pointers always match what is saved
>> >> >   in their pt_regs instance's ->gpr[1] slot and that
>> >> > - their exception frame size equals STACK_INT_FRAME_SIZE, a value
>> >> >   uncommonly large for non-exception frames.
>> >> >
>> >> > However, while these are currently true, relying on them would make the
>> >> > reliable stacktrace implementation more sensitive towards future 
>> >> > changes in
>> >> > the exception entry code. Note that false negatives, i.e. not detecting
>> >> > exception frames, would silently break the live patching consistency 
>> >> > model.
>> >> >
>> >> > Furthermore, certain other places (diagnostic stacktraces, perf, xmon)
>> >> > rely on STACK_FRAME_REGS_MARKER as well.
>> >> >
>> >> > Make the exception exit code clear the on-stack STACK_FRAME_REGS_MARKER
>> >> > for those exceptions running on the "normal" kernel stack and returning
>> >> > to kernelspace: because the topmost frame is ignored by the reliable 
>> >> > stack
>> >> > tracer anyway, returns to userspace don't need to take care of clearing
>> >> > the marker.
>> >> >
>> >> > Furthermore, as I don't have the ability to test this on Book 3E or
>> >> > 32 bits, limit the change to Book 3S and 64 bits.
>> >> >
>> >> > Finally, make the HAVE_RELIABLE_STACKTRACE Kconfig option depend on
>> >> > PPC_BOOK3S_64 for documentation purposes. Before this patch, it depended
>> >> > on PPC64 && CPU_LITTLE_ENDIAN and because CPU_LITTLE_ENDIAN implies
>> >> > PPC_BOOK3S_64, there's no functional change here.
>> >> >
>> >> > Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack 
>> >> > tracing for the consistency model")
>> >> > Reported-by: Joe Lawrence 
>> >> > Signed-off-by: Nicolai Stange 
>> >> > Signed-off-by: Joe Lawrence 
>> >> > ---
>> >> >  arch/powerpc/Kconfig   | 2 +-
>> >> >  arch/powerpc/kernel/entry_64.S | 7 +++
>> >> >  2 files changed, 8 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> >> > index 2890d36eb531..73bf87b1d274 100644
>> >> > --- a/arch/powerpc/Kconfig
>> >> > +++ b/arch/powerpc/Kconfig
>> >> > @@ -220,7 +220,7 @@ config PPC
>> >> >   select HAVE_PERF_USER_STACK_DUMP
>> >> >   select HAVE_RCU_TABLE_FREE  if SMP
>> >> >   select HAVE_REGS_AND_STACK_ACCESS_API
>> >> > - select HAVE_RELIABLE_STACKTRACE if PPC64 && 
>> >> > CPU_LITTLE_ENDIAN
>> >> > + select HAVE_RELIABLE_STACKTRACE if PPC_BOOK3S_64 && 
>> >> > CPU_LITTLE_ENDIAN
>> >> >   select HAVE_SYSCALL_TRACEPOINTS
>> >> >   select HAVE_VIRT_CPU_ACCOUNTING
>> >> >   select HAVE_IRQ_TIME_ACCOUNTING
>> >> > diff --git a/arch/powerpc/kernel/entry_64.S 
>> >> > b/arch/powerpc/kernel/entry_64.S
>> >> > index 435927f549c4..a2c168b395d2 100644
>> >> > --- a/arch/powerpc/kernel/entry_64.S
>> >> > +++ b/arch/powerpc/kernel/entry_64.S
>> >> > @@ -1002,6 +1002,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>> >> >   ld  r2,_NIP(r1)
>> >> >   mtspr   SPRN_SRR0,r2
>> >> >
>> >> > + /*
>> >> > +  * Leaving a stale exception_marker on the stack can confuse
>> >> > +  * the reliable stack unwinder later on. Clear it.
>> >> > +  */
>> >> > + li  r2,0
>> >> > + std r2,STACK_FRAME_OVERHEAD-16(r1)
>> >> > +
>> >>
>> >> Could you please double check, r4 is already 0 at this point
>> >> IIUC. So the change might be a simple
>> >>
>> >> std r4,STACK_FRAME_OVERHEAD-16(r1)
>> >>
>>

Re: [PATCH] powerpc/powernv/npu: Remove redundant change_pte() hook

2019-02-05 Thread Balbir Singh
On Tue, Feb 5, 2019 at 2:52 PM Alistair Popple  wrote:
>
> On Thursday, 31 January 2019 12:11:06 PM AEDT Andrea Arcangeli wrote:
> > On Thu, Jan 31, 2019 at 06:30:22PM +0800, Peter Xu wrote:
> > > The change_pte() notifier was designed to use as a quick path to
> > > update secondary MMU PTEs on write permission changes or PFN changes.
> > > For KVM, it could reduce the vm-exits when vcpu faults on the pages
> > > that was touched up by KSM.  It's not used to do cache invalidations,
> > > for example, if we see the notifier will be called before the real PTE
> > > update after all (please see set_pte_at_notify that set_pte_at was
> > > called later).
>
> Thanks for the fixup. I didn't realise that invalidate_range() always gets
> called but I now see that is the case so this change looks good to me as well.
>
> Reviewed-by: Alistair Popple 
>
I checked the three callers of set_pte_at_notify, and the assumption
seems correct.

Reviewed-by: Balbir Singh 


Re: [PATCH 1/4] powerpc/64s: Clear on-stack exception marker upon exception return

2019-02-05 Thread Balbir Singh
On Tue, Feb 5, 2019 at 10:24 PM Michael Ellerman  wrote:
>
> Balbir Singh  writes:
> > On Sat, Feb 2, 2019 at 12:14 PM Balbir Singh  wrote:
> >>
> >> On Tue, Jan 22, 2019 at 10:57:21AM -0500, Joe Lawrence wrote:
> >> > From: Nicolai Stange 
> >> >
> >> > The ppc64 specific implementation of the reliable stacktracer,
> >> > save_stack_trace_tsk_reliable(), bails out and reports an "unreliable
> >> > trace" whenever it finds an exception frame on the stack. Stack frames
> >> > are classified as exception frames if the STACK_FRAME_REGS_MARKER magic,
> >> > as written by exception prologues, is found at a particular location.
> >> >
> >> > However, as observed by Joe Lawrence, it is possible in practice that
> >> > non-exception stack frames can alias with prior exception frames and 
> >> > thus,
> >> > that the reliable stacktracer can find a stale STACK_FRAME_REGS_MARKER on
> >> > the stack. It in turn falsely reports an unreliable stacktrace and blocks
> >> > any live patching transition to finish. Said condition lasts until the
> >> > stack frame is overwritten/initialized by function call or other means.
> >> >
> >> > In principle, we could mitigate this by making the exception frame
> >> > classification condition in save_stack_trace_tsk_reliable() stronger:
> >> > in addition to testing for STACK_FRAME_REGS_MARKER, we could also take 
> >> > into
> >> > account that for all exceptions executing on the kernel stack
> >> > - their stack frames's backlink pointers always match what is saved
> >> >   in their pt_regs instance's ->gpr[1] slot and that
> >> > - their exception frame size equals STACK_INT_FRAME_SIZE, a value
> >> >   uncommonly large for non-exception frames.
> >> >
> >> > However, while these are currently true, relying on them would make the
> >> > reliable stacktrace implementation more sensitive towards future changes 
> >> > in
> >> > the exception entry code. Note that false negatives, i.e. not detecting
> >> > exception frames, would silently break the live patching consistency 
> >> > model.
> >> >
> >> > Furthermore, certain other places (diagnostic stacktraces, perf, xmon)
> >> > rely on STACK_FRAME_REGS_MARKER as well.
> >> >
> >> > Make the exception exit code clear the on-stack STACK_FRAME_REGS_MARKER
> >> > for those exceptions running on the "normal" kernel stack and returning
> >> > to kernelspace: because the topmost frame is ignored by the reliable 
> >> > stack
> >> > tracer anyway, returns to userspace don't need to take care of clearing
> >> > the marker.
> >> >
> >> > Furthermore, as I don't have the ability to test this on Book 3E or
> >> > 32 bits, limit the change to Book 3S and 64 bits.
> >> >
> >> > Finally, make the HAVE_RELIABLE_STACKTRACE Kconfig option depend on
> >> > PPC_BOOK3S_64 for documentation purposes. Before this patch, it depended
> >> > on PPC64 && CPU_LITTLE_ENDIAN and because CPU_LITTLE_ENDIAN implies
> >> > PPC_BOOK3S_64, there's no functional change here.
> >> >
> >> > Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack 
> >> > tracing for the consistency model")
> >> > Reported-by: Joe Lawrence 
> >> > Signed-off-by: Nicolai Stange 
> >> > Signed-off-by: Joe Lawrence 
> >> > ---
> >> >  arch/powerpc/Kconfig   | 2 +-
> >> >  arch/powerpc/kernel/entry_64.S | 7 +++
> >> >  2 files changed, 8 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> >> > index 2890d36eb531..73bf87b1d274 100644
> >> > --- a/arch/powerpc/Kconfig
> >> > +++ b/arch/powerpc/Kconfig
> >> > @@ -220,7 +220,7 @@ config PPC
> >> >   select HAVE_PERF_USER_STACK_DUMP
> >> >   select HAVE_RCU_TABLE_FREE  if SMP
> >> >   select HAVE_REGS_AND_STACK_ACCESS_API
> >> > - select HAVE_RELIABLE_STACKTRACE if PPC64 && 
> >> > CPU_LITTLE_ENDIAN
> >> > + select HAVE_RELIABLE_STACKTRACE if PPC_BOOK3S_64 && 
> >> > CPU_LITTLE_ENDIAN
> >> >   select HAVE_SYSCALL_TRACEPOINTS
> >> >   select HAVE_VIRT_CPU_ACCOUNTING
> >> >   select HAVE_IRQ_TIME_ACCOUNTING
> >> > diff --git a/arch/powerpc/kernel/entry_64.S 
> >> > b/arch/powerpc/kernel/entry_64.S
> >> > index 435927f549c4..a2c168b395d2 100644
> >> > --- a/arch/powerpc/kernel/entry_64.S
> >> > +++ b/arch/powerpc/kernel/entry_64.S
> >> > @@ -1002,6 +1002,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
> >> >   ld  r2,_NIP(r1)
> >> >   mtspr   SPRN_SRR0,r2
> >> >
> >> > + /*
> >> > +  * Leaving a stale exception_marker on the stack can confuse
> >> > +  * the reliable stack unwinder later on. Clear it.
> >> > +  */
> >> > + li  r2,0
> >> > + std r2,STACK_FRAME_OVERHEAD-16(r1)
> >> > +
> >>
> >> Could you please double check, r4 is already 0 at this point
> >> IIUC. So the change might be a simple
> >>
> >> std r4,STACK_FRAME_OVERHEAD-16(r1)
> >>
> >
> > r4 is not 0, sorry for the noise
>
>
 Isn't it?

It is, I seem to be reading the wrong bits and confused myself, had to
re-read mt

Re: [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode

2019-02-05 Thread David Gibson
On Wed, Feb 06, 2019 at 09:13:15AM +1100, Paul Mackerras wrote:
> On Tue, Feb 05, 2019 at 12:31:28PM +0100, Cédric Le Goater wrote:
> > >>> As for nesting, I suggest for the foreseeable future we stick to XICS
> > >>> emulation in nested guests.
> > >>
> > >> ok. so no kernel_irqchip at all. hmm. 
> > 
> > I was confused with what Paul calls 'XICS emulation'. It's not the QEMU
> > XICS emulated device but the XICS-over-XIVE KVM device, i.e. the KVM
> > XICS device that KVM uses when running on a P9 processor.
> 
> Actually there are two separate implementations of XICS emulation in
> KVM.  The first (older) one is almost entirely a software emulation
> but does have some cases where it accesses an underlying XICS device
> in order to make some things faster (IPIs and pass-through of a device
> interrupt to a guest).  The other, newer one is the XICS-on-XIVE
> emulation that Ben wrote, which uses the XIVE hardware pretty heavily.
> My patch was about making the older code work when there is no
> XICS available to the host.

Ah, right.  To clarify my earlier statements in light of this:

 * We definitely want some sort of kernel-XICS available in a nested
   guest.  AIUI, this is now accomplished, so, Yay!

 * Implementing the L2 XICS in terms of L1's PAPR-XIVE would be a
   bonus, but it's a much lower priority.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 15/19] KVM: PPC: Book3S HV: add get/set accessors for the source configuration

2019-02-05 Thread David Gibson
On Wed, Feb 06, 2019 at 12:23:29PM +1100, David Gibson wrote:
> On Tue, Feb 05, 2019 at 02:03:11PM +0100, Cédric Le Goater wrote:
> > On 2/5/19 6:32 AM, David Gibson wrote:
> > > On Mon, Feb 04, 2019 at 05:07:28PM +0100, Cédric Le Goater wrote:
> > >> On 2/4/19 6:21 AM, David Gibson wrote:
> > >>> On Mon, Jan 07, 2019 at 07:43:27PM +0100, Cédric Le Goater wrote:
> >  These are used to capture the XIVE EAS table of the KVM device, i.e.
> >  the configuration of the source targets.
> > 
> >  Signed-off-by: Cédric Le Goater 
> >  ---
> >   arch/powerpc/include/uapi/asm/kvm.h   | 11 
> >   arch/powerpc/kvm/book3s_xive_native.c | 87 +++
> >   2 files changed, 98 insertions(+)
> > 
> >  diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> >  index 1a8740629acf..faf024f39858 100644
> >  --- a/arch/powerpc/include/uapi/asm/kvm.h
> >  +++ b/arch/powerpc/include/uapi/asm/kvm.h
> >  @@ -683,9 +683,20 @@ struct kvm_ppc_cpu_char {
> >   #define   KVM_DEV_XIVE_SAVE_EQ_PAGES  4
> >   #define KVM_DEV_XIVE_GRP_SOURCES  2   /* 64-bit source attributes */
> >   #define KVM_DEV_XIVE_GRP_SYNC 3   /* 64-bit source attributes */
> >  +#define KVM_DEV_XIVE_GRP_EAS  4   /* 64-bit eas attributes */
> >   
> >   /* Layout of 64-bit XIVE source attribute values */
> >   #define KVM_XIVE_LEVEL_SENSITIVE  (1ULL << 0)
> >   #define KVM_XIVE_LEVEL_ASSERTED   (1ULL << 1)
> >   
> >  +/* Layout of 64-bit eas attribute values */
> >  +#define KVM_XIVE_EAS_PRIORITY_SHIFT   0
> >  +#define KVM_XIVE_EAS_PRIORITY_MASK    0x7
> >  +#define KVM_XIVE_EAS_SERVER_SHIFT     3
> >  +#define KVM_XIVE_EAS_SERVER_MASK      0xfffffff8ULL
> >  +#define KVM_XIVE_EAS_MASK_SHIFT       32
> >  +#define KVM_XIVE_EAS_MASK_MASK        0x100000000ULL
> >  +#define KVM_XIVE_EAS_EISN_SHIFT       33
> >  +#define KVM_XIVE_EAS_EISN_MASK        0xfffffffe00000000ULL
> >  +
> >   #endif /* __LINUX_KVM_POWERPC_H */
> >  diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> >  index f2de1bcf3b35..0468b605baa7 100644
> >  --- a/arch/powerpc/kvm/book3s_xive_native.c
> >  +++ b/arch/powerpc/kvm/book3s_xive_native.c
> >  @@ -525,6 +525,88 @@ static int kvmppc_xive_native_sync(struct kvmppc_xive *xive, long irq, u64 addr)
> > return 0;
> >   }
> >   
> >  +static int kvmppc_xive_native_set_eas(struct kvmppc_xive *xive, long irq,
> >  +u64 addr)
> > >>>
> > >>> I'd prefer to avoid the name "EAS" here.  IIUC these aren't "raw" EAS
> > >>> values, but rather essentially the "source config" in the terminology
> > >>> of the PAPR hcalls.  Which, yes, is basically implemented by setting
> > >>> the EAS, but since it's the PAPR architected state that we need to
> > >>> preserve across migration, I'd prefer to stick as close as we can to
> > >>> the PAPR terminology.
> > >>
> > >> But we don't have an equivalent name in the PAPR specs for the tuple
> > >> (prio, server). We could maybe use the generic 'target' name, even
> > >> if that usually refers to a CPU number.
> > > 
> > > Um.. what?  That's about terminology for one of the fields in this
> > > thing, not about the name for the thing itself.
> > > 
> > >> Or, IVE (Interrupt Vector Entry)? which makes some sense.
> > >> This was the former name in HW. I think we could recycle it for KVM.
> > > 
> > > That's a terrible idea, which will make a confusing situation even
> > > more confusing.
> > 
> > Let's use SOURCE_CONFIG and QUEUE_CONFIG. The KVM ioctls are very 
> > similar to the hcalls anyhow.
> 
> Yes, I think that's a good idea.

Actually... AIUI the SET_CONFIG hcalls shouldn't be a fast path.  Can
we simplify things further by removing the hcall implementation from
the kernel entirely, and have qemu implement them by basically just
forwarding them to the appropriate SET_CONFIG ioctl()?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device

2019-02-05 Thread David Gibson
On Tue, Feb 05, 2019 at 01:55:40PM +0100, Cédric Le Goater wrote:
> On 2/5/19 6:28 AM, David Gibson wrote:
> > On Mon, Feb 04, 2019 at 12:30:39PM +0100, Cédric Le Goater wrote:
> >> On 2/4/19 5:45 AM, David Gibson wrote:
> >>> On Mon, Jan 07, 2019 at 07:43:18PM +0100, Cédric Le Goater wrote:
>  This will let the guest create a memory mapping to expose the ESB MMIO
>  regions used to control the interrupt sources, to trigger events, to
>  EOI or to turn off the sources.
> 
>  Signed-off-by: Cédric Le Goater 
>  ---
>   arch/powerpc/include/uapi/asm/kvm.h   |  4 ++
>   arch/powerpc/kvm/book3s_xive_native.c | 97 +++
>   2 files changed, 101 insertions(+)
> 
>  diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
>  index 8c876c166ef2..6bb61ba141c2 100644
>  --- a/arch/powerpc/include/uapi/asm/kvm.h
>  +++ b/arch/powerpc/include/uapi/asm/kvm.h
>  @@ -675,4 +675,8 @@ struct kvm_ppc_cpu_char {
>   #define  KVM_XICS_PRESENTED (1ULL << 43)
>   #define  KVM_XICS_QUEUED(1ULL << 44)
>   
>  +/* POWER9 XIVE Native Interrupt Controller */
>  +#define KVM_DEV_XIVE_GRP_CTRL   1
>  +#define   KVM_DEV_XIVE_GET_ESB_FD   1
> >>>
> >>> Introducing a new FD for ESB and TIMA seems overkill.  Can't you get
> >>> to both with an mmap() directly on the xive device fd?  Using the
> >>> offset to distinguish which one to map, obviously.
> >>
> >> The page offset would define some sort of user API. It seems feasible.
> >> But I am not sure this would be practical in the future if we need to 
> >> tune the length.
> > 
> > Um.. why not?  I mean, yes the XIVE supports rather a lot of
> > interrupts, but we have 64-bits of offset we can play with - we can
> > leave room for billions of ESB slots and still have room for billions
> > of VPs.
> 
> So the first 4 pages could be the TIMA pages, and then would come
> the pages for the interrupt ESBs. I think that we can have a different
> vm_fault handler for each mapping.

Um.. no, I'm saying you don't need to tightly pack them.  So you could
have the ESB pages at 0, the TIMA at, say offset 2^60.

> I wonder how this will work out with pass-through. As Paul said in 
> a previous email, it would be better to let QEMU request a new 
> mapping to handle the ESB pages of the device being passed through.
> I guess this is not a special case, just another offset and length.

Right, if we need multiple "chunks" of ESB pages we can given them
each their own terabyte or several.  No need to be stingy with address
space.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 15/19] KVM: PPC: Book3S HV: add get/set accessors for the source configuration

2019-02-05 Thread David Gibson
On Tue, Feb 05, 2019 at 02:03:11PM +0100, Cédric Le Goater wrote:
> On 2/5/19 6:32 AM, David Gibson wrote:
> > On Mon, Feb 04, 2019 at 05:07:28PM +0100, Cédric Le Goater wrote:
> >> On 2/4/19 6:21 AM, David Gibson wrote:
> >>> On Mon, Jan 07, 2019 at 07:43:27PM +0100, Cédric Le Goater wrote:
>  These are used to capture the XIVE EAS table of the KVM device, i.e.
>  the configuration of the source targets.
> 
>  Signed-off-by: Cédric Le Goater 
>  ---
>   arch/powerpc/include/uapi/asm/kvm.h   | 11 
>   arch/powerpc/kvm/book3s_xive_native.c | 87 +++
>   2 files changed, 98 insertions(+)
> 
>  diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
>  index 1a8740629acf..faf024f39858 100644
>  --- a/arch/powerpc/include/uapi/asm/kvm.h
>  +++ b/arch/powerpc/include/uapi/asm/kvm.h
>  @@ -683,9 +683,20 @@ struct kvm_ppc_cpu_char {
>   #define   KVM_DEV_XIVE_SAVE_EQ_PAGES4
>   #define KVM_DEV_XIVE_GRP_SOURCES   2   /* 64-bit source attributes */
>   #define KVM_DEV_XIVE_GRP_SYNC  3   /* 64-bit source attributes */
>  +#define KVM_DEV_XIVE_GRP_EAS   4   /* 64-bit eas attributes */
>   
>   /* Layout of 64-bit XIVE source attribute values */
>   #define KVM_XIVE_LEVEL_SENSITIVE(1ULL << 0)
>   #define KVM_XIVE_LEVEL_ASSERTED (1ULL << 1)
>   
>  +/* Layout of 64-bit eas attribute values */
>  +#define KVM_XIVE_EAS_PRIORITY_SHIFT   0
>  +#define KVM_XIVE_EAS_PRIORITY_MASK    0x7
>  +#define KVM_XIVE_EAS_SERVER_SHIFT     3
>  +#define KVM_XIVE_EAS_SERVER_MASK      0xfffffff8ULL
>  +#define KVM_XIVE_EAS_MASK_SHIFT       32
>  +#define KVM_XIVE_EAS_MASK_MASK        0x100000000ULL
>  +#define KVM_XIVE_EAS_EISN_SHIFT       33
>  +#define KVM_XIVE_EAS_EISN_MASK        0xfffffffe00000000ULL
>  +
>   #endif /* __LINUX_KVM_POWERPC_H */
>  diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
>  index f2de1bcf3b35..0468b605baa7 100644
>  --- a/arch/powerpc/kvm/book3s_xive_native.c
>  +++ b/arch/powerpc/kvm/book3s_xive_native.c
>  @@ -525,6 +525,88 @@ static int kvmppc_xive_native_sync(struct kvmppc_xive *xive, long irq, u64 addr)
>   return 0;
>   }
>   
>  +static int kvmppc_xive_native_set_eas(struct kvmppc_xive *xive, long irq,
>  +  u64 addr)
> >>>
> >>> I'd prefer to avoid the name "EAS" here.  IIUC these aren't "raw" EAS
> >>> values, but rather essentially the "source config" in the terminology
> >>> of the PAPR hcalls.  Which, yes, is basically implemented by setting
> >>> the EAS, but since it's the PAPR architected state that we need to
> >>> preserve across migration, I'd prefer to stick as close as we can to
> >>> the PAPR terminology.
> >>
> >> But we don't have an equivalent name in the PAPR specs for the tuple
> >> (prio, server). We could maybe use the generic 'target' name, even
> >> if that usually refers to a CPU number.
> > 
> > Um.. what?  That's about terminology for one of the fields in this
> > thing, not about the name for the thing itself.
> > 
> >> Or, IVE (Interrupt Vector Entry)? which makes some sense.
> >> This was the former name in HW. I think we could recycle it for KVM.
> > 
> > That's a terrible idea, which will make a confusing situation even
> > more confusing.
> 
> Let's use SOURCE_CONFIG and QUEUE_CONFIG. The KVM ioctls are very 
> similar to the hcalls anyhow.

Yes, I think that's a good idea.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 17/19] KVM: PPC: Book3S HV: add get/set accessors for the VP XIVE state

2019-02-05 Thread David Gibson
On Tue, Feb 05, 2019 at 12:58:54PM +0100, Cédric Le Goater wrote:
> On 2/5/19 6:33 AM, David Gibson wrote:
> > On Mon, Feb 04, 2019 at 07:57:26PM +0100, Cédric Le Goater wrote:
> >> On 2/4/19 6:26 AM, David Gibson wrote:
> >>> On Mon, Jan 07, 2019 at 08:10:04PM +0100, Cédric Le Goater wrote:
>  At a VCPU level, the state of the thread context interrupt management
>  registers needs to be collected. These registers are cached under the
>  'xive_saved_state.w01' field of the VCPU when the VCPU context is
>  pulled from the HW thread. An OPAL call retrieves the backup of the
>  IPB register in the NVT structure and merges it in the KVM state.
> 
>  The structures of the interface between QEMU and KVM provisions some
>  extra room (two u64) for further extensions if more state needs to be
>  transferred back to QEMU.
> 
>  Signed-off-by: Cédric Le Goater 
>  ---
>   arch/powerpc/include/asm/kvm_ppc.h|  5 ++
>   arch/powerpc/include/uapi/asm/kvm.h   |  2 +
>   arch/powerpc/kvm/book3s.c | 24 +
>   arch/powerpc/kvm/book3s_xive_native.c | 78 +++
>   4 files changed, 109 insertions(+)
> 
>  diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
>  index 4cc897039485..49c488af168c 100644
>  --- a/arch/powerpc/include/asm/kvm_ppc.h
>  +++ b/arch/powerpc/include/asm/kvm_ppc.h
>  @@ -270,6 +270,7 @@ union kvmppc_one_reg {
>   u64 addr;
>   u64 length;
>   }   vpaval;
>  +u64 xive_timaval[4];
>   };
>   
>   struct kvmppc_ops {
>  @@ -603,6 +604,8 @@ extern void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu);
>   extern void kvmppc_xive_native_init_module(void);
>   extern void kvmppc_xive_native_exit_module(void);
>   extern int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 cmd);
>  +extern int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu, union kvmppc_one_reg *val);
>  +extern int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, union kvmppc_one_reg *val);
>   
>   #else
>   static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 server,
>  @@ -637,6 +640,8 @@ static inline void kvmppc_xive_native_init_module(void) { }
>   static inline void kvmppc_xive_native_exit_module(void) { }
>   static inline int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 cmd)
>   { return 0; }
>  +static inline int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu, union kvmppc_one_reg *val) { return 0; }
>  +static inline int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, union kvmppc_one_reg *val) { return -ENOENT; }
> >>>
> >>> IIRC "VP" is the old name for "TCTX".  Since we're using tctx in the
> >>> rest of the XIVE code, can we use it here as well.
> >>
> >> OK. The state we are getting or setting is indeed related to the thread
> >> interrupt context registers.
> >>
> >> The name VP refers to an identifier for some interrupt context under
> >> OPAL (NVT in HW to be precise).
> > 
> > Oh, sorry, "NVT" was the name I was looking for, not "TCTX".  But in
> > any case, please lets standardize on one.
> 
> There is some confusion in the naming for:
> 
>  - VP    Virtual Processor (XIVE 1)
>  - VPD   Virtual Processor Descriptor (XIVE 1)
>  - TCTX  Thread interrupt context registers
>  - NVT   Notify Virtual Target. Former VP.
>  - NVTS  Notify Virtual Target Structure. Where the TCTX regs are cached.
> 
> 
> I am fine with using NVT because this is indeed the name of the XIVE 
> structure where the HW caches the thread interrupt context registers.
> 
> But the XIVE native layer and the XICS-over-XIVE KVM device use the
> name VP (the old one). I don't think we want to change these now.

Ah, right.  It now occurs to me that the place I've already seen NVT
used is in the qemu code, whereas this is kernel.  In that case
sticking to VP here makes sense.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH] mtd: powernv: SPDX and comment fixups

2019-02-05 Thread Stewart Smith
Joel Stanley  writes:
> converts the powernv flash driver to use SPDX, and adds some
> clarifying comments that came out of a discussion on how the mtd driver
> works.
>
> Signed-off-by: Joel Stanley 

We probably don't need to mention the dim dark corners of the FFS format
and I kind of wish it'd die rather than spread further.

Reviewed-by: Stewart Smith 


-- 
Stewart Smith
OPAL Architect, IBM.



[PATCH] mtd: powernv: SPDX and comment fixups

2019-02-05 Thread Joel Stanley
This converts the powernv flash driver to use SPDX, and adds some
clarifying comments that came out of a discussion on how the mtd driver
works.

Signed-off-by: Joel Stanley 
---
 drivers/mtd/devices/powernv_flash.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
index 22f753e555ac..0bf43336c3f7 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -1,17 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0+
+
 /*
  * OPAL PNOR flash MTD abstraction
  *
  * Copyright IBM 2015
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
  */
 
 #include 
@@ -261,6 +253,14 @@ static int powernv_flash_probe(struct platform_device *pdev)
 * The current flash that skiboot exposes is one contiguous flash chip
 * with an ffs partition at the start, it should prove easier for users
 * to deal with partitions or not as they see fit
+*
+* When developing the skiboot MTD driver an experiment with FFS
+* parsing in the kernel, and exposing a separate /dev/mtdX for each
+* partition (eg BOOTKERNEL, PAYLOAD, NVRAM, etc), was done.
+*
+* We didn't go with that as it meant users couldn't do a full flash
+* re-write, as this can cause a partition to change size, and there
+* wasn't a way to tell the MTD layer that a device has shrunk/grown.
 */
return mtd_device_register(&data->mtd, NULL, 0);
 }
-- 
2.20.1



[PATCH net] net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles

2019-02-05 Thread Yang Wei
From: Yang Wei 

dev_consume_skb_irq() should be called in hdlc_tx_done() when the skb
xmit is done: the skb was successfully transmitted, not dropped, so
consuming it keeps drop profiles (dropwatch, perf) free of false drops.

Signed-off-by: Yang Wei 
---
 drivers/net/wan/fsl_ucc_hdlc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
index 66d889d..a08f04c 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.c
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -482,7 +482,7 @@ static int hdlc_tx_done(struct ucc_hdlc_private *priv)
memset(priv->tx_buffer +
   (be32_to_cpu(bd->buf) - priv->dma_tx_addr),
   0, skb->len);
-   dev_kfree_skb_irq(skb);
+   dev_consume_skb_irq(skb);
 
priv->tx_skbuff[priv->skb_dirtytx] = NULL;
priv->skb_dirtytx =
-- 
2.7.4
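For background, both helpers free the skb from IRQ context and differ
only in the free "reason", which selects whether the kfree_skb (drop)
or consume_skb tracepoint fires; that is what dropwatch and perf key
on. Paraphrased from include/linux/netdevice.h:

static inline void dev_kfree_skb_irq(struct sk_buff *skb)
{
        __dev_kfree_skb_irq(skb, SKB_REASON_DROPPED);   /* counted as a drop */
}

static inline void dev_consume_skb_irq(struct sk_buff *skb)
{
        __dev_kfree_skb_irq(skb, SKB_REASON_CONSUMED);  /* normal completion */
}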




Re: [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode

2019-02-05 Thread Paul Mackerras
On Tue, Feb 05, 2019 at 12:31:28PM +0100, Cédric Le Goater wrote:
> >>> As for nesting, I suggest for the foreseeable future we stick to XICS
> >>> emulation in nested guests.
> >>
> >> ok. so no kernel_irqchip at all. hmm. 
> 
> I was confused with what Paul calls 'XICS emulation'. It's not the QEMU
> XICS emulated device but the XICS-over-XIVE KVM device, i.e. the KVM
> XICS device that KVM uses when running on a P9 processor.

Actually there are two separate implementations of XICS emulation in
KVM.  The first (older) one is almost entirely a software emulation
but does have some cases where it accesses an underlying XICS device
in order to make some things faster (IPIs and pass-through of a device
interrupt to a guest).  The other, newer one is the XICS-on-XIVE
emulation that Ben wrote, which uses the XIVE hardware pretty heavily.
My patch was about making the older code work when there is no
XICS available to the host.

Paul.


[PATCH v2] powerpc/mm: move a KERN_WARNING message to pr_debug()

2019-02-05 Thread Laurent Vivier
resize_hpt_for_hotplug() reports a warning when it cannot
increase the hash page table ("Unable to resize hash page
table to target order"), but this is not blocking and
can make the user think something has not worked properly.
As we move the message to the debug area, report the
ENODEV error again.

If the operation cannot be done the real error message
will be reported by arch_add_memory() if create_section_mapping()
fails.

Fixes: 7339390d772dd ("powerpc/pseries: Don't give a warning when HPT resizing isn't available")
Signed-off-by: Laurent Vivier 
---

Notes:
v2:
 - use pr_debug() instead of printk(KERN_DEBUG ...)
 - remove check for ENODEV

 arch/powerpc/mm/hash_utils_64.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 0cc7fbc3bd1c..6a0cc4eb2c83 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -777,10 +777,9 @@ void resize_hpt_for_hotplug(unsigned long new_mem_size)
int rc;
 
rc = mmu_hash_ops.resize_hpt(target_hpt_shift);
-   if (rc && (rc != -ENODEV))
-   printk(KERN_WARNING
-  "Unable to resize hash page table to target 
order %d: %d\n",
-  target_hpt_shift, rc);
+   if (rc)
+   pr_debug("Unable to resize hash page table to target 
order %d: %d\n",
+target_hpt_shift, rc);
}
 }
 
-- 
2.20.1



Re: [PATCH] powerpc/mm: move a KERN_WARNING message to KERN_DEBUG

2019-02-05 Thread Christophe Leroy




Le 05/02/2019 à 19:03, Laurent Vivier a écrit :

resize_hpt_for_hotplug() reports a warning when it cannot
increase the hash page table ("Unable to resize hash page
table to target order"), but this is not blocking and
can make the user think something has not worked properly.

If the operation cannot be done the real error message
will be reported by arch_add_memory() if create_section_mapping()
fails.

Signed-off-by: Laurent Vivier 
---
  arch/powerpc/mm/hash_utils_64.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 0cc7fbc3bd1c..b762bdceb510 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -778,7 +778,7 @@ void resize_hpt_for_hotplug(unsigned long new_mem_size)
  
  		rc = mmu_hash_ops.resize_hpt(target_hpt_shift);

if (rc && (rc != -ENODEV))
-   printk(KERN_WARNING
+   printk(KERN_DEBUG


You should use pr_debug() instead.

Christophe


   "Unable to resize hash page table to target order %d: 
%d\n",
   target_hpt_shift, rc);
}



Re: [PATCH 16/19] KVM: PPC: Book3S HV: add get/set accessors for the EQ configuration

2019-02-05 Thread Cédric Le Goater
On 2/4/19 6:24 AM, David Gibson wrote:
> On Mon, Jan 07, 2019 at 07:43:28PM +0100, Cédric Le Goater wrote:
>> These are used to capture the XIVE END table of the KVM device. It
>> relies on an OPAL call to retrieve from the XIVE IC the EQ toggle bit
>> and index which are updated by the HW when events are enqueued in the
>> guest RAM.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/uapi/asm/kvm.h   |  21 
>>  arch/powerpc/kvm/book3s_xive_native.c | 166 ++
>>  2 files changed, 187 insertions(+)
>>
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
>> b/arch/powerpc/include/uapi/asm/kvm.h
>> index faf024f39858..95302558ce10 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -684,6 +684,7 @@ struct kvm_ppc_cpu_char {
>>  #define KVM_DEV_XIVE_GRP_SOURCES2   /* 64-bit source attributes */
>>  #define KVM_DEV_XIVE_GRP_SYNC   3   /* 64-bit source attributes */
>> +#define KVM_DEV_XIVE_GRP_EAS    4   /* 64-bit eas attributes */
>> +#define KVM_DEV_XIVE_GRP_EQ 5   /* 64-bit eq attributes */
>>  
>>  /* Layout of 64-bit XIVE source attribute values */
>>  #define KVM_XIVE_LEVEL_SENSITIVE(1ULL << 0)
>> @@ -699,4 +700,24 @@ struct kvm_ppc_cpu_char {
>>  #define KVM_XIVE_EAS_EISN_SHIFT 33
>>  #define KVM_XIVE_EAS_EISN_MASK  0xfffeULL
>>  
>> +/* Layout of 64-bit eq attribute */
>> +#define KVM_XIVE_EQ_PRIORITY_SHIFT  0
>> +#define KVM_XIVE_EQ_PRIORITY_MASK   0x7
>> +#define KVM_XIVE_EQ_SERVER_SHIFT3
>> +#define KVM_XIVE_EQ_SERVER_MASK 0xfff8ULL
>> +
>> +/* Layout of 64-bit eq attribute values */
>> +struct kvm_ppc_xive_eq {
>> +__u32 flags;
>> +__u32 qsize;
>> +__u64 qpage;
>> +__u32 qtoggle;
>> +__u32 qindex;
> 
> Should we pad this in case a) we discover some fields in the EQ that
> we thought weren't relevant to the guest actually are or b) future
> XIVE extensions add something we need to migrate.

The underlying XIVE structure is composed of 32 bytes. I will double the
size.
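
For reference, a sketch of the padded layout (field names are from the
patch above; the pad size is my assumption from doubling to 64 bytes):

    #include <linux/types.h>

    struct kvm_ppc_xive_eq {
            __u32 flags;
            __u32 qsize;
            __u64 qpage;
            __u32 qtoggle;
            __u32 qindex;
            __u8  pad[40];          /* room for future XIVE END state */
    };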

Thanks,

C.


> 
>> +};
>> +
>> +#define KVM_XIVE_EQ_FLAG_ENABLED0x0001
>> +#define KVM_XIVE_EQ_FLAG_ALWAYS_NOTIFY  0x0002
>> +#define KVM_XIVE_EQ_FLAG_ESCALATE   0x0004
>> +
>> +
>>  #endif /* __LINUX_KVM_POWERPC_H */
>> diff --git a/arch/powerpc/kvm/book3s_xive_native.c 
>> b/arch/powerpc/kvm/book3s_xive_native.c
>> index 0468b605baa7..f4eb71eafc57 100644
>> --- a/arch/powerpc/kvm/book3s_xive_native.c
>> +++ b/arch/powerpc/kvm/book3s_xive_native.c
>> @@ -607,6 +607,164 @@ static int kvmppc_xive_native_get_eas(struct 
>> kvmppc_xive *xive, long irq,
>>  return 0;
>>  }
>>  
>> +static int kvmppc_xive_native_set_queue(struct kvmppc_xive *xive, long eq_idx,
>> +  u64 addr)
>> +{
>> +struct kvm *kvm = xive->kvm;
>> +struct kvm_vcpu *vcpu;
>> +struct kvmppc_xive_vcpu *xc;
>> +void __user *ubufp = (u64 __user *) addr;
>> +u32 server;
>> +u8 priority;
>> +struct kvm_ppc_xive_eq kvm_eq;
>> +int rc;
>> +__be32 *qaddr = 0;
>> +struct page *page;
>> +struct xive_q *q;
>> +
>> +/*
>> + * Demangle priority/server tuple from the EQ index
>> + */
>> +priority = (eq_idx & KVM_XIVE_EQ_PRIORITY_MASK) >>
>> +KVM_XIVE_EQ_PRIORITY_SHIFT;
>> +server = (eq_idx & KVM_XIVE_EQ_SERVER_MASK) >>
>> +KVM_XIVE_EQ_SERVER_SHIFT;
>> +
>> +if (copy_from_user(&kvm_eq, ubufp, sizeof(kvm_eq)))
>> +return -EFAULT;
>> +
>> +vcpu = kvmppc_xive_find_server(kvm, server);
>> +if (!vcpu) {
>> +pr_err("Can't find server %d\n", server);
>> +return -ENOENT;
>> +}
>> +xc = vcpu->arch.xive_vcpu;
>> +
>> +if (priority != xive_prio_from_guest(priority)) {
>> +pr_err("Trying to restore invalid queue %d for VCPU %d\n",
>> +   priority, server);
>> +return -EINVAL;
>> +}
>> +q = &xc->queues[priority];
>> +
>> +pr_devel("%s VCPU %d priority %d fl:%x sz:%d addr:%llx g:%d idx:%d\n",
>> + __func__, server, priority, kvm_eq.flags,
>> + kvm_eq.qsize, kvm_eq.qpage, kvm_eq.qtoggle, kvm_eq.qindex);
>> +
>> +rc = xive_native_validate_queue_size(kvm_eq.qsize);
>> +if (rc || !kvm_eq.qsize) {
>> +pr_err("invalid queue size %d\n", kvm_eq.qsize);
>> +return rc;
>> +}
>> +
>> +page = gfn_to_page(kvm, gpa_to_gfn(kvm_eq.qpage));
>> +if (is_error_page(page)) {
>> +pr_warn("Couldn't get guest page for %llx!\n", kvm_eq.qpage);
>> +return -ENOMEM;
>> +}
>> +qaddr = page_to_virt(page) + (kvm_eq.qpage & ~PAGE_MASK);
>> +
>> +/* Backup queue page guest address for migration */
>> +q->guest_qpage = kvm_eq.qpage;
>> +q->guest_qsize = kvm_eq.qsize;
>> +
>> +rc = xive_native_configure_queue(xc->vp_id, q, priority,

[PATCH] powerpc/mm: move a KERN_WARNING message to KERN_DEBUG

2019-02-05 Thread Laurent Vivier
resize_hpt_for_hotplug() reports a warning when it cannot
increase the hash page table ("Unable to resize hash page
table to target order"), but this is not blocking and
can make the user think something has not worked properly.

If the operation cannot be done the real error message
will be reported by arch_add_memory() if create_section_mapping()
fails.

Signed-off-by: Laurent Vivier 
---
 arch/powerpc/mm/hash_utils_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 0cc7fbc3bd1c..b762bdceb510 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -778,7 +778,7 @@ void resize_hpt_for_hotplug(unsigned long new_mem_size)
 
rc = mmu_hash_ops.resize_hpt(target_hpt_shift);
if (rc && (rc != -ENODEV))
-   printk(KERN_WARNING
+   printk(KERN_DEBUG
   "Unable to resize hash page table to target 
order %d: %d\n",
   target_hpt_shift, rc);
}
-- 
2.20.1



Re: [PATCHv6 1/4] dt-bindings: add DT binding for the layerscape PCIe controller with EP mode

2019-02-05 Thread Lorenzo Pieralisi
On Tue, Jan 22, 2019 at 02:33:25PM +0800, Xiaowei Bao wrote:
> Add the documentation for the Device Tree binding for the layerscape PCIe
> controller with EP mode.
> 
> Signed-off-by: Xiaowei Bao 
> Reviewed-by: Minghuan Lian 
> Reviewed-by: Zhiqiang Hou 
> Reviewed-by: Rob Herring 
> ---
> v2:
>  - Add the SoC specific compatibles.
> v3:
>  - modify the commit message.
> v4:
>  - no change.
> v5:
>  - no change.
> v6:
>  - no change.
> 
>  .../devicetree/bindings/pci/layerscape-pci.txt |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)

Applied the series to pci/layerscape for v5.1, thanks.

Lorenzo

> diff --git a/Documentation/devicetree/bindings/pci/layerscape-pci.txt 
> b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
> index 9b2b8d6..e20ceaa 100644
> --- a/Documentation/devicetree/bindings/pci/layerscape-pci.txt
> +++ b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
> @@ -13,6 +13,7 @@ information.
>  
>  Required properties:
>  - compatible: should contain the platform identifier such as:
> +  RC mode:
>  "fsl,ls1021a-pcie"
>  "fsl,ls2080a-pcie", "fsl,ls2085a-pcie"
>  "fsl,ls2088a-pcie"
> @@ -20,6 +21,8 @@ Required properties:
>  "fsl,ls1046a-pcie"
>  "fsl,ls1043a-pcie"
>  "fsl,ls1012a-pcie"
> +  EP mode:
> + "fsl,ls1046a-pcie-ep", "fsl,ls-pcie-ep"
>  - reg: base addresses and lengths of the PCIe controller register blocks.
>  - interrupts: A list of interrupt outputs of the controller. Must contain an
>entry for each entry in the interrupt-names property.
> -- 
> 1.7.1
> 


Re: [PATCH v3 1/2] mm: add probe_user_read()

2019-02-05 Thread Murilo Opsfelder Araujo
Hi, Christophe.

On Wed, Jan 16, 2019 at 04:59:27PM +, Christophe Leroy wrote:
> In powerpc code, there are several places implementing safe
> access to user data. This is sometimes implemented using
> probe_kernel_address() with additional access_ok() verification,
> sometimes with get_user() enclosed in a pagefault_disable()/enable()
> pair, etc. :
> show_user_instructions()
> bad_stack_expansion()
> p9_hmi_special_emu()
> fsl_pci_mcheck_exception()
> read_user_stack_64()
> read_user_stack_32() on PPC64
> read_user_stack_32() on PPC32
> power_pmu_bhrb_to()
>
> In the same spirit as probe_kernel_read(), this patch adds
> probe_user_read().
>
> probe_user_read() does the same as probe_kernel_read() but
> first checks that it is really a user address.
>
> The patch defines this function as a static inline so the "size"
> variable can be examined for const-ness by the check_object_size()
> in __copy_from_user_inatomic()
>
> Signed-off-by: Christophe Leroy 
> ---
>  v3: Moved 'Returns:" comment after description.
>  Explained in the commit log why the function is defined static inline
>
>  v2: Added "Returns:" comment and removed probe_user_address()
>
>  include/linux/uaccess.h | 34 ++
>  1 file changed, 34 insertions(+)
>
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 37b226e8df13..ef99edd63da3 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -263,6 +263,40 @@ extern long strncpy_from_unsafe(char *dst, const void 
> *unsafe_addr, long count);
>  #define probe_kernel_address(addr, retval)   \
>   probe_kernel_read(&retval, addr, sizeof(retval))
>
> +/**
> + * probe_user_read(): safely attempt to read from a user location
> + * @dst: pointer to the buffer that shall take the data
> + * @src: address to read from
> + * @size: size of the data chunk
> + *
> + * Safely read from address @src to the buffer at @dst.  If a kernel fault
> + * happens, handle that and return -EFAULT.
> + *
> + * We ensure that the copy_from_user is executed in atomic context so that
> + * do_page_fault() doesn't attempt to take mmap_sem.  This makes
> + * probe_user_read() suitable for use within regions where the caller
> + * already holds mmap_sem, or other locks which nest inside mmap_sem.
> + *
> + * Returns: 0 on success, -EFAULT on error.
> + */
> +
> +#ifndef probe_user_read
> +static __always_inline long probe_user_read(void *dst, const void __user *src,
> + size_t size)
> +{
> + long ret;
> +
> + if (!access_ok(src, size))
> + return -EFAULT;

Hopefully, there is still time for a minor comment.

Do we need to differentiate the returned error here, e.g.: return
-EACCES?

I wonder if there will be situations where callers need to know why
probe_user_read() failed.
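
For illustration, a sketch of a caller that could exploit a split
error code (the -EACCES return is hypothetical; the patch as posted
returns -EFAULT in both cases):

    #include <linux/types.h>
    #include <linux/uaccess.h>

    static int fetch_user_word(const void __user *uaddr, u32 *val)
    {
            long ret = probe_user_read(val, uaddr, sizeof(*val));

            if (ret == -EACCES)     /* hypothetical: not a user address at all */
                    return ret;
            if (ret == -EFAULT)     /* user address, but currently unmapped */
                    return ret;
            return 0;               /* read succeeded */
    }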

Besides that:

Acked-by: Murilo Opsfelder Araujo 

> +
> + pagefault_disable();
> + ret = __copy_from_user_inatomic(dst, src, size);
> + pagefault_enable();
> +
> + return ret ? -EFAULT : 0;
> +}
> +#endif
> +
>  #ifndef user_access_begin
>  #define user_access_begin(ptr,len) access_ok(ptr, len)
>  #define user_access_end() do { } while (0)
> --
> 2.13.3
>

--
Murilo



Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-02-05 Thread Michael S. Tsirkin
On Tue, Feb 05, 2019 at 08:24:07AM +0100, Christoph Hellwig wrote:
> On Mon, Feb 04, 2019 at 04:38:21PM -0500, Michael S. Tsirkin wrote:
> > It was designed to make, when set, as many guests as we can work
> > correctly, and it seems to be successful in doing exactly that.
> > 
> > Unfortunately there could be legacy guests that do work correctly but
> > become slow. Whether trying to somehow work around that
> > can paint us into a corner where things again don't
> > work for some people is a question worth discussing.
> 
> The other problem is that some qemu machines just throw passthrough
> devices and virtio devices on the same virtual PCI(e) bus, and have a
> common IOMMU setup for the whole bus / root port / domain.  I think
> this is completely bogus, but unfortunately it is out in the field.
> 
> Given that power is one of these examples I suspect that is what
> Thiago refers to.  But in this case the answer can't be that we
> pile one hack on top of another, but instead introduce a new qemu
> machine that separates these clearly, and make that mandatory for
> the secure guest support.

That could be one approach, assuming one exists that guests
already support.

-- 
MST


Re: [RFC PATCH] powerpc: fix get_arch_dma_ops() for NTB devices

2019-02-05 Thread Christoph Hellwig
On Tue, Feb 05, 2019 at 10:20:32PM +1100, Michael Ellerman wrote:
> get_dma_ops() falls into the arch-dependent get_arch_dma_ops(), which
> historically returns NULL on PowerPC. Therefore dma_set_mask() fails.
> This affects Switchtec (and probably other) NTB devices, which fail
> to initialize.
> The proposed patch should fix the issue.

I'm not a fan of this.  powerpc, just like arm64 for example, has
required that we set specific per-device dma ops, which keeps
the assignments clear.  Where do the NTB devices come from?

Might be worth adding the NTB maintainers to the CC list and maybe
linux-iommu.


[PATCH 2/8] perf mem/c2c: Fix perf_mem_events to support powerpc

2019-02-05 Thread Arnaldo Carvalho de Melo
From: Ravi Bangoria 

PowerPC hardware does not have a builtin latency filter (--ldlat) for
the "mem-load" event and perf_mem_events by default includes
"/ldlat=30/" which is causing a failure on PowerPC. Refactor the code to
support "perf mem/c2c" on PowerPC.

This patch depends on kernel-side changes done by Madhavan:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182596.html
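
The refactoring relies on the standard weak-symbol override pattern; a
minimal user-space illustration (the symbol name here is invented for
the demo):

    #include <stdio.h>

    /* Default, overridable definition -- in perf this is the __weak
     * perf_mem_events__name() in util/mem-events.c. */
    __attribute__((weak)) const char *mem_load_event(void)
    {
            return "cpu/mem-loads,ldlat=30/P";
    }

    /* In perf, arch/powerpc/util/mem-events.c supplies a strong
     * definition of the same symbol, which the linker prefers; it is
     * not repeated here so this file compiles on its own. */

    int main(void)
    {
            printf("%s\n", mem_load_event());
            return 0;
    }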

Signed-off-by: Ravi Bangoria 
Acked-by: Jiri Olsa 
Cc: Dick Fowles 
Cc: Don Zickus 
Cc: Joe Mario 
Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Namhyung Kim 
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20190129132412.771-1-ravi.bango...@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Documentation/perf-c2c.txt | 16 
 tools/perf/Documentation/perf-mem.txt |  2 +-
 tools/perf/arch/powerpc/util/Build|  1 +
 tools/perf/arch/powerpc/util/mem-events.c | 11 +++
 tools/perf/util/mem-events.c  |  2 +-
 5 files changed, 26 insertions(+), 6 deletions(-)
 create mode 100644 tools/perf/arch/powerpc/util/mem-events.c

diff --git a/tools/perf/Documentation/perf-c2c.txt 
b/tools/perf/Documentation/perf-c2c.txt
index 095aebdc5bb7..e6150f21267d 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -19,8 +19,11 @@ C2C stands for Cache To Cache.
 The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
 you to track down the cacheline contentions.
 
-The tool is based on x86's load latency and precise store facility events
-provided by Intel CPUs. These events provide:
+On x86, the tool is based on load latency and precise store facility events
+provided by Intel CPUs. On PowerPC, the tool uses random instruction sampling
+with the thresholding feature.
+
+These events provide:
   - memory address of the access
   - type of the access (load and store details)
   - latency (in cycles) of the load access
@@ -46,7 +49,7 @@ RECORD OPTIONS
 
 -l::
 --ldlat::
-   Configure mem-loads latency.
+   Configure mem-loads latency. (x86 only)
 
 -k::
 --all-kernel::
@@ -119,11 +122,16 @@ Following perf record options are configured by default:
   -W,-d,--phys-data,--sample-cpu
 
 Unless specified otherwise with '-e' option, following events are monitored by
-default:
+default on x86:
 
   cpu/mem-loads,ldlat=30/P
   cpu/mem-stores/P
 
+and following on PowerPC:
+
+  cpu/mem-loads/
+  cpu/mem-stores/
+
 User can pass any 'perf record' option behind '--' mark, like (to enable
 callchains and system wide monitoring):
 
diff --git a/tools/perf/Documentation/perf-mem.txt 
b/tools/perf/Documentation/perf-mem.txt
index f8d2167cf3e7..199ea0f0a6c0 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -82,7 +82,7 @@ RECORD OPTIONS
Be more verbose (show counter open errors, etc)
 
 --ldlat ::
-   Specify desired latency for loads event.
+   Specify desired latency for loads event. (x86 only)
 
 In addition, for report all perf report options are valid, and for record
 all perf record options.
diff --git a/tools/perf/arch/powerpc/util/Build 
b/tools/perf/arch/powerpc/util/Build
index 2e6595310420..ba98bd006488 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -2,6 +2,7 @@ libperf-y += header.o
 libperf-y += sym-handling.o
 libperf-y += kvm-stat.o
 libperf-y += perf_regs.o
+libperf-y += mem-events.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_DWARF) += skip-callchain-idx.o
diff --git a/tools/perf/arch/powerpc/util/mem-events.c 
b/tools/perf/arch/powerpc/util/mem-events.c
new file mode 100644
index ..d08311f04e95
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/mem-events.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "mem-events.h"
+
+/* PowerPC does not support 'ldlat' parameter. */
+char *perf_mem_events__name(int i)
+{
+   if (i == PERF_MEM_EVENTS__LOAD)
+   return (char *) "cpu/mem-loads/";
+
+   return (char *) "cpu/mem-stores/";
+}
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 93f74d8d3cdd..42c3e5a229d2 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -28,7 +28,7 @@ struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
 static char mem_loads_name[100];
 static bool mem_loads_name__init;
 
-char *perf_mem_events__name(int i)
+char * __weak perf_mem_events__name(int i)
 {
if (i == PERF_MEM_EVENTS__LOAD) {
if (!mem_loads_name__init) {
-- 
2.20.1



Re: [PATCH] powerpc/prom_init: add __init markers to all functions

2019-02-05 Thread Masahiro Yamada
On Tue, Feb 5, 2019 at 7:33 PM Michael Ellerman  wrote:
>
> Masahiro Yamada  writes:
>
> > It is fragile to rely on the compiler's optimization to avoid the
> > section mismatch. Some functions may no longer be inlined
> > when the compiler's inlining heuristic changes.
> >
> > Add __init markers consistently.
> >
> > As for prom_getprop() and prom_getproplen(), they are marked as
> > 'inline', so inlining is guaranteed because PowerPC never enables
> > CONFIG_OPTIMIZE_INLINING. However, it would be better to leave the
> > inlining decision to the compiler. I replaced 'inline' with __init.
>
> I'm going to drop that part because it breaks the build in some
> configurations (as reported by the build robot).


If you drop this part, my motivation for this patch is lost.

My motivation is to allow all architectures to enable
CONFIG_OPTIMIZE_INLINING.
(Currently, only x86 can enable it, but I see nothing arch-dependent
in this feature.)


When I tested it in 0-day bot, it reported
section mismatches from prom_getprop() and prom_getproplen().

So, I want to fix the section mismatches without
relying on 'inline'.


My suggestion is this:

static int __init __maybe_unused prom_getproplen(phandle node,
 const char *pname)
{
return call_prom("getproplen", 2, 1, node, ADDR(pname));
}


It is true you can use the side-effect of 'inline'
to hide the unused-function warnings, but I prefer
to have as few inline markers as possible in *.c files.
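
A tiny user-space illustration of the difference (both forms silence
-Wunused-function, but only one relies on a side effect):

    /* build with: gcc -Wall -Wunused-function demo.c */
    static inline int unused_a(void) { return 1; }  /* hidden by inline's side effect */
    static __attribute__((unused)) int unused_b(void) { return 2; }  /* explicit */

    int main(void) { return 0; }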





> > diff --git a/arch/powerpc/kernel/prom_init.c 
> > b/arch/powerpc/kernel/prom_init.c
> > index f33ff41..85b0719 100644
> > --- a/arch/powerpc/kernel/prom_init.c
> > +++ b/arch/powerpc/kernel/prom_init.c
> > @@ -501,19 +501,19 @@ static int __init prom_next_node(phandle *nodep)
> >   }
> >  }
> >
> > -static inline int prom_getprop(phandle node, const char *pname,
> > +static int __init prom_getprop(phandle node, const char *pname,
> >  void *value, size_t valuelen)
> >  {
> >   return call_prom("getprop", 4, 1, node, ADDR(pname),
> >(u32)(unsigned long) value, (u32) valuelen);
> >  }
> >
> > -static inline int prom_getproplen(phandle node, const char *pname)
> > +static int __init prom_getproplen(phandle node, const char *pname)
> >  {
> >   return call_prom("getproplen", 2, 1, node, ADDR(pname));
> >  }
> >
> > -static void add_string(char **str, const char *q)
> > +static void __init add_string(char **str, const char *q)
> >  {
> >   char *p = *str;
> >
> > @@ -523,7 +523,7 @@ static void add_string(char **str, const char *q)
> >   *str = p;
> >  }
> >
> > -static char *tohex(unsigned int x)
> > +static char __init *tohex(unsigned int x)
> >  {
> >   static const char digits[] __initconst = "0123456789abcdef";
> >   static char result[9] __prombss;
> > @@ -570,7 +570,7 @@ static int __init prom_setprop(phandle node, const char 
> > *nodename,
> >  #define islower(c)   ('a' <= (c) && (c) <= 'z')
> >  #define toupper(c)   (islower(c) ? ((c) - 'a' + 'A') : (c))
> >
> > -static unsigned long prom_strtoul(const char *cp, const char **endp)
> > +static unsigned long __init prom_strtoul(const char *cp, const char **endp)
> >  {
> >   unsigned long result = 0, base = 10, value;
> >
> > @@ -595,7 +595,7 @@ static unsigned long prom_strtoul(const char *cp, const 
> > char **endp)
> >   return result;
> >  }
> >
> > -static unsigned long prom_memparse(const char *ptr, const char **retptr)
> > +static unsigned long __init prom_memparse(const char *ptr, const char 
> > **retptr)
> >  {
> >   unsigned long ret = prom_strtoul(ptr, retptr);
> >   int shift = 0;
> > @@ -2924,7 +2924,7 @@ static void __init fixup_device_tree_pasemi(void)
> >   prom_setprop(iob, name, "device_type", "isa", sizeof("isa"));
> >  }
> >  #else/* !CONFIG_PPC_PASEMI_NEMO */
> > -static inline void fixup_device_tree_pasemi(void) { }
> > +static inline void __init fixup_device_tree_pasemi(void) { }
>
> I don't think we need __init for an empty static inline.

I prefer 'static __init' to 'static inline',
but I can drop this if you are uncomfortable with it.

My work will not be blocked by this.



> >  #endif
> >
> >  static void __init fixup_device_tree(void)
> > @@ -2986,15 +2986,15 @@ static void __init prom_check_initrd(unsigned long 
> > r3, unsigned long r4)
> >
> >  #ifdef CONFIG_PPC64
> >  #ifdef CONFIG_RELOCATABLE
> > -static void reloc_toc(void)
> > +static void __init reloc_toc(void)
> >  {
> >  }
> >
> > -static void unreloc_toc(void)
> > +static void __init unreloc_toc(void)
> >  {
> >  }
>
> Those should be empty static inlines, I'll fix them up.

As I said above, I believe 'static inline' is mostly useful in headers,
but this is up to you.


BTW, I have v2 in hand already.
Shall I send it, if that is convenient for you?

I added __init to enter_prom() as well,
but you may not be comfortable with
replacing inline with __init.





> >  #else
> 

Re: [PATCH 09/19] KVM: PPC: Book3S HV: add a SET_SOURCE control to the XIVE native device

2019-02-05 Thread Cédric Le Goater
On 2/5/19 6:35 AM, David Gibson wrote:
> On Mon, Feb 04, 2019 at 08:07:20PM +0100, Cédric Le Goater wrote:
>> On 2/4/19 5:57 AM, David Gibson wrote:
>>> On Mon, Jan 07, 2019 at 07:43:21PM +0100, Cédric Le Goater wrote:
> [snip]
 +  sb = kvmppc_xive_create_src_block(xive, irq);
 +  if (!sb) {
 +  pr_err("Failed to create block...\n");
 +  return -ENOMEM;
 +  }
 +  }
 +  state = &sb->irq_state[idx];
 +
 +  if (get_user(val, ubufp)) {
 +  pr_err("fault getting user info !\n");
 +  return -EFAULT;
 +  }
 +
 +  /*
 +   * If the source doesn't already have an IPI, allocate
 +   * one and get the corresponding data
 +   */
 +  if (!state->ipi_number) {
 +  state->ipi_number = xive_native_alloc_irq();
 +  if (state->ipi_number == 0) {
 +  pr_err("Failed to allocate IRQ !\n");
 +  return -ENOMEM;
 +  }
>>>
>>> Am I right in thinking this is the point at which a specific guest irq
>>> number gets bound to a specific host irq number?
>>
>> yes. the XIVE IRQ state caches this information and 'state' should be 
>> protected before being assigned, indeed ... The XICS-over-XIVE device
>> also has the same race issue.
>>
>> It's not showing because we are initializing the KVM device sequentially
>> from QEMU and only once.
> 
> Ok.
> 
> So, for the passthrough case, what's the point at which we know that a
> particular guest interrupt needs to be bound to a specific real
> hardware interrupt, rather than a generic IPI?

When the guest driver requests MSIs, VFIO requests a mapping of the
HW irqs in the guest IRQ space. This is putting it very briefly, as
VFIO is a huge framework.

Patch 18 adds some initial support to handle the ESB pages but this 
should be done at the QEMU level.

C. 


Re: [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device

2019-02-05 Thread Cédric Le Goater
On 2/5/19 6:28 AM, David Gibson wrote:
> On Mon, Feb 04, 2019 at 12:30:39PM +0100, Cédric Le Goater wrote:
>> On 2/4/19 5:45 AM, David Gibson wrote:
>>> On Mon, Jan 07, 2019 at 07:43:18PM +0100, Cédric Le Goater wrote:
 This will let the guest create a memory mapping to expose the ESB MMIO
 regions used to control the interrupt sources, to trigger events, to
 EOI or to turn off the sources.

 Signed-off-by: Cédric Le Goater 
 ---
  arch/powerpc/include/uapi/asm/kvm.h   |  4 ++
  arch/powerpc/kvm/book3s_xive_native.c | 97 +++
  2 files changed, 101 insertions(+)

 diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
 b/arch/powerpc/include/uapi/asm/kvm.h
 index 8c876c166ef2..6bb61ba141c2 100644
 --- a/arch/powerpc/include/uapi/asm/kvm.h
 +++ b/arch/powerpc/include/uapi/asm/kvm.h
 @@ -675,4 +675,8 @@ struct kvm_ppc_cpu_char {
  #define  KVM_XICS_PRESENTED   (1ULL << 43)
  #define  KVM_XICS_QUEUED  (1ULL << 44)
  
 +/* POWER9 XIVE Native Interrupt Controller */
 +#define KVM_DEV_XIVE_GRP_CTRL 1
 +#define   KVM_DEV_XIVE_GET_ESB_FD 1
>>>
>>> Introducing a new FD for ESB and TIMA seems overkill.  Can't you get
>>> to both with an mmap() directly on the xive device fd?  Using the
>>> offset to distinguish which one to map, obviously.
>>
>> The page offset would define some sort of user API. It seems feasible.
>> But I am not sure this would be practical in the future if we need to 
>> tune the length.
> 
> Um.. why not?  I mean, yes the XIVE supports rather a lot of
> interrupts, but we have 64-bits of offset we can play with - we can
> leave room for billions of ESB slots and still have room for billions
> of VPs.

So the first 4 pages could be the TIMA pages and then would come  
the pages for the interrupt ESBs. I think that we can have a different
vm_fault handler for each mapping.
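
A sketch of what a single-fd fault dispatch could look like (assumed
layout: four TIMA pages first, then the ESB pages; the helpers are
hypothetical, this is not code from the series):

    #include <linux/mm.h>

    /* hypothetical helpers, declared so the sketch is self-contained */
    vm_fault_t xive_tima_fault(struct vm_fault *vmf);
    vm_fault_t xive_esb_fault(struct vm_fault *vmf);

    static vm_fault_t xive_native_fault_sketch(struct vm_fault *vmf)
    {
            if (vmf->pgoff < 4)
                    return xive_tima_fault(vmf);

            return xive_esb_fault(vmf);
    }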
 
I wonder how this will work out with pass-through. As Paul said in 
a previous email, it would be better to let QEMU request a new 
mapping to handle the ESB pages of the device being passed through.
I guess this is not a special case, just another offset and length.

I will give it a try.

Thanks,

C. 


Re: [PATCH 15/19] KVM: PPC: Book3S HV: add get/set accessors for the source configuration

2019-02-05 Thread Cédric Le Goater
On 2/5/19 6:32 AM, David Gibson wrote:
> On Mon, Feb 04, 2019 at 05:07:28PM +0100, Cédric Le Goater wrote:
>> On 2/4/19 6:21 AM, David Gibson wrote:
>>> On Mon, Jan 07, 2019 at 07:43:27PM +0100, Cédric Le Goater wrote:
These are used to capture the XIVE EAS table of the KVM device, the
configuration of the source targets.

 Signed-off-by: Cédric Le Goater 
 ---
  arch/powerpc/include/uapi/asm/kvm.h   | 11 
  arch/powerpc/kvm/book3s_xive_native.c | 87 +++
  2 files changed, 98 insertions(+)

 diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
 b/arch/powerpc/include/uapi/asm/kvm.h
 index 1a8740629acf..faf024f39858 100644
 --- a/arch/powerpc/include/uapi/asm/kvm.h
 +++ b/arch/powerpc/include/uapi/asm/kvm.h
 @@ -683,9 +683,20 @@ struct kvm_ppc_cpu_char {
  #define   KVM_DEV_XIVE_SAVE_EQ_PAGES  4
  #define KVM_DEV_XIVE_GRP_SOURCES  2   /* 64-bit source attributes */
  #define KVM_DEV_XIVE_GRP_SYNC 3   /* 64-bit source attributes */
 +#define KVM_DEV_XIVE_GRP_EAS  4   /* 64-bit eas attributes */
  
  /* Layout of 64-bit XIVE source attribute values */
  #define KVM_XIVE_LEVEL_SENSITIVE  (1ULL << 0)
  #define KVM_XIVE_LEVEL_ASSERTED   (1ULL << 1)
  
 +/* Layout of 64-bit eas attribute values */
 +#define KVM_XIVE_EAS_PRIORITY_SHIFT   0
 +#define KVM_XIVE_EAS_PRIORITY_MASK0x7
 +#define KVM_XIVE_EAS_SERVER_SHIFT 3
 +#define KVM_XIVE_EAS_SERVER_MASK  0xfff8ULL
 +#define KVM_XIVE_EAS_MASK_SHIFT   32
 +#define KVM_XIVE_EAS_MASK_MASK0x1ULL
 +#define KVM_XIVE_EAS_EISN_SHIFT   33
 +#define KVM_XIVE_EAS_EISN_MASK0xfffeULL
 +
  #endif /* __LINUX_KVM_POWERPC_H */
 diff --git a/arch/powerpc/kvm/book3s_xive_native.c 
 b/arch/powerpc/kvm/book3s_xive_native.c
 index f2de1bcf3b35..0468b605baa7 100644
 --- a/arch/powerpc/kvm/book3s_xive_native.c
 +++ b/arch/powerpc/kvm/book3s_xive_native.c
 @@ -525,6 +525,88 @@ static int kvmppc_xive_native_sync(struct kvmppc_xive 
 *xive, long irq, u64 addr)
return 0;
  }
  
 +static int kvmppc_xive_native_set_eas(struct kvmppc_xive *xive, long irq,
 +u64 addr)
>>>
>>> I'd prefer to avoid the name "EAS" here.  IIUC these aren't "raw" EAS
>>> values, but rather essentially the "source config" in the terminology
>>> of the PAPR hcalls.  Which, yes, is basically implemented by setting
>>> the EAS, but since it's the PAPR architected state that we need to
>>> preserve across migration, I'd prefer to stick as close as we can to
>>> the PAPR terminology.
>>
>> But we don't have an equivalent name in the PAPR specs for the tuple 
>> (prio, server). We could use the generic 'target' name maybe, even
>> if this is usually referring to a CPU number.
> 
> Um.. what?  That's about terminology for one of the fields in this
> thing, not about the name for the thing itself.
> 
>> Or, IVE (Interrupt Vector Entry)? which makes some sense.
>> This was the former name in HW. I think we recycle it for KVM.
> 
> That's a terrible idea, which will make a confusing situation even
> more confusing.

Let's use SOURCE_CONFIG and QUEUE_CONFIG. The KVM ioctls are very 
similar to the hcalls anyhow.
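
For the userspace side, a sketch of packing the 64-bit source
configuration from the masks in the patch's kvm.h (the helper itself is
illustrative, not from the series):

    #include <linux/types.h>

    static __u64 xive_source_config(__u8 prio, __u32 server, int masked,
                                    __u32 eisn)
    {
            __u64 val = 0;

            val |= ((__u64)prio << KVM_XIVE_EAS_PRIORITY_SHIFT) &
                   KVM_XIVE_EAS_PRIORITY_MASK;
            val |= ((__u64)server << KVM_XIVE_EAS_SERVER_SHIFT) &
                   KVM_XIVE_EAS_SERVER_MASK;
            if (masked)
                    val |= 1ULL << KVM_XIVE_EAS_MASK_SHIFT;
            val |= ((__u64)eisn << KVM_XIVE_EAS_EISN_SHIFT) &
                   KVM_XIVE_EAS_EISN_MASK;

            return val;
    }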

C.



Re: [RFC/WIP] powerpc: Fix 32-bit handling of MSR_EE on exceptions

2019-02-05 Thread Benjamin Herrenschmidt
On Tue, 2019-02-05 at 10:45 +0100, Christophe Leroy wrote:
> > > I tested it on the 8xx with the below changes in addition. No issue seen
> > > so far.
> > 
> > Thanks !
> > 
> > I'll merge that in.
> 
> I'm currently working on a refactorisation and simplification of
> exception and syscall entry on ppc32.
> 
> I plan to take your patch in my serie as it helps quite a bit. I hope 
> you don't mind. I expect to come out with a series this week.

Ah ok, you want to take over the series then ? We still need to convert
all the other CPU variants... to be honest I've been distracted, and
taking some time off. I'll be leaving IBM by the end of next week, so I
don't really see myself finishing this work properly.

> > The main obscure area is that business with the irqsoff tracer and thus
> > the need to create stack frames around calls to trace_hardirqs_* ... we
> > do it in some places and not others, but I've not managed to make it
> > crash either. I need to get to the bottom of that, and possibly provide
> > proper macro helpers like ppc64 has to do it.
> 
> I can't see anything special around this in ppc32 code. As far as I 
> understand, a stack frame is put in place when there is a need to
> save and restore some volatile registers. At the places where nothing 
> needs to be saved, nothing is done. I think that's the normal way for 
> any function call, isn't it ?

Not exactly. There's an issue with one of the tracers using
__builtin_return_address(1) which can crash afaik if we don't have
"enough" stack frames on the stack, so there are cases where we need to
create one explicitly around the tracing calls because there's only one
on the actual stack.

I don't know the full details, I was planning on doing a bunch of tests
in sim to figure out exactly what happens and what needs to be done
(and whether our existing code is correct or not), but didn't get to it
so far.
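
A user-space toy showing the hazard (not kernel code; a non-zero
argument to __builtin_return_address() is exactly the unreliable part:
without enough real frames, or without frame pointers, the walk reads
garbage):

    #include <stdio.h>

    /* Like ftrace's CALLER_ADDR0 / CALLER_ADDR1. */
    __attribute__((noinline)) static void trace_site(void)
    {
            printf("caller0 = %p\n", __builtin_return_address(0));
            printf("caller1 = %p\n", __builtin_return_address(1)); /* needs a 2nd frame */
    }

    __attribute__((noinline)) static void middle(void)
    {
            trace_site();   /* two real frames above trace_site(): OK */
    }

    int main(void)
    {
            middle();
            return 0;
    }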

Cheers,
Ben.
 



Re: [RFC/WIP] powerpc: Fix 32-bit handling of MSR_EE on exceptions

2019-02-05 Thread Christophe Leroy




Le 20/12/2018 à 06:40, Benjamin Herrenschmidt a écrit :

Hi folks !

While trying to figure out why we occasionally had lockdep barf about
interrupt state on ppc32 (440 in my case but I could reproduce on e500
as well using qemu), I realized that we are still doing something
rather gothic and wrong on 32-bit which we stopped doing on 64-bit
a while ago.

We have that thing where some handlers "copy" the EE value from the
original stack frame into the new MSR before transferring to the
handler.

Thus for a number of exceptions, we enter the handlers with interrupts
enabled.

This is rather fishy, some of the stuff that handlers might do early
on such as irq_enter/exit or user_exit, context tracking, etc... should
be run with interrupts off afaik.

Generally our handlers know when to re-enable interrupts if needed
(though some of the FSL specific SPE ones don't).

The problem we were having is that we assumed these interrupts would
return with interrupts enabled. However that isn't the case.

Instead, this changes things so that we always enter exception handlers
with interrupts *off* with the notable exception of syscalls which are
special (and get a fast path).

Currently, the patch only changes BookE (440 and E5xx tested in qemu),
the same recipe needs to be applied to 6xx, 8xx and 40x.

Also I'm not sure whether we need to create a stack frame around some
of the calls to trace_hardirqs_* in asm. ppc64 does it, due to problems
with the irqsoff tracer, but I haven't managed to reproduce those
issues. We need to look into it a bit more.

I'll work more on this in the next few days, comments appreciated.

Not-signed-off-by: Benjamin Herrenschmidt 

---
  arch/powerpc/kernel/entry_32.S   | 113 ++-
  arch/powerpc/kernel/head_44x.S   |   9 +--
  arch/powerpc/kernel/head_booke.h |  34 ---
  arch/powerpc/kernel/head_fsl_booke.S |  28 -
  arch/powerpc/kernel/traps.c  |   8 +++
  5 files changed, 111 insertions(+), 81 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 3841d74..39b4cb5 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -34,6 +34,9 @@
  #include 
  #include 
  #include 
+#include 
+#include 
+#include 
  
  /*

   * MSR_KERNEL is > 0x1 on 4xx/Book-E since it include MSR_CE.
@@ -205,20 +208,46 @@ transfer_to_handler_cont:
mflrr9
lwz r11,0(r9)   /* virtual address of handler */
lwz r9,4(r9)/* where to go when done */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
+   mtspr   SPRN_NRI, r0
+#endif
+
  #ifdef CONFIG_TRACE_IRQFLAGS
+   /*
+* When tracing IRQ state (lockdep) we enable the MMU before we call
+* the IRQ tracing functions as they might access vmalloc space or
+* perform IOs for console output.
+*
+* To speed up the syscall path where interrupts stay on, let's check
+* first if we are changing the MSR value at all.
+*/
+   lwz r12,_MSR(r1)


This one cannot work. MMU is not reenabled yet, so r1 cannot be used. 
And r11 now has the virt address of handler, so can't be used either.


Christophe


+   xor r0,r10,r12
+   andi.   r0,r0,MSR_EE
+   bne 1f
+
+   /* MSR isn't changing, just transition directly */
+   lwz r0,GPR0(r1)
+   mtspr   SPRN_SRR0,r11
+   mtspr   SPRN_SRR1,r10
+   mtlrr9
+   SYNC
+   RFI
+
+1: /* MSR is changing, re-enable MMU so we can notify lockdep. We need to
+* keep interrupts disabled at this point otherwise we might risk
+* taking an interrupt before we tell lockdep they are enabled.
+*/
lis r12,reenable_mmu@h
ori r12,r12,reenable_mmu@l
+   lis r0,MSR_KERNEL@h
+   ori r0,r0,MSR_KERNEL@l
mtspr   SPRN_SRR0,r12
-   mtspr   SPRN_SRR1,r10
+   mtspr   SPRN_SRR1,r0
SYNC
RFI
-reenable_mmu:  /* re-enable mmu so we can */
-   mfmsr   r10
-   lwz r12,_MSR(r1)
-   xor r10,r10,r12
-   andi.   r10,r10,MSR_EE  /* Did EE change? */
-   beq 1f
  
+reenable_mmu:

/*
 * The trace_hardirqs_off will use CALLER_ADDR0 and CALLER_ADDR1.
 * If from user mode there is only one stack frame on the stack, and
@@ -239,8 +268,29 @@ reenable_mmu:  /* re-enable mmu so we can */
stw r3,16(r1)
stw r4,20(r1)
stw r5,24(r1)
-   bl  trace_hardirqs_off
-   lwz r5,24(r1)
+
+   /* Are we enabling or disabling interrupts ? */
+   andi.   r0,r10,MSR_EE
+   beq 1f
+
+   /* If we are enabling interrupt, this is a syscall. They shouldn't
+* happen while interrupts are disabled, so let's do a warning here.
+*/
+0: trap
+   EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BU

Re: [PATCH 17/19] KVM: PPC: Book3S HV: add get/set accessors for the VP XIVE state

2019-02-05 Thread Cédric Le Goater
On 2/5/19 6:33 AM, David Gibson wrote:
> On Mon, Feb 04, 2019 at 07:57:26PM +0100, Cédric Le Goater wrote:
>> On 2/4/19 6:26 AM, David Gibson wrote:
>>> On Mon, Jan 07, 2019 at 08:10:04PM +0100, Cédric Le Goater wrote:
 At a VCPU level, the state of the thread context interrupt management
 registers needs to be collected. These registers are cached under the
 'xive_saved_state.w01' field of the VCPU when the VPCU context is
 pulled from the HW thread. An OPAL call retrieves the backup of the
 IPB register in the NVT structure and merges it in the KVM state.

 The structures of the interface between QEMU and KVM provisions some
 extra room (two u64) for further extensions if more state needs to be
 transferred back to QEMU.

 Signed-off-by: Cédric Le Goater 
 ---
  arch/powerpc/include/asm/kvm_ppc.h|  5 ++
  arch/powerpc/include/uapi/asm/kvm.h   |  2 +
  arch/powerpc/kvm/book3s.c | 24 +
  arch/powerpc/kvm/book3s_xive_native.c | 78 +++
  4 files changed, 109 insertions(+)

 diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
 b/arch/powerpc/include/asm/kvm_ppc.h
 index 4cc897039485..49c488af168c 100644
 --- a/arch/powerpc/include/asm/kvm_ppc.h
 +++ b/arch/powerpc/include/asm/kvm_ppc.h
 @@ -270,6 +270,7 @@ union kvmppc_one_reg {
u64 addr;
u64 length;
}   vpaval;
 +  u64 xive_timaval[4];
  };
  
  struct kvmppc_ops {
 @@ -603,6 +604,8 @@ extern void kvmppc_xive_native_cleanup_vcpu(struct 
 kvm_vcpu *vcpu);
  extern void kvmppc_xive_native_init_module(void);
  extern void kvmppc_xive_native_exit_module(void);
  extern int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 cmd);
 +extern int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu, union 
 kvmppc_one_reg *val);
 +extern int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, union 
 kvmppc_one_reg *val);
  
  #else
  static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 
 server,
 @@ -637,6 +640,8 @@ static inline void 
 kvmppc_xive_native_init_module(void) { }
  static inline void kvmppc_xive_native_exit_module(void) { }
  static inline int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 cmd)
{ return 0; }
 +static inline int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu, union 
 kvmppc_one_reg *val) { return 0; }
 +static inline int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, union 
 kvmppc_one_reg *val) { return -ENOENT; }
>>>
>>> IIRC "VP" is the old name for "TCTX".  Since we're using tctx in the
>>> rest of the XIVE code, can we use it here as well.
>>
>> OK. The state we are getting or setting is indeed related to the thread 
>> interrupt  context registers. 
>>
>> The name VP is related to an identifier to some interrupt context under 
>> OPAL (NVT in HW to be precise).
> 
> Oh, sorry, "NVT" was the name I was looking for, not "TCTX".  But in
> any case, please lets standardize on one.

There is some confusion in the naming:

 - VPVirtual Processor (XIVE 1)
 - VPD   Virtual Processor Descriptor (XIVE 1)
 - TCTX  Thread interrupt context registers
 - NVT   Notify Virtual Target. Former VP. 
 - NVTS  Notify Virtual Target Structure. Where the TCTX regs are cached.


I am fine with using NVT because this is indeed the name of the XIVE 
structure where the HW caches the thread interrupt context registers.

But the XIVE native layer and the XICS-over-XIVE KVM device use the
name VP (the old one). I don't think we want to change these now.

C. 


Re: [PATCH] hugetlb: allow to free gigantic pages regardless of the configuration

2019-02-05 Thread Alex Ghiti

On 2/5/19 6:23 AM, Michael Ellerman wrote:

Alexandre Ghiti  writes:


From: Alexandre Ghiti 

On systems without CMA or (MEMORY_ISOLATION && COMPACTION) activated but
that support gigantic pages, boottime reserved gigantic pages cannot be
freed at all. This patch simply enables the possibility of handing those
pages back to the memory allocator.

This commit then renames gigantic_page_supported and
ARCH_HAS_GIGANTIC_PAGE to make them more accurate. Indeed, those values
being false does not mean that the system cannot use gigantic pages: it
just means that runtime allocation of gigantic pages is not supported;
one can still allocate boottime gigantic pages if the architecture supports
it.

Signed-off-by: Alexandre Ghiti 
---

- Compiled on all architectures
- Tested on riscv architecture

  arch/arm64/Kconfig   |  2 +-
  arch/arm64/include/asm/hugetlb.h |  7 +++--
  arch/powerpc/include/asm/book3s/64/hugetlb.h |  4 +--
  arch/powerpc/platforms/Kconfig.cputype   |  2 +-

The powerpc parts look fine.

Acked-by: Michael Ellerman  (powerpc)


Thank you Michael,

Alex



cheers


  arch/s390/Kconfig|  2 +-
  arch/s390/include/asm/hugetlb.h  |  7 +++--
  arch/x86/Kconfig |  2 +-
  arch/x86/include/asm/hugetlb.h   |  7 +++--
  fs/Kconfig   |  2 +-
  include/linux/gfp.h  |  2 +-
  mm/hugetlb.c | 43 +++-
  mm/page_alloc.c  |  4 +--
  12 files changed, 48 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4168d366127..18239cbd7fcd 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -18,7 +18,7 @@ config ARM64
select ARCH_HAS_FAST_MULTIPLIER
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
-   select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
+   select ARCH_HAS_GIGANTIC_PAGE_RUNTIME_ALLOCATION if (MEMORY_ISOLATION && COMPACTION) || CMA
select ARCH_HAS_KCOV
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_PTE_SPECIAL
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index fb6609875455..797fc77eabcd 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -65,8 +65,11 @@ extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
  
  #include 
  
-#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE

-static inline bool gigantic_page_supported(void) { return true; }
+#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE_RUNTIME_ALLOCATION
+static inline bool gigantic_page_runtime_allocation_supported(void)
+{
+   return true;
+}
  #endif
  
  #endif /* __ASM_HUGETLB_H */

diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h 
b/arch/powerpc/include/asm/book3s/64/hugetlb.h
index 5b0177733994..7711f0e2c7e5 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -32,8 +32,8 @@ static inline int hstate_get_psize(struct hstate *hstate)
}
  }
  
-#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE

-static inline bool gigantic_page_supported(void)
+#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE_RUNTIME_ALLOCATION
+static inline bool gigantic_page_runtime_allocation_supported(void)
  {
return true;
  }
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 8c7464c3f27f..779e06bac697 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -319,7 +319,7 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
  config PPC_RADIX_MMU
bool "Radix MMU Support"
depends on PPC_BOOK3S_64
-   select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
+   select ARCH_HAS_GIGANTIC_PAGE_RUNTIME_ALLOCATION if (MEMORY_ISOLATION && COMPACTION) || CMA
default y
help
  Enable support for the Power ISA 3.0 Radix style MMU. Currently this
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index ed554b09eb3f..6776eef6a9ae 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -69,7 +69,7 @@ config S390
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
-   select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
+   select ARCH_HAS_GIGANTIC_PAGE_RUNTIME_ALLOCATION if (MEMORY_ISOLATION && COMPACTION) || CMA
select ARCH_HAS_KCOV
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SET_MEMORY
diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
index 2d1afa58a4b6..57c952f5388e 100644
--- a/arch/s390/include/asm/hugetlb.h
+++ b/arch/s390/include/asm/hugetlb.h
@@ -116,7 +116,10 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
return pte_modify(pte, newprot);
  }

[PATCH v16 21/21] powerpc: clean stack pointers naming

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

Some stack pointers used to also be thread_info pointers
and were called tp. Now that they are only stack pointers,
rename them sp.

Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/irq.c  | 17 +++--
 arch/powerpc/kernel/setup_64.c | 11 +++
 2 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 938944c6e2ee..8a936723c791 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -659,21 +659,21 @@ void __do_irq(struct pt_regs *regs)
 void do_IRQ(struct pt_regs *regs)
 {
struct pt_regs *old_regs = set_irq_regs(regs);
-   void *curtp, *irqtp, *sirqtp;
+   void *cursp, *irqsp, *sirqsp;
 
/* Switch to the irq stack to handle this */
-   curtp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
-   irqtp = hardirq_ctx[raw_smp_processor_id()];
-   sirqtp = softirq_ctx[raw_smp_processor_id()];
+   cursp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
+   irqsp = hardirq_ctx[raw_smp_processor_id()];
+   sirqsp = softirq_ctx[raw_smp_processor_id()];
 
/* Already there ? */
-   if (unlikely(curtp == irqtp || curtp == sirqtp)) {
+   if (unlikely(cursp == irqsp || cursp == sirqsp)) {
__do_irq(regs);
set_irq_regs(old_regs);
return;
}
/* Switch stack and call */
-   call_do_irq(regs, irqtp);
+   call_do_irq(regs, irqsp);
 
set_irq_regs(old_regs);
 }
@@ -695,10 +695,7 @@ void *hardirq_ctx[NR_CPUS] __read_mostly;
 
 void do_softirq_own_stack(void)
 {
-   void *irqtp;
-
-   irqtp = softirq_ctx[smp_processor_id()];
-   call_do_softirq(irqtp);
+   call_do_softirq(softirq_ctx[smp_processor_id()]);
 }
 
 irq_hw_number_t virq_to_hw(unsigned int virq)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 2db1c5f7d141..daa361fc6a24 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -716,19 +716,14 @@ void __init emergency_stack_init(void)
limit = min(ppc64_bolted_size(), ppc64_rma_size);
 
for_each_possible_cpu(i) {
-   void *ti;
-
-   ti = alloc_stack(limit, i);
-   paca_ptrs[i]->emergency_sp = ti + THREAD_SIZE;
+   paca_ptrs[i]->emergency_sp = alloc_stack(limit, i) + THREAD_SIZE;
 
 #ifdef CONFIG_PPC_BOOK3S_64
/* emergency stack for NMI exception handling. */
-   ti = alloc_stack(limit, i);
-   paca_ptrs[i]->nmi_emergency_sp = ti + THREAD_SIZE;
+   paca_ptrs[i]->nmi_emergency_sp = alloc_stack(limit, i) + THREAD_SIZE;
 
/* emergency stack for machine check exception handling. */
-   ti = alloc_stack(limit, i);
-   paca_ptrs[i]->mc_emergency_sp = ti + THREAD_SIZE;
+   paca_ptrs[i]->mc_emergency_sp = alloc_stack(limit, i) + THREAD_SIZE;
 #endif
}
 }
-- 
2.20.1



[PATCH v16 20/21] powerpc/64: Replace CURRENT_THREAD_INFO with PACA_CURRENT_TI

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

Now that current_thread_info is located at the beginning of 'current'
task struct, CURRENT_THREAD_INFO macro is not really needed any more.

This patch replaces it by loads of the value at PACA_CURRENT_TI(r13).
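
The single load works because, with CONFIG_THREAD_INFO_IN_TASK,
thread_info is required to be the first member of task_struct, so the
extra offsetof() term is zero and the loaded 'current' pointer doubles
as the thread_info pointer. A sketch of the invariant being relied on:

    #include <linux/sched.h>
    #include <linux/stddef.h>

    /* sketch of the assumption behind PACA_CURRENT_TI */
    _Static_assert(offsetof(struct task_struct, thread_info) == 0,
                   "thread_info must be the first member of task_struct");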

Signed-off-by: Christophe Leroy 
[mpe: Add PACA_CURRENT_TI rather than using PACACURRENT]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/exception-64s.h   |  4 ++--
 arch/powerpc/include/asm/thread_info.h |  4 
 arch/powerpc/kernel/asm-offsets.c  |  2 ++
 arch/powerpc/kernel/entry_64.S | 10 +-
 arch/powerpc/kernel/exceptions-64e.S   |  2 +-
 arch/powerpc/kernel/exceptions-64s.S   |  2 +-
 arch/powerpc/kernel/idle_book3e.S  |  2 +-
 arch/powerpc/kernel/idle_power4.S  |  2 +-
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  6 +++---
 9 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 3b4767ed3ec5..f0f0ff192e87 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -671,7 +671,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define RUNLATCH_ON\
 BEGIN_FTR_SECTION  \
-   CURRENT_THREAD_INFO(r3, r1);\
+   ld  r3, PACA_CURRENT_TI(r13);   \
ld  r4,TI_LOCAL_FLAGS(r3);  \
andi.   r0,r4,_TLF_RUNLATCH;\
beqlppc64_runlatch_on_trampoline;   \
@@ -721,7 +721,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
 #ifdef CONFIG_PPC_970_NAP
 #define FINISH_NAP \
 BEGIN_FTR_SECTION  \
-   CURRENT_THREAD_INFO(r11, r1);   \
+   ld  r11, PACA_CURRENT_TI(r13);  \
ld  r9,TI_LOCAL_FLAGS(r11); \
andi.   r10,r9,_TLF_NAPPING;\
bnelpower4_fixup_nap;   \
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index c959b8d66cac..8e1d0195ac36 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -17,10 +17,6 @@
 
 #define THREAD_SIZE(1 << THREAD_SHIFT)
 
-#ifdef CONFIG_PPC64
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(ld dest, PACACURRENT(r13))
-#endif
-
 #ifndef __ASSEMBLY__
 #include 
 #include 
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 03439785c2ea..7a1b93c5af63 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -179,6 +179,8 @@ int main(void)
OFFSET(PACAPROCSTART, paca_struct, cpu_start);
OFFSET(PACAKSAVE, paca_struct, kstack);
OFFSET(PACACURRENT, paca_struct, __current);
+   DEFINE(PACA_CURRENT_TI, offsetof(struct paca_struct, __current) +
+   offsetof(struct task_struct, thread_info));
OFFSET(PACASAVEDMSR, paca_struct, saved_msr);
OFFSET(PACAR1, paca_struct, saved_r1);
OFFSET(PACATOC, paca_struct, kernel_toc);
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 21f1cb4d464e..259fcc82ec75 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -166,7 +166,7 @@ system_call:/* label this so stack traces look sane */
li  r10,IRQS_ENABLED
std r10,SOFTE(r1)
 
-   CURRENT_THREAD_INFO(r11, r1)
+   ld  r11, PACA_CURRENT_TI(r13)
ld  r10,TI_FLAGS(r11)
andi.   r11,r10,_TIF_SYSCALL_DOTRACE
bne .Lsyscall_dotrace   /* does not return */
@@ -213,7 +213,7 @@ system_call:/* label this so stack traces look sane */
ld  r3,RESULT(r1)
 #endif
 
-   CURRENT_THREAD_INFO(r12, r1)
+   ld  r12, PACA_CURRENT_TI(r13)
 
ld  r8,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3S
@@ -346,7 +346,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
/* Repopulate r9 and r10 for the syscall path */
addir9,r1,STACK_FRAME_OVERHEAD
-   CURRENT_THREAD_INFO(r10, r1)
+   ld  r10, PACA_CURRENT_TI(r13)
ld  r10,TI_FLAGS(r10)
 
cmpldi  r0,NR_syscalls
@@ -740,7 +740,7 @@ _GLOBAL(ret_from_except_lite)
mtmsrd  r10,1 /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
 
-   CURRENT_THREAD_INFO(r9, r1)
+   ld  r9, PACA_CURRENT_TI(r13)
ld  r3,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3E
ld  r10,PACACURRENT(r13)
@@ -854,7 +854,7 @@ _GLOBAL(ret_from_except_lite)
 1: bl  preempt_schedule_irq
 
/* Re-test flags and eventually loop */
-   CURRENT_THREAD_INFO(r9, r1)
+   ld  r9, PACA_CURRENT_TI(r13)
ld  r4,TI_FLAGS(r9)
andi.   r0,r4,_TIF_NEED_RESCHED
bne 1b
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b

[PATCH v16 19/21] powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

Now that thread_info is similar to task_struct, its address is in r2
so CURRENT_THREAD_INFO() macro is useless. This patch removes it.

This patch also moves the 'tovirt(r2, r2)' down just before the
reactivation of MMU translation, so that we keep the physical address
of 'current' in r2 until then. It avoids a few calls to tophys().

At the same time, as the 'cpu' field is not anymore in thread_info,
TI_CPU is renamed TASK_CPU by this patch.

It also allows us to get rid of a couple of
'#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE' as ACCOUNT_CPU_USER_ENTRY()
and ACCOUNT_CPU_USER_EXIT() are empty when
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not defined.

Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Makefile  |  2 +-
 arch/powerpc/include/asm/thread_info.h |  2 -
 arch/powerpc/kernel/asm-offsets.c  |  2 +-
 arch/powerpc/kernel/entry_32.S | 55 +-
 arch/powerpc/kernel/epapr_hcalls.S |  5 +--
 arch/powerpc/kernel/head_fsl_booke.S   |  5 +--
 arch/powerpc/kernel/idle_6xx.S |  9 ++---
 arch/powerpc/kernel/idle_e500.S|  8 ++--
 arch/powerpc/kernel/misc_32.S  |  3 +-
 arch/powerpc/mm/hash_low_32.S  | 14 +++
 arch/powerpc/sysdev/6xx-suspend.S  |  5 +--
 11 files changed, 38 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 53ffe935f3b0..7de49889bd5d 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -431,7 +431,7 @@ ifdef CONFIG_SMP
 prepare: task_cpu_prepare
 
 task_cpu_prepare: prepare0
-   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") print $$3;}' include/generated/asm-offsets.h))
+   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TASK_CPU") print $$3;}' include/generated/asm-offsets.h))
 endif
 
 # Check toolchain versions:
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index d91523c2c7d8..c959b8d66cac 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -19,8 +19,6 @@
 
 #ifdef CONFIG_PPC64
 #define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(ld dest, PACACURRENT(r13))
-#else
-#define CURRENT_THREAD_INFO(dest, sp)  stringify_in_c(mr dest, r2)
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 94ac190a0b16..03439785c2ea 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -96,7 +96,7 @@ int main(void)
 #endif /* CONFIG_PPC64 */
OFFSET(TASK_STACK, task_struct, stack);
 #ifdef CONFIG_SMP
-   OFFSET(TI_CPU, task_struct, cpu);
+   OFFSET(TASK_CPU, task_struct, cpu);
 #endif
 
 #ifdef CONFIG_LIVEPATCH
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index aea22c7b891f..a5e2d5585dcb 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -151,7 +151,6 @@
stw r2,_XER(r11)
mfspr   r12,SPRN_SPRG_THREAD
addir2,r12,-THREAD
-   tovirt(r2,r2)   /* set r2 to current */
beq 2f  /* if from user, fix up THREAD.regs */
addir11,r1,STACK_FRAME_OVERHEAD
stw r11,PT_REGS(r12)
@@ -161,11 +160,7 @@
lwz r12,THREAD_DBCR0(r12)
andis.  r12,r12,DBCR0_IDM@h
 #endif
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9, r9)
-   ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
-#endif
+   ACCOUNT_CPU_USER_ENTRY(r2, r11, r12)
 #if defined(CONFIG_40x) || defined(CONFIG_BOOKE)
beq+3f
/* From user and task is ptraced - load up global dbcr0 */
@@ -175,8 +170,7 @@
tophys(r11,r11)
addir11,r11,global_dbcr0@l
 #ifdef CONFIG_SMP
-   CURRENT_THREAD_INFO(r9, r1)
-   lwz r9,TI_CPU(r9)
+   lwz r9,TASK_CPU(r2)
slwir9,r9,3
add r11,r11,r9
 #endif
@@ -197,9 +191,7 @@
ble-stack_ovf   /* then the kernel stack overflowed */
 5:
 #if defined(CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500)
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9,r9)   /* check local flags */
-   lwz r12,TI_LOCAL_FLAGS(r9)
+   lwz r12,TI_LOCAL_FLAGS(r2)
mtcrf   0x01,r12
bt- 31-TLF_NAPPING,4f
bt- 31-TLF_SLEEPING,7f
@@ -208,6 +200,7 @@
 transfer_to_handler_cont:
 3:
	mflr	r9
+   tovirt(r2, r2)  /* set r2 to current */
lwz r11,0(r9)   /* virtual address of handler */
lwz r9,4(r9)/* where to go when done */
 #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
@@ -271,11 +264,11 @@ reenable_mmu: /* re-enable mmu so we can */
 
 #if defined (CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500)
 4: rlwinm  r12,r12,0,~_TLF

[PATCH v16 18/21] powerpc: 'current_set' is now a table of task_struct pointers

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

The table of pointers 'current_set' has been used for retrieving
the stack and current. They used to be thread_info pointers as
they pointed to the stack, and current was taken from the
'task' field of the thread_info.

Now the pointers of the 'current_set' table are both pointers
to task_struct and pointers to thread_info.

As they are used to get current, and the stack pointer is
retrieved from current's stack field, this patch changes
their type to task_struct, and renames secondary_ti to
secondary_current.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/asm-prototypes.h |  4 ++--
 arch/powerpc/kernel/head_32.S |  6 +++---
 arch/powerpc/kernel/head_44x.S|  4 ++--
 arch/powerpc/kernel/head_fsl_booke.S  |  4 ++--
 arch/powerpc/kernel/smp.c | 10 --
 5 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 1d911f68a23b..1484df6779ab 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -23,8 +23,8 @@
 #include 
 
 /* SMP */
-extern struct thread_info *current_set[NR_CPUS];
-extern struct thread_info *secondary_ti;
+extern struct task_struct *current_set[NR_CPUS];
+extern struct task_struct *secondary_current;
 void start_secondary(void *unused);
 
 /* kexec */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 309a45779ad5..146385b1c2da 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -846,9 +846,9 @@ _ENTRY(copy_and_flush)
 #endif /* CONFIG_PPC_BOOK3S_32 */
 
/* get current's stack and current */
-   lis r1,secondary_ti@ha
-   tophys(r1,r1)
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   tophys(r2,r2)
+   lwz r2,secondary_current@l(r2)
tophys(r1,r2)
lwz r1,TASK_STACK(r1)
 
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index f94a93b6c2f2..37117ab11584 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1020,8 +1020,8 @@ _GLOBAL(start_secondary_47x)
/* Now we can get our task struct and real stack pointer */
 
/* Get current's stack and current */
-   lis r1,secondary_ti@ha
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   lwz r2,secondary_current@l(r2)
lwz r1,TASK_STACK(r2)
 
/* Current stack pointer */
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index 11f38adbe020..4ed2a7c8e89b 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -1091,8 +1091,8 @@ _GLOBAL(set_context)
bl  call_setup_cpu
 
/* get current's stack and current */
-   lis r1,secondary_ti@ha
-   lwz r2,secondary_ti@l(r1)
+   lis r2,secondary_current@ha
+   lwz r2,secondary_current@l(r2)
lwz r1,TASK_STACK(r2)
 
/* stack */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index aa4517686f90..a41fa8924004 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -76,7 +76,7 @@
 static DEFINE_PER_CPU(int, cpu_state) = { 0 };
 #endif
 
-struct thread_info *secondary_ti;
+struct task_struct *secondary_current;
 bool has_big_cores;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
@@ -664,7 +664,7 @@ void smp_send_stop(void)
 }
 #endif /* CONFIG_NMI_IPI */
 
-struct thread_info *current_set[NR_CPUS];
+struct task_struct *current_set[NR_CPUS];
 
 static void smp_store_cpu_info(int id)
 {
@@ -929,7 +929,7 @@ void smp_prepare_boot_cpu(void)
paca_ptrs[boot_cpuid]->__current = current;
 #endif
set_numa_node(numa_cpu_lookup_table[boot_cpuid]);
-   current_set[boot_cpuid] = task_thread_info(current);
+   current_set[boot_cpuid] = current;
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -1014,15 +1014,13 @@ static bool secondaries_inhibited(void)
 
 static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 {
-   struct thread_info *ti = task_thread_info(idle);
-
 #ifdef CONFIG_PPC64
paca_ptrs[cpu]->__current = idle;
paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) +
 THREAD_SIZE - STACK_FRAME_OVERHEAD;
 #endif
idle->cpu = cpu;
-   secondary_ti = current_set[cpu] = ti;
+   secondary_current = current_set[cpu] = idle;
 }
 
 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
-- 
2.20.1



[PATCH v16 17/21] powerpc: regain entire stack space

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

thread_info is no longer in the stack, so the entire stack
can now be used.

There is also no longer any risk of corrupting task_cpu(p) with a
stack overflow, so this patch removes that test.

When doing this, an explicit test for a NULL stack pointer is
needed in validate_sp(), as it is no longer implicitly covered
by the sizeof(thread_info) gap.
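
A minimal sketch of such a check (illustrative, not the exact hunk):

	/* A NULL sp used to fail the old lower bound
	 * 'sp >= stack_page + sizeof(struct thread_info)' implicitly;
	 * with that gap gone, reject it explicitly. */
	if (!sp)
		return 0;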

In the meantime, with the previous patch, pointers to the stacks
are no longer pointers to thread_info, so this patch changes them
to void*.

Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/irq.h   | 10 -
 arch/powerpc/include/asm/processor.h |  3 +--
 arch/powerpc/kernel/asm-offsets.c|  1 -
 arch/powerpc/kernel/entry_32.S   | 14 
 arch/powerpc/kernel/irq.c| 19 -
 arch/powerpc/kernel/misc_32.S|  6 ++
 arch/powerpc/kernel/process.c| 32 +++-
 arch/powerpc/kernel/setup_64.c   |  8 +++
 8 files changed, 38 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 28a7ace0a1b9..c91a60cda4fa 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -48,16 +48,16 @@ struct pt_regs;
  * Per-cpu stacks for handling critical, debug and machine check
  * level interrupts.
  */
-extern struct thread_info *critirq_ctx[NR_CPUS];
-extern struct thread_info *dbgirq_ctx[NR_CPUS];
-extern struct thread_info *mcheckirq_ctx[NR_CPUS];
+extern void *critirq_ctx[NR_CPUS];
+extern void *dbgirq_ctx[NR_CPUS];
+extern void *mcheckirq_ctx[NR_CPUS];
 #endif
 
 /*
  * Per-cpu stacks for handling hard and soft interrupts.
  */
-extern struct thread_info *hardirq_ctx[NR_CPUS];
-extern struct thread_info *softirq_ctx[NR_CPUS];
+extern void *hardirq_ctx[NR_CPUS];
+extern void *softirq_ctx[NR_CPUS];
 
 void call_do_softirq(void *sp);
 void call_do_irq(struct pt_regs *regs, void *sp);
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 9226fb83a82e..ba2f0bc680e4 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -267,8 +267,7 @@ struct thread_struct {
 #define ARCH_MIN_TASKALIGN 16
 
 #define INIT_SP	(sizeof(init_stack) + (unsigned long)&init_stack)
-#define INIT_SP_LIMIT \
-   (_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)&init_stack)
+#define INIT_SP_LIMIT  ((unsigned long)&init_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 1fb52206c106..94ac190a0b16 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -92,7 +92,6 @@ int main(void)
DEFINE(SIGSEGV, SIGSEGV);
DEFINE(NMI_MASK, NMI_MASK);
 #else
-   DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
OFFSET(TASK_STACK, task_struct, stack);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 3255c0840beb..aea22c7b891f 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -97,14 +97,11 @@
mfspr   r0,SPRN_SRR1
stw r0,_SRR1(r11)
 
-   /* set the stack limit to the current stack
-* and set the limit to protect the thread_info
-* struct
-*/
+   /* set the stack limit to the current stack */
mfspr   r8,SPRN_SPRG_THREAD
lwz r0,KSP_LIMIT(r8)
stw r0,SAVED_KSP_LIMIT(r11)
-   rlwimi  r0,r1,0,0,(31-THREAD_SHIFT)
+   rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
 #endif
@@ -121,14 +118,11 @@
mfspr   r0,SPRN_SRR1
stw r0,crit_srr1@l(0)
 
-   /* set the stack limit to the current stack
-* and set the limit to protect the thread_info
-* struct
-*/
+   /* set the stack limit to the current stack */
mfspr   r8,SPRN_SPRG_THREAD
lwz r0,KSP_LIMIT(r8)
stw r0,saved_ksp_limit@l(0)
-   rlwimi  r0,r1,0,0,(31-THREAD_SHIFT)
+   rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
 #endif
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 85c48911938a..938944c6e2ee 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -618,9 +618,8 @@ static inline void check_stack_overflow(void)
sp = current_stack_pointer() & (THREAD_SIZE-1);
 
/* check for stack overflow: is there less than 2KB free? */
-   if (unlikely(sp < (sizeof(struct thread_info) + 2048))) {
-   pr_err("do_IRQ: stack overflow: %ld\n",
-   sp - sizeof(struct thread_info));
+   if (unlikely(sp < 2048)) {
+   pr_err("do_IRQ: stack overflow: %ld\n

[PATCH v16 16/21] powerpc: Activate CONFIG_THREAD_INFO_IN_TASK

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

This patch activates CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
  - It protects thread_info from corruption in the case of stack
overflows.
  - Its address is harder to determine if stack addresses are leaked,
making a number of attacks more difficult.

This has the following consequences:
  - thread_info is now located at the beginning of task_struct.
  - The 'cpu' field is now in task_struct, and only exists when
CONFIG_SMP is active.
  - thread_info no longer has the 'task' field.

This patch:
  - Removes all recopy of thread_info struct when the stack changes.
  - Changes the CURRENT_THREAD_INFO() macro to point to current.
  - Selects CONFIG_THREAD_INFO_IN_TASK.
  - Modifies raw_smp_processor_id() to get ->cpu from current without
    including linux/sched.h, to avoid circular inclusion, and without
    including asm/asm-offsets.h, to avoid symbol name duplication
    between ASM constants and C constants (see the sketch after this
    list).
  - Modifies klp_init_thread_info() to take a task_struct pointer
argument.
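
The raw_smp_processor_id() change ends up looking roughly like this
(a sketch: _TASK_CPU is the byte offset of task_struct.cpu, injected
on the compiler command line by the Makefile rule visible in the diff
below):

	/* asm/smp.h -- no linux/sched.h or asm/asm-offsets.h needed */
	#define raw_smp_processor_id() \
		(*(unsigned int *)((void *)current + _TASK_CPU))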

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
[mpe: Add task_stack.h to livepatch.h to fix build fails]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/Makefile  |  7 +++
 arch/powerpc/include/asm/irq.h |  4 --
 arch/powerpc/include/asm/livepatch.h   |  7 ++-
 arch/powerpc/include/asm/smp.h | 17 +-
 arch/powerpc/include/asm/thread_info.h | 17 +-
 arch/powerpc/kernel/asm-offsets.c  |  7 ++-
 arch/powerpc/kernel/entry_32.S |  9 ++-
 arch/powerpc/kernel/exceptions-64e.S   | 11 
 arch/powerpc/kernel/head_32.S  |  6 +-
 arch/powerpc/kernel/head_44x.S |  4 +-
 arch/powerpc/kernel/head_64.S  |  1 +
 arch/powerpc/kernel/head_booke.h   |  8 +--
 arch/powerpc/kernel/head_fsl_booke.S   |  7 +--
 arch/powerpc/kernel/irq.c  | 79 +-
 arch/powerpc/kernel/kgdb.c | 28 -
 arch/powerpc/kernel/machine_kexec_64.c |  6 +-
 arch/powerpc/kernel/process.c  |  2 +-
 arch/powerpc/kernel/setup-common.c |  2 +-
 arch/powerpc/kernel/setup_64.c | 21 ---
 arch/powerpc/kernel/smp.c  |  2 +-
 arch/powerpc/net/bpf_jit32.h   |  5 +-
 22 files changed, 57 insertions(+), 194 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 08908219fba9..3f237ffa0649 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -241,6 +241,7 @@ config PPC
select RTC_LIB
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
+   select THREAD_INFO_IN_TASK
select VIRT_TO_BUS  if !PPC64
#
# Please keep this list sorted alphabetically.
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index ac033341ed55..53ffe935f3b0 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -427,6 +427,13 @@ else
 endif
 endif
 
+ifdef CONFIG_SMP
+prepare: task_cpu_prepare
+
+task_cpu_prepare: prepare0
+   $(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == "TI_CPU") print $$3;}' include/generated/asm-offsets.h))
+endif
+
 # Check toolchain versions:
 # - gcc-4.6 is the minimum kernel-wide version so nothing required.
 checkbin:
diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 2efbae8d93be..28a7ace0a1b9 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -51,9 +51,6 @@ struct pt_regs;
 extern struct thread_info *critirq_ctx[NR_CPUS];
 extern struct thread_info *dbgirq_ctx[NR_CPUS];
 extern struct thread_info *mcheckirq_ctx[NR_CPUS];
-extern void exc_lvl_ctx_init(void);
-#else
-#define exc_lvl_ctx_init()
 #endif
 
 /*
@@ -62,7 +59,6 @@ extern void exc_lvl_ctx_init(void);
 extern struct thread_info *hardirq_ctx[NR_CPUS];
 extern struct thread_info *softirq_ctx[NR_CPUS];
 
-extern void irq_ctx_init(void);
 void call_do_softirq(void *sp);
 void call_do_irq(struct pt_regs *regs, void *sp);
 extern void do_IRQ(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/livepatch.h b/arch/powerpc/include/asm/livepatch.h
index 47a03b9b528b..5070df19d463 100644
--- a/arch/powerpc/include/asm/livepatch.h
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -21,6 +21,7 @@
 
 #include 
 #include 
+#include <linux/sched/task_stack.h>
 
 #ifdef CONFIG_LIVEPATCH
 static inline int klp_check_compiler_support(void)
@@ -43,13 +44,13 @@ static inline unsigned long klp_get_ftrace_location(unsigned long faddr)
return ftrace_location_range(faddr, faddr + 16);
 }
 
-static inline void klp_init_thread_info(struct thread_info *ti)
+static inline void klp_init_thread_info(struct task_struct *p)
 {
/* + 1 to account for STACK_END_MAGIC */
-   ti->livepatch_sp = (unsigned long *)(ti + 1) + 1;
+   task_thread_info(p)->livepatch_sp = end_of_stack(p) + 1;
 }
 #

[PATCH v16 15/21] powerpc/idle/6xx: Use r1 with CURRENT_THREAD_INFO()

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

Make sure CURRENT_THREAD_INFO() is used with r1, which is the virtual
address of the stack, in order to ease the switch to r2 when we enable
THREAD_INFO_IN_TASK, as we have no register holding the physical
address of current.

Signed-off-by: Christophe Leroy 
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/idle_6xx.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/idle_6xx.S b/arch/powerpc/kernel/idle_6xx.S
index ff026c9d3cab..d9b6e7e0b5e3 100644
--- a/arch/powerpc/kernel/idle_6xx.S
+++ b/arch/powerpc/kernel/idle_6xx.S
@@ -159,7 +159,8 @@ _GLOBAL(power_save_ppc32_restore)
stw r9,_NIP(r11)/* make it do a blr */
 
 #ifdef CONFIG_SMP
-   CURRENT_THREAD_INFO(r12, r11)
+   CURRENT_THREAD_INFO(r12, r1)
+   tophys(r12, r12)
lwz r11,TI_CPU(r12) /* get cpu number * 4 */
	slwi	r11,r11,2
 #else
-- 
2.20.1



[PATCH v16 14/21] powerpc: Use task_stack_page() in current_pt_regs()

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

Change current_pt_regs() to use task_stack_page() rather than
current_thread_info() so that it keeps working once we enable
THREAD_INFO_IN_TASK.

Signed-off-by: Christophe Leroy 
[mpe: Split out of large patch]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/ptrace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 0b8a735b6d85..64271e562fed 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -157,7 +157,7 @@ extern int ptrace_put_reg(struct task_struct *task, int regno,
  unsigned long data);
 
 #define current_pt_regs() \
-   ((struct pt_regs *)((unsigned long)current_thread_info() + THREAD_SIZE) - 1)
+   ((struct pt_regs *)((unsigned long)task_stack_page(current) + THREAD_SIZE) - 1)
 /*
  * We use the least-significant bit of the trap field to indicate
  * whether we have saved the full set of registers, or only a
-- 
2.20.1



[PATCH v16 13/21] powerpc: Use linux/thread_info.h in processor.h

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

When we enable THREAD_INFO_IN_TASK we will remove our definition of
current_thread_info(). Instead it will come from linux/thread_info.h

So switch processor.h to include the latter, so that it can continue
to find current_thread_info().

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index b40b614047e4..9226fb83a82e 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -40,7 +40,7 @@
 
 #ifndef __ASSEMBLY__
 #include 
-#include <asm/thread_info.h>
+#include <linux/thread_info.h>
 #include 
 #include 
 
-- 
2.20.1



[PATCH v16 12/21] powerpc: Use sizeof(struct thread_info) in INIT_SP_LIMIT

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

Currently INIT_SP_LIMIT uses sizeof(init_thread_info), but that symbol
won't exist when we enable THREAD_INFO_IN_TASK. So just use
sizeof(struct thread_info), which is the same value but will continue
to work.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index d9b1503ba0f0..b40b614047e4 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -268,7 +268,7 @@ struct thread_struct {
 
 #define INIT_SP(sizeof(init_stack) + (unsigned long) 
&init_stack)
 #define INIT_SP_LIMIT \
-   (_ALIGN_UP(sizeof(init_thread_info), 16) + (unsigned long) &init_stack)
+   (_ALIGN_UP(sizeof(struct thread_info), 16) + (unsigned long)&init_stack)
 
 #ifdef CONFIG_SPE
 #define SPEFSCR_INIT \
-- 
2.20.1



[PATCH v16 11/21] powerpc/64: Use task_stack_page() to initialise paca->kstack

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

Rather than using the thread_info, use task_stack_page() to initialise
paca->kstack; that way it will work with THREAD_INFO_IN_TASK.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/smp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 3f15edf25a0d..1d3e7cb6704d 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include <linux/sched/task_stack.h>
 #include 
 #include 
 #include 
@@ -1017,7 +1018,8 @@ static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 
 #ifdef CONFIG_PPC64
paca_ptrs[cpu]->__current = idle;
-   paca_ptrs[cpu]->kstack = (unsigned long)ti + THREAD_SIZE - 
STACK_FRAME_OVERHEAD;
+   paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) +
+THREAD_SIZE - STACK_FRAME_OVERHEAD;
 #endif
ti->cpu = cpu;
secondary_ti = current_set[cpu] = ti;
-- 
2.20.1



[PATCH v16 10/21] powerpc: Update comments in preparation for THREAD_INFO_IN_TASK

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

Update a few comments that talk about current_thread_info() in
preparation for THREAD_INFO_IN_TASK.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/reg.h   | 2 +-
 arch/powerpc/kernel/head_32.S| 2 +-
 arch/powerpc/kernel/head_44x.S   | 2 +-
 arch/powerpc/kernel/head_fsl_booke.S | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1c98ef1f2d5b..581e61db2dcf 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1062,7 +1062,7 @@
  * - SPRG9 debug exception scratch
  *
  * All 32-bit:
- * - SPRG3 current thread_info pointer
+ * - SPRG3 current thread_struct physical addr pointer
  *(virtual on BookE, physical on others)
  *
  * 32-bit classic:
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 9268e5e87949..8282d25948ae 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -845,7 +845,7 @@ _ENTRY(copy_and_flush)
bl  init_idle_6xx
 #endif /* CONFIG_PPC_BOOK3S_32 */
 
-   /* get current_thread_info and current */
+   /* get current's stack and current */
lis r1,secondary_ti@ha
tophys(r1,r1)
lwz r1,secondary_ti@l(r1)
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index bf23c19c92d6..4e8c8bf50413 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -1019,7 +1019,7 @@ _GLOBAL(start_secondary_47x)
 
/* Now we can get our task struct and real stack pointer */
 
-   /* Get current_thread_info and current */
+   /* Get current's stack and current */
lis r1,secondary_ti@ha
lwz r1,secondary_ti@l(r1)
lwz r2,TI_TASK(r1)
diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
b/arch/powerpc/kernel/head_fsl_booke.S
index 42d8d6fc00cb..6301bb24889a 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -1091,7 +1091,7 @@ _GLOBAL(set_context)
mr  r4,r24  /* Why? */
bl  call_setup_cpu
 
-   /* get current_thread_info and current */
+   /* get current's stack and current */
lis r1,secondary_ti@ha
lwz r1,secondary_ti@l(r1)
lwz r2,TI_TASK(r1)
-- 
2.20.1



[PATCH v16 09/21] powerpc: Replace current_thread_info()->task with current

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

We have a few places that use current_thread_info()->task to access
current. This won't work with THREAD_INFO_IN_TASK so fix them now.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/process.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 4ffbb677c9f5..21c1e11a06de 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1231,8 +1231,8 @@ struct task_struct *__switch_to(struct task_struct *prev,
batch->active = 1;
}
 
-   if (current_thread_info()->task->thread.regs) {
-   restore_math(current_thread_info()->task->thread.regs);
+   if (current->thread.regs) {
+   restore_math(current->thread.regs);
 
/*
 * The copy-paste buffer can only store into foreign real
@@ -1242,7 +1242,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
 * mappings, we must issue a cp_abort to clear any state and
 * prevent snooping, corruption or a covert channel.
 */
-   if (current_thread_info()->task->thread.used_vas)
+   if (current->thread.used_vas)
asm volatile(PPC_CP_ABORT);
}
 #endif /* CONFIG_PPC_BOOK3S_64 */
-- 
2.20.1



[PATCH v16 08/21] powerpc: Don't use CURRENT_THREAD_INFO to find the stack

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

A few places use CURRENT_THREAD_INFO, or the C version, to find the
stack. This will no longer work with THREAD_INFO_IN_TASK so change
them to find the stack in other ways.
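
The replacements rely on kernel stacks being THREAD_SIZE-sized and
THREAD_SIZE-aligned, so the stack base can be recovered from any
address inside the stack by clearing the low THREAD_SHIFT bits --
which is what the clrrdi/rlwinm instructions and the C masking in the
hunks below do. As a one-line C sketch:

	stack_base = (void *)(any_address_in_stack & ~(THREAD_SIZE - 1));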

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/entry_64.S | 2 +-
 arch/powerpc/kernel/irq.c  | 2 +-
 arch/powerpc/kernel/misc_32.S  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index c17c1bed6148..21f1cb4d464e 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -689,7 +689,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 2:
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
-   CURRENT_THREAD_INFO(r7, r8)  /* base of new stack */
+   clrrdi  r7, r8, THREAD_SHIFT/* base of new stack */
/* Note: this uses SWITCH_FRAME_SIZE rather than INT_FRAME_SIZE
   because we don't need to leave the 288-byte ABI gap at the
   top of the kernel stack. */
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 4a5dd8800946..531e9ef153c0 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -663,7 +663,7 @@ void do_IRQ(struct pt_regs *regs)
struct thread_info *curtp, *irqtp, *sirqtp;
 
/* Switch to the irq stack to handle this */
-   curtp = current_thread_info();
+   curtp = (void *)(current_stack_pointer() & ~(THREAD_SIZE - 1));
irqtp = hardirq_ctx[raw_smp_processor_id()];
sirqtp = softirq_ctx[raw_smp_processor_id()];
 
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 242f0c88010e..b37b50fde828 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -603,7 +603,7 @@ EXPORT_SYMBOL(__bswapdi2)
 #ifdef CONFIG_SMP
 _GLOBAL(start_secondary_resume)
/* Reset stack */
-   CURRENT_THREAD_INFO(r1, r1)
+   rlwinm  r1, r1, 0, 0, 31 - THREAD_SHIFT
	addi	r1,r1,THREAD_SIZE-STACK_FRAME_OVERHEAD
li  r3,0
	stw	r3,0(r1)	/* Zero the stack frame pointer */
-- 
2.20.1



[PATCH v16 07/21] powerpc: call_do_[soft]irq() takes a pointer to the stack

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

The purpose of the pointer given to call_do_softirq() and
call_do_irq() is to point to the new stack. Currently that's the same
thing as the thread_info, but it won't be with THREAD_INFO_IN_TASK.

So change the parameter to void* and rename it 'sp'.

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/irq.h | 4 ++--
 arch/powerpc/kernel/misc_32.S  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index ee39ce56b2a2..2efbae8d93be 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -63,8 +63,8 @@ extern struct thread_info *hardirq_ctx[NR_CPUS];
 extern struct thread_info *softirq_ctx[NR_CPUS];
 
 extern void irq_ctx_init(void);
-extern void call_do_softirq(struct thread_info *tp);
-extern void call_do_irq(struct pt_regs *regs, struct thread_info *tp);
+void call_do_softirq(void *sp);
+void call_do_irq(struct pt_regs *regs, void *sp);
 extern void do_IRQ(struct pt_regs *regs);
 extern void __init init_IRQ(void);
 extern void __do_irq(struct pt_regs *regs);
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 02b8cdd73792..242f0c88010e 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -60,7 +60,7 @@ _GLOBAL(call_do_softirq)
blr
 
 /*
- * void call_do_irq(struct pt_regs *regs, struct thread_info *irqtp);
+ * void call_do_irq(struct pt_regs *regs, void *sp);
  */
 _GLOBAL(call_do_irq)
	mflr	r0
-- 
2.20.1



[PATCH v16 06/21] powerpc: Rename THREAD_INFO to TASK_STACK

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

This patch renames THREAD_INFO to TASK_STACK, because it is in fact
the offset of the 'stack' pointer within task_struct, and that offset
will not be impacted by the move of thread_info.

Also make it available on 64-bit, as we'll need it there when we
activate THREAD_INFO_IN_TASK.
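
For context, constants like TASK_STACK come out of asm-offsets.c at
build time: the OFFSET() entry in the diff below expands to a line of
roughly this shape in include/generated/asm-offsets.h (the value here
is illustrative, not real):

	#define TASK_STACK 24 /* offsetof(struct task_struct, stack) */

which is what the assembly then uses to load the stack pointer out of
task_struct.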

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
[mpe: Make available on 64-bit]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/asm-offsets.c| 2 +-
 arch/powerpc/kernel/entry_32.S   | 2 +-
 arch/powerpc/kernel/head_32.S| 2 +-
 arch/powerpc/kernel/head_40x.S   | 4 ++--
 arch/powerpc/kernel/head_8xx.S   | 2 +-
 arch/powerpc/kernel/head_booke.h | 4 ++--
 arch/powerpc/kernel/head_fsl_booke.S | 2 +-
 7 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 9ffc72ded73a..b2b52e002a76 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -90,10 +90,10 @@ int main(void)
DEFINE(SIGSEGV, SIGSEGV);
DEFINE(NMI_MASK, NMI_MASK);
 #else
-   OFFSET(THREAD_INFO, task_struct, stack);
DEFINE(THREAD_INFO_GAP, _ALIGN_UP(sizeof(struct thread_info), 16));
OFFSET(KSP_LIMIT, thread_struct, ksp_limit);
 #endif /* CONFIG_PPC64 */
+   OFFSET(TASK_STACK, task_struct, stack);
 
 #ifdef CONFIG_LIVEPATCH
OFFSET(TI_livepatch_sp, thread_info, livepatch_sp);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index d4c6186aa7e8..f1646d845404 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -1168,7 +1168,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
mfspr   r9,SPRN_SPRG_THREAD
lwz r10,SAVED_KSP_LIMIT(r1)
stw r10,KSP_LIMIT(r9)
-   lwz r9,THREAD_INFO-THREAD(r9)
+   lwz r9,TASK_STACK-THREAD(r9)
CURRENT_THREAD_INFO(r10, r1)
lwz r10,TI_PREEMPT(r10)
stw r10,TI_PREEMPT(r9)
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 05b08db3901d..9268e5e87949 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -261,7 +261,7 @@ _ENTRY(_start);
tophys(r11,r1); /* use tophys(r1) if kernel */ \
beq 1f; \
mfspr   r11,SPRN_SPRG_THREAD;   \
-   lwz r11,THREAD_INFO-THREAD(r11);\
+   lwz r11,TASK_STACK-THREAD(r11); \
	addi	r11,r11,THREAD_SIZE;\
tophys(r11,r11);\
1:	subi	r11,r11,INT_FRAME_SIZE	/* alloc exc. frame */
diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index b19d78410511..3088c9f29f5e 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -115,7 +115,7 @@ _ENTRY(saved_ksp_limit)
andi.   r11,r11,MSR_PR;  \
beq 1f;  \
mfspr   r1,SPRN_SPRG_THREAD;/* if from user, start at top of   */\
-   lwz r1,THREAD_INFO-THREAD(r1); /* this thread's kernel stack   */\
+   lwz r1,TASK_STACK-THREAD(r1); /* this thread's kernel stack   */\
	addi	r1,r1,THREAD_SIZE;	\
1:	subi	r1,r1,INT_FRAME_SIZE;	/* Allocate an exception frame */\
tophys(r11,r1);  \
@@ -158,7 +158,7 @@ _ENTRY(saved_ksp_limit)
beq 1f;  \
/* COMING FROM USER MODE */  \
mfspr   r11,SPRN_SPRG_THREAD;   /* if from user, start at top of   */\
-   lwz r11,THREAD_INFO-THREAD(r11); /* this thread's kernel stack */\
+   lwz r11,TASK_STACK-THREAD(r11); /* this thread's kernel stack */\
1:	addi	r11,r11,THREAD_SIZE-INT_FRAME_SIZE; /* Alloc an excpt frm  */\
tophys(r11,r11); \
stw r10,_CCR(r11);  /* save various registers  */\
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 57deb1e9ffea..5f5f89e87e3a 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -142,7 +142,7 @@ _ENTRY(_start);
tophys(r11,r1); /* use tophys(r1) if kernel */ \
beq 1f; \
mfspr   r11,SPRN_SPRG_THREAD;   \
-   lwz r11,THREAD_INFO-THREAD(r11);\
+   lwz r11,TASK_STACK-THREAD(r11); \
	addi	r11,r11,THREAD_SIZE;\
tophys(r11,r11);\
1:	subi	r11,r11,INT_FRAME_SIZE	/* alloc exc. frame */
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 306e26c073a0..69e80e6d0d16 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ 

[PATCH v16 05/21] powerpc: prep stack walkers for THREAD_INFO_IN_TASK

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

[text copied from commit 9bbd4c56b0b6
("arm64: prep stack walkers for THREAD_INFO_IN_TASK")]

When CONFIG_THREAD_INFO_IN_TASK is selected, task stacks may be freed
before a task is destroyed. To account for this, the stacks are
refcounted, and when manipulating the stack of another task, it is
necessary to get/put the stack to ensure it isn't freed and/or re-used
while we do so.

This patch reworks the powerpc stack walking code to account for this.
When CONFIG_THREAD_INFO_IN_TASK is not selected these perform no
refcounting, and this should only be a structural change that does not
affect behaviour.
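
The resulting pattern, common to all the converted walkers, is (a
sketch):

	if (!try_get_task_stack(tsk))
		return 0;	/* task is dead and its stack is gone */

	/* ... walk tsk's stack safely ... */

	put_task_stack(tsk);

When CONFIG_THREAD_INFO_IN_TASK is not selected, try_get_task_stack()
simply returns the stack page and put_task_stack() is a no-op, so the
unconverted configurations pay nothing.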

Acked-by: Mark Rutland 
Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/process.c| 23 +--
 arch/powerpc/kernel/stacktrace.c | 29 ++---
 2 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ce393df243aa..4ffbb677c9f5 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2027,7 +2027,7 @@ int validate_sp(unsigned long sp, struct task_struct *p,
 
 EXPORT_SYMBOL(validate_sp);
 
-unsigned long get_wchan(struct task_struct *p)
+static unsigned long __get_wchan(struct task_struct *p)
 {
unsigned long ip, sp;
int count = 0;
@@ -2053,6 +2053,20 @@ unsigned long get_wchan(struct task_struct *p)
return 0;
 }
 
+unsigned long get_wchan(struct task_struct *p)
+{
+   unsigned long ret;
+
+   if (!try_get_task_stack(p))
+   return 0;
+
+   ret = __get_wchan(p);
+
+   put_task_stack(p);
+
+   return ret;
+}
+
 static int kstack_depth_to_print = CONFIG_PRINT_STACK_DEPTH;
 
 void show_stack(struct task_struct *tsk, unsigned long *stack)
@@ -2067,6 +2081,9 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
int curr_frame = 0;
 #endif
 
+   if (!try_get_task_stack(tsk))
+   return;
+
sp = (unsigned long) stack;
if (tsk == NULL)
tsk = current;
@@ -2081,7 +2098,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
printk("Call Trace:\n");
do {
if (!validate_sp(sp, tsk, STACK_FRAME_OVERHEAD))
-   return;
+   break;
 
stack = (unsigned long *) sp;
newsp = stack[0];
@@ -2121,6 +2138,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 
sp = newsp;
} while (count++ < kstack_depth_to_print);
+
+   put_task_stack(tsk);
 }
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index cf31ce6c1f53..f958f3bcba04 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -67,12 +67,17 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
 {
unsigned long sp;
 
+   if (!try_get_task_stack(tsk))
+   return;
+
if (tsk == current)
sp = current_stack_pointer();
else
sp = tsk->thread.ksp;
 
save_context_stack(trace, sp, tsk, 0);
+
+   put_task_stack(tsk);
 }
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
 
@@ -90,9 +95,8 @@ EXPORT_SYMBOL_GPL(save_stack_trace_regs);
  *
  * If the task is not 'current', the caller *must* ensure the task is inactive.
  */
-int
-save_stack_trace_tsk_reliable(struct task_struct *tsk,
-   struct stack_trace *trace)
+static int __save_stack_trace_tsk_reliable(struct task_struct *tsk,
+  struct stack_trace *trace)
 {
unsigned long sp;
unsigned long newsp;
@@ -197,6 +201,25 @@ save_stack_trace_tsk_reliable(struct task_struct *tsk,
}
return 0;
 }
+
+int save_stack_trace_tsk_reliable(struct task_struct *tsk,
+ struct stack_trace *trace)
+{
+   int ret;
+
+   /*
+* If the task doesn't have a stack (e.g., a zombie), the stack is
+* "reliably" empty.
+*/
+   if (!try_get_task_stack(tsk))
+   return 0;
+
+   ret = __save_stack_trace_tsk_reliable(tsk, trace);
+
+   put_task_stack(tsk);
+
+   return ret;
+}
 EXPORT_SYMBOL_GPL(save_stack_trace_tsk_reliable);
 #endif /* CONFIG_HAVE_RELIABLE_STACKTRACE */
 
-- 
2.20.1



[PATCH v16 04/21] powerpc: Only use task_struct 'cpu' field on SMP

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

When moving to CONFIG_THREAD_INFO_IN_TASK, the thread_info 'cpu' field
gets moved into task_struct and only defined when CONFIG_SMP is set.

This patch ensures that TI_CPU is only used when CONFIG_SMP is set and
that task_struct 'cpu' field is not used directly out of SMP code.
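
For reference, task_cpu() as used in the xmon hunk below is the
generic accessor that works in both configurations; simplified from
linux/sched.h (a sketch, details may differ by kernel version):

	static inline unsigned int task_cpu(const struct task_struct *p)
	{
	#ifdef CONFIG_THREAD_INFO_IN_TASK
		return p->cpu;
	#else
		return task_thread_info(p)->cpu;
	#endif
	}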

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/head_fsl_booke.S | 2 ++
 arch/powerpc/kernel/misc_32.S| 4 
 arch/powerpc/xmon/xmon.c | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index 2386ce2a9c6e..2c21e8642a00 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -243,8 +243,10 @@ _ENTRY(__early_start)
li  r0,0
	stwu	r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1)
 
+#ifdef CONFIG_SMP
CURRENT_THREAD_INFO(r22, r1)
stw r24, TI_CPU(r22)
+#endif
 
bl  early_init
 
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 57d2ffb2d45c..02b8cdd73792 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -183,10 +183,14 @@ _GLOBAL(low_choose_750fx_pll)
or  r4,r4,r5
mtspr   SPRN_HID1,r4
 
+#ifdef CONFIG_SMP
/* Store new HID1 image */
CURRENT_THREAD_INFO(r6, r1)
lwz r6,TI_CPU(r6)
	slwi	r6,r6,2
+#else
+   li  r6, 0
+#endif
addis   r6,r6,nap_save_hid1@ha
stw r4,nap_save_hid1@l(r6)
 
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 757b8499aba2..a0f44f992360 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2997,7 +2997,7 @@ static void show_task(struct task_struct *tsk)
printf("%px %016lx %6d %6d %c %2d %s\n", tsk,
tsk->thread.ksp,
tsk->pid, rcu_dereference(tsk->parent)->pid,
-   state, task_thread_info(tsk)->cpu,
+   state, task_cpu(tsk),
tsk->comm);
 }
 
-- 
2.20.1



[PATCH v16 03/21] powerpc: Avoid circular header inclusion in mmu-hash.h

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h includes
asm/current.h. This generates a circular dependency. To avoid that,
asm/processor.h shall not be included in mmu-hash.h.

In order to do that, this patch moves all the TASK_SIZE-related
constants into new headers, asm/task_size_32.h and asm/task_size_64.h,
which can then be included in mmu-hash.h directly.
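
Reconstructed from the constants moved out of processor.h below, the
new 32-bit header looks roughly like this (a sketch, modulo the SPDX
header):

	#ifndef _ASM_POWERPC_TASK_SIZE_32_H
	#define _ASM_POWERPC_TASK_SIZE_32_H

	#if CONFIG_TASK_SIZE > CONFIG_KERNEL_START
	#error User TASK_SIZE overlaps with KERNEL_START address
	#endif

	#define TASK_SIZE (CONFIG_TASK_SIZE)

	/* This decides where the kernel will search for a free chunk
	 * of vm space during mmap's. */
	#define TASK_UNMAPPED_BASE (TASK_SIZE / 8 * 3)

	#endif /* _ASM_POWERPC_TASK_SIZE_32_H */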

Signed-off-by: Christophe Leroy 
Reviewed-by: Nicholas Piggin 
[mpe: Split out all the TASK_SIZE constants not just 64-bit ones]
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |   2 +-
 arch/powerpc/include/asm/processor.h  | 100 +-
 arch/powerpc/include/asm/task_size_32.h   |  21 
 arch/powerpc/include/asm/task_size_64.h   |  79 ++
 arch/powerpc/kvm/book3s_hv_hmi.c  |   1 +
 5 files changed, 107 insertions(+), 96 deletions(-)
 create mode 100644 arch/powerpc/include/asm/task_size_32.h
 create mode 100644 arch/powerpc/include/asm/task_size_64.h

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 12e522807f9f..a28a28079edb 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -23,7 +23,7 @@
  */
 #include 
 #include 
-#include <asm/processor.h>
+#include <asm/task_size_64.h>
 #include 
 
 /*
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index ee58526cb6c2..d9b1503ba0f0 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -77,105 +77,15 @@ extern int _chrp_type;
 
 #ifdef __KERNEL__
 
-struct task_struct;
-void start_thread(struct pt_regs *regs, unsigned long fdptr, unsigned long sp);
-void release_thread(struct task_struct *);
-
-#ifdef CONFIG_PPC32
-
-#if CONFIG_TASK_SIZE > CONFIG_KERNEL_START
-#error User TASK_SIZE overlaps with KERNEL_START address
-#endif
-#define TASK_SIZE  (CONFIG_TASK_SIZE)
-
-/* This decides where the kernel will search for a free chunk of vm
- * space during mmap's.
- */
-#define TASK_UNMAPPED_BASE (TASK_SIZE / 8 * 3)
-#endif
-
 #ifdef CONFIG_PPC64
-/*
- * 64-bit user address space can have multiple limits
- * For now supported values are:
- */
-#define TASK_SIZE_64TB  (0x4000UL)
-#define TASK_SIZE_128TB (0x8000UL)
-#define TASK_SIZE_512TB (0x0002UL)
-#define TASK_SIZE_1PB   (0x0004UL)
-#define TASK_SIZE_2PB   (0x0008UL)
-/*
- * With 52 bits in the address we can support
- * upto 4PB of range.
- */
-#define TASK_SIZE_4PB   (0x0010UL)
-
-/*
- * For now 512TB is only supported with book3s and 64K linux page size.
- */
-#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_64K_PAGES)
-/*
- * Max value currently used:
- */
-#define TASK_SIZE_USER64   TASK_SIZE_4PB
-#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
-#define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
-#else
-#define TASK_SIZE_USER64   TASK_SIZE_64TB
-#define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_64TB
-/*
- * We don't need to allocate extended context ids for 4K page size, because
- * we limit the max effective address on this config to 64TB.
- */
-#define TASK_CONTEXT_SIZE  TASK_SIZE_64TB
-#endif
-
-/*
- * 32-bit user address space is 4GB - 1 page
- * (this 1 page is needed so referencing of 0x generates EFAULT
- */
-#define TASK_SIZE_USER32 (0x0001UL - (1*PAGE_SIZE))
-
-#define TASK_SIZE_OF(tsk) (test_tsk_thread_flag(tsk, TIF_32BIT) ? \
-   TASK_SIZE_USER32 : TASK_SIZE_USER64)
-#define TASK_SIZE	TASK_SIZE_OF(current)
-/* This decides where the kernel will search for a free chunk of vm
- * space during mmap's.
- */
-#define TASK_UNMAPPED_BASE_USER32 (PAGE_ALIGN(TASK_SIZE_USER32 / 4))
-#define TASK_UNMAPPED_BASE_USER64 (PAGE_ALIGN(DEFAULT_MAP_WINDOW_USER64 / 4))
-
-#define TASK_UNMAPPED_BASE ((is_32bit_task()) ? \
-   TASK_UNMAPPED_BASE_USER32 : TASK_UNMAPPED_BASE_USER64 )
-#endif
-
-/*
- * Initial task size value for user applications. For book3s 64 we start
- * with 128TB and conditionally enable upto 512TB
- */
-#ifdef CONFIG_PPC_BOOK3S_64
-#define DEFAULT_MAP_WINDOW ((is_32bit_task()) ?\
-TASK_SIZE_USER32 : DEFAULT_MAP_WINDOW_USER64)
+#include <asm/task_size_64.h>
 #else
-#define DEFAULT_MAP_WINDOW TASK_SIZE
+#include <asm/task_size_32.h>
 #endif
 
-#ifdef __powerpc64__
-
-#define STACK_TOP_USER64 DEFAULT_MAP_WINDOW_USER64
-#define STACK_TOP_USER32 TASK_SIZE_USER32
-
-#define STACK_TOP (is_32bit_task() ? \
-  STACK_TOP_USER32 : STACK_TOP_USER64)
-
-#define STACK_TOP_MAX TASK_SIZE_USER64
-
-#else /* __powerpc64__ */
-
-#define STACK_TOP TASK_SIZE
-#define STACK_TOP_MAX  STACK_TOP
-
-#endif /* __powerpc64__ */
+struct task_struct;
+void start_thread(struct pt_regs *regs, unsigned long fdptr, unsigned long sp);
+void release_thread(struct task_struct *);
 
 typed

[PATCH v16 02/21] powerpc/32: Fix CONFIG_VIRT_CPU_ACCOUNTING_NATIVE for 40x/booke

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

40x/booke have another path to reach 3f from transfer_to_handler, so
make sure it also calls ACCOUNT_CPU_USER_ENTRY() when
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is selected.

Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/entry_32.S | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 0768dfd8a64e..d4c6186aa7e8 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -166,6 +166,13 @@
   internal debug mode bit to do this. */
lwz r12,THREAD_DBCR0(r12)
andis.  r12,r12,DBCR0_IDM@h
+#endif
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+   CURRENT_THREAD_INFO(r9, r1)
+   tophys(r9, r9)
+   ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
+#endif
+#if defined(CONFIG_40x) || defined(CONFIG_BOOKE)
	beq+	3f
/* From user and task is ptraced - load up global dbcr0 */
li  r12,-1  /* clear all pending debug events */
@@ -185,11 +192,6 @@
addir12,r12,-1
stw r12,4(r11)
 #endif
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
-   CURRENT_THREAD_INFO(r9, r1)
-   tophys(r9, r9)
-   ACCOUNT_CPU_USER_ENTRY(r9, r11, r12)
-#endif
 
b   3f
 
-- 
2.20.1



[PATCH v16 01/21] powerpc/irq: use memblock functions returning virtual address

2019-02-05 Thread Michael Ellerman
From: Christophe Leroy 

Since only the virtual address of allocated blocks is used,
let's use the functions that return a virtual address directly.

Those functions also have the advantage of zeroing the block.
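
In practice each conversion in the diff below collapses three steps
into one (a sketch):

	/* before: physical alloc, convert to virtual, zero by hand */
	tp = (struct thread_info *)
		__va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
	memset(tp, 0, THREAD_SIZE);

	/* after: memblock_alloc() returns a zeroed virtual address */
	tp = memblock_alloc(THREAD_SIZE, THREAD_SIZE);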

Suggested-by: Mike Rapoport 
Acked-by: Mike Rapoport 
Signed-off-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/irq.c  |  5 -
 arch/powerpc/kernel/setup_32.c | 26 --
 arch/powerpc/kernel/setup_64.c | 19 +++
 3 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index bb299613a462..4a5dd8800946 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -725,18 +725,15 @@ void exc_lvl_ctx_init(void)
 #endif
 #endif
 
-   memset((void *)critirq_ctx[cpu_nr], 0, THREAD_SIZE);
tp = critirq_ctx[cpu_nr];
tp->cpu = cpu_nr;
tp->preempt_count = 0;
 
 #ifdef CONFIG_BOOKE
-   memset((void *)dbgirq_ctx[cpu_nr], 0, THREAD_SIZE);
tp = dbgirq_ctx[cpu_nr];
tp->cpu = cpu_nr;
tp->preempt_count = 0;
 
-   memset((void *)mcheckirq_ctx[cpu_nr], 0, THREAD_SIZE);
tp = mcheckirq_ctx[cpu_nr];
tp->cpu = cpu_nr;
tp->preempt_count = HARDIRQ_OFFSET;
@@ -754,12 +751,10 @@ void irq_ctx_init(void)
int i;
 
for_each_possible_cpu(i) {
-   memset((void *)softirq_ctx[i], 0, THREAD_SIZE);
tp = softirq_ctx[i];
tp->cpu = i;
klp_init_thread_info(tp);
 
-   memset((void *)hardirq_ctx[i], 0, THREAD_SIZE);
tp = hardirq_ctx[i];
tp->cpu = i;
klp_init_thread_info(tp);
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 947f904688b0..1f0b7629c1a6 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -196,6 +196,17 @@ static int __init ppc_init(void)
 }
 arch_initcall(ppc_init);
 
+static void *__init alloc_stack(void)
+{
+   void *ptr = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
+
+   if (!ptr)
+   panic("cannot allocate %d bytes for stack at %pS\n",
+ THREAD_SIZE, (void *)_RET_IP_);
+
+   return ptr;
+}
+
 void __init irqstack_early_init(void)
 {
unsigned int i;
@@ -203,10 +214,8 @@ void __init irqstack_early_init(void)
/* interrupt stacks must be in lowmem, we get that for free on ppc32
 * as the memblock is limited to lowmem by default */
for_each_possible_cpu(i) {
-   softirq_ctx[i] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
-   hardirq_ctx[i] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
+   softirq_ctx[i] = alloc_stack();
+   hardirq_ctx[i] = alloc_stack();
}
 }
 
@@ -224,13 +233,10 @@ void __init exc_lvl_early_init(void)
hw_cpu = 0;
 #endif
 
-   critirq_ctx[hw_cpu] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
+   critirq_ctx[hw_cpu] = alloc_stack();
 #ifdef CONFIG_BOOKE
-   dbgirq_ctx[hw_cpu] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
-   mcheckirq_ctx[hw_cpu] = (struct thread_info *)
-   __va(memblock_phys_alloc(THREAD_SIZE, THREAD_SIZE));
+   dbgirq_ctx[hw_cpu] = alloc_stack();
+   mcheckirq_ctx[hw_cpu] = alloc_stack();
 #endif
}
 }
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 236c1151a3a7..080dd515d587 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -634,19 +634,17 @@ __init u64 ppc64_bolted_size(void)
 
 static void *__init alloc_stack(unsigned long limit, int cpu)
 {
-   unsigned long pa;
+   void *ptr;
 
BUILD_BUG_ON(STACK_INT_FRAME_SIZE % 16);
 
-   pa = memblock_alloc_base_nid(THREAD_SIZE, THREAD_SIZE, limit,
-   early_cpu_to_node(cpu), MEMBLOCK_NONE);
-   if (!pa) {
-   pa = memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit);
-   if (!pa)
-   panic("cannot allocate stacks");
-   }
+   ptr = memblock_alloc_try_nid(THREAD_SIZE, THREAD_SIZE,
+MEMBLOCK_LOW_LIMIT, limit,
+early_cpu_to_node(cpu));
+   if (!ptr)
+   panic("cannot allocate stacks");
 
-   return __va(pa);
+   return ptr;
 }
 
 void __init irqstack_early_init(void)
@@ -739,20 +737,17 @@ void __init emergency_stack_init(void)
struct thread_info *ti;
 
ti = alloc_stack(limit,

[PATCH v16 00/21] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK

2019-02-05 Thread Michael Ellerman
The purpose of this series is to activate CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
  - It protects thread_info from corruption in the case of stack
overflows.
  - Its address is harder to determine if stack addresses are leaked,
making a number of attacks more difficult.

Changes in v16 (mpe):
 - split the preparation patches out into smaller pieces.
 - move all TASK_SIZE related contents out of processor.h
 - fix build failures with livepatching enabled (include sched/task_stack.h)
 - Use PACA_CURRENT_TI for the offset of the thread info in paca->current.

Changes in v15:
 - switched patch 1 and 2.
 - resync patch 1 with linux/next. As the memblock modifications are now
 fully merged in the linux-mm tree, this patch becomes void as soon as
 linux-mm gets merged into the powerpc/merge branch
 - Fixed build failure on 64le due to call to __save_stack_trace_tsk_reliable() 
(patch 5)
 - Taken the renaming of THREAD_INFO to TASK_STACK out of the preparation patch 
to ease review (hence new patch 6)
 - Fixed one place where r11 (physical address of the stack) was used
 instead of r1 to locate thread_info, inducing a bug when switching to r2,
 which is the virtual address of current (patch 7)
 - Keeping physical address of current in r2 until MMU translation is 
reactivated (patch 11)

Changes in v14 (ie since v13):
 - Added in front a fixup patch which conflicts with this serie
 - Added a patch for using try_get_task_stack()/put_task_stack() in stack 
walkers.
 - Fixed compilation failure in the preparation patch (by moving the 
modification
 of klp_init_thread_info() to the following patch)

Changes since v12:
 - Patch 1: Taken comment from Mike (re-introduced the 'panic' in case memblock 
allocation fails in setup_64.c
 - Patch 1: Added alloc_stack() function in setup_32.c to also panic in case of 
allocation failure.

Changes since v11:
 - Rebased on 81775f5563fa ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")
 - Added a first patch to change memblock allocs to functions returning virtual 
addrs. This removes
   the memset() which were the only remaining stuff in irq_ctx_init() and 
exc_lvl_ctx_init() at the end.
 - dropping irq_ctx_init() and exc_lvl_ctx_init() in patch 5 (powerpc: Activate 
CONFIG_THREAD_INFO_IN_TASK)
 - A few cosmetic changes in commit log and code.

Changes since v10:
 - Rebased on 21622a0d2023 ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")
  ==> Fixed conflict in setup_32.S

Changes since v9:
 - Rebased on 183cbf93be88 ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")
  ==> Fixed conflict on xmon

Changes since v8:
 - Rebased on e589b79e40d9 ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")
  ==> Main impact was conflicts due to commit 9a8dd708d547 ("memblock: rename 
memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*")

Changes since v7:
 - Rebased on fb6c6ce7907d ("Automatic merge of branches 'master', 'next' and 
'fixes' into merge")

Changes since v6:
 - Fixed validate_sp() to exclude NULL sp in 'regain entire stack space' patch 
(early crash with CONFIG_KMEMLEAK)

Changes since v5:
 - Fixed livepatch_sp setup by using end_of_stack() instead of hardcoding
 - Fixed PPC_BPF_LOAD_CPU() macro

Changes since v4:
 - Fixed a build failure on 32-bit SMP when include/generated/asm-offsets.h
 does not already exist; it was due to spaces instead of a tab in the
 Makefile

Changes since RFC v3: (based on Nick's review)
 - Renamed task_size.h to task_size_user64.h to better relate to what it 
contains.
 - Handling of the isolation of thread_info cpu field inside CONFIG_SMP #ifdefs 
moved to a separate patch.
 - Removed CURRENT_THREAD_INFO macro completely.
 - Added a guard in asm/smp.h to avoid build failure before _TASK_CPU is 
defined.
 - Added a patch at the end to rename 'tp' pointers to 'sp' pointers
 - Renamed 'tp' into 'sp' pointers in preparation patch when relevant
 - Fixed a few commit logs
 - Fixed checkpatch report.

Changes since RFC v2:
 - Removed the modification of names in asm-offsets
 - Created a rule in arch/powerpc/Makefile to append the offset of current->cpu 
in CFLAGS
 - Modified asm/smp.h to use the offset set in CFLAGS
 - Squashed the renaming of THREAD_INFO to TASK_STACK in the preparation patch
 - Moved the modification of current_pt_regs in the patch activating 
CONFIG_THREAD_INFO_IN_TASK

Changes since RFC v1:
 - Removed the first patch which was modifying header inclusion order in timer
 - Modified some names in asm-offsets to avoid conflicts when including 
asm-offsets in C files
 - Modified asm/smp.h to avoid having to include linux/sched.h (using 
asm-offsets instead)
 - Moved some changes from the activation patch to the preparation patch.


Christophe Leroy (21):
  powerpc/irq: use memblock functions returning virtual address
  powerpc/32: Fix CONFIG_VIRT_CPU_ACCOUNTING_NATIVE for 40x/booke

Re: [PATCH 00/19] KVM: PPC: Book3S HV: add XIVE native exploitation mode

2019-02-05 Thread Cédric Le Goater
>>> As for nesting, I suggest for the foreseeable future we stick to XICS
>>> emulation in nested guests.
>>
>> ok. so no kernel_irqchip at all. hmm. 

I was confused with what Paul calls 'XICS emulation'. It's not the QEMU
XICS emulated device but the XICS-over-XIVE KVM device, the KVM XICS 
device KVM uses when under a P9 processor. 

> That would certainly be step 0, making sure the capability advertises
> this correctly.  I think we do want to make XICs-on-XIVE emulation
> work in a KVM L1 (so we'd need to have it make XIVE hcalls to the L0
> instead of OPAL calls).

With the latest patch of Paul, the KVM XICS device is available for L2
and it works quite well. 

I also want to test it when L1 runs in KVM XIVE native mode, with the 
current patchset, to see how it behaves.

> XIVE-on-XIVE for L1 would be nice too, which would mean implementing
> the XIVE hcalls from the L2 in terms of XIVE hcalls to the L0.  I
> think it's ok to delay this indefinitely as long as the caps advertise
> correctly so that qemu will use userspace emulation until its ready.

ok. I need to fix this in the current patchset.

Thanks,

C. 



Re: powerpc/papr_scm: Use the correct bind address

2019-02-05 Thread Michael Ellerman
On Thu, 2019-01-31 at 01:53:47 UTC, Oliver O'Halloran wrote:
> When binding an SCM volume to a physical address the hypervisor has the
> option to return early with a continue token with the expectation that
> the guest will resume the bind operation until it completes. A quirk of
> this interface is that the bind address will only be returned by the
> first bind h-call and the subsequent calls will return
> 0xFFFF_FFFF_FFFF_FFFF for the bind address.
> 
> We currently do not save the address returned by the first h-call. As a
> result we will use the junk address as the base of the bound region if
> the hypervisor decides to split the bind across multiple h-calls. This
> bug was found when testing with very large SCM volumes where the bind
> process would take more time than the hypervisor's internal h-call time
> limit would allow. This patch fixes the issue by saving the bind address
> from the first call.
> 
> Cc: sta...@vger.kernel.org
> Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
> Signed-off-by: Oliver O'Halloran 
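
For illustration, the bind sequence described above amounts to a retry
loop of roughly this shape (a sketch only: the argument list is
simplified, and 'token', 'bound_addr', 'drc_index' and 'num_blocks'
are illustrative names, not the driver's):

	unsigned long ret[PLPAR_HCALL_BUFSIZE];
	uint64_t token = 0, bound_addr = 0;
	int64_t rc;

	do {
		rc = plpar_hcall(H_SCM_BIND_MEM, ret, drc_index, 0,
				 num_blocks, BIND_ANY_ADDR, token);
		token = ret[0];
		/* Only the first h-call reports the real bind address;
		 * continuations return ~0, so keep the first value. */
		if (!bound_addr)
			bound_addr = ret[1];
	} while (rc == H_BUSY);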

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/5a3840a470c41ec0b85cd36ca8037033

cheers


Re: arch/powerpc/radix: Fix kernel crash with mremap

2019-02-05 Thread Michael Ellerman
On Wed, 2019-01-23 at 06:21:38 UTC, "Aneesh Kumar K.V" wrote:
> With support for split pmd lock, we use the pmd page's pmd_huge_pte pointer
> to store the deposited page table. In those configs, when we move page
> tables we need to make sure we move the deposited page table to the right
> pmd page. Otherwise this can result in a crash when we withdraw the
> deposited page table, because we can find the pmd_huge_pte NULL.
> 
> c04a1230 __split_huge_pmd+0x1070/0x1940
> c04a0ff4 __split_huge_pmd+0xe34/0x1940 (unreliable)
> c04a4000 vma_adjust_trans_huge+0x110/0x1c0
> c042fe04 __vma_adjust+0x2b4/0x9b0
> c04316e8 __split_vma+0x1b8/0x280
> c043192c __do_munmap+0x13c/0x550
> c0439390 sys_mremap+0x220/0x7e0
> c000b488 system_call+0x5c/0x70
> 
> Fixes: 675d995297d4 ("powerpc/book3s64: Enable split pmd ptlock.")
> Signed-off-by: Aneesh Kumar K.V 
> Signed-off-by: Aneesh Kumar K.V 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/579b9239c1f38665b21e8d0e6ee83ecc

cheers


Re: [PATCH 1/4] powerpc/64s: Clear on-stack exception marker upon exception return

2019-02-05 Thread Michael Ellerman
Balbir Singh  writes:
> On Sat, Feb 2, 2019 at 12:14 PM Balbir Singh  wrote:
>>
>> On Tue, Jan 22, 2019 at 10:57:21AM -0500, Joe Lawrence wrote:
>> > From: Nicolai Stange 
>> >
>> > The ppc64 specific implementation of the reliable stacktracer,
>> > save_stack_trace_tsk_reliable(), bails out and reports an "unreliable
>> > trace" whenever it finds an exception frame on the stack. Stack frames
>> > are classified as exception frames if the STACK_FRAME_REGS_MARKER magic,
>> > as written by exception prologues, is found at a particular location.
>> >
>> > However, as observed by Joe Lawrence, it is possible in practice that
>> > non-exception stack frames can alias with prior exception frames and thus,
>> > that the reliable stacktracer can find a stale STACK_FRAME_REGS_MARKER on
>> > the stack. It in turn falsely reports an unreliable stacktrace and blocks
>> > any live patching transition to finish. Said condition lasts until the
>> > stack frame is overwritten/initialized by function call or other means.
>> >
>> > In principle, we could mitigate this by making the exception frame
>> > classification condition in save_stack_trace_tsk_reliable() stronger:
>> > in addition to testing for STACK_FRAME_REGS_MARKER, we could also take into
>> > account that for all exceptions executing on the kernel stack
>> > - their stack frames' backlink pointers always match what is saved
>> >   in their pt_regs instance's ->gpr[1] slot and that
>> > - their exception frame size equals STACK_INT_FRAME_SIZE, a value
>> >   uncommonly large for non-exception frames.
>> >
>> > However, while these are currently true, relying on them would make the
>> > reliable stacktrace implementation more sensitive towards future changes in
>> > the exception entry code. Note that false negatives, i.e. not detecting
>> > exception frames, would silently break the live patching consistency model.
>> >
>> > Furthermore, certain other places (diagnostic stacktraces, perf, xmon)
>> > rely on STACK_FRAME_REGS_MARKER as well.
>> >
>> > Make the exception exit code clear the on-stack STACK_FRAME_REGS_MARKER
>> > for those exceptions running on the "normal" kernel stack and returning
>> > to kernelspace: because the topmost frame is ignored by the reliable stack
>> > tracer anyway, returns to userspace don't need to take care of clearing
>> > the marker.
>> >
>> > Furthermore, as I don't have the ability to test this on Book 3E or
>> > 32 bits, limit the change to Book 3S and 64 bits.
>> >
>> > Finally, make the HAVE_RELIABLE_STACKTRACE Kconfig option depend on
>> > PPC_BOOK3S_64 for documentation purposes. Before this patch, it depended
>> > on PPC64 && CPU_LITTLE_ENDIAN and because CPU_LITTLE_ENDIAN implies
>> > PPC_BOOK3S_64, there's no functional change here.
>> >
>> > Fixes: df78d3f61480 ("powerpc/livepatch: Implement reliable stack tracing for the consistency model")
>> > Reported-by: Joe Lawrence 
>> > Signed-off-by: Nicolai Stange 
>> > Signed-off-by: Joe Lawrence 
>> > ---
>> >  arch/powerpc/Kconfig   | 2 +-
>> >  arch/powerpc/kernel/entry_64.S | 7 +++
>> >  2 files changed, 8 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> > index 2890d36eb531..73bf87b1d274 100644
>> > --- a/arch/powerpc/Kconfig
>> > +++ b/arch/powerpc/Kconfig
>> > @@ -220,7 +220,7 @@ config PPC
>> >   select HAVE_PERF_USER_STACK_DUMP
>> >   select HAVE_RCU_TABLE_FREE  if SMP
>> >   select HAVE_REGS_AND_STACK_ACCESS_API
>> > - select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN
>> > + select HAVE_RELIABLE_STACKTRACE if PPC_BOOK3S_64 && CPU_LITTLE_ENDIAN
>> >   select HAVE_SYSCALL_TRACEPOINTS
>> >   select HAVE_VIRT_CPU_ACCOUNTING
>> >   select HAVE_IRQ_TIME_ACCOUNTING
>> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> > index 435927f549c4..a2c168b395d2 100644
>> > --- a/arch/powerpc/kernel/entry_64.S
>> > +++ b/arch/powerpc/kernel/entry_64.S
>> > @@ -1002,6 +1002,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>> >   ld  r2,_NIP(r1)
>> >   mtspr   SPRN_SRR0,r2
>> >
>> > + /*
>> > +  * Leaving a stale exception_marker on the stack can confuse
>> > +  * the reliable stack unwinder later on. Clear it.
>> > +  */
>> > + li  r2,0
>> > + std r2,STACK_FRAME_OVERHEAD-16(r1)
>> > +
>>
>> Could you please double check, r4 is already 0 at this point
>> IIUC. So the change might be a simple
>>
>> std r4,STACK_FRAME_OVERHEAD-16(r1)
>>
>
> r4 is not 0, sorry for the noise

Isn't it?

cheers
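
For context, the check that the stale marker confuses is in
save_stack_trace_tsk_reliable() (arch/powerpc/kernel/stacktrace.c),
roughly:

        /* Frames with pt_regs on the stack mark the trace unreliable. */
        if (sp <= stack_end - STACK_INT_FRAME_SIZE &&
            stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER)
                return 1;       /* bail out: unreliable trace */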


Re: [PATCH] hugetlb: allow to free gigantic pages regardless of the configuration

2019-02-05 Thread Michael Ellerman
Alexandre Ghiti  writes:

> From: Alexandre Ghiti 
>
> On systems without CMA or (MEMORY_ISOLATION && COMPACTION) activated but
> that support gigantic pages, boottime reserved gigantic pages can not be
> freed at all. This patch simply enables the possibility of handing those
> pages back to the memory allocator.
>
> This commit then renames gigantic_page_supported and
> ARCH_HAS_GIGANTIC_PAGE to make them more accurate. Indeed, those values
> being false does not mean that the system cannot use gigantic pages: it
> just means that runtime allocation of gigantic pages is not supported,
> one can still allocate boottime gigantic pages if the architecture supports
> it.
>
> Signed-off-by: Alexandre Ghiti 
> ---
>
> - Compiled on all architectures
> - Tested on riscv architecture
>
>  arch/arm64/Kconfig   |  2 +-
>  arch/arm64/include/asm/hugetlb.h |  7 +++--
>  arch/powerpc/include/asm/book3s/64/hugetlb.h |  4 +--
>  arch/powerpc/platforms/Kconfig.cputype   |  2 +-

The powerpc parts look fine.

Acked-by: Michael Ellerman  (powerpc)

cheers

>  arch/s390/Kconfig|  2 +-
>  arch/s390/include/asm/hugetlb.h  |  7 +++--
>  arch/x86/Kconfig |  2 +-
>  arch/x86/include/asm/hugetlb.h   |  7 +++--
>  fs/Kconfig   |  2 +-
>  include/linux/gfp.h  |  2 +-
>  mm/hugetlb.c | 43 +++-
>  mm/page_alloc.c  |  4 +--
>  12 files changed, 48 insertions(+), 36 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a4168d366127..18239cbd7fcd 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -18,7 +18,7 @@ config ARM64
>   select ARCH_HAS_FAST_MULTIPLIER
>   select ARCH_HAS_FORTIFY_SOURCE
>   select ARCH_HAS_GCOV_PROFILE_ALL
> - select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
> + select ARCH_HAS_GIGANTIC_PAGE_RUNTIME_ALLOCATION if (MEMORY_ISOLATION && COMPACTION) || CMA
>   select ARCH_HAS_KCOV
>   select ARCH_HAS_MEMBARRIER_SYNC_CORE
>   select ARCH_HAS_PTE_SPECIAL
> diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
> index fb6609875455..797fc77eabcd 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -65,8 +65,11 @@ extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
>  
>  #include <asm-generic/hugetlb.h>
>  
> -#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
> -static inline bool gigantic_page_supported(void) { return true; }
> +#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE_RUNTIME_ALLOCATION
> +static inline bool gigantic_page_runtime_allocation_supported(void)
> +{
> + return true;
> +}
>  #endif
>  
>  #endif /* __ASM_HUGETLB_H */
> diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h
> index 5b0177733994..7711f0e2c7e5 100644
> --- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
> +++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
> @@ -32,8 +32,8 @@ static inline int hstate_get_psize(struct hstate *hstate)
>   }
>  }
>  
> -#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
> -static inline bool gigantic_page_supported(void)
> +#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE_RUNTIME_ALLOCATION
> +static inline bool gigantic_page_runtime_allocation_supported(void)
>  {
>   return true;
>  }
> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
> index 8c7464c3f27f..779e06bac697 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -319,7 +319,7 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
>  config PPC_RADIX_MMU
>   bool "Radix MMU Support"
>   depends on PPC_BOOK3S_64
> - select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
> + select ARCH_HAS_GIGANTIC_PAGE_RUNTIME_ALLOCATION if (MEMORY_ISOLATION && COMPACTION) || CMA
>   default y
>   help
> Enable support for the Power ISA 3.0 Radix style MMU. Currently this
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index ed554b09eb3f..6776eef6a9ae 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -69,7 +69,7 @@ config S390
>   select ARCH_HAS_ELF_RANDOMIZE
>   select ARCH_HAS_FORTIFY_SOURCE
>   select ARCH_HAS_GCOV_PROFILE_ALL
> - select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
> + select ARCH_HAS_GIGANTIC_PAGE_RUNTIME_ALLOCATION if (MEMORY_ISOLATION && COMPACTION) || CMA
>   select ARCH_HAS_KCOV
>   select ARCH_HAS_PTE_SPECIAL
>   select ARCH_HAS_SET_MEMORY
> diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
> index 2d1afa58a4b6..57c952f5388e 100644
> --- a/arch/s390/include/asm/hugetlb.h
> +++ b/arch/s390/include/asm/hugetlb.h
> @@ -116,7 +116,10 @@ static inline pte_t huge_pte_mod
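
The behavioural change in mm/hugetlb.c boils down to something like this
(a sketch of the intent, not the literal hunk):

        static int set_max_huge_pages(struct hstate *h, unsigned long count)
        {
                if (hstate_is_gigantic(h) &&
                    !gigantic_page_runtime_allocation_supported()) {
                        /* Runtime allocation is unsupported, but freeing
                         * boot-time gigantic pages still works: refuse
                         * only attempts to grow the pool. */
                        if (count > persistent_huge_pages(h))
                                return -EINVAL;
                }
                /* ... existing pool resize logic ... */
        }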

Re: [RFC PATCH] powerpc: fix get_arch_dma_ops() for NTB devices

2019-02-05 Thread Michael Ellerman
Christoph Hellwig  writes:
> On Wed, Jan 30, 2019 at 11:58:40PM +1100, Michael Ellerman wrote:
>> Alexander Fomichev  writes:
>> 
>> > get_dma_ops() falls into arch-dependant get_arch_dma_ops(), which
>> > historically returns NULL on PowerPC. Therefore dma_set_mask() fails.
>> > This affects Switchtec (and probably other) NTB devices, such that they
>> > fail to initialize.
>> 
>> What's an NTB device?
>> 
>> drivers/ntb I assume?
>> 
>> So it's a PCI device of some sort, but presumably the device you're
>> calling dma_set_mask() on is an NTB device not a PCI device?
>> 
>> But then it works if you tell it to use the PCI DMA ops?
>> 
>> At the very least the code should be checking for the NTB bus type and
>> only returning the PCI ops in that specific case, not for all devices.
>
> Can you provide the context?  E.g. the patch and the rest of the commit
> log.  This all looks rather odd to me.

Sorry, here it is.

Or on lore: 
https://lore.kernel.org/linuxppc-dev/20190128133203.mon4a3nkrzijn43g@alfbook-pro.local/

Subject: [RFC PATCH] powerpc: fix get_arch_dma_ops() for NTB devices

get_dma_ops() falls into arch-dependant get_arch_dma_ops(), which
historically returns NULL on PowerPC. Therefore dma_set_mask() fails.
This affects Switchtec (and probably other) NTB devices, such that they
fail to initialize.
The proposed patch should fix the issue.

---
 arch/powerpc/include/asm/dma-mapping.h | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index ebf6680..cb6ac96 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -70,14 +70,11 @@ extern struct dma_map_ops dma_iommu_ops;
 #endif
 extern const struct dma_map_ops dma_nommu_ops;

+extern const struct dma_map_ops *get_pci_dma_ops(void);
+
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
-   /* We don't handle the NULL dev case for ISA for now. We could
-* do it via an out of line call but it is not needed for now. The
-* only ISA DMA device we support is the floppy and we have a hack
-* in the floppy driver directly to get a device for us.
-*/
-   return NULL;
+   return get_pci_dma_ops();
 }

 /*
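
For context, dma_set_mask() reaches get_arch_dma_ops() through the generic
helper below (include/linux/dma-mapping.h of that era, simplified). NTB
devices sit on their own bus type with no per-device ops, so the NULL
return from the powerpc get_arch_dma_ops() is what makes the mask setting
fail:

        static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
        {
                if (dev && dev->dma_ops)
                        return dev->dma_ops;
                return get_arch_dma_ops(dev ? dev->bus : NULL);
        }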


Re: [PATCH] powerpc/prom_init: add __init markers to all functions

2019-02-05 Thread Michael Ellerman
Masahiro Yamada  writes:

> It is fragile to rely on the compiler's optimization to avoid the
> section mismatch. Some functions may not be necessarily inlined
> when the compiler's inlining heuristic changes.
>
> Add __init markers consistently.
>
> As for prom_getprop() and prom_getproplen(), they are marked as
> 'inline', so inlining is guaranteed because PowerPC never enables
> CONFIG_OPTIMIZE_INLINING. However, it would be better to leave the
> inlining decision to the compiler. I replaced 'inline' with __init.

I'm going to drop that part because it breaks the build in some
configurations (as reported by the build robot).

> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index f33ff41..85b0719 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -501,19 +501,19 @@ static int __init prom_next_node(phandle *nodep)
>   }
>  }
>  
> -static inline int prom_getprop(phandle node, const char *pname,
> +static int __init prom_getprop(phandle node, const char *pname,
>  void *value, size_t valuelen)
>  {
>   return call_prom("getprop", 4, 1, node, ADDR(pname),
>(u32)(unsigned long) value, (u32) valuelen);
>  }
>  
> -static inline int prom_getproplen(phandle node, const char *pname)
> +static int __init prom_getproplen(phandle node, const char *pname)
>  {
>   return call_prom("getproplen", 2, 1, node, ADDR(pname));
>  }
>  
> -static void add_string(char **str, const char *q)
> +static void __init add_string(char **str, const char *q)
>  {
>   char *p = *str;
>  
> @@ -523,7 +523,7 @@ static void add_string(char **str, const char *q)
>   *str = p;
>  }
>  
> -static char *tohex(unsigned int x)
> +static char __init *tohex(unsigned int x)
>  {
>   static const char digits[] __initconst = "0123456789abcdef";
>   static char result[9] __prombss;
> @@ -570,7 +570,7 @@ static int __init prom_setprop(phandle node, const char *nodename,
>  #define islower(c)   ('a' <= (c) && (c) <= 'z')
>  #define toupper(c)   (islower(c) ? ((c) - 'a' + 'A') : (c))
>  
> -static unsigned long prom_strtoul(const char *cp, const char **endp)
> +static unsigned long __init prom_strtoul(const char *cp, const char **endp)
>  {
>   unsigned long result = 0, base = 10, value;
>  
> @@ -595,7 +595,7 @@ static unsigned long prom_strtoul(const char *cp, const char **endp)
>   return result;
>  }
>  
> -static unsigned long prom_memparse(const char *ptr, const char **retptr)
> +static unsigned long __init prom_memparse(const char *ptr, const char **retptr)
>  {
>   unsigned long ret = prom_strtoul(ptr, retptr);
>   int shift = 0;
> @@ -2924,7 +2924,7 @@ static void __init fixup_device_tree_pasemi(void)
>   prom_setprop(iob, name, "device_type", "isa", sizeof("isa"));
>  }
>  #else/* !CONFIG_PPC_PASEMI_NEMO */
> -static inline void fixup_device_tree_pasemi(void) { }
> +static inline void __init fixup_device_tree_pasemi(void) { }

I don't think we need __init for an empty static inline.

>  #endif
>  
>  static void __init fixup_device_tree(void)
> @@ -2986,15 +2986,15 @@ static void __init prom_check_initrd(unsigned long r3, unsigned long r4)
>  
>  #ifdef CONFIG_PPC64
>  #ifdef CONFIG_RELOCATABLE
> -static void reloc_toc(void)
> +static void __init reloc_toc(void)
>  {
>  }
>  
> -static void unreloc_toc(void)
> +static void __init unreloc_toc(void)
>  {
>  }

Those should be empty static inlines, I'll fix them up.

>  #else
> -static void __reloc_toc(unsigned long offset, unsigned long nr_entries)
> +static void __init __reloc_toc(unsigned long offset, unsigned long nr_entries)
>  {
>   unsigned long i;
>   unsigned long *toc_entry;
> @@ -3008,7 +3008,7 @@ static void __reloc_toc(unsigned long offset, unsigned long nr_entries)
>   }
>  }
>  
> -static void reloc_toc(void)
> +static void __init reloc_toc(void)
>  {
>   unsigned long offset = reloc_offset();
>   unsigned long nr_entries =
> @@ -3019,7 +3019,7 @@ static void reloc_toc(void)
>   mb();
>  }
>  
> -static void unreloc_toc(void)
> +static void __init unreloc_toc(void)
>  {
>   unsigned long offset = reloc_offset();
>   unsigned long nr_entries =


cheers
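
The fragility being removed is easiest to see in tohex() above: its body
references __initconst data, so it is only safe outside .init.text for as
long as the compiler keeps inlining it into __init callers. An annotated
sketch:

        static char *tohex(unsigned int x)      /* in .text if not inlined */
        {
                static const char digits[] __initconst = "0123456789abcdef";
                /* ^ lives in .init.rodata and is freed after boot; a
                 * reference from .text to it is a section mismatch, hidden
                 * only while the compiler inlines tohex() into __init code */
                /* ... */
        }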


Re: [PATCH v02] powerpc/pseries: Check for ceded CPU's during LPAR migration

2019-02-05 Thread Michael Ellerman
Michael Bringmann  writes:
> See below.
>
> On 1/31/19 3:53 PM, Michael Bringmann wrote:
>> On 1/30/19 11:38 PM, Michael Ellerman wrote:
>>> Michael Bringmann  writes:
 This patch is to check for cede'ed CPUs during LPM.  Some extreme
 tests encountered a problem where Linux has put some threads to
 sleep (possibly to save energy or something), LPM was attempted,
 and the Linux kernel didn't awaken the sleeping threads, but issued
 the H_JOIN for the active threads.  Since the sleeping threads
 are not awake, they can not issue the expected H_JOIN, and the
 partition would never suspend.  This patch wakes the sleeping
 threads back up.
>>>
>>> I'm don't think this is the right solution.
>>>
>>> Just after your for loop we do an on_each_cpu() call, which sends an IPI
>>> to every CPU, and that should wake all CPUs up from CEDE.
>>>
>>> If that's not happening then there is a bug somewhere, and we need to
>>> work out where.
>
> From Pete Heyrman:
> Both sending an IPI or an H_PROD will awaken a logical processor that has
> ceded. When you have one logical proc doing cede and one logical proc doing
> prod or IPI, you have a race condition in which the prod/IPI can precede the
> cede request. If you use prod, the hypervisor takes care of the
> synchronization by ignoring a cede request if it was preceded by a prod.
> With IPI the interrupt is delivered, which could then be followed by a cede,
> so the OS would need to provide synchronization.
>
> Shouldn't this answer your concerns about race conditions and the suitability
> of using H_PROD?

No sorry it doesn't.

Assuming the other CPU is idle it will just continually do CEDE in a
loop, sending it a PROD will just wake it up once and then it will CEDE
again. That first CEDE might return immediately on seeing the PROD, but
then the kernel will just CEDE again because it has nothing to do.

In contrast the IPI we send wakes up the other CPU and tells it to run a
function, rtas_percpu_suspend_me(), which does the H_JOIN directly.

I still don't understand how the original bug ever even happened. That's
what I want to know.

The way we do the joining and suspend seems like it could be simpler,
there's a bunch of atomic flags and __rtas_suspend_last_cpu() seems to
duplicate much of __rtas_suspend_cpu(). It seems more likely we have a
bug in there somewhere.

cheers
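
For reference, the wakeup path being discussed (arch/powerpc/kernel/rtas.c,
heavily trimmed):

        int rtas_ibm_suspend_me(u64 handle)
        {
                struct rtas_suspend_me_data data;
                /* ... set up the atomic counters and completion ... */

                /* IPI every online CPU: a CEDE'd vCPU must wake up for
                 * the IPI and then issues H_JOIN itself from within
                 * rtas_percpu_suspend_me(). */
                on_each_cpu(rtas_percpu_suspend_me, (void *)&data, 0);

                /* ... the last CPU to join does the actual suspend ... */
        }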


[PATCH v4 2/2] drivers: soc: fsl: add qixis driver

2019-02-05 Thread Pankaj Bansal
The FPGA on LX2160AQDS/LX2160ARDB is connected on the I2C bus, so add a qixis
driver, which is basically an I2C client driver, to control the FPGA.

Also add a platform driver for the MMIO-based FPGA, like the one available
on LS2088ARDB/LS2088AQDS.

Signed-off-by: Wang Dongsheng 
Signed-off-by: Pankaj Bansal 
---

Notes:
V4:
- Fix compilation error when qixis_ctrl is built as a standalone module.
V3:
- Add MMIO based FPGA driver
V2:
- Modify the driver to not create platform devices corresponding to subnodes,
  because the subnodes are not actual devices.
- Use mdio_mux_regmap_init/mdio_mux_regmap_uninit
- Remove header file from include folder, as no qixis API is called from outside
- Add regmap_exit in driver's remove function
Dependencies:
- https://www.mail-archive.com/netdev@vger.kernel.org/msg281274.html

 drivers/soc/fsl/Kconfig  |  11 ++
 drivers/soc/fsl/Makefile |   1 +
 drivers/soc/fsl/qixis_ctrl.c | 222 +
 3 files changed, 234 insertions(+)

diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig
index 8f80e8bbf29e..75993be04e42 100644
--- a/drivers/soc/fsl/Kconfig
+++ b/drivers/soc/fsl/Kconfig
@@ -28,4 +28,15 @@ config FSL_MC_DPIO
  other DPAA2 objects. This driver does not expose the DPIO
  objects individually, but groups them under a service layer
  API.
+
+config FSL_QIXIS
+   tristate "QIXIS system controller driver"
+   depends on OF
+   select REGMAP_I2C
+   select REGMAP_MMIO
+   default n
+   help
+ Say y here to enable the QIXIS system controller API. The qixis driver
+ provides FPGA functions to control the system.
+
 endmenu
diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile
index 803ef1bfb5ff..47e0cfc66ca4 100644
--- a/drivers/soc/fsl/Makefile
+++ b/drivers/soc/fsl/Makefile
@@ -5,5 +5,6 @@
 obj-$(CONFIG_FSL_DPAA) += qbman/
 obj-$(CONFIG_QUICC_ENGINE) += qe/
 obj-$(CONFIG_CPM)  += qe/
+obj-$(CONFIG_FSL_QIXIS)+= qixis_ctrl.o
 obj-$(CONFIG_FSL_GUTS) += guts.o
 obj-$(CONFIG_FSL_MC_DPIO)  += dpio/
diff --git a/drivers/soc/fsl/qixis_ctrl.c b/drivers/soc/fsl/qixis_ctrl.c
new file mode 100644
index ..a8108bfd5195
--- /dev/null
+++ b/drivers/soc/fsl/qixis_ctrl.c
@@ -0,0 +1,222 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+/* Freescale QIXIS system controller driver.
+ *
+ * Copyright 2015 Freescale Semiconductor, Inc.
+ * Copyright 2018-2019 NXP
+ */
+
+#include <linux/i2c.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/mdio-mux.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/platform_device.h>
+#include <linux/regmap.h>
+#include <linux/slab.h>
+
+/* QIXIS MAP */
+struct fsl_qixis_regs {
+   u8  id; /* Identification Registers */
+   u8  version;/* Version Register */
+   u8  qixis_ver;  /* QIXIS Version Register */
+   u8  reserved1[0x1f];
+};
+
+struct mdio_mux_data {
+   void*data;
+   struct list_headlink;
+};
+
+struct qixis_priv {
+   struct regmap   *regmap;
+   struct list_headmdio_mux_list;
+};
+
+static struct regmap_config qixis_regmap_config = {
+   .reg_bits = 8,
+   .val_bits = 8,
+};
+
+static int fsl_qixis_mdio_mux_init(struct device *dev, struct qixis_priv *priv)
+{
+   struct device_node *child;
+   struct mdio_mux_data *mux_data;
+   int ret;
+
+   INIT_LIST_HEAD(&priv->mdio_mux_list);
+   for_each_child_of_node(dev->of_node, child) {
+   if (!of_node_name_prefix(child, "mdio-mux"))
+   continue;
+
+   mux_data = devm_kzalloc(dev, sizeof(struct mdio_mux_data),
+   GFP_KERNEL);
+   if (!mux_data)
+   return -ENOMEM;
+   ret = mdio_mux_regmap_init(dev, child, &mux_data->data);
+   if (ret)
+   return ret;
+   list_add(&mux_data->link, &priv->mdio_mux_list);
+   }
+
+   return 0;
+}
+
+static int fsl_qixis_mdio_mux_uninit(struct qixis_priv *priv)
+{
+   struct list_head *pos;
+   struct mdio_mux_data *mux_data;
+
+   list_for_each(pos, &priv->mdio_mux_list) {
+   mux_data = list_entry(pos, struct mdio_mux_data, link);
+   mdio_mux_regmap_uninit(mux_data->data);
+   }
+
+   return 0;
+}
+
+static int fsl_qixis_probe(struct platform_device *pdev)
+{
+   static struct fsl_qixis_regs __iomem *qixis;
+   struct qixis_priv *priv;
+   int ret;
+   u32 qver;
+
+   qixis = of_iomap(pdev->dev.of_node, 0);
+   if (IS_ERR_OR_NULL(qixis)) {
+   pr_err("%s: Could not map qixis registers\n", __func__);
+   return -ENODEV;
+   }
+
+   priv = devm_kzalloc(&pdev->dev, sizeof(struct qixis_priv), GFP_KERNEL);
+   if (!priv)
+   return -ENOMEM;
+
+   priv->regmap = devm_regmap_i

[PATCH v4 1/2] dt-bindings: soc: fsl: Document Qixis FPGA usage

2019-02-05 Thread Pankaj Bansal
An FPGA-based system controller, called “Qixis”, manages
several critical system features, including:
• Reset sequencing
• Power supply configuration
• Board configuration
• Hardware configuration

The qixis registers are accessible over one or more system-specific
interfaces, typically I2C, JTAG or an embedded processor.

Signed-off-by: Pankaj Bansal 
---

Notes:
V4:
- No Change
V3:
- Added boardname based compatible field in bindings
- Added bindings for MMIO based FPGA
V2:
- No change

 .../bindings/soc/fsl/qixis_ctrl.txt  | 53 ++
 1 file changed, 53 insertions(+)

diff --git a/Documentation/devicetree/bindings/soc/fsl/qixis_ctrl.txt b/Documentation/devicetree/bindings/soc/fsl/qixis_ctrl.txt
new file mode 100644
index ..5d510df14be8
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/fsl/qixis_ctrl.txt
@@ -0,0 +1,53 @@
+* QIXIS FPGA block
+
+An FPGA-based system controller, called “Qixis”, manages
+several critical system features, including:
+• Configuration switch monitoring
+• Power on/off sequencing
+• Reset sequencing
+• Power supply configuration
+• Board configuration
+• Hardware configuration
+• Background power data collection (DCM)
+• Fault monitoring
+• RCW bypass SRAM (replace flash RCW with internal RCW) (NOR only)
+• Dedicated functional validation blocks (POSt/IRS, triggered event, and so on)
+• I2C master for remote board control even with no DUT available
+
+The qixis registers are accessible over one or more system-specific interfaces,
+typically I2C, JTAG or an embedded processor.
+
+FPGA connected to I2C:
+Required properties:
+
+ - compatible: should be a board-specific string followed by a string
+   indicating the type of FPGA.  Example:
+   "fsl,-fpga", "fsl,fpga-qixis-i2c"
+ - reg : i2c address of the qixis device.
+
+Example (LX2160A-QDS):
+   /* The FPGA node */
+fpga@66 {
+   compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
+   reg = <0x66>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   }
+
+* Freescale on-board FPGA
+
+This is the memory-mapped registers for on board FPGA.
+
+Required properties:
+- compatible: should be a board-specific string followed by a string
+  indicating the type of FPGA.  Example:
+   "fsl,-fpga", "fsl,fpga-qixis"
+- reg: should contain the address and the length of the FPGA register set.
+
+Example (LS2080A-RDB):
+
+cpld@3,0 {
+compatible = "fsl,ls2080ardb-fpga", "fsl,fpga-qixis";
+reg = <0x3 0 0x1>;
+};
+
-- 
2.17.1



[PATCH v4 0/2] add qixis driver

2019-02-05 Thread Pankaj Bansal
The FPGA on LX2160AQDS/LX2160ARDB is connected on the I2C bus, so add a qixis
driver, which is basically an I2C client driver, to control the FPGA.

Also add a platform driver for the MMIO-based FPGA, like the one available on
LS2088ARDB/LS2088AQDS.

This driver is essential for controlling the MDIO mux.

This driver is dependent on below patches:
https://www.mail-archive.com/netdev@vger.kernel.org/msg281274.html

Cc: Varun Sethi 

---
Notes:
V3:
- https://patchwork.kernel.org/cover/10795195/
V2:
- https://patchwork.kernel.org/cover/10788341/
V1:
- https://patchwork.kernel.org/cover/10627297/

Pankaj Bansal (2):
  dt-bindings: soc: fsl: Document Qixis FPGA usage
  drivers: soc: fsl: add qixis driver

 .../bindings/soc/fsl/qixis_ctrl.txt   |  53 +
 drivers/soc/fsl/Kconfig   |  11 +
 drivers/soc/fsl/Makefile  |   1 +
 drivers/soc/fsl/qixis_ctrl.c  | 222 ++
 4 files changed, 287 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/soc/fsl/qixis_ctrl.txt
 create mode 100644 drivers/soc/fsl/qixis_ctrl.c

-- 
2.17.1



Re: [RFC/WIP] powerpc: Fix 32-bit handling of MSR_EE on exceptions

2019-02-05 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 20/12/2018 à 23:35, Benjamin Herrenschmidt a écrit :
>> 
/*
 * MSR_KERNEL is > 0x10000 on 4xx/Book-E since it include MSR_CE.
 @@ -205,20 +208,46 @@ transfer_to_handler_cont:
mflrr9
lwz r11,0(r9)   /* virtual address of handler */
lwz r9,4(r9)/* where to go when done */
 +#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
 +  mtspr   SPRN_NRI, r0
 +#endif
>>>
>>> That's not part of your patch, it's already in the tree.
>> 
>> Yup rebase glitch.
>> 
>>   .../...
>> 
>>> I tested it on the 8xx with the below changes in addition. No issue seen
>>> so far.
>> 
>> Thanks !
>> 
>> I'll merge that in.
>
> I'm currently working on a refactoring and simplification of
> exception and syscall entry on ppc32.
>
> I plan to take your patch in my series as it helps quite a bit. I hope 
> you don't mind. I expect to come out with a series this week.

Ben's AFK so go ahead and pull it in to your series if that helps you.
 
>> The main obscure area is that business with the irqsoff tracer and thus
>> the need to create stack frames around calls to trace_hardirqs_* ... we
>> do it in some places and not others, but I've not managed to make it
>> crash either. I need to get to the bottom of that, and possibly provide
>> proper macro helpers like ppc64 has to do it.
>
> I can't see anything special around this in ppc32 code. As far as I 
> understand, a stack frame is put in place when there is a need to
> save and restore some volatile registers. At the places where nothing 
> needs to be saved, nothing is done. I think that's the normal way for 
> any function call, isn't it?

The concern was that the irqsoff tracer was doing
__builtin_return_address(1) (or some number > 0) and that crashes if
there aren't sufficiently many stack frames available.

See ftrace_return_address.

Possibly the answer is that we don't have CONFIG_FRAME_POINTER and so we
get the empty version of that.

cheers
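
The macro in question, from include/linux/ftrace.h:

        #ifdef CONFIG_FRAME_POINTER
        # define ftrace_return_address(n) __builtin_return_address(n)
        #else
        # define ftrace_return_address(n) 0UL
        #endif

powerpc does not select CONFIG_FRAME_POINTER, so CALLER_ADDR1 and friends
evaluate to 0 rather than walking the stack, which would explain why the
missing frames never cause a crash.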


Re: [PATCH v02] powerpc/pseries: Check for ceded CPU's during LPAR migration

2019-02-05 Thread Michael Ellerman
Tyrel Datwyler  writes:
> On 01/31/2019 02:21 PM, Tyrel Datwyler wrote:
>> On 01/31/2019 01:53 PM, Michael Bringmann wrote:
>>> On 1/30/19 11:38 PM, Michael Ellerman wrote:
 Michael Bringmann  writes:
> This patch is to check for cede'ed CPUs during LPM.  Some extreme
> tests encountered a problem where Linux has put some threads to
> sleep (possibly to save energy or something), LPM was attempted,
> and the Linux kernel didn't awaken the sleeping threads, but issued
> the H_JOIN for the active threads.  Since the sleeping threads
> are not awake, they can not issue the expected H_JOIN, and the
> partition would never suspend.  This patch wakes the sleeping
> threads back up.

 I'm don't think this is the right solution.

 Just after your for loop we do an on_each_cpu() call, which sends an IPI
 to every CPU, and that should wake all CPUs up from CEDE.

 If that's not happening then there is a bug somewhere, and we need to
 work out where.
>>>
>>> Let me explain the scenario of the LPM case that Pete Heyrman found, and
>>> that Nathan F. was working upon, previously.
>>>
>>> In the scenario, the partition has 5 dedicated processors each with 8 
>>> threads
>>> running.
>> 
>> Do we CEDE processors when running dedicated? I thought H_CEDE was part of 
>> the
>> Shared Processor LPAR option.
>
> Looks like the cpuidle-pseries driver uses CEDE with dedicated processors as
> long as the firmware supports the SPLPAR option.
>
>> 
>>>
>>> From the PHYP data we can see that on VP 0, threads 3, 4, 5, 6 and 7 issued
>>> a H_CEDE requesting to save energy by putting the requesting thread into
>>> sleep mode.  In this state, the thread will only be awakened by H_PROD from
>>> another running thread or from an external user action (power off, reboot
>>> and such).  Timers and external interrupts are disabled in this mode.
>> 
>> Not according to PAPR. A CEDE'd processor should awaken if signaled by 
>> external
>> interrupt such as decrementer or IPI as well.
>
> This statement should still apply though. From PAPR:
>
> 14.11.3.3 H_CEDE
> The architectural intent of this hcall() is to have the virtual processor, 
> which
> has no useful work to do, enter a wait state ceding its processor capacity to
> other virtual processors until some useful work appears, signaled either 
> through
> an interrupt or a prod hcall(). To help the caller reduce race conditions, 
> this
> call may be made with interrupts disabled but the semantics of the hcall()
> enable the virtual processor’s interrupts so that it may always receive wake 
> up
> interrupt signals.

Thanks for digging that out of PAPR.

H_CEDE must respond to IPIs, we have no logic to H_PROD CPUs that are
idle in order to wake them up.

There must be something else going on here.

cheers
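
For reference, the idle path that keeps a dedicated processor CEDE'ing in a
loop (drivers/cpuidle/cpuidle-pseries.c, simplified):

        static int dedicated_cede_loop(struct cpuidle_device *dev,
                                       struct cpuidle_driver *drv, int index)
        {
                unsigned long in_purr;

                idle_loop_prolog(&in_purr);
                get_lppaca()->donate_dedicated_cpu = 1;

                HMT_medium();
                check_and_cede_processor();     /* H_CEDE; any interrupt,
                                                 * including an IPI, wakes it */

                get_lppaca()->donate_dedicated_cpu = 0;
                idle_loop_epilog(in_purr);

                return index;
        }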


Re: [PATCH v2] powerpc: drop page_is_ram() and walk_system_ram_range()

2019-02-05 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 04/02/2019 à 11:24, Michael Ellerman a écrit :
>> Christophe Leroy  writes:
>> 
>>> Since commit c40dd2f76644 ("powerpc: Add System RAM to /proc/iomem")
>>> it is possible to use the generic walk_system_ram_range() and
>>> the generic page_is_ram().
>>>
>>> To enable the use of walk_system_ram_range() by the IBM EHEA
>>> ethernet driver, the generic function has to be exported.
>> 
>> I'm not sure if we have a policy on that, but I suspect we'd rather not
>> add a new export on all arches unless we need to. Especially seeing as
>> the only user is the EHEA code which is heavily in maintenance mode.
>
> If you take the example of the function walk_iomem_res_desc(), that's
> similar. It seems to be used only by x86, and exported for the nvdimm/e820
> driver only.
>
> See commit d76401ade0bb6ab0a7 ("libnvdimm, e820: Register all pmem 
> resources")

OK. Which begs the question whether we need both exported. It looks like
you could probably use walk_iomem_res_desc() with the right flags to do
the same thing as walk_system_ram_range().

>> I'll put the export in powerpc code and make sure that builds.
>
> I thought there was a rule that EXPORT_SYMBOL has to immediately follow 
> the function it exports. At least checkpatch checks for that.

Yeah that is a rule. But rules are made to be broken :)

I'll merge it for now with the export in powerpc code, if we want to we
can do a separate patch to move that export into generic code and get
acks for that.

cheers
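
The overlap mentioned above: walk_system_ram_range() internally matches
resources flagged IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY, which the
already-exported generic walker can also select, though its callback takes
a struct resource * rather than a pfn range (res_func below is a
hypothetical adapter):

        walk_iomem_res_desc(IORES_DESC_NONE,
                            IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
                            start, end, arg, res_func);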


Re: [RFC/WIP] powerpc: Fix 32-bit handling of MSR_EE on exceptions

2019-02-05 Thread Christophe Leroy




Le 20/12/2018 à 23:35, Benjamin Herrenschmidt a écrit :



   /*
* MSR_KERNEL is > 0x10000 on 4xx/Book-E since it include MSR_CE.
@@ -205,20 +208,46 @@ transfer_to_handler_cont:
mflrr9
lwz r11,0(r9)   /* virtual address of handler */
lwz r9,4(r9)/* where to go when done */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
+   mtspr   SPRN_NRI, r0
+#endif


That's not part of your patch, it's already in the tree.


Yup rebase glitch.

  .../...


I tested it on the 8xx with the below changes in addition. No issue seen
so far.


Thanks !

I'll merge that in.


I'm currently working on a refactoring and simplification of
exception and syscall entry on ppc32.

I plan to take your patch in my series as it helps quite a bit. I hope 
you don't mind. I expect to come out with a series this week.




The main obscure area is that business with the irqsoff tracer and thus
the need to create stack frames around calls to trace_hardirqs_* ... we
do it in some places and not others, but I've not managed to make it
crash either. I need to get to the bottom of that, and possibly provide
proper macro helpers like ppc64 has to do it.


I can't see anything special around this in ppc32 code. As far as I 
understand, a stack frame is put in place when there is a need to
save and restore some volatile registers. At the places where nothing 
needs to be saved, nothing is done. I think that's the normal way for 
any function call, isn't it?


Christophe