Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death

2020-08-10 Thread Michael Roth
Quoting Nathan Lynch (2020-08-07 02:05:09)
> Hi everyone,
> 
> Michael Ellerman  writes:
> > Greg Kurz  writes:
> >> On Tue, 04 Aug 2020 23:35:10 +1000
> >> Michael Ellerman  wrote:
> >>> Spinning forever seems like a bad idea, but as has been demonstrated at
> >>> least twice now, continuing when we don't know the state of the other
> >>> CPU can lead to straight up crashes.
> >>> 
> >>> So I think I'm persuaded that it's preferable to have the kernel stuck
> >>> spinning rather than oopsing.
> >>> 
> >>
> >> +1
> >>
> >>> I'm 50/50 on whether we should have a cond_resched() in the loop. My
> >>> first instinct is no, if we're stuck here for 20s a stack trace would be
> >>> good. But then we will probably hit that on some big and/or heavily
> >>> loaded machine.
> >>> 
> >>> So possibly we should call cond_resched() but have some custom logic in
> >>> the loop to print a warning if we are stuck for more than some
> >>> sufficiently long amount of time.
> >>
> >> How long should that be?
> >
> > Yeah good question.
> >
> > I guess step one would be seeing how long it can take on the 384 vcpu
> > machine. And we can probably test on some other big machines.
> >
> > Hopefully Nathan can give us some idea of how long he's seen it take on
> > large systems? I know he was concerned about the 20s timeout of the
> > softlockup detector.
> 
> Maybe I'm not quite clear what this is referring to, but I don't think
> stop-self/query-stopped-state latency increases with processor count, at
> least not on PowerVM. And IIRC I was concerned with the earlier patch's
> potential for causing the softlockup watchdog to rightly complain by
> polling the stopped state without ever scheduling away.
> 
> The fact that smp_query_cpu_stopped() kind of collapses the two distinct
> results from the query-cpu-stopped-state RTAS call into one return value
> may make it harder than necessary to reason about the questions around
> cond_resched() and whether to warn.
> 
> Sorry to pull this stunt but I have had some code sitting in a neglected
> branch that I think gets the logic around this right.
> 
> What we should have is a simple C wrapper for the RTAS call that reflects the
> architected inputs and outputs:
> 
> 
> (-- rtas.c --)
> 
> /**
>  * rtas_query_cpu_stopped_state() - Call RTAS query-cpu-stopped-state.
>  * @hwcpu: Identifies the processor thread to be queried.
>  * @status: Pointer to status, valid only on success.
>  *
>  * Determine whether the given processor thread is in the stopped
>  * state.  If successful and @status is non-NULL, the thread's status
>  * is stored to @status.
>  *
>  * Return:
>  * * 0   - Success
>  * * -1  - Hardware error
>  * * -2  - Busy, try again later
>  */
> int rtas_query_cpu_stopped_state(unsigned int hwcpu, unsigned int *status)
> {
> 	unsigned int cpu_status;
> 	int token;
> 	int fwrc;
> 
> 	token = rtas_token("query-cpu-stopped-state");
> 
> 	fwrc = rtas_call(token, 1, 2, &cpu_status, hwcpu);
> 	if (fwrc != 0)
> 		goto out;
> 
> 	if (status != NULL)
> 		*status = cpu_status;
> out:
> 	return fwrc;
> }
> 
> 
> 
> And then a utility function that waits for the remote thread to enter
> stopped state, with higher-level logic for rescheduling and warning. The
> fact that smp_query_cpu_stopped() currently does not handle a -2/busy
> status is a bug, fixed below by using rtas_busy_delay(). Note the
> justification for the explicit cond_resched() in the outer loop:
> 
> 
> (-- rtas.h --)
> 
> /* query-cpu-stopped-state CPU_status */
> #define RTAS_QCSS_STATUS_STOPPED 0
> #define RTAS_QCSS_STATUS_IN_PROGRESS 1
> #define RTAS_QCSS_STATUS_NOT_STOPPED 2
> 
> (-- pseries/hotplug-cpu.c --)
> 
> /**
>  * wait_for_cpu_stopped() - Wait for a cpu to enter RTAS stopped state.
>  */
> static void wait_for_cpu_stopped(unsigned int cpu)
> {
> 	unsigned int status;
> 	unsigned int hwcpu;
> 
> 	hwcpu = get_hard_smp_processor_id(cpu);
> 
> 	do {
> 		int fwrc;
> 
> 		/*
> 		 * rtas_busy_delay() will yield only if RTAS returns a
> 		 * busy status. Since query-cpu-stopped-state can
> 		 * yield RTAS_QCSS_STATUS_IN_PROGRESS or
> 		 * RTAS_QCSS_STATUS_NOT_STOPPED for an unbounded
> 		 * period before the target thread stops, we must take
> 		 * care to explicitly reschedule while polling.
> 		 */
> 		cond_resched();
> 
> 		do {
> 			fwrc = rtas_query_cpu_stopped_state(hwcpu, &status);
> 		} while (rtas_busy_delay(fwrc));
> 
> 		if (fwrc == 0)
> 			continue;
> 
> 		pr_err_ratelimited("query-cpu-stopped-state for "
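[The archive truncates the message here. A plausible continuation of the
error path and loop close, assuming the message reports the RTAS return
code and the outer loop polls until RTAS_QCSS_STATUS_STOPPED; the original
wording is lost:]

> 				   "thread 0x%x returned %d\n",
> 				   hwcpu, fwrc);
> 		break;
> 	} while (status != RTAS_QCSS_STATUS_STOPPED);
> }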

[PATCH] powerpc: kvm: Increase HDEC threshold to enter guest

2020-08-10 Thread David Gibson
Before entering a guest, we need to set the HDEC to pull us out again
when the guest's time is up.  This needs some care, though, because the
HDEC is edge triggered, which means that if it expires before entering the
guest, the interrupt will be lost, meaning we stay in the guest
indefinitely (in practice, until the hard lockup detector pulls us out
with an NMI).

For the POWER9 independent-threads-mode path, we attempt to prevent that by
testing whether the time has already expired before setting the HDEC in
kvmhv_load_hv_regs_and_go().  However, that doesn't account for the case
where the timer expires between that test and the actual guest entry.
Preliminary instrumentation suggests that can take as long as 1.5µs under
certain load conditions, and simply checking the HDEC value we're going to
load is positive isn't enough to guarantee that leeway.

The test here is sometimes masked by a test in kvmhv_p9_guest_entry(), its
caller, which checks that the remaining time is at least 1µs.  However, as
noted above, that doesn't appear to be sufficient in all circumstances, even
from the point the HDEC is set, let alone at this earlier point.

Therefore, increase the threshold we check for in both locations to 4µs
(2048 timebase ticks).  This is a pretty crude approach, but it addresses
a real problem where guest load can trigger a host hard lockup.
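For reference, the tick-to-time conversion behind those numbers; a
standalone sketch assuming the architected 512 MHz timebase (the commit
itself only states the 2048-tick/4µs equivalence):

#include <stdio.h>

int main(void)
{
	const double tb_freq = 512e6;	/* POWER timebase, 512 MHz */

	printf("512 ticks  = %.1f us\n",  512 / tb_freq * 1e6);	/* 1.0 */
	printf("2048 ticks = %.1f us\n", 2048 / tb_freq * 1e6);	/* 4.0 */
	return 0;
}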

We're hoping to refine this in future by gathering more data on exactly
how long these paths can take, and possibly by moving the check closer to
the actual guest entry point to reduce the variance.  Getting the details
for that might take some time however.

NOTE: For reasons I haven't tracked down yet, I haven't actually
managed to reproduce this on current upstream.  I have reproduced it on
RHEL kernels without obvious differences in this area.  I'm still trying
to determine what the cause of that difference is, but I think it's worth
applying this change as a precaution in the interim.

Signed-off-by: David Gibson 
---
 arch/powerpc/kvm/book3s_hv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 0f83f39a2bd2..65a92dd890cb 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3435,7 +3435,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
*vcpu, u64 time_limit,
unsigned long host_pidr = mfspr(SPRN_PID);
 
hdec = time_limit - mftb();
-   if (hdec < 0)
+   if (hdec < 2048)
return BOOK3S_INTERRUPT_HV_DECREMENTER;
mtspr(SPRN_HDEC, hdec);
 
@@ -3564,7 +3564,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 
time_limit,
 
dec = mfspr(SPRN_DEC);
tb = mftb();
-   if (dec < 512)
+   if (dec < 2048)
return BOOK3S_INTERRUPT_HV_DECREMENTER;
local_paca->kvm_hstate.dec_expires = dec + tb;
if (local_paca->kvm_hstate.dec_expires < time_limit)
-- 
2.26.2



[Bug 205183] PPC64: Signal delivery fails with SIGSEGV if between about 1KB and 4KB bytes of stack remain

2020-08-10 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205183

--- Comment #6 from Michael Ellerman (mich...@ellerman.id.au) ---
Fixed in 63dee5df43a3 ("powerpc: Allow 4224 bytes of stack expansion for the
signal frame")

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[PATCH v3] pseries/drmem: don't cache node id in drmem_lmb struct

2020-08-10 Thread Scott Cheloha
At memory hot-remove time we can retrieve an LMB's nid from its
corresponding memory_block.  There is no need to store the nid
in multiple locations.

Note that lmb_to_memblock() uses find_memory_block() to get the
corresponding memory_block.  As find_memory_block() runs in sub-linear
time this approach is negligibly slower than what we do at present.

In exchange for this lookup at hot-remove time we no longer need to
call memory_add_physaddr_to_nid() during drmem_init() for each LMB.
On powerpc, memory_add_physaddr_to_nid() is a linear search, so this
spares us an O(n^2) initialization during boot.
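
Schematically, the pre-patch boot path costs this much (a simplified
sketch; the loop matches the init_drmem_v1_lmbs() code removed below, and
lmb_set_nid() does the linear memory_add_physaddr_to_nid() scan):

	for_each_drmem_lmb(lmb) {	/* n iterations */
		read_drconf_v1_cell(lmb, &prop);
		lmb_set_nid(lmb);	/* O(n) linear scan inside */
	}
	/* total: O(n^2); with n ~ 250000 LMBs this dominates early boot */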

On systems with many LMBs that initialization overhead is palpable and
disruptive.  For example, on a box with 249854 LMBs we're seeing
drmem_init() take upwards of 30 seconds to complete:

[   53.721639] drmem: initializing drmem v2
[   80.604346] watchdog: BUG: soft lockup - CPU#65 stuck for 23s! [swapper/0:1]
[   80.604377] Modules linked in:
[   80.604389] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc2+ #4
[   80.604397] NIP:  c00a4980 LR: c00a4940 CTR: 
[   80.604407] REGS: c0002dbff8493830 TRAP: 0901   Not tainted  (5.6.0-rc2+)
[   80.604412] MSR:  82009033   CR: 44000248  
XER: 000d
[   80.604431] CFAR: c00a4a38 IRQMASK: 0
[   80.604431] GPR00: c00a4940 c0002dbff8493ac0 c1904400 
c0003cfede30
[   80.604431] GPR04:  c0f4095a 002f 
1000
[   80.604431] GPR08: cbf7ecdb7fb8 cbf7ecc2d3c8 0008 
c00c0002fdfb2001
[   80.604431] GPR12:  c0001e8ec200
[   80.604477] NIP [c00a4980] hot_add_scn_to_nid+0xa0/0x3e0
[   80.604486] LR [c00a4940] hot_add_scn_to_nid+0x60/0x3e0
[   80.604492] Call Trace:
[   80.604498] [c0002dbff8493ac0] [c00a4940] 
hot_add_scn_to_nid+0x60/0x3e0 (unreliable)
[   80.604509] [c0002dbff8493b20] [c0087c10] 
memory_add_physaddr_to_nid+0x20/0x60
[   80.604521] [c0002dbff8493b40] [c10d4880] drmem_init+0x25c/0x2f0
[   80.604530] [c0002dbff8493c10] [c0010154] do_one_initcall+0x64/0x2c0
[   80.604540] [c0002dbff8493ce0] [c10c4aa0] 
kernel_init_freeable+0x2d8/0x3a0
[   80.604550] [c0002dbff8493db0] [c0010824] kernel_init+0x2c/0x148
[   80.604560] [c0002dbff8493e20] [c000b648] 
ret_from_kernel_thread+0x5c/0x74
[   80.604567] Instruction dump:
[   80.604574] 392918e8 e949 e90a000a e92a 80ea000c 1d080018 3908ffe8 
7d094214
[   80.604586] 7fa94040 419d00dc e9490010 714a0088 <2faa0008> 409e00ac e949 
7fbe5040
[   89.047390] drmem: 249854 LMB(s)

With a patched kernel on the same machine we're no longer seeing the
soft lockup.  drmem_init() now completes in negligible time, even when
the LMB count is large.

Signed-off-by: Scott Cheloha 
---
v1:
 - RFC

v2:
 - Adjusted commit message.
 - Miscellaneous cleanup.

v3:
 - Correct issue found by Laurent Dufour :
   - Add missing put_device() call in dlpar_remove_lmb() for the
 lmb's associated mem_block.

 arch/powerpc/include/asm/drmem.h  | 21 
 arch/powerpc/mm/drmem.c   |  6 +
 .../platforms/pseries/hotplug-memory.c| 24 ---
 3 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index 414d209f45bb..34e4e9b257f5 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -13,9 +13,6 @@ struct drmem_lmb {
u32 drc_index;
u32 aa_index;
u32 flags;
-#ifdef CONFIG_MEMORY_HOTPLUG
-   int nid;
-#endif
 };
 
 struct drmem_lmb_info {
@@ -104,22 +101,4 @@ static inline void 
invalidate_lmb_associativity_index(struct drmem_lmb *lmb)
lmb->aa_index = 0x;
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG
-static inline void lmb_set_nid(struct drmem_lmb *lmb)
-{
-   lmb->nid = memory_add_physaddr_to_nid(lmb->base_addr);
-}
-static inline void lmb_clear_nid(struct drmem_lmb *lmb)
-{
-   lmb->nid = -1;
-}
-#else
-static inline void lmb_set_nid(struct drmem_lmb *lmb)
-{
-}
-static inline void lmb_clear_nid(struct drmem_lmb *lmb)
-{
-}
-#endif
-
 #endif /* _ASM_POWERPC_LMB_H */
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 59327cefbc6a..873fcfc7b875 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -362,10 +362,8 @@ static void __init init_drmem_v1_lmbs(const __be32 *prop)
if (!drmem_info->lmbs)
return;
 
-   for_each_drmem_lmb(lmb) {
+   for_each_drmem_lmb(lmb)
read_drconf_v1_cell(lmb, );
-   lmb_set_nid(lmb);
-   }
 }
 
 static void __init init_drmem_v2_lmbs(const __be32 *prop)
@@ -410,8 +408,6 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
 
lmb->aa_index = dr_cell.aa_index;
lmb->flags = dr_cell.flags;
-
-   

RE: [PATCH 19/22] crypto: inside-secure - add check for xts input length equal to zero

2020-08-10 Thread Van Leeuwen, Pascal
> -Original Message-
> From: Horia Geantă 
> Sent: Monday, August 10, 2020 4:34 PM
> To: Herbert Xu ; Van Leeuwen, Pascal 
> 
> Cc: Andrei Botila (OSS) ; David S. Miller 
> ; linux-cry...@vger.kernel.org; linux-
> arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org; 
> linuxppc-dev@lists.ozlabs.org; linux-s...@vger.kernel.org;
> x...@kernel.org; linux-arm-ker...@axis.com; Andrei Botila 
> ; Antoine Tenart 
> Subject: Re: [PATCH 19/22] crypto: inside-secure - add check for xts input 
> length equal to zero
>
> <<< External Email >>>
> On 8/10/2020 4:45 PM, Herbert Xu wrote:
> > On Mon, Aug 10, 2020 at 10:20:20AM +, Van Leeuwen, Pascal wrote:
> >>
> >> With all due respect, but this makes no sense.
> >
> > I agree.  This is a lot of churn for no gain.
> >
> I would say the gain is that all skcipher algorithms would behave the same
> when input length equals zero - i.e. treat the request as a no-op.
>
XTS already behaves differently because it can accept any byte count as long
as it is not in the range 0-16. So far, you got an EINVAL error for lengths
< 16.
The special exception on top of that for length 0 does not improve anything.

Treating a request of length 0 as a no-op is not a useful feature here, as there
is no use case where that would make sense. XTS encrypts blocks (usually disk
sectors), and cannot be chained. So an attempt to encrypt a zero length block
is most certainly some kind of error (e.g. trying to use XTS for something it
was not designed to do - big security mistake!).

> We can't say "no input" has any meaning to the other skcipher algorithms,
> but the convention is to accept this case and just return 0.
> I don't see why XTS has to be handled differently.
>
I don't see why you would blindly follow some historical convention ...
unless maybe there was some existing real use case that would benefit?

BTW: for generic ciphers I could think of some use cases where the zero
length request being a no-op makes sense if the application does not
bother to check how much data it has gathered to process (which may be
nothing), but I can't see how this could apply to XTS, being block-based.

> Thanks,
> Horia

Regards,
Pascal van Leeuwen
Silicon IP Architect Multi-Protocol Engines, Rambus Security
Rambus ROTW Holding BV
+31-73 6581953

Note: The Inside Secure/Verimatrix Silicon IP team was recently acquired by 
Rambus.
Please be so kind to update your e-mail address book with my new e-mail address.




RE: [PATCH 2/2 v2] powerpc/powernv: Enable and setup PCI P2P

2020-08-10 Thread Aneela Devarasetty
+ David from IBM.

-Original Message-
From: Oliver O'Halloran  
Sent: Monday, August 3, 2020 2:35 AM
To: Max Gurtovoy 
Cc: Christoph Hellwig ; linux-pci ; 
linuxppc-dev ; Israel Rukshin 
; Idan Werpoler ; Vladimir Koushnir 
; Shlomi Nimrodi ; Frederic 
Barrat ; Carol Soto ; Aneela 
Devarasetty 
Subject: Re: [PATCH 2/2 v2] powerpc/powernv: Enable and setup PCI P2P

On Thu, Apr 30, 2020 at 11:15 PM Max Gurtovoy  wrote:
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 57d3a6a..9ecc576 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -3706,18 +3706,208 @@ static void pnv_pci_ioda_dma_bus_setup(struct 
> pci_bus *bus)
> }
>  }
>
> +#ifdef CONFIG_PCI_P2PDMA
> +static DEFINE_MUTEX(p2p_mutex);
> +
> +static bool pnv_pci_controller_owns_addr(struct pci_controller *hose,
> +					 phys_addr_t addr, size_t size)
> +{
> +   int i;
> +
> +   /*
> +* It seems safe to assume the full range is under the same PHB, so we
> +* can ignore the size.
> +*/
> +   for (i = 0; i < ARRAY_SIZE(hose->mem_resources); i++) {
> +   struct resource *res = &hose->mem_resources[i];
> +
> +   if (res->flags && addr >= res->start && addr < res->end)
> +   return true;
> +   }
> +   return false;
> +}
> +
> +/*
> + * find the phb owning a mmio address if not owned locally
> + */
> +static struct pnv_phb *pnv_pci_find_owning_phb(struct pci_dev *pdev,
> +					       phys_addr_t addr, size_t size)
> +{
> +   struct pci_controller *hose;
> +
> +   /* fast path */
> +   if (pnv_pci_controller_owns_addr(pdev->bus->sysdata, addr, size))
> +   return NULL;

Do we actually need this fast path? It's going to be slow either way.
Also if a device is doing p2p to another device under the same PHB then it 
should not be happening via the root complex. Is this a case you've tested?

> +   list_for_each_entry(hose, &hose_list, list_node) {
> +   struct pnv_phb *phb = hose->private_data;
> +
> +   if (phb->type != PNV_PHB_NPU_NVLINK &&
> +   phb->type != PNV_PHB_NPU_OCAPI) {
> +   if (pnv_pci_controller_owns_addr(hose, addr, size))
> +   return phb;
> +   }
> +   }
> +   return NULL;
> +}
> +
> +static u64 pnv_pci_dma_dir_to_opal_p2p(enum dma_data_direction dir)
> +{
> +   if (dir == DMA_TO_DEVICE)
> +   return OPAL_PCI_P2P_STORE;
> +   else if (dir == DMA_FROM_DEVICE)
> +   return OPAL_PCI_P2P_LOAD;
> +   else if (dir == DMA_BIDIRECTIONAL)
> +   return OPAL_PCI_P2P_LOAD | OPAL_PCI_P2P_STORE;
> +   else
> +   return 0;
> +}
> +
> +static int pnv_pci_ioda_enable_p2p(struct pci_dev *initiator,
> +				   struct pnv_phb *phb_target,
> +				   enum dma_data_direction dir)
> +{
> +   struct pci_controller *hose;
> +   struct pnv_phb *phb_init;
> +   struct pnv_ioda_pe *pe_init;
> +   u64 desc;
> +   int rc;
> +
> +   if (!opal_check_token(OPAL_PCI_SET_P2P))
> +   return -ENXIO;
> +

> +   hose = pci_bus_to_host(initiator->bus);
> +   phb_init = hose->private_data;

You can use the pci_bus_to_pnvhb() helper

> +
> +   pe_init = pnv_ioda_get_pe(initiator);
> +   if (!pe_init)
> +   return -ENODEV;
> +
> +   if (!pe_init->tce_bypass_enabled)
> +   return -EINVAL;
> +
> +   /*
> +* Configuring the initiator's PHB requires adjusting its TVE#1
> +* setting. Since the same device can be an initiator several times for
> +* different target devices, we need to keep a reference count to know
> +* when we can restore the default bypass setting on its TVE#1 when
> +* disabling. Opal is not tracking PE states, so we add a reference
> +* count on the PE in linux.
> +*
> +* For the target, the configuration is per PHB, so we keep a
> +* target reference count on the PHB.
> +*/

This irks me a bit because configuring the DMA address limits for the TVE is
the kernel's job. What we really should be doing is using
opal_pci_map_pe_dma_window_real() to set the bypass-mode address limit for the
TVE to something large enough to hit the MMIO ranges, rather than having
set_p2p do it as a side effect. Unfortunately, for some reason skiboot doesn't
implement support for enabling 56-bit addressing using
opal_pci_map_pe_dma_window_real(), and we do need to support older kernels
which used this stuff, so I guess we're stuck with it for now. It'd be nice if
we could fix this in the longer term though...

> +   mutex_lock(&p2p_mutex);
> +
> +   desc = OPAL_PCI_P2P_ENABLE | 

Re: [PATCH 19/22] crypto: inside-secure - add check for xts input length equal to zero

2020-08-10 Thread Horia Geantă
On 8/10/2020 4:45 PM, Herbert Xu wrote:
> On Mon, Aug 10, 2020 at 10:20:20AM +, Van Leeuwen, Pascal wrote:
>>
>> With all due respect, but this makes no sense.
> 
> I agree.  This is a lot of churn for no gain.
> 
I would say the gain is that all skcipher algorithms would behave the same
when input length equals zero - i.e. treat the request as a no-op.

We can't say "no input" has any meaning to the other skcipher algorithms,
but the convention is to accept this case and just return 0.
I don't see why XTS has to be handled differently.

Thanks,
Horia


RE: [PATCH 19/22] crypto: inside-secure - add check for xts input length equal to zero

2020-08-10 Thread Van Leeuwen, Pascal
> -Original Message-
> From: linux-crypto-ow...@vger.kernel.org  
> On Behalf Of Andrei Botila
> Sent: Friday, August 7, 2020 6:20 PM
> To: Herbert Xu ; David S. Miller 
> 
> Cc: linux-cry...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; 
> linux-ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> linux-s...@vger.kernel.org; x...@kernel.org; linux-arm-ker...@axis.com; 
> Andrei Botila ; Antoine Tenart
> 
> Subject: [PATCH 19/22] crypto: inside-secure - add check for xts input length 
> equal to zero
>
> <<< External Email >>>
> From: Andrei Botila 
>
> Standardize the way input lengths equal to 0 are handled in all skcipher
> algorithms. All the algorithms return 0 for input lengths equal to zero.
>
> Cc: Antoine Tenart 
> Signed-off-by: Andrei Botila 
> ---
>  drivers/crypto/inside-secure/safexcel_cipher.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/crypto/inside-secure/safexcel_cipher.c 
> b/drivers/crypto/inside-secure/safexcel_cipher.c
> index 1ac3253b7903..03d06556ea98 100644
> --- a/drivers/crypto/inside-secure/safexcel_cipher.c
> +++ b/drivers/crypto/inside-secure/safexcel_cipher.c
> @@ -2533,6 +2533,9 @@ static int safexcel_skcipher_aes_xts_cra_init(struct 
> crypto_tfm *tfm)
>
>  static int safexcel_encrypt_xts(struct skcipher_request *req)
>  {
> +	if (!req->cryptlen)
> +		return 0;
> +
>  	if (req->cryptlen < XTS_BLOCK_SIZE)
>  		return -EINVAL;
>  	return safexcel_queue_req(&req->base, skcipher_request_ctx(req),
> @@ -2541,6 +2544,9 @@ static int safexcel_encrypt_xts(struct skcipher_request 
> *req)
>
>  static int safexcel_decrypt_xts(struct skcipher_request *req)
>  {
> +	if (!req->cryptlen)
> +		return 0;
> +
>  	if (req->cryptlen < XTS_BLOCK_SIZE)
>  		return -EINVAL;
>  	return safexcel_queue_req(&req->base, skcipher_request_ctx(req),
> --
> 2.17.1

With all due respect, but this makes no sense.

For XTS, any length below 16 is illegal, as applying CTS in order to handle
non-cipher-block multiples (16 bytes in the case of AES) requires _more_ data
than 1 cipher block.

There is no benefit to explicitly checking for zero length if there is
already a check for less-than-16. That's just wasting CPU cycles and a branch
predictor entry, for no benefit whatsoever (except for academic "alignment
with other ciphers").

XTS has very specific use cases. No one in their right mind would call it for a
situation where it can't be applied in the first place, e.g. anything < 16
bytes.
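
For concreteness, the length rule being debated, as a minimal hypothetical
helper (not from the patch; 16 is AES_BLOCK_SIZE):

/* XTS-AES accepts any cryptlen >= one AES block; CTS covers
 * non-multiples. Lengths 1..15 are invalid, and length 0 is the
 * contested case: -EINVAL today, a no-op under the patch.
 */
static bool xts_len_valid(unsigned int cryptlen)
{
	return cryptlen >= 16;	/* AES_BLOCK_SIZE */
}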

Regards,
Pascal van Leeuwen
Silicon IP Architect Multi-Protocol Engines, Rambus Security
Rambus ROTW Holding BV
+31-73 6581953

Note: The Inside Secure/Verimatrix Silicon IP team was recently acquired by 
Rambus.
Please be so kind to update your e-mail address book with my new e-mail address.





Re: [PATCH] recordmcount: Fix build failure on non arm64

2020-08-10 Thread Gregory Herrero
Hi Christophe,

On Mon, Aug 10, 2020 at 08:48:22AM +, Christophe Leroy wrote:
> Commit ea0eada45632 leads to the following build failure on powerpc:
> 
>   HOSTCC  scripts/recordmcount
> scripts/recordmcount.c: In function 'arm64_is_fake_mcount':
> scripts/recordmcount.c:440: error: 'R_AARCH64_CALL26' undeclared (first use 
> in this function)
> scripts/recordmcount.c:440: error: (Each undeclared identifier is reported 
> only once
> scripts/recordmcount.c:440: error: for each function it appears in.)
> make[2]: *** [scripts/recordmcount] Error 1
> 
> Make sure R_AARCH64_CALL26 is always defined.
> 
Oops, thanks for fixing this.

Acked-by: Gregory Herrero 

Greg

> Fixes: ea0eada45632 ("recordmcount: only record relocation of type 
> R_AARCH64_CALL26 on arm64.")
> Cc: Gregory Herrero 
> Signed-off-by: Christophe Leroy 
> ---
>  scripts/recordmcount.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
> index e59022b3f125..b9c2ee7ab43f 100644
> --- a/scripts/recordmcount.c
> +++ b/scripts/recordmcount.c
> @@ -42,6 +42,8 @@
>  #define R_ARM_THM_CALL   10
>  #define R_ARM_CALL   28
>  
> +#define R_AARCH64_CALL26 283
> +
>  static int fd_map;   /* File descriptor for file being modified. */
>  static int mmap_failed; /* Boolean flag. */
>  static char gpfx;/* prefix for global symbol name (sometimes '_') */
> -- 
> 2.25.0
> 


Re: [PATCH v2] ASoC: fsl-asoc-card: Get "extal" clock rate by clk_get_rate

2020-08-10 Thread Nicolin Chen
On Mon, Aug 10, 2020 at 04:11:43PM +0800, Shengjiu Wang wrote:
> On some platform(.e.g. i.MX8QM MEK), the "extal" clock is different
> with the mclk of codec, then the clock rate is also different.
> So it is better to get clock rate of "extal" rate by clk_get_rate,
> don't reuse the clock rate of mclk.
> 
> Signed-off-by: Shengjiu Wang 

Acked-by: Nicolin Chen 
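
For readers without the patch body (it is not quoted above), the described
approach amounts to something like this sketch; the helper name is
hypothetical, only the "extal" clock name and clk_get_rate() usage come
from the commit message:

#include <linux/clk.h>

/* Read the "extal" rate from the clock itself instead of assuming it
 * matches the codec mclk rate.
 */
static unsigned long fsl_asoc_get_extal_rate(struct device *dev)
{
	struct clk *extal = devm_clk_get(dev, "extal");

	if (IS_ERR(extal))
		return 0;	/* caller falls back to the mclk rate */
	return clk_get_rate(extal);
}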


Re: [PATCH] powerpc/pseries: explicitly reschedule during drmem_lmb list traversal

2020-08-10 Thread Nathan Lynch
Michael Ellerman  writes:
> One thought, which I possibly should not put in writing, is that we
> could use the alignment of the pointer as a poor man's substitute for a
> counter, eg:
>
> +static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb)
> +{
> +	if ((unsigned long)lmb % PAGE_SIZE == 0)
> +		cond_resched();
> +
> + return ++lmb;
> +}
>
> I think the lmbs are allocated in a block, so I think that will work.
> Maybe PAGE_SIZE is not the right size to use, but you get the idea.
>
> Gross I know, but might be OK as short term solution?

OK, looking into this.
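
For context, why pointer alignment works as a counter here (a
back-of-envelope note under the stated assumption that the lmbs sit in one
contiguous array; the numbers are illustrative):

/* Successive lmb pointers advance by sizeof(*lmb), so the test
 * (unsigned long)lmb % PAGE_SIZE == 0 fires roughly once every
 * PAGE_SIZE / sizeof(*lmb) iterations, e.g. with a 64K page and a
 * 24-byte drmem_lmb, about once per ~2700 lmbs: a cheap rate limit
 * on cond_resched() with no extra counter.
 */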


Re: [PATCH 19/22] crypto: inside-secure - add check for xts input length equal to zero

2020-08-10 Thread Eric Biggers
On Mon, Aug 10, 2020 at 05:33:39PM +0300, Horia Geantă wrote:
> On 8/10/2020 4:45 PM, Herbert Xu wrote:
> > On Mon, Aug 10, 2020 at 10:20:20AM +, Van Leeuwen, Pascal wrote:
> >>
> >> With all due respect, but this makes no sense.
> > 
> > I agree.  This is a lot of churn for no gain.
> > 
> I would say the gain is that all skcipher algorithms would behave the same
> when input length equals zero - i.e. treat the request as a no-op.
> 
> We can't say "no input" has any meaning to the other skcipher algorithms,
> but the convention is to accept this case and just return 0.
> I don't see why XTS has to be handled differently.
> 

CTS also rejects empty inputs.

The rule it follows is just that all input lengths >= blocksize are allowed.
Input lengths < blocksize aren't allowed.

- Eric


Re: [PATCH] Documentation/features: refresh powerpc arch support files

2020-08-10 Thread Tobias Klauser
On 2020-08-10 at 17:09:51 +0200, Christophe Leroy  
wrote:
> 
> 
> Le 10/08/2020 à 12:09, Tobias Klauser a écrit :
> > Support for these was added by commit aa65ff6b18e0 ("powerpc/64s:
> > Implement queued spinlocks and rwlocks").
> > 
> > Signed-off-by: Tobias Klauser 
> > ---
> >   Documentation/features/locking/queued-rwlocks/arch-support.txt  | 2 +-
> >   .../features/locking/queued-spinlocks/arch-support.txt  | 2 +-
> >   2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/features/locking/queued-rwlocks/arch-support.txt 
> > b/Documentation/features/locking/queued-rwlocks/arch-support.txt
> > index 5c6bcfcf8e1f..4dd5e554873f 100644
> > --- a/Documentation/features/locking/queued-rwlocks/arch-support.txt
> > +++ b/Documentation/features/locking/queued-rwlocks/arch-support.txt
> > @@ -22,7 +22,7 @@
> >   |   nios2: | TODO |
> >   |openrisc: |  ok  |
> >   |  parisc: | TODO |
> > -| powerpc: | TODO |
> > +| powerpc: |  ok  |
> 
> In your commit log you are referring to a commit titled "powerpc/64s:"
> 
> Are you sure it is now OK for all powerpc, not only for book3s/64 as
> suggested by your text?

The change was generated by running the
Documentation/features/scripts/features-refresh.sh script. Sorry, I should
have mentioned this in the commit message. I noticed the updated features
for powerpc after updating the RISC-V supported features [1].

[1] 
https://lore.kernel.org/linux-riscv/20200810095000.32092-1-tklau...@distanz.ch/T/#u

AFAIK, the features-refresh.sh script has no way of distinguishing
between different types of an architecture. It just checks for the
respective Kconfig symbols listed in the
Documentation/features/**/arch-support.txt files in all arch/**/Kconfig
files and updates the feature to "ok" if it finds the Kconfig symbol.


Re: [PATCH] Documentation/features: refresh powerpc arch support files

2020-08-10 Thread Christophe Leroy




Le 10/08/2020 à 12:09, Tobias Klauser a écrit :

Support for these was added by commit aa65ff6b18e0 ("powerpc/64s:
Implement queued spinlocks and rwlocks").

Signed-off-by: Tobias Klauser 
---
  Documentation/features/locking/queued-rwlocks/arch-support.txt  | 2 +-
  .../features/locking/queued-spinlocks/arch-support.txt  | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/features/locking/queued-rwlocks/arch-support.txt 
b/Documentation/features/locking/queued-rwlocks/arch-support.txt
index 5c6bcfcf8e1f..4dd5e554873f 100644
--- a/Documentation/features/locking/queued-rwlocks/arch-support.txt
+++ b/Documentation/features/locking/queued-rwlocks/arch-support.txt
@@ -22,7 +22,7 @@
  |   nios2: | TODO |
  |openrisc: |  ok  |
  |  parisc: | TODO |
-| powerpc: | TODO |
+| powerpc: |  ok  |


In your commit log you are referring to a commit titled "powerpc/64s:"

Are you sure it is now OK for all powerpc, not only for book3s/64 as
suggested by your text?


Christophe


  |   riscv: | TODO |
  |s390: | TODO |
  |  sh: | TODO |
diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt 
b/Documentation/features/locking/queued-spinlocks/arch-support.txt
index b55e420a34ea..b16d4f71e5ce 100644
--- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
+++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
@@ -22,7 +22,7 @@
  |   nios2: | TODO |
  |openrisc: |  ok  |
  |  parisc: | TODO |
-| powerpc: | TODO |
+| powerpc: |  ok  |
  |   riscv: | TODO |
  |s390: | TODO |
  |  sh: | TODO |



[PATCH] Documentation/features: refresh powerpc arch support files

2020-08-10 Thread Tobias Klauser
Support for these was added by commit aa65ff6b18e0 ("powerpc/64s:
Implement queued spinlocks and rwlocks").

Signed-off-by: Tobias Klauser 
---
 Documentation/features/locking/queued-rwlocks/arch-support.txt  | 2 +-
 .../features/locking/queued-spinlocks/arch-support.txt  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/features/locking/queued-rwlocks/arch-support.txt 
b/Documentation/features/locking/queued-rwlocks/arch-support.txt
index 5c6bcfcf8e1f..4dd5e554873f 100644
--- a/Documentation/features/locking/queued-rwlocks/arch-support.txt
+++ b/Documentation/features/locking/queued-rwlocks/arch-support.txt
@@ -22,7 +22,7 @@
 |   nios2: | TODO |
 |openrisc: |  ok  |
 |  parisc: | TODO |
-| powerpc: | TODO |
+| powerpc: |  ok  |
 |   riscv: | TODO |
 |s390: | TODO |
 |  sh: | TODO |
diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt 
b/Documentation/features/locking/queued-spinlocks/arch-support.txt
index b55e420a34ea..b16d4f71e5ce 100644
--- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
+++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
@@ -22,7 +22,7 @@
 |   nios2: | TODO |
 |openrisc: |  ok  |
 |  parisc: | TODO |
-| powerpc: | TODO |
+| powerpc: |  ok  |
 |   riscv: | TODO |
 |s390: | TODO |
 |  sh: | TODO |
-- 
2.27.0



Re: [PATCH] recordmcount: Fix build failure on non arm64

2020-08-10 Thread Catalin Marinas
On Mon, 10 Aug 2020 08:48:22 + (UTC), Christophe Leroy wrote:
> Commit ea0eada45632 leads to the following build failure on powerpc:
> 
>   HOSTCC  scripts/recordmcount
> scripts/recordmcount.c: In function 'arm64_is_fake_mcount':
> scripts/recordmcount.c:440: error: 'R_AARCH64_CALL26' undeclared (first use 
> in this function)
> scripts/recordmcount.c:440: error: (Each undeclared identifier is reported 
> only once
> scripts/recordmcount.c:440: error: for each function it appears in.)
> make[2]: *** [scripts/recordmcount] Error 1
> 
> [...]

Applied to arm64 (for-next/core), thanks!

[1/1] recordmcount: Fix build failure on non arm64
  https://git.kernel.org/arm64/c/3df14264ad99

-- 
Catalin



Re: [PATCH 19/22] crypto: inside-secure - add check for xts input length equal to zero

2020-08-10 Thread Herbert Xu
On Mon, Aug 10, 2020 at 10:20:20AM +, Van Leeuwen, Pascal wrote:
>
> With all due respect, but this makes no sense.

I agree.  This is a lot of churn for no gain.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] recordmcount: Fix build failure on non arm64

2020-08-10 Thread Steven Rostedt
On Mon, 10 Aug 2020 13:18:55 +0100
Catalin Marinas  wrote:

> > Oops, thanks for fixing this.
> > 
> > Acked-by: Gregory Herrero   
> 
> Thanks. I'll queue it via the arm64 tree (as I did with the previous
> fix) but I'll wait a bit for Steve to ack it.

Acked-by: Steven Rostedt (VMware) 

-- Steve


Re: [PATCH] powerpc/papr_scm: Make access mode of 'perf_stats' attribute file to '0400'

2020-08-10 Thread Michael Ellerman
Vaibhav Jain  writes:
> The newly introduced 'perf_stats' attribute uses the default access
> mode of 0444, letting non-root users access performance stats of an
> nvdimm and potentially force the kernel into issuing a large number of
> expensive HCALLs. Since the information exposed by this attribute
> cannot be cached, it is better to ward off access to this attribute
> from non-root users.
>
> Hence this patch updates the access mode of the 'perf_stats' sysfs
> attribute file to 0400, making it readable only by root.

Or should we ratelimit it?

Fixes: ??

> Reported-by: Aneesh Kumar K.V 
> Signed-off-by: Vaibhav Jain 

cheers
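
For readers without the patch body (it is not quoted in this message), the
change amounts to something like the following sketch; the show-callback
name is assumed:

-static DEVICE_ATTR_RO(perf_stats);
+static DEVICE_ATTR(perf_stats, 0400, perf_stats_show, NULL);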



[Virtual ppce500] virtio_gpu virtio0: swiotlb buffer is full

2020-08-10 Thread Christian Zigotzky

Hello,

Just for info. The latest git kernel doesn't work with a virtio_gpu anymore.

QEMU command: qemu-system-ppc64 -M ppce500 -cpu e5500 -enable-kvm -m 
1024 -kernel uImage -drive 
format=raw,file=fienix-soar_3.0-2020608-net.img,index=0,if=virtio -nic 
user,model=e1000 -append "rw root=/dev/vda2" -device virtio-vga -device 
virtio-mouse-pci -device virtio-keyboard-pci -device pci-ohci,id=newusb 
-device usb-audio,bus=newusb.0 -smp 4


Error messages:

virtio_gpu virtio0: swiotlb buffer is full (sz: 4096 bytes), total 0 
(slots), used 0 (slots)

BUG: Kernel NULL pointer dereference on read at 0x0010
Faulting instruction address: 0xc00c7324
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K PREEMPT SMP NR_CPUS=4 QEMU e500
Modules linked in:
CPU: 2 PID: 1678 Comm: kworker/2:2 Not tainted 
5.9-a3_A-EON_X5000-11735-g06a81c1c7db9-dirty #1

Workqueue: events .virtio_gpu_dequeue_ctrl_func
NIP:  c00c7324 LR: c00c72e4 CTR: c0462930
REGS: c0003dba75e0 TRAP: 0300   Not tainted 
(5.9-a3_A-EON_X5000-11735-g06a81c1c7db9-dirty)

MSR:  90029000   CR: 24002288  XER: 
DEAR: 0010 ESR:  IRQMASK: 0
GPR00: c00c6188 c0003dba7870 c17f2300 c0003d893010
GPR04:  0001  
GPR08:    7f7f7f7f7f7f7f7f
GPR12: 24002284 c0003fff9200 c008c3a0 c61566c0
GPR16:    
GPR20:    
GPR24: 0001 0011  
GPR28: c0003d893010   c0003d893010
NIP [c00c7324] .dma_direct_unmap_sg+0x4c/0xd8
LR [c00c72e4] .dma_direct_unmap_sg+0xc/0xd8
Call Trace:
[c0003dba7870] [c0003dba7950] 0xc0003dba7950 (unreliable)
[c0003dba7920] [c00c6188] .dma_unmap_sg_attrs+0x5c/0x98
[c0003dba79d0] [c05cd438] .drm_gem_shmem_free_object+0x98/0xcc
[c0003dba7a50] [c06af5b4] .virtio_gpu_cleanup_object+0xc8/0xd4
[c0003dba7ad0] [c06ad3bc] .virtio_gpu_cmd_unref_cb+0x1c/0x30
[c0003dba7b40] [c06adab8] 
.virtio_gpu_dequeue_ctrl_func+0x208/0x28c

[c0003dba7c10] [c0086b70] .process_one_work+0x1a4/0x258
[c0003dba7cb0] [c00870f4] .worker_thread+0x214/0x284
[c0003dba7d70] [c008c4f0] .kthread+0x150/0x158
[c0003dba7e20] [c82c] .ret_from_kernel_thread+0x58/0x60
Instruction dump:
f821ff51 7cb82b78 7cdb3378 4e00 7cfa3b78 3bc0 7f9ec000 41fc0014
382100b0 81810008 7d808120 48bc1ba8  ebfc0248 833d0018 7fff4850
---[ end trace f28d194d9f0955a8 ]---

virtio_gpu virtio0: swiotlb buffer is full (sz: 4096 bytes), total 0 
(slots), used 0 (slots)
virtio_gpu virtio0: swiotlb buffer is full (sz: 16384 bytes), total 0 
(slots), used 0 (slots)




The kernel 5.8 works without any problems in this virtual machine.

Could you please check the latest updates?

Thanks,
Christian


Re: [PATCH] recordmcount: Fix build failure on non arm64

2020-08-10 Thread Catalin Marinas
On Mon, Aug 10, 2020 at 11:17:30AM +0200, Gregory Herrero wrote:
> On Mon, Aug 10, 2020 at 08:48:22AM +, Christophe Leroy wrote:
> > Commit ea0eada45632 leads to the following build failure on powerpc:
> > 
> >   HOSTCC  scripts/recordmcount
> > scripts/recordmcount.c: In function 'arm64_is_fake_mcount':
> > scripts/recordmcount.c:440: error: 'R_AARCH64_CALL26' undeclared (first use 
> > in this function)
> > scripts/recordmcount.c:440: error: (Each undeclared identifier is reported 
> > only once
> > scripts/recordmcount.c:440: error: for each function it appears in.)
> > make[2]: *** [scripts/recordmcount] Error 1
> > 
> > Make sure R_AARCH64_CALL26 is always defined.
> > 
> Oops, thanks for fixing this.
> 
> Acked-by: Gregory Herrero 

Thanks. I'll queue it via the arm64 tree (as I did with the previous
fix) but I'll wait a bit for Steve to ack it.

-- 
Catalin


[PATCH] powerpc/pkeys: Fix boot failures with Nemo board (A-EON AmigaOne X1000)

2020-08-10 Thread Aneesh Kumar K.V
On P6 and before we should avoid updating the UAMOR SPRN. Doing so
resulted in a boot failure on the Nemo board.

Fixes: 269e829f48a0 ("powerpc/book3s64/pkey: Disable pkey on POWER6 and before")
Reported-by: Christian Zigotzky 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/hash_utils.c |  5 ++---
 arch/powerpc/mm/book3s64/pkeys.c  | 12 ++--
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index 1478fceeb683..1da9dbba9217 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1115,9 +1115,8 @@ void hash__early_init_mmu_secondary(void)
&& cpu_has_feature(CPU_FTR_HVMODE))
tlbiel_all();
 
-#ifdef CONFIG_PPC_MEM_KEYS
-   mtspr(SPRN_UAMOR, default_uamor);
-#endif
+   if (IS_ENABLED(CONFIG_PPC_MEM_KEYS) && mmu_has_feature(MMU_FTR_PKEY))
+   mtspr(SPRN_UAMOR, default_uamor);
 }
 #endif /* CONFIG_SMP */
 
diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 69a6b87f2bb4..b1d091a97611 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -73,12 +73,6 @@ static int scan_pkey_feature(void)
if (early_radix_enabled())
return 0;
 
-   /*
-* Only P7 and above supports SPRN_AMR update with MSR[PR] = 1
-*/
-   if (!early_cpu_has_feature(CPU_FTR_ARCH_206))
-   return 0;
-
	ret = of_scan_flat_dt(dt_scan_storage_keys, &pkeys_total);
if (ret == 0) {
/*
@@ -124,6 +118,12 @@ void __init pkey_early_init_devtree(void)
 __builtin_popcountl(ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT)
!= (sizeof(u64) * BITS_PER_BYTE));
 
+   /*
+* Only P7 and above supports SPRN_AMR update with MSR[PR] = 1
+*/
+   if (!early_cpu_has_feature(CPU_FTR_ARCH_206))
+   return;
+
/* scan the device tree for pkey feature */
pkeys_total = scan_pkey_feature();
if (!pkeys_total)
-- 
2.26.2



Re: [PASEMI] Nemo board doesn't boot anymore after the commit "powerpc/book3s64/pkeys: Simplify pkey disable branch"

2020-08-10 Thread Christian Zigotzky

Am 10.08.20 um 10:58 schrieb Aneesh Kumar K.V:

On 8/10/20 2:15 PM, Christian Zigotzky wrote:

Hello Aneesh,

I tested the new kernel today and unfortunately it doesn't run very 
well.


I have only one core (1 physical processor; 1 core; 2 threads) 
instead of two cores (1 physical processor; 2 cores; 2 threads) so 
the system is slower.


Boot log: http://www.xenosoft.de/dmesg_nemo_board_kernel_5.9.txt

Could you please check the updates?



modified   arch/powerpc/mm/book3s64/hash_utils.c
@@ -1116,7 +1116,8 @@ void hash__early_init_mmu_secondary(void)
 tlbiel_all();

 #ifdef CONFIG_PPC_MEM_KEYS
-    mtspr(SPRN_UAMOR, default_uamor);
+    if (mmu_has_feature(MMU_FTR_PKEY))
+    mtspr(SPRN_UAMOR, default_uamor);
 #endif
 }
 #endif /* CONFIG_SMP */



-aneesh

Hello Aneesh,

Your modifications work! I have 2 cores again and I can see the boot 
messages.


Thanks a lot!

Cheers,
Christian


Re: [PATCH] arch/powerpc: use simple i2c probe function

2020-08-10 Thread Wolfram Sang
On Fri, Aug 07, 2020 at 05:27:13PM +0200, Stephen Kitt wrote:
> The i2c probe functions here don't use the id information provided in
> their second argument, so the single-parameter i2c probe function
> ("probe_new") can be used instead.
> 
> This avoids scanning the identifier tables during probes.
> 
> Signed-off-by: Stephen Kitt 

This is useful, helps deprecating the old probe method:

Acked-by: Wolfram Sang 



signature.asc
Description: PGP signature


Re: [PASEMI] Nemo board doesn't boot anymore after the commit "powerpc/book3s64/pkeys: Simplify pkey disable branch"

2020-08-10 Thread Aneesh Kumar K.V

On 8/10/20 2:15 PM, Christian Zigotzky wrote:

Hello Aneesh,

I tested the new kernel today and unfortunately it doesn't run very well.

I have only one core (1 physical processor; 1 core; 2 threads) instead 
of two cores (1 physical processor; 2 cores; 2 threads) so the system is 
slower.


Boot log: http://www.xenosoft.de/dmesg_nemo_board_kernel_5.9.txt

Could you please check the updates?



modified   arch/powerpc/mm/book3s64/hash_utils.c
@@ -1116,7 +1116,8 @@ void hash__early_init_mmu_secondary(void)
tlbiel_all();

 #ifdef CONFIG_PPC_MEM_KEYS
-   mtspr(SPRN_UAMOR, default_uamor);
+   if (mmu_has_feature(MMU_FTR_PKEY))
+   mtspr(SPRN_UAMOR, default_uamor);
 #endif
 }
 #endif /* CONFIG_SMP */



-aneesh


[PATCHv5 2/2] powerpc/pseries: update device tree before ejecting hotplug uevents

2020-08-10 Thread Pingfan Liu
A bug is observed on pseries by taking the following steps on RHEL:
-1. drmgr -c mem -r -q 5
-2. echo c > /proc/sysrq-trigger

And then, the failure looks like:
kdump: saving to /sysroot//var/crash/127.0.0.1-2020-01-16-02:06:14/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
 Checking for memory holes : [  0.0 %] /
   Checking for memory holes : [100.0 %] |  
 Excluding unnecessary pages   : [100.0 %] \
   Copying data  : [  0.3 %] -  
eta: 38s[   44.337636] hash-mmu: mm: Hashing failure ! EA=0x7fffba40 
access=0x8004 current=makedumpfile
[   44.337663] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 
psize 2 pte=0xc0005504
[   44.337677] hash-mmu: mm: Hashing failure ! EA=0x7fffba40 
access=0x8004 current=makedumpfile
[   44.337692] hash-mmu: trap=0x300 vsid=0x13a109c ssize=1 base psize=2 
psize 2 pte=0xc0005504
[   44.337708] makedumpfile[469]: unhandled signal 7 at 7fffba40 nip 
7fffbbc4d7fc lr 00011356ca3c code 2
[   44.338548] Core dump to |/bin/false pipe failed
/lib/kdump-lib-initramfs.sh: line 98:   469 Bus error   
$CORE_COLLECTOR /proc/vmcore 
$_mp/$KDUMP_PATH/$HOST_IP-$DATEDIR/vmcore-incomplete
kdump: saving vmcore failed

* Root cause *
  After analyzing, it turns out that in the current implementation,
when hot-removing an lmb, the KOBJ_REMOVE event is ejected before the dt
update, as the call to __remove_memory() comes before drmem_update_dt().
So in the kdump kernel, read_from_oldmem() resorts to
pSeries_lpar_hpte_insert() to install an hpte, but fails with -2 due to a
non-existent pfn. Finally, low_hash_fault() raises SIGBUS to the process,
which can be observed as the "Bus error" above.

From the viewpoint of listener and publisher, the publisher notifies the
listener before the data is ready.  This introduces a problem where udev
launches kexec-tools (due to KOBJ_REMOVE) and loads a stale dt before the
update. In the capture kernel, makedumpfile then accesses memory based on
the stale dt info and hits a SIGBUS error due to a non-existent lmb.

* Fix *
This bug is introduced by commit 063b8b1251fd
("powerpc/pseries/memory-hotplug: Only update DT once per memory DLPAR
request"), which tried to combine all the dt updates into one.

To fix this issue without introducing quadratic runtime complexity via
the model:
  dlpar_memory_add_by_count
    for_each_drmem_lmb            <--
      dlpar_add_lmb
        drmem_update_dt(_v1|_v2)
          for_each_drmem_lmb      <--
the dt should still be updated only once, just before the last memory
online/offline event is ejected to user space. Achieve this by tracking
the number of lmbs added or removed, as sketched below.
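
Schematically, the approach (a simplified sketch of the control flow the
patch implements; names come from the patch, the loop body is condensed):

	enum dt_update_status dt_update;
	int i = 0;

	for_each_drmem_lmb(lmb) {
		/* request the dt update only for the last lmb */
		dt_update = (++i == lmbs_to_remove) ? DT_TOUPDATE : DT_NOUPDATE;
		dlpar_remove_lmb(lmb, &dt_update);
	}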

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Hari Bathini 
Cc: Nathan Lynch 
Cc: Nathan Fontenot 
Cc: Laurent Dufour 
To: linuxppc-dev@lists.ozlabs.org
Cc: ke...@lists.infradead.org
---
v4 -> v5: change dlpar_add_lmb()/dlpar_remove_lmb() prototype to report
  whether dt is updated successfully.
  Fix a condition boundary check bug
v3 -> v4: resolve a quadratic runtime complexity issue.
  This series is applied on next-test branch
 arch/powerpc/platforms/pseries/hotplug-memory.c | 102 +++-
 1 file changed, 80 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 46cbcd1..1567d9f 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -350,13 +350,22 @@ static bool lmb_is_removable(struct drmem_lmb *lmb)
return true;
 }
 
-static int dlpar_add_lmb(struct drmem_lmb *);
+enum dt_update_status {
+   DT_NOUPDATE,
+   DT_TOUPDATE,
+   DT_UPDATED,
+};
+
+/* "*dt_update" returns DT_UPDATED if updated */
+static int dlpar_add_lmb(struct drmem_lmb *lmb,
+   enum dt_update_status *dt_update);
 
-static int dlpar_remove_lmb(struct drmem_lmb *lmb)
+static int dlpar_remove_lmb(struct drmem_lmb *lmb,
+   enum dt_update_status *dt_update)
 {
unsigned long block_sz;
phys_addr_t base_addr;
-   int rc, nid;
+   int rc, ret, nid;
 
if (!lmb_is_removable(lmb))
return -EINVAL;
@@ -372,6 +381,13 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
lmb->flags &= ~DRCONF_MEM_ASSIGNED;
+   if (*dt_update) {
+   ret = drmem_update_dt();
+   if (ret)
+   pr_warn("%s fail to update dt, but continue\n", 
__func__);
+   else
+   *dt_update = DT_UPDATED;
+   }
 
__remove_memory(nid, base_addr, block_sz);
 
@@ -387,6 +403,7 @@ static int 

[PATCHv5 1/2] powerpc/pseries: group lmb operation and memblock's

2020-08-10 Thread Pingfan Liu
This patch prepares for the upcoming patch, which swaps the order of the
KOBJ_ADD/REMOVE uevents and the dt update.

The dt update should come after the lmb operations, and before
__remove_memory()/__add_memory().  Accordingly, group all lmb operations
before the memblock ones.

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Hari Bathini 
Cc: Nathan Lynch 
Cc: Nathan Fontenot 
Cc: Laurent Dufour 
To: linuxppc-dev@lists.ozlabs.org
Cc: ke...@lists.infradead.org
---
v4 -> v5: fix the miss of clearing DRCONF_MEM_ASSIGNED in a failure path
 arch/powerpc/platforms/pseries/hotplug-memory.c | 28 +
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 5d545b7..46cbcd1 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -355,7 +355,8 @@ static int dlpar_add_lmb(struct drmem_lmb *);
 static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 {
unsigned long block_sz;
-   int rc;
+   phys_addr_t base_addr;
+   int rc, nid;
 
if (!lmb_is_removable(lmb))
return -EINVAL;
@@ -364,17 +365,19 @@ static int dlpar_remove_lmb(struct drmem_lmb *lmb)
if (rc)
return rc;
 
+   base_addr = lmb->base_addr;
+   nid = lmb->nid;
block_sz = pseries_memory_block_size();
 
-   __remove_memory(lmb->nid, lmb->base_addr, block_sz);
-
-   /* Update memory regions for memory remove */
-   memblock_remove(lmb->base_addr, block_sz);
-
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
lmb->flags &= ~DRCONF_MEM_ASSIGNED;
 
+   __remove_memory(nid, base_addr, block_sz);
+
+   /* Update memory regions for memory remove */
+   memblock_remove(base_addr, block_sz);
+
return 0;
 }
 
@@ -603,22 +606,29 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
}
 
lmb_set_nid(lmb);
+   lmb->flags |= DRCONF_MEM_ASSIGNED;
+
block_sz = memory_block_size_bytes();
 
/* Add the memory */
rc = __add_memory(lmb->nid, lmb->base_addr, block_sz);
if (rc) {
invalidate_lmb_associativity_index(lmb);
+   lmb_clear_nid(lmb);
+   lmb->flags &= ~DRCONF_MEM_ASSIGNED;
return rc;
}
 
rc = dlpar_online_lmb(lmb);
if (rc) {
-   __remove_memory(lmb->nid, lmb->base_addr, block_sz);
+   int nid = lmb->nid;
+   phys_addr_t base_addr = lmb->base_addr;
+
invalidate_lmb_associativity_index(lmb);
lmb_clear_nid(lmb);
-   } else {
-   lmb->flags |= DRCONF_MEM_ASSIGNED;
+   lmb->flags &= ~DRCONF_MEM_ASSIGNED;
+
+   __remove_memory(nid, base_addr, block_sz);
}
 
return rc;
-- 
2.7.5



[PATCH] recordmcount: Fix build failure on non arm64

2020-08-10 Thread Christophe Leroy
Commit ea0eada45632 leads to the following build failure on powerpc:

  HOSTCC  scripts/recordmcount
scripts/recordmcount.c: In function 'arm64_is_fake_mcount':
scripts/recordmcount.c:440: error: 'R_AARCH64_CALL26' undeclared (first use in 
this function)
scripts/recordmcount.c:440: error: (Each undeclared identifier is reported only 
once
scripts/recordmcount.c:440: error: for each function it appears in.)
make[2]: *** [scripts/recordmcount] Error 1

Make sure R_AARCH64_CALL26 is always defined.

Fixes: ea0eada45632 ("recordmcount: only record relocation of type 
R_AARCH64_CALL26 on arm64.")
Cc: Gregory Herrero 
Signed-off-by: Christophe Leroy 
---
 scripts/recordmcount.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
index e59022b3f125..b9c2ee7ab43f 100644
--- a/scripts/recordmcount.c
+++ b/scripts/recordmcount.c
@@ -42,6 +42,8 @@
 #define R_ARM_THM_CALL 10
 #define R_ARM_CALL 28
 
+#define R_AARCH64_CALL26   283
+
 static int fd_map; /* File descriptor for file being modified. */
 static int mmap_failed; /* Boolean flag. */
 static char gpfx;  /* prefix for global symbol name (sometimes '_') */
-- 
2.25.0



Re: [PASEMI] Nemo board doesn't boot anymore after the commit "powerpc/book3s64/pkeys: Simplify pkey disable branch"

2020-08-10 Thread Christian Zigotzky

Hello Aneesh,

I tested the new kernel today and unfortunately it doesn't run very well.

I have only one core (1 physical processor; 1 core; 2 threads) instead 
of two cores (1 physical processor; 2 cores; 2 threads) so the system is 
slower.


Boot log: http://www.xenosoft.de/dmesg_nemo_board_kernel_5.9.txt

Could you please check the updates?

Thanks,
Christian


On 10 August 2020 at 09:56 am, Christian Zigotzky wrote:

Hello Aneesh,

The Nemo board boots with your patch but unfortunately I don't see any 
boot messages anymore.


Please find attached the kernel config.

Thanks,
Christian


On 09 August 2020 at 5:49 pm, Christian Zigotzky wrote:

Hello Aneesh,

Many thanks for your fast response and thanks a lot for your patch!
I will patch and compile a new git kernel tomorrow. I am looking 
forward to the result.


Have a nice day!

Cheers,
Christian

On 9. Aug 2020, at 17:11, Aneesh Kumar K.V 
 wrote:


"Aneesh Kumar K.V"  writes:


On 8/9/20 8:04 PM, Aneesh Kumar K.V wrote:
On 8/9/20 7:42 PM, Christian Zigotzky wrote:

Hello,

The Nemo board (A-EON AmigaOne X1000) [1] doesn't start with the
latest Git kernel anymore after the commit "powerpc/book3s64/pkeys:
Simplify pkey disable branch" [2].

I bisected today [3].

Result: powerpc/book3s64/pkeys: Simplify pkey disable branch
(a4678d4b477c3d2901f101986ca01406f3b7eaea) [2] is the first bad 
commit.


Unfortunately I wasn't able to revert the first bad commit. The 
first
bad commit depends on many other commits, which unfortunately I 
don't
know. I tried to remove the modifications of the files from the 
first
bad commit but without any success. There are just too many 
dependencies.


Additionally I reverted the commit "selftests/powerpc: Fix pkey
syscall redefinitions" [4] and compiled a new kernel but without any
success.

Could you please check the first bad commit?

Thanks,
Christian



Can you share a successful boot log of the system so that I can
double-check the cpu_feature and mmu_feature reported? I am looking for
details similar to below.

[    0.00] cpu_features  = 0x0001c07f8f5f91a7
[    0.00]   possible    = 0x0001fbffcf5fb1a7
[    0.00]   always  = 0x0003800081a1
[    0.00] cpu_user_features = 0xdc0065c2 0xefe0
[    0.00] mmu_features  = 0x7c006001
[    0.00] firmware_features = 0x001fc45bfc57
[    0.00] vmalloc start = 0xc008
[    0.00] IO start  = 0xc00a
[    0.00] vmemmap start = 0xc00c


IIUC this is P5+? (ISA 2.04). On that, pkeys should be marked disabled via


static int scan_pkey_feature(void)
{
 int ret;
 int pkeys_total = 0;

 

 /*
  * Only P7 and above supports SPRN_AMR update with MSR[PR] = 1
  */
 if (!early_cpu_has_feature(CPU_FTR_ARCH_206))
 return 0;


}

Can you boot with CONFIG_PPC_MEM_KEYS=n ?


Can you try this change on top of master?


modified   arch/powerpc/mm/book3s64/pkeys.c
@@ -215,10 +215,6 @@ void __init pkey_early_init_devtree(void)

  pr_info("Enabling pkeys with max key count %d\n", num_pkey);
  out:
-    /*
- * Setup uamor on boot cpu
- */
-    mtspr(SPRN_UAMOR, default_uamor);

  return;
  }


Full patch with better description.

commit 919a177bcdaf1eaeaeecc0d0f50a688629d7b5df
Author: Aneesh Kumar K.V 
Date:   Sun Aug 9 20:37:38 2020 +0530

    powerpc/pkeys: Fix boot failures with Nemo board (A-EON AmigaOne 
X1000)


    On p6 and before we should avoid updating UAMOR SPRN. This resulted
    in boot failure on Nemo board.

    Fixes: 269e829f48a0 ("powerpc/book3s64/pkey: Disable pkey on 
POWER6 and before")

    Reported-by: Christian Zigotzky 
    Signed-off-by: Aneesh Kumar K.V 

diff --git a/arch/powerpc/mm/book3s64/pkeys.c 
b/arch/powerpc/mm/book3s64/pkeys.c

index 69a6b87f2bb4..b1d091a97611 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -73,12 +73,6 @@ static int scan_pkey_feature(void)
    if (early_radix_enabled())
    return 0;

-    /*
- * Only P7 and above supports SPRN_AMR update with MSR[PR] = 1
- */
-    if (!early_cpu_has_feature(CPU_FTR_ARCH_206))
-    return 0;
-
    ret = of_scan_flat_dt(dt_scan_storage_keys, &pkeys_total);
    if (ret == 0) {
    /*
@@ -124,6 +118,12 @@ void __init pkey_early_init_devtree(void)
 __builtin_popcountl(ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT)
    != (sizeof(u64) * BITS_PER_BYTE));

+    /*
+ * Only P7 and above supports SPRN_AMR update with MSR[PR] = 1
+ */
+    if (!early_cpu_has_feature(CPU_FTR_ARCH_206))
+    return;
+
    /* scan the device tree for pkey feature */
    pkeys_total = scan_pkey_feature();
    if (!pkeys_total)






Re: [RFC PATCH 1/2] powerpc/numa: Introduce logical numa id

2020-08-10 Thread Srikar Dronamraju
* Aneesh Kumar K.V  [2020-08-06 16:14:21]:

> >
> > associativity_to_nid gets called the first time a cpu is being made present
> > from offline. So it need not be in boot path. We may to verify if cpu
> > hotplug, dlpar, operations are synchronized. For example a memory hotadd and
> > cpu hotplug are they synchronized? I am not sure if they are synchronized at
> > this time.
> 
> But you don't online cpu or memory to a non existent node post boot
> right?. If the node is existent we have already initialized the nid_map.
> 

Not sure what you mean by existent and non-existent. Are you referring to
online / offline?

> However i am not sure whether we do a parallel initialization of devices. ie,
> of_device_add getting called in parallel. if it can then we need the
> below?
> 
> @@ -226,6 +226,7 @@ static u32 nid_map[MAX_NUMNODES] = {[0 ... MAX_NUMNODES - 1] = NUMA_NO_NODE};
>  int firmware_group_id_to_nid(int firmware_gid)
>  {
> 	static int last_nid = 0;
> +	static DEFINE_SPINLOCK(node_id_lock);
> 
> 	/*
> 	 * For PowerNV we don't change the node id. This helps to avoid
> @@ -238,8 +239,13 @@ int firmware_group_id_to_nid(int firmware_gid)
> 	if (firmware_gid == -1)
> 		return NUMA_NO_NODE;
> 
> -	if (nid_map[firmware_gid] == NUMA_NO_NODE)
> -		nid_map[firmware_gid] = last_nid++;
> +	if (nid_map[firmware_gid] == NUMA_NO_NODE) {
> +		spin_lock(&node_id_lock);
> +		/* recheck with lock held */
> +		if (nid_map[firmware_gid] == NUMA_NO_NODE)
> +			nid_map[firmware_gid] = last_nid++;
> +		spin_unlock(&node_id_lock);
> +	}
> 
> 	return nid_map[firmware_gid];
>  }
> 

This should help.


> 
> I will also add a last_nid > MAX_NUMNODES check in
> firmware_group_id_to_nid() to handle the case where we find more numa
> nodes than MAX_NUMNODES in the device tree.
> 
> 

Okay.
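
For illustration, a minimal sketch of how that bounds check could sit
inside the quoted double-checked locking above; nid_map, last_nid and
node_id_lock are the names from the quoted diff, and the exact placement
and fallback behaviour are assumptions:

	if (nid_map[firmware_gid] == NUMA_NO_NODE) {
		spin_lock(&node_id_lock);
		/* recheck with lock held; stop handing out nids past the map */
		if (nid_map[firmware_gid] == NUMA_NO_NODE &&
		    last_nid < MAX_NUMNODES)
			nid_map[firmware_gid] = last_nid++;
		spin_unlock(&node_id_lock);
	}

	return nid_map[firmware_gid];

With that, once last_nid reaches MAX_NUMNODES the lookup keeps returning
NUMA_NO_NODE instead of handing out invalid node ids.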

What's your plan to handle the node distances?
Currently the node distances we compute from the device tree properties are
based on the distance from node 0. If you rename a different node as node 0,
how do you plan to remap the node distances?
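
To make the question concrete, one purely illustrative shape such a remap
could take; nid_map comes from the quoted code, while numa_distance_table
and firmware_distance() are made-up stand-ins for wherever the raw
device-tree distances are stored and read:

	static void remap_node_distances(void)
	{
		int from, to;

		for (from = 0; from < MAX_NUMNODES; from++) {
			for (to = 0; to < MAX_NUMNODES; to++) {
				if (nid_map[from] == NUMA_NO_NODE ||
				    nid_map[to] == NUMA_NO_NODE)
					continue;
				/* distances are keyed by firmware gid,
				 * but the scheduler sees logical nids */
				numa_distance_table[nid_map[from]][nid_map[to]] =
					firmware_distance(from, to);
			}
		}
	}

Without something along these lines, renaming a different firmware domain
to logical node 0 would leave the distance matrix keyed by the old
numbering.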

> -aneesh

-- 
Thanks and Regards
Srikar Dronamraju


[PATCH v5 02/10] powerpc/smp: Merge Power9 topology with Power topology

2020-08-10 Thread Srikar Dronamraju
A new sched_domain_topology_level was added just for Power9. However the
same can be achieved by merging power9_topology into powerpc_topology,
which makes the code simpler, especially when adding a new sched
domain.

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Anton Blanchard 
Cc: Oliver O'Halloran 
Cc: Nathan Lynch 
Cc: Michael Neuling 
Cc: Gautham R Shenoy 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Jordan Niethe 
Cc: Vaidyanathan Srinivasan 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Srikar Dronamraju 
---
Changelog v1 -> v2:
Replaced a reference to cpu_smt_mask with per_cpu(cpu_sibling_map, cpu)
since cpu_smt_mask is only defined under CONFIG_SCHED_SMT

 arch/powerpc/kernel/smp.c | 25 +++--
 1 file changed, 3 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index edf94ca64eea..08da765b91f1 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1313,7 +1313,7 @@ int setup_profiling_timer(unsigned int multiplier)
 }
 
 #ifdef CONFIG_SCHED_SMT
-/* cpumask of CPUs with asymetric SMT dependancy */
+/* cpumask of CPUs with asymmetric SMT dependency */
 static int powerpc_smt_flags(void)
 {
int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
@@ -1326,14 +1326,6 @@ static int powerpc_smt_flags(void)
 }
 #endif
 
-static struct sched_domain_topology_level powerpc_topology[] = {
-#ifdef CONFIG_SCHED_SMT
-   { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
-#endif
-   { cpu_cpu_mask, SD_INIT_NAME(DIE) },
-   { NULL, },
-};
-
 /*
  * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
 * This topology makes it *much* cheaper to migrate tasks between adjacent cores
@@ -1361,7 +1353,7 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
 }
 #endif
 
-static struct sched_domain_topology_level power9_topology[] = {
+static struct sched_domain_topology_level powerpc_topology[] = {
 #ifdef CONFIG_SCHED_SMT
{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
 #endif
@@ -1386,21 +1378,10 @@ void __init smp_cpus_done(unsigned int max_cpus)
 #ifdef CONFIG_SCHED_SMT
if (has_big_cores) {
pr_info("Big cores detected but using small core scheduling\n");
-   power9_topology[0].mask = smallcore_smt_mask;
powerpc_topology[0].mask = smallcore_smt_mask;
}
 #endif
-   /*
-* If any CPU detects that it's sharing a cache with another CPU then
-* use the deeper topology that is aware of this sharing.
-*/
-   if (shared_caches) {
-   pr_info("Using shared cache scheduler topology\n");
-   set_sched_topology(power9_topology);
-   } else {
-   pr_info("Using standard scheduler topology\n");
-   set_sched_topology(powerpc_topology);
-   }
+   set_sched_topology(powerpc_topology);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-- 
2.18.2



[PATCH v5 01/10] powerpc/smp: Fix a warning under !NEED_MULTIPLE_NODES

2020-08-10 Thread Srikar Dronamraju
Fix a build warning in a non-CONFIG_NEED_MULTIPLE_NODES build:
"error: 'numa_cpu_lookup_table' undeclared"

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Anton Blanchard 
Cc: Oliver O'Halloran 
Cc: Nathan Lynch 
Cc: Michael Neuling 
Cc: Gautham R Shenoy 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Jordan Niethe 
Cc: Vaidyanathan Srinivasan 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Srikar Dronamraju 
---
Changelog v2 -> v3:
Removed node caching part. Rewrote the Commit msg (Michael Ellerman)
Renamed to powerpc/smp: Fix a warning under !NEED_MULTIPLE_NODES

 arch/powerpc/kernel/smp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 73199470c265..edf94ca64eea 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -860,6 +860,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 					GFP_KERNEL, cpu_to_node(cpu));
 		zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
 					GFP_KERNEL, cpu_to_node(cpu));
+#ifdef CONFIG_NEED_MULTIPLE_NODES
 		/*
 		 * numa_node_id() works after this.
 		 */
@@ -868,6 +869,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 			set_cpu_numa_mem(cpu,
 				local_memory_node(numa_cpu_lookup_table[cpu]));
 		}
+#endif
 	}
 
/* Init the cpumasks so the boot CPU is related to itself */
-- 
2.18.2



[PATCH v5 08/10] powerpc/smp: Allocate cpumask only after searching thread group

2020-08-10 Thread Srikar Dronamraju
If allocated earlier and the search fails, then the cpu_l1_cache_map cpumask
is unnecessarily cleared. However, cpu_l1_cache_map can be allocated /
cleared after we search the thread group.

Please note CONFIG_CPUMASK_OFFSTACK is not set on Powerpc. Hence cpumask
allocated by zalloc_cpumask_var_node is never freed.
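
For reference, with CONFIG_CPUMASK_OFFSTACK=n the cpumask API degenerates
to roughly the following (a simplified paraphrase of
include/linux/cpumask.h, not the verbatim kernel code):

	typedef struct cpumask cpumask_var_t[1];

	static inline bool zalloc_cpumask_var_node(cpumask_var_t *mask,
						   gfp_t flags, int node)
	{
		cpumask_clear(*mask);	/* no allocation, nothing to free */
		return true;
	}

So moving the call after the search only avoids a pointless clear; there
is no allocation to leak.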

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Anton Blanchard 
Cc: Oliver O'Halloran 
Cc: Nathan Lynch 
Cc: Michael Neuling 
Cc: Gautham R Shenoy 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Jordan Niethe 
Cc: Vaidyanathan Srinivasan 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Srikar Dronamraju 
---
Changelog v4 ->v5:
Updated commit msg on why cpumask need not be freed.
(Michael Ellerman)

 arch/powerpc/kernel/smp.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 7403fdcf3821..0536ac06876b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -789,10 +789,6 @@ static int init_cpu_l1_cache_map(int cpu)
 	if (err)
 		goto out;
 
-	zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
-				GFP_KERNEL,
-				cpu_to_node(cpu));
-
 	cpu_group_start = get_cpu_thread_group_start(cpu, &tg);
 
 	if (unlikely(cpu_group_start == -1)) {
@@ -801,6 +797,9 @@ static int init_cpu_l1_cache_map(int cpu)
 		goto out;
 	}
 
+	zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
+				GFP_KERNEL, cpu_to_node(cpu));
+
 	for (i = first_thread; i < first_thread + threads_per_core; i++) {
 		int i_group_start = get_cpu_thread_group_start(i, &tg);
 
-- 
2.18.2



[PATCH v5 10/10] powerpc/smp: Implement cpu_to_coregroup_id

2020-08-10 Thread Srikar Dronamraju
Lookup the coregroup id from the associativity array.

If unable to detect the coregroup id, fall back to the core id.
This way, we ensure the sched_domain degenerates and an extra sched domain
is not created.

Ideally this function should have been implemented in
arch/powerpc/kernel/smp.c. However, implementing it in mm/numa.c means we
don't need to find the primary domain again.

If the device-tree mentions more than one coregroup, then the kernel
implements only the last or the smallest coregroup, which currently
corresponds to the penultimate domain in the device-tree.
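
A worked example of the lookup, with made-up numbers (the associativity
values and min_common_depth below are assumptions, not from any real
machine):

	/*
	 * Suppose vphn_get_associativity() fills in:
	 *   associativity = { 5, 2, 4, 8, 161, 1201 }
	 * The first cell (5) is the number of domains that follow, so
	 * index = 5. With min_common_depth = 2, index > min_common_depth + 1,
	 * and the function returns associativity[index - 1] == 161: the
	 * penultimate domain, i.e. the coregroup id. Were the condition
	 * false, it would fall back to cpu_to_core_id(cpu).
	 */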

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Anton Blanchard 
Cc: Oliver O'Halloran 
Cc: Nathan Lynch 
Cc: Michael Neuling 
Cc: Gautham R Shenoy 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Jordan Niethe 
Cc: Vaidyanathan Srinivasan 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Srikar Dronamraju 
---
Changelog v1 -> v2:
Move coregroup_enabled before getting associativity (Gautham)

 arch/powerpc/mm/numa.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 0d57779e7942..8b3b3ec7fcc4 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1218,6 +1218,26 @@ int find_and_online_cpu_nid(int cpu)
 
 int cpu_to_coregroup_id(int cpu)
 {
+   __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
+   int index;
+
+   if (cpu < 0 || cpu > nr_cpu_ids)
+   return -1;
+
+   if (!coregroup_enabled)
+   goto out;
+
+   if (!firmware_has_feature(FW_FEATURE_VPHN))
+   goto out;
+
+   if (vphn_get_associativity(cpu, associativity))
+   goto out;
+
+   index = of_read_number(associativity, 1);
+   if (index > min_common_depth + 1)
+		return of_read_number(&associativity[index - 1], 1);
+
+out:
return cpu_to_core_id(cpu);
 }
 
-- 
2.18.2



[PATCH v5 09/10] powerpc/smp: Create coregroup domain

2020-08-10 Thread Srikar Dronamraju
Add percpu coregroup maps and masks to create the coregroup domain.
If a coregroup doesn't exist, the coregroup domain will degenerate
in favour of the SMT/CACHE domain. Do note this patch only creates stubs
for cpu_to_coregroup_id. The actual cpu_to_coregroup_id implementation
is in a subsequent patch.

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Anton Blanchard 
Cc: Oliver O'Halloran 
Cc: Nathan Lynch 
Cc: Michael Neuling 
Cc: Gautham R. Shenoy 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Jordan Niethe 
Cc: Vaidyanathan Srinivasan 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Srikar Dronamraju 
---
Changelog v4 ->v5:
Updated commit msg to specify actual implementation of
cpu_to_coregroup_id is in a subsequent patch (Michael Ellerman)

Changelog v3 ->v4:
if coregroup_support doesn't exist, update MC mask to the next
smaller domain mask.

Changelog v2 -> v3:
Add optimization for mask updation under coregroup_support

Changelog v1 -> v2:
Moved coregroup topology fixup to fixup_topology (Gautham)

 arch/powerpc/include/asm/topology.h | 10 ++
 arch/powerpc/kernel/smp.c   | 54 -
 arch/powerpc/mm/numa.c  |  5 +++
 3 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index f0b6300e7dd3..6609174918ab 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -88,12 +88,22 @@ static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
 
 #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
 extern int find_and_online_cpu_nid(int cpu);
+extern int cpu_to_coregroup_id(int cpu);
 #else
 static inline int find_and_online_cpu_nid(int cpu)
 {
return 0;
 }
 
+static inline int cpu_to_coregroup_id(int cpu)
+{
+#ifdef CONFIG_SMP
+   return cpu_to_core_id(cpu);
+#else
+   return 0;
+#endif
+}
+
 #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
 
 #include 
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 0536ac06876b..566e3accac3e 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -80,12 +80,22 @@ DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_core_map);
+DEFINE_PER_CPU(cpumask_var_t, cpu_coregroup_map);
 
 EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
 EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map);
 EXPORT_PER_CPU_SYMBOL(cpu_core_map);
 EXPORT_SYMBOL_GPL(has_big_cores);
 
+enum {
+#ifdef CONFIG_SCHED_SMT
+   smt_idx,
+#endif
+   cache_idx,
+   mc_idx,
+   die_idx,
+};
+
 #define MAX_THREAD_LIST_SIZE   8
 #define THREAD_GROUP_SHARE_L1   1
 struct thread_groups {
@@ -861,11 +871,27 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
 }
 #endif
 
+static struct cpumask *cpu_coregroup_mask(int cpu)
+{
+   return per_cpu(cpu_coregroup_map, cpu);
+}
+
+static bool has_coregroup_support(void)
+{
+   return coregroup_enabled;
+}
+
+static const struct cpumask *cpu_mc_mask(int cpu)
+{
+   return cpu_coregroup_mask(cpu);
+}
+
 static struct sched_domain_topology_level powerpc_topology[] = {
 #ifdef CONFIG_SCHED_SMT
{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
 #endif
{ shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
+   { cpu_mc_mask, SD_INIT_NAME(MC) },
{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
{ NULL, },
 };
@@ -912,6 +938,10 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
GFP_KERNEL, cpu_to_node(cpu));
 		zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
 					GFP_KERNEL, cpu_to_node(cpu));
+		if (has_coregroup_support())
+			zalloc_cpumask_var_node(&per_cpu(cpu_coregroup_map, cpu),
+						GFP_KERNEL, cpu_to_node(cpu));
+
 #ifdef CONFIG_NEED_MULTIPLE_NODES
/*
 * numa_node_id() works after this.
@@ -929,6 +959,9 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
cpumask_set_cpu(boot_cpuid, cpu_l2_cache_mask(boot_cpuid));
cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
 
+   if (has_coregroup_support())
+   cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
+
init_big_cores();
if (has_big_cores) {
cpumask_set_cpu(boot_cpuid,
@@ -1220,6 +1253,8 @@ static void remove_cpu_from_masks(int cpu)
set_cpus_unrelated(cpu, i, cpu_sibling_mask);
if (has_big_cores)
set_cpus_unrelated(cpu, i, cpu_smallcore_mask);
+   if (has_coregroup_support())
+   set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
}
 }
 

[PATCH v5 07/10] powerpc/numa: Detect support for coregroup

2020-08-10 Thread Srikar Dronamraju
Add support for grouping cores based on the device-tree classification.
- The last domain in the associativity domains always refers to the
core.
- If the primary reference domain happens to be the penultimate domain in
the associativity-domains device-tree property, then there are no
coregroups. However, if it is not the penultimate domain, then there are
coregroups. There can be more than one coregroup. For now we are only
interested in the last or the smallest coregroup, i.e. one sub-group
per DIE.

Currently there are no firmwares exposing this grouping. Hence keep the
basis for the grouping abstract. Once firmware starts using this grouping,
code will be added to detect the type of grouping and adjust the sd domain
flags accordingly.
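
As a toy illustration of the detection arithmetic (all values below are
assumptions, since no firmware exposes the grouping yet):

	/*
	 * ibm,current-associativity-domains = <5 2 4 8 160 1280>
	 * of_get_property() returns prop_length in bytes: 6 cells * 4 = 24,
	 * so prop_length /= sizeof(int) leaves 6. With min_common_depth = 2:
	 *   max_nodes = domains[min_common_depth] = 4
	 *   6 > min_common_depth + 2, hence coregroup_enabled = 1
	 * and the penultimate cell (160) is the coregroup later returned by
	 * cpu_to_coregroup_id().
	 */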

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Anton Blanchard 
Cc: Oliver O'Halloran 
Cc: Nathan Lynch 
Cc: Michael Neuling 
Cc: Gautham R Shenoy 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Jordan Niethe 
Cc: Vaidyanathan Srinivasan 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Srikar Dronamraju 
---
Changelog v4->v5:
Updated commit msg with current abstract nature of the coregroups
(Michael Ellerman)

Changelog v1 -> v2:
Explained Coregroup in commit msg (Michael Ellerman)

 arch/powerpc/include/asm/smp.h |  1 +
 arch/powerpc/kernel/smp.c  |  1 +
 arch/powerpc/mm/numa.c | 34 +-
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 49a25e2400f2..5bdc17a7049f 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -28,6 +28,7 @@
 extern int boot_cpuid;
 extern int spinning_secondaries;
 extern u32 *cpu_to_phys_id;
+extern bool coregroup_enabled;
 
 extern void cpu_die(void);
 extern int cpu_to_chip_id(int cpu);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 91cf5d05e7ec..7403fdcf3821 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -74,6 +74,7 @@ static DEFINE_PER_CPU(int, cpu_state) = { 0 };
 
 struct task_struct *secondary_current;
 bool has_big_cores;
+bool coregroup_enabled;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 2298899a0f0a..51cb672f113b 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -886,7 +886,9 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
 static void __init find_possible_nodes(void)
 {
struct device_node *rtas;
-   u32 numnodes, i;
+   const __be32 *domains;
+   int prop_length, max_nodes;
+   u32 i;
 
if (!numa_enabled)
return;
@@ -895,25 +897,31 @@ static void __init find_possible_nodes(void)
if (!rtas)
return;
 
-	if (of_property_read_u32_index(rtas, "ibm,current-associativity-domains",
-			min_common_depth, &numnodes)) {
-		/*
-		 * ibm,current-associativity-domains is a fairly recent
-		 * property. If it doesn't exist, then fallback on
-		 * ibm,max-associativity-domains. Current denotes what the
-		 * platform can support compared to max which denotes what the
-		 * Hypervisor can support.
-		 */
-		if (of_property_read_u32_index(rtas, "ibm,max-associativity-domains",
-				min_common_depth, &numnodes))
+	/*
+	 * ibm,current-associativity-domains is a fairly recent property. If
+	 * it doesn't exist, then fallback on ibm,max-associativity-domains.
+	 * Current denotes what the platform can support compared to max
+	 * which denotes what the Hypervisor can support.
+	 */
+	domains = of_get_property(rtas, "ibm,current-associativity-domains",
+				  &prop_length);
+	if (!domains) {
+		domains = of_get_property(rtas, "ibm,max-associativity-domains",
+					  &prop_length);
+		if (!domains)
 			goto out;
 	}
 
-	for (i = 0; i < numnodes; i++) {
+	max_nodes = of_read_number(&domains[min_common_depth], 1);
+	for (i = 0; i < max_nodes; i++) {
 		if (!node_possible(i))
 			node_set(i, node_possible_map);
 	}
 
+	prop_length /= sizeof(int);
+	if (prop_length > min_common_depth + 2)
+		coregroup_enabled = 1;
+
 out:
 	of_node_put(rtas);
 }
-- 
2.18.2



[PATCH v5 06/10] powerpc/smp: Optimize start_secondary

2020-08-10 Thread Srikar Dronamraju
In start_secondary, even if shared_caches was already set, the system does
a redundant cpumask match. This redundant check can be removed by
checking if shared_caches is already set.

While here, localize the sibling_mask variable to within the if
condition.

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Anton Blanchard 
Cc: Oliver O'Halloran 
Cc: Nathan Lynch 
Cc: Michael Neuling 
Cc: Gautham R Shenoy 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Jordan Niethe 
Cc: Vaidyanathan Srinivasan 
Signed-off-by: Srikar Dronamraju 
---
Changelog v4 ->v5:
Retain cache domain, no need for generalization
 (Michael Ellerman, Peter Zijlstra,
 Valentin Schneider, Gautham R. Shenoy)

Changelog v1 -> v2:
Moved shared_cache topology fixup to fixup_topology (Gautham)

 arch/powerpc/kernel/smp.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 0c960ce3be42..91cf5d05e7ec 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -851,7 +851,7 @@ static int powerpc_shared_cache_flags(void)
  */
 static const struct cpumask *shared_cache_mask(int cpu)
 {
-   return cpu_l2_cache_mask(cpu);
+   return per_cpu(cpu_l2_cache_map, cpu);
 }
 
 #ifdef CONFIG_SCHED_SMT
@@ -1305,7 +1305,6 @@ static void add_cpu_to_masks(int cpu)
 void start_secondary(void *unused)
 {
unsigned int cpu = smp_processor_id();
-   struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
 
 	mmgrab(&init_mm);
 	current->active_mm = &init_mm;
@@ -1331,14 +1330,20 @@ void start_secondary(void *unused)
/* Update topology CPU masks */
add_cpu_to_masks(cpu);
 
-   if (has_big_cores)
-   sibling_mask = cpu_smallcore_mask;
/*
 * Check for any shared caches. Note that this must be done on a
 * per-core basis because one core in the pair might be disabled.
 */
-   if (!cpumask_equal(cpu_l2_cache_mask(cpu), sibling_mask(cpu)))
-   shared_caches = true;
+   if (!shared_caches) {
+   struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
+   struct cpumask *mask = cpu_l2_cache_mask(cpu);
+
+   if (has_big_cores)
+   sibling_mask = cpu_smallcore_mask;
+
+   if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu)))
+   shared_caches = true;
+   }
 
set_numa_node(numa_cpu_lookup_table[cpu]);
set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
-- 
2.18.2



[PATCH v5 04/10] powerpc/smp: Move topology fixups into a new function

2020-08-10 Thread Srikar Dronamraju
Move topology fixup based on the platform attributes into its own
function which is called just before set_sched_topology.

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Anton Blanchard 
Cc: Oliver O'Halloran 
Cc: Nathan Lynch 
Cc: Michael Neuling 
Cc: Gautham R Shenoy 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Jordan Niethe 
Cc: Vaidyanathan Srinivasan 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Srikar Dronamraju 
---
Changelog v2 -> v3:
Rewrote changelog (Gautham)
Renamed to powerpc/smp: Move topology fixups into a new function

 arch/powerpc/kernel/smp.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 39224a042468..b13161a5ffc3 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1362,6 +1362,16 @@ int setup_profiling_timer(unsigned int multiplier)
return 0;
 }
 
+static void fixup_topology(void)
+{
+#ifdef CONFIG_SCHED_SMT
+   if (has_big_cores) {
+   pr_info("Big cores detected but using small core scheduling\n");
+   powerpc_topology[0].mask = smallcore_smt_mask;
+   }
+#endif
+}
+
 void __init smp_cpus_done(unsigned int max_cpus)
 {
/*
@@ -1375,12 +1385,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
 
dump_numa_cpu_topology();
 
-#ifdef CONFIG_SCHED_SMT
-   if (has_big_cores) {
-   pr_info("Big cores detected but using small core scheduling\n");
-   powerpc_topology[0].mask = smallcore_smt_mask;
-   }
-#endif
+   fixup_topology();
set_sched_topology(powerpc_topology);
 }
 
-- 
2.18.2



[PATCH v5 05/10] powerpc/smp: Dont assume l2-cache to be superset of sibling

2020-08-10 Thread Srikar Dronamraju
Current code assumes that the cpumask of cpus sharing an l2-cache will
always be a superset of cpu_sibling_mask.

Let's drop that assumption. cpu_l2_cache_mask is a superset of
cpu_sibling_mask if and only if shared_caches is set.

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Anton Blanchard 
Cc: Oliver O'Halloran 
Cc: Nathan Lynch 
Cc: Michael Neuling 
Cc: Gautham R Shenoy 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Jordan Niethe 
Cc: Vaidyanathan Srinivasan 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Srikar Dronamraju 
---
Changelog v1 -> v2:
Set cpumask after verifying l2-cache. (Gautham)

 arch/powerpc/kernel/smp.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index b13161a5ffc3..0c960ce3be42 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1188,6 +1188,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
if (!l2_cache)
return false;
 
+   cpumask_set_cpu(cpu, mask_fn(cpu));
for_each_cpu(i, cpu_online_mask) {
/*
 * when updating the marks the current CPU has not been marked
@@ -1270,29 +1271,30 @@ static void add_cpu_to_masks(int cpu)
 * add it to it's own thread sibling mask.
 */
cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
+   cpumask_set_cpu(cpu, cpu_core_mask(cpu));
 
for (i = first_thread; i < first_thread + threads_per_core; i++)
if (cpu_online(i))
set_cpus_related(i, cpu, cpu_sibling_mask);
 
add_cpu_to_smallcore_masks(cpu);
-   /*
-* Copy the thread sibling mask into the cache sibling mask
-* and mark any CPUs that share an L2 with this CPU.
-*/
-   for_each_cpu(i, cpu_sibling_mask(cpu))
-   set_cpus_related(cpu, i, cpu_l2_cache_mask);
update_mask_by_l2(cpu, cpu_l2_cache_mask);
 
-   /*
-* Copy the cache sibling mask into core sibling mask and mark
-* any CPUs on the same chip as this CPU.
-*/
-   for_each_cpu(i, cpu_l2_cache_mask(cpu))
-   set_cpus_related(cpu, i, cpu_core_mask);
+   if (pkg_id == -1) {
+   struct cpumask *(*mask)(int) = cpu_sibling_mask;
+
+   /*
+* Copy the sibling mask into core sibling mask and
+* mark any CPUs on the same chip as this CPU.
+*/
+   if (shared_caches)
+   mask = cpu_l2_cache_mask;
+
+   for_each_cpu(i, mask(cpu))
+   set_cpus_related(cpu, i, cpu_core_mask);
 
-   if (pkg_id == -1)
return;
+   }
 
for_each_cpu(i, cpu_online_mask)
if (get_physical_package_id(i) == pkg_id)
-- 
2.18.2



[PATCH v5 03/10] powerpc/smp: Move powerpc_topology above

2020-08-10 Thread Srikar Dronamraju
Just move the powerpc_topology definition up in the file.
This will help in using functions defined earlier in this file and avoids
forward declarations.

No other functional changes.

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Anton Blanchard 
Cc: Oliver O'Halloran 
Cc: Nathan Lynch 
Cc: Michael Neuling 
Cc: Gautham R Shenoy 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Jordan Niethe 
Cc: Vaidyanathan Srinivasan 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Srikar Dronamraju 
---
 arch/powerpc/kernel/smp.c | 104 +++---
 1 file changed, 52 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 08da765b91f1..39224a042468 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -818,6 +818,58 @@ static int init_cpu_l1_cache_map(int cpu)
return err;
 }
 
+static bool shared_caches;
+
+#ifdef CONFIG_SCHED_SMT
+/* cpumask of CPUs with asymmetric SMT dependency */
+static int powerpc_smt_flags(void)
+{
+   int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
+
+   if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
+   printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
+   flags |= SD_ASYM_PACKING;
+   }
+   return flags;
+}
+#endif
+
+/*
+ * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
+ * This topology makes it *much* cheaper to migrate tasks between adjacent cores
+ * since the migrated task remains cache hot. We want to take advantage of this
+ * at the scheduler level so an extra topology level is required.
+ */
+static int powerpc_shared_cache_flags(void)
+{
+   return SD_SHARE_PKG_RESOURCES;
+}
+
+/*
+ * We can't just pass cpu_l2_cache_mask() directly because
+ * returns a non-const pointer and the compiler barfs on that.
+ */
+static const struct cpumask *shared_cache_mask(int cpu)
+{
+   return cpu_l2_cache_mask(cpu);
+}
+
+#ifdef CONFIG_SCHED_SMT
+static const struct cpumask *smallcore_smt_mask(int cpu)
+{
+   return cpu_smallcore_mask(cpu);
+}
+#endif
+
+static struct sched_domain_topology_level powerpc_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+   { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
+#endif
+   { shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
+   { cpu_cpu_mask, SD_INIT_NAME(DIE) },
+   { NULL, },
+};
+
 static int init_big_cores(void)
 {
int cpu;
@@ -1247,8 +1299,6 @@ static void add_cpu_to_masks(int cpu)
set_cpus_related(cpu, i, cpu_core_mask);
 }
 
-static bool shared_caches;
-
 /* Activate a secondary processor. */
 void start_secondary(void *unused)
 {
@@ -1312,56 +1362,6 @@ int setup_profiling_timer(unsigned int multiplier)
return 0;
 }
 
-#ifdef CONFIG_SCHED_SMT
-/* cpumask of CPUs with asymmetric SMT dependency */
-static int powerpc_smt_flags(void)
-{
-   int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
-
-   if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
-   printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
-   flags |= SD_ASYM_PACKING;
-   }
-   return flags;
-}
-#endif
-
-/*
- * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
- * This topology makes it *much* cheaper to migrate tasks between adjacent cores
- * since the migrated task remains cache hot. We want to take advantage of this
- * at the scheduler level so an extra topology level is required.
- */
-static int powerpc_shared_cache_flags(void)
-{
-   return SD_SHARE_PKG_RESOURCES;
-}
-
-/*
- * We can't just pass cpu_l2_cache_mask() directly because
- * returns a non-const pointer and the compiler barfs on that.
- */
-static const struct cpumask *shared_cache_mask(int cpu)
-{
-   return cpu_l2_cache_mask(cpu);
-}
-
-#ifdef CONFIG_SCHED_SMT
-static const struct cpumask *smallcore_smt_mask(int cpu)
-{
-   return cpu_smallcore_mask(cpu);
-}
-#endif
-
-static struct sched_domain_topology_level powerpc_topology[] = {
-#ifdef CONFIG_SCHED_SMT
-   { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
-#endif
-   { shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
-   { cpu_cpu_mask, SD_INIT_NAME(DIE) },
-   { NULL, },
-};
-
 void __init smp_cpus_done(unsigned int max_cpus)
 {
/*
-- 
2.18.2



[PATCH v5 00/10] Coregroup support on Powerpc

2020-08-10 Thread Srikar Dronamraju
Changelog v4->v5:
v4: 
http://lore.kernel.org/lkml/20200727053230.19753-1-sri...@linux.vnet.ibm.com/t/#u

Changelog v4 ->v5:
powerpc/smp: Optimize start_secondary
Retain cache domain, no need for generalization
 (Michael Ellerman, Peter Zijlstra,
 Valentin Schneider, Gautham R. Shenoy)

powerpc/numa: Detect support for coregroup
Updated commit msg with current abstract nature of the coregroups
(Michael Ellerman)
powerpc/smp: Allocate cpumask only after searching thread group
Updated commit msg on why cpumask need not be freed.
(Michael Ellerman)

powerpc/smp: Create coregroup domain
Updated commit msg to specify actual implementation of
cpu_to_coregroup_id is in a subsequent patch (Michael Ellerman)

Changelog v3 ->v4:
v3: 
https://lore.kernel.org/lkml/20200723085116.4731-1-sri...@linux.vnet.ibm.com/t/#u

powerpc/smp: Create coregroup domain
if coregroup_support doesn't exist, update MC mask to the next
smaller domain mask.

Changelog v2 -> v3:
v2: 
https://lore.kernel.org/linuxppc-dev/20200721113814.32284-1-sri...@linux.vnet.ibm.com/t/#u

powerpc/smp: Cache node for reuse
Removed node caching part. Rewrote the Commit msg (Michael Ellerman)
Renamed to powerpc/smp: Fix a warning under !NEED_MULTIPLE_NODES

powerpc/smp: Enable small core scheduling sooner
Rewrote changelog (Gautham)
Renamed to powerpc/smp: Move topology fixups into a new function

powerpc/smp: Create coregroup domain
Add optimization for mask updation under coregroup_support

Changelog v1 -> v2:
v1: 
https://lore.kernel.org/linuxppc-dev/20200714043624.5648-1-sri...@linux.vnet.ibm.com/t/#u

powerpc/smp: Merge Power9 topology with Power topology
Replaced a reference to cpu_smt_mask with per_cpu(cpu_sibling_map, cpu)
since cpu_smt_mask is only defined under CONFIG_SCHED_SMT

powerpc/smp: Enable small core scheduling sooner
Restored the previous info msg (Jordan)
Moved big core topology fixup to fixup_topology (Gautham)

powerpc/smp: Dont assume l2-cache to be superset of sibling
Set cpumask after verifying l2-cache. (Gautham)

powerpc/smp: Generalize 2nd sched domain
Moved shared_cache topology fixup to fixup_topology (Gautham)

Powerpc/numa: Detect support for coregroup
Explained Coregroup in commit msg (Michael Ellerman)

Powerpc/smp: Create coregroup domain
Moved coregroup topology fixup to fixup_topology (Gautham)

powerpc/smp: Implement cpu_to_coregroup_id
Move coregroup_enabled before getting associativity (Gautham)

powerpc/smp: Provide an ability to disable coregroup
Patch dropped (Michael Ellerman)

Cleanup of existing powerpc topologies and add coregroup support on
Powerpc. Coregroup is a group of (subset of) cores of a DIE that share
a resource.

Patch 7 of this patch series: "Powerpc/numa: Detect support for coregroup"
depends on
https://lore.kernel.org/linuxppc-dev/20200707140644.7241-1-sri...@linux.vnet.ibm.com/t/#u
However it should be easy to rebase the patch without the above patch.

This patch series is based on top of current powerpc/next tree + the
above patch.

Summary of some of the testing done with the coregroup patchset.
It includes ebizzy, schbench, perf bench sched pipe and topology verification.
On the left side are results from the powerpc/next tree and on the right are
the results with the patchset applied. Topological verification clearly shows
that there is no change in topology with and without the patches on all the 3
classes of systems that were tested.

Powerpc/next vs. Powerpc/next + Coregroup Support v5 patchset:

Power 9 PowerNV (2 Node/ 160 Cpu System)
-----------------------------------------
ebizzy (Throughput of 100 iterations of 30 seconds, higher throughput is better)
                    N      Min       Max    Median       Avg     Stddev
powerpc/next      100   993884   1276090   1173476   1165914  54867.201
+ coregroup v5    100   910470   1279820   1171095   1162091   67363.28

schbench (latency, hence lower is better)
Latency percentiles (usec)    powerpc/next    + coregroup v5
        50.0th:                        455               454
        75.0th:                        533               543
        90.0th:                        683               701
        95.0th:                        743               737
       *99.0th:                        815               805
        99.5th:                        839               835
        99.9th:                        913
Re: [PATCH] powerpc/rtas: Restrict RTAS requests from userspace

2020-08-10 Thread Michael Ellerman
Hi ajd,

Thanks for taking care of this.

I was going to merge this as-is, but given it's fixing a long standing
issue there's not really a big rush. So a few comments below.

Andrew Donnellan  writes:
> A number of userspace utilities depend on making calls to RTAS to retrieve
> information and update various things.
>
> The existing API through which we expose RTAS to userspace exposes more
> RTAS functionality than we actually need, through the sys_rtas syscall,
> which allows root (or anyone with CAP_SYS_ADMIN) to make any RTAS call they
> want with arbitrary arguments.
>
> Many RTAS calls take the address of a buffer as an argument, and it's up to
> the caller to specify the physical address of the buffer as an argument. We
> allocate a buffer (the "RMO buffer") in the Real Memory Area that RTAS can
> access, and then expose the physical address and size of this buffer in
> /proc/powerpc/rtas/rmo_buffer. Userspace is expected to read this address,
> poke at the buffer using /dev/mem, and pass an address in the RMO buffer to
> the RTAS call.
>
> However, there's nothing stopping the caller from specifying whatever
> address they want in the RTAS call, and it's easy to construct a series of
> RTAS calls that can overwrite arbitrary bytes (even without /dev/mem
> access).
>
> Additionally, there are some RTAS calls that do potentially dangerous
> things and for which there are no legitimate userspace use cases.
>
> In the past, this would not have been a particularly big deal as it was
> assumed that root could modify all system state freely, but with Secure
> Boot and lockdown we need to care about this.
>
> We can't fundamentally change the ABI at this point, however we can address
> this by implementing a filter that checks RTAS calls against a list
> of permitted calls and forces the caller to use addresses within the RMO
> buffer.
>
> The list is based off the list of calls that are used by the librtas
> userspace library, and has been tested with a number of existing userspace
> RTAS utilities. For compatibility with any applications we are not aware of
> that require other calls, the filter can be turned off at build time.
>
> Reported-by: Daniel Axtens 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Andrew Donnellan 
> ---
>  arch/powerpc/Kconfig   |  13 +++
>  arch/powerpc/kernel/rtas.c | 198 +
>  2 files changed, 211 insertions(+)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 9fa23eb320ff..0e2dfe497357 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -973,6 +973,19 @@ config PPC_SECVAR_SYSFS
> read/write operations on these variables. Say Y if you have
> secure boot enabled and want to expose variables to userspace.
>  
> +config PPC_RTAS_FILTER
> + bool "Enable filtering of RTAS syscalls"
> + default y
> + depends on PPC_RTAS
> + help
> +   The RTAS syscall API has security issues that could be used to
> +   compromise system integrity. This option enforces restrictions on the
> +   RTAS calls and arguments passed by userspace programs to mitigate
> +   these issues.
> +
> +   Say Y unless you know what you are doing and the filter is causing
> +   problems for you.
> +
>  endmenu
>  
>  config ISA_DMA_API
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index a09eba03f180..ec1cae52d8bd 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -324,6 +324,23 @@ int rtas_token(const char *service)
>  }
>  EXPORT_SYMBOL(rtas_token);
>  
> +#ifdef CONFIG_PPC_RTAS_FILTER
> +

I think this could be combined with the #ifdef block below?

> +static char *rtas_token_name(int token)
> +{
> + struct property *prop;
> +
> + for_each_property_of_node(rtas.dev, prop) {
> + const __be32 *tokp = prop->value;
> +
> + if (tokp && be32_to_cpu(*tokp) == token)
> + return prop->name;
> + }
> + return NULL;
> +}
> +
> +#endif /* CONFIG_PPC_RTAS_FILTER */
> +
>  int rtas_service_present(const char *service)
>  {
>   return rtas_token(service) != RTAS_UNKNOWN_SERVICE;
> @@ -1110,6 +1127,184 @@ struct pseries_errorlog *get_pseries_errorlog(struct 
> rtas_error_log *log,
>   return NULL;
>  }
>  
> +#ifdef CONFIG_PPC_RTAS_FILTER
> +
> +/*
> + * The sys_rtas syscall, as originally designed, allows root to pass
> + * arbitrary physical addresses to RTAS calls. A number of RTAS calls
> + * can be abused to write to arbitrary memory and do other things that
> + * are potentially harmful to system integrity, and thus should only
> + * be used inside the kernel and not exposed to userspace.
> + *
> + * All known legitimate users of the sys_rtas syscall will only ever
> + * pass addresses that fall within the RMO buffer, and use a known
> + * subset of RTAS calls.
> + *
> + * Accordingly, we filter RTAS requests to check that the call is
> + * permitted, and that provided pointers fall within the RMO buffer.
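
For orientation, a filter entry along these lines would carry per-call
metadata about which RTAS arguments are buffer addresses and sizes (a
sketch of the idea under assumed field names, not the patch's verbatim
definition):

	struct rtas_filter {
		const char *name;
		int token;
		/* indexes into the RTAS argument block, -1 if unused */
		int buf_idx1, size_idx1;
		int buf_idx2, size_idx2;
		/* buffer size to enforce when no size argument is passed */
		int fixed_size;
	};

The filter then only needs to verify that each flagged argument points
inside the RMO buffer exposed via /proc/powerpc/rtas/rmo_buffer.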

Re: linux-next: manual merge of the set_fs tree with the powerpc tree

2020-08-10 Thread Stephen Rothwell
Hi Christoph,

On Mon, 10 Aug 2020 08:11:06 +0200 Christoph Hellwig  wrote:
>
> please drop my set_fs tree from linux-next.  It is not going to be
> merged for 5.9 in this form.

OK, done from tomorrow.

-- 
Cheers,
Stephen Rothwell




Re: linux-next: manual merge of the set_fs tree with the powerpc tree

2020-08-10 Thread Christoph Hellwig
Hi Stephen,

please drop my set_fs tree from linux-next.  It is not going to be
merged for 5.9 in this form.

Thanks!