[PATCH v4] powerpc: setup_64: set up PACA before parsing device tree
Currently, we set up the PACA after parsing the device tree for CPU features. Before that, r13 contains random data, which means there is random data in r13 while we're running the generic dt parsing code. This random data varies depending on whether we boot through a vmlinux or a zImage: for the vmlinux case it's usually around zero, but for zImages we see random values like 912a72603d420015. This is poor practice, and can also lead to difficult-to-debug crashes. For example, when kcov is enabled, the kcov instrumentation attempts to read preempt_count out of the current task, which goes via the PACA. This then crashes in the zImage case. To resolve this: - move the PACA setup to before the cpu feature parsing. - because we no longer have access to cpu feature flags in PACA setup, change the HV feature test in the PACA setup path to consider the actual value of the MSR rather than the CPU feature. Translations get switched on once we leave early_setup, so I think we'd already catch any other cases where the PACA or task aren't set up. Boot tested on a P9 guest and host. Fixes: fb0b0a73b223 ("powerpc: Enable kcov") Cc: Andrew Donnellan Cc: sta...@vger.kernel.org Reviewed-by: Andrew Donnellan Suggested-by: Michael Ellerman Signed-off-by: Daniel Axtens --- Regarding moving the comment about printk()-safety: I am about 75% sure that the thing that makes printk() safe is the PACA, not the CPU features. That's what commit 24d9649574fb ("[POWERPC] Document when printk is useable") seems to indicate, but as someone wise recently told me, "bootstrapping is hard", so I may be totally wrong. v4: Update commit message and clarify that the mfmsr() approach is not for general use. Thanks Nick Piggin. v3: Update comment, thanks Christophe Leroy. Remove a comment in dt_cpu_ftrs.c that is no longer accurate - thanks Andrew. I think we want to retain all the code still, but I'm open to being told otherwise. --- arch/powerpc/kernel/dt_cpu_ftrs.c | 1 - arch/powerpc/kernel/paca.c| 10 +++--- arch/powerpc/kernel/setup_64.c| 20 +++- 3 files changed, 22 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index 182b4047c1ef..36bc0d5c4f3a 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -139,7 +139,6 @@ static void __init cpufeatures_setup_cpu(void) /* Initialize the base environment -- clear FSCR/HFSCR. */ hv_mode = !!(mfmsr() & MSR_HV); if (hv_mode) { - /* CPU_FTR_HVMODE is used early in PACA setup */ cur_cpu_spec->cpu_features |= CPU_FTR_HVMODE; mtspr(SPRN_HFSCR, 0); } diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c index 949eceb254d8..0ee6308541b1 100644 --- a/arch/powerpc/kernel/paca.c +++ b/arch/powerpc/kernel/paca.c @@ -214,11 +214,15 @@ void setup_paca(struct paca_struct *new_paca) /* On Book3E, initialize the TLB miss exception frames */ mtspr(SPRN_SPRG_TLB_EXFRAME, local_paca->extlb); #else - /* In HV mode, we setup both HPACA and PACA to avoid problems + /* +* In HV mode, we setup both HPACA and PACA to avoid problems * if we do a GET_PACA() before the feature fixups have been -* applied +* applied. +* +* Normally you should test against CPU_FTR_HVMODE, but CPU features +* are not yet set up when we first reach here. */ - if (early_cpu_has_feature(CPU_FTR_HVMODE)) + if (mfmsr() & MSR_HV) mtspr(SPRN_SPRG_HPACA, local_paca); #endif mtspr(SPRN_SPRG_PACA, local_paca); diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index e05e6dd67ae6..2259da8e8685 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -285,18 +285,28 @@ void __init early_setup(unsigned long dt_ptr) /* printk is _NOT_ safe to use here ! --- */ - /* Try new device tree based feature discovery ... */ - if (!dt_cpu_ftrs_init(__va(dt_ptr))) - /* Otherwise use the old style CPU table */ - identify_cpu(0, mfspr(SPRN_PVR)); + /* +* Assume we're on cpu 0 for now. +* +* We need to load a PACA very early if we are using kcov. kcov will +* call in_task() in its instrumentation, which relies on the current +* task from the PACA. dt_cpu_ftrs_init is coveraged-enabled and also +* calls into the coverage-enabled generic dt library. +* +* Set up a temporary paca. It is going to be replaced below. +*/ - /* Assume we're on cpu 0 for now. Don't write to the paca yet! */ initialise_paca(_paca, 0); setup_paca(_paca); fixup_boot_paca(); /* printk is now safe to use --- */ + /* Try new device tree based feature
Re: [PATCHv3 2/2] pseries/scm: buffer pmem's bound addr in dt for kexec kernel
On 3/4/20 2:17 PM, Pingfan Liu wrote: At present, plpar_hcall(H_SCM_BIND_MEM, ...) takes a very long time, so if dumping to fsdax, it will take a very long time. that should be fixed by faa6d21153fd11e139dd880044521389b34a24f2 Author: Aneesh Kumar K.V AuthorDate: Tue Sep 3 18:04:52 2019 +0530 Commit: Michael Ellerman CommitDate: Wed Sep 25 08:32:59 2019 +1000 powerpc/nvdimm: use H_SCM_QUERY hcall on H_OVERLAP error Right now we force an unbind of SCM memory at drcindex on H_OVERLAP error. This really slows down operations like kexec where we get the H_OVERLAP error because we don't go through a full hypervisor re init. H_OVERLAP error for a H_SCM_BIND_MEM hcall indicates that SCM memory at drc index is already bound. Since we don't specify a logical memory address for bind hcall, we can use the H_SCM_QUERY hcall to query the already bound logical address. Take a closer look, during the papr_scm initialization, the only configuration is through drc_pmem_bind()-> plpar_hcall(H_SCM_BIND_MEM, ...), which helps to set up the bound address. On pseries, for kexec -l/-p kernel, there is no reset of hardware, and this step can be stepped around to save times. So the pmem bound address can be passed to the 2nd kernel through a dynamic added property "bound-addr" in dt node 'ibm,pmemory'. -aneesh
Re: [PATCHv3 2/2] pseries/scm: buffer pmem's bound addr in dt for kexec kernel
Appreciate for your kind review. And I have some comment as below. On Fri, Mar 13, 2020 at 11:18 AM Oliver O'Halloran wrote: > > On Wed, Mar 4, 2020 at 7:50 PM Pingfan Liu wrote: > > > > At present, plpar_hcall(H_SCM_BIND_MEM, ...) takes a very long time, so > > if dumping to fsdax, it will take a very long time. > > > > Take a closer look, during the papr_scm initialization, the only > > configuration is through drc_pmem_bind()-> plpar_hcall(H_SCM_BIND_MEM, > > ...), which helps to set up the bound address. > > > > On pseries, for kexec -l/-p kernel, there is no reset of hardware, and this > > step can be stepped around to save times. So the pmem bound address can be > > passed to the 2nd kernel through a dynamic added property "bound-addr" in > > dt node 'ibm,pmemory'. > > > > Signed-off-by: Pingfan Liu > > To: linuxppc-dev@lists.ozlabs.org > > Cc: Benjamin Herrenschmidt > > Cc: Paul Mackerras > > Cc: Michael Ellerman > > Cc: Hari Bathini > > Cc: Aneesh Kumar K.V > > Cc: Oliver O'Halloran > > Cc: Dan Williams > > Cc: Andrew Donnellan > > Cc: Christophe Leroy > > Cc: Rob Herring > > Cc: Frank Rowand > > Cc: ke...@lists.infradead.org > > --- > > note: This patch has not been tested since I can not get such a pseries > > with pmem. > > Please kindly to give some suggestion, thanks. > > There was some qemu patches to implement the Hcall interface floating > around a while ago. I'm not sure they ever made it into upstream qemu > though. Unfortunately, it does not appear in latest qemu code. I think probably virt-pmem has achieved the same feature. > > > --- > > arch/powerpc/platforms/pseries/of_helpers.c | 1 + > > arch/powerpc/platforms/pseries/papr_scm.c | 33 > > - > > drivers/of/base.c | 1 + > > 3 files changed, 25 insertions(+), 10 deletions(-) > > > > diff --git a/arch/powerpc/platforms/pseries/of_helpers.c > > b/arch/powerpc/platforms/pseries/of_helpers.c > > index 1022e0f..2c7bab4 100644 > > --- a/arch/powerpc/platforms/pseries/of_helpers.c > > +++ b/arch/powerpc/platforms/pseries/of_helpers.c > > @@ -34,6 +34,7 @@ struct property *new_property(const char *name, const int > > length, > > kfree(new); > > return NULL; > > } > > +EXPORT_SYMBOL(new_property); > > > > /** > > * pseries_of_derive_parent - basically like dirname(1) > > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c > > b/arch/powerpc/platforms/pseries/papr_scm.c > > index 0b4467e..54ae903 100644 > > --- a/arch/powerpc/platforms/pseries/papr_scm.c > > +++ b/arch/powerpc/platforms/pseries/papr_scm.c > > @@ -14,6 +14,7 @@ > > #include > > > > #include > > +#include "of_helpers.h" > > > > #define BIND_ANY_ADDR (~0ul) > > > > @@ -383,7 +384,7 @@ static int papr_scm_probe(struct platform_device *pdev) > > { > > struct device_node *dn = pdev->dev.of_node; > > u32 drc_index, metadata_size; > > - u64 blocks, block_size; > > + u64 blocks, block_size, bound_addr = 0; > > struct papr_scm_priv *p; > > const char *uuid_str; > > u64 uuid[2]; > > @@ -440,17 +441,29 @@ static int papr_scm_probe(struct platform_device > > *pdev) > > p->metadata_size = metadata_size; > > p->pdev = pdev; > > > > - /* request the hypervisor to bind this region to somewhere in > > memory */ > > - rc = drc_pmem_bind(p); > > + of_property_read_u64(dn, "bound-addr", _addr); > > + if (bound_addr) { > > + p->bound_addr = bound_addr; > > + } else { > > + struct property *property; > > + u64 big; > > > > - /* If phyp says drc memory still bound then force unbound and retry > > */ > > - if (rc == H_OVERLAP) > > - rc = drc_pmem_query_n_bind(p); > > + /* request the hypervisor to bind this region to somewhere > > in memory */ > > + rc = drc_pmem_bind(p); > > > > - if (rc != H_SUCCESS) { > > - dev_err(>pdev->dev, "bind err: %d\n", rc); > > - rc = -ENXIO; > > - goto err; > > + /* If phyp says drc memory still bound then force unbound > > and retry */ > > + if (rc == H_OVERLAP) > > + rc = drc_pmem_query_n_bind(p); > > + > > + if (rc != H_SUCCESS) { > > + dev_err(>pdev->dev, "bind err: %d\n", rc); > > + rc = -ENXIO; > > + goto err; > > + } > > + big = cpu_to_be64(p->bound_addr); > > + property = new_property("bound-addr", sizeof(u64), (const > > unsigned char *), > > + NULL); > > That should probably be "linux,bound-addr" OK, thanks for suggestion. > > The other thing that stands out to me is that you aren't removing the Yes, you are right. I will fix it in V2. > property when the region is unbound. As a general rule I'd prefer we > didn't
[PATCH AUTOSEL 4.4 2/7] powerpc: Include .BTF section
From: "Naveen N. Rao" [ Upstream commit cb0cc635c7a9fa8a3a0f75d4d896721819c63add ] Selecting CONFIG_DEBUG_INFO_BTF results in the below warning from ld: ld: warning: orphan section `.BTF' from `.btf.vmlinux.bin.o' being placed in section `.BTF' Include .BTF section in vmlinux explicitly to fix the same. Signed-off-by: Naveen N. Rao Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20200220113132.857132-1-naveen.n@linux.vnet.ibm.com Signed-off-by: Sasha Levin --- arch/powerpc/kernel/vmlinux.lds.S | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 876ac9d52afcb..9b1e297be6730 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -255,6 +255,12 @@ SECTIONS *(.branch_lt) } +#ifdef CONFIG_DEBUG_INFO_BTF + .BTF : AT(ADDR(.BTF) - LOAD_OFFSET) { + *(.BTF) + } +#endif + .opd : AT(ADDR(.opd) - LOAD_OFFSET) { *(.opd) } -- 2.20.1
[PATCH AUTOSEL 4.9 2/7] powerpc: Include .BTF section
From: "Naveen N. Rao" [ Upstream commit cb0cc635c7a9fa8a3a0f75d4d896721819c63add ] Selecting CONFIG_DEBUG_INFO_BTF results in the below warning from ld: ld: warning: orphan section `.BTF' from `.btf.vmlinux.bin.o' being placed in section `.BTF' Include .BTF section in vmlinux explicitly to fix the same. Signed-off-by: Naveen N. Rao Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20200220113132.857132-1-naveen.n@linux.vnet.ibm.com Signed-off-by: Sasha Levin --- arch/powerpc/kernel/vmlinux.lds.S | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 50d3650608558..c20510497c49d 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -315,6 +315,12 @@ SECTIONS *(.branch_lt) } +#ifdef CONFIG_DEBUG_INFO_BTF + .BTF : AT(ADDR(.BTF) - LOAD_OFFSET) { + *(.BTF) + } +#endif + .opd : AT(ADDR(.opd) - LOAD_OFFSET) { *(.opd) } -- 2.20.1
[PATCH AUTOSEL 4.14 03/15] powerpc: Include .BTF section
From: "Naveen N. Rao" [ Upstream commit cb0cc635c7a9fa8a3a0f75d4d896721819c63add ] Selecting CONFIG_DEBUG_INFO_BTF results in the below warning from ld: ld: warning: orphan section `.BTF' from `.btf.vmlinux.bin.o' being placed in section `.BTF' Include .BTF section in vmlinux explicitly to fix the same. Signed-off-by: Naveen N. Rao Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20200220113132.857132-1-naveen.n@linux.vnet.ibm.com Signed-off-by: Sasha Levin --- arch/powerpc/kernel/vmlinux.lds.S | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index b0cf4af7ba840..e4da937d6cf91 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -317,6 +317,12 @@ SECTIONS *(.branch_lt) } +#ifdef CONFIG_DEBUG_INFO_BTF + .BTF : AT(ADDR(.BTF) - LOAD_OFFSET) { + *(.BTF) + } +#endif + .opd : AT(ADDR(.opd) - LOAD_OFFSET) { *(.opd) } -- 2.20.1
[PATCH AUTOSEL 4.19 04/20] powerpc: Include .BTF section
From: "Naveen N. Rao" [ Upstream commit cb0cc635c7a9fa8a3a0f75d4d896721819c63add ] Selecting CONFIG_DEBUG_INFO_BTF results in the below warning from ld: ld: warning: orphan section `.BTF' from `.btf.vmlinux.bin.o' being placed in section `.BTF' Include .BTF section in vmlinux explicitly to fix the same. Signed-off-by: Naveen N. Rao Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20200220113132.857132-1-naveen.n@linux.vnet.ibm.com Signed-off-by: Sasha Levin --- arch/powerpc/kernel/vmlinux.lds.S | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index fd35eddf32669..d081d726ca8ea 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -322,6 +322,12 @@ SECTIONS *(.branch_lt) } +#ifdef CONFIG_DEBUG_INFO_BTF + .BTF : AT(ADDR(.BTF) - LOAD_OFFSET) { + *(.BTF) + } +#endif + .opd : AT(ADDR(.opd) - LOAD_OFFSET) { __start_opd = .; KEEP(*(.opd)) -- 2.20.1
[PATCH AUTOSEL 5.4 10/35] powerpc: Include .BTF section
From: "Naveen N. Rao" [ Upstream commit cb0cc635c7a9fa8a3a0f75d4d896721819c63add ] Selecting CONFIG_DEBUG_INFO_BTF results in the below warning from ld: ld: warning: orphan section `.BTF' from `.btf.vmlinux.bin.o' being placed in section `.BTF' Include .BTF section in vmlinux explicitly to fix the same. Signed-off-by: Naveen N. Rao Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20200220113132.857132-1-naveen.n@linux.vnet.ibm.com Signed-off-by: Sasha Levin --- arch/powerpc/kernel/vmlinux.lds.S | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 060a1acd7c6d7..4638d28633888 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -326,6 +326,12 @@ SECTIONS *(.branch_lt) } +#ifdef CONFIG_DEBUG_INFO_BTF + .BTF : AT(ADDR(.BTF) - LOAD_OFFSET) { + *(.BTF) + } +#endif + .opd : AT(ADDR(.opd) - LOAD_OFFSET) { __start_opd = .; KEEP(*(.opd)) -- 2.20.1
[PATCH AUTOSEL 5.5 11/41] powerpc: Include .BTF section
From: "Naveen N. Rao" [ Upstream commit cb0cc635c7a9fa8a3a0f75d4d896721819c63add ] Selecting CONFIG_DEBUG_INFO_BTF results in the below warning from ld: ld: warning: orphan section `.BTF' from `.btf.vmlinux.bin.o' being placed in section `.BTF' Include .BTF section in vmlinux explicitly to fix the same. Signed-off-by: Naveen N. Rao Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20200220113132.857132-1-naveen.n@linux.vnet.ibm.com Signed-off-by: Sasha Levin --- arch/powerpc/kernel/vmlinux.lds.S | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 8834220036a51..857ab49750f18 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -303,6 +303,12 @@ SECTIONS *(.branch_lt) } +#ifdef CONFIG_DEBUG_INFO_BTF + .BTF : AT(ADDR(.BTF) - LOAD_OFFSET) { + *(.BTF) + } +#endif + .opd : AT(ADDR(.opd) - LOAD_OFFSET) { __start_opd = .; KEEP(*(.opd)) -- 2.20.1
[RFC PATCH v1] powerpc/XIVE: SVM: share the event-queue page with the Hypervisor.
XIVE interrupt controller maintains a Event-Queue(EQ) page. This page is used to communicate events with the Hypervisor/Qemu. In Secure-VM, unless a page is shared with the Hypervisor, the Hypervisor will not be able to read/write to that page. Explicitly share the EQ page with the Hypervisor, and unshare it during cleanup. This enables SVM to use XIVE. (NOTE: If the Hypervisor/Ultravisor is unable to target interrupts directly to Secure VM, use "kernel_irqchip=off" on the qemu command line). Cc: kvm-...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Michael Ellerman Cc: Thiago Jung Bauermann Cc: Michael Anderson Cc: Sukadev Bhattiprolu Cc: Alexey Kardashevskiy Cc: Paul Mackerras Cc: Greg Kurz Cc: Cedric Le Goater Cc: David Gibson Signed-off-by: Ram Pai --- arch/powerpc/sysdev/xive/spapr.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c index 55dc61c..608b52f 100644 --- a/arch/powerpc/sysdev/xive/spapr.c +++ b/arch/powerpc/sysdev/xive/spapr.c @@ -26,6 +26,8 @@ #include #include #include +#include +#include #include "xive-internal.h" @@ -501,6 +503,9 @@ static int xive_spapr_configure_queue(u32 target, struct xive_q *q, u8 prio, rc = -EIO; } else { q->qpage = qpage; + if (is_secure_guest()) + uv_share_page(PHYS_PFN(qpage_phys), + 1 << xive_alloc_order(order)); } fail: return rc; @@ -534,6 +539,8 @@ static void xive_spapr_cleanup_queue(unsigned int cpu, struct xive_cpu *xc, hw_cpu, prio); alloc_order = xive_alloc_order(xive_queue_shift); + if (is_secure_guest()) + uv_unshare_page(PHYS_PFN(__pa(q->qpage)), 1 << alloc_order); free_pages((unsigned long)q->qpage, alloc_order); q->qpage = NULL; } -- 1.8.3.1
Re: [PATCH 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
On Wed, 11 Mar 2020, Srikar Dronamraju wrote: > Currently Linux kernel with CONFIG_NUMA on a system with multiple > possible nodes, marks node 0 as online at boot. However in practice, > there are systems which have node 0 as memoryless and cpuless. Would it not be better and simpler to require that node 0 always has memory (and processors)? A mininum operational set? We can dynamically number the nodes right? So just make sure that the firmware properly creates memory on node 0?