Re: [PATCH v2 3/3] ASoC: fsl_easrc: Add EASRC ASoC CPU DAI and platform drivers
On Mon, Feb 24, 2020 at 08:53:25AM +, S.j. Wang wrote:
> Hi
>
> > >
> > > Signed-off-by: Shengjiu Wang
> > > ---
> > > sound/soc/fsl/Kconfig | 10 +
> > > sound/soc/fsl/Makefile | 2 +
> > > sound/soc/fsl/fsl_asrc_common.h | 1 +
> > > sound/soc/fsl/fsl_easrc.c | 2265 +++
> > > sound/soc/fsl/fsl_easrc.h | 668 +
> > > sound/soc/fsl/fsl_easrc_dma.c | 440 ++
> >
> > I see a 90% similarity between fsl_asrc_dma and fsl_easrc_dma files.
> > Would it be possible to reuse the existing code? Could share structures from
> > my point of view, just like it reuses "enum asrc_pair_index", I know
> > differentiating "pair" and "context" is a big point here though.
> >
> > A possible quick solution for that, off the top of my head, could be:
> >
> > 1) in fsl_asrc_common.h
> >
> > struct fsl_asrc {
> >
> > };
> >
> > struct fsl_asrc_pair {
> >
> > };
> >
> > 2) in fsl_easrc.h
> >
> > /* Renaming shared structures */
> > #define fsl_easrc fsl_asrc
> > #define fsl_easrc_context fsl_asrc_pair
> >
> > Maybe a good idea to see if others have some opinion too.
> >
>
> We need to modify fsl_asrc and fsl_asrc_pair so that they can
> be used by both drivers; also we need to put the specific
> definitions for each module into the same struct, right?

Yea. A merged structure, if that doesn't look that bad. I see most
of the fields in struct fsl_asrc are being reused in fsl_easrc.

> >
> > > +static const struct regmap_config fsl_easrc_regmap_config = {
> > > +	.readable_reg = fsl_easrc_readable_reg,
> > > +	.volatile_reg = fsl_easrc_volatile_reg,
> > > +	.writeable_reg = fsl_easrc_writeable_reg,
> >
> > Can we use regmap_range and regmap_access_table?
> >
>
> Can the regmap_range support discontinuous registers? The
> reg_stride = 4.

I think it does. Giving an example here:
https://github.com/torvalds/linux/blob/master/drivers/mfd/da9063-i2c.c
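
To illustrate the question about discontinuous registers, here is a minimal
userspace sketch of the idea behind a range table: a list of inclusive
[first, last] windows that is searched instead of a hand-written switch.
It is only a model. Every demo_* name, the register offsets and the way the
stride check is folded into the lookup are assumptions made for this
example; it is not the regmap API and not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; all demo_* names and offsets are hypothetical. */
struct demo_reg_range {
	unsigned int first;	/* first register offset in the window */
	unsigned int last;	/* last register offset, inclusive */
};

/* Two discontinuous windows of 32-bit registers (made-up offsets). */
static const struct demo_reg_range demo_readable[] = {
	{ 0x000, 0x01c },
	{ 0x100, 0x10c },
};

#define DEMO_REG_STRIDE 4u

/* A register is accepted if it is stride-aligned and inside a window. */
static bool demo_reg_in_table(const struct demo_reg_range *tbl, size_t n,
			      unsigned int reg)
{
	size_t i;

	assert(tbl != NULL);

	if (reg % DEMO_REG_STRIDE)
		return false;

	for (i = 0; i < n; i++)
		if (reg >= tbl[i].first && reg <= tbl[i].last)
			return true;

	return false;
}

/* Table-driven replacement for a per-register "readable" callback. */
static bool demo_readable_reg(unsigned int reg)
{
	return demo_reg_in_table(demo_readable,
				 sizeof(demo_readable) / sizeof(demo_readable[0]),
				 reg);
}
```

In this model, gaps between windows simply have no entry, so a hole in the
register map needs no special handling.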
[PATCH 9/8] powerpc: Switch 8xx MAINTAINERS entry to Christophe
It's over 10 years since the last commit from Vitaly, so I suspect
he's moved on to other things.

Christophe has been the primary contributor to 8xx in the last several
years, so anoint him as the maintainer.

Remove the dead penguinppc.org link.

Cc: Vitaly Bordug
Signed-off-by: Michael Ellerman
Acked-by: Christophe Leroy
---
 MAINTAINERS | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2e917116ef6a..0c1266afb52a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9658,8 +9658,7 @@ F:	arch/powerpc/platforms/85xx/
 F:	Documentation/devicetree/bindings/powerpc/fsl/
 
 LINUX FOR POWERPC EMBEDDED PPC8XX
-M:	Vitaly Bordug
-W:	http://www.penguinppc.org/
+M:	Christophe Leroy
 L:	linuxppc-dev@lists.ozlabs.org
 S:	Maintained
 F:	arch/powerpc/platforms/8xx/
-- 
2.21.1
Re: [PATCH] evh_bytechan: fix out of bounds accesses
On 21.02.2020 01:57, Stephen Rothwell wrote:

Hi all,

On Thu, 16 Jan 2020 11:37:14 +1100 Stephen Rothwell wrote:

On Wed, 15 Jan 2020 14:01:35 -0600 Scott Wood wrote:

On Thu, 2020-01-16 at 06:42 +1100, Stephen Rothwell wrote:

Hi Timur,

On Wed, 15 Jan 2020 07:25:45 -0600 Timur Tabi wrote:

On 1/14/20 12:31 AM, Stephen Rothwell wrote:

+/**
+ * ev_byte_channel_send - send characters to a byte stream
+ * @handle: byte stream handle
+ * @count: (input) num of chars to send, (output) num chars sent
+ * @bp: pointer to chars to send
+ *
+ * Returns 0 for success, or an error code.
+ */
+static unsigned int ev_byte_channel_send(unsigned int handle,
+	unsigned int *count, const char *bp)

Well, now you've moved this into the .c file and it is no longer
available to other callers. Anything wrong with keeping it in the .h
file?

There are currently no other callers - are there likely to be in the
future? Even if there are, is it time critical enough that it needs to
be inlined everywhere?

It's not performance critical and there aren't likely to be other
users -- just a matter of what's cleaner. FWIW I'd rather see the
original patch, that keeps the raw asm hcall stuff as simple wrappers
in one place.

And I don't mind either way :-) I just want to get rid of the warnings.

Any progress with this?

I think that the consensus was to pick up the original patch that is,
this one: https://patchwork.ozlabs.org/patch/1220186/

I've tested it too, so please feel free to add a:

Tested-by: Laurentiu Tudor

---
Best Regards, Laurentiu
Re: [PATCH v7 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability
Hi,

Is there anything else I could do in order to move the changes forward
or is something still missing from this patch set?
Could you please share your mind?

Thanks,
Alexey

On 17.02.2020 11:02, Alexey Budankov wrote:
>
> Currently access to perf_events, i915_perf and other performance
> monitoring and observability subsystems of the kernel is open only for
> a privileged process [1] with CAP_SYS_ADMIN capability enabled in the
> process effective set [2].
>
> This patch set introduces CAP_PERFMON capability designed to secure
> system performance monitoring and observability operations so that
> CAP_PERFMON would assist CAP_SYS_ADMIN capability in its governing role
> for performance monitoring and observability subsystems of the kernel.
>
> CAP_PERFMON intends to harden system security and integrity during
> performance monitoring and observability operations by decreasing attack
> surface that is available to a CAP_SYS_ADMIN privileged process [2].
> Providing the access to performance monitoring and observability
> operations under CAP_PERFMON capability singly, without the rest of
> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
> and makes the operation more secure. Thus, CAP_PERFMON implements the
> principle of least privilege for performance monitoring and
> observability operations (POSIX IEEE 1003.1e: 2.2.2.39 principle of
> least privilege: A security design principle that states that a process
> or program be granted only those privileges (e.g., capabilities)
> necessary to accomplish its legitimate function, and only for the time
> that such privileges are actually required)
>
> CAP_PERFMON intends to meet the demand to secure system performance
> monitoring and observability operations for adoption in security
> sensitive, restricted, multiuser production environments (e.g. HPC
> clusters, cloud and virtual compute environments), where root or
> CAP_SYS_ADMIN credentials are not available to mass users of a system,
> and securely unblock accessibility of system performance monitoring and
> observability operations beyond root and CAP_SYS_ADMIN use cases.
>
> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
> system performance monitoring and observability operations and balance
> amount of CAP_SYS_ADMIN credentials following the recommendations in
> the capabilities man page [2] for CAP_SYS_ADMIN: "Note: this capability
> is overloaded; see Notes to kernel developers, below." For backward
> compatibility reasons access to system performance monitoring and
> observability subsystems of the kernel remains open for CAP_SYS_ADMIN
> privileged processes but CAP_SYS_ADMIN capability usage for secure
> system performance monitoring and observability operations is
> discouraged with respect to the designed CAP_PERFMON capability.
>
> Possible alternative solution to this system security hardening,
> capabilities balancing task of making performance monitoring and
> observability operations more secure and accessible could be to use
> the existing CAP_SYS_PTRACE capability to govern system performance
> monitoring and observability subsystems. However CAP_SYS_PTRACE
> capability still provides users with more credentials than are
> required for secure performance monitoring and observability
> operations and this excess is avoided by the designed CAP_PERFMON.
>
> Although software running under CAP_PERFMON cannot ensure avoidance of
> related hardware issues, the software can still mitigate those issues
> following the official hardware issues mitigation procedure [3]. The
> bugs in the software itself can be fixed following the standard kernel
> development process [4] to maintain and harden security of system
> performance monitoring and observability operations. Finally, the patch
> set is shaped in the way that simplifies backtracking procedure of
> possible induced issues [5] as much as possible.
>
> The patch set is for tip perf/core repository:
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip perf/core
> sha1: fdb64822443ec9fb8c3a74b598a74790ae8d2e22
>
> ---
> Changes in v7:
> - updated and extended kernel.rst and perf-security.rst documentation
> files with the information about CAP_PERFMON capability and its use cases
> - documented the case of double audit logging of CAP_PERFMON and CAP_SYS_ADMIN
> capabilities on a SELinux enabled system
> Changes in v6:
> - avoided noaudit checks in perfmon_capable() to explicitly advertise
> CAP_PERFMON usage thru audit logs to secure system performance
> monitoring and observability
> Changes in v5:
> - renamed CAP_SYS_PERFMON to CAP_PERFMON
> - extended perfmon_capable() with noaudit checks
> Changes in v4:
> - converted perfmon_capable() into an inline function
> - made perf_events kprobes, uprobes, hw breakpoints and namespaces data
> available to CAP_SYS_PERFMON privileged processes
> - applied perfmon_capable() to drivers/perf and drivers/oprofile
> - extended __cmd_ftrace
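
The access rule described in the cover letter (a dedicated capability is
enough, while CAP_SYS_ADMIN keeps working for backward compatibility) can
be modelled in a few lines of userspace C. This is a sketch under stated
assumptions: the demo_* names, the bit positions and the credential struct
are made up for illustration and are not the kernel's definitions, and
audit logging is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions for this sketch; not the kernel's values. */
#define DEMO_CAP_SYS_ADMIN	(UINT64_C(1) << 0)
#define DEMO_CAP_PERFMON	(UINT64_C(1) << 1)
#define DEMO_CAP_SYS_PTRACE	(UINT64_C(1) << 2)

/* Userspace stand-in for a process's effective capability set. */
struct demo_cred {
	uint64_t effective;
};

static bool demo_capable(const struct demo_cred *cred, uint64_t cap)
{
	assert(cred != NULL);
	return (cred->effective & cap) != 0;
}

/*
 * Least privilege: the narrow capability is sufficient. The broad one is
 * still accepted so that existing privileged users keep working.
 */
static bool demo_perfmon_capable(const struct demo_cred *cred)
{
	return demo_capable(cred, DEMO_CAP_PERFMON) ||
	       demo_capable(cred, DEMO_CAP_SYS_ADMIN);
}
```

A process holding only the model's ptrace bit is refused, which mirrors the
argument above that CAP_SYS_PTRACE is not used as the governing capability.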
Re: [PATCH v3 03/27] powerpc: Map & release OpenCAPI LPC memory
On 21/02/2020 at 04:26, Alastair D'Silva wrote:
From: Alastair D'Silva

This patch adds platform support to map & release LPC memory.

Signed-off-by: Alastair D'Silva
---
 arch/powerpc/include/asm/pnv-ocxl.h | 4 +++
 arch/powerpc/platforms/powernv/ocxl.c | 43 +++
 2 files changed, 47 insertions(+)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index 7de82647e761..0b2a6707e555 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -32,5 +32,9 @@ extern int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
 extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
 extern void pnv_ocxl_free_xive_irq(u32 irq);
 
+#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size);
+void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev);
+#endif

This breaks the compilation of the ocxl driver if CONFIG_MEMORY_HOTPLUG=n

Those functions still make sense even without memory hotplug, for
example in the context of the implementation you had to access opencapi
LPC memory through mmap(). The #ifdef is really needed only around the
check_hotplug_memory_addressable() call.

  Fred

 #endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index 8c65aacda9c8..f2edbcc67361 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -475,6 +475,49 @@ void pnv_ocxl_spa_release(void *platform_data)
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
 
+#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	u32 bdfn = pci_dev_id(pdev);
+	__be64 base_addr_be64;
+	u64 base_addr;
+	int rc;
+
+	rc = opal_npu_mem_alloc(phb->opal_id, bdfn, size, &base_addr_be64);
+	if (rc) {
+		dev_warn(&pdev->dev,
+			 "OPAL could not allocate LPC memory, rc=%d\n", rc);
+		return 0;
+	}
+
+	base_addr = be64_to_cpu(base_addr_be64);
+
+	rc = check_hotplug_memory_addressable(base_addr >> PAGE_SHIFT,
+					      size >> PAGE_SHIFT);
+	if (rc)
+		return 0;
+
+	return base_addr;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_setup);
+
+void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev)
+{
+	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	u32 bdfn = pci_dev_id(pdev);
+	int rc;
+
+	rc = opal_npu_mem_release(phb->opal_id, bdfn);
+	if (rc)
+		dev_warn(&pdev->dev,
+			 "OPAL reported rc=%d when releasing LPC memory\n", rc);
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_release);
+#endif
+
 int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
 {
 	struct spa_data *data = (struct spa_data *) platform_data;
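
The review comment in this mail (keep the functions unconditional and guard
only the addressability check) might look roughly like the following
userspace model. It is a sketch only: DEMO_MEMORY_HOTPLUG, DEMO_PAGE_SHIFT
and the demo_* functions are hypothetical stand-ins, and the wrap-around
test is an invented placeholder for whatever the real check does.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 16	/* assumed page size for this model only */

#ifdef DEMO_MEMORY_HOTPLUG
/* Hypothetical stand-in for the hotplug addressability check. */
static int demo_check_addressable(uint64_t start_pfn, uint64_t nr_pages)
{
	/* Reject ranges whose end would wrap around. */
	return (start_pfn + nr_pages < start_pfn) ? -1 : 0;
}
#endif

/* Always built; returns the base address, or 0 on failure. */
static uint64_t demo_lpc_setup(uint64_t base_addr, uint64_t size)
{
#ifdef DEMO_MEMORY_HOTPLUG
	if (demo_check_addressable(base_addr >> DEMO_PAGE_SHIFT,
				   size >> DEMO_PAGE_SHIFT))
		return 0;
#else
	(void)size;	/* only the guarded check looks at the size */
#endif
	return base_addr;
}

/* Usage: this holds whether or not DEMO_MEMORY_HOTPLUG is defined. */
static void demo_usage(void)
{
	assert(demo_lpc_setup(0x10000, 0x20000) == 0x10000);
}
```

With the guard placed inside the function body, callers compile in both
configurations and no #ifdef is needed around the declarations.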
Re: [PATCH v2 4/5] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU
Gautham R Shenoy wrote: On Fri, Feb 21, 2020 at 10:50:12AM -0600, Nathan Lynch wrote: "Gautham R. Shenoy" writes: > diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c > index 80a676d..5b4b450 100644 > --- a/arch/powerpc/kernel/sysfs.c > +++ b/arch/powerpc/kernel/sysfs.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > #include > > #include "cacheinfo.h" > @@ -733,6 +734,42 @@ static void create_svm_file(void) > } > #endif /* CONFIG_PPC_SVM */ > > +static void read_idle_purr(void *val) > +{ > + u64 *ret = (u64 *)val; No cast from void* needed. Will fix this. Thanks. > + > + *ret = read_this_idle_purr(); > +} > + > +static ssize_t idle_purr_show(struct device *dev, > +struct device_attribute *attr, char *buf) > +{ > + struct cpu *cpu = container_of(dev, struct cpu, dev); > + u64 val; > + > + smp_call_function_single(cpu->dev.id, read_idle_purr, &val, 1); > + return sprintf(buf, "%llx\n", val); > +} > +static DEVICE_ATTR(idle_purr, 0400, idle_purr_show, NULL); > + > +static void read_idle_spurr(void *val) > +{ > + u64 *ret = (u64 *)val; > + > + *ret = read_this_idle_spurr(); > +} > + > +static ssize_t idle_spurr_show(struct device *dev, > + struct device_attribute *attr, char *buf) > +{ > + struct cpu *cpu = container_of(dev, struct cpu, dev); > + u64 val; > + > + smp_call_function_single(cpu->dev.id, read_idle_spurr, &val, 1); > + return sprintf(buf, "%llx\n", val); > +} > +static DEVICE_ATTR(idle_spurr, 0400, idle_spurr_show, NULL); It's regrettable that we have to wake up potentially idle CPUs in order to derive correct idle statistics for them, but I suppose the main user (lparstat) of these interfaces already is causing this to happen by polling the existing per-cpu purr and spurr attributes. So now lparstat will incur at minimum four syscalls and four IPIs per CPU per polling interval -- one for each of purr, spurr, idle_purr and idle_spurr. Correct? 
Yes, it is unfortunate that we will end up making four syscalls and
generating IPI noise, and this is something that I discussed with
Naveen and Kamalesh. We have the following two constraints:

1) The required PURR and SPURR values are per-cpu. Hence putting
   them in lparcfg is not an option.

2) sysfs semantics encourages a single value per key, the key being
   the sysfs-file. Something like the following would have made far
   more sense.

   cat /sys/devices/system/cpu/cpuX/purr_spurr_accounting
   purr:A
   idle_purr:B
   spurr:C
   idle_spurr:D

   There are some sysfs files which allow something like this.
   Eg: /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state

Thoughts on any other alternatives?

Umm... procfs? /me ducks

At some point it's going to make sense to batch sampling of remote
CPUs' SPRs.

How did you mean this? It looks like we first need to provide a
separate user interface, since with the existing sysfs interface
providing separate files, I am not sure if we can batch such reads.

- Naveen
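
On the smaller review remark earlier in this thread ("No cast from void*
needed"): in C a void pointer converts implicitly to any object pointer
type, so the callback can assign it directly. The following is a userspace
sketch; demo_read_counter() and demo_call() are hypothetical stand-ins for
the register read and the cross-CPU call, and the returned value is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for reading a per-CPU counter. */
static uint64_t demo_read_counter(void)
{
	return UINT64_C(0x1234);
}

/* Callback with the generic "void *info" signature. */
static void demo_read_cb(void *val)
{
	uint64_t *ret = val;	/* implicit conversion, no cast required in C */

	*ret = demo_read_counter();
}

/* Stand-in for "run fn(info) on some CPU and wait for it to finish". */
static void demo_call(void (*fn)(void *), void *info)
{
	fn(info);
}

/* Mirrors the shape of a show() helper: call out, then report the value. */
static uint64_t demo_show(void)
{
	uint64_t val = 0;

	demo_call(demo_read_cb, &val);
	assert(val != 0);	/* the stub above never returns zero */
	return val;
}
```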
Re: [linux-next/mainline][bisected 3acac06][ppc] Oops when unloading mpt3sas driver
On Tue, Feb 25, 2020 at 11:51 AM Abdul Haleem wrote:
>
> On Fri, 2020-01-17 at 18:21 +0530, Abdul Haleem wrote:
> > On Thu, 2020-01-16 at 09:44 -0800, Christoph Hellwig wrote:
> > > Hi Abdul,
> > >
> > > I think the problem is that mpt3sas has some convoluted logic to do
> > > some DMA allocations with a 32-bit coherent mask, and then switches
> > > to a 63 or 64 bit mask, which is not supported by the DMA API.
> > >
> > > Can you try the patch below?
> >
> > Thank you Christoph, with the given patch applied the bug is not seen.
> >
> > rmmod of mpt3sas driver is successful, no kernel Oops
> >
> > Reported-and-tested-by: Abdul Haleem
>
> Hi Christoph,
>
> I see the patch is under discussion, will this be merged upstream any
> time soon? Boot is broken on our machines without your patch.
>

Hi Abdul,

We have posted a new set of patches to fix this issue. This patch set
won't change the DMA mask on the fly and also won't hardcode the DMA
mask to 32 bit.

[PATCH 0/5] mpt3sas: Fix changing coherent mask after allocation.

The patch set contains the patches below. Please review and try it.

Suganath Prabu S (5):
  mpt3sas: Don't change the dma coherent mask after allocations
  mpt3sas: Rename function name is_MSB_are_same
  mpt3sas: Code Refactoring.
  mpt3sas: Handle RDPQ DMA allocation in same 4g region
  mpt3sas: Update version to 33.101.00.00

Regards,
Sreekanth

> --
> Regards
>
> Abdul Haleem
> IBM Linux Technology Centre
>
>
>
Re: [PATCH v3 04/27] ocxl: Remove unnecessary externs
Le 21/02/2020 à 04:26, Alastair D'Silva a écrit : From: Alastair D'Silva Function declarations don't need externs, remove the existing ones so they are consistent with newer code Signed-off-by: Alastair D'Silva --- Thanks for the cleanup! Acked-by: Frederic Barrat arch/powerpc/include/asm/pnv-ocxl.h | 32 ++--- include/misc/ocxl.h | 6 +++--- 2 files changed, 18 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h index 0b2a6707e555..b23c99bc0c84 100644 --- a/arch/powerpc/include/asm/pnv-ocxl.h +++ b/arch/powerpc/include/asm/pnv-ocxl.h @@ -9,29 +9,27 @@ #define PNV_OCXL_TL_BITS_PER_RATE 4 #define PNV_OCXL_TL_RATE_BUF_SIZE ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8) -extern int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled, - u16 *supported); -extern int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count); +int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled, u16 *supported); +int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count); -extern int pnv_ocxl_get_tl_cap(struct pci_dev *dev, long *cap, +int pnv_ocxl_get_tl_cap(struct pci_dev *dev, long *cap, char *rate_buf, int rate_buf_size); -extern int pnv_ocxl_set_tl_conf(struct pci_dev *dev, long cap, +int pnv_ocxl_set_tl_conf(struct pci_dev *dev, long cap, uint64_t rate_buf_phys, int rate_buf_size); -extern int pnv_ocxl_get_xsl_irq(struct pci_dev *dev, int *hwirq); -extern void pnv_ocxl_unmap_xsl_regs(void __iomem *dsisr, void __iomem *dar, - void __iomem *tfc, void __iomem *pe_handle); -extern int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void __iomem **dsisr, - void __iomem **dar, void __iomem **tfc, - void __iomem **pe_handle); +int pnv_ocxl_get_xsl_irq(struct pci_dev *dev, int *hwirq); +void pnv_ocxl_unmap_xsl_regs(void __iomem *dsisr, void __iomem *dar, +void __iomem *tfc, void __iomem *pe_handle); +int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void __iomem **dsisr, + void __iomem 
**dar, void __iomem **tfc, + void __iomem **pe_handle); -extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask, - void **platform_data); -extern void pnv_ocxl_spa_release(void *platform_data); -extern int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle); +int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask, void **platform_data); +void pnv_ocxl_spa_release(void *platform_data); +int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle); -extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr); -extern void pnv_ocxl_free_xive_irq(u32 irq); +int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr); +void pnv_ocxl_free_xive_irq(u32 irq); #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size); void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev); diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h index 06dd5839e438..0a762e387418 100644 --- a/include/misc/ocxl.h +++ b/include/misc/ocxl.h @@ -173,7 +173,7 @@ int ocxl_context_detach(struct ocxl_context *ctx); * * Returns 0 on success, negative on failure */ -extern int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id); +int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id); /** * Frees an IRQ associated with an AFU context @@ -182,7 +182,7 @@ extern int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id); * * Returns 0 on success, negative on failure */ -extern int ocxl_afu_irq_free(struct ocxl_context *ctx, int irq_id); +int ocxl_afu_irq_free(struct ocxl_context *ctx, int irq_id); /** * Gets the address of the trigger page for an IRQ @@ -193,7 +193,7 @@ extern int ocxl_afu_irq_free(struct ocxl_context *ctx, int irq_id); * * returns the trigger page address, or 0 if the IRQ is not valid */ -extern u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, int irq_id); +u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, int irq_id); /** * Provide a callback to be 
called when an IRQ is triggered
Re: [PATCH] crypto: Replace zero-length array with flexible-array member
On 2/24/2020 6:18 PM, Gustavo A. R. Silva wrote:
> The current codebase makes use of the zero-length array language
> extension to the C90 standard, but the preferred mechanism to declare
> variable-length types such as these ones is a flexible array member[1][2],
> introduced in C99:
>
> struct foo {
>         int stuff;
>         struct boo array[];
> };
>
> By making use of the mechanism above, we will get a compiler warning
> in case the flexible array does not occur last in the structure, which
> will help us prevent some kind of undefined behavior bugs from being
> inadvertently introduced[3] to the codebase from now on.
>
> Also, notice that, dynamic memory allocations won't be affected by
> this change:
>
> "Flexible array members have incomplete type, and so the sizeof operator
> may not be applied. As a quirk of the original implementation of
> zero-length arrays, sizeof evaluates to zero."[1]
>
> This issue was found with the help of Coccinelle.
>
> [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
> [2] https://github.com/KSPP/linux/issues/21
> [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
>
> Signed-off-by: Gustavo A. R. Silva

Reviewed-by: Horia Geantă

for caam driver:

> drivers/crypto/caam/caamalg.c | 2 +-
> drivers/crypto/caam/caamalg_qi.c | 4 ++--
> drivers/crypto/caam/caamalg_qi2.h | 6 +++---
> drivers/crypto/caam/caamhash.c | 2 +-

Thanks,
Horia
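
For readers less familiar with the construct discussed in this patch, here
is a small standalone sketch of a C99 flexible array member and the usual
allocation pattern. The struct and function names are made up for the
example (they are not taken from the crypto drivers); the point is only
that the trailing array adds no elements to sizeof the struct, so the
caller adds the element storage explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_item {
	int value;
};

/* C99 flexible array member: it must be the last member of the struct. */
struct demo_foo {
	int count;
	struct demo_item array[];
};

/* Allocate the header plus room for n trailing elements. */
static struct demo_foo *demo_foo_alloc(int n)
{
	struct demo_foo *f;
	int i;

	assert(n >= 0);

	/* sizeof(*f) covers no array elements; add them explicitly. */
	f = malloc(sizeof(*f) + (size_t)n * sizeof(f->array[0]));
	if (!f)
		return NULL;

	f->count = n;
	for (i = 0; i < n; i++)
		f->array[i].value = i * 10;

	return f;
}
```

Note that sizeof is applied to one element (f->array[0]) here. Applying it
to the flexible member itself is not allowed, which is the difference from
the zero-length form that the quoted GCC documentation describes.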
Re: [PATCH] crypto: Replace zero-length array with flexible-array member
On 2/25/20 07:44, Horia Geanta wrote:
> On 2/24/2020 6:18 PM, Gustavo A. R. Silva wrote:
>> The current codebase makes use of the zero-length array language
>> extension to the C90 standard, but the preferred mechanism to declare
>> variable-length types such as these ones is a flexible array member[1][2],
>> introduced in C99:
>>
>> struct foo {
>>         int stuff;
>>         struct boo array[];
>> };
>>
>> By making use of the mechanism above, we will get a compiler warning
>> in case the flexible array does not occur last in the structure, which
>> will help us prevent some kind of undefined behavior bugs from being
>> inadvertently introduced[3] to the codebase from now on.
>>
>> Also, notice that, dynamic memory allocations won't be affected by
>> this change:
>>
>> "Flexible array members have incomplete type, and so the sizeof operator
>> may not be applied. As a quirk of the original implementation of
>> zero-length arrays, sizeof evaluates to zero."[1]
>>
>> This issue was found with the help of Coccinelle.
>>
>> [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
>> [2] https://github.com/KSPP/linux/issues/21
>> [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
>>
>> Signed-off-by: Gustavo A. R. Silva
> Reviewed-by: Horia Geantă
>

Thank you, Horia.
--
Gustavo

> for caam driver:
>
>> drivers/crypto/caam/caamalg.c | 2 +-
>> drivers/crypto/caam/caamalg_qi.c | 4 ++--
>> drivers/crypto/caam/caamalg_qi2.h | 6 +++---
>> drivers/crypto/caam/caamhash.c | 2 +-
>
> Thanks,
> Horia
>
Re: [PATCH] macintosh: therm_windtunnel: fix regression when instantiating devices
Hello!

On 2/25/20 3:12 PM, Wolfram Sang wrote:
> Adding the Debian-PPC List to reach further people maybe willing to
> test.

This might be related [1].

Adrian

> [1] https://lists.debian.org/debian-powerpc/2020/01/msg00062.html

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
[Bug 201723] [Bisected][Regression] THERM_WINDTUNNEL not working any longer in kernel 4.19.x (PowerMac G4 MDD)
https://bugzilla.kernel.org/show_bug.cgi?id=201723

Wolfram Sang (w...@the-dreams.de) changed:

           What    |Removed                     |Added
             Status|NEW                         |ASSIGNED

--- Comment #6 from Wolfram Sang (w...@the-dreams.de) ---
Patch which works for Erhard is sent out:

http://patchwork.ozlabs.org/patch/1244322/

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
[Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
https://bugzilla.kernel.org/show_bug.cgi?id=206669

            Bug ID: 206669
           Summary: Little-endian kernel crashing on POWER8 on heavy
                    big-endian PowerKVM load
           Product: Platform Specific/Hardware
           Version: 2.5
    Kernel Version: 5.4.x
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: PPC-64
          Assignee: platform_ppc...@kernel-bugs.osdl.org
          Reporter: glaub...@physik.fu-berlin.de
                CC: mator...@gmail.com
        Regression: No

Created attachment 287605
  --> https://bugzilla.kernel.org/attachment.cgi?id=287605&action=edit
Backtrace of host system crashing with little-endian kernel

We have an IBM POWER server (8247-42L) running Linux kernel 5.4.13 on Debian
unstable hosting a big-endian ppc64 virtual machine running the same kernel in
big-endian mode.

When building OpenJDK-11 on the big-endian VM, the testsuite crashes the *host*
system which is little-endian with the following kernel backtrace. The problem
reproduces both with kernel 4.19.98 as well as 5.4.13, both guest and host
running 5.4.x.

Backtrace attached.
[Bug 199471] windfarm_pm72 no longer gets automatically loaded when CONFIG_I2C_POWERMAC=y is set (regression)
https://bugzilla.kernel.org/show_bug.cgi?id=199471

Wolfram Sang (w...@the-dreams.de) changed:

           What    |Removed                     |Added
                 CC|                            |w...@the-dreams.de

--- Comment #8 from Wolfram Sang (w...@the-dreams.de) ---
"This has been quite nice since 4.?.x up to 4.16.x as you only need
CONFIG_I2C_POWERMAC=y which selects the proper windfarm_pmXX at boot time."

I can't find that in the code. Are you sure i2c-powermac requested that module?
Re: [PATCH v3 06/27] ocxl: Tally up the LPC memory on a link & allow it to be mapped
Le 21/02/2020 à 04:26, Alastair D'Silva a écrit : From: Alastair D'Silva Tally up the LPC memory on an OpenCAPI link & allow it to be mapped Signed-off-by: Alastair D'Silva --- drivers/misc/ocxl/core.c | 10 ++ drivers/misc/ocxl/link.c | 53 +++ drivers/misc/ocxl/ocxl_internal.h | 33 +++ 3 files changed, 96 insertions(+) diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c index b7a09b21ab36..2531c6cf19a0 100644 --- a/drivers/misc/ocxl/core.c +++ b/drivers/misc/ocxl/core.c @@ -230,8 +230,18 @@ static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev *dev) if (rc) goto err_free_pasid; + if (afu->config.lpc_mem_size || afu->config.special_purpose_mem_size) { + rc = ocxl_link_add_lpc_mem(afu->fn->link, afu->config.lpc_mem_offset, + afu->config.lpc_mem_size + + afu->config.special_purpose_mem_size); + if (rc) + goto err_free_mmio; + } + return 0; +err_free_mmio: + unmap_mmio_areas(afu); err_free_pasid: reclaim_afu_pasid(afu); err_free_actag: diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index 58d111afd9f6..1e039cc5ebe5 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -84,6 +84,11 @@ struct ocxl_link { int dev; atomic_t irq_available; struct spa *spa; + struct mutex lpc_mem_lock; /* protects lpc_mem & lpc_mem_sz */ + u64 lpc_mem_sz; /* Total amount of LPC memory presented on the link */ + u64 lpc_mem; + int lpc_consumers; + void *platform_data; }; static struct list_head links_list = LIST_HEAD_INIT(links_list); @@ -396,6 +401,8 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l if (rc) goto err_spa; + mutex_init(&link->lpc_mem_lock); + /* platform specific hook */ rc = pnv_ocxl_spa_setup(dev, link->spa->spa_mem, PE_mask, &link->platform_data); @@ -711,3 +718,49 @@ void ocxl_link_free_irq(void *link_handle, int hw_irq) atomic_inc(&link->irq_available); } EXPORT_SYMBOL_GPL(ocxl_link_free_irq); + +int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size) +{ + 
struct ocxl_link *link = (struct ocxl_link *) link_handle; + + // Check for overflow + if (offset > (offset + size)) + return -EINVAL; + + mutex_lock(&link->lpc_mem_lock); + link->lpc_mem_sz = max(link->lpc_mem_sz, offset + size); + + mutex_unlock(&link->lpc_mem_lock); + + return 0; +} + +u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev) +{ + struct ocxl_link *link = (struct ocxl_link *) link_handle; + + mutex_lock(&link->lpc_mem_lock); + + if(!link->lpc_mem) + link->lpc_mem = pnv_ocxl_platform_lpc_setup(pdev, link->lpc_mem_sz); + + if(link->lpc_mem) + link->lpc_consumers++; + mutex_unlock(&link->lpc_mem_lock); + + return link->lpc_mem; +} + +void ocxl_link_lpc_release(void *link_handle, struct pci_dev *pdev) +{ + struct ocxl_link *link = (struct ocxl_link *) link_handle; + + mutex_lock(&link->lpc_mem_lock); + WARN_ON(--link->lpc_consumers < 0); Here, we always decrement the lpc_consumers count. However, it was only incremented if the mapping was setup correctly in opal. We could arguably claim that ocxl_link_lpc_release() should only be called if ocxl_link_lpc_map() succeeded, but it would make error path handling easier if we only decrement the lpc_consumers count if link->lpc_mem is set. So that we can just call ocxl_link_lpc_release() in error paths without having to worry about triggering the WARN_ON message. 
Fred + if (link->lpc_consumers == 0) { + pnv_ocxl_platform_lpc_release(pdev); + link->lpc_mem = 0; + } + + mutex_unlock(&link->lpc_mem_lock); +} diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h index 198e4e4bc51d..d0c8c4838f42 100644 --- a/drivers/misc/ocxl/ocxl_internal.h +++ b/drivers/misc/ocxl/ocxl_internal.h @@ -142,4 +142,37 @@ int ocxl_irq_offset_to_id(struct ocxl_context *ctx, u64 offset); u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id); void ocxl_afu_irq_free_all(struct ocxl_context *ctx); +/** + * ocxl_link_add_lpc_mem() - Increment the amount of memory required by an OpenCAPI link + * + * @link_handle: The OpenCAPI link handle + * @offset: The offset of the memory to add + * @size: The amount of memory to increment by + * + * Returns 0 on success, negative on overflow + */ +int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size); + +/** + * ocxl_link_lpc_map() - Map the LPC memory for an OpenCAPI device + * Since LPC memory
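
The review suggestion in this mail (only drop the consumer count when
link->lpc_mem is set, so that release is safe to call from error paths)
can be modelled in userspace as below. This is a sketch under assumptions:
locking is omitted, the platform setup is reduced to a parameter, and all
demo_* names are hypothetical rather than taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model; the mutex is omitted and all names are hypothetical. */
struct demo_link {
	uint64_t lpc_mem;	/* 0 means "not mapped" */
	uint64_t lpc_mem_sz;	/* furthest extent registered on the link */
	int lpc_consumers;
};

/* Record the furthest extent seen; reject offset + size wrap-around. */
static int demo_add_lpc_mem(struct demo_link *link, uint64_t offset,
			    uint64_t size)
{
	if (offset > offset + size)
		return -1;

	if (offset + size > link->lpc_mem_sz)
		link->lpc_mem_sz = offset + size;

	return 0;
}

/* platform_base stands in for the platform setup result; 0 is failure. */
static uint64_t demo_lpc_map(struct demo_link *link, uint64_t platform_base)
{
	if (!link->lpc_mem)
		link->lpc_mem = platform_base;

	if (link->lpc_mem)
		link->lpc_consumers++;

	return link->lpc_mem;
}

/* Returns true when the last consumer went away; safe after a failed map. */
static bool demo_lpc_release(struct demo_link *link)
{
	if (!link->lpc_mem)
		return false;	/* nothing was mapped, leave the count alone */

	assert(link->lpc_consumers > 0);
	if (--link->lpc_consumers == 0) {
		link->lpc_mem = 0;
		return true;
	}

	return false;
}
```

In this model a release that follows a failed map is a no-op, so an error
path can call it unconditionally without tripping the consistency check.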
Re: [PATCH v3 07/27] ocxl: Add functions to map/unmap LPC memory
Le 21/02/2020 à 04:27, Alastair D'Silva a écrit : From: Alastair D'Silva Add functions to map/unmap LPC memory Signed-off-by: Alastair D'Silva --- It looks ok to me. Acked-by: Frederic Barrat drivers/misc/ocxl/core.c | 51 +++ drivers/misc/ocxl/ocxl_internal.h | 3 ++ include/misc/ocxl.h | 21 + 3 files changed, 75 insertions(+) diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c index 2531c6cf19a0..75ff14e3882a 100644 --- a/drivers/misc/ocxl/core.c +++ b/drivers/misc/ocxl/core.c @@ -210,6 +210,56 @@ static void unmap_mmio_areas(struct ocxl_afu *afu) release_fn_bar(afu->fn, afu->config.global_mmio_bar); } +int ocxl_afu_map_lpc_mem(struct ocxl_afu *afu) +{ + struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent); + + if ((afu->config.lpc_mem_size + afu->config.special_purpose_mem_size) == 0) + return 0; + + afu->lpc_base_addr = ocxl_link_lpc_map(afu->fn->link, dev); + if (afu->lpc_base_addr == 0) + return -EINVAL; + + if (afu->config.lpc_mem_size > 0) { + afu->lpc_res.start = afu->lpc_base_addr + afu->config.lpc_mem_offset; + afu->lpc_res.end = afu->lpc_res.start + afu->config.lpc_mem_size - 1; + } + + if (afu->config.special_purpose_mem_size > 0) { + afu->special_purpose_res.start = afu->lpc_base_addr + + afu->config.special_purpose_mem_offset; + afu->special_purpose_res.end = afu->special_purpose_res.start + + afu->config.special_purpose_mem_size - 1; + } + + return 0; +} +EXPORT_SYMBOL_GPL(ocxl_afu_map_lpc_mem); + +struct resource *ocxl_afu_lpc_mem(struct ocxl_afu *afu) +{ + return &afu->lpc_res; +} +EXPORT_SYMBOL_GPL(ocxl_afu_lpc_mem); + +static void unmap_lpc_mem(struct ocxl_afu *afu) +{ + struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent); + + if (afu->lpc_res.start || afu->special_purpose_res.start) { + void *link = afu->fn->link; + + // only release the link when the the last consumer calls release + ocxl_link_lpc_release(link, dev); + + afu->lpc_res.start = 0; + afu->lpc_res.end = 0; + afu->special_purpose_res.start = 0; + 
afu->special_purpose_res.end = 0; + } +} + static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev *dev) { int rc; @@ -251,6 +301,7 @@ static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev *dev) static void deconfigure_afu(struct ocxl_afu *afu) { + unmap_lpc_mem(afu); unmap_mmio_areas(afu); reclaim_afu_pasid(afu); reclaim_afu_actag(afu); diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h index d0c8c4838f42..ce0cac1da416 100644 --- a/drivers/misc/ocxl/ocxl_internal.h +++ b/drivers/misc/ocxl/ocxl_internal.h @@ -52,6 +52,9 @@ struct ocxl_afu { void __iomem *global_mmio_ptr; u64 pp_mmio_start; void *private; + u64 lpc_base_addr; /* Covers both LPC & special purpose memory */ + struct resource lpc_res; + struct resource special_purpose_res; }; enum ocxl_context_status { diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h index 357ef1aadbc0..d8b0b4d46bfb 100644 --- a/include/misc/ocxl.h +++ b/include/misc/ocxl.h @@ -203,6 +203,27 @@ int ocxl_irq_set_handler(struct ocxl_context *ctx, int irq_id, // AFU Metadata +/** + * ocxl_afu_map_lpc_mem() - Map the LPC system & special purpose memory for an AFU + * Do not call this during device discovery, as there may be multiple + * devices on a link, and the memory is mapped for the whole link, not + * just one device. It should only be called after all devices have + * registered their memory on the link. + * + * @afu: The AFU that has the LPC memory to map + * + * Returns 0 on success, negative on failure + */ +int ocxl_afu_map_lpc_mem(struct ocxl_afu *afu); + +/** + * ocxl_afu_lpc_mem() - Get the physical address range of LPC memory for an AFU + * @afu: The AFU associated with the LPC memory + * + * Returns a pointer to the resource struct for the physical address range + */ +struct resource *ocxl_afu_lpc_mem(struct ocxl_afu *afu); + /** * ocxl_afu_config() - Get a pointer to the config for an AFU * @afu: a pointer to the AFU to get the config for
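For anyone following the address arithmetic in ocxl_afu_map_lpc_mem(), here is a minimal userspace sketch of how each range is derived from the link base address. It is not the driver code: struct mem_range and fill_range() are hypothetical names made up for illustration, and only the "start = base + offset, end = start + size - 1" computation and the size > 0 guard are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the start/end pair of the kernel's struct resource. */
struct mem_range {
	uint64_t start;
	uint64_t end;	/* inclusive, as in the patch */
};

/*
 * Fill *r from a link base address plus an offset and a size.
 * Returns 1 if a range was written, 0 if size is zero (range left
 * untouched), mirroring the "if (size > 0)" guards in the patch.
 */
static int fill_range(struct mem_range *r, uint64_t base,
		      uint64_t offset, uint64_t size)
{
	if (size == 0)
		return 0;

	r->start = base + offset;
	r->end = r->start + size - 1;
	assert(r->end >= r->start);	/* this sketch does not handle wrap-around */
	return 1;
}
```

With a base of 0x1000, an offset of 0x100 and a size of 0x10, the range comes out as 0x1100 to 0x110f inclusive. A zero-sized region leaves the range untouched, which appears to be what lets unmap_lpc_mem() use a non-zero start as its "was mapped" test.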
Re: [PATCH v3 08/27] ocxl: Emit a log message showing how much LPC memory was detected
On 21/02/2020 at 04:27, Alastair D'Silva wrote: From: Alastair D'Silva This patch emits a message showing how much LPC memory & special purpose memory was detected on an OCXL device. Signed-off-by: Alastair D'Silva --- Acked-by: Frederic Barrat drivers/misc/ocxl/config.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c index a62e3d7db2bf..701ae6216abf 100644 --- a/drivers/misc/ocxl/config.c +++ b/drivers/misc/ocxl/config.c @@ -568,6 +568,10 @@ static int read_afu_lpc_memory_info(struct pci_dev *dev, afu->special_purpose_mem_size = total_mem_size - lpc_mem_size; } + + dev_info(&dev->dev, "Probed LPC memory of %#llx bytes and special purpose memory of %#llx bytes\n", + afu->lpc_mem_size, afu->special_purpose_mem_size); + return 0; }
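As a side note on the "%#llx" conversion used in the new message, here is a small userspace sketch of what that format produces with the standard C library. format_size() is a hypothetical helper, not something from the driver, and the kernel's printk formatter is a separate implementation, so treat this only as an illustration of the flag, not of the exact dmesg output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a size with "%#llx" as userspace C does.
 * Returns the number of characters written, excluding the NUL.
 */
static int format_size(char *buf, size_t len, unsigned long long size)
{
	int n = snprintf(buf, len, "%#llx", size);

	/* sketch only: the caller is expected to pass a large enough buffer */
	assert(n >= 0 && (size_t)n < len);
	return n;
}
```

In standard C the '#' flag adds the "0x" prefix only to non-zero values, so a 1 GiB region formats as "0x40000000" while a size of zero formats as plain "0".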
[PATCH v3 00/32] powerpc/64: interrupts and syscalls series
This is a long overdue update of the series, with fixes from me, Michal and Michael. It does not include Michal's syscall compat series. Patches 1-22 are changes to the low level 64s interrupt entry assembly which have been posted before; no change except adding patch 21 and fixing patch 22 to reconcile irq state in the soft-nmi handler to avoid preempt warnings. Patches 23-26 turn the system call entry/exit code into C. A bunch of irq, preempt and TM warnings and bugs caught by selftests etc. are fixed, plus a few peripheral patches added (sstep and zeroing regs). Patches 27-29 turn the interrupt exit code into C. This had a bit more change, most significantly a change to how interrupt exit soft irq replay works. Patches 30-32 are for scv system call support. Lots of changes here to turn it into something a bit better than RFC quality. Discussion about the ABI seems to be settling and is not very controversial. Thanks, Nick Nicholas Piggin (32): powerpc/64s/exception: Introduce INT_DEFINE parameter block for code generation powerpc/64s/exception: Add GEN_COMMON macro that uses INT_DEFINE parameters powerpc/64s/exception: Add GEN_KVM macro that uses INT_DEFINE parameters powerpc/64s/exception: Expand EXC_COMMON and EXC_COMMON_ASYNC macros powerpc/64s/exception: Move all interrupt handlers to new style code gen macros powerpc/64s/exception: Remove old INT_ENTRY macro powerpc/64s/exception: Remove old INT_COMMON macro powerpc/64s/exception: Remove old INT_KVM_HANDLER powerpc/64s/exception: Add ISIDE option powerpc/64s/exception: move real->virt switch into the common handler powerpc/64s/exception: move soft-mask test to common code powerpc/64s/exception: move KVM test to common code powerpc/64s/exception: remove confusing IEARLY option powerpc/64s/exception: remove the SPR saving patch code macros powerpc/64s/exception: trim unused arguments from KVMTEST macro powerpc/64s/exception: hdecrementer avoid touching the stack powerpc/64s/exception: re-inline some handlers 
powerpc/64s/exception: Clean up SRR specifiers powerpc/64s/exception: add more comments for interrupt handlers powerpc/64s/exception: only test KVM in SRR interrupts when PR KVM is supported powerpc/64s/exception: sreset interrupts reconcile fix powerpc/64s/exception: soft nmi interrupt should not use ret_from_except powerpc/64: system call remove non-volatile GPR save optimisation powerpc/64: sstep ifdef the deprecated fast endian switch syscall powerpc/64: system call implement entry/exit logic in C powerpc/64: system call zero volatile registers when returning powerpc/64: implement soft interrupt replay in C powerpc/64s: interrupt implement exit logic in C powerpc/64s/exception: remove lite interrupt return powerpc/64: system call reconcile interrupts powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked powerpc/64s: system call support for scv/rfscv instructions Documentation/powerpc/syscall64-abi.rst | 42 +- arch/powerpc/include/asm/asm-prototypes.h | 17 +- .../powerpc/include/asm/book3s/64/kup-radix.h | 24 +- arch/powerpc/include/asm/cputime.h| 29 + arch/powerpc/include/asm/exception-64s.h | 10 +- arch/powerpc/include/asm/head-64.h|2 +- arch/powerpc/include/asm/hw_irq.h |6 +- arch/powerpc/include/asm/ppc_asm.h|2 + arch/powerpc/include/asm/processor.h |2 +- arch/powerpc/include/asm/ptrace.h |3 + arch/powerpc/include/asm/setup.h |4 +- arch/powerpc/include/asm/signal.h |3 + arch/powerpc/include/asm/switch_to.h | 11 + arch/powerpc/include/asm/time.h |4 +- arch/powerpc/kernel/Makefile |3 +- arch/powerpc/kernel/cpu_setup_power.S |2 +- arch/powerpc/kernel/cputable.c|3 +- arch/powerpc/kernel/dt_cpu_ftrs.c |1 + arch/powerpc/kernel/entry_64.S| 1017 +++- arch/powerpc/kernel/exceptions-64e.S | 287 ++- arch/powerpc/kernel/exceptions-64s.S | 2168 - arch/powerpc/kernel/irq.c | 183 +- arch/powerpc/kernel/process.c | 89 +- arch/powerpc/kernel/setup_64.c|5 +- arch/powerpc/kernel/signal.h |2 - arch/powerpc/kernel/syscall_64.c | 379 +++ 
arch/powerpc/kernel/syscalls/syscall.tbl | 22 +- arch/powerpc/kernel/systbl.S |9 +- arch/powerpc/kernel/time.c|9 - arch/powerpc/kernel/vector.S |2 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 11 - arch/powerpc/kvm/book3s_segment.S |7 - arch/powerpc/lib/sstep.c |5 +- arch/powerpc/platforms/pseries/setup.c|8 +- 34 files changed, 2769 insertions(+), 1602 deletions(-) create mode 100
[PATCH v3 01/32] powerpc/64s/exception: Introduce INT_DEFINE parameter block for code generation
The code generation macro arguments are difficult to read, and defaults can't easily be used. This introduces a block where parameters can be set for interrupt handler code generation by the subsequent macros, and adds the first generation macro for interrupt entry. One interrupt handler is converted to the new macros to demonstrate the change, the rest will be converted all at once. No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 77 ++-- 1 file changed, 73 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index ffc15f4f079d..1b942c98bc05 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -193,6 +193,61 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) mtctr reg;\ bctr +/* + * Interrupt code generation macros + */ +#define IVEC .L_IVEC_\name\() +#define IHSRR .L_IHSRR_\name\() +#define IAREA .L_IAREA_\name\() +#define IDAR .L_IDAR_\name\() +#define IDSISR .L_IDSISR_\name\() +#define ISET_RI.L_ISET_RI_\name\() +#define IEARLY .L_IEARLY_\name\() +#define IMASK .L_IMASK_\name\() +#define IKVM_REAL .L_IKVM_REAL_\name\() +#define IKVM_VIRT .L_IKVM_VIRT_\name\() + +#define INT_DEFINE_BEGIN(n)\ +.macro int_define_ ## n name + +#define INT_DEFINE_END(n) \ +.endm ; \ +int_define_ ## n n ; \ +do_define_int n + +.macro do_define_int name + .ifndef IVEC + .error "IVEC not defined" + .endif + .ifndef IHSRR + IHSRR=EXC_STD + .endif + .ifndef IAREA + IAREA=PACA_EXGEN + .endif + .ifndef IDAR + IDAR=0 + .endif + .ifndef IDSISR + IDSISR=0 + .endif + .ifndef ISET_RI + ISET_RI=1 + .endif + .ifndef IEARLY + IEARLY=0 + .endif + .ifndef IMASK + IMASK=0 + .endif + .ifndef IKVM_REAL + IKVM_REAL=0 + .endif + .ifndef IKVM_VIRT + IKVM_VIRT=0 + .endif +.endm + .macro INT_KVM_HANDLER name, vec, hsrr, area, skip TRAMP_KVM_BEGIN(\name\()_kvm) KVM_HANDLER \vec, \hsrr, \area, \skip @@ -474,7 +529,7 @@ 
END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) */ GET_SCRATCH0(r10) std r10,\area\()+EX_R13(r13) - .if \dar + .if \dar == 1 .if \hsrr mfspr r10,SPRN_HDAR .else @@ -482,7 +537,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) .endif std r10,\area\()+EX_DAR(r13) .endif - .if \dsisr + .if \dsisr == 1 .if \hsrr mfspr r10,SPRN_HDSISR .else @@ -506,6 +561,14 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) .endif .endm +.macro GEN_INT_ENTRY name, virt, ool=0 + .if ! \virt + INT_HANDLER \name, IVEC, \ool, IEARLY, \virt, IHSRR, IAREA, ISET_RI, IDAR, IDSISR, IMASK, IKVM_REAL + .else + INT_HANDLER \name, IVEC, \ool, IEARLY, \virt, IHSRR, IAREA, ISET_RI, IDAR, IDSISR, IMASK, IKVM_VIRT + .endif +.endm + /* * On entry r13 points to the paca, r9-r13 are saved in the paca, * r9 contains the saved CR, r11 and r12 contain the saved SRR0 and @@ -1143,12 +1206,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) bl unrecoverable_exception b . +INT_DEFINE_BEGIN(data_access) + IVEC=0x300 + IDAR=1 + IDSISR=1 + IKVM_REAL=1 +INT_DEFINE_END(data_access) EXC_REAL_BEGIN(data_access, 0x300, 0x80) - INT_HANDLER data_access, 0x300, ool=1, dar=1, dsisr=1, kvm=1 + GEN_INT_ENTRY data_access, virt=0, ool=1 EXC_REAL_END(data_access, 0x300, 0x80) EXC_VIRT_BEGIN(data_access, 0x4300, 0x80) - INT_HANDLER data_access, 0x300, virt=1, dar=1, dsisr=1 + GEN_INT_ENTRY data_access, virt=1 EXC_VIRT_END(data_access, 0x4300, 0x80) INT_KVM_HANDLER data_access, 0x300, EXC_STD, PACA_EXGEN, 1 EXC_COMMON_BEGIN(data_access_common) -- 2.23.0
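For readers less used to assembler-time symbols, the INT_DEFINE / do_define_int pattern above can be sketched as a C analogy: the handler sets the parameters it cares about, and a second pass fills in defaults for whatever is still unset, insisting only that IVEC was given. This is just an analogy under stated assumptions; struct int_params, UNSET and apply_defaults() are hypothetical names, only a subset of the parameters is modelled, and the real mechanism is per-handler .L_ symbols tested with .ifndef.

```c
#include <assert.h>

#define UNSET (-1)	/* hypothetical marker for "symbol not defined" */

/* Hypothetical C mirror of a few INT_DEFINE parameters. */
struct int_params {
	int ivec;
	int idar;
	int idsisr;
	int iset_ri;
	int ikvm_real;
	int ikvm_virt;
};

/* Fill in defaults the way the .ifndef branches of do_define_int do. */
static void apply_defaults(struct int_params *p)
{
	assert(p->ivec != UNSET);	/* stands in for .error "IVEC not defined" */

	if (p->idar == UNSET)
		p->idar = 0;
	if (p->idsisr == UNSET)
		p->idsisr = 0;
	if (p->iset_ri == UNSET)
		p->iset_ri = 1;
	if (p->ikvm_real == UNSET)
		p->ikvm_real = 0;
	if (p->ikvm_virt == UNSET)
		p->ikvm_virt = 0;
}
```

Under this analogy the data_access block corresponds to setting ivec = 0x300, idar = 1, idsisr = 1 and ikvm_real = 1 before calling apply_defaults(), which then leaves iset_ri at 1 and ikvm_virt at 0.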
[PATCH v3 02/32] powerpc/64s/exception: Add GEN_COMMON macro that uses INT_DEFINE parameters
No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 24 +--- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 1b942c98bc05..f3f2ec88b3d8 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -206,6 +206,9 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #define IMASK .L_IMASK_\name\() #define IKVM_REAL .L_IKVM_REAL_\name\() #define IKVM_VIRT .L_IKVM_VIRT_\name\() +#define ISTACK .L_ISTACK_\name\() +#define IRECONCILE .L_IRECONCILE_\name\() +#define IKUAP .L_IKUAP_\name\() #define INT_DEFINE_BEGIN(n)\ .macro int_define_ ## n name @@ -246,6 +249,15 @@ do_define_int n .ifndef IKVM_VIRT IKVM_VIRT=0 .endif + .ifndef ISTACK + ISTACK=1 + .endif + .ifndef IRECONCILE + IRECONCILE=1 + .endif + .ifndef IKUAP + IKUAP=1 + .endif .endm .macro INT_KVM_HANDLER name, vec, hsrr, area, skip @@ -670,6 +682,10 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66) .endif .endm +.macro GEN_COMMON name + INT_COMMON IVEC, IAREA, ISTACK, IKUAP, IRECONCILE, IDAR, IDSISR +.endm + /* * Restore all registers including H/SRR0/1 saved in a stack frame of a * standard exception. @@ -1221,13 +1237,7 @@ EXC_VIRT_BEGIN(data_access, 0x4300, 0x80) EXC_VIRT_END(data_access, 0x4300, 0x80) INT_KVM_HANDLER data_access, 0x300, EXC_STD, PACA_EXGEN, 1 EXC_COMMON_BEGIN(data_access_common) - /* -* Here r13 points to the paca, r9 contains the saved CR, -* SRR0 and SRR1 are saved in r11 and r12, -* r9 - r13 are saved in paca->exgen. -* EX_DAR and EX_DSISR have saved DAR/DSISR -*/ - INT_COMMON 0x300, PACA_EXGEN, 1, 1, 1, 1, 1 + GEN_COMMON data_access ld r4,_DAR(r1) ld r5,_DSISR(r1) BEGIN_MMU_FTR_SECTION -- 2.23.0
[PATCH v3 03/32] powerpc/64s/exception: Add GEN_KVM macro that uses INT_DEFINE parameters
No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index f3f2ec88b3d8..da3c22eea72d 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -204,6 +204,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #define ISET_RI.L_ISET_RI_\name\() #define IEARLY .L_IEARLY_\name\() #define IMASK .L_IMASK_\name\() +#define IKVM_SKIP .L_IKVM_SKIP_\name\() #define IKVM_REAL .L_IKVM_REAL_\name\() #define IKVM_VIRT .L_IKVM_VIRT_\name\() #define ISTACK .L_ISTACK_\name\() @@ -243,6 +244,9 @@ do_define_int n .ifndef IMASK IMASK=0 .endif + .ifndef IKVM_SKIP + IKVM_SKIP=0 + .endif .ifndef IKVM_REAL IKVM_REAL=0 .endif @@ -265,6 +269,10 @@ do_define_int n KVM_HANDLER \vec, \hsrr, \area, \skip .endm +.macro GEN_KVM name + KVM_HANDLER IVEC, IHSRR, IAREA, IKVM_SKIP +.endm + #ifdef CONFIG_KVM_BOOK3S_64_HANDLER #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE /* @@ -1226,6 +1234,7 @@ INT_DEFINE_BEGIN(data_access) IVEC=0x300 IDAR=1 IDSISR=1 + IKVM_SKIP=1 IKVM_REAL=1 INT_DEFINE_END(data_access) @@ -1235,7 +1244,8 @@ EXC_REAL_END(data_access, 0x300, 0x80) EXC_VIRT_BEGIN(data_access, 0x4300, 0x80) GEN_INT_ENTRY data_access, virt=1 EXC_VIRT_END(data_access, 0x4300, 0x80) -INT_KVM_HANDLER data_access, 0x300, EXC_STD, PACA_EXGEN, 1 +TRAMP_KVM_BEGIN(data_access_kvm) + GEN_KVM data_access EXC_COMMON_BEGIN(data_access_common) GEN_COMMON data_access ld r4,_DAR(r1) -- 2.23.0
[PATCH v3 04/32] powerpc/64s/exception: Expand EXC_COMMON and EXC_COMMON_ASYNC macros
These don't provide a large amount of code sharing. Removing them makes code easier to shuffle around. For example, some of the common instructions will be moved into the common code gen macro. No generated code change. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 160 --- 1 file changed, 117 insertions(+), 43 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index da3c22eea72d..0f1da3099c28 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -757,28 +757,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) #define FINISH_NAP #endif -#define EXC_COMMON(name, realvec, hdlr) \ - EXC_COMMON_BEGIN(name); \ - INT_COMMON realvec, PACA_EXGEN, 1, 1, 1, 0, 0 ; \ - bl save_nvgprs;\ - addir3,r1,STACK_FRAME_OVERHEAD; \ - bl hdlr; \ - b ret_from_except - -/* - * Like EXC_COMMON, but for exceptions that can occur in the idle task and - * therefore need the special idle handling (finish nap and runlatch) - */ -#define EXC_COMMON_ASYNC(name, realvec, hdlr) \ - EXC_COMMON_BEGIN(name); \ - INT_COMMON realvec, PACA_EXGEN, 1, 1, 1, 0, 0 ; \ - FINISH_NAP; \ - RUNLATCH_ON;\ - addir3,r1,STACK_FRAME_OVERHEAD; \ - bl hdlr; \ - b ret_from_except_lite - - /* * There are a few constraints to be concerned with. * - Real mode exceptions code/data must be located at their physical location. 
@@ -1349,7 +1327,13 @@ EXC_VIRT_BEGIN(hardware_interrupt, 0x4500, 0x100) INT_HANDLER hardware_interrupt, 0x500, virt=1, hsrr=EXC_HV_OR_STD, bitmask=IRQS_DISABLED, kvm=1 EXC_VIRT_END(hardware_interrupt, 0x4500, 0x100) INT_KVM_HANDLER hardware_interrupt, 0x500, EXC_HV_OR_STD, PACA_EXGEN, 0 -EXC_COMMON_ASYNC(hardware_interrupt_common, 0x500, do_IRQ) +EXC_COMMON_BEGIN(hardware_interrupt_common) + INT_COMMON 0x500, PACA_EXGEN, 1, 1, 1, 0, 0 + FINISH_NAP + RUNLATCH_ON + addir3,r1,STACK_FRAME_OVERHEAD + bl do_IRQ + b ret_from_except_lite EXC_REAL_BEGIN(alignment, 0x600, 0x100) @@ -1455,7 +1439,13 @@ EXC_VIRT_BEGIN(decrementer, 0x4900, 0x80) INT_HANDLER decrementer, 0x900, virt=1, bitmask=IRQS_DISABLED EXC_VIRT_END(decrementer, 0x4900, 0x80) INT_KVM_HANDLER decrementer, 0x900, EXC_STD, PACA_EXGEN, 0 -EXC_COMMON_ASYNC(decrementer_common, 0x900, timer_interrupt) +EXC_COMMON_BEGIN(decrementer_common) + INT_COMMON 0x900, PACA_EXGEN, 1, 1, 1, 0, 0 + FINISH_NAP + RUNLATCH_ON + addir3,r1,STACK_FRAME_OVERHEAD + bl timer_interrupt + b ret_from_except_lite EXC_REAL_BEGIN(hdecrementer, 0x980, 0x80) @@ -1465,7 +1455,12 @@ EXC_VIRT_BEGIN(hdecrementer, 0x4980, 0x80) INT_HANDLER hdecrementer, 0x980, virt=1, hsrr=EXC_HV, kvm=1 EXC_VIRT_END(hdecrementer, 0x4980, 0x80) INT_KVM_HANDLER hdecrementer, 0x980, EXC_HV, PACA_EXGEN, 0 -EXC_COMMON(hdecrementer_common, 0x980, hdec_interrupt) +EXC_COMMON_BEGIN(hdecrementer_common) + INT_COMMON 0x980, PACA_EXGEN, 1, 1, 1, 0, 0 + bl save_nvgprs + addir3,r1,STACK_FRAME_OVERHEAD + bl hdec_interrupt + b ret_from_except EXC_REAL_BEGIN(doorbell_super, 0xa00, 0x100) @@ -1475,11 +1470,17 @@ EXC_VIRT_BEGIN(doorbell_super, 0x4a00, 0x100) INT_HANDLER doorbell_super, 0xa00, virt=1, bitmask=IRQS_DISABLED EXC_VIRT_END(doorbell_super, 0x4a00, 0x100) INT_KVM_HANDLER doorbell_super, 0xa00, EXC_STD, PACA_EXGEN, 0 +EXC_COMMON_BEGIN(doorbell_super_common) + INT_COMMON 0xa00, PACA_EXGEN, 1, 1, 1, 0, 0 + FINISH_NAP + RUNLATCH_ON + addir3,r1,STACK_FRAME_OVERHEAD #ifdef 
CONFIG_PPC_DOORBELL -EXC_COMMON_ASYNC(doorbell_super_common, 0xa00, doorbell_exception) + bl doorbell_exception #else -EXC_COMMON_ASYNC(doorbell_super_common, 0xa00, unknown_exception) + bl unknown_exception #endif + b ret_from_except_lite EXC_REAL_NONE(0xb00, 0x100) @@ -1610,7 +1611,12 @@ EXC_VIRT_BEGIN(single_step, 0x4d00, 0x100) INT_HANDLER single_step, 0xd00, virt=1 EXC_VIRT_END(single_step, 0x4d00, 0x100) INT_KVM_HANDLER single_step, 0xd00, EXC_STD, PACA_EXGEN, 0 -EXC_COMMON(single_step_common, 0xd00, single_step_exception) +EXC_COMMON_BEGIN(single_step_common) + INT_COMMON 0xd00, PACA_EXGEN, 1, 1, 1, 0, 0 + bl save_nvgprs + addir3,r1,STACK_FRAME_
[PATCH v3 05/32] powerpc/64s/exception: Move all interrupt handlers to new style code gen macros
Aside from label names and BUG line numbers, the generated code change is an additional HMI KVM handler added for the "late" KVM handler, because early and late HMI generation is achieved by defining two different interrupt types. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 556 --- 1 file changed, 418 insertions(+), 138 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 0f1da3099c28..0157ba48efe9 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -206,8 +206,10 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #define IMASK .L_IMASK_\name\() #define IKVM_SKIP .L_IKVM_SKIP_\name\() #define IKVM_REAL .L_IKVM_REAL_\name\() +#define __IKVM_REAL(name) .L_IKVM_REAL_ ## name #define IKVM_VIRT .L_IKVM_VIRT_\name\() #define ISTACK .L_ISTACK_\name\() +#define __ISTACK(name) .L_ISTACK_ ## name #define IRECONCILE .L_IRECONCILE_\name\() #define IKUAP .L_IKUAP_\name\() @@ -570,7 +572,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) /* nothing more */ .elseif \early mfctr r10 /* save ctr, even for !RELOCATABLE */ - BRANCH_TO_C000(r11, \name\()_early_common) + BRANCH_TO_C000(r11, \name\()_common) .elseif !\virt INT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr, \ri .else @@ -843,6 +845,19 @@ __start_interrupts: EXC_VIRT_NONE(0x4000, 0x100) +INT_DEFINE_BEGIN(system_reset) + IVEC=0x100 + IAREA=PACA_EXNMI + /* +* MSR_RI is not enabled, because PACA_EXNMI and nmi stack is +* being used, so a nested NMI exception would corrupt it. 
+*/ + ISET_RI=0 + ISTACK=0 + IRECONCILE=0 + IKVM_REAL=1 +INT_DEFINE_END(system_reset) + EXC_REAL_BEGIN(system_reset, 0x100, 0x100) #ifdef CONFIG_PPC_P7_NAP /* @@ -880,11 +895,8 @@ BEGIN_FTR_SECTION END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) #endif - INT_HANDLER system_reset, 0x100, area=PACA_EXNMI, ri=0, kvm=1 + GEN_INT_ENTRY system_reset, virt=0 /* -* MSR_RI is not enabled, because PACA_EXNMI and nmi stack is -* being used, so a nested NMI exception would corrupt it. -* * In theory, we should not enable relocation here if it was disabled * in SRR1, because the MMU may not be configured to support it (e.g., * SLB may have been cleared). In practice, there should only be a few @@ -893,7 +905,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) */ EXC_REAL_END(system_reset, 0x100, 0x100) EXC_VIRT_NONE(0x4100, 0x100) -INT_KVM_HANDLER system_reset 0x100, EXC_STD, PACA_EXNMI, 0 +TRAMP_KVM_BEGIN(system_reset_kvm) + GEN_KVM system_reset #ifdef CONFIG_PPC_P7_NAP TRAMP_REAL_BEGIN(system_reset_idle_wake) @@ -908,8 +921,8 @@ TRAMP_REAL_BEGIN(system_reset_idle_wake) * Vectors for the FWNMI option. Share common code. 
*/ TRAMP_REAL_BEGIN(system_reset_fwnmi) - /* See comment at system_reset exception, don't turn on RI */ - INT_HANDLER system_reset, 0x100, area=PACA_EXNMI, ri=0 + __IKVM_REAL(system_reset)=0 + GEN_INT_ENTRY system_reset, virt=0 #endif /* CONFIG_PPC_PSERIES */ @@ -929,7 +942,7 @@ EXC_COMMON_BEGIN(system_reset_common) mr r10,r1 ld r1,PACA_NMI_EMERG_SP(r13) subir1,r1,INT_FRAME_SIZE - INT_COMMON 0x100, PACA_EXNMI, 0, 1, 0, 0, 0 + GEN_COMMON system_reset bl save_nvgprs /* * Set IRQS_ALL_DISABLED unconditionally so arch_irqs_disabled does @@ -971,23 +984,46 @@ EXC_COMMON_BEGIN(system_reset_common) RFI_TO_USER_OR_KERNEL -EXC_REAL_BEGIN(machine_check, 0x200, 0x100) - INT_HANDLER machine_check, 0x200, early=1, area=PACA_EXMC, dar=1, dsisr=1 +INT_DEFINE_BEGIN(machine_check_early) + IVEC=0x200 + IAREA=PACA_EXMC /* * MSR_RI is not enabled, because PACA_EXMC is being used, so a * nested machine check corrupts it. machine_check_common enables * MSR_RI. */ + ISET_RI=0 + ISTACK=0 + IEARLY=1 + IDAR=1 + IDSISR=1 + IRECONCILE=0 + IKUAP=0 /* We don't touch AMR here, we never go to virtual mode */ +INT_DEFINE_END(machine_check_early) + +INT_DEFINE_BEGIN(machine_check) + IVEC=0x200 + IAREA=PACA_EXMC + ISET_RI=0 + IDAR=1 + IDSISR=1 + IKVM_SKIP=1 + IKVM_REAL=1 +INT_DEFINE_END(machine_check) + +EXC_REAL_BEGIN(machine_check, 0x200, 0x100) + GEN_INT_ENTRY machine_check_early, virt=0 EXC_REAL_END(machine_check, 0x200, 0x100) EXC_VIRT_NONE(0x4200, 0x100) #ifdef CONFIG_PPC_PSERIES TRAMP_REAL_BEGIN(machine_check_fwnmi) /* See comment at machine_check exception, don't turn on RI */ - INT_HANDLER machine_check, 0x200, early=1, area=PACA_EXMC, da
[PATCH v3 06/32] powerpc/64s/exception: Remove old INT_ENTRY macro
Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 68 1 file changed, 30 insertions(+), 38 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 0157ba48efe9..74bf6e0bf61f 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -482,13 +482,13 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) * - Fall through and continue executing in real, unrelocated mode. * This is done if early=2. */ -.macro INT_HANDLER name, vec, ool=0, early=0, virt=0, hsrr=0, area=PACA_EXGEN, ri=1, dar=0, dsisr=0, bitmask=0, kvm=0 +.macro GEN_INT_ENTRY name, virt, ool=0 SET_SCRATCH0(r13) /* save r13 */ GET_PACA(r13) - std r9,\area\()+EX_R9(r13) /* save r9 */ + std r9,IAREA+EX_R9(r13) /* save r9 */ OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR) HMT_MEDIUM - std r10,\area\()+EX_R10(r13)/* save r10 - r12 */ + std r10,IAREA+EX_R10(r13) /* save r10 - r12 */ OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR) .if \ool .if !\virt @@ -502,47 +502,47 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) .endif .endif - OPT_SAVE_REG_TO_PACA(\area\()+EX_PPR, r9, CPU_FTR_HAS_PPR) - OPT_SAVE_REG_TO_PACA(\area\()+EX_CFAR, r10, CPU_FTR_CFAR) + OPT_SAVE_REG_TO_PACA(IAREA+EX_PPR, r9, CPU_FTR_HAS_PPR) + OPT_SAVE_REG_TO_PACA(IAREA+EX_CFAR, r10, CPU_FTR_CFAR) INTERRUPT_TO_KERNEL - SAVE_CTR(r10, \area\()) + SAVE_CTR(r10, IAREA) mfcrr9 - .if \kvm - KVMTEST \name \hsrr \vec + .if (!\virt && IKVM_REAL) || (\virt && IKVM_VIRT) + KVMTEST \name IHSRR IVEC .endif - .if \bitmask + .if IMASK lbz r10,PACAIRQSOFTMASK(r13) - andi. r10,r10,\bitmask + andi. 
r10,r10,IMASK /* Associate vector numbers with bits in paca->irq_happened */ - .if \vec == 0x500 || \vec == 0xea0 + .if IVEC == 0x500 || IVEC == 0xea0 li r10,PACA_IRQ_EE - .elseif \vec == 0x900 + .elseif IVEC == 0x900 li r10,PACA_IRQ_DEC - .elseif \vec == 0xa00 || \vec == 0xe80 + .elseif IVEC == 0xa00 || IVEC == 0xe80 li r10,PACA_IRQ_DBELL - .elseif \vec == 0xe60 + .elseif IVEC == 0xe60 li r10,PACA_IRQ_HMI - .elseif \vec == 0xf00 + .elseif IVEC == 0xf00 li r10,PACA_IRQ_PMI .else .abort "Bad maskable vector" .endif - .if \hsrr == EXC_HV_OR_STD + .if IHSRR == EXC_HV_OR_STD BEGIN_FTR_SECTION bne masked_Hinterrupt FTR_SECTION_ELSE bne masked_interrupt ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) - .elseif \hsrr + .elseif IHSRR bne masked_Hinterrupt .else bne masked_interrupt .endif .endif - std r11,\area\()+EX_R11(r13) - std r12,\area\()+EX_R12(r13) + std r11,IAREA+EX_R11(r13) + std r12,IAREA+EX_R12(r13) /* * DAR/DSISR, SCRATCH0 must be read before setting MSR[RI], @@ -550,47 +550,39 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) * not recoverable if they are live. */ GET_SCRATCH0(r10) - std r10,\area\()+EX_R13(r13) - .if \dar == 1 - .if \hsrr + std r10,IAREA+EX_R13(r13) + .if IDAR == 1 + .if IHSRR mfspr r10,SPRN_HDAR .else mfspr r10,SPRN_DAR .endif - std r10,\area\()+EX_DAR(r13) + std r10,IAREA+EX_DAR(r13) .endif - .if \dsisr == 1 - .if \hsrr + .if IDSISR == 1 + .if IHSRR mfspr r10,SPRN_HDSISR .else mfspr r10,SPRN_DSISR .endif - stw r10,\area\()+EX_DSISR(r13) + stw r10,IAREA+EX_DSISR(r13) .endif - .if \early == 2 + .if IEARLY == 2 /* nothing more */ - .elseif \early + .elseif IEARLY mfctr r10 /* save ctr, even for !RELOCATABLE */ BRANCH_TO_C000(r11, \name\()_common) .elseif !\virt - INT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr, \ri + INT_SAVE_SRR_AND_JUMP \name\()_common, IHSRR, ISET_RI .else - INT_VIRT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr + INT_VIRT_SAVE_SRR_AND_JUMP \name\()_common, IHSRR .endif .if \ool .popsection .endif .endm
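The "Associate vector numbers with bits in paca->irq_happened" chain in GEN_INT_ENTRY is easier to see as a lookup. The sketch below restates it in C purely for illustration; the enum is a hypothetical symbolic stand-in for the PACA_IRQ_* constants (their numeric values are deliberately not modelled), and PENDING_NONE replaces the build-time .abort "Bad maskable vector" with a runtime value.

```c
#include <assert.h>

/* Symbolic stand-ins for the PACA_IRQ_* bits; real values are not modelled. */
enum pending_irq {
	PENDING_NONE = 0,	/* hypothetical: the macro aborts the build instead */
	PENDING_EE,
	PENDING_DEC,
	PENDING_DBELL,
	PENDING_HMI,
	PENDING_PMI,
};

/* Map a maskable interrupt vector to the pending bit the macro selects. */
static enum pending_irq pending_bit_for_vector(unsigned int vec)
{
	switch (vec) {
	case 0x500:
	case 0xea0:
		return PENDING_EE;
	case 0x900:
		return PENDING_DEC;
	case 0xa00:
	case 0xe80:
		return PENDING_DBELL;
	case 0xe60:
		return PENDING_HMI;
	case 0xf00:
		return PENDING_PMI;
	default:
		return PENDING_NONE;
	}
}
```

The point of doing this with .if in the macro rather than at run time is that IVEC is an assembly-time constant, so each handler gets a single li with the right bit and no branches.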
[PATCH v3 08/32] powerpc/64s/exception: Remove old INT_KVM_HANDLER
Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 55 +--- 1 file changed, 26 insertions(+), 29 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 90514766dc7d..cba99f9a815b 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -266,15 +266,6 @@ do_define_int n .endif .endm -.macro INT_KVM_HANDLER name, vec, hsrr, area, skip - TRAMP_KVM_BEGIN(\name\()_kvm) - KVM_HANDLER \vec, \hsrr, \area, \skip -.endm - -.macro GEN_KVM name - KVM_HANDLER IVEC, IHSRR, IAREA, IKVM_SKIP -.endm - #ifdef CONFIG_KVM_BOOK3S_64_HANDLER #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE /* @@ -293,35 +284,35 @@ do_define_int n bne \name\()_kvm .endm -.macro KVM_HANDLER vec, hsrr, area, skip - .if \skip +.macro GEN_KVM name + .if IKVM_SKIP cmpwi r10,KVM_GUEST_MODE_SKIP beq 89f .else BEGIN_FTR_SECTION_NESTED(947) - ld r10,\area+EX_CFAR(r13) + ld r10,IAREA+EX_CFAR(r13) std r10,HSTATE_CFAR(r13) END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947) .endif BEGIN_FTR_SECTION_NESTED(948) - ld r10,\area+EX_PPR(r13) + ld r10,IAREA+EX_PPR(r13) std r10,HSTATE_PPR(r13) END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) - ld r10,\area+EX_R10(r13) + ld r10,IAREA+EX_R10(r13) std r12,HSTATE_SCRATCH0(r13) sldir12,r9,32 /* HSRR variants have the 0x2 bit added to their trap number */ - .if \hsrr == EXC_HV_OR_STD + .if IHSRR == EXC_HV_OR_STD BEGIN_FTR_SECTION - ori r12,r12,(\vec + 0x2) + ori r12,r12,(IVEC + 0x2) FTR_SECTION_ELSE - ori r12,r12,(\vec) + ori r12,r12,(IVEC) ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) - .elseif \hsrr - ori r12,r12,(\vec + 0x2) + .elseif IHSRR + ori r12,r12,(IVEC+ 0x2) .else - ori r12,r12,(\vec) + ori r12,r12,(IVEC) .endif #ifdef CONFIG_RELOCATABLE @@ -334,25 +325,25 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) std r9,HSTATE_SCRATCH1(r13) __LOAD_FAR_HANDLER(r9, kvmppc_interrupt) mtctr r9 - ld r9,\area+EX_R9(r13) + ld 
r9,IAREA+EX_R9(r13) bctr #else - ld r9,\area+EX_R9(r13) + ld r9,IAREA+EX_R9(r13) b kvmppc_interrupt #endif - .if \skip + .if IKVM_SKIP 89:mtocrf 0x80,r9 - ld r9,\area+EX_R9(r13) - ld r10,\area+EX_R10(r13) - .if \hsrr == EXC_HV_OR_STD + ld r9,IAREA+EX_R9(r13) + ld r10,IAREA+EX_R10(r13) + .if IHSRR == EXC_HV_OR_STD BEGIN_FTR_SECTION b kvmppc_skip_Hinterrupt FTR_SECTION_ELSE b kvmppc_skip_interrupt ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) - .elseif \hsrr + .elseif IHSRR b kvmppc_skip_Hinterrupt .else b kvmppc_skip_interrupt @@ -363,7 +354,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) #else .macro KVMTEST name, hsrr, n .endm -.macro KVM_HANDLER name, vec, hsrr, area, skip +.macro GEN_KVM name .endm #endif @@ -1627,6 +1618,12 @@ EXC_VIRT_NONE(0x4b00, 0x100) * without saving, though xer is not a good idea to use, as hardware may * interpret some bits so it may be costly to change them. */ +INT_DEFINE_BEGIN(system_call) + IVEC=0xc00 + IKVM_REAL=1 + IKVM_VIRT=1 +INT_DEFINE_END(system_call) + .macro SYSTEM_CALL virt #ifdef CONFIG_KVM_BOOK3S_64_HANDLER /* @@ -1720,7 +1717,7 @@ TRAMP_KVM_BEGIN(system_call_kvm) SET_SCRATCH0(r10) std r9,PACA_EXGEN+EX_R9(r13) mfcrr9 - KVM_HANDLER 0xc00, EXC_STD, PACA_EXGEN, 0 + GEN_KVM system_call #endif -- 2.23.0
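As I read GEN_KVM, r12 ends up holding the saved CR in its upper 32 bits (sldi r12,r9,32) with the trap number ORed into the low bits, and HSRR-type interrupts get 0x2 added to the vector. A rough C sketch of that packing follows; kvm_trap_word() is a hypothetical name, and the EXC_HV_OR_STD case, which the macro resolves with a feature section, is reduced here to a plain flag.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack the saved CR and the trap number the way GEN_KVM builds r12.
 * use_hsrr is non-zero for interrupts delivered via HSRR0/1.
 */
static uint64_t kvm_trap_word(uint32_t cr, unsigned int vec, int use_hsrr)
{
	uint64_t word = (uint64_t)cr << 32;	/* sldi r12,r9,32 */

	word |= use_hsrr ? (vec + 0x2) : vec;	/* ori r12,r12,(IVEC [+ 0x2]) */
	return word;
}
```

So a data access interrupt (0x300, SRR) with CR 0x12345678 gives 0x1234567800000300, while hdecrementer (0x980, HSRR) with a zero CR gives 0x982.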
[PATCH v3 07/32] powerpc/64s/exception: Remove old INT_COMMON macro
Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 51 +--- 1 file changed, 24 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 74bf6e0bf61f..90514766dc7d 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -591,8 +591,8 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) * If stack=0, then the stack is already set in r1, and r1 is saved in r10. * PPR save and CPU accounting is not done for the !stack case (XXX why not?) */ -.macro INT_COMMON vec, area, stack, kuap, reconcile, dar, dsisr - .if \stack +.macro GEN_COMMON name + .if ISTACK andi. r10,r12,MSR_PR /* See if coming from user */ mr r10,r1 /* Save r1 */ subir1,r1,INT_FRAME_SIZE/* alloc frame on kernel stack */ @@ -609,54 +609,54 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) std r0,GPR0(r1) /* save r0 in stackframe*/ std r10,GPR1(r1)/* save r1 in stackframe*/ - .if \stack - .if \kuap + .if ISTACK + .if IKUAP kuap_save_amr_and_lock r9, r10, cr1, cr0 .endif beq 101f/* if from kernel mode */ ACCOUNT_CPU_USER_ENTRY(r13, r9, r10) - SAVE_PPR(\area, r9) + SAVE_PPR(IAREA, r9) 101: .else - .if \kuap + .if IKUAP kuap_save_amr_and_lock r9, r10, cr1 .endif .endif /* Save original regs values from save area to stack frame. 
*/ - ld r9,\area+EX_R9(r13) /* move r9, r10 to stackframe */ - ld r10,\area+EX_R10(r13) + ld r9,IAREA+EX_R9(r13) /* move r9, r10 to stackframe */ + ld r10,IAREA+EX_R10(r13) std r9,GPR9(r1) std r10,GPR10(r1) - ld r9,\area+EX_R11(r13)/* move r11 - r13 to stackframe */ - ld r10,\area+EX_R12(r13) - ld r11,\area+EX_R13(r13) + ld r9,IAREA+EX_R11(r13)/* move r11 - r13 to stackframe */ + ld r10,IAREA+EX_R12(r13) + ld r11,IAREA+EX_R13(r13) std r9,GPR11(r1) std r10,GPR12(r1) std r11,GPR13(r1) - .if \dar - .if \dar == 2 + .if IDAR + .if IDAR == 2 ld r10,_NIP(r1) .else - ld r10,\area+EX_DAR(r13) + ld r10,IAREA+EX_DAR(r13) .endif std r10,_DAR(r1) .endif - .if \dsisr - .if \dsisr == 2 + .if IDSISR + .if IDSISR == 2 ld r10,_MSR(r1) lis r11,DSISR_SRR1_MATCH_64S@h and r10,r10,r11 .else - lwz r10,\area+EX_DSISR(r13) + lwz r10,IAREA+EX_DSISR(r13) .endif std r10,_DSISR(r1) .endif BEGIN_FTR_SECTION_NESTED(66) - ld r10,\area+EX_CFAR(r13) + ld r10,IAREA+EX_CFAR(r13) std r10,ORIG_GPR3(r1) END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66) - GET_CTR(r10, \area) + GET_CTR(r10, IAREA) std r10,_CTR(r1) std r2,GPR2(r1) /* save r2 in stackframe*/ SAVE_4GPRS(3, r1) /* save r3 - r6 in stackframe */ @@ -668,26 +668,22 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66) mfspr r11,SPRN_XER/* save XER in stackframe */ std r10,SOFTE(r1) std r11,_XER(r1) - li r9,(\vec)+1 + li r9,(IVEC)+1 std r9,_TRAP(r1)/* set trap number */ li r10,0 ld r11,exception_marker@toc(r2) std r10,RESULT(r1) /* clear regs->result */ std r11,STACK_FRAME_OVERHEAD-16(r1) /* mark the frame */ - .if \stack + .if ISTACK ACCOUNT_STOLEN_TIME .endif - .if \reconcile + .if IRECONCILE RECONCILE_IRQ_STATE(r10, r11) .endif .endm -.macro GEN_COMMON name - INT_COMMON IVEC, IAREA, ISTACK, IKUAP, IRECONCILE, IDAR, IDSISR -.endm - /* * Restore all registers including H/SRR0/1 saved in a stack frame of a * standard exception. 
@@ -2387,7 +2383,8 @@ EXC_COMMON_BEGIN(soft_nmi_common) mr r10,r1 ld r1,PACAEMERGSP(r13) subir1,r1,INT_FRAME_SIZE - INT_COMMON 0x900, PACA_EXGEN, 0, 1, 1, 0, 0 + __ISTACK(decrementer)=0 + GEN_COMMON decrementer bl save_nvgprs addir3,r1,STACK_FRAME_OVERHEAD bl soft_nmi_interrupt -- 2.23.0
[PATCH v3 09/32] powerpc/64s/exception: Add ISIDE option
Rather than using DAR=2 to select the i-side registers, add an
explicit option.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/exceptions-64s.S | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index cba99f9a815b..4eb099046f9d 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -199,6 +199,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IVEC		.L_IVEC_\name\()
 #define IHSRR		.L_IHSRR_\name\()
 #define IAREA		.L_IAREA_\name\()
+#define IISIDE		.L_IISIDE_\name\()
 #define IDAR		.L_IDAR_\name\()
 #define IDSISR		.L_IDSISR_\name\()
 #define ISET_RI		.L_ISET_RI_\name\()
@@ -231,6 +232,9 @@ do_define_int n
 	.ifndef IAREA
 		IAREA=PACA_EXGEN
 	.endif
+	.ifndef IISIDE
+		IISIDE=0
+	.endif
 	.ifndef IDAR
 		IDAR=0
 	.endif
@@ -542,7 +546,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	 */
 	GET_SCRATCH0(r10)
 	std	r10,IAREA+EX_R13(r13)
-	.if IDAR == 1
+	.if IDAR && !IISIDE
 	.if IHSRR
 	mfspr	r10,SPRN_HDAR
 	.else
@@ -550,7 +554,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	.endif
 	std	r10,IAREA+EX_DAR(r13)
 	.endif
-	.if IDSISR == 1
+	.if IDSISR && !IISIDE
 	.if IHSRR
 	mfspr	r10,SPRN_HDSISR
 	.else
@@ -625,16 +629,18 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	std	r9,GPR11(r1)
 	std	r10,GPR12(r1)
 	std	r11,GPR13(r1)
+
 	.if IDAR
-	.if IDAR == 2
+	.if IISIDE
 	ld	r10,_NIP(r1)
 	.else
 	ld	r10,IAREA+EX_DAR(r13)
 	.endif
 	std	r10,_DAR(r1)
 	.endif
+
 	.if IDSISR
-	.if IDSISR == 2
+	.if IISIDE
 	ld	r10,_MSR(r1)
 	lis	r11,DSISR_SRR1_MATCH_64S@h
 	and	r10,r10,r11
@@ -643,6 +649,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 	.endif
 	std	r10,_DSISR(r1)
 	.endif
+
 BEGIN_FTR_SECTION_NESTED(66)
 	ld	r10,IAREA+EX_CFAR(r13)
 	std	r10,ORIG_GPR3(r1)
@@ -1311,8 +1318,9 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 INT_DEFINE_BEGIN(instruction_access)
 	IVEC=0x400
-	IDAR=2
-	IDSISR=2
+	IISIDE=1
+	IDAR=1
+	IDSISR=1
 	IKVM_REAL=1
 INT_DEFINE_END(instruction_access)
 
@@ -1341,7 +1349,8 @@ INT_DEFINE_BEGIN(instruction_access_slb)
 	IVEC=0x480
 	IAREA=PACA_EXSLB
 	IRECONCILE=0
-	IDAR=2
+	IISIDE=1
+	IDAR=1
 	IKVM_REAL=1
 INT_DEFINE_END(instruction_access_slb)
 
-- 
2.23.0
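For readers who find the nested .if blocks hard to follow, here is a rough C model of what the ISIDE patch selects. It is only a sketch: the function names and the srr1_match parameter (standing in for the DSISR_SRR1_MATCH_64S bits) are made up for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Entry code: (H)DAR / (H)DSISR are read into the PACA save area only
 * for d-side interrupts, i.e. ".if IDAR && !IISIDE".
 * 'opt' is IDAR or IDSISR.
 */
int entry_reads_spr(int opt, int iiside)
{
	return opt && !iiside;
}

/*
 * Common code, emitted only when IDAR is set: the value stored to
 * _DAR(r1) is the saved NIP for i-side interrupts, else the saved DAR.
 */
uint64_t common_dar(int iiside, uint64_t nip, uint64_t saved_dar)
{
	return iiside ? nip : saved_dar;
}

/*
 * Common code, emitted only when IDSISR is set: i-side interrupts derive
 * the value from the saved MSR masked with the SRR1 match bits.
 * 'srr1_match' is a hypothetical parameter for that mask.
 */
uint64_t common_dsisr(int iiside, uint64_t msr, uint32_t saved_dsisr,
		      uint64_t srr1_match)
{
	return iiside ? (msr & srr1_match) : saved_dsisr;
}
```

With this split, instruction_access sets IISIDE=1 together with IDAR=1 and IDSISR=1, so the entry code skips the SPR reads and the common code fills both fields from NIP and MSR.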
[PATCH v3 10/32] powerpc/64s/exception: move real->virt switch into the common handler
The real mode interrupt entry points currently use rfid to branch to the common handler in virtual mode. This is a significant amount of code, and forces other code (notably the KVM test) to live in the real mode handler. In the interest of minimising the amount of code that runs unrelocated move the switch to virt mode into the common code, and do it with mtmsrd, which avoids clobbering SRRs (although the post-KVMTEST performance of real-mode interrupt handlers is not a big concern these days). This requires CTR to always be saved (real-mode needs to reach 0xc...) but that's not a huge impact these days. It could be optimized away in future. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/exception-64s.h | 4 - arch/powerpc/kernel/exceptions-64s.S | 251 ++- 2 files changed, 109 insertions(+), 146 deletions(-) diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 33f4f72eb035..47bd4ea0837d 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -33,11 +33,7 @@ #include /* PACA save area size in u64 units (exgen, exmc, etc) */ -#if defined(CONFIG_RELOCATABLE) #define EX_SIZE10 -#else -#define EX_SIZE9 -#endif /* * maximum recursive depth of MCE exceptions diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 4eb099046f9d..112cdb446e03 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -32,16 +32,10 @@ #define EX_CCR 52 #define EX_CFAR56 #define EX_PPR 64 -#if defined(CONFIG_RELOCATABLE) #define EX_CTR 72 .if EX_SIZE != 10 .error "EX_SIZE is wrong" .endif -#else -.if EX_SIZE != 9 - .error "EX_SIZE is wrong" -.endif -#endif /* * Following are fixed section helper macros. 
@@ -124,22 +118,6 @@ name: #define EXC_HV 1 #define EXC_STD0 -#if defined(CONFIG_RELOCATABLE) -/* - * If we support interrupts with relocation on AND we're a relocatable kernel, - * we need to use CTR to get to the 2nd level handler. So, save/restore it - * when required. - */ -#define SAVE_CTR(reg, area)mfctr reg ; std reg,area+EX_CTR(r13) -#define GET_CTR(reg, area) ld reg,area+EX_CTR(r13) -#define RESTORE_CTR(reg, area) ld reg,area+EX_CTR(r13) ; mtctr reg -#else -/* ...else CTR is unused and in register. */ -#define SAVE_CTR(reg, area) -#define GET_CTR(reg, area) mfctr reg -#define RESTORE_CTR(reg, area) -#endif - /* * PPR save/restore macros used in exceptions-64s.S * Used for P7 or later processors @@ -199,6 +177,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #define IVEC .L_IVEC_\name\() #define IHSRR .L_IHSRR_\name\() #define IAREA .L_IAREA_\name\() +#define IVIRT .L_IVIRT_\name\() #define IISIDE .L_IISIDE_\name\() #define IDAR .L_IDAR_\name\() #define IDSISR .L_IDSISR_\name\() @@ -232,6 +211,9 @@ do_define_int n .ifndef IAREA IAREA=PACA_EXGEN .endif + .ifndef IVIRT + IVIRT=1 + .endif .ifndef IISIDE IISIDE=0 .endif @@ -325,7 +307,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) * outside the head section. CONFIG_RELOCATABLE KVM expects CTR * to be saved in HSTATE_SCRATCH1. */ - mfctr r9 + ld r9,IAREA+EX_CTR(r13) std r9,HSTATE_SCRATCH1(r13) __LOAD_FAR_HANDLER(r9, kvmppc_interrupt) mtctr r9 @@ -362,101 +344,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) .endm #endif -.macro INT_SAVE_SRR_AND_JUMP label, hsrr, set_ri - ld r10,PACAKMSR(r13) /* get MSR value for kernel */ - .if ! 
\set_ri - xorir10,r10,MSR_RI /* Clear MSR_RI */ - .endif - .if \hsrr == EXC_HV_OR_STD - BEGIN_FTR_SECTION - mfspr r11,SPRN_HSRR0 /* save HSRR0 */ - mfspr r12,SPRN_HSRR1 /* and HSRR1 */ - mtspr SPRN_HSRR1,r10 - FTR_SECTION_ELSE - mfspr r11,SPRN_SRR0 /* save SRR0 */ - mfspr r12,SPRN_SRR1 /* and SRR1 */ - mtspr SPRN_SRR1,r10 - ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) - .elseif \hsrr - mfspr r11,SPRN_HSRR0 /* save HSRR0 */ - mfspr r12,SPRN_HSRR1 /* and HSRR1 */ - mtspr SPRN_HSRR1,r10 - .else - mfspr r11,SPRN_SRR0 /* save SRR0 */ - mfspr r12,SPRN_SRR1 /* and SRR1 */ - mtspr SPRN_SRR1,r10 - .endif - LOAD_HANDLER(r10, \label\()) - .if \hsrr == EXC_HV_OR_STD - BEGIN_FTR_SECTION - mtspr SPRN_HSRR0,r10 - HRFI_TO_KERNEL - FTR_SECTION_ELSE - mtspr SPRN_SRR0,r10 - RFI_TO_KERNE
[PATCH v3 11/32] powerpc/64s/exception: move soft-mask test to common code
As well as moving code out of the unrelocated vectors, this allows the masked handlers to be moved to common code, and allows the soft_nmi handler to be generated more like a regular handler. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 106 +-- 1 file changed, 49 insertions(+), 57 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 112cdb446e03..a23f2450f9ed 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -411,36 +411,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) .if (!\virt && IKVM_REAL) || (\virt && IKVM_VIRT) KVMTEST \name IHSRR IVEC .endif - .if IMASK - lbz r10,PACAIRQSOFTMASK(r13) - andi. r10,r10,IMASK - /* Associate vector numbers with bits in paca->irq_happened */ - .if IVEC == 0x500 || IVEC == 0xea0 - li r10,PACA_IRQ_EE - .elseif IVEC == 0x900 - li r10,PACA_IRQ_DEC - .elseif IVEC == 0xa00 || IVEC == 0xe80 - li r10,PACA_IRQ_DBELL - .elseif IVEC == 0xe60 - li r10,PACA_IRQ_HMI - .elseif IVEC == 0xf00 - li r10,PACA_IRQ_PMI - .else - .abort "Bad maskable vector" - .endif - - .if IHSRR == EXC_HV_OR_STD - BEGIN_FTR_SECTION - bne masked_Hinterrupt - FTR_SECTION_ELSE - bne masked_interrupt - ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) - .elseif IHSRR - bne masked_Hinterrupt - .else - bne masked_interrupt - .endif - .endif std r11,IAREA+EX_R11(r13) std r12,IAREA+EX_R12(r13) @@ -524,6 +494,37 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt) .endm .macro __GEN_COMMON_BODY name + .if IMASK + lbz r10,PACAIRQSOFTMASK(r13) + andi. 
r10,r10,IMASK + /* Associate vector numbers with bits in paca->irq_happened */ + .if IVEC == 0x500 || IVEC == 0xea0 + li r10,PACA_IRQ_EE + .elseif IVEC == 0x900 + li r10,PACA_IRQ_DEC + .elseif IVEC == 0xa00 || IVEC == 0xe80 + li r10,PACA_IRQ_DBELL + .elseif IVEC == 0xe60 + li r10,PACA_IRQ_HMI + .elseif IVEC == 0xf00 + li r10,PACA_IRQ_PMI + .else + .abort "Bad maskable vector" + .endif + + .if IHSRR == EXC_HV_OR_STD + BEGIN_FTR_SECTION + bne masked_Hinterrupt + FTR_SECTION_ELSE + bne masked_interrupt + ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) + .elseif IHSRR + bne masked_Hinterrupt + .else + bne masked_interrupt + .endif + .endif + .if ISTACK andi. r10,r12,MSR_PR /* See if coming from user */ mr r10,r1 /* Save r1 */ @@ -2330,18 +2331,10 @@ EXC_VIRT_NONE(0x5800, 0x100) #ifdef CONFIG_PPC_WATCHDOG -#define MASKED_DEC_HANDLER_LABEL 3f - -#define MASKED_DEC_HANDLER(_H) \ -3: /* soft-nmi */ \ - std r12,PACA_EXGEN+EX_R12(r13); \ - GET_SCRATCH0(r10); \ - std r10,PACA_EXGEN+EX_R13(r13); \ - mfspr r11,SPRN_SRR0; /* save SRR0 */ \ - mfspr r12,SPRN_SRR1; /* and SRR1 */ \ - LOAD_HANDLER(r10, soft_nmi_common); \ - mtctr r10;\ - bctr +INT_DEFINE_BEGIN(soft_nmi) + IVEC=0x900 + ISTACK=0 +INT_DEFINE_END(soft_nmi) /* * Branch to soft_nmi_interrupt using the emergency stack. The emergency @@ -2353,19 +2346,16 @@ EXC_VIRT_NONE(0x5800, 0x100) * and run it entirely with interrupts hard disabled. */ EXC_COMMON_BEGIN(soft_nmi_common) + mfspr r11,SPRN_SRR0 mr r10,r1 ld r1,PACAEMERGSP(r13) subir1,r1,INT_FRAME_SIZE - __ISTACK(decrementer)=0 - __GEN_COMMON_BODY decrementer + __GEN_COMMON_BODY soft_nmi bl save_nvgprs addir3,r1,STACK_FRAME_OVERHEAD bl soft_nmi_interrupt b ret_from_except -#else /* CONFIG_PPC_WATCHDOG */ -#define MASKED_DEC_HANDLER_LABEL 2f /* normal return */ -#define MASKED_DEC_HANDLER(_H) #endif /* CONFIG_PPC_WATCHDOG */ /* @@ -2384,7 +2374,6 @@ masked_Hinterrupt: .else masked_interrupt: .
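As a reading aid, the soft-mask test that this patch moves into __GEN_COMMON_BODY can be modelled in C roughly as below. This is a sketch under assumptions: the enum is an illustrative stand-in for the PACA_IRQ_* flags (its values are not the kernel's), and both function names are invented here.

```c
#include <assert.h>

/* Illustrative stand-ins for the PACA_IRQ_* bits; the values are assumptions. */
enum paca_irq {
	IRQ_NONE = 0,	/* the macro uses .abort "Bad maskable vector" here */
	IRQ_EE,
	IRQ_DEC,
	IRQ_DBELL,
	IRQ_HMI,
	IRQ_PMI,
};

/* Associate vector numbers with bits in paca->irq_happened. */
enum paca_irq vec_to_irq(unsigned int vec)
{
	switch (vec) {
	case 0x500:
	case 0xea0:
		return IRQ_EE;
	case 0x900:
		return IRQ_DEC;
	case 0xa00:
	case 0xe80:
		return IRQ_DBELL;
	case 0xe60:
		return IRQ_HMI;
	case 0xf00:
		return IRQ_PMI;
	default:
		return IRQ_NONE;
	}
}

/*
 * Models "lbz r10,PACAIRQSOFTMASK(r13); andi. r10,r10,IMASK; bne masked_*":
 * the interrupt is diverted to the masked handler when any IMASK bit is
 * set in the soft-mask byte.
 */
int is_soft_masked(unsigned char irq_soft_mask, unsigned char imask)
{
	return (irq_soft_mask & imask) != 0;
}
```

The vector-to-bit association is resolved at assembly time in the real macro; the switch only shows which vectors share a pending bit.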
[PATCH v3 14/32] powerpc/64s/exception: remove the SPR saving patch code macros
These are used infrequently enough they don't provide much help, so inline them. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 82 ++-- 1 file changed, 28 insertions(+), 54 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index f4f35d01fe00..feb563416abd 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -110,46 +110,6 @@ name: #define EXC_HV 1 #define EXC_STD0 -/* - * PPR save/restore macros used in exceptions-64s.S - * Used for P7 or later processors - */ -#define SAVE_PPR(area, ra) \ -BEGIN_FTR_SECTION_NESTED(940) \ - ld ra,area+EX_PPR(r13);/* Read PPR from paca */\ - std ra,_PPR(r1);\ -END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,940) - -#define RESTORE_PPR_PACA(area, ra) \ -BEGIN_FTR_SECTION_NESTED(941) \ - ld ra,area+EX_PPR(r13);\ - mtspr SPRN_PPR,ra;\ -END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,941) - -/* - * Get an SPR into a register if the CPU has the given feature - */ -#define OPT_GET_SPR(ra, spr, ftr) \ -BEGIN_FTR_SECTION_NESTED(943) \ - mfspr ra,spr; \ -END_FTR_SECTION_NESTED(ftr,ftr,943) - -/* - * Set an SPR from a register if the CPU has the given feature - */ -#define OPT_SET_SPR(ra, spr, ftr) \ -BEGIN_FTR_SECTION_NESTED(943) \ - mtspr spr,ra; \ -END_FTR_SECTION_NESTED(ftr,ftr,943) - -/* - * Save a register to the PACA if the CPU has the given feature - */ -#define OPT_SAVE_REG_TO_PACA(offset, ra, ftr) \ -BEGIN_FTR_SECTION_NESTED(943) \ - std ra,offset(r13); \ -END_FTR_SECTION_NESTED(ftr,ftr,943) - /* * Branch to label using its 0xC000 address. 
This results in instruction * address suitable for MSR[IR]=0 or 1, which allows relocation to be turned @@ -278,18 +238,18 @@ do_define_int n cmpwi r10,KVM_GUEST_MODE_SKIP beq 89f .else -BEGIN_FTR_SECTION_NESTED(947) +BEGIN_FTR_SECTION ld r10,IAREA+EX_CFAR(r13) std r10,HSTATE_CFAR(r13) -END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947) +END_FTR_SECTION_IFSET(CPU_FTR_CFAR) .endif ld r10,PACA_EXGEN+EX_CTR(r13) mtctr r10 -BEGIN_FTR_SECTION_NESTED(948) +BEGIN_FTR_SECTION ld r10,IAREA+EX_PPR(r13) std r10,HSTATE_PPR(r13) -END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) +END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) ld r11,IAREA+EX_R11(r13) ld r12,IAREA+EX_R12(r13) std r12,HSTATE_SCRATCH0(r13) @@ -386,10 +346,14 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) SET_SCRATCH0(r13) /* save r13 */ GET_PACA(r13) std r9,IAREA+EX_R9(r13) /* save r9 */ - OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR) +BEGIN_FTR_SECTION + mfspr r9,SPRN_PPR +END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) HMT_MEDIUM std r10,IAREA+EX_R10(r13) /* save r10 - r12 */ - OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR) +BEGIN_FTR_SECTION + mfspr r10,SPRN_CFAR +END_FTR_SECTION_IFSET(CPU_FTR_CFAR) .if \ool .if !\virt b tramp_real_\name @@ -402,8 +366,12 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) .endif .endif - OPT_SAVE_REG_TO_PACA(IAREA+EX_PPR, r9, CPU_FTR_HAS_PPR) - OPT_SAVE_REG_TO_PACA(IAREA+EX_CFAR, r10, CPU_FTR_CFAR) +BEGIN_FTR_SECTION + std r9,IAREA+EX_PPR(r13) +END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) +BEGIN_FTR_SECTION + std r10,IAREA+EX_CFAR(r13) +END_FTR_SECTION_IFSET(CPU_FTR_CFAR) INTERRUPT_TO_KERNEL mfctr r10 std r10,IAREA+EX_CTR(r13) @@ -558,7 +526,10 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt) .endif beq 101f/* if from kernel mode */ ACCOUNT_CPU_USER_ENTRY(r13, r9, r10) - SAVE_PPR(IAREA, r9) +BEGIN_FTR_SECTION + ld r9,IAREA+EX_PPR(r13)/* Read PPR from paca */ + std r9,_PPR(r1) +END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) 101: .else .if IKUAP @@ -598,10 +569,10 @@ 
DEFINE_FIXED_SYMBOL(\name\()_common_virt) std r10,_DSISR(r1) .endif -BEGIN_FT
[PATCH v3 13/32] powerpc/64s/exception: remove confusing IEARLY option
Replace IEARLY=1 and IEARLY=2 with IBRANCH_COMMON, which controls if the entry code branches to a common handler; and IREALMODE_COMMON, which controls whether the common handler should remain in real mode. These special cases no longer avoid loading the SRR registers, there is no point as most of them load the registers immediately anyway. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 48 ++-- 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index eb2f6ee4d652..f4f35d01fe00 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -174,7 +174,8 @@ END_FTR_SECTION_NESTED(ftr,ftr,943) #define IDAR .L_IDAR_\name\() #define IDSISR .L_IDSISR_\name\() #define ISET_RI.L_ISET_RI_\name\() -#define IEARLY .L_IEARLY_\name\() +#define IBRANCH_TO_COMMON .L_IBRANCH_TO_COMMON_\name\() +#define IREALMODE_COMMON .L_IREALMODE_COMMON_\name\() #define IMASK .L_IMASK_\name\() #define IKVM_SKIP .L_IKVM_SKIP_\name\() #define IKVM_REAL .L_IKVM_REAL_\name\() @@ -218,8 +219,15 @@ do_define_int n .ifndef ISET_RI ISET_RI=1 .endif - .ifndef IEARLY - IEARLY=0 + .ifndef IBRANCH_TO_COMMON + IBRANCH_TO_COMMON=1 + .endif + .ifndef IREALMODE_COMMON + IREALMODE_COMMON=0 + .else + .if ! 
IBRANCH_TO_COMMON + .error "IREALMODE_COMMON=1 but IBRANCH_TO_COMMON=0" + .endif .endif .ifndef IMASK IMASK=0 @@ -353,6 +361,11 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) */ .macro GEN_BRANCH_TO_COMMON name, virt + .if IREALMODE_COMMON + LOAD_HANDLER(r10, \name\()_common) + mtctr r10 + bctr + .else .if \virt #ifndef CONFIG_RELOCATABLE b \name\()_common_virt @@ -366,6 +379,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) mtctr r10 bctr .endif + .endif .endm .macro GEN_INT_ENTRY name, virt, ool=0 @@ -421,11 +435,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) stw r10,IAREA+EX_DSISR(r13) .endif - .if IEARLY == 2 - /* nothing more */ - .elseif IEARLY - BRANCH_TO_C000(r11, \name\()_common) - .else .if IHSRR == EXC_HV_OR_STD BEGIN_FTR_SECTION mfspr r11,SPRN_HSRR0 /* save HSRR0 */ @@ -441,6 +450,8 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) mfspr r11,SPRN_SRR0 /* save SRR0 */ mfspr r12,SPRN_SRR1 /* and SRR1 */ .endif + + .if IBRANCH_TO_COMMON GEN_BRANCH_TO_COMMON \name \virt .endif @@ -926,6 +937,7 @@ INT_DEFINE_BEGIN(machine_check_early) IVEC=0x200 IAREA=PACA_EXMC IVIRT=0 /* no virt entry point */ + IREALMODE_COMMON=1 /* * MSR_RI is not enabled, because PACA_EXMC is being used, so a * nested machine check corrupts it. machine_check_common enables @@ -933,7 +945,6 @@ INT_DEFINE_BEGIN(machine_check_early) */ ISET_RI=0 ISTACK=0 - IEARLY=1 IDAR=1 IDSISR=1 IRECONCILE=0 @@ -973,9 +984,6 @@ TRAMP_REAL_BEGIN(machine_check_fwnmi) EXCEPTION_RESTORE_REGS EXC_STD EXC_COMMON_BEGIN(machine_check_early_common) - mfspr r11,SPRN_SRR0 - mfspr r12,SPRN_SRR1 - /* * Switch to mc_emergency stack and handle re-entrancy (we limit * the nested MCE upto level 4 to avoid stack overflow). 
@@ -1809,7 +1817,7 @@ EXC_COMMON_BEGIN(emulation_assist_common) INT_DEFINE_BEGIN(hmi_exception_early) IVEC=0xe60 IHSRR=EXC_HV - IEARLY=1 + IREALMODE_COMMON=1 ISTACK=0 IRECONCILE=0 IKUAP=0 /* We don't touch AMR here, we never go to virtual mode */ @@ -1829,8 +1837,6 @@ EXC_REAL_END(hmi_exception, 0xe60, 0x20) EXC_VIRT_NONE(0x4e60, 0x20) EXC_COMMON_BEGIN(hmi_exception_early_common) - mfspr r11,SPRN_HSRR0 /* Save HSRR0 */ - mfspr r12,SPRN_HSRR1 /* Save HSRR1 */ mr r10,r1 /* Save r1 */ ld r1,PACAEMERGSP(r13) /* Use emergency stack for realmode */ subir1,r1,INT_FRAME_SIZE/* alloc stack frame*/ @@ -2156,29 +2162,23 @@ EXC_VIRT_NONE(0x5400, 0x100) INT_DEFINE_BEGIN(denorm_exception) IVEC=0x1500 IHSRR=EXC_HV - IEARLY=2 + IBRANCH_TO_COMMON=0 IKVM_REAL=1 INT_DEFINE_END(denorm_exception) EXC_REAL_BEGIN(denorm_exception, 0x1500, 0x100) GEN_INT_ENTRY denorm_exception, virt=0 #ifdef CONFIG_PPC_DENORMALISATION - mfspr r10,SPRN_HSRR1 - andis. r10,r10,(HSRR1_DENORM)@h /* denorm
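The defaulting and the .error check added to do_define_int for the two new options can be sketched in C as follows. This is a model for illustration only: OPT_UNSET stands in for "symbol not defined", and the struct and function names are hypothetical. The from_iearly() helper reflects how the handlers in this patch were converted (IEARLY=1 became IREALMODE_COMMON=1, IEARLY=2 became IBRANCH_TO_COMMON=0).

```c
#include <assert.h>

#define OPT_UNSET (-1)	/* stands in for "symbol not defined" (.ifndef) */

struct branch_opts {
	int branch_to_common;	/* IBRANCH_TO_COMMON */
	int realmode_common;	/* IREALMODE_COMMON */
	int valid;		/* 0 where the macro would hit .error */
};

/* Mirrors the .ifndef / .else / .error sequence in do_define_int. */
struct branch_opts apply_defaults(int branch_to_common, int realmode_common)
{
	struct branch_opts o = { branch_to_common, realmode_common, 1 };

	if (o.branch_to_common == OPT_UNSET)
		o.branch_to_common = 1;
	if (o.realmode_common == OPT_UNSET)
		o.realmode_common = 0;
	else if (!o.branch_to_common)
		o.valid = 0;	/* "IREALMODE_COMMON=1 but IBRANCH_TO_COMMON=0" */
	return o;
}

/* How the old IEARLY values map onto the new options in this patch. */
struct branch_opts from_iearly(int iearly)
{
	assert(iearly >= 0 && iearly <= 2);

	switch (iearly) {
	case 1:		/* e.g. machine_check_early, hmi_exception_early */
		return apply_defaults(OPT_UNSET, 1);
	case 2:		/* e.g. denorm_exception */
		return apply_defaults(0, OPT_UNSET);
	default:
		return apply_defaults(OPT_UNSET, OPT_UNSET);
	}
}
```

Note that, as in the macro, the check fires whenever IREALMODE_COMMON is defined at all while IBRANCH_TO_COMMON is 0, not only when it is defined as 1.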
[PATCH v3 12/32] powerpc/64s/exception: move KVM test to common code
This allows more code to be moved out of unrelocated regions. The system call KVMTEST is changed to be open-coded and remain in the tramp area to avoid having to move it to entry_64.S. The custom nature of the system call entry code means the hcall case can be made more streamlined than regular interrupt handlers. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S| 239 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 11 -- arch/powerpc/kvm/book3s_segment.S | 7 - 3 files changed, 119 insertions(+), 138 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index a23f2450f9ed..eb2f6ee4d652 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -44,7 +44,6 @@ * EXC_VIRT_BEGIN/END - virt (AIL), unrelocated exception vectors * TRAMP_REAL_BEGIN- real, unrelocated helpers (virt may call these) * TRAMP_VIRT_BEGIN- virt, unreloc helpers (in practice, real can use) - * TRAMP_KVM_BEGIN - KVM handlers, these are put into real, unrelocated * EXC_COMMON - After switching to virtual, relocated mode. 
*/ @@ -74,13 +73,6 @@ name: #define TRAMP_VIRT_BEGIN(name) \ FIXED_SECTION_ENTRY_BEGIN(virt_trampolines, name) -#ifdef CONFIG_KVM_BOOK3S_64_HANDLER -#define TRAMP_KVM_BEGIN(name) \ - TRAMP_VIRT_BEGIN(name) -#else -#define TRAMP_KVM_BEGIN(name) -#endif - #define EXC_REAL_NONE(start, size) \ FIXED_SECTION_ENTRY_BEGIN_LOCATION(real_vectors, exc_real_##start##_##unused, start, size); \ FIXED_SECTION_ENTRY_END_LOCATION(real_vectors, exc_real_##start##_##unused, start, size) @@ -271,6 +263,9 @@ do_define_int n .endm .macro GEN_KVM name + .balign IFETCH_ALIGN_BYTES +\name\()_kvm: + .if IKVM_SKIP cmpwi r10,KVM_GUEST_MODE_SKIP beq 89f @@ -281,13 +276,18 @@ BEGIN_FTR_SECTION_NESTED(947) END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947) .endif + ld r10,PACA_EXGEN+EX_CTR(r13) + mtctr r10 BEGIN_FTR_SECTION_NESTED(948) ld r10,IAREA+EX_PPR(r13) std r10,HSTATE_PPR(r13) END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) - ld r10,IAREA+EX_R10(r13) + ld r11,IAREA+EX_R11(r13) + ld r12,IAREA+EX_R12(r13) std r12,HSTATE_SCRATCH0(r13) sldir12,r9,32 + ld r9,IAREA+EX_R9(r13) + ld r10,IAREA+EX_R10(r13) /* HSRR variants have the 0x2 bit added to their trap number */ .if IHSRR == EXC_HV_OR_STD BEGIN_FTR_SECTION @@ -300,29 +300,16 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) .else ori r12,r12,(IVEC) .endif - -#ifdef CONFIG_RELOCATABLE - /* -* KVM requires __LOAD_FAR_HANDLER beause kvmppc_interrupt lives -* outside the head section. CONFIG_RELOCATABLE KVM expects CTR -* to be saved in HSTATE_SCRATCH1. 
-*/ - ld r9,IAREA+EX_CTR(r13) - std r9,HSTATE_SCRATCH1(r13) - __LOAD_FAR_HANDLER(r9, kvmppc_interrupt) - mtctr r9 - ld r9,IAREA+EX_R9(r13) - bctr -#else - ld r9,IAREA+EX_R9(r13) b kvmppc_interrupt -#endif - .if IKVM_SKIP 89:mtocrf 0x80,r9 + ld r10,PACA_EXGEN+EX_CTR(r13) + mtctr r10 ld r9,IAREA+EX_R9(r13) ld r10,IAREA+EX_R10(r13) + ld r11,IAREA+EX_R11(r13) + ld r12,IAREA+EX_R12(r13) .if IHSRR == EXC_HV_OR_STD BEGIN_FTR_SECTION b kvmppc_skip_Hinterrupt @@ -407,11 +394,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) mfctr r10 std r10,IAREA+EX_CTR(r13) mfcrr9 - - .if (!\virt && IKVM_REAL) || (\virt && IKVM_VIRT) - KVMTEST \name IHSRR IVEC - .endif - std r11,IAREA+EX_R11(r13) std r12,IAREA+EX_R12(r13) @@ -475,6 +457,10 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948) .macro __GEN_COMMON_ENTRY name DEFINE_FIXED_SYMBOL(\name\()_common_real) \name\()_common_real: + .if IKVM_REAL + KVMTEST \name IHSRR IVEC + .endif + ld r10,PACAKMSR(r13) /* get MSR value for kernel */ /* MSR[RI] is clear iff using SRR regs */ .if IHSRR == EXC_HV_OR_STD @@ -487,9 +473,17 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real) mtmsrd r10 .if IVIRT + .if IKVM_VIRT + b 1f /* skip the virt test coming from real */ + .endif + .balign IFETCH_ALIGN_BYTES DEFINE_FIXED_SYMBOL(\name\()_common_virt) \name\()_common_virt: + .if IKVM_VIRT + KVMTEST \name IHSRR IVEC +1: + .endif .endif /* IVIRT */ .endm @@ -848,8 +842,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE |
[PATCH v3 15/32] powerpc/64s/exception: trim unused arguments from KVMTEST macro
Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/exceptions-64s.S | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index feb563416abd..7e056488d42a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -224,7 +224,7 @@ do_define_int n
 #define kvmppc_interrupt kvmppc_interrupt_pr
 #endif
 
-.macro KVMTEST name, hsrr, n
+.macro KVMTEST name
 	lbz	r10,HSTATE_IN_GUEST(r13)
 	cmpwi	r10,0
 	bne	\name\()_kvm
@@ -293,7 +293,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 .endm
 
 #else
-.macro KVMTEST name, hsrr, n
+.macro KVMTEST name
 .endm
 .macro GEN_KVM name
 .endm
@@ -437,7 +437,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 DEFINE_FIXED_SYMBOL(\name\()_common_real)
 \name\()_common_real:
 	.if IKVM_REAL
-		KVMTEST \name IHSRR IVEC
+		KVMTEST \name
 	.endif
 
 	ld	r10,PACAKMSR(r13)		/* get MSR value for kernel */
@@ -460,7 +460,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
 DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 \name\()_common_virt:
 	.if IKVM_VIRT
-		KVMTEST \name IHSRR IVEC
+		KVMTEST \name
 1:
 	.endif
 	.endif /* IVIRT */
@@ -1582,7 +1582,7 @@ INT_DEFINE_END(system_call)
 	GET_PACA(r13)
 	std	r10,PACA_EXGEN+EX_R10(r13)
 	INTERRUPT_TO_KERNEL
-	KVMTEST system_call EXC_STD 0xc00 /* uses r10, branch to system_call_kvm */
+	KVMTEST system_call /* uses r10, branch to system_call_kvm */
 	mfctr	r9
 #else
 	mr	r9,r13
-- 
2.23.0
[PATCH v3 16/32] powerpc/64s/exception: hdecrementer avoid touching the stack
The hdec interrupt handler is reported to sometimes fire in Linux if
KVM leaves it pending after a guest exits. This is harmless, so there
is a no-op handler for it.

The interrupt handler currently uses the regular kernel stack. Change
this to avoid touching the stack entirely.

This should be the last place where the regular Linux stack can be
accessed with asynchronous interrupts (including PMI) soft-masked.
It might be possible to take advantage of this invariant, e.g., to
context switch the kernel stack SLB entry without clearing MSR[EE].

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/include/asm/time.h      |  1 -
 arch/powerpc/kernel/exceptions-64s.S | 25 ++++++++++++++++++++-----
 arch/powerpc/kernel/time.c           |  9 ---------
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 08dbe3e6831c..e0107495c4de 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -24,7 +24,6 @@ extern struct clock_event_device decrementer_clockevent;
 
 
 extern void generic_calibrate_decr(void);
-extern void hdec_interrupt(struct pt_regs *regs);
 
 /* Some sane defaults: 125 MHz timebase, 1GHz processor */
 extern unsigned long ppc_proc_freq;
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 7e056488d42a..f87dc4bf937d 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1491,6 +1491,8 @@ EXC_COMMON_BEGIN(decrementer_common)
 INT_DEFINE_BEGIN(hdecrementer)
 	IVEC=0x980
 	IHSRR=EXC_HV
+	ISTACK=0
+	IRECONCILE=0
 	IKVM_REAL=1
 	IKVM_VIRT=1
 INT_DEFINE_END(hdecrementer)
@@ -1502,11 +1504,24 @@ EXC_VIRT_BEGIN(hdecrementer, 0x4980, 0x80)
 	GEN_INT_ENTRY hdecrementer, virt=1
 EXC_VIRT_END(hdecrementer, 0x4980, 0x80)
 EXC_COMMON_BEGIN(hdecrementer_common)
-	GEN_COMMON hdecrementer
-	bl	save_nvgprs
-	addi	r3,r1,STACK_FRAME_OVERHEAD
-	bl	hdec_interrupt
-	b	ret_from_except
+	__GEN_COMMON_ENTRY hdecrementer
+	/*
+	 * Hypervisor decrementer interrupts not caught by the KVM test
+	 * shouldn't occur but are sometimes left pending on exit from a KVM
+	 * guest. We don't need to do anything to clear them, as they are
+	 * edge-triggered.
+	 *
+	 * Be careful to avoid touching the kernel stack.
+	 */
+	ld	r10,PACA_EXGEN+EX_CTR(r13)
+	mtctr	r10
+	mtcrf	0x80,r9
+	ld	r9,PACA_EXGEN+EX_R9(r13)
+	ld	r10,PACA_EXGEN+EX_R10(r13)
+	ld	r11,PACA_EXGEN+EX_R11(r13)
+	ld	r12,PACA_EXGEN+EX_R12(r13)
+	ld	r13,PACA_EXGEN+EX_R13(r13)
+	HRFI_TO_KERNEL
 
 	GEN_KVM hdecrementer
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 1168e8b37e30..bda9cb4a0a5f 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -663,15 +663,6 @@ void timer_broadcast_interrupt(void)
 }
 #endif
 
-/*
- * Hypervisor decrementer interrupts shouldn't occur but are sometimes
- * left pending on exit from a KVM guest. We don't need to do anything
- * to clear them, as they are edge-triggered.
- */
-void hdec_interrupt(struct pt_regs *regs)
-{
-}
-
 #ifdef CONFIG_SUSPEND
 static void generic_suspend_disable_irqs(void)
 {
-- 
2.23.0
[PATCH v3 17/32] powerpc/64s/exception: re-inline some handlers
The reduction in interrupt entry size allows some handlers to be
re-inlined.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/exceptions-64s.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f87dc4bf937d..ae0e68899f0e 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1186,7 +1186,7 @@ INT_DEFINE_BEGIN(data_access)
 INT_DEFINE_END(data_access)
 
 EXC_REAL_BEGIN(data_access, 0x300, 0x80)
-	GEN_INT_ENTRY data_access, virt=0, ool=1
+	GEN_INT_ENTRY data_access, virt=0
 EXC_REAL_END(data_access, 0x300, 0x80)
 EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
 	GEN_INT_ENTRY data_access, virt=1
@@ -1216,7 +1216,7 @@ INT_DEFINE_BEGIN(data_access_slb)
 INT_DEFINE_END(data_access_slb)
 
 EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
-	GEN_INT_ENTRY data_access_slb, virt=0, ool=1
+	GEN_INT_ENTRY data_access_slb, virt=0
 EXC_REAL_END(data_access_slb, 0x380, 0x80)
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
 	GEN_INT_ENTRY data_access_slb, virt=1
@@ -1472,7 +1472,7 @@ INT_DEFINE_BEGIN(decrementer)
 INT_DEFINE_END(decrementer)
 
 EXC_REAL_BEGIN(decrementer, 0x900, 0x80)
-	GEN_INT_ENTRY decrementer, virt=0, ool=1
+	GEN_INT_ENTRY decrementer, virt=0
 EXC_REAL_END(decrementer, 0x900, 0x80)
 EXC_VIRT_BEGIN(decrementer, 0x4900, 0x80)
 	GEN_INT_ENTRY decrementer, virt=1
-- 
2.23.0
[PATCH v3 18/32] powerpc/64s/exception: Clean up SRR specifiers
Remove more magic numbers and replace with nicely named bools. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 68 +--- 1 file changed, 32 insertions(+), 36 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index ae0e68899f0e..b01ff51892dc 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -105,11 +105,6 @@ name: ori reg,reg,(ABS_ADDR(label))@l;\ addis reg,reg,(ABS_ADDR(label))@h -/* Exception register prefixes */ -#define EXC_HV_OR_STD 2 /* depends on HVMODE */ -#define EXC_HV 1 -#define EXC_STD0 - /* * Branch to label using its 0xC000 address. This results in instruction * address suitable for MSR[IR]=0 or 1, which allows relocation to be turned @@ -128,6 +123,7 @@ name: */ #define IVEC .L_IVEC_\name\() #define IHSRR .L_IHSRR_\name\() +#define IHSRR_IF_HVMODE.L_IHSRR_IF_HVMODE_\name\() #define IAREA .L_IAREA_\name\() #define IVIRT .L_IVIRT_\name\() #define IISIDE .L_IISIDE_\name\() @@ -159,7 +155,10 @@ do_define_int n .error "IVEC not defined" .endif .ifndef IHSRR - IHSRR=EXC_STD + IHSRR=0 + .endif + .ifndef IHSRR_IF_HVMODE + IHSRR_IF_HVMODE=0 .endif .ifndef IAREA IAREA=PACA_EXGEN @@ -257,7 +256,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) ld r9,IAREA+EX_R9(r13) ld r10,IAREA+EX_R10(r13) /* HSRR variants have the 0x2 bit added to their trap number */ - .if IHSRR == EXC_HV_OR_STD + .if IHSRR_IF_HVMODE BEGIN_FTR_SECTION ori r12,r12,(IVEC + 0x2) FTR_SECTION_ELSE @@ -278,7 +277,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) ld r10,IAREA+EX_R10(r13) ld r11,IAREA+EX_R11(r13) ld r12,IAREA+EX_R12(r13) - .if IHSRR == EXC_HV_OR_STD + .if IHSRR_IF_HVMODE BEGIN_FTR_SECTION b kvmppc_skip_Hinterrupt FTR_SECTION_ELSE @@ -403,7 +402,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR) stw r10,IAREA+EX_DSISR(r13) .endif - .if IHSRR == EXC_HV_OR_STD + .if IHSRR_IF_HVMODE BEGIN_FTR_SECTION mfspr r11,SPRN_HSRR0 /* save HSRR0 */ mfspr r12,SPRN_HSRR1 /* and HSRR1 */ @@ -485,7 
+484,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt) .abort "Bad maskable vector" .endif - .if IHSRR == EXC_HV_OR_STD + .if IHSRR_IF_HVMODE BEGIN_FTR_SECTION bne masked_Hinterrupt FTR_SECTION_ELSE @@ -618,12 +617,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR) * Restore all registers including H/SRR0/1 saved in a stack frame of a * standard exception. */ -.macro EXCEPTION_RESTORE_REGS hsrr +.macro EXCEPTION_RESTORE_REGS hsrr=0 /* Move original SRR0 and SRR1 into the respective regs */ ld r9,_MSR(r1) - .if \hsrr == EXC_HV_OR_STD - .error "EXC_HV_OR_STD Not implemented for EXCEPTION_RESTORE_REGS" - .endif .if \hsrr mtspr SPRN_HSRR1,r9 .else @@ -898,7 +894,7 @@ EXC_COMMON_BEGIN(system_reset_common) ld r10,SOFTE(r1) stb r10,PACAIRQSOFTMASK(r13) - EXCEPTION_RESTORE_REGS EXC_STD + EXCEPTION_RESTORE_REGS RFI_TO_USER_OR_KERNEL GEN_KVM system_reset @@ -952,7 +948,7 @@ TRAMP_REAL_BEGIN(machine_check_fwnmi) lhz r12,PACA_IN_MCE(r13); \ subir12,r12,1; \ sth r12,PACA_IN_MCE(r13); \ - EXCEPTION_RESTORE_REGS EXC_STD + EXCEPTION_RESTORE_REGS EXC_COMMON_BEGIN(machine_check_early_common) /* @@ -1321,7 +1317,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) INT_DEFINE_BEGIN(hardware_interrupt) IVEC=0x500 - IHSRR=EXC_HV_OR_STD + IHSRR_IF_HVMODE=1 IMASK=IRQS_DISABLED IKVM_REAL=1 IKVM_VIRT=1 @@ -1490,7 +1486,7 @@ EXC_COMMON_BEGIN(decrementer_common) INT_DEFINE_BEGIN(hdecrementer) IVEC=0x980 - IHSRR=EXC_HV + IHSRR=1 ISTACK=0 IRECONCILE=0 IKVM_REAL=1 @@ -1719,7 +1715,7 @@ EXC_COMMON_BEGIN(single_step_common) INT_DEFINE_BEGIN(h_data_storage) IVEC=0xe00 - IHSRR=EXC_HV + IHSRR=1 IDAR=1 IDSISR=1 IKVM_SKIP=1 @@ -1751,7 +1747,7 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX) INT_DEFINE_BEGIN(h_instr_storage) IVEC=0xe20 - IHSRR=EXC_HV + IHSRR=1 IKVM_REAL=1 IKVM_VIRT=1 INT_DEFINE_END(h_instr_storage) @@ -1774,7 +1770,7 @@ EXC_COMMON_BEGIN(h_instr_storage_common) INT_DEFINE_BEGIN(emulation_assist) IVEC=0xe40 - IHSR
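To make the two booleans that replace EXC_STD / EXC_HV / EXC_HV_OR_STD concrete, here is a small C model of the choice the macros make. It is a sketch with assumptions: cpu_hvmode collapses the feature-section test (CPU_FTR_HVMODE | CPU_FTR_ARCH_206, both set) into one flag, and the enum and function names are invented for this example.

```c
#include <assert.h>

enum srr_kind { USE_SRR, USE_HSRR };

/*
 * IHSRR_IF_HVMODE=1 -> HSRR registers when the CPU runs in HV mode, else SRR
 *                      (patched at boot via the feature section).
 * IHSRR=1           -> always HSRR.
 * neither           -> always SRR.
 */
enum srr_kind pick_srr(int ihsrr, int ihsrr_if_hvmode, int cpu_hvmode)
{
	if (ihsrr_if_hvmode)
		return cpu_hvmode ? USE_HSRR : USE_SRR;
	return ihsrr ? USE_HSRR : USE_SRR;
}

/* HSRR variants have the 0x2 bit added to their trap number (GEN_KVM). */
unsigned int kvm_trap_number(unsigned int ivec, enum srr_kind kind)
{
	return ivec + (kind == USE_HSRR ? 0x2 : 0);
}
```

For example, hardware_interrupt (IVEC=0x500, IHSRR_IF_HVMODE=1) would report trap 0x502 on an HV-mode CPU under this model, while hdecrementer (IHSRR=1) always uses the HSRR registers.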
[PATCH v3 19/32] powerpc/64s/exception: add more comments for interrupt handlers
A few of the non-standard handlers are left uncommented. Some more description could be added to some. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 391 --- 1 file changed, 353 insertions(+), 38 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index b01ff51892dc..e976cbf4f4aa 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -121,26 +121,26 @@ name: /* * Interrupt code generation macros */ -#define IVEC .L_IVEC_\name\() -#define IHSRR .L_IHSRR_\name\() -#define IHSRR_IF_HVMODE.L_IHSRR_IF_HVMODE_\name\() -#define IAREA .L_IAREA_\name\() -#define IVIRT .L_IVIRT_\name\() -#define IISIDE .L_IISIDE_\name\() -#define IDAR .L_IDAR_\name\() -#define IDSISR .L_IDSISR_\name\() -#define ISET_RI.L_ISET_RI_\name\() -#define IBRANCH_TO_COMMON .L_IBRANCH_TO_COMMON_\name\() -#define IREALMODE_COMMON .L_IREALMODE_COMMON_\name\() -#define IMASK .L_IMASK_\name\() -#define IKVM_SKIP .L_IKVM_SKIP_\name\() -#define IKVM_REAL .L_IKVM_REAL_\name\() +#define IVEC .L_IVEC_\name\()/* Interrupt vector address */ +#define IHSRR .L_IHSRR_\name\() /* Sets SRR or HSRR registers */ +#define IHSRR_IF_HVMODE.L_IHSRR_IF_HVMODE_\name\() /* HSRR if HV else SRR */ +#define IAREA .L_IAREA_\name\() /* PACA save area */ +#define IVIRT .L_IVIRT_\name\() /* Has virt mode entry point */ +#define IISIDE .L_IISIDE_\name\() /* Uses SRR0/1 not DAR/DSISR */ +#define IDAR .L_IDAR_\name\()/* Uses DAR (or SRR0) */ +#define IDSISR .L_IDSISR_\name\() /* Uses DSISR (or SRR1) */ +#define ISET_RI.L_ISET_RI_\name\() /* Run common code w/ MSR[RI]=1 */ +#define IBRANCH_TO_COMMON .L_IBRANCH_TO_COMMON_\name\() /* ENTRY branch to common */ +#define IREALMODE_COMMON .L_IREALMODE_COMMON_\name\() /* Common runs in realmode */ +#define IMASK .L_IMASK_\name\() /* IRQ soft-mask bit */ +#define IKVM_SKIP .L_IKVM_SKIP_\name\() /* Generate KVM skip handler */ +#define IKVM_REAL .L_IKVM_REAL_\name\() /* Real 
entry tests KVM */ #define __IKVM_REAL(name) .L_IKVM_REAL_ ## name -#define IKVM_VIRT .L_IKVM_VIRT_\name\() -#define ISTACK .L_ISTACK_\name\() +#define IKVM_VIRT .L_IKVM_VIRT_\name\() /* Virt entry tests KVM */ +#define ISTACK .L_ISTACK_\name\() /* Set regular kernel stack */ #define __ISTACK(name) .L_ISTACK_ ## name -#define IRECONCILE .L_IRECONCILE_\name\() -#define IKUAP .L_IKUAP_\name\() +#define IRECONCILE .L_IRECONCILE_\name\() /* Do RECONCILE_IRQ_STATE */ +#define IKUAP .L_IKUAP_\name\() /* Do KUAP lock */ #define INT_DEFINE_BEGIN(n)\ .macro int_define_ ## n name @@ -759,6 +759,39 @@ __start_interrupts: EXC_VIRT_NONE(0x4000, 0x100) +/** + * Interrupt 0x100 - System Reset Interrupt (SRESET aka NMI). + * This is a non-maskable, asynchronous interrupt always taken in real-mode. + * It is caused by: + * - Wake from power-saving state, on powernv. + * - An NMI from another CPU, triggered by firmware or hypercall. + * - As crash/debug signal injected from BMC, firmware or hypervisor. + * + * Handling: + * Power-save wakeup is the only performance critical path, so this is + * determined quickly as possible first. In this case volatile registers + * can be discarded and SPRs like CFAR don't need to be read. + * + * If not a powersave wakeup, then it's run as a regular interrupt, however + * it uses its own stack and PACA save area to preserve the regular kernel + * environment for debugging. + * + * This interrupt is not maskable, so triggering it when MSR[RI] is clear, + * or SCRATCH0 is in use, etc. may cause a crash. It's also not entirely + * correct to switch to virtual mode to run the regular interrupt handler + * because it might be interrupted when the MMU is in a bad state (e.g., SLB + * is clear). + * + * FWNMI: + * PAPR specifies a "fwnmi" facility which sends the sreset to a different + * entry point with a different register set up. Some hypervisors will + * send the sreset to 0x100 in the guest if it is not fwnmi capable. 
+ * + * KVM: + * Unlike most SRR interrupts, this may be taken by the host while executing + * in a guest, so a KVM test is required. KVM will pull the CPU out of guest + * mode and then raise the sreset. + */ INT_DEFINE_BEGIN(system_reset) IVEC=0x100 IAREA=PACA_EXNMI @@ -834,6 +867,7 @@ TRAMP_REAL_BEGIN(system_reset_idle_wake) * Vectors for the FWNMI option. Share common code. */ TRAMP_REAL_BEGIN(system_reset_fwnmi) + /* XXX: fwnmi guest could run a nested/PR guest, so why no test? */ __IKVM_REAL(system_reset)=0 GEN_INT_ENTR
[PATCH v3 20/32] powerpc/64s/exception: only test KVM in SRR interrupts when PR KVM is supported
Apart from SRESET, MCE, and syscall (hcall variant), the SRR type interrupts are not escalated to hypervisor mode, so delivered to the OS. When running PR KVM, the OS is the hypervisor, and the guest runs with MSR[PR]=1, so these interrupts must test if a guest was running when interrupted. These tests are required at the real-mode entry points because the PR KVM host runs with LPCR[AIL]=0. In HV KVM and nested HV KVM, the guest always receives these interrupts, so there is no need for the host to make this test. So remove the tests if PR KVM is not configured. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 65 ++-- 1 file changed, 62 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index e976cbf4f4aa..c23eb9c572b2 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -214,9 +214,36 @@ do_define_int n #ifdef CONFIG_KVM_BOOK3S_64_HANDLER #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE /* - * If hv is possible, interrupts come into to the hv version - * of the kvmppc_interrupt code, which then jumps to the PR handler, - * kvmppc_interrupt_pr, if the guest is a PR guest. + * All interrupts which set HSRR registers, as well as SRESET and MCE and + * syscall when invoked with "sc 1" switch to MSR[HV]=1 (HVMODE) to be taken, + * so they all generally need to test whether they were taken in guest context. + * + * Note: SRESET and MCE may also be sent to the guest by the hypervisor, and be + * taken with MSR[HV]=0. + * + * Interrupts which set SRR registers (with the above exceptions) do not + * elevate to MSR[HV]=1 mode, though most can be taken when running with + * MSR[HV]=1 (e.g., bare metal kernel and userspace). So these interrupts do + * not need to test whether a guest is running because they get delivered to + * the guest directly, including nested HV KVM guests. 
+ * + * The exception is PR KVM, where the guest runs with MSR[PR]=1 and the host + * runs with MSR[HV]=0, so the host takes all interrupts on behalf of the + * guest. PR KVM runs with LPCR[AIL]=0 which causes interrupts to always be + * delivered to the real-mode entry point, therefore such interrupts only test + * KVM in their real mode handlers, and only when PR KVM is possible. + * + * Interrupts that are taken in MSR[HV]=0 and escalate to MSR[HV]=1 are always + * delivered in real-mode when the MMU is in hash mode because the MMU + * registers are not set appropriately to translate host addresses. In nested + * radix mode these can be delivered in virt-mode as the host translations are + * used implicitly (see: effective LPID, effective PID). + */ + +/* + * If an interrupt is taken while a guest is running, it is immediately routed + * to KVM to handle. If both HV and PR KVM are possible, KVM interrupts go first + * to kvmppc_interrupt_hv, which handles the PR guest case. */ #define kvmppc_interrupt kvmppc_interrupt_hv #else @@ -1258,8 +1285,10 @@ INT_DEFINE_BEGIN(data_access) IVEC=0x300 IDAR=1 IDSISR=1 +#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_SKIP=1 IKVM_REAL=1 +#endif INT_DEFINE_END(data_access) EXC_REAL_BEGIN(data_access, 0x300, 0x80) @@ -1306,8 +1335,10 @@ INT_DEFINE_BEGIN(data_access_slb) IAREA=PACA_EXSLB IRECONCILE=0 IDAR=1 +#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_SKIP=1 IKVM_REAL=1 +#endif INT_DEFINE_END(data_access_slb) EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80) @@ -1357,7 +1388,9 @@ INT_DEFINE_BEGIN(instruction_access) IISIDE=1 IDAR=1 IDSISR=1 +#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_REAL=1 +#endif INT_DEFINE_END(instruction_access) EXC_REAL_BEGIN(instruction_access, 0x400, 0x80) @@ -1396,7 +1429,9 @@ INT_DEFINE_BEGIN(instruction_access_slb) IRECONCILE=0 IISIDE=1 IDAR=1 +#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_REAL=1 +#endif INT_DEFINE_END(instruction_access_slb) EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80) @@ -1488,7 +1523,9 @@ 
INT_DEFINE_BEGIN(alignment) IVEC=0x600 IDAR=1 IDSISR=1 +#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_REAL=1 +#endif INT_DEFINE_END(alignment) EXC_REAL_BEGIN(alignment, 0x600, 0x100) @@ -1518,7 +1555,9 @@ EXC_COMMON_BEGIN(alignment_common) */ INT_DEFINE_BEGIN(program_check) IVEC=0x700 +#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_REAL=1 +#endif INT_DEFINE_END(program_check) EXC_REAL_BEGIN(program_check, 0x700, 0x100) @@ -1581,7 +1620,9 @@ EXC_COMMON_BEGIN(program_check_common) INT_DEFINE_BEGIN(fp_unavailable) IVEC=0x800 IRECONCILE=0 +#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE IKVM_REAL=1 +#endif INT_DEFINE_END(fp_unavailable) EXC_REAL_BEGIN(fp_unavailable, 0x800, 0x100) @@ -1643,7 +1684,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM) INT_DEFINE_BEGIN(decrementer) IVEC=0x900 IMASK=IRQS_DISAB
[PATCH v3 21/32] powerpc/64s/exception: sreset interrupts reconcile fix
This adds IRQ_HARD_DIS to irq_happened. Although it doesn't seem to matter much because we're not allowed to enable irqs in an NMI handler, the soft-irq debugging code is becoming more strict about ensuring IRQ_HARD_DIS is in sync with MSR[EE], so this may help avoid asserts or other issues. Add a comment explaining why MCE does not have this. Early machine check is generally much smaller and more contained code which will explode if you look at it wrong anyway as it runs in real mode, though there's an argument that we should do similar reconciling for the MCE as well. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index c23eb9c572b2..6ff5ea236b17 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -920,18 +920,19 @@ EXC_COMMON_BEGIN(system_reset_common) __GEN_COMMON_BODY system_reset bl save_nvgprs /* -* Set IRQS_ALL_DISABLED unconditionally so arch_irqs_disabled does +* Set IRQS_ALL_DISABLED unconditionally so irqs_disabled() does * the right thing. We do not want to reconcile because that goes * through irq tracing which we don't want in NMI. * -* Save PACAIRQHAPPENED because some code will do a hard disable -* (e.g., xmon). So we want to restore this back to where it was -* when we return. DAR is unused in the stack, so save it there. +* Save PACAIRQHAPPENED to _DAR (otherwise unused), and set HARD_DIS +* as we are running with MSR[EE]=0. */ li r10,IRQS_ALL_DISABLED stb r10,PACAIRQSOFTMASK(r13) lbz r10,PACAIRQHAPPENED(r13) std r10,_DAR(r1) + ori r10,r10,PACA_IRQ_HARD_DIS + stb r10,PACAIRQHAPPENED(r13) addi r3,r1,STACK_FRAME_OVERHEAD bl system_reset_exception @@ -976,6 +977,11 @@ EXC_COMMON_BEGIN(system_reset_common) * error detected there), determines if it was recoverable and logs the * event. 
* + * This early code does not "reconcile" irq soft-mask state like SRESET or + * regular interrupts do, so irqs_disabled() among other things may not work + * properly (irq disable/enable already doesn't work because irq tracing can + * not work in real mode). + * * Then, depending on the execution context when the interrupt is taken, there * are 3 main actions: * - Executing in kernel mode. The event is queued with irq_work, which means -- 2.23.0
[PATCH v3 22/32] powerpc/64s/exception: soft nmi interrupt should not use ret_from_except
The soft nmi handler does not reconcile interrupt state, so it should not return via the normal ret_from_except path. Return like other NMIs, using the EXCEPTION_RESTORE_REGS macro. This becomes important when the scv interrupt is implemented, which must handle soft-masked interrupts that have r13 set to something other than the PACA -- returning to kernel in this case must restore r13. Signed-off-by: Nicholas Piggin --- v3: - save/restore irq soft mask state like other NMIs rather than a normal reconcile, to avoid soft mask warnings or possibly worse. arch/powerpc/kernel/exceptions-64s.S | 29 +++- 1 file changed, 28 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 6ff5ea236b17..5ddfc32cacad 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -2713,6 +2713,7 @@ EXC_VIRT_NONE(0x5800, 0x100) INT_DEFINE_BEGIN(soft_nmi) IVEC=0x900 ISTACK=0 + IRECONCILE=0 /* Soft-NMI may fire under local_irq_disable */ INT_DEFINE_END(soft_nmi) /* @@ -2731,9 +2732,35 @@ EXC_COMMON_BEGIN(soft_nmi_common) subi r1,r1,INT_FRAME_SIZE __GEN_COMMON_BODY soft_nmi bl save_nvgprs + + /* +* Set IRQS_ALL_DISABLED and save PACAIRQHAPPENED (see +* system_reset_common) +*/ + li r10,IRQS_ALL_DISABLED + stb r10,PACAIRQSOFTMASK(r13) + lbz r10,PACAIRQHAPPENED(r13) + std r10,_DAR(r1) + ori r10,r10,PACA_IRQ_HARD_DIS + stb r10,PACAIRQHAPPENED(r13) + addi r3,r1,STACK_FRAME_OVERHEAD bl soft_nmi_interrupt - b ret_from_except + + /* Clear MSR_RI before setting SRR0 and SRR1. */ + li r9,0 + mtmsrd r9,1 + + /* +* Restore soft mask settings. +*/ + ld r10,_DAR(r1) + stb r10,PACAIRQHAPPENED(r13) + ld r10,SOFTE(r1) + stb r10,PACAIRQSOFTMASK(r13) + + EXCEPTION_RESTORE_REGS hsrr=0 + RFI_TO_KERNEL #endif /* CONFIG_PPC_WATCHDOG */ -- 2.23.0
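For readers who do not follow the asm easily, the save/restore sequence in the soft_nmi_common hunk can be modelled in C roughly as follows. This is only a sketch: the struct layouts, the model_* names and the constant values are assumptions made for illustration, not the kernel's definitions, and the saved soft-mask value (SOFTE) is assumed to have been stored in the frame by the common entry code before this point.

```c
#include <stdint.h>

/* Illustrative stand-ins only; the kernel defines its own values. */
#define MODEL_IRQS_ALL_DISABLED 0x03u
#define MODEL_IRQ_HARD_DIS      0x01u

/* Hypothetical model of the two PACA bytes the asm touches. */
struct model_paca {
	uint8_t irq_soft_mask;	/* PACAIRQSOFTMASK */
	uint8_t irq_happened;	/* PACAIRQHAPPENED */
};

/* Hypothetical model of the two stack frame slots the asm touches. */
struct model_frame {
	uint64_t softe;		/* SOFTE(r1), assumed filled by entry code */
	uint64_t dar;		/* _DAR(r1), otherwise unused in an NMI */
};

/* Entry: mask everything, stash irq_happened, then flag the hard disable. */
void model_nmi_enter(struct model_paca *paca, struct model_frame *frame)
{
	paca->irq_soft_mask = MODEL_IRQS_ALL_DISABLED;
	frame->dar = paca->irq_happened;
	paca->irq_happened |= MODEL_IRQ_HARD_DIS;
}

/* Exit: put both bytes back exactly as they were at entry. */
void model_nmi_exit(struct model_paca *paca, const struct model_frame *frame)
{
	paca->irq_happened = (uint8_t)frame->dar;
	paca->irq_soft_mask = (uint8_t)frame->softe;
}
```

The point of the model is that nothing is reconciled: the handler runs with a private view of the soft-mask state and the interrupted context sees its original values again on return.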
[PATCH v3 23/32] powerpc/64: system call remove non-volatile GPR save optimisation
powerpc has an optimisation where interrupts avoid saving the non-volatile (or callee saved) registers to the interrupt stack frame if they are not required. Two problems with this are that an interrupt does not always know whether it will need non-volatiles; and if it does need them, they can only be saved from the entry-scoped asm code (because we don't control what the C compiler does with these registers). System calls are the most difficult: some system calls always require all registers (e.g., fork, to copy regs into the child). Sometimes registers are only required under certain conditions (e.g., tracing, signal delivery). These cases require ugly logic in the call chains (e.g., ppc_fork), and require a lot of logic to be implemented in asm. So remove the optimisation for system calls, and always save NVGPRs on entry. Modern high performance CPUs are not so sensitive, because the stores are dense in cache and can be hidden by other expensive work in the syscall path -- the null syscall selftests benchmark on POWER9 is not slowed (124.40ns before and 123.64ns after, i.e., within the noise). Other interrupts retain the NVGPR optimisation for now. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/entry_64.S | 72 +--- arch/powerpc/kernel/syscalls/syscall.tbl | 22 +--- 2 files changed, 28 insertions(+), 66 deletions(-) diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 6ba675b0cf7d..14afe12eae8c 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -98,13 +98,14 @@ END_BTB_FLUSH_SECTION std r11,_XER(r1) std r11,_CTR(r1) std r9,GPR13(r1) + SAVE_NVGPRS(r1) mflr r10 /* * This clears CR0.SO (bit 28), which is the error indication on * return from this system call. 
*/ rldimi r2,r11,28,(63-28) - li r11,0xc01 + li r11,0xc00 std r10,_LINK(r1) std r11,_TRAP(r1) std r3,ORIG_GPR3(r1) @@ -323,7 +324,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) /* Traced system call support */ .Lsyscall_dotrace: - bl save_nvgprs addir3,r1,STACK_FRAME_OVERHEAD bl do_syscall_trace_enter @@ -408,7 +408,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) mtmsrd r10,1 #endif /* CONFIG_PPC_BOOK3E */ - bl save_nvgprs addir3,r1,STACK_FRAME_OVERHEAD bl do_syscall_trace_leave b ret_from_except @@ -442,62 +441,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) _ASM_NOKPROBE_SYMBOL(system_call_common); _ASM_NOKPROBE_SYMBOL(system_call_exit); -/* Save non-volatile GPRs, if not already saved. */ -_GLOBAL(save_nvgprs) - ld r11,_TRAP(r1) - andi. r0,r11,1 - beqlr- - SAVE_NVGPRS(r1) - clrrdi r0,r11,1 - std r0,_TRAP(r1) - blr -_ASM_NOKPROBE_SYMBOL(save_nvgprs); - - -/* - * The sigsuspend and rt_sigsuspend system calls can call do_signal - * and thus put the process into the stopped state where we might - * want to examine its user state with ptrace. Therefore we need - * to save all the nonvolatile registers (r14 - r31) before calling - * the C code. Similarly, fork, vfork and clone need the full - * register state on the stack so that it can be copied to the child. 
- */ - -_GLOBAL(ppc_fork) - bl save_nvgprs - bl sys_fork - b .Lsyscall_exit - -_GLOBAL(ppc_vfork) - bl save_nvgprs - bl sys_vfork - b .Lsyscall_exit - -_GLOBAL(ppc_clone) - bl save_nvgprs - bl sys_clone - b .Lsyscall_exit - -_GLOBAL(ppc_clone3) - bl save_nvgprs - bl sys_clone3 - b .Lsyscall_exit - -_GLOBAL(ppc32_swapcontext) - bl save_nvgprs - bl compat_sys_swapcontext - b .Lsyscall_exit - -_GLOBAL(ppc64_swapcontext) - bl save_nvgprs - bl sys_swapcontext - b .Lsyscall_exit - -_GLOBAL(ppc_switch_endian) - bl save_nvgprs - bl sys_switch_endian - b .Lsyscall_exit - _GLOBAL(ret_from_fork) bl schedule_tail REST_NVGPRS(r1) @@ -516,6 +459,17 @@ _GLOBAL(ret_from_kernel_thread) li r3,0 b .Lsyscall_exit +/* Save non-volatile GPRs, if not already saved. */ +_GLOBAL(save_nvgprs) + ld r11,_TRAP(r1) + andi. r0,r11,1 + beqlr- + SAVE_NVGPRS(r1) + clrrdi r0,r11,1 + std r0,_TRAP(r1) + blr +_ASM_NOKPROBE_SYMBOL(save_nvgprs); + #ifdef CONFIG_PPC_BOOK3S_64 #define FLUSH_COUNT_CACHE \ diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 35b61bfc1b1a..220ae11555f2 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -9,7 +9,9 @@ # 0 nospu restar
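The trap value change from 0xc01 to 0xc00 in the hunk above follows from how save_nvgprs reads the low bit of the saved trap word. A rough C model of that helper, with a hypothetical struct and names (model_regs, model_save_nvgprs) that do not exist in the kernel, might look like this:

```c
#include <stdint.h>

/* Hypothetical model of the parts of the frame that save_nvgprs uses. */
struct model_regs {
	uint64_t trap;			/* _TRAP(r1) */
	unsigned int save_count;	/* how often the r14-r31 save ran */
};

/*
 * Mirrors the asm shape: a set low bit in the trap word means the
 * non-volatile GPRs have not been saved yet. Save them once, then
 * clear the bit so later calls return early.
 */
void model_save_nvgprs(struct model_regs *regs)
{
	if ((regs->trap & 1) == 0)
		return;			/* beqlr-: already saved */
	regs->save_count++;		/* stands in for SAVE_NVGPRS(r1) */
	regs->trap &= ~(uint64_t)1;	/* clrrdi r0,r11,1 */
}
```

With the system call entry now doing SAVE_NVGPRS unconditionally, it stores 0xc00 directly, so a later call to the helper on that frame would be a no-op.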
[PATCH v3 24/32] powerpc/64: sstep ifdef the deprecated fast endian switch syscall
Signed-off-by: Nicholas Piggin --- arch/powerpc/lib/sstep.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index c077acb983a1..5f3a7bd9d90d 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -3179,8 +3179,9 @@ int emulate_step(struct pt_regs *regs, unsigned int instr) * entry code works. If that is changed, this will * need to be changed also. */ - if (regs->gpr[0] == 0x1ebe && - cpu_has_feature(CPU_FTR_REAL_LE)) { + if (IS_ENABLED(CONFIG_PPC_FAST_ENDIAN_SWITCH) && + cpu_has_feature(CPU_FTR_REAL_LE) && + regs->gpr[0] == 0x1ebe) { regs->msr ^= MSR_LE; goto instr_done; } -- 2.23.0
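The reordered condition puts the configuration check first, so that when the option is off the remaining tests are short-circuited. A standalone sketch of the same shape, where the function name, the explicit config_enabled parameter and MODEL_MSR_LE are assumptions for illustration (in the kernel the first operand is a compile-time constant, which lets the compiler drop the whole branch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit only; not taken from the kernel headers. */
#define MODEL_MSR_LE 1ull

/*
 * Returns true and flips the endian bit only if the feature is
 * configured in, the CPU supports it, and r0 holds the magic value
 * 0x1ebe used by the fast endian switch syscall.
 */
bool model_try_fast_endian_switch(bool config_enabled, bool cpu_has_real_le,
				  uint64_t gpr0, uint64_t *msr)
{
	if (config_enabled && cpu_has_real_le && gpr0 == 0x1ebe) {
		*msr ^= MODEL_MSR_LE;
		return true;
	}
	return false;
}
```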
[PATCH v3 25/32] powerpc/64: system call implement entry/exit logic in C
System call entry and particularly exit code is beyond the limit of what is reasonable to implement in asm. This conversion moves all conditional branches out of the asm code, except for the case that all GPRs should be restored at exit. Null syscall test is about 5% faster after this patch, because the exit work is handled under local_irq_disable, and the hard mask and pending interrupt replay is handled after that, which avoids games with MSR. Signed-off-by: Nicholas Piggin Signed-off-by: Michal Suchanek --- v2,rebase (from Michal): - Add endian conversion for dtl_idx (ms) - Fix sparse warning about missing declaration (ms) - Add unistd.h to fix some defconfigs, add SPDX, minor formatting (mpe) v3: Fixes thanks to reports from mpe and selftests errors: - Several soft-mask debug and unsafe smp_processor_id() warnings due to tracing and other false positives due to checks in "unreconciled" code. - Fix a bug with syscall tracing functions that set registers (e.g., PTRACE_SETREG) not setting GPRs properly. - Fix silly tabort_syscall bug that causes kernel crashes when making system calls in transactional state. 
arch/powerpc/include/asm/asm-prototypes.h | 17 +- .../powerpc/include/asm/book3s/64/kup-radix.h | 14 +- arch/powerpc/include/asm/cputime.h| 29 ++ arch/powerpc/include/asm/hw_irq.h | 4 + arch/powerpc/include/asm/ptrace.h | 3 + arch/powerpc/include/asm/signal.h | 3 + arch/powerpc/include/asm/switch_to.h | 5 + arch/powerpc/include/asm/time.h | 3 + arch/powerpc/kernel/Makefile | 3 +- arch/powerpc/kernel/entry_64.S| 338 +++--- arch/powerpc/kernel/signal.h | 2 - arch/powerpc/kernel/syscall_64.c | 213 +++ arch/powerpc/kernel/systbl.S | 9 +- 13 files changed, 328 insertions(+), 315 deletions(-) create mode 100644 arch/powerpc/kernel/syscall_64.c diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h index 983c0084fb3f..4b3609554e76 100644 --- a/arch/powerpc/include/asm/asm-prototypes.h +++ b/arch/powerpc/include/asm/asm-prototypes.h @@ -97,6 +97,12 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp, unsigned long __init early_init(unsigned long dt_ptr); void __init machine_init(u64 dt_ptr); #endif +#ifdef CONFIG_PPC64 +long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs); +notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs); +notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr); +notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr); +#endif long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low, u32 len_high, u32 len_low); @@ -104,14 +110,6 @@ long sys_switch_endian(void); notrace unsigned int __check_irq_replay(void); void notrace restore_interrupts(void); -/* ptrace */ -long do_syscall_trace_enter(struct pt_regs *regs); -void do_syscall_trace_leave(struct pt_regs *regs); - -/* process */ -void restore_math(struct pt_regs *regs); -void restore_tm_state(struct pt_regs *regs); - /* prom_init (OpenFirmware) */ 
unsigned long __init prom_init(unsigned long r3, unsigned long r4, unsigned long pp, @@ -122,9 +120,6 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, void __init early_setup(unsigned long dt_ptr); void early_setup_secondary(void); -/* time */ -void accumulate_stolen_time(void); - /* misc runtime */ extern u64 __bswapdi2(u64); extern s64 __lshrdi3(s64, int); diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h index 90dd3a3fc8c7..71081d90f999 100644 --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h @@ -3,6 +3,7 @@ #define _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H #include +#include #define AMR_KUAP_BLOCK_READUL(0x4000) #define AMR_KUAP_BLOCK_WRITE UL(0x8000) @@ -56,7 +57,14 @@ #ifdef CONFIG_PPC_KUAP -#include +#include +#include + +static inline void kuap_check_amr(void) +{ + if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_RADIX_KUAP)) + WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED); +} /* * We support individually allowing read or write, but we don't support nesting @@ -127,6 +135,10 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)), "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read"); } +#else /* CONFIG_PPC_KUAP */ +static inline void kuap_check_amr(void) +{ +} #endif /* C
[PATCH v3 26/32] powerpc/64: system call zero volatile registers when returning
Kernel addresses and potentially other sensitive data could be leaked in volatile registers after a syscall. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/entry_64.S | 12 1 file changed, 12 insertions(+) diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 7404290fa132..0e2c56573a41 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -135,6 +135,18 @@ END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS) cmpdi r3,0 bne .Lsyscall_restore_regs + li r0,0 + li r4,0 + li r5,0 + li r6,0 + li r7,0 + li r8,0 + li r9,0 + li r10,0 + li r11,0 + li r12,0 + mtctr r0 + mtspr SPRN_XER,r0 .Lsyscall_restore_regs_cont: BEGIN_FTR_SECTION -- 2.23.0
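As a reading aid, the effect of the added instructions can be modelled in C on a register snapshot. This is a sketch under stated assumptions: the real code clears live CPU registers on the path that does not branch to .Lsyscall_restore_regs, whereas the struct and function below (model_user_regs, model_scrub_volatiles) are hypothetical and operate on a copy in memory.

```c
#include <stdint.h>

/* Hypothetical snapshot of the user-visible registers at syscall exit. */
struct model_user_regs {
	uint64_t gpr[32];
	uint64_t ctr;
	uint64_t xer;
};

/*
 * Clear r0, r4-r12, CTR and XER, matching the set of registers the
 * patch zeroes. r3 carries the syscall return value and is left alone,
 * as are r1, r2, r13 and the non-volatile GPRs.
 */
void model_scrub_volatiles(struct model_user_regs *regs)
{
	regs->gpr[0] = 0;
	for (int i = 4; i <= 12; i++)
		regs->gpr[i] = 0;
	regs->ctr = 0;
	regs->xer = 0;
}
```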
[PATCH v3 27/32] powerpc/64: implement soft interrupt replay in C
When local_irq_enable() finds a pending soft-masked interrupt, it "replays" it by setting up registers like the initial interrupt entry, then calls into the low level handler to set up an interrupt stack frame and process the interrupt. This is not necessary, and uses more stack than needed. The high level interrupt handler can be called directly from C, with just pt_regs set up on stack. This should be faster and use less stack. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/hw_irq.h| 1 - arch/powerpc/kernel/exceptions-64e.S | 32 -- arch/powerpc/kernel/exceptions-64s.S | 47 arch/powerpc/kernel/irq.c| 165 +-- 4 files changed, 130 insertions(+), 115 deletions(-) diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h index 310583e62bd9..0e9a9598f91f 100644 --- a/arch/powerpc/include/asm/hw_irq.h +++ b/arch/powerpc/include/asm/hw_irq.h @@ -52,7 +52,6 @@ #ifndef __ASSEMBLY__ extern void replay_system_reset(void); -extern void __replay_interrupt(unsigned int vector); extern void timer_interrupt(struct pt_regs *); extern void timer_broadcast_interrupt(void); diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index e4076e3c072d..4efac5490216 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -1002,38 +1002,6 @@ masked_interrupt_book3e_0x280: masked_interrupt_book3e_0x2c0: masked_interrupt_book3e PACA_IRQ_DBELL 0 -/* - * Called from arch_local_irq_enable when an interrupt needs - * to be resent. r3 contains either 0x500,0x900,0x260 or 0x280 - * to indicate the kind of interrupt. MSR:EE is already off. - * We generate a stackframe like if a real interrupt had happened. - * - * Note: While MSR:EE is off, we need to make sure that _MSR - * in the generated frame has EE set to 1 or the exception - * handler will not properly re-enable them. 
- */ -_GLOBAL(__replay_interrupt) - /* We are going to jump to the exception common code which -* will retrieve various register values from the PACA which -* we don't give a damn about. -*/ - mflrr10 - mfmsr r11 - mfcrr4 - mtspr SPRN_SPRG_GEN_SCRATCH,r13; - std r1,PACA_EXGEN+EX_R1(r13); - stw r4,PACA_EXGEN+EX_CR(r13); - ori r11,r11,MSR_EE - subir1,r1,INT_FRAME_SIZE; - cmpwi cr0,r3,0x500 - beq exc_0x500_common - cmpwi cr0,r3,0x900 - beq exc_0x900_common - cmpwi cr0,r3,0x280 - beq exc_0x280_common - blr - - /* * This is called from 0x300 and 0x400 handlers after the prologs with * r14 and r15 containing the fault address and error code, with the diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 5ddfc32cacad..bad8cd9e7dba 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -3146,50 +3146,3 @@ doorbell_super_common_msgclr: LOAD_REG_IMMEDIATE(r3, PPC_DBELL_MSGTYPE << (63-36)) PPC_MSGCLRP(3) b doorbell_super_common - -/* - * Called from arch_local_irq_enable when an interrupt needs - * to be resent. r3 contains 0x500, 0x900, 0xa00 or 0xe80 to indicate - * which kind of interrupt. MSR:EE is already off. We generate a - * stackframe like if a real interrupt had happened. - * - * Note: While MSR:EE is off, we need to make sure that _MSR - * in the generated frame has EE set to 1 or the exception - * handler will not properly re-enable them. - * - * Note that we don't specify LR as the NIP (return address) for - * the interrupt because that would unbalance the return branch - * predictor. - */ -_GLOBAL(__replay_interrupt) - /* We are going to jump to the exception common code which -* will retrieve various register values from the PACA which -* we don't give a damn about, so we don't bother storing them. 
-*/ - mfmsr r12 - LOAD_REG_ADDR(r11, replay_interrupt_return) - mfcrr9 - ori r12,r12,MSR_EE - cmpwi r3,0x900 - beq decrementer_common - cmpwi r3,0x500 -BEGIN_FTR_SECTION - beq h_virt_irq_common -FTR_SECTION_ELSE - beq hardware_interrupt_common -ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_300) - cmpwi r3,0xf00 - beq performance_monitor_common -BEGIN_FTR_SECTION - cmpwi r3,0xa00 - beq h_doorbell_common_msgclr - cmpwi r3,0xe60 - beq hmi_exception_common -FTR_SECTION_ELSE - cmpwi r3,0xa00 - beq doorbell_super_common_msgclr -ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE) -replay_interrupt_return: - blr - -_ASM_NOKPROBE_SYMBOL(__replay_interrupt) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 5c9b11878555..afd74eba70aa 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.
[PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C
Implement the bulk of interrupt return logic in C. The asm return code must handle a few cases: restoring full GPRs, and emulating stack store. The stack store emulation is significantly simplified: rather than creating a new return frame and switching to that before performing the store, it uses the PACA to keep a scratch register around to perform the store. The asm return code is moved into 64e for now. The new logic has made allowance for 64e, but I don't have a full environment that works well to test it, and even booting in emulated qemu is not great for stress testing. 64e shouldn't be too far off working with this, given a bit more testing and auditing of the logic. This is slightly faster on a POWER9 (page fault speed increases about 1.1%), probably due to reduced mtmsrd. Signed-off-by: Nicholas Piggin Signed-off-by: Michal Suchanek --- v2,rebase (from Michal): - Move the FP restore functions to restore_math. They are not used anywhere else and when restore_math is not built gcc warns about them being unused (ms) - Add asm/context_tracking.h include to exceptions-64e.S for SCHEDULE_USER definition v3: - Fix return from interrupt replay problem by replaying interrupts rather than enabling irqs. This ends up being cleaner and __check_irq_replay goes away completely for 64s. Should bring 64e up to speed and kill a lot of cruft after it's proven on 64s. - Don't use _GLOBAL if it's not called from C - Simplify stack store emulation code further, add a bit more commenting. 
- Some missing no probe annotations .../powerpc/include/asm/book3s/64/kup-radix.h | 10 + arch/powerpc/include/asm/hw_irq.h | 1 + arch/powerpc/include/asm/switch_to.h | 6 + arch/powerpc/kernel/entry_64.S| 486 +- arch/powerpc/kernel/exceptions-64e.S | 255 - arch/powerpc/kernel/exceptions-64s.S | 119 ++--- arch/powerpc/kernel/irq.c | 36 +- arch/powerpc/kernel/process.c | 89 ++-- arch/powerpc/kernel/syscall_64.c | 164 +- arch/powerpc/kernel/vector.S | 2 +- 10 files changed, 642 insertions(+), 526 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h index 71081d90f999..3bcef989a35d 100644 --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h @@ -60,6 +60,12 @@ #include #include +static inline void kuap_restore_amr(struct pt_regs *regs) +{ + if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) + mtspr(SPRN_AMR, regs->kuap); +} + static inline void kuap_check_amr(void) { if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_RADIX_KUAP)) @@ -136,6 +142,10 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) "Bug: %s fault blocked by AMR!", is_write ? 
"Write" : "Read"); } #else /* CONFIG_PPC_KUAP */ +static inline void kuap_restore_amr(struct pt_regs *regs) +{ +} + static inline void kuap_check_amr(void) { } diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h index 0e9a9598f91f..e0e71777961f 100644 --- a/arch/powerpc/include/asm/hw_irq.h +++ b/arch/powerpc/include/asm/hw_irq.h @@ -52,6 +52,7 @@ #ifndef __ASSEMBLY__ extern void replay_system_reset(void); +extern void replay_soft_interrupts(void); extern void timer_interrupt(struct pt_regs *); extern void timer_broadcast_interrupt(void); diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 476008bc3d08..b867b58b1093 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -23,7 +23,13 @@ extern void switch_booke_debug_regs(struct debug_reg *new_debug); extern int emulate_altivec(struct pt_regs *); +#ifdef CONFIG_PPC_BOOK3S_64 void restore_math(struct pt_regs *regs); +#else +static inline void restore_math(struct pt_regs *regs) +{ +} +#endif void restore_tm_state(struct pt_regs *regs); diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 0e2c56573a41..e13eac968dfc 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -16,6 +16,7 @@ #include #include +#include #include #include #include @@ -221,6 +222,7 @@ _GLOBAL(ret_from_kernel_thread) li r3,0 b .Lsyscall_exit +#ifdef CONFIG_PPC_BOOK3E /* Save non-volatile GPRs, if not already saved. */ _GLOBAL(save_nvgprs) ld r11,_TRAP(r1) @@ -231,6 +233,7 @@ _GLOBAL(save_nvgprs) std r0,_TRAP(r1) blr _ASM_NOKPROBE_SYMBOL(save_nvgprs); +#endif #ifdef CONFIG_PPC_BOOK3S_64 @@ -294,7 +297,7 @@ flush_count_cache: * state of one is saved on its kernel stack. Then the state * of the other is restored from its kernel stack. The memory * management hardware is updated to the second process's state. -
[PATCH v3 29/32] powerpc/64s/exception: remove lite interrupt return
The difference between lite and regular returns is that the regular case restores all NVGPRs, whereas lite skips that. This is quite clumsy though, most interrupts want the NVGPRs saved for debugging, not to modify in the caller, so the NVGPRs restore is not necessary most of the time. Restore NVGPRs explicitly for one case that requires it, and move everything else over to avoiding the restore unless the interrupt return demands it (e.g., handling a signal). Signed-off-by: Nicholas Piggin --- v3: - Add a couple of missing restore cases for instruction emulation arch/powerpc/kernel/entry_64.S | 6 -- arch/powerpc/kernel/exceptions-64s.S | 24 ++-- 2 files changed, 14 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index e13eac968dfc..6d5464f83c05 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -471,12 +471,6 @@ _ASM_NOKPROBE_SYMBOL(fast_interrupt_return) .globl interrupt_return interrupt_return: _ASM_NOKPROBE_SYMBOL(interrupt_return) - REST_NVGPRS(r1) - - .balign IFETCH_ALIGN_BYTES - .globl interrupt_return_lite -interrupt_return_lite: -_ASM_NOKPROBE_SYMBOL(interrupt_return_lite) ld r4,_MSR(r1) andi. 
r0,r4,MSR_PR beq .Lkernel_interrupt_return diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index d635fd4e40ea..b53e452cbca0 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1513,7 +1513,7 @@ EXC_COMMON_BEGIN(hardware_interrupt_common) RUNLATCH_ON addir3,r1,STACK_FRAME_OVERHEAD bl do_IRQ - b interrupt_return_lite + b interrupt_return GEN_KVM hardware_interrupt @@ -1541,6 +1541,7 @@ EXC_COMMON_BEGIN(alignment_common) GEN_COMMON alignment addir3,r1,STACK_FRAME_OVERHEAD bl alignment_exception + REST_NVGPRS(r1) /* instruction emulation may change GPRs */ b interrupt_return GEN_KVM alignment @@ -1604,6 +1605,7 @@ EXC_COMMON_BEGIN(program_check_common) 3: addir3,r1,STACK_FRAME_OVERHEAD bl program_check_exception + REST_NVGPRS(r1) /* instruction emulation may change GPRs */ b interrupt_return GEN_KVM program_check @@ -1700,7 +1702,7 @@ EXC_COMMON_BEGIN(decrementer_common) RUNLATCH_ON addir3,r1,STACK_FRAME_OVERHEAD bl timer_interrupt - b interrupt_return_lite + b interrupt_return GEN_KVM decrementer @@ -1791,7 +1793,7 @@ EXC_COMMON_BEGIN(doorbell_super_common) #else bl unknown_exception #endif - b interrupt_return_lite + b interrupt_return GEN_KVM doorbell_super @@ -2060,6 +2062,7 @@ EXC_COMMON_BEGIN(emulation_assist_common) GEN_COMMON emulation_assist addir3,r1,STACK_FRAME_OVERHEAD bl emulation_assist_interrupt + REST_NVGPRS(r1) /* instruction emulation may change GPRs */ b interrupt_return GEN_KVM emulation_assist @@ -2176,7 +2179,7 @@ EXC_COMMON_BEGIN(h_doorbell_common) #else bl unknown_exception #endif - b interrupt_return_lite + b interrupt_return GEN_KVM h_doorbell @@ -2206,7 +2209,7 @@ EXC_COMMON_BEGIN(h_virt_irq_common) RUNLATCH_ON addir3,r1,STACK_FRAME_OVERHEAD bl do_IRQ - b interrupt_return_lite + b interrupt_return GEN_KVM h_virt_irq @@ -2253,7 +2256,7 @@ EXC_COMMON_BEGIN(performance_monitor_common) RUNLATCH_ON addir3,r1,STACK_FRAME_OVERHEAD bl performance_monitor_exception - 
b interrupt_return_lite + b interrupt_return GEN_KVM performance_monitor @@ -2650,6 +2653,7 @@ EXC_COMMON_BEGIN(altivec_assist_common) addir3,r1,STACK_FRAME_OVERHEAD #ifdef CONFIG_ALTIVEC bl altivec_assist_exception + REST_NVGPRS(r1) /* instruction emulation may change GPRs */ #else bl unknown_exception #endif @@ -3038,7 +3042,7 @@ do_hash_page: cmpdi r3,0/* see if __hash_page succeeded */ /* Success */ - beq interrupt_return_lite /* Return from exception on success */ + beq interrupt_return/* Return from exception on success */ /* Error */ blt-13f @@ -3055,7 +3059,7 @@ handle_page_fault: addir3,r1,STACK_FRAME_OVERHEAD bl do_page_fault cmpdi r3,0 - beq+interrupt_return_lite + beq+interrupt_return mr r5,r3 addir3,r1,STACK_FRAME_OVERHEAD ld r4,_DAR(r1) @@ -3070,9 +3074,9 @@ handle_dabr_fault: bl do_break /* * do_break() may have changed the NV GPRS while h
[PATCH v3 30/32] powerpc/64: system call reconcile interrupts
This reconciles interrupts in the system call case like all other interrupts. This allows system_call_common to be shared with the scv system call implementation in a subsequent patch. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/entry_64.S | 11 +++ arch/powerpc/kernel/syscall_64.c | 28 +--- 2 files changed, 24 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 6d5464f83c05..8406812c9734 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -113,6 +113,17 @@ END_BTB_FLUSH_SECTION ld r11,exception_marker@toc(r2) std r11,-16(r10)/* "regshere" marker */ + /* +* RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which +* would clobber syscall parameters. Also we always enter with IRQs +* enabled and nothing pending. system_call_exception() will call +* trace_hardirqs_off(). +*/ + li r11,IRQS_ALL_DISABLED + li r12,PACA_IRQ_HARD_DIS + stb r11,PACAIRQSOFTMASK(r13) + stb r12,PACAIRQHAPPENED(r13) + /* Calling convention has r9 = orig r0, r10 = regs */ mr r9,r0 bl system_call_exception diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c index 08e0bebbd3b6..32601a572ff0 100644 --- a/arch/powerpc/kernel/syscall_64.c +++ b/arch/powerpc/kernel/syscall_64.c @@ -19,13 +19,19 @@ extern void __noreturn tabort_syscall(unsigned long nip, unsigned long msr); typedef long (*syscall_fn)(long, long, long, long, long, long); -/* Has to run notrace because it is entered "unreconciled" */ -notrace long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, - unsigned long r0, struct pt_regs *regs) +/* Has to run notrace because it is entered not completely "reconciled" */ +notrace long system_call_exception(long r3, long r4, long r5, + long r6, long r7, long r8, + unsigned long r0, struct pt_regs *regs) { unsigned long ti_flags; syscall_fn f; + if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) + BUG_ON(irq_soft_mask_return() != 
IRQS_ALL_DISABLED); + + trace_hardirqs_off(); /* finish reconciling */ + if (IS_ENABLED(CONFIG_PPC_BOOK3S)) BUG_ON(!(regs->msr & MSR_RI)); BUG_ON(!(regs->msr & MSR_PR)); @@ -33,8 +39,10 @@ notrace long system_call_exception(long r3, long r4, long r5, long r6, long r7, BUG_ON(regs->softe != IRQS_ENABLED); if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) && - unlikely(regs->msr & MSR_TS_T)) + unlikely(regs->msr & MSR_TS_T)) { + local_irq_enable(); tabort_syscall(regs->nip, regs->msr); + } account_cpu_user_entry(); @@ -50,16 +58,6 @@ notrace long system_call_exception(long r3, long r4, long r5, long r6, long r7, kuap_check_amr(); - /* -* A syscall should always be called with interrupts enabled -* so we just unconditionally hard-enable here. When some kind -* of irq tracing is used, we additionally check that condition -* is correct -*/ - if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) { - WARN_ON(irq_soft_mask_return() != IRQS_ENABLED); - WARN_ON(local_paca->irq_happened); - } /* * This is not required for the syscall exit path, but makes the * stack frame look nicer. If this was initialised in the first stack @@ -68,7 +66,7 @@ notrace long system_call_exception(long r3, long r4, long r5, long r6, long r7, */ regs->softe = IRQS_ENABLED; - __hard_irq_enable(); + local_irq_enable(); ti_flags = current_thread_info()->flags; if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) { -- 2.23.0
[PATCH v3 31/32] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked
The scv instruction causes an interrupt which can enter the kernel with MSR[EE]=1, thus allowing interrupts to hit at any time. These must not be taken as normal interrupts, because they come from MSR[PR]=0 context, and yet the kernel stack is not yet set up and r13 is not set to the PACA. Treat this as a soft-masked interrupt regardless of the soft-masked state. This does not affect behaviour yet, because currently all interrupts are taken with MSR[EE]=0. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 27 --- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index b53e452cbca0..7a6be3f32973 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -494,8 +494,24 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt) .macro __GEN_COMMON_BODY name .if IMASK + .if ! ISTACK + .error "No support for masked interrupt to use custom stack" + .endif + + /* If coming from user, skip soft-mask tests. */ + andi. r10,r12,MSR_PR + bne 2f + + /* Kernel code running below __end_interrupts is implicitly +* soft-masked */ + LOAD_HANDLER(r10, __end_interrupts) + cmpdr11,r10 + li r10,IMASK + blt-1f + + /* Test the soft mask state against our interrupt's bit */ lbz r10,PACAIRQSOFTMASK(r13) - andi. r10,r10,IMASK +1: andi. r10,r10,IMASK /* Associate vector numbers with bits in paca->irq_happened */ .if IVEC == 0x500 || IVEC == 0xea0 li r10,PACA_IRQ_EE @@ -526,7 +542,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt) .if ISTACK andi. 
r10,r12,MSR_PR /* See if coming from user */ - mr r10,r1 /* Save r1 */ +2: mr r10,r1 /* Save r1 */ subir1,r1,INT_FRAME_SIZE/* alloc frame on kernel stack */ beq-100f ld r1,PACAKSAVE(r13) /* kernel stack to use */ @@ -2791,7 +2807,8 @@ masked_interrupt: ld r10,PACA_EXGEN+EX_R10(r13) ld r11,PACA_EXGEN+EX_R11(r13) ld r12,PACA_EXGEN+EX_R12(r13) - /* returns to kernel where r13 must be set up, so don't restore it */ + ld r13,PACA_EXGEN+EX_R13(r13) + /* May return to masked low address where r13 is not set up */ .if \hsrr HRFI_TO_KERNEL .else @@ -2950,6 +2967,10 @@ EXC_COMMON_BEGIN(ppc64_runlatch_on_trampoline) USE_FIXED_SECTION(virt_trampolines) /* +* All code below __end_interrupts is treated as soft-masked. If +* any code runs here with MSR[EE]=1, it must then cope with pending +* soft interrupt being raised (i.e., by ensuring it is replayed). +* * The __end_interrupts marker must be past the out-of-line (OOL) * handlers, so that they are copied to real address 0x100 when running * a relocatable kernel. This ensures they can be reached from the short -- 2.23.0
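For anyone following the control flow of the __GEN_COMMON_BODY hunk in the patch above, the masking decision can be modelled in C roughly as below. This is only an illustrative sketch, not kernel code: the struct, its field names and the function name are made up for the example, and the real logic is the assembly in the patch. MSR_PR is given the usual 64-bit value (bit 14 counting from the least significant bit).

```c
#include <stdbool.h>
#include <stdint.h>

/* Problem-state bit of the MSR (assumed value, 1 << 14). */
#define MSR_PR 0x4000UL

/* Hypothetical snapshot of what the entry code has in registers. */
struct irq_entry_state {
	uint64_t srr1;           /* interrupted MSR (r12 in the asm) */
	uint64_t nia;            /* interrupted instruction address (r11) */
	uint64_t end_interrupts; /* address of the __end_interrupts marker */
	uint8_t  soft_mask;      /* PACAIRQSOFTMASK byte */
};

/*
 * Returns true when the interrupt should take the masked path.
 * imask is the interrupt's IMASK bit and is assumed to be non-zero,
 * as in the ".if IMASK" branch of the macro.
 */
static bool treat_as_masked(const struct irq_entry_state *s, uint8_t imask)
{
	/* Coming from user: skip the soft-mask tests entirely. */
	if (s->srr1 & MSR_PR)
		return false;

	/* Kernel code below __end_interrupts is implicitly soft-masked. */
	if (s->nia < s->end_interrupts)
		return true;

	/* Otherwise test the soft mask state against our interrupt's bit. */
	return (s->soft_mask & imask) != 0;
}
```

The ordering matters: the low-address test comes before the soft mask load because r13 may not point at the PACA yet in that window.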
[PATCH v3 32/32] powerpc/64s: system call support for scv/rfscv instructions
Add support for the scv instruction on POWER9 and later CPUs. For now this implements the zeroth scv vector 'scv 0', as identical to 'sc' system calls, with the exception that lr is not preserved, and it is 64-bit only. There may yet be changes made to this ABI, so it's for testing only. rfscv is implemented to return from scv type system calls. It can not be used to return from sc system calls because those are defined to preserve lr. In a comparison of getpid syscall, the test program had scv taking about 3 more cycles in user mode (92 vs 89 for sc), due to lr handling. getpid syscall throughput on POWER9 is improved by 33%, mostly due to reducing mtmsr and mtspr. Signed-off-by: Nicholas Piggin --- Documentation/powerpc/syscall64-abi.rst | 42 +--- arch/powerpc/include/asm/asm-prototypes.h | 2 +- arch/powerpc/include/asm/exception-64s.h | 6 ++ arch/powerpc/include/asm/head-64.h| 2 +- arch/powerpc/include/asm/ppc_asm.h| 2 + arch/powerpc/include/asm/processor.h | 2 +- arch/powerpc/include/asm/setup.h | 4 +- arch/powerpc/kernel/cpu_setup_power.S | 2 +- arch/powerpc/kernel/cputable.c| 3 +- arch/powerpc/kernel/dt_cpu_ftrs.c | 1 + arch/powerpc/kernel/entry_64.S| 114 + arch/powerpc/kernel/exceptions-64s.S | 119 +- arch/powerpc/kernel/setup_64.c| 5 +- arch/powerpc/kernel/syscall_64.c | 14 ++- arch/powerpc/platforms/pseries/setup.c| 8 +- 15 files changed, 295 insertions(+), 31 deletions(-) diff --git a/Documentation/powerpc/syscall64-abi.rst b/Documentation/powerpc/syscall64-abi.rst index e49f69f941b9..30c045e8726e 100644 --- a/Documentation/powerpc/syscall64-abi.rst +++ b/Documentation/powerpc/syscall64-abi.rst @@ -5,6 +5,15 @@ Power Architecture 64-bit Linux system call ABI syscall === +Invocation +-- +The syscall is made with the sc instruction, and returns with execution +continuing at the instruction following the sc instruction. 
+ +If PPC_FEATURE2_SCV appears in the AT_HWCAP2 ELF auxiliary vector, the +scv 0 instruction is an alternative that may be used, with some differences +to calling sequence. + syscall calling sequence\ [1]_ matches the Power Architecture 64-bit ELF ABI specification C function calling sequence, including register preservation rules, with the following differences. @@ -12,16 +21,23 @@ rules, with the following differences. .. [1] Some syscalls (typically low-level management functions) may have different calling sequences (e.g., rt_sigreturn). -Parameters and return value +Parameters +-- The system call number is specified in r0. There is a maximum of 6 integer parameters to a syscall, passed in r3-r8. -Both a return value and a return error code are returned. cr0.SO is the return -error code, and r3 is the return value or error code. When cr0.SO is clear, -the syscall succeeded and r3 is the return value. When cr0.SO is set, the -syscall failed and r3 is the error code that generally corresponds to errno. +Return value + +- For the sc instruction, both a return value and a return error code are + returned. cr0.SO is the return error code, and r3 is the return value or + error code. When cr0.SO is clear, the syscall succeeded and r3 is the return + value. When cr0.SO is set, the syscall failed and r3 is the error code that + generally corresponds to errno. + +- For the scv 0 instruction, there is a return value indicates failure if it + is >= -MAX_ERRNO (-4095) as an unsigned comparison, in which case it is the + negated return error code. Otherwise it is the successful return value. Stack - @@ -34,22 +50,23 @@ Register preservation rules match the ELF ABI calling sequence with the following differences: === = +--- For the sc instruction --- r0 Volatile (System call number.) r3 Volatile (Parameter 1, and return value.) r4-r8 Volatile (Parameters 2-6.) -cr0 Volatile (cr0.SO is the return error condition) +cr0 Volatile (cr0.SO is the return error condition.) 
cr1, cr5-7 Nonvolatile lr Nonvolatile + +--- For the scv 0 instruction --- +r0 Volatile (System call number.) +r3 Volatile (Parameter 1, and return value.) +r4-r8 Volatile (Parameters 2-6.) === = All floating point and vector data registers as well as control and status registers are nonvolatile. -Invocation --- -The syscall is performed with the sc instruction, and returns with execution -continuing at the instruction following the sc instruction. - Transactional Memory Syscall behavior can change if the processor is in transactional or suspended @@ -75,6 +92,7 @@ auxil
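The scv 0 return convention described in the documentation hunk of the patch above comes down to one unsigned comparison against -MAX_ERRNO. A minimal sketch, assuming MAX_ERRNO is 4095 as the text states; the helper names are invented for illustration and are not part of any kernel or libc interface.

```c
#include <stdbool.h>

#define MAX_ERRNO 4095UL

/* True when an scv 0 return value (r3) encodes a failure. */
static bool scv_ret_is_error(unsigned long r3)
{
	/* Unsigned comparison: -MAX_ERRNO wraps to the top 4095 values. */
	return r3 >= -MAX_ERRNO;
}

/*
 * Positive errno for a failing return value.
 * Only meaningful when scv_ret_is_error(r3) is true.
 */
static long scv_ret_errno(unsigned long r3)
{
	return -(long)r3;
}
```

Unlike the sc convention, nothing here depends on cr0.SO, so a caller only needs r3 to tell success from failure.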
[PATCH] macintosh: therm_windtunnel: fix regression when instantiating devices
Removing attach_adapter from this driver caused a regression for at least some machines. Those machines had the sensors described in their DT, too, so they didn't need manual creation of the sensor devices. The old code worked, though, because manual creation came first. Creation of DT devices then failed later and caused error logs, but the sensors worked nonetheless because of the manually created devices. With attach_adapter removed, manual creation now comes later and loses the race. The sensor devices were already registered via DT, yet with another binding, so the driver could not be bound to them. This fix refactors the code to remove the race and only manually creates devices if there are no DT nodes present. Also, the DT binding is updated to match both the DT and the manually created devices. Because we don't know which device creation will be used at runtime, the code to start the kthread is moved to do_probe() which will be called by both methods. Fixes: 3e7bed52719d ("macintosh: therm_windtunnel: drop using attach_adapter") Link: https://bugzilla.kernel.org/show_bug.cgi?id=201723 Reported-by: Erhard Furtner Tested-by: Erhard Furtner Signed-off-by: Wolfram Sang --- I suggest this stable-tag: # v4.19+ Adding the Debian-PPC List to reach further people maybe willing to test. This patch does not depend on "[PATCH RESEND] macintosh: convert to i2c_new_scanned_device". In fact, this one here should go in first as 5.6 material. I will rebase and resend the i2c_new_scanned_device() conversion on top of this regression fix. I can also take this via I2C if easier. 
drivers/macintosh/therm_windtunnel.c | 52 +--- 1 file changed, 31 insertions(+), 21 deletions(-) diff --git a/drivers/macintosh/therm_windtunnel.c b/drivers/macintosh/therm_windtunnel.c index 8c744578122a..a0d87ed9da69 100644 --- a/drivers/macintosh/therm_windtunnel.c +++ b/drivers/macintosh/therm_windtunnel.c @@ -300,9 +300,11 @@ static int control_loop(void *dummy) /* i2c probing and setup */ // -static int -do_attach( struct i2c_adapter *adapter ) +static void do_attach(struct i2c_adapter *adapter) { + struct i2c_board_info info = { }; + struct device_node *np; + /* scan 0x48-0x4f (DS1775) and 0x2c-2x2f (ADM1030) */ static const unsigned short scan_ds1775[] = { 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, @@ -313,25 +315,24 @@ do_attach( struct i2c_adapter *adapter ) I2C_CLIENT_END }; - if( strncmp(adapter->name, "uni-n", 5) ) - return 0; - - if( !x.running ) { - struct i2c_board_info info; + if (x.running || strncmp(adapter->name, "uni-n", 5)) + return; - memset(&info, 0, sizeof(struct i2c_board_info)); - strlcpy(info.type, "therm_ds1775", I2C_NAME_SIZE); + np = of_find_compatible_node(adapter->dev.of_node, NULL, "MAC,ds1775"); + if (np) { + of_node_put(np); + } else { + strlcpy(info.type, "MAC,ds1775", I2C_NAME_SIZE); i2c_new_probed_device(adapter, &info, scan_ds1775, NULL); + } - strlcpy(info.type, "therm_adm1030", I2C_NAME_SIZE); + np = of_find_compatible_node(adapter->dev.of_node, NULL, "MAC,adm1030"); + if (np) { + of_node_put(np); + } else { + strlcpy(info.type, "MAC,adm1030", I2C_NAME_SIZE); i2c_new_probed_device(adapter, &info, scan_adm1030, NULL); - - if( x.thermostat && x.fan ) { - x.running = 1; - x.poll_task = kthread_run(control_loop, NULL, "g4fand"); - } } - return 0; } static int @@ -404,8 +405,8 @@ attach_thermostat( struct i2c_client *cl ) enum chip { ds1775, adm1030 }; static const struct i2c_device_id therm_windtunnel_id[] = { - { "therm_ds1775", ds1775 }, - { "therm_adm1030", adm1030 }, + { "MAC,ds1775", ds1775 }, + { "MAC,adm1030", 
adm1030 }, { } }; MODULE_DEVICE_TABLE(i2c, therm_windtunnel_id); @@ -414,6 +415,7 @@ static int do_probe(struct i2c_client *cl, const struct i2c_device_id *id) { struct i2c_adapter *adapter = cl->adapter; + int ret = 0; if( !i2c_check_functionality(adapter, I2C_FUNC_SMBUS_WORD_DATA | I2C_FUNC_SMBUS_WRITE_BYTE) ) @@ -421,11 +423,19 @@ do_probe(struct i2c_client *cl, const struct i2c_device_id *id) switch (id->driver_data) { case adm1030: - return attach_fan( cl ); + ret = attach_fan(cl); + break; case ds1775: - return attach_thermostat(cl); + ret = attach_thermostat(cl); + break; } - return 0; + + if (!x.running && x.th
[PATCH] i2c: powermac: correct comment about custom handling
The comment had some flaws which are now fixed: - the prefix is 'MAC' not 'AAPL' - no kernel coding style and too short length - 'we do' instead of 'we to' Signed-off-by: Wolfram Sang --- drivers/i2c/busses/i2c-powermac.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/i2c/busses/i2c-powermac.c b/drivers/i2c/busses/i2c-powermac.c index 973e5339033c..d565714c1f13 100644 --- a/drivers/i2c/busses/i2c-powermac.c +++ b/drivers/i2c/busses/i2c-powermac.c @@ -279,14 +279,13 @@ static bool i2c_powermac_get_type(struct i2c_adapter *adap, { char tmp[16]; - /* Note: we to _NOT_ want the standard -* i2c drivers to match with any of our powermac stuff -* unless they have been specifically modified to handle -* it on a case by case basis. For example, for thermal -* control, things like lm75 etc... shall match with their -* corresponding windfarm drivers, _NOT_ the generic ones, -* so we force a prefix of AAPL, onto the modalias to -* make that happen + /* +* Note: we do _NOT_ want the standard i2c drivers to match with any of +* our powermac stuff unless they have been specifically modified to +* handle it on a case by case basis. For example, for thermal control, +* things like lm75 etc... shall match with their corresponding +* windfarm drivers, _NOT_ the generic ones, so we force a prefix of +* 'MAC', onto the modalias to make that happen */ /* First try proper modalias */ -- 2.20.1
Re: [PATCH] macintosh: therm_windtunnel: fix regression when instantiating devices
On Tue, Feb 25, 2020 at 03:41:22PM +0100, John Paul Adrian Glaubitz wrote:
> Hello!
>
> On 2/25/20 3:12 PM, Wolfram Sang wrote:
> > Adding the Debian-PPC List to reach further people maybe willing to
> > test.
>
> This might be related [1].

IIUC, this is the same as https://bugzilla.kernel.org/show_bug.cgi?id=199471. I don't think my patch helps here.
Re: [PATCH] evh_bytechan: fix out of bounds accesses
Hi Laurentiu, On Tue, 25 Feb 2020 11:54:17 +0200 Laurentiu Tudor wrote: > > On 21.02.2020 01:57, Stephen Rothwell wrote: > > > > On Thu, 16 Jan 2020 11:37:14 +1100 Stephen Rothwell > > wrote: > >> > >> On Wed, 15 Jan 2020 14:01:35 -0600 Scott Wood wrote: > >>> > >>> On Thu, 2020-01-16 at 06:42 +1100, Stephen Rothwell wrote: > > On Wed, 15 Jan 2020 07:25:45 -0600 Timur Tabi wrote: > > On 1/14/20 12:31 AM, Stephen Rothwell wrote: > >> +/** > >> + * ev_byte_channel_send - send characters to a byte stream > >> + * @handle: byte stream handle > >> + * @count: (input) num of chars to send, (output) num chars sent > >> + * @bp: pointer to chars to send > >> + * > >> + * Returns 0 for success, or an error code. > >> + */ > >> +static unsigned int ev_byte_channel_send(unsigned int handle, > >> + unsigned int *count, const char *bp) > > > > Well, now you've moved this into the .c file and it is no longer > > available to other callers. Anything wrong with keeping it in the .h > > file? > > There are currently no other callers - are there likely to be in the > future? Even if there are, is it time critical enough that it needs to > be inlined everywhere? > >>> > >>> It's not performance critical and there aren't likely to be other users -- > >>> just a matter of what's cleaner. FWIW I'd rather see the original patch, > >>> that keeps the raw asm hcall stuff as simple wrappers in one place. > >> > >> And I don't mind either way :-) > >> > >> I just want to get rid of the warnings. > > > > Any progress with this? > > I think that the consensus was to pick up the original patch that is, > this one: https://patchwork.ozlabs.org/patch/1220186/ > > I've tested it too, so please feel free to add a: > > Tested-by: Laurentiu Tudor So, whose tree should this go via? -- Cheers, Stephen Rothwell
Re: [PATCH v3 26/32] powerpc/64: system call zero volatile registers when returning
Hi! On Wed, Feb 26, 2020 at 03:35:35AM +1000, Nicholas Piggin wrote: > Kernel addresses and potentially other sensitive data could be leaked > in volatile registers after a syscall. > cmpdi r3,0 > bne .Lsyscall_restore_regs > + li r0,0 > + li r4,0 > + li r5,0 > + li r6,0 > + li r7,0 > + li r8,0 > + li r9,0 > + li r10,0 > + li r11,0 > + li r12,0 > + mtctr r0 > + mtspr SPRN_XER,r0 > .Lsyscall_restore_regs_cont: What about LR? Is that taken care of later? This also deserves a big fat comment imo, it is very important after all, and not so obvious. Segher
Re: MCE handler gets NIP wrong on MPC8378
On 02/20/2020 at 12:48 PM Christophe Leroy wrote: > On 20/02/2020 at 18:34, Radu Rendec wrote: > > On 02/20/2020 at 11:25 AM Christophe Leroy wrote: > >> On 20/02/2020 at 17:02, Radu Rendec wrote: > >>> On 02/20/2020 at 3:38 AM Christophe Leroy wrote: > On 02/19/2020 10:39 PM, Radu Rendec wrote: > > On 02/19/2020 at 4:21 PM Christophe Leroy > > wrote: > >>> Interesting. > >>> > >>> 0x900 is the address of the timer interrupt. > >>> > >>> Would the MCE occur just after the timer interrupt ? > > > > I doubt that. I'm using a small test module to artificially trigger the > > MCE. Basically it's just this (the full code is in my original post): > > > >bad_addr_base = ioremap(0xf000, 0x100); > >x = ioread32(bad_addr_base); > > > > I find it hard to believe that every time I load the module the lwbrx > > instruction that triggers the MCE is executed exactly after the timer > > interrupt (or that the timer interrupt always occurs close to the lwbrx > > instruction). > > Can you try to see how much time there is between your read and the MCE ? > The below should allow it, you'll see first value in r13 and the other > in r14 (mce.c is your test code) > > Also provide the timebase frequency as reported in /proc/cpuinfo > >>> > >>> I just ran a test: r13 is 0xda8e0f91 and r14 is 0xdaae0f9c. > >>> > >>> # cat /proc/cpuinfo > >>> processor : 0 > >>> cpu : e300c4 > >>> clock : 800.04MHz > >>> revision: 1.1 (pvr 8086 1011) > >>> bogomips: 200.00 > >>> timebase: 1 > >>> > >>> The difference between r14 and r13 is 0x20000b. Assuming TB is > >>> incremented with 'timebase' frequency, that means 20.97 milliseconds > >>> (although the e300 manual says TB is "incremented once every four core > >>> input clock cycles"). > >> > >> I wouldn't be surprised that the internal CPU clock be twice the input > >> clock. > >> > >> So that's long enough to surely get a timer interrupt during every bad > >> access. 
> >> > >> Now we have to understand why SRR1 contains the address of the timer > >> exception entry and not the address of the bad access. > >> > >> The value of SRR1 confirms that it comes from 0x900 as MSR[IR] and [DR] > >> are cleared when interrupts are enabled. > >> > >> Maybe you should file a support case at NXP. They are usually quite > >> professionnal at responding. > > > > I already did (quite some time ago), but it started off as "why does the > > MCE occur in the first place". That part has already been figured out, > > but unfortunately I don't have a viable solution to it. Like you said, > > now the focus has shifted to understanding why the SRR0 value is not > > what we expect. > > Yes now the point is to understand why it starts processing the timer > interrupt at 0x900 (with IR and DR cleared as observed in SRR1) just > before taking the Machine Check. > > Allthough the execution of the decrementer interrupt is queue for after > the completion of the failing memory access, I'd expect the Machine > Check to take priority. > > Note that I have never observed such a behaviour on MPC8321 which has an > e300c2 core. I apologize for the silence during the past few days, I've been diverted with something else. This is the feedback that I got from NXP: | The e300 core uses SRR0/1 for both non-critical interrupts and machine | check interrupts and if they happen simultaneously a problem can occur | where the return address from the first exception is lost when handling | the second exception concurrently. This only occurs in the rare case | when the software ISR hasn't had the time to save SRR0/1 to the sw stack. | | If the ability to nest interrupts is desired, software then saves off | enough state (i.e. the contents of SRR0, SRR1, etc) that will allow it | to recover (i.e. resume handling the current interrupt) if another | interrupt occurs. 
So basically what they describe is a race condition between the MCE and a regular interrupt, where the regular interrupt (the timer interrupt, in our case) kicks in after the MCE handler is entered into but before it saves SRR0. This not only requires very precise timing, but would also end up with a saved SRR0 value that points back somewhere inside the MCE handler. But I've thought about something else. We already timed it and we know it consistently takes around 20 ms between the faulty read and the MCE handler execution. I'm thinking that the faulty read is essentially a failed transaction on the internal bus, because no peripheral replies to the access on the bad address. The 20 ms is probably the bus timeout. How does this scenario look to you? - The faulty read starts to execute. A new internal bus transaction is started, the bad address is put on the bus and the CPU waits for a peripheral to reply. - The timer interrupt kicks in. The CPU saves NIP to SRR0 and
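The timing estimate discussed in this thread can be reproduced from the two timebase samples quoted earlier (r13 = 0xda8e0f91, r14 = 0xdaae0f9c). The sketch below is only a worked calculation; the function names are made up, and the 100 MHz timebase is an assumption, chosen because it is the frequency consistent with the 20.97 ms figure quoted in the thread (the /proc/cpuinfo value looks truncated in this archive).

```c
#include <stdint.h>

/* Timebase ticks elapsed between two 32-bit samples (wraps modulo 2^32). */
static uint32_t tb_delta(uint32_t start, uint32_t end)
{
	return end - start;
}

/* Convert ticks to whole microseconds for a timebase frequency in Hz. */
static uint64_t tb_ticks_to_us(uint32_t ticks, uint64_t tb_hz)
{
	return (uint64_t)ticks * 1000000ULL / tb_hz;
}
```

With the quoted samples, tb_delta() gives 0x20000b (2097163) ticks, and at the assumed 100 MHz that is 20971 microseconds, i.e. roughly the 20 ms delay between the faulty read and the MCE handler.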
RE: [PATCH v3 00/27] Add support for OpenCAPI Persistent Memory devices
On Mon, 2020-02-24 at 17:51 +1100, Oliver O'Halloran wrote:
> On Mon, Feb 24, 2020 at 3:43 PM Alastair D'Silva <alast...@au1.ibm.com> wrote:
> > On Sun, 2020-02-23 at 20:37 -0800, Matthew Wilcox wrote:
> > > On Mon, Feb 24, 2020 at 03:34:07PM +1100, Alastair D'Silva wrote:
> > > > V3:
> > > > - Rebase against next/next-20200220
> > > > - Move driver to arch/powerpc/platforms/powernv, we now expect
> > > >   this driver to go upstream via the powerpc tree
> > >
> > > That's rather the opposite direction of normal; mostly drivers live
> > > under drivers/ and not in arch/. It's easier for drivers to get
> > > overlooked when doing tree-wide changes if they're hiding.
> >
> > This is true, however, given that it was not all that desirable to have
> > it under drivers/nvdimm, its sister driver (for the same hardware) is
> > also under arch, and that we don't expect this driver to be used on any
> > platform other than powernv, we think this was the most reasonable
> > place to put it.
>
> Historically powernv specific platform drivers go in their respective
> subsystem trees rather than in arch/ and I'd prefer we kept it that
> way. When I added the papr_scm driver I put it in the pseries platform
> directory because most of the pseries paravirt code lives there for
> some reason; I don't know why. Luckily for me that followed the same
> model that Dan used when he put the NFIT driver in drivers/acpi/ and
> the libnvdimm core in drivers/nvdimm/ so we didn't have anything to
> argue about. However, as Matthew pointed out, it is at odds with how
> most subsystems operate. Is there any particular reason we're doing
> things this way or should we think about moving libnvdimm users to
> drivers/nvdimm/?
>
> Oliver

I'm not too fussed where it ends up, as long as it ends up somewhere :)

From what I can tell, the issue is that we have both "infrastructure" drivers, and end-device drivers.
To me, it feels like drivers/nvdimm should contain both, and that seems like the right approach. I could move it back to drivers/nvdimm/ocxl, but I felt that it was only tolerated there, not desired. This could be cleared up with a response from Dan Williams, and if it is indeed desired, this is my preferred location.

I think a case could also be made for drivers/ocxl, simply because we don't expect more than a handful of drivers to ever live there (I expect most users will drive their devices from userspace via libocxl).

In defence of keeping it in arch/powerpc/powernv, I highly doubt this driver will end up being used on any platform other than this. Even though OpenCAPI was engineered as an open standard, there is some competition from industry giants with a competing standard on a much more popular platform.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
Re: [PATCH v3 03/27] powerpc: Map & release OpenCAPI LPC memory
On Tue, 2020-02-25 at 11:02 +0100, Frederic Barrat wrote: > > On 21/02/2020 at 04:26, Alastair D'Silva wrote: > > From: Alastair D'Silva > > > > This patch adds platform support to map & release LPC memory. > > > > Signed-off-by: Alastair D'Silva > > --- > > arch/powerpc/include/asm/pnv-ocxl.h | 4 +++ > > arch/powerpc/platforms/powernv/ocxl.c | 43 > > +++ > > 2 files changed, 47 insertions(+) > > > > diff --git a/arch/powerpc/include/asm/pnv-ocxl.h > > b/arch/powerpc/include/asm/pnv-ocxl.h > > index 7de82647e761..0b2a6707e555 100644 > > --- a/arch/powerpc/include/asm/pnv-ocxl.h > > +++ b/arch/powerpc/include/asm/pnv-ocxl.h > > @@ -32,5 +32,9 @@ extern int pnv_ocxl_spa_remove_pe_from_cache(void > > *platform_data, int pe_handle) > > > > extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr); > > extern void pnv_ocxl_free_xive_irq(u32 irq); > > +#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE > > +u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size); > > +void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev); > > +#endif > > This breaks the compilation of the ocxl driver if > CONFIG_MEMORY_HOTPLUG=n > > Those functions still make sense even without memory hotplug, for > example in the context of the implementation you had to access > opencapi > LPC memory through mmap(). The #ifdef is really needed only around > the > check_hotplug_memory_addressable() call. > >Fred Hmm, we do still need sparsemem though. Let me think about this some more. 
> > > > #endif /* _ASM_PNV_OCXL_H */ > > diff --git a/arch/powerpc/platforms/powernv/ocxl.c > > b/arch/powerpc/platforms/powernv/ocxl.c > > index 8c65aacda9c8..f2edbcc67361 100644 > > --- a/arch/powerpc/platforms/powernv/ocxl.c > > +++ b/arch/powerpc/platforms/powernv/ocxl.c > > @@ -475,6 +475,49 @@ void pnv_ocxl_spa_release(void *platform_data) > > } > > EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release); > > > > +#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE > > +u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size) > > +{ > > + struct pci_controller *hose = pci_bus_to_host(pdev->bus); > > + struct pnv_phb *phb = hose->private_data; > > + u32 bdfn = pci_dev_id(pdev); > > + __be64 base_addr_be64; > > + u64 base_addr; > > + int rc; > > + > > + rc = opal_npu_mem_alloc(phb->opal_id, bdfn, size, > > &base_addr_be64); > > + if (rc) { > > + dev_warn(&pdev->dev, > > +"OPAL could not allocate LPC memory, rc=%d\n", > > rc); > > + return 0; > > + } > > + > > + base_addr = be64_to_cpu(base_addr_be64); > > + > > + rc = check_hotplug_memory_addressable(base_addr >> PAGE_SHIFT, > > + size >> PAGE_SHIFT); > > + if (rc) > > + return 0; > > + > > + return base_addr; > > +} > > +EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_setup); > > + > > +void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev) > > +{ > > + struct pci_controller *hose = pci_bus_to_host(pdev->bus); > > + struct pnv_phb *phb = hose->private_data; > > + u32 bdfn = pci_dev_id(pdev); > > + int rc; > > + > > + rc = opal_npu_mem_release(phb->opal_id, bdfn); > > + if (rc) > > + dev_warn(&pdev->dev, > > +"OPAL reported rc=%d when releasing LPC > > memory\n", rc); > > +} > > +EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_release); > > +#endif > > + > > int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int > > pe_handle) > > { > > struct spa_data *data = (struct spa_data *) platform_data; > > -- Alastair D'Silva Open Source Developer Linux Technology Centre, IBM Australia mob: 0423 762 819
Re: [PATCH v3 06/27] ocxl: Tally up the LPC memory on a link & allow it to be mapped
On Tue, 2020-02-25 at 17:30 +0100, Frederic Barrat wrote: > > On 21/02/2020 at 04:26, Alastair D'Silva wrote: > > From: Alastair D'Silva > > > > Tally up the LPC memory on an OpenCAPI link & allow it to be mapped > > > > Signed-off-by: Alastair D'Silva > > --- > > drivers/misc/ocxl/core.c | 10 ++ > > drivers/misc/ocxl/link.c | 53 > > +++ > > drivers/misc/ocxl/ocxl_internal.h | 33 +++ > > 3 files changed, 96 insertions(+) > > > > diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c > > index b7a09b21ab36..2531c6cf19a0 100644 > > --- a/drivers/misc/ocxl/core.c > > +++ b/drivers/misc/ocxl/core.c > > @@ -230,8 +230,18 @@ static int configure_afu(struct ocxl_afu *afu, > > u8 afu_idx, struct pci_dev *dev) > > if (rc) > > goto err_free_pasid; > > > > + if (afu->config.lpc_mem_size || afu- > > >config.special_purpose_mem_size) { > > + rc = ocxl_link_add_lpc_mem(afu->fn->link, afu- > > >config.lpc_mem_offset, > > + afu->config.lpc_mem_size + > > + afu- > > >config.special_purpose_mem_size); > > + if (rc) > > + goto err_free_mmio; > > + } > > + > > return 0; > > > > +err_free_mmio: > > + unmap_mmio_areas(afu); > > err_free_pasid: > > reclaim_afu_pasid(afu); > > err_free_actag: > > diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c > > index 58d111afd9f6..1e039cc5ebe5 100644 > > --- a/drivers/misc/ocxl/link.c > > +++ b/drivers/misc/ocxl/link.c > > @@ -84,6 +84,11 @@ struct ocxl_link { > > int dev; > > atomic_t irq_available; > > struct spa *spa; > > + struct mutex lpc_mem_lock; /* protects lpc_mem & lpc_mem_sz */ > > + u64 lpc_mem_sz; /* Total amount of LPC memory presented on the > > link */ > > + u64 lpc_mem; > > + int lpc_consumers; > > + > > void *platform_data; > > }; > > static struct list_head links_list = LIST_HEAD_INIT(links_list); > > @@ -396,6 +401,8 @@ static int alloc_link(struct pci_dev *dev, int > > PE_mask, struct ocxl_link **out_l > > if (rc) > > goto err_spa; > > > > + mutex_init(&link->lpc_mem_lock); > > + > > /* platform
specific hook */ > > rc = pnv_ocxl_spa_setup(dev, link->spa->spa_mem, PE_mask, > > &link->platform_data); > > @@ -711,3 +718,49 @@ void ocxl_link_free_irq(void *link_handle, int > > hw_irq) > > atomic_inc(&link->irq_available); > > } > > EXPORT_SYMBOL_GPL(ocxl_link_free_irq); > > + > > +int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size) > > +{ > > + struct ocxl_link *link = (struct ocxl_link *) link_handle; > > + > > + // Check for overflow > > + if (offset > (offset + size)) > > + return -EINVAL; > > + > > + mutex_lock(&link->lpc_mem_lock); > > + link->lpc_mem_sz = max(link->lpc_mem_sz, offset + size); > > + > > + mutex_unlock(&link->lpc_mem_lock); > > + > > + return 0; > > +} > > + > > +u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev) > > +{ > > + struct ocxl_link *link = (struct ocxl_link *) link_handle; > > + > > + mutex_lock(&link->lpc_mem_lock); > > + > > + if(!link->lpc_mem) > > + link->lpc_mem = pnv_ocxl_platform_lpc_setup(pdev, link- > > >lpc_mem_sz); > > + > > + if(link->lpc_mem) > > + link->lpc_consumers++; > > + mutex_unlock(&link->lpc_mem_lock); > > + > > + return link->lpc_mem; > > +} > > + > > +void ocxl_link_lpc_release(void *link_handle, struct pci_dev > > *pdev) > > +{ > > + struct ocxl_link *link = (struct ocxl_link *) link_handle; > > + > > + mutex_lock(&link->lpc_mem_lock); > > + WARN_ON(--link->lpc_consumers < 0); > > Here, we always decrement the lpc_consumers count. However, it was > only > incremented if the mapping was setup correctly in opal. > > We could arguably claim that ocxl_link_lpc_release() should only be > called if ocxl_link_lpc_map() succeeded, but it would make error > path > handling easier if we only decrement the lpc_consumers count if > link->lpc_mem is set. So that we can just call > ocxl_link_lpc_release() > in error paths without having to worry about triggering the WARN_ON > message. > >Fred > > Ok, this makes sense. 
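Fred's suggestion can be sketched in userspace as follows. This is only an illustrative model, not the driver code: `struct link_model`, `model_platform_setup()` and the returned base address are made up for the example, and the mutex from the patch is left out for brevity. The point it shows is that making release a no-op when nothing is mapped lets error paths call it unconditionally without unbalancing the consumer count.

```c
#include <stdint.h>

/* Userspace stand-in for the relevant fields of struct ocxl_link. */
struct link_model {
	uint64_t lpc_mem;	/* base address, 0 when nothing is mapped */
	int lpc_consumers;	/* number of successful map calls */
	int platform_releases;	/* how often the stubbed platform release ran */
};

/* Hypothetical stub for the platform setup hook: returns 0 on failure. */
static uint64_t model_platform_setup(int should_fail)
{
	return should_fail ? 0 : 0x2000000000ULL;
}

static uint64_t model_lpc_map(struct link_model *link, int should_fail)
{
	if (!link->lpc_mem)
		link->lpc_mem = model_platform_setup(should_fail);

	/* Count a consumer only when the mapping really exists. */
	if (link->lpc_mem)
		link->lpc_consumers++;

	return link->lpc_mem;
}

static void model_lpc_release(struct link_model *link)
{
	/* Nothing mapped: nothing was counted, so nothing to undo. */
	if (!link->lpc_mem)
		return;

	if (--link->lpc_consumers == 0) {
		link->platform_releases++;
		link->lpc_mem = 0;
	}
}
```

With this shape, a caller whose map attempt failed can still go through the common cleanup path: the release call simply returns, and the count stays at zero.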
> > > + if (link->lpc_consumers == 0) { > > + pnv_ocxl_platform_lpc_release(pdev); > > + link->lpc_mem = 0; > > + } > > + > > + mutex_unlock(&link->lpc_mem_lock); > > +} > > diff --git a/drivers/misc/ocxl/ocxl_internal.h > > b/drivers/misc/ocxl/ocxl_internal.h > > index 198e4e4bc51d..d0c8c4838f42 100644 > > --- a/drivers/misc/ocxl/ocxl_internal.h > > +++ b/drivers/misc/ocxl/ocxl_internal.h > > @@ -142,4 +142,37 @@ int ocxl_irq_offset_to_id(struct ocxl_context > > *ctx, u64 offset); > > u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id); > > void ocxl_afu_irq_free_all(struct ocxl_context *ctx); > > > > +/** > > + * ocxl_link
Re: [PATCH v3 00/27] Add support for OpenCAPI Persistent Memory devices
On Tue, Feb 25, 2020 at 4:14 PM Alastair D'Silva wrote: > > On Mon, 2020-02-24 at 17:51 +1100, Oliver O'Halloran wrote: > > On Mon, Feb 24, 2020 at 3:43 PM Alastair D'Silva < > > alast...@au1.ibm.com> wrote: > > > On Sun, 2020-02-23 at 20:37 -0800, Matthew Wilcox wrote: > > > > On Mon, Feb 24, 2020 at 03:34:07PM +1100, Alastair D'Silva wrote: > > > > > V3: > > > > > - Rebase against next/next-20200220 > > > > > - Move driver to arch/powerpc/platforms/powernv, we now > > > > > expect > > > > > this > > > > > driver to go upstream via the powerpc tree > > > > > > > > That's rather the opposite direction of normal; mostly drivers > > > > live > > > > under > > > > drivers/ and not in arch/. It's easier for drivers to get > > > > overlooked > > > > when doing tree-wide changes if they're hiding. > > > > > > This is true, however, given that it was not all that desirable to > > > have > > > it under drivers/nvdimm, it's sister driver (for the same hardware) > > > is > > > also under arch, and that we don't expect this driver to be used on > > > any > > > platform other than powernv, we think this was the most reasonable > > > place to put it. > > > > Historically powernv specific platform drivers go in their respective > > subsystem trees rather than in arch/ and I'd prefer we kept it that > > way. When I added the papr_scm driver I put it in the pseries > > platform > > directory because most of the pseries paravirt code lives there for > > some reason; I don't know why. Luckily for me that followed the same > > model that Dan used when he put the NFIT driver in drivers/acpi/ and > > the libnvdimm core in drivers/nvdimm/ so we didn't have anything to > > argue about. However, as Matthew pointed out, it is at odds with how > > most subsystems operate. Is there any particular reason we're doing > > things this way or should we think about moving libnvdimm users to > > drivers/nvdimm/? 
> > > > Oliver > > > I'm not too fussed where it ends up, as long as it ends up somewhere :) > > From what I can tell, the issue is that we have both "infrastructure" > drivers, and end-device drivers. To me, it feels like drivers/nvdimm > should contain both, and I think this feels like the right approach. > > I could move it back to drivers/nvdimm/ocxl, but I felt that it was > only tolerated there, not desired. This could be cleared up with a > response from Dan Williams, and if it is indeed dersired, this is my > preferred location. Apologies if I gave the impression it was only tolerated. I'm ok with drivers/nvdimm/ocxl/, and to the larger point I'd also be ok with a drivers/{acpi => nvdimm}/nfit and {arch/powerpc/platforms/pseries => drivers/nvdimm}/papr_scm.c move as well to keep all the consumers of the nvdimm related code together with the core.
RE: [PATCH v3 00/27] Add support for OpenCAPI Persistent Memory devices
On Tue, 2020-02-25 at 16:32 -0800, Dan Williams wrote: > On Tue, Feb 25, 2020 at 4:14 PM Alastair D'Silva < > alast...@au1.ibm.com> wrote: > > On Mon, 2020-02-24 at 17:51 +1100, Oliver O'Halloran wrote: > > > On Mon, Feb 24, 2020 at 3:43 PM Alastair D'Silva < > > > alast...@au1.ibm.com> wrote: > > > > On Sun, 2020-02-23 at 20:37 -0800, Matthew Wilcox wrote: > > > > > On Mon, Feb 24, 2020 at 03:34:07PM +1100, Alastair D'Silva > > > > > wrote: > > > > > > V3: > > > > > > - Rebase against next/next-20200220 > > > > > > - Move driver to arch/powerpc/platforms/powernv, we now > > > > > > expect > > > > > > this > > > > > > driver to go upstream via the powerpc tree > > > > > > > > > > That's rather the opposite direction of normal; mostly > > > > > drivers > > > > > live > > > > > under > > > > > drivers/ and not in arch/. It's easier for drivers to get > > > > > overlooked > > > > > when doing tree-wide changes if they're hiding. > > > > > > > > This is true, however, given that it was not all that desirable > > > > to > > > > have > > > > it under drivers/nvdimm, it's sister driver (for the same > > > > hardware) > > > > is > > > > also under arch, and that we don't expect this driver to be > > > > used on > > > > any > > > > platform other than powernv, we think this was the most > > > > reasonable > > > > place to put it. > > > > > > Historically powernv specific platform drivers go in their > > > respective > > > subsystem trees rather than in arch/ and I'd prefer we kept it > > > that > > > way. When I added the papr_scm driver I put it in the pseries > > > platform > > > directory because most of the pseries paravirt code lives there > > > for > > > some reason; I don't know why. Luckily for me that followed the > > > same > > > model that Dan used when he put the NFIT driver in drivers/acpi/ > > > and > > > the libnvdimm core in drivers/nvdimm/ so we didn't have anything > > > to > > > argue about. 
However, as Matthew pointed out, it is at odds with > > > how > > > most subsystems operate. Is there any particular reason we're > > > doing > > > things this way or should we think about moving libnvdimm users > > > to > > > drivers/nvdimm/? > > > > > > Oliver > > > > I'm not too fussed where it ends up, as long as it ends up > > somewhere :) > > > > From what I can tell, the issue is that we have both > > "infrastructure" > > drivers, and end-device drivers. To me, it feels like > > drivers/nvdimm > > should contain both, and I think this feels like the right > > approach. > > > > I could move it back to drivers/nvdimm/ocxl, but I felt that it was > > only tolerated there, not desired. This could be cleared up with a > > response from Dan Williams, and if it is indeed desired, this is > > my > > preferred location. > > Apologies if I gave the impression it was only tolerated. I'm ok with > drivers/nvdimm/ocxl/, and to the larger point I'd also be ok with a > drivers/{acpi => nvdimm}/nfit and {arch/powerpc/platforms/pseries => > drivers/nvdimm}/papr_scm.c move as well to keep all the consumers of > the nvdimm related code together with the core. Great, thanks for clarifying, text is so imprecise when it comes to nuance :) I'll move it back to drivers/nvdimm/ocxl then. -- Alastair D'Silva Open Source Developer Linux Technology Centre, IBM Australia mob: 0423 762 819
Re: [PATCH v2 3/3] ASoC: fsl_easrc: Add EASRC ASoC CPU DAI and platform drivers
On Tue, Feb 25, 2020 at 4:05 PM Nicolin Chen wrote: > > On Mon, Feb 24, 2020 at 08:53:25AM +, S.j. Wang wrote: > > Hi > > > > > > > > > > Signed-off-by: Shengjiu Wang > > > > --- > > > > sound/soc/fsl/Kconfig | 10 + > > > > sound/soc/fsl/Makefile |2 + > > > > sound/soc/fsl/fsl_asrc_common.h |1 + > > > > sound/soc/fsl/fsl_easrc.c | 2265 +++ > > > > sound/soc/fsl/fsl_easrc.h | 668 + > > > > sound/soc/fsl/fsl_easrc_dma.c | 440 ++ > > > > > > I see a 90% similarity between fsl_asrc_dma and fsl_easrc_dma files. > > > Would it be possible reuse the existing code? Could share structures from > > > my point of view, just like it reuses "enum asrc_pair_index", I know > > > differentiating "pair" and "context" is a big point here though. > > > > > > A possible quick solution for that, off the top of my head, could be: > > > > > > 1) in fsl_asrc_common.h > > > > > > struct fsl_asrc { > > > > > > }; > > > > > > struct fsl_asrc_pair { > > > > > > }; > > > > > > 2) in fsl_easrc.h > > > > > > /* Renaming shared structures */ > > > #define fsl_easrc fsl_asrc > > > #define fsl_easrc_context fsl_asrc_pair > > > > > > May be a good idea to see if others have some opinion too. > > > > > > > We need to modify the fsl_asrc and fsl_asrc_pair, let them > > To be used by both driver, also we need to put the specific > > Definition for each module to same struct, right? > > Yea. A merged structure if that doesn't look that bad. I see most > of the fields in struct fsl_asrc are being reused by in fsl_easrc. > > > > > > > > +static const struct regmap_config fsl_easrc_regmap_config = { > > > > + .readable_reg = fsl_easrc_readable_reg, > > > > + .volatile_reg = fsl_easrc_volatile_reg, > > > > + .writeable_reg = fsl_easrc_writeable_reg, > > > > > > Can we use regmap_range and regmap_access_table? > > > > > > > Can the regmap_range support discontinuous registers? The > > reg_stride = 4. > > I think it does. 
Giving an example here: > https://github.com/torvalds/linux/blob/master/drivers/mfd/da9063-i2c.c The registers in this I2C driver are contiguous: 0x00, 0x01, 0x02... In our case they are 0x00, 0x04, 0x08. Does regmap_range still work with that layout? Best regards, Wang Shengjiu
Re: [PATCH v3 1/6] powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and kaslr_early_init()
On 2020/2/20 21:40, Christophe Leroy wrote: On 06/02/2020 at 03:58, Jason Yan wrote: Some code refactor in kaslr_legal_offset() and kaslr_early_init(). No functional change. This is a preparation for KASLR fsl_booke64. Signed-off-by: Jason Yan Cc: Scott Wood Cc: Diana Craciun Cc: Michael Ellerman Cc: Christophe Leroy Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Nicholas Piggin Cc: Kees Cook --- arch/powerpc/mm/nohash/kaslr_booke.c | 40 ++-- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c b/arch/powerpc/mm/nohash/kaslr_booke.c index 4a75f2d9bf0e..07b036e98353 100644 --- a/arch/powerpc/mm/nohash/kaslr_booke.c +++ b/arch/powerpc/mm/nohash/kaslr_booke.c @@ -25,6 +25,7 @@ struct regions { unsigned long pa_start; unsigned long pa_end; unsigned long kernel_size; + unsigned long linear_sz; unsigned long dtb_start; unsigned long dtb_end; unsigned long initrd_start; @@ -260,11 +261,23 @@ static __init void get_cell_sizes(const void *fdt, int node, int *addr_cells, *size_cells = fdt32_to_cpu(*prop); } -static unsigned long __init kaslr_legal_offset(void *dt_ptr, unsigned long index, - unsigned long offset) +static unsigned long __init kaslr_legal_offset(void *dt_ptr, unsigned long random) { unsigned long koffset = 0; unsigned long start; + unsigned long index; + unsigned long offset; + + /* + * Decide which 64M we want to start + * Only use the low 8 bits of the random seed + */ + index = random & 0xFF; + index %= regions.linear_sz / SZ_64M; + + /* Decide offset inside 64M */ + offset = random % (SZ_64M - regions.kernel_size); + offset = round_down(offset, SZ_16K); while ((long)index >= 0) { offset = memstart_addr + index * SZ_64M + offset; @@ -289,10 +302,9 @@ static inline __init bool kaslr_disabled(void) static unsigned long __init kaslr_choose_location(void *dt_ptr, phys_addr_t size, unsigned long kernel_sz) { - unsigned long offset, random; + unsigned long random; unsigned long ram, linear_sz; u64 seed; - unsigned
long index; kaslr_get_cmdline(dt_ptr); if (kaslr_disabled()) @@ -333,22 +345,12 @@ static unsigned long __init kaslr_choose_location(void *dt_ptr, phys_addr_t size regions.dtb_start = __pa(dt_ptr); regions.dtb_end = __pa(dt_ptr) + fdt_totalsize(dt_ptr); regions.kernel_size = kernel_sz; + regions.linear_sz = linear_sz; get_initrd_range(dt_ptr); get_crash_kernel(dt_ptr, ram); - /* - * Decide which 64M we want to start - * Only use the low 8 bits of the random seed - */ - index = random & 0xFF; - index %= linear_sz / SZ_64M; - - /* Decide offset inside 64M */ - offset = random % (SZ_64M - kernel_sz); - offset = round_down(offset, SZ_16K); - - return kaslr_legal_offset(dt_ptr, index, offset); + return kaslr_legal_offset(dt_ptr, random); } /* @@ -358,8 +360,6 @@ static unsigned long __init kaslr_choose_location(void *dt_ptr, phys_addr_t size */ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size) { - unsigned long tlb_virt; - phys_addr_t tlb_phys; unsigned long offset; unsigned long kernel_sz; @@ -375,8 +375,8 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size) is_second_reloc = 1; if (offset >= SZ_64M) { - tlb_virt = round_down(kernstart_virt_addr, SZ_64M); - tlb_phys = round_down(kernstart_addr, SZ_64M); + unsigned long tlb_virt = round_down(kernstart_virt_addr, SZ_64M); + phys_addr_t tlb_phys = round_down(kernstart_addr, SZ_64M); That looks like cleanup unrelated to the patch itself. Hi, Christophe These two variables are only used by the booke32 code, so I moved their definitions here to avoid wrapping them in a "#ifdef CONFIG_PPC32". Thanks, Jason /* Create kernel map to relocate in */ create_kaslr_tlb_entry(1, tlb_virt, tlb_phys); Christophe .
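The index/offset computation that this patch moves into kaslr_legal_offset() can be tried out in userspace with a sketch like the one below. It assumes only what the quoted diff shows (low 8 bits of the random value pick a 64M chunk, the remainder picks a 16K-aligned offset inside it); `struct kaslr_pick` and `pick_slot()` are names invented for the example, and the subsequent loop that walks down through chunks looking for a region that does not overlap the DTB, initrd or crash kernel is not modelled.

```c
#include <assert.h>

#define SZ_16K 0x00004000UL
#define SZ_64M 0x04000000UL

/* One random value split into a 64M chunk index and an offset inside it. */
struct kaslr_pick {
	unsigned long index;	/* which 64M chunk of the linear mapping */
	unsigned long offset;	/* 16K-aligned offset inside that chunk */
};

static struct kaslr_pick pick_slot(unsigned long random,
				   unsigned long linear_sz,
				   unsigned long kernel_size)
{
	struct kaslr_pick p;

	/* The arithmetic below needs at least one chunk and a kernel under 64M. */
	assert(linear_sz >= SZ_64M && kernel_size < SZ_64M);

	/* Only the low 8 bits of the random value select the chunk. */
	p.index = random & 0xFF;
	p.index %= linear_sz / SZ_64M;

	/* Keep the whole kernel inside the chunk, then align down to 16K. */
	p.offset = random % (SZ_64M - kernel_size);
	p.offset &= ~(SZ_16K - 1);

	return p;
}
```

For example, with a 768M linear mapping, a 16M kernel and random = 0x12345678, this yields chunk index 0 and offset 0x344000.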
Re: [PATCH v3 3/6] powerpc/fsl_booke/64: implement KASLR for fsl_booke64
On 2020/2/20 21:48, Christophe Leroy wrote: On 06/02/2020 at 03:58, Jason Yan wrote: The implementation for Freescale BookE64 is similar to BookE32. One difference is that Freescale BookE64 set up a TLB mapping of 1G during booting. Another difference is that ppc64 needs the kernel to be 64K-aligned. So we can randomize the kernel in this 1G mapping and make it 64K-aligned. This can save some code to create another TLB map at early boot. The disadvantage is that we only have about 1G/64K = 16384 slots to put the kernel in. To support secondary cpu boot up, a variable __kaslr_offset was added in first_256B section. This can help secondary cpu get the kaslr offset before the 1:1 mapping has been setup. Signed-off-by: Jason Yan Cc: Scott Wood Cc: Diana Craciun Cc: Michael Ellerman Cc: Christophe Leroy Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Nicholas Piggin Cc: Kees Cook --- arch/powerpc/Kconfig | 2 +- arch/powerpc/kernel/exceptions-64e.S | 10 + arch/powerpc/kernel/head_64.S | 7 ++ arch/powerpc/kernel/setup_64.c | 4 +++- arch/powerpc/mm/mmu_decl.h | 16 +++--- arch/powerpc/mm/nohash/kaslr_booke.c | 33 +--- 6 files changed, 59 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index c150a9d49343..754aeb96bb1c 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -568,7 +568,7 @@ config RELOCATABLE config RANDOMIZE_BASE bool "Randomize the address of the kernel image" - depends on (FSL_BOOKE && FLATMEM && PPC32) + depends on (PPC_FSL_BOOK3E && FLATMEM) depends on RELOCATABLE help Randomizes the virtual address at which the kernel image is diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 1b9b174bee86..c1c05b8684ca 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -1378,6 +1378,7 @@ skpinv: addi r6,r6,1 /* Increment */ 1: mflr r6 addi r6,r6,(2f - 1b) tovirt(r6,r6) + add r6,r6,r19 lis r7,MSR_KERNEL@h ori r7,r7,MSR_KERNEL@l mtspr
SPRN_SRR0,r6 @@ -1400,6 +1401,7 @@ skpinv: addi r6,r6,1 /* Increment */ /* We translate LR and return */ tovirt(r8,r8) + add r8,r8,r19 mtlr r8 blr @@ -1528,6 +1530,7 @@ a2_tlbinit_code_end: */ _GLOBAL(start_initialization_book3e) mflr r28 + li r19, 0 /* First, we need to setup some initial TLBs to map the kernel * text, data and bss at PAGE_OFFSET. We don't have a real mode @@ -1570,6 +1573,12 @@ _GLOBAL(book3e_secondary_core_init) cmplwi r4,0 bne 2f + li r19, 0 +#ifdef CONFIG_RANDOMIZE_BASE + LOAD_REG_ADDR_PIC(r19, __kaslr_offset) + lwz r19,0(r19) + rlwinm r19,r19,0,0,5 +#endif /* Setup TLB for this core */ bl initial_tlb_book3e @@ -1602,6 +1611,7 @@ _GLOBAL(book3e_secondary_core_init) lis r3,PAGE_OFFSET@highest sldi r3,r3,32 or r28,r28,r3 + add r28,r28,r19 1: mtlr r28 blr diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index ad79fddb974d..744624140fb8 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -104,6 +104,13 @@ __secondary_hold_acknowledge: .8byte 0x0 #ifdef CONFIG_RELOCATABLE +#ifdef CONFIG_RANDOMIZE_BASE + . = 0x58 + .globl __kaslr_offset +__kaslr_offset: +DEFINE_FIXED_SYMBOL(__kaslr_offset) + .long 0 +#endif /* This flag is set to 1 by a loader if the kernel should run * at the loaded address instead of the linked address. This * is used by kexec-tools to keep the the kdump kernel in the diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 6104917a282d..a16b970a8d1a 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -66,7 +66,7 @@ #include #include #include - Why remove this new line which clearly separates things in asm/ and things in local dir ? Sorry to break this. I will add the new line back. 
+#include #include "setup.h" int spinning_secondaries; @@ -300,6 +300,8 @@ void __init early_setup(unsigned long dt_ptr) /* Enable early debugging if any specified (see udbg.h) */ udbg_early_init(); + kaslr_early_init(__va(dt_ptr), 0); + udbg_printf(" -> %s(), dt_ptr: 0x%lx\n", __func__, dt_ptr); /* diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h index 3e1c85c7d10b..bbd721d1e3d7 100644 --- a/arch/powerpc/mm/mmu_decl.h +++ b/arch/powerpc/mm/mmu_decl.h @@ -147,14 +147,6 @@ void reloc_kernel_entry(void *fdt, long addr); extern void loadcam_entry(unsigned int index); extern void loadcam_multi(int first_idx, int num, int tmp_idx); -#ifdef CONFIG_RANDOMIZE_BASE -void kaslr_early_init(void *d
Re: [PATCH v3 5/6] powerpc/fsl_booke/64: clear the original kernel if randomized
On 2020/2/20 21:49, Christophe Leroy wrote: On 06/02/2020 at 03:58, Jason Yan wrote: The original kernel still exists in the memory, clear it now. No such problem with PPC32? Or is that common? PPC32 did this in relocate_init() in fsl_booke.c because PPC32 will not reach kaslr_early_init for the second pass after relocation. Thanks, Jason Christophe Signed-off-by: Jason Yan Cc: Scott Wood Cc: Diana Craciun Cc: Michael Ellerman Cc: Christophe Leroy Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Nicholas Piggin Cc: Kees Cook --- arch/powerpc/mm/nohash/kaslr_booke.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c b/arch/powerpc/mm/nohash/kaslr_booke.c index c6f5c1db1394..ed1277059368 100644 --- a/arch/powerpc/mm/nohash/kaslr_booke.c +++ b/arch/powerpc/mm/nohash/kaslr_booke.c @@ -378,8 +378,10 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size) unsigned int *__kaslr_offset = (unsigned int *)(KERNELBASE + 0x58); unsigned int *__run_at_load = (unsigned int *)(KERNELBASE + 0x5c); - if (*__run_at_load == 1) + if (*__run_at_load == 1) { + kaslr_late_init(); return; + } /* Setup flat device-tree pointer */ initial_boot_params = dt_ptr; .
Re: [PATCH v3 6/6] powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst and add 64bit part
On 2020/2/20 21:50, Christophe Leroy wrote: On 06/02/2020 at 03:58, Jason Yan wrote: Now we support both 32 and 64 bit KASLR for fsl booke. Add document for 64 bit part and rename kaslr-booke32.rst to kaslr-booke.rst. Signed-off-by: Jason Yan Cc: Scott Wood Cc: Diana Craciun Cc: Michael Ellerman Cc: Christophe Leroy Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Nicholas Piggin Cc: Kees Cook --- .../{kaslr-booke32.rst => kaslr-booke.rst} | 35 --- 1 file changed, 31 insertions(+), 4 deletions(-) rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%) Also update Documentation/powerpc/index.rst? Oh yes, thanks for reminding me of this. Thanks, Jason Christophe .
Re: [PATCH v2 3/3] ASoC: fsl_easrc: Add EASRC ASoC CPU DAI and platform drivers
On Wed, Feb 26, 2020 at 09:51:39AM +0800, Shengjiu Wang wrote: > > > > > +static const struct regmap_config fsl_easrc_regmap_config = { > > > > > + .readable_reg = fsl_easrc_readable_reg, > > > > > + .volatile_reg = fsl_easrc_volatile_reg, > > > > > + .writeable_reg = fsl_easrc_writeable_reg, > > > > > > > > Can we use regmap_range and regmap_access_table? > > > > > > > > > > Can the regmap_range support discontinuous registers? The > > > reg_stride = 4. > > > > I think it does. Giving an example here: > > https://github.com/torvalds/linux/blob/master/drivers/mfd/da9063-i2c.c > > The register in this i2c driver are continuous, from 0x00, 0x01, 0x02... > > But our case is 0x00, 0x04, 0x08, does it work? Ah...I see your point now. I am not sure; I have only used it in I2C drivers. You can ignore the suggestion if it is unlikely to work for us.
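On the open question above, a small userspace model may help frame it. The assumption here (worth verifying against the regmap core before relying on it) is that a range table only compares a register address against inclusive min/max bounds, and that stride alignment is a separate check, so a stride of 4 would not by itself rule out range tables. The structure, function names and register windows below are all hypothetical; this is not the regmap API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of an inclusive [range_min, range_max] register window. */
struct range_model {
	unsigned int range_min;
	unsigned int range_max;
};

/* Hypothetical layout: two readable windows with a hole between them. */
static const struct range_model readable_ranges[] = {
	{ 0x00, 0x0c },
	{ 0x40, 0x48 },
};

static bool reg_in_ranges(unsigned int reg, const struct range_model *r,
			  size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (reg >= r[i].range_min && reg <= r[i].range_max)
			return true;
	return false;
}

/* Accept a register only if it is stride-aligned and inside some window. */
static bool reg_readable(unsigned int reg, unsigned int stride)
{
	if (reg % stride)
		return false;

	return reg_in_ranges(reg, readable_ranges,
			     sizeof(readable_ranges) / sizeof(readable_ranges[0]));
}
```

Under this model, 0x00, 0x04, 0x08 and 0x0c are accepted with a stride of 4, while 0x05 (unaligned) and 0x10 (in the hole) are rejected.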
Re: [PATCH v3 3/6] powerpc/fsl_booke/64: implement KASLR for fsl_booke64
On 2020/2/26 10:40, Jason Yan wrote: On 2020/2/20 21:48, Christophe Leroy wrote: On 06/02/2020 at 03:58, Jason Yan wrote: The implementation for Freescale BookE64 is similar to BookE32. One difference is that Freescale BookE64 set up a TLB mapping of 1G during booting. Another difference is that ppc64 needs the kernel to be 64K-aligned. So we can randomize the kernel in this 1G mapping and make it 64K-aligned. This can save some code to create another TLB map at early boot. The disadvantage is that we only have about 1G/64K = 16384 slots to put the kernel in. To support secondary cpu boot up, a variable __kaslr_offset was added in first_256B section. This can help secondary cpu get the kaslr offset before the 1:1 mapping has been setup. Signed-off-by: Jason Yan Cc: Scott Wood Cc: Diana Craciun Cc: Michael Ellerman Cc: Christophe Leroy Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Nicholas Piggin Cc: Kees Cook --- arch/powerpc/Kconfig | 2 +- arch/powerpc/kernel/exceptions-64e.S | 10 + arch/powerpc/kernel/head_64.S | 7 ++ arch/powerpc/kernel/setup_64.c | 4 +++- arch/powerpc/mm/mmu_decl.h | 16 +++--- arch/powerpc/mm/nohash/kaslr_booke.c | 33 +--- 6 files changed, 59 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index c150a9d49343..754aeb96bb1c 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -568,7 +568,7 @@ config RELOCATABLE config RANDOMIZE_BASE bool "Randomize the address of the kernel image" - depends on (FSL_BOOKE && FLATMEM && PPC32) + depends on (PPC_FSL_BOOK3E && FLATMEM) depends on RELOCATABLE help Randomizes the virtual address at which the kernel image is diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 1b9b174bee86..c1c05b8684ca 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -1378,6 +1378,7 @@ skpinv: addi r6,r6,1 /* Increment */ 1: mflr r6 addi r6,r6,(2f - 1b) tovirt(r6,r6) + add r6,r6,r19 lis
r7,MSR_KERNEL@h ori r7,r7,MSR_KERNEL@l mtspr SPRN_SRR0,r6 @@ -1400,6 +1401,7 @@ skpinv: addi r6,r6,1 /* Increment */ /* We translate LR and return */ tovirt(r8,r8) + add r8,r8,r19 mtlr r8 blr @@ -1528,6 +1530,7 @@ a2_tlbinit_code_end: */ _GLOBAL(start_initialization_book3e) mflr r28 + li r19, 0 /* First, we need to setup some initial TLBs to map the kernel * text, data and bss at PAGE_OFFSET. We don't have a real mode @@ -1570,6 +1573,12 @@ _GLOBAL(book3e_secondary_core_init) cmplwi r4,0 bne 2f + li r19, 0 +#ifdef CONFIG_RANDOMIZE_BASE + LOAD_REG_ADDR_PIC(r19, __kaslr_offset) + lwz r19,0(r19) + rlwinm r19,r19,0,0,5 +#endif /* Setup TLB for this core */ bl initial_tlb_book3e @@ -1602,6 +1611,7 @@ _GLOBAL(book3e_secondary_core_init) lis r3,PAGE_OFFSET@highest sldi r3,r3,32 or r28,r28,r3 + add r28,r28,r19 1: mtlr r28 blr diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index ad79fddb974d..744624140fb8 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -104,6 +104,13 @@ __secondary_hold_acknowledge: .8byte 0x0 #ifdef CONFIG_RELOCATABLE +#ifdef CONFIG_RANDOMIZE_BASE + . = 0x58 + .globl __kaslr_offset +__kaslr_offset: +DEFINE_FIXED_SYMBOL(__kaslr_offset) + .long 0 +#endif /* This flag is set to 1 by a loader if the kernel should run * at the loaded address instead of the linked address. This * is used by kexec-tools to keep the the kdump kernel in the diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 6104917a282d..a16b970a8d1a 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -66,7 +66,7 @@ #include #include #include - Why remove this new line which clearly separates things in asm/ and things in local dir ? Sorry to break this. I will add the new line back. 
+#include #include "setup.h" int spinning_secondaries; @@ -300,6 +300,8 @@ void __init early_setup(unsigned long dt_ptr) /* Enable early debugging if any specified (see udbg.h) */ udbg_early_init(); + kaslr_early_init(__va(dt_ptr), 0); + udbg_printf(" -> %s(), dt_ptr: 0x%lx\n", __func__, dt_ptr); /* diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h index 3e1c85c7d10b..bbd721d1e3d7 100644 --- a/arch/powerpc/mm/mmu_decl.h +++ b/arch/powerpc/mm/mmu_decl.h @@ -147,14 +147,6 @@ void reloc_kernel_entry(void *fdt, long addr); extern void loadcam_entry(unsigned int index); extern void loadcam_multi(int first_idx, int num, int tmp_idx); -#ifdef CONFIG_RANDOMIZE
Re: [PATCH v3 26/32] powerpc/64: system call zero volatile registers when returning
Segher Boessenkool's on February 26, 2020 7:20 am: > Hi! > > On Wed, Feb 26, 2020 at 03:35:35AM +1000, Nicholas Piggin wrote: >> Kernel addresses and potentially other sensitive data could be leaked >> in volatile registers after a syscall. > >> cmpdi r3,0 >> bne .Lsyscall_restore_regs >> +li r0,0 >> +li r4,0 >> +li r5,0 >> +li r6,0 >> +li r7,0 >> +li r8,0 >> +li r9,0 >> +li r10,0 >> +li r11,0 >> +li r12,0 >> +mtctr r0 >> +mtspr SPRN_XER,r0 >> .Lsyscall_restore_regs_cont: > > What about LR? Is that taken care of later? LR is preserved by sc as per ABI. > This also deserves a big fat comment imo, it is very important after > all, and not so obvious. Sure I can add something. Thanks, Nick
Re: [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
bugzilla-dae...@bugzilla.kernel.org's on February 26, 2020 1:26 am: > https://bugzilla.kernel.org/show_bug.cgi?id=206669 > > Bug ID: 206669 >Summary: Little-endian kernel crashing on POWER8 on heavy > big-endian PowerKVM load >Product: Platform Specific/Hardware >Version: 2.5 > Kernel Version: 5.4.x > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: PPC-64 > Assignee: platform_ppc...@kernel-bugs.osdl.org > Reporter: glaub...@physik.fu-berlin.de > CC: mator...@gmail.com > Regression: No > > Created attachment 287605 > --> https://bugzilla.kernel.org/attachment.cgi?id=287605&action=edit > Backtrace of host system crashing with little-endian kernel > > We have an IBM POWER server (8247-42L) running Linux kernel 5.4.13 on Debian > unstable hosting a big-endian ppc64 virtual machine running the same kernel in > big-endian mode. > > When building OpenJDK-11 on the big-endian VM, the testsuite crashes the > *host* > system which is little-endian with the following kernel backtrace. The problem > reproduces both with kernel 4.19.98 as well as 5.4.13, both guest and host > running 5.4.x. > > Backtrace attached. Thanks for the report. We need more data about the first BUG if we can get it. What function in your vmlinux contains address 0xc017a778 (use nm, objdump, etc.)? Is that the first message you get, with no warnings or anything else earlier in the dmesg? The function containing 0xc02659a0 would also be interesting. When reproducing, do you ever get a clean trace of the first bug? Could you try setting /proc/sys/kernel/panic_on_oops and reproducing? Thanks, Nick
[PATCH v3 00/14] Initial Prefixed Instruction support
A future revision of the ISA will introduce prefixed instructions. A
prefixed instruction is composed of a 4-byte prefix followed by a
4-byte suffix.

All prefixes have the major opcode 1. A prefix will never be a valid
word instruction. A suffix may be an existing word instruction or a
new instruction.

This series enables prefixed instructions and extends the instruction
emulation to support them. Then the places where prefixed instructions
might need to be emulated are updated.

v3 is based on feedback from Christophe Leroy. The major changes:
    - Completely replacing store_inst() with patch_instruction() in
      xmon
    - Improve implementation of mread_instr() to not use mread().
    - Base the series on top of
      https://patchwork.ozlabs.org/patch/1232619/ as this will affect
      kprobes.
    - Some renaming and simplification of conditionals.

v2 incorporates feedback from Daniel Axtens and Balamuruhan S. The
major changes are:
    - Squashing together all commits about SRR1 bits
    - Squashing all commits for supporting prefixed load stores
    - Changing abbreviated references to sufx/prfx -> suffix/prefix
    - Introducing macros for returning the length of an instruction
    - Removing sign extension flag from pstd/pld in sstep.c
    - Dropping patch "powerpc/fault: Use analyse_instr() to check for
      store with updates to sp" from the series, it did not really fit
      with prefixed enablement in the first place and as reported by Greg
      Kurz did not work correctly.
Alistair Popple (1):
  powerpc: Enable Prefixed Instructions

Jordan Niethe (13):
  powerpc: Define new SRR1 bits for a future ISA version
  powerpc sstep: Prepare to support prefixed instructions
  powerpc sstep: Add support for prefixed load/stores
  powerpc sstep: Add support for prefixed fixed-point arithmetic
  powerpc: Support prefixed instructions in alignment handler
  powerpc/traps: Check for prefixed instructions in
    facility_unavailable_exception()
  powerpc/xmon: Remove store_inst() for patch_instruction()
  powerpc/xmon: Add initial support for prefixed instructions
  powerpc/xmon: Dump prefixed instructions
  powerpc/kprobes: Support kprobes on prefixed instructions
  powerpc/uprobes: Add support for prefixed instructions
  powerpc/hw_breakpoints: Initial support for prefixed instructions
  powerpc: Add prefix support to mce_find_instr_ea_and_pfn()

 arch/powerpc/include/asm/kprobes.h    |   5 +-
 arch/powerpc/include/asm/ppc-opcode.h |  13 ++
 arch/powerpc/include/asm/reg.h        |   7 +-
 arch/powerpc/include/asm/sstep.h      |   9 +-
 arch/powerpc/include/asm/uaccess.h    |  25
 arch/powerpc/include/asm/uprobes.h    |  16 ++-
 arch/powerpc/kernel/align.c           |   8 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c     |  23
 arch/powerpc/kernel/hw_breakpoint.c   |   9 +-
 arch/powerpc/kernel/kprobes.c         |  43 --
 arch/powerpc/kernel/mce_power.c       |   6 +-
 arch/powerpc/kernel/optprobes.c       |  31 +++--
 arch/powerpc/kernel/optprobes_head.S  |   6 +
 arch/powerpc/kernel/traps.c           |  22 ++-
 arch/powerpc/kernel/uprobes.c         |   4 +-
 arch/powerpc/kvm/book3s_hv_nested.c   |   2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |   2 +-
 arch/powerpc/kvm/emulate_loadstore.c  |   2 +-
 arch/powerpc/lib/sstep.c              | 191 +-
 arch/powerpc/lib/test_emulate_step.c  |  30 ++--
 arch/powerpc/xmon/xmon.c              | 140 +++
 21 files changed, 497 insertions(+), 97 deletions(-)

-- 
2.17.1
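For readers following along, the encoding rule from the cover letter (every prefix has major opcode 1, and a prefixed instruction is 8 bytes instead of 4) can be sketched in plain userspace C roughly as below. The helper names primary_opcode(), is_prefix() and inst_length() are made up for illustration and are not the macros the series adds; the only facts taken from the cover letter are the opcode value and the two lengths.

```c
#include <stdbool.h>
#include <stdint.h>

/* Major (primary) opcode used by every prefix, per the cover letter. */
#define OPCODE_PREFIX 1u

/* The primary opcode is the top 6 bits of a 32-bit instruction word. */
static inline unsigned int primary_opcode(uint32_t word)
{
	return word >> 26;
}

/* Hypothetical helper: true if this word is the prefix of an 8-byte instruction. */
static inline bool is_prefix(uint32_t word)
{
	return primary_opcode(word) == OPCODE_PREFIX;
}

/* Hypothetical helper: 8 bytes for prefix + suffix, 4 for a word instruction. */
static inline unsigned int inst_length(uint32_t first_word)
{
	return is_prefix(first_word) ? 8 : 4;
}
```

Only the first word needs to be inspected to decide how many bytes to fetch, which is presumably why the series can keep passing the suffix around as a separate argument.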
[PATCH v3 01/14] powerpc: Enable Prefixed Instructions
From: Alistair Popple

Prefix instructions have their own FSCR bit which needs to be enabled via
a CPU feature. The kernel will save the FSCR for problem state but it
needs to be enabled initially.

Signed-off-by: Alistair Popple
---
 arch/powerpc/include/asm/reg.h    |  3 +++
 arch/powerpc/kernel/dt_cpu_ftrs.c | 23 +++
 2 files changed, 26 insertions(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1aa46dff0957..c7758c2ccc5f 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -397,6 +397,7 @@
 #define SPRN_RWMR	0x375	/* Region-Weighting Mode Register */
 
 /* HFSCR and FSCR bit numbers are the same */
+#define FSCR_PREFIX_LG	13	/* Enable Prefix Instructions */
 #define FSCR_SCV_LG	12	/* Enable System Call Vectored */
 #define FSCR_MSGP_LG	10	/* Enable MSGP */
 #define FSCR_TAR_LG	8	/* Enable Target Address Register */
@@ -408,11 +409,13 @@
 #define FSCR_VECVSX_LG	1	/* Enable VMX/VSX */
 #define FSCR_FP_LG	0	/* Enable Floating Point */
 #define SPRN_FSCR	0x099	/* Facility Status & Control Register */
+#define   FSCR_PREFIX	__MASK(FSCR_PREFIX_LG)
 #define   FSCR_SCV	__MASK(FSCR_SCV_LG)
 #define   FSCR_TAR	__MASK(FSCR_TAR_LG)
 #define   FSCR_EBB	__MASK(FSCR_EBB_LG)
 #define   FSCR_DSCR	__MASK(FSCR_DSCR_LG)
 #define SPRN_HFSCR	0xbe	/* HV=1 Facility Status & Control Register */
+#define   HFSCR_PREFIX	__MASK(FSCR_PREFIX_LG)
 #define   HFSCR_MSGP	__MASK(FSCR_MSGP_LG)
 #define   HFSCR_TAR	__MASK(FSCR_TAR_LG)
 #define   HFSCR_EBB	__MASK(FSCR_EBB_LG)
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 182b4047c1ef..396f2c6c588e 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -553,6 +553,28 @@ static int __init feat_enable_large_ci(struct dt_cpu_feature *f)
 	return 1;
 }
 
+static int __init feat_enable_prefix(struct dt_cpu_feature *f)
+{
+	u64 fscr, hfscr;
+
+	if (f->usable_privilege & USABLE_HV) {
+		hfscr = mfspr(SPRN_HFSCR);
+		hfscr |= HFSCR_PREFIX;
+		mtspr(SPRN_HFSCR, hfscr);
+	}
+
+	if (f->usable_privilege & USABLE_OS) {
+		fscr = mfspr(SPRN_FSCR);
+		fscr |= FSCR_PREFIX;
+		mtspr(SPRN_FSCR, fscr);
+
+		if (f->usable_privilege & USABLE_PR)
+			current->thread.fscr |= FSCR_PREFIX;
+	}
+
+	return 1;
+}
+
 struct dt_cpu_feature_match {
 	const char *name;
 	int (*enable)(struct dt_cpu_feature *f);
@@ -626,6 +648,7 @@ static struct dt_cpu_feature_match __initdata
 	{"vector-binary128", feat_enable, 0},
 	{"vector-binary16", feat_enable, 0},
 	{"wait-v3", feat_enable, 0},
+	{"prefix-instructions", feat_enable_prefix, 0},
 };
 
 static bool __initdata using_dt_cpu_ftrs;
-- 
2.17.1
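As a rough illustration of what the enable path above amounts to, here is a userspace sketch of the read-modify-write on the facility register. It is not kernel code: fake_fscr is a plain variable standing in for the SPR, BIT_MASK_OF() is a stand-in for the kernel's __MASK(), and enable_prefix_facility() is a made-up name. Only the bit number 13 comes from the patch.

```c
#include <stdint.h>

#define FSCR_PREFIX_LG   13                       /* bit number from the patch */
#define BIT_MASK_OF(lg)  (UINT64_C(1) << (lg))    /* stand-in for __MASK() */
#define FSCR_PREFIX      BIT_MASK_OF(FSCR_PREFIX_LG)

/* Hypothetical stand-in for the FSCR; the kernel uses mfspr()/mtspr(). */
static uint64_t fake_fscr;

/* Set the prefix facility bit without disturbing the other facility bits. */
static void enable_prefix_facility(void)
{
	uint64_t fscr = fake_fscr;   /* would be mfspr(SPRN_FSCR) */

	fscr |= FSCR_PREFIX;
	fake_fscr = fscr;            /* would be mtspr(SPRN_FSCR, fscr) */
}
```

The same mask is applied to HFSCR in the patch because, as the reg.h comment says, HFSCR and FSCR bit numbers are the same.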
[PATCH v3 02/14] powerpc: Define new SRR1 bits for a future ISA version
Add the BOUNDARY SRR1 bit definition for when the cause of an alignment
exception is a prefixed instruction that crosses a 64-byte boundary.
Add the PREFIXED SRR1 bit definition for exceptions caused by prefixed
instructions.

Bit 35 of SRR1 is called SRR1_ISI_N_OR_G. This name comes from it being
used to indicate that an ISI was due to the access being no-exec or
guarded. A future ISA version adds another purpose. It is also set if
there is an access to a cache-inhibited location for a prefixed
instruction. Rename from SRR1_ISI_N_OR_G to SRR1_ISI_N_G_OR_CIP.

Signed-off-by: Jordan Niethe
---
v2: Combined all the commits concerning SRR1 bits.
---
 arch/powerpc/include/asm/reg.h      | 4 +++-
 arch/powerpc/kvm/book3s_hv_nested.c | 2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index c7758c2ccc5f..173f33df4fab 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -762,7 +762,7 @@
 #endif
 
 #define   SRR1_ISI_NOPT		0x40000000 /* ISI: Not found in hash */
-#define   SRR1_ISI_N_OR_G	0x10000000 /* ISI: Access is no-exec or G */
+#define   SRR1_ISI_N_G_OR_CIP	0x10000000 /* ISI: Access is no-exec or G or CI for a prefixed instruction */
 #define   SRR1_ISI_PROT		0x08000000 /* ISI: Other protection fault */
 #define   SRR1_WAKEMASK		0x00380000 /* reason for wakeup */
 #define   SRR1_WAKEMASK_P8	0x003c0000 /* reason for wakeup on POWER8 and 9 */
@@ -789,6 +789,8 @@
 #define   SRR1_PROGADDR		0x00010000 /* SRR0 contains subsequent addr */
 
 #define   SRR1_MCE_MCP		0x00080000 /* Machine check signal caused interrupt */
+#define   SRR1_BOUNDARY		0x10000000 /* Prefixed instruction crosses 64-byte boundary */
+#define   SRR1_PREFIXED		0x20000000 /* Exception caused by prefixed instruction */
 
 #define SPRN_HSRR0	0x13A	/* Save/Restore Register 0 */
 #define SPRN_HSRR1	0x13B	/* Save/Restore Register 1 */
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index dc97e5be76f6..6ab685227574 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1169,7 +1169,7 @@ static int kvmhv_translate_addr_nested(struct kvm_vcpu *vcpu,
 		} else if (vcpu->arch.trap == BOOK3S_INTERRUPT_H_INST_STORAGE) {
 			/* Can we execute? */
 			if (!gpte_p->may_execute) {
-				flags |= SRR1_ISI_N_OR_G;
+				flags |= SRR1_ISI_N_G_OR_CIP;
 				goto forward_to_l1;
 			}
 		} else {
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 220305454c23..b53a9f1c1a46 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -1260,7 +1260,7 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 	status &= ~DSISR_NOHPTE;	/* DSISR_NOHPTE == SRR1_ISI_NOPT */
 	if (!data) {
 		if (gr & (HPTE_R_N | HPTE_R_G))
-			return status | SRR1_ISI_N_OR_G;
+			return status | SRR1_ISI_N_G_OR_CIP;
 		if (!hpte_read_permission(pp, slb_v & key))
 			return status | SRR1_ISI_PROT;
 	} else if (status & DSISR_ISSTORE) {
-- 
2.17.1
[PATCH v3 03/14] powerpc sstep: Prepare to support prefixed instructions
Currently all instructions are a single word long. A future ISA version will include prefixed instructions which have a double word length. The functions used for analysing and emulating instructions need to be modified so that they can handle these new instruction types. A prefixed instruction is a word prefix followed by a word suffix. All prefixes uniquely have the primary op-code 1. Suffixes may be valid word instructions or instructions that only exist as suffixes. In handling prefixed instructions it will be convenient to treat the suffix and prefix as separate words. To facilitate this modify analyse_instr() and emulate_step() to take a suffix as a parameter. For word instructions it does not matter what is passed in here - it will be ignored. We also define a new flag, PREFIXED, to be used in instruction_op:type. This flag will indicate when emulating an analysed instruction if the NIP should be advanced by word length or double word length. The callers of analyse_instr() and emulate_step() will need their own changes to be able to support prefixed instructions. For now modify them to pass in 0 as a suffix. Note that at this point no prefixed instructions are emulated or analysed - this is just making it possible to do so. Signed-off-by: Jordan Niethe --- v2: - Move definition of __get_user_instr() and __get_user_instr_inatomic() to "powerpc: Support prefixed instructions in alignment handler." 
- Use a macro for returning the length of an op - Rename sufx -> suffix - Define and use PPC_NO_SUFFIX instead of 0 v3: - Define and use OP_PREFIX - Rename OP_LENGTH() to GETLENGTH() - Define IS_PREFIX() as 0 for non 64 bit ppc --- arch/powerpc/include/asm/ppc-opcode.h | 13 arch/powerpc/include/asm/sstep.h | 9 ++-- arch/powerpc/kernel/align.c | 2 +- arch/powerpc/kernel/hw_breakpoint.c | 4 ++-- arch/powerpc/kernel/kprobes.c | 2 +- arch/powerpc/kernel/mce_power.c | 2 +- arch/powerpc/kernel/optprobes.c | 3 ++- arch/powerpc/kernel/uprobes.c | 2 +- arch/powerpc/kvm/emulate_loadstore.c | 2 +- arch/powerpc/lib/sstep.c | 12 ++- arch/powerpc/lib/test_emulate_step.c | 30 +-- arch/powerpc/xmon/xmon.c | 5 +++-- 12 files changed, 54 insertions(+), 32 deletions(-) diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index c1df75edde44..24dc193cd3ef 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -158,6 +158,9 @@ /* VMX Vector Store Instructions */ #define OP_31_XOP_STVX 231 +/* Prefixed Instructions */ +#define OP_PREFIX 1 + #define OP_31 31 #define OP_LWZ 32 #define OP_STFS 52 @@ -377,6 +380,16 @@ #define PPC_INST_VCMPEQUD 0x10c7 #define PPC_INST_VCMPEQUB 0x1006 +/* macros for prefixed instructions */ +#ifdef __powerpc64__ +#define IS_PREFIX(x) (((x) >> 26) == OP_PREFIX) +#else +#define IS_PREFIX(x) (0) +#endif + +#definePPC_NO_SUFFIX 0 +#definePPC_INST_LENGTH(x) (IS_PREFIX(x) ? 
8 : 4) + /* macros to insert fields into opcodes */ #define ___PPC_RA(a) (((a) & 0x1f) << 16) #define ___PPC_RB(b) (((b) & 0x1f) << 11) diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h index 769f055509c9..5539df5c50a4 100644 --- a/arch/powerpc/include/asm/sstep.h +++ b/arch/powerpc/include/asm/sstep.h @@ -89,11 +89,15 @@ enum instruction_type { #define VSX_LDLEFT 4 /* load VSX register from left */ #define VSX_CHECK_VEC 8 /* check MSR_VEC not MSR_VSX for reg >= 32 */ +/* Prefixed flag, ORed in with type */ +#define PREFIXED 0x800 + /* Size field in type word */ #define SIZE(n)((n) << 12) #define GETSIZE(w) ((w) >> 12) #define GETTYPE(t) ((t) & INSTR_TYPE_MASK) +#define GETLENGTH(t) (((t) & PREFIXED) ? 8 : 4) #define MKOP(t, f, s) ((t) | (f) | SIZE(s)) @@ -132,7 +136,7 @@ union vsx_reg { * otherwise. */ extern int analyse_instr(struct instruction_op *op, const struct pt_regs *regs, -unsigned int instr); +unsigned int instr, unsigned int suffix); /* * Emulate an instruction that can be executed just by updating @@ -149,7 +153,8 @@ void emulate_update_regs(struct pt_regs *reg, struct instruction_op *op); * 0 if it could not be emulated, or -1 for an instruction that * should not be emulated (rfid, mtmsrd clearing MSR_RI, etc.). */ -extern int emulate_step(struct pt_regs *regs, unsigned int instr); +extern int emulate_step(struct pt_regs *regs, unsigned int instr, + unsigned int suffix); /* * Emulate a load or store instruction by reading/writing the diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c index 92045ed64976..ba3bf5c3ab62 100644 --- a/arch/powerpc/kernel/align.c +++ b/arch/pow
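To make the role of the new PREFIXED flag concrete, the sketch below copies the type-word macros quoted in this patch into a standalone file and shows how a caller might advance the NIP after emulating. INSTR_TYPE_MASK is not shown in the hunk and its value here is assumed, the LOAD value is illustrative only, and next_nip() is a made-up helper rather than anything in the series.

```c
#include <stdint.h>

#define INSTR_TYPE_MASK  0x1f   /* assumed; not visible in the quoted hunk */
#define PREFIXED         0x800  /* prefixed flag, ORed in with type */

#define SIZE(n)          ((n) << 12)
#define GETSIZE(w)       ((w) >> 12)
#define GETTYPE(t)       ((t) & INSTR_TYPE_MASK)
#define GETLENGTH(t)     (((t) & PREFIXED) ? 8 : 4)
#define MKOP(t, f, s)    ((t) | (f) | SIZE(s))

enum { LOAD = 1 };              /* illustrative instruction type value */

/* Hypothetical helper: step past the instruction that was just emulated. */
static uint64_t next_nip(uint64_t nip, int type)
{
	return nip + GETLENGTH(type);
}
```

Because the flag lives in bit 11, between the type bits and the size field that starts at bit 12, it travels with op->type and does not need a separate field in struct instruction_op.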
[PATCH v3 04/14] powerpc sstep: Add support for prefixed load/stores
This adds emulation support for the following prefixed integer
load/stores:
  * Prefixed Load Byte and Zero (plbz)
  * Prefixed Load Halfword and Zero (plhz)
  * Prefixed Load Halfword Algebraic (plha)
  * Prefixed Load Word and Zero (plwz)
  * Prefixed Load Word Algebraic (plwa)
  * Prefixed Load Doubleword (pld)
  * Prefixed Store Byte (pstb)
  * Prefixed Store Halfword (psth)
  * Prefixed Store Word (pstw)
  * Prefixed Store Doubleword (pstd)
  * Prefixed Load Quadword (plq)
  * Prefixed Store Quadword (pstq)

the following prefixed floating-point load/stores:
  * Prefixed Load Floating-Point Single (plfs)
  * Prefixed Load Floating-Point Double (plfd)
  * Prefixed Store Floating-Point Single (pstfs)
  * Prefixed Store Floating-Point Double (pstfd)

and for the following prefixed VSX load/stores:
  * Prefixed Load VSX Scalar Doubleword (plxsd)
  * Prefixed Load VSX Scalar Single-Precision (plxssp)
  * Prefixed Load VSX Vector [0|1] (plxv, plxv0, plxv1)
  * Prefixed Store VSX Scalar Doubleword (pstxsd)
  * Prefixed Store VSX Scalar Single-Precision (pstxssp)
  * Prefixed Store VSX Vector [0|1] (pstxv, pstxv0, pstxv1)

Signed-off-by: Jordan Niethe
---
v2: - Combine all load/store patches
    - Fix the name of Type 01 instructions
    - Remove sign extension flag from pstd/pld
    - Rename sufx -> suffix
v3: - Move prefixed loads and stores into the switch statement
---
 arch/powerpc/lib/sstep.c | 159 +++
 1 file changed, 159 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index efbe72370670..8e4ec953e279 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -187,6 +187,44 @@ static nokprobe_inline unsigned long xform_ea(unsigned int instr,
 	return ea;
 }
 
+/*
+ * Calculate effective address for a MLS:D-form / 8LS:D-form
+ * prefixed instruction
+ */
+static nokprobe_inline unsigned long mlsd_8lsd_ea(unsigned int instr,
+						  unsigned int suffix,
+						  const struct pt_regs *regs)
+{
+	int ra, prefix_r;
+	unsigned int dd;
+	unsigned long ea, d0, d1, d;
+
+	prefix_r = instr & (1ul << 20);
+	ra = (suffix >> 16) & 0x1f;
+
+	d0 = instr & 0x3ffff;
+	d1 = suffix & 0xffff;
+	d = (d0 << 16) | d1;
+
+	/*
+	 * sign extend a 34 bit number
+	 */
+	dd = (unsigned int)(d >> 2);
+	ea = (signed int)dd;
+	ea = (ea << 2) | (d & 0x3);
+
+	if (!prefix_r && ra)
+		ea += regs->gpr[ra];
+	else if (!prefix_r && !ra)
+		; /* Leave ea as is */
+	else if (prefix_r && !ra)
+		ea += regs->nip;
+	else if (prefix_r && ra)
+		; /* Invalid form. Should already be checked for by caller! */
+
+	return ea;
+}
+
 /*
  * Return the largest power of 2, not greater than sizeof(unsigned long),
  * such that x is a multiple of it.
@@ -1166,6 +1204,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		  unsigned int instr, unsigned int suffix)
 {
 	unsigned int opcode, ra, rb, rc, rd, spr, u;
+	unsigned int suffixopcode, prefixtype, prefix_r;
 	unsigned long int imm;
 	unsigned long int val, val2;
 	unsigned int mb, me, sh;
@@ -2648,6 +2687,126 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			break;
 		}
 		break;
+	case 1: /* Prefixed instructions */
+		prefix_r = instr & (1ul << 20);
+		ra = (suffix >> 16) & 0x1f;
+		op->update_reg = ra;
+		rd = (suffix >> 21) & 0x1f;
+		op->reg = rd;
+		op->val = regs->gpr[rd];
+
+		suffixopcode = suffix >> 26;
+		prefixtype = (instr >> 24) & 0x3;
+		switch (prefixtype) {
+		case 0: /* Type 00 Eight-Byte Load/Store */
+			if (prefix_r && ra)
+				break;
+			op->ea = mlsd_8lsd_ea(instr, suffix, regs);
+			switch (suffixopcode) {
+			case 41:	/* plwa */
+				op->type = MKOP(LOAD, PREFIXED | SIGNEXT, 4);
+				break;
+			case 42:	/* plxsd */
+				op->reg = rd + 32;
+				op->type = MKOP(LOAD_VSX, PREFIXED, 8);
+				op->element_size = 8;
+				op->vsx_flags = VSX_CHECK_VEC;
+				break;
+			case 43:	/* plxssp */
+				op->reg = rd + 32;
+				op->type = MKOP(LOAD_VSX, PREFIXED, 4);
+				op->element_size = 8;
+				op->vsx_flags = VSX_FPCONV | VS
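The effective address calculation in mlsd_8lsd_ea() can be tried out in userspace with something like the sketch below. It is an approximation under stated assumptions, not the kernel function: the field widths (18 displacement bits in the prefix, 16 in the suffix) are inferred from the "sign extend a 34 bit number" comment and the shift by 16, the sign extension uses the xor-and-subtract idiom instead of the casts in the patch, and the function names and the gpr/nip parameters are stand-ins for struct pt_regs.

```c
#include <stdint.h>

#define PREFIX_R  (UINT32_C(1) << 20)   /* R bit, as tested in the patch */

/* Build the 34-bit displacement from both words and sign extend it. */
static int64_t prefixed_displacement(uint32_t prefix, uint32_t suffix)
{
	uint64_t d0 = prefix & 0x3ffff;          /* assumed: upper 18 bits */
	uint64_t d1 = suffix & 0xffff;           /* assumed: lower 16 bits */
	uint64_t d = (d0 << 16) | d1;
	const uint64_t sign = UINT64_C(1) << 33; /* sign bit of a 34-bit value */

	return (int64_t)((d ^ sign) - sign);
}

/* Hypothetical stand-in for mlsd_8lsd_ea() taking registers as plain arguments. */
static uint64_t prefixed_ea(uint32_t prefix, uint32_t suffix,
			    const uint64_t gpr[32], uint64_t nip)
{
	unsigned int ra = (suffix >> 16) & 0x1f;
	uint64_t ea = (uint64_t)prefixed_displacement(prefix, suffix);

	if (prefix & PREFIX_R) {
		if (ra == 0)
			ea += nip;       /* R=1, RA=0: NIP-relative */
		/* R=1 with RA != 0 is an invalid form; callers reject it. */
	} else if (ra != 0) {
		ea += gpr[ra];           /* R=0, RA != 0: base register */
	}
	return ea;                       /* R=0, RA=0: displacement only */
}
```

The NIP-relative case is what makes paddi-style PC-relative addressing possible, so a caller has to make sure nip still points at the prefix when this runs.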
[PATCH v3 05/14] powerpc sstep: Add support for prefixed fixed-point arithmetic
This adds emulation support for the following prefixed Fixed-Point
Arithmetic instructions:
  * Prefixed Add Immediate (paddi)

Signed-off-by: Jordan Niethe
---
v3: Since we moved the prefixed loads/stores into the load/store switch
statement it no longer makes sense to have paddi in there, so move it
out.
---
 arch/powerpc/lib/sstep.c | 20
 1 file changed, 20 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 8e4ec953e279..f2010a3e1e06 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1331,6 +1331,26 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 
 	switch (opcode) {
 #ifdef __powerpc64__
+	case 1:
+		prefix_r = instr & (1ul << 20);
+		ra = (suffix >> 16) & 0x1f;
+		rd = (suffix >> 21) & 0x1f;
+		op->reg = rd;
+		op->val = regs->gpr[rd];
+		suffixopcode = suffix >> 26;
+		prefixtype = (instr >> 24) & 0x3;
+		switch (prefixtype) {
+		case 2:
+			if (prefix_r && ra)
+				return 0;
+			switch (suffixopcode) {
+			case 14:	/* paddi */
+				op->type = COMPUTE | PREFIXED;
+				op->val = mlsd_8lsd_ea(instr, suffix, regs);
+				goto compute_done;
+			}
+		}
+		break;
 	case 2:		/* tdi */
 		if (rd & trap_compare(regs->gpr[ra], (short) instr))
 			goto trap;
-- 
2.17.1
[PATCH v3 06/14] powerpc: Support prefixed instructions in alignment handler
Alignment interrupts can be caused by prefixed instructions accessing
memory. In the alignment handler the instruction that caused the
exception is loaded and an attempt is made to emulate it. If the
instruction is a prefixed instruction, load the prefix and suffix to
emulate. After emulating, increment the NIP by 8.

Prefixed instructions are not permitted to cross 64-byte boundaries. If
they do, the alignment interrupt is invoked with the SRR1 BOUNDARY bit
set. If this occurs, send a SIGBUS to the offending process if in user
mode. If in kernel mode, call bad_page_fault().

Signed-off-by: Jordan Niethe
---
v2: - Move __get_user_instr() and __get_user_instr_inatomic() to this
commit (previously in "powerpc sstep: Prepare to support prefixed
instructions").
    - Rename sufx to suffix
    - Use a macro for calculating instruction length
v3: Move __get_user_{instr(), instr_inatomic()} up with the other
get_user definitions and remove nested if.
---
 arch/powerpc/include/asm/uaccess.h | 25 +
 arch/powerpc/kernel/align.c        |  8 +---
 arch/powerpc/kernel/traps.c        | 21 -
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 2f500debae21..8903a96cbb4b 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -105,6 +105,31 @@ static inline int __access_ok(unsigned long addr, unsigned long size,
 #define __put_user_inatomic(x, ptr) \
 	__put_user_nosleep((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
 
+/*
+ * When reading an instruction iff it is a prefix, the suffix needs to be also
+ * loaded.
+ */
+#define __get_user_instr(x, y, ptr)			\
+({							\
+	long __gui_ret = 0;				\
+	y = 0;						\
+	__gui_ret = __get_user(x, ptr);			\
+	if (!__gui_ret && IS_PREFIX(x))			\
+		__gui_ret = __get_user(y, ptr + 1);	\
+	__gui_ret;					\
+})
+
+#define __get_user_instr_inatomic(x, y, ptr)		\
+({							\
+	long __gui_ret = 0;				\
+	y = 0;						\
+	__gui_ret = __get_user_inatomic(x, ptr);	\
+	if (!__gui_ret && IS_PREFIX(x))			\
+		__gui_ret = __get_user_inatomic(y, ptr + 1);	\
+	__gui_ret;					\
+})
+
+
 extern long __put_user_bad(void);
 
 /*
diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index ba3bf5c3ab62..4984cf681215 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -293,7 +293,7 @@ static int emulate_spe(struct pt_regs *regs, unsigned int reg,
 
 int fix_alignment(struct pt_regs *regs)
 {
-	unsigned int instr;
+	unsigned int instr, suffix;
 	struct instruction_op op;
 	int r, type;
 
@@ -303,13 +303,15 @@ int fix_alignment(struct pt_regs *regs)
 	 */
 	CHECK_FULL_REGS(regs);
 
-	if (unlikely(__get_user(instr, (unsigned int __user *)regs->nip)))
+	if (unlikely(__get_user_instr(instr, suffix,
+				 (unsigned int __user *)regs->nip)))
 		return -EFAULT;
 	if ((regs->msr & MSR_LE) != (MSR_KERNEL & MSR_LE)) {
 		/* We don't handle PPC little-endian any more... */
 		if (cpu_has_feature(CPU_FTR_PPC_LE))
 			return -EIO;
 		instr = swab32(instr);
+		suffix = swab32(suffix);
 	}
 
 #ifdef CONFIG_SPE
@@ -334,7 +336,7 @@ int fix_alignment(struct pt_regs *regs)
 	if ((instr & 0xfc0006fe) == (PPC_INST_COPY & 0xfc0006fe))
 		return -EIO;
 
-	r = analyse_instr(&op, regs, instr, PPC_NO_SUFFIX);
+	r = analyse_instr(&op, regs, instr, suffix);
 	if (r < 0)
 		return -EINVAL;
 
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 82a3438300fd..d80b82fc1ae3 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -583,6 +583,10 @@ static inline int check_io_access(struct pt_regs *regs)
 #define REASON_ILLEGAL		(ESR_PIL | ESR_PUO)
 #define REASON_PRIVILEGED	ESR_PPR
 #define REASON_TRAP		ESR_PTR
+#define REASON_PREFIXED		0
+#define REASON_BOUNDARY		0
+
+#define inst_length(reason)	4
 
 /* single-step stuff */
 #define single_stepping(regs)	(current->thread.debug.dbcr0 & DBCR0_IC)
@@ -597,6 +601,10 @@ static inline int check_io_access(struct pt_regs *regs)
 #define REASON_ILLEGAL		SRR1_PROGILL
 #define REASON_PRIVILEGED	SRR1_PROGPRIV
 #define REASON_TRAP		SRR1_PROGTRAP
+#define REASON_PREFIXED		SRR1_PREFIXED
+#define REASON_BOUN
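Two ideas from this patch are easy to model outside the kernel: fetching the suffix only when the first word is a prefix, and the rule that a prefixed instruction must not straddle a 64-byte boundary. The sketch below does both on an ordinary array. fetch_instr() and crosses_64_byte_boundary() are invented names, and the memory access is a plain array read rather than __get_user(), so no fault handling is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [addr, addr + len) spans two different 64-byte blocks. */
static bool crosses_64_byte_boundary(uint64_t addr, unsigned int len)
{
	return (addr >> 6) != ((addr + len - 1) >> 6);
}

/*
 * Hypothetical analogue of __get_user_instr(): always read the first word,
 * read the second word only if the first is a prefix (primary opcode 1).
 * Returns the instruction length in bytes.
 */
static unsigned int fetch_instr(const uint32_t *mem, uint32_t *instr,
				uint32_t *suffix)
{
	*instr = mem[0];
	*suffix = 0;                     /* ignored for word instructions */
	if ((*instr >> 26) == 1) {
		*suffix = mem[1];
		return 8;
	}
	return 4;
}
```

For word-aligned addresses the only offending placement is a prefix in the last word of a 64-byte block, that is (addr & 63) == 60.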
[PATCH v3 07/14] powerpc/traps: Check for prefixed instructions in facility_unavailable_exception()
If prefixed instructions are made unavailable by the [H]FSCR, attempting
to use them will cause a facility unavailable exception. Add "PREFIX" to
the facility_strings[].

Currently there are no prefixed instructions that are actually emulated
by emulate_instruction() within facility_unavailable_exception().
However, when caused by a prefixed instruction the SRR1 PREFIXED bit is
set. Prepare for dealing with emulated prefixed instructions by checking
for this bit.

Signed-off-by: Jordan Niethe
---
 arch/powerpc/kernel/traps.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index d80b82fc1ae3..cd8b3043c268 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1739,6 +1739,7 @@ void facility_unavailable_exception(struct pt_regs *regs)
 		[FSCR_TAR_LG] = "TAR",
 		[FSCR_MSGP_LG] = "MSGP",
 		[FSCR_SCV_LG] = "SCV",
+		[FSCR_PREFIX_LG] = "PREFIX",
 	};
 	char *facility = "unknown";
 	u64 value;
-- 
2.17.1
[PATCH v3 08/14] powerpc/xmon: Remove store_inst() for patch_instruction()
For modifying instructions in xmon, patch_instruction() can serve the
same role that store_inst() is performing with the advantage of not
being specific to xmon. In some places patch_instruction() is already
being used, followed by store_inst(). In these cases just remove the
store_inst(). Otherwise replace store_inst() with patch_instruction().

Signed-off-by: Jordan Niethe
---
 arch/powerpc/xmon/xmon.c | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 897e512c6379..a673cf55641c 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -325,11 +325,6 @@ static inline void sync(void)
 	asm volatile("sync; isync");
 }
 
-static inline void store_inst(void *p)
-{
-	asm volatile ("dcbst 0,%0; sync; icbi 0,%0; isync" : : "r" (p));
-}
-
 static inline void cflush(void *p)
 {
 	asm volatile ("dcbf 0,%0; icbi 0,%0" : : "r" (p));
@@ -882,8 +877,7 @@ static struct bpt *new_breakpoint(unsigned long a)
 	for (bp = bpts; bp < &bpts[NBPTS]; ++bp) {
 		if (!bp->enabled && atomic_read(&bp->ref_count) == 0) {
 			bp->address = a;
-			bp->instr[1] = bpinstr;
-			store_inst(&bp->instr[1]);
+			patch_instruction(&bp->instr[1], bpinstr);
 			return bp;
 		}
 	}
@@ -913,7 +907,7 @@ static void insert_bpts(void)
 			bp->enabled = 0;
 			continue;
 		}
-		store_inst(&bp->instr[0]);
+		patch_instruction(&bp->instr[0], bp->instr[0]);
 		if (bp->enabled & BP_CIABR)
 			continue;
 		if (patch_instruction((unsigned int *)bp->address,
@@ -923,7 +917,6 @@ static void insert_bpts(void)
 			bp->enabled &= ~BP_TRAP;
 			continue;
 		}
-		store_inst((void *)bp->address);
 	}
 }
 
@@ -958,8 +951,6 @@ static void remove_bpts(void)
 			(unsigned int *)bp->address, bp->instr[0]) != 0)
 			printf("Couldn't remove breakpoint at %lx\n",
 			       bp->address);
-		else
-			store_inst((void *)bp->address);
 	}
 }
 
-- 
2.17.1
[PATCH v3 09/14] powerpc/xmon: Add initial support for prefixed instructions
A prefixed instruction is composed of a word prefix and a word suffix. It does not make sense to be able to have a breakpoint on the suffix of a prefixed instruction, so make this impossible. When leaving xmon_core() we check to see if we are currently at a breakpoint. If this is the case, the breakpoint needs to be proceeded from. Initially emulate_step() is tried, but if this fails then we need to execute the saved instruction out of line. The NIP is set to the address of bpt::instr[] for the current breakpoint. bpt::instr[] contains the instruction replaced by the breakpoint, followed by a trap instruction. After bpt::instr[0] is executed and we hit the trap we enter back into xmon_bpt(). We know that if we got here and the offset indicates we are at bpt::instr[1] then we have just executed out of line so we can put the NIP back to the instruction after the breakpoint location and continue on. Adding prefixed instructions complicates this as the bpt::instr[1] needs to be used to hold the suffix. To deal with this make bpt::instr[] big enough for three word instructions. bpt::instr[2] contains the trap, and in the case of word instructions pad bpt::instr[1] with a noop. No support for disassembling prefixed instructions. Signed-off-by: Jordan Niethe --- v2: Rename sufx to suffix v3: - Just directly use PPC_INST_NOP - Typo: plac -> place - Rename read_inst() to mread_inst(). Do not have it call mread(). 
--- arch/powerpc/xmon/xmon.c | 90 ++-- 1 file changed, 78 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index a673cf55641c..a73a35aa4a75 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -97,7 +97,8 @@ static long *xmon_fault_jmp[NR_CPUS]; /* Breakpoint stuff */ struct bpt { unsigned long address; - unsigned intinstr[2]; + /* Prefixed instructions can not cross 64-byte boundaries */ + unsigned intinstr[3] __aligned(64); atomic_tref_count; int enabled; unsigned long pad; @@ -120,6 +121,7 @@ static unsigned bpinstr = 0x7fe8; /* trap */ static int cmds(struct pt_regs *); static int mread(unsigned long, void *, int); static int mwrite(unsigned long, void *, int); +static int mread_instr(unsigned long, unsigned int *, unsigned int *); static int handle_fault(struct pt_regs *); static void byterev(unsigned char *, int); static void memex(void); @@ -701,7 +703,7 @@ static int xmon_core(struct pt_regs *regs, int fromipi) bp = at_breakpoint(regs->nip); if (bp != NULL) { int stepped = emulate_step(regs, bp->instr[0], - PPC_NO_SUFFIX); + bp->instr[1]); if (stepped == 0) { regs->nip = (unsigned long) &bp->instr[0]; atomic_inc(&bp->ref_count); @@ -756,8 +758,8 @@ static int xmon_bpt(struct pt_regs *regs) /* Are we at the trap at bp->instr[1] for some bp? 
*/ bp = in_breakpoint_table(regs->nip, &offset); - if (bp != NULL && offset == 4) { - regs->nip = bp->address + 4; + if (bp != NULL && (offset == 4 || offset == 8)) { + regs->nip = bp->address + offset; atomic_dec(&bp->ref_count); return 1; } @@ -858,8 +860,9 @@ static struct bpt *in_breakpoint_table(unsigned long nip, unsigned long *offp) if (off >= sizeof(bpts)) return NULL; off %= sizeof(struct bpt); - if (off != offsetof(struct bpt, instr[0]) - && off != offsetof(struct bpt, instr[1])) + if (off != offsetof(struct bpt, instr[0]) && + off != offsetof(struct bpt, instr[1]) && + off != offsetof(struct bpt, instr[2])) return NULL; *offp = off - offsetof(struct bpt, instr[0]); return (struct bpt *) (nip - off); @@ -876,8 +879,16 @@ static struct bpt *new_breakpoint(unsigned long a) for (bp = bpts; bp < &bpts[NBPTS]; ++bp) { if (!bp->enabled && atomic_read(&bp->ref_count) == 0) { + /* +* Prefixed instructions are two words, but regular +* instructions are only one. Use a nop to pad out the +* regular instructions so that we can place the trap +* at the same place. For prefixed instructions the nop +* will get overwritten during insert_bpts(). +*/ bp->address = a; - patch_instruction(&bp->instr[1], bpinstr); + patch_instruction(&bp->instr[1], PPC_INST_NOP); + patch_instruction(&bp->instr[2], bpinstr); return bp
[PATCH v3 10/14] powerpc/xmon: Dump prefixed instructions
Currently when xmon is dumping instructions it reads a word at a time and
then prints that instruction (either as a hex number or by disassembling
it). For prefixed instructions it would be nice to show the prefix and
suffix together. Use mread_instr() so that if a prefix is encountered its
suffix is loaded too. Then print these in the form:

    prefix:suffix

Xmon uses the disassembly routines from GNU binutils. These currently do
not support prefixed instructions so we will not disassemble the prefixed
instructions yet.

Signed-off-by: Jordan Niethe
---
v2: Rename sufx to suffix
v3: Simplify generic_inst_dump()
---
 arch/powerpc/xmon/xmon.c | 38 ++
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index a73a35aa4a75..bf304189e33a 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2900,6 +2900,21 @@ prdump(unsigned long adrs, long ndump)
 	}
 }
 
+static bool instrs_are_equal(unsigned long insta, unsigned long suffixa,
+			     unsigned long instb, unsigned long suffixb)
+{
+	if (insta != instb)
+		return false;
+
+	if (!IS_PREFIX(insta) && !IS_PREFIX(instb))
+		return true;
+
+	if (IS_PREFIX(insta) && IS_PREFIX(instb))
+		return suffixa == suffixb;
+
+	return false;
+}
+
 typedef int (*instruction_dump_func)(unsigned long inst, unsigned long addr);
 
 static int
@@ -2908,12 +2923,11 @@ generic_inst_dump(unsigned long adr, long count, int praddr,
 {
 	int nr, dotted;
 	unsigned long first_adr;
-	unsigned int inst, last_inst = 0;
-	unsigned char val[4];
+	unsigned int inst, suffix, last_inst = 0, last_suffix = 0;
 
 	dotted = 0;
-	for (first_adr = adr; count > 0; --count, adr += 4) {
-		nr = mread(adr, val, 4);
+	for (first_adr = adr; count > 0; --count, adr += nr) {
+		nr = mread_instr(adr, &inst, &suffix);
 		if (nr == 0) {
 			if (praddr) {
 				const char *x = fault_chars[fault_type];
@@ -2921,8 +2935,9 @@ generic_inst_dump(unsigned long adr, long count, int praddr,
 			}
 			break;
 		}
-		inst = GETWORD(val);
-		if (adr > first_adr && inst == last_inst) {
+		if (adr > first_adr && instrs_are_equal(inst, suffix,
+							last_inst,
+							last_suffix)) {
 			if (!dotted) {
 				printf(" ...\n");
 				dotted = 1;
@@ -2931,10 +2946,17 @@ generic_inst_dump(unsigned long adr, long count, int praddr,
 		}
 		dotted = 0;
 		last_inst = inst;
-		if (praddr)
+		last_suffix = suffix;
+		if (praddr) {
 			printf(REG" %.8x", adr, inst);
+			if (IS_PREFIX(inst))
+				printf(":%.8x", suffix);
+		}
 		printf("\t");
-		dump_func(inst, adr);
+		if (IS_PREFIX(inst))
+			printf("%.8x:%.8x", inst, suffix);
+		else
+			dump_func(inst, adr);
 		printf("\n");
 	}
 	return adr - first_adr;
-- 
2.17.1
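For readers following along outside the kernel tree, the comparison and the prefix:suffix output format from the patch above can be tried in user space. This is only a sketch: IS_PREFIX() is assumed here to test for primary opcode 1 in the top six bits of the word, and format_instr() is a hypothetical helper that does not exist in xmon. Because equal first words are either both prefixes or both not, the comparison collapses to a single IS_PREFIX() check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: a prefix is a word whose primary opcode (top 6 bits) is 1. */
#define IS_PREFIX(x) (((uint32_t)(x) >> 26) == 1)

/* Same result as the patch's helper, written for 32-bit words. */
static bool instrs_are_equal(uint32_t insta, uint32_t suffixa,
			     uint32_t instb, uint32_t suffixb)
{
	if (insta != instb)
		return false;

	/* First words match, so either both are prefixes or neither is. */
	if (!IS_PREFIX(insta))
		return true;

	return suffixa == suffixb;
}

/* Hypothetical helper: format one dump entry as "inst" or "prefix:suffix". */
static int format_instr(char *buf, size_t len, uint32_t inst, uint32_t suffix)
{
	if (IS_PREFIX(inst))
		return snprintf(buf, len, "%.8x:%.8x", inst, suffix);

	return snprintf(buf, len, "%.8x", inst);
}
```

Note that for word instructions the suffix argument is ignored, so stale suffix values cannot break the "..." run detection in the dump loop.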
[PATCH v3 11/14] powerpc/kprobes: Support kprobes on prefixed instructions
A prefixed instruction is composed of a word prefix followed by a word suffix. It does not make sense to be able to have a kprobe on the suffix of a prefixed instruction, so make this impossible. Kprobes work by replacing an instruction with a trap and saving that instruction to be single stepped out of place later. Currently there is not enough space allocated to keep a prefixed instruction for single stepping. Increase the amount of space allocated for holding the instruction copy. kprobe_post_handler() expects all instructions to be 4 bytes long which means that it does not function correctly for prefixed instructions. Add checks for prefixed instructions which will use a length of 8 bytes instead. For optprobes we normally patch in loading the instruction we put a probe on into r4 before calling emulate_step(). We now make space and patch in loading the suffix into r5 as well. Signed-off-by: Jordan Niethe --- v3: - Base on top of https://patchwork.ozlabs.org/patch/1232619/ - Change printing format to %x:%x --- arch/powerpc/include/asm/kprobes.h | 5 ++-- arch/powerpc/kernel/kprobes.c| 43 +--- arch/powerpc/kernel/optprobes.c | 32 - arch/powerpc/kernel/optprobes_head.S | 6 4 files changed, 60 insertions(+), 26 deletions(-) diff --git a/arch/powerpc/include/asm/kprobes.h b/arch/powerpc/include/asm/kprobes.h index 66b3f2983b22..0d44ce8a3163 100644 --- a/arch/powerpc/include/asm/kprobes.h +++ b/arch/powerpc/include/asm/kprobes.h @@ -38,12 +38,13 @@ extern kprobe_opcode_t optprobe_template_entry[]; extern kprobe_opcode_t optprobe_template_op_address[]; extern kprobe_opcode_t optprobe_template_call_handler[]; extern kprobe_opcode_t optprobe_template_insn[]; +extern kprobe_opcode_t optprobe_template_suffix[]; extern kprobe_opcode_t optprobe_template_call_emulate[]; extern kprobe_opcode_t optprobe_template_ret[]; extern kprobe_opcode_t optprobe_template_end[]; -/* Fixed instruction size for powerpc */ -#define MAX_INSN_SIZE 1 +/* Prefixed instructions are two words */ 
+#define MAX_INSN_SIZE 2 #define MAX_OPTIMIZED_LENGTH sizeof(kprobe_opcode_t) /* 4 bytes */ #define MAX_OPTINSN_SIZE (optprobe_template_end - optprobe_template_entry) #define RELATIVEJUMP_SIZE sizeof(kprobe_opcode_t) /* 4 bytes */ diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c index 6b2e9e37f12b..9ccf1b9a1275 100644 --- a/arch/powerpc/kernel/kprobes.c +++ b/arch/powerpc/kernel/kprobes.c @@ -117,16 +117,28 @@ void *alloc_insn_page(void) int arch_prepare_kprobe(struct kprobe *p) { int ret = 0; + struct kprobe *prev; kprobe_opcode_t insn = *p->addr; + kprobe_opcode_t prefix = *(p->addr - 1); + preempt_disable(); if ((unsigned long)p->addr & 0x03) { printk("Attempt to register kprobe at an unaligned address\n"); ret = -EINVAL; } else if (IS_MTMSRD(insn) || IS_RFID(insn) || IS_RFI(insn)) { printk("Cannot register a kprobe on rfi/rfid or mtmsr[d]\n"); ret = -EINVAL; + } else if (IS_PREFIX(prefix)) { + printk("Cannot register a kprobe on the second word of prefixed instruction\n"); + ret = -EINVAL; + } + prev = get_kprobe(p->addr - 1); + if (prev && IS_PREFIX(*prev->ainsn.insn)) { + printk("Cannot register a kprobe on the second word of prefixed instruction\n"); + ret = -EINVAL; } + /* insn must be on a special executable page on ppc64. 
This is * not explicitly required on ppc32 (right now), but it doesn't hurt */ if (!ret) { @@ -136,11 +148,14 @@ int arch_prepare_kprobe(struct kprobe *p) } if (!ret) { - patch_instruction(p->ainsn.insn, *p->addr); + patch_instruction(&p->ainsn.insn[0], p->addr[0]); + if (IS_PREFIX(insn)) + patch_instruction(&p->ainsn.insn[1], p->addr[1]); p->opcode = *p->addr; } p->ainsn.boostable = 0; + preempt_enable_no_resched(); return ret; } NOKPROBE_SYMBOL(arch_prepare_kprobe); @@ -225,10 +240,11 @@ NOKPROBE_SYMBOL(arch_prepare_kretprobe); static int try_to_emulate(struct kprobe *p, struct pt_regs *regs) { int ret; - unsigned int insn = *p->ainsn.insn; + unsigned int insn = p->ainsn.insn[0]; + unsigned int suffix = p->ainsn.insn[1]; /* regs->nip is also adjusted if emulate_step returns 1 */ - ret = emulate_step(regs, insn, PPC_NO_SUFFIX); + ret = emulate_step(regs, insn, suffix); if (ret > 0) { /* * Once this instruction has been boosted @@ -242,7 +258,11 @@ static int try_to_emulate(struct kprobe *p, struct pt_regs *regs) * So, we should never get here... but, its still * good to catch them, just
[PATCH v3 12/14] powerpc/uprobes: Add support for prefixed instructions
Uprobes can execute instructions out of line. Increase the size of the
buffer used for this so that this works for prefixed instructions. Take
into account the length of prefixed instructions when fixing up the nip.

Signed-off-by: Jordan Niethe
---
v2: - Fix typo
    - Use macro for instruction length
---
 arch/powerpc/include/asm/uprobes.h | 16
 arch/powerpc/kernel/uprobes.c      | 4 ++--
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/uprobes.h b/arch/powerpc/include/asm/uprobes.h
index 2bbdf27d09b5..5516ab27db47 100644
--- a/arch/powerpc/include/asm/uprobes.h
+++ b/arch/powerpc/include/asm/uprobes.h
@@ -14,18 +14,26 @@
 
 typedef ppc_opcode_t uprobe_opcode_t;
 
+/*
+ * Ensure we have enough space for prefixed instructions, which
+ * are double the size of a word instruction, i.e. 8 bytes.
+ */
 #define MAX_UINSN_BYTES		4
-#define UPROBE_XOL_SLOT_BYTES	(MAX_UINSN_BYTES)
+#define UPROBE_XOL_SLOT_BYTES	(2 * MAX_UINSN_BYTES)
 
 /* The following alias is needed for reference from arch-agnostic code */
 #define UPROBE_SWBP_INSN	BREAKPOINT_INSTRUCTION
 #define UPROBE_SWBP_INSN_SIZE	4 /* swbp insn size in bytes */
 
 struct arch_uprobe {
+/*
+ * Ensure there is enough space for prefixed instructions. Prefixed
+ * instructions must not cross 64-byte boundaries.
+ */
 	union {
-		u32 insn;
-		u32 ixol;
-	};
+		uprobe_opcode_t insn[2];
+		uprobe_opcode_t ixol[2];
+	} __aligned(64);
 };
 
 struct arch_uprobe_task {
diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index 4ab40c4b576f..7e0334ad5cfe 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -111,7 +111,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 	 * support doesn't exist and have to fix-up the next instruction
 	 * to be executed.
 	 */
-	regs->nip = utask->vaddr + MAX_UINSN_BYTES;
+	regs->nip = utask->vaddr + PPC_INST_LENGTH(auprobe->insn[0]);
 
 	user_disable_single_step(current);
 	return 0;
@@ -173,7 +173,7 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
 	 * emulate_step() returns 1 if the insn was successfully emulated.
 	 * For all other cases, we need to single-step in hardware.
 	 */
-	ret = emulate_step(regs, auprobe->insn[0], PPC_NO_SUFFIX);
+	ret = emulate_step(regs, auprobe->insn[0], auprobe->insn[1]);
 	if (ret > 0)
 		return true;
 
-- 
2.17.1
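The nip fix-up and the 64-byte boundary rule mentioned in the patch comment can be illustrated with a small user-space sketch. The names inst_length(), next_nip() and crosses_64_byte_boundary() are hypothetical stand-ins (inst_length() plays the role of PPC_INST_LENGTH()), and IS_PREFIX() is again assumed to test for primary opcode 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a prefix is a word whose primary opcode (top 6 bits) is 1. */
#define IS_PREFIX(x) (((uint32_t)(x) >> 26) == 1)

/* Hypothetical stand-in for PPC_INST_LENGTH(): length in bytes. */
static unsigned int inst_length(uint32_t first_word)
{
	return IS_PREFIX(first_word) ? 8 : 4;
}

/* Address of the instruction that follows the probed one at vaddr. */
static uint64_t next_nip(uint64_t vaddr, uint32_t first_word)
{
	return vaddr + inst_length(first_word);
}

/* True if the instruction at addr would straddle a 64-byte boundary. */
static bool crosses_64_byte_boundary(uint64_t addr, uint32_t first_word)
{
	return (addr & 0x3f) + inst_length(first_word) > 64;
}
```

With word-aligned addresses, only a prefixed instruction starting at offset 60 within a 64-byte block can cross the boundary, which is presumably why the patch aligns the slot to 64 bytes.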
[PATCH v3 13/14] powerpc/hw_breakpoints: Initial support for prefixed instructions
Currently when getting an instruction to emulate in
hw_breakpoint_handler() we do not load the suffix of a prefixed
instruction. Ensure we load the suffix if the instruction we need to
emulate is a prefixed instruction.

Signed-off-by: Jordan Niethe
---
v2: Rename sufx to suffix
v3: Add __user to type cast to remove sparse warning
---
 arch/powerpc/kernel/hw_breakpoint.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index 3a7ec6760dab..edf46356dfb2 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -243,15 +243,16 @@ dar_range_overlaps(unsigned long dar, int size, struct arch_hw_breakpoint *info)
 static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp,
 			     struct arch_hw_breakpoint *info)
 {
-	unsigned int instr = 0;
+	unsigned int instr = 0, suffix = 0;
 	int ret, type, size;
 	struct instruction_op op;
 	unsigned long addr = info->address;
 
-	if (__get_user_inatomic(instr, (unsigned int *)regs->nip))
+	if (__get_user_instr_inatomic(instr, suffix,
+				      (unsigned int __user *)regs->nip))
 		goto fail;
 
-	ret = analyse_instr(&op, regs, instr, PPC_NO_SUFFIX);
+	ret = analyse_instr(&op, regs, instr, suffix);
 	type = GETTYPE(op.type);
 	size = GETSIZE(op.type);
 
@@ -275,7 +276,7 @@ static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp,
 		return false;
 	}
 
-	if (!emulate_step(regs, instr, PPC_NO_SUFFIX))
+	if (!emulate_step(regs, instr, suffix))
 		goto fail;
 
 	return true;
-- 
2.17.1
[PATCH v3 14/14] powerpc: Add prefix support to mce_find_instr_ea_and_pfn()
mce_find_instr_ea_and_phys() analyses an instruction to determine the
effective address that caused the machine check. Update this to load and
pass the suffix to analyse_instr() for prefixed instructions.

Signed-off-by: Jordan Niethe
---
v2: - Rename sufx to suffix
---
 arch/powerpc/kernel/mce_power.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index 824eda536f5d..091bab4a5464 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -365,7 +365,7 @@ static int mce_find_instr_ea_and_phys(struct pt_regs *regs, uint64_t *addr,
 	 * in real-mode is tricky and can lead to recursive
 	 * faults
 	 */
-	int instr;
+	int instr, suffix = 0;
 	unsigned long pfn, instr_addr;
 	struct instruction_op op;
 	struct pt_regs tmp = *regs;
@@ -374,7 +374,9 @@ static int mce_find_instr_ea_and_phys(struct pt_regs *regs, uint64_t *addr,
 	if (pfn != ULONG_MAX) {
 		instr_addr = (pfn << PAGE_SHIFT) + (regs->nip & ~PAGE_MASK);
 		instr = *(unsigned int *)(instr_addr);
-		if (!analyse_instr(&op, &tmp, instr, PPC_NO_SUFFIX)) {
+		if (IS_PREFIX(instr))
+			suffix = *(unsigned int *)(instr_addr + 4);
+		if (!analyse_instr(&op, &tmp, instr, suffix)) {
 			pfn = addr_to_pfn(regs, op.ea);
 			*addr = op.ea;
 			*phys_addr = (pfn << PAGE_SHIFT);
-- 
2.17.1
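The "load the suffix only when the first word is a prefix" step used in this patch (and in the hw_breakpoint one) can be sketched in user space as below. This is not the kernel code: fetch_instr() is a hypothetical helper working on a plain word buffer, it adds a bounds check the kernel path handles differently, and IS_PREFIX() is assumed to test for primary opcode 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a prefix is a word whose primary opcode (top 6 bits) is 1. */
#define IS_PREFIX(x) (((uint32_t)(x) >> 26) == 1)

/*
 * Hypothetical helper: fetch the instruction starting at word index idx.
 * Returns the number of bytes consumed (4 or 8), or 0 if the read would
 * run past the end of the buffer.
 */
static size_t fetch_instr(const uint32_t *mem, size_t nwords, size_t idx,
			  uint32_t *inst, uint32_t *suffix)
{
	if (idx >= nwords)
		return 0;

	*inst = mem[idx];
	*suffix = 0;
	if (!IS_PREFIX(*inst))
		return 4;

	/* A prefix without a following word cannot be decoded. */
	if (idx + 1 >= nwords)
		return 0;

	*suffix = mem[idx + 1];
	return 8;
}
```

Returning the consumed length lets a caller advance by 4 or 8 bytes, in the same spirit as the `adr += nr` loop in the xmon patch of this series.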
[PATCH] ocxl: Fix misleading comment
In ocxl_context_free() we note that the AFU reference we're releasing was
taken in "ocxl_context_init", a function that doesn't actually exist.

Fix it to say ocxl_context_alloc() instead, which I expect was what was
intended.

Fixes: 5ef3166e8a32 ("ocxl: Driver code for 'generic' opencapi devices")
Cc: Frederic Barrat
Signed-off-by: Andrew Donnellan
---
 drivers/misc/ocxl/context.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index de8a66b9d76b..c21f65a5c762 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -287,7 +287,7 @@ void ocxl_context_free(struct ocxl_context *ctx)
 
 	ocxl_afu_irq_free_all(ctx);
 	idr_destroy(&ctx->irq_idr);
-	/* reference to the AFU taken in ocxl_context_init */
+	/* reference to the AFU taken in ocxl_context_alloc() */
 	ocxl_afu_put(ctx->afu);
 	kfree(ctx);
 }
-- 
2.20.1
[PATCH 0/3] mm/vma: some more minor changes
The motivation here is to consolidate VMA flags and helpers in the generic
memory header and reduce code duplication whenever applicable. If there are
other similar instances which might be missing here, please do let me know.
I will be happy to incorporate them.

This series is based on v5.6-rc3. This series has been build tested on
multiple platforms but boot tested only on arm64 and x86.

Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Andrew Morton
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Cc: linux...@kvack.org

Anshuman Khandual (3):
  mm/vma: Move VM_NO_KHUGEPAGED into generic header
  mm/vma: Make vma_is_foreign() available for general use
  mm/vma: Make is_vma_temporary_stack() available for general use

 arch/powerpc/mm/book3s64/pkeys.c   | 12
 arch/x86/include/asm/mmu_context.h | 15 ---
 include/linux/huge_mm.h            | 2 --
 include/linux/mm.h                 | 28 +++-
 mm/khugepaged.c                    | 2 --
 mm/rmap.c                          | 14 --
 6 files changed, 27 insertions(+), 46 deletions(-)

-- 
2.20.1
[PATCH 2/3] mm/vma: Make vma_is_foreign() available for general use
The idea of a foreign VMA with respect to the present context is very
generic. But currently there are two identical definitions for this in
the powerpc and x86 platforms. Let's consolidate those redundant
definitions while making vma_is_foreign() available for general use
later. This should not cause any functional change.

Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Andrew Morton
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Anshuman Khandual
---
 arch/powerpc/mm/book3s64/pkeys.c   | 12
 arch/x86/include/asm/mmu_context.h | 15 ---
 include/linux/mm.h                 | 11 +++
 3 files changed, 11 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 59e0ebbd8036..07527f1ed108 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -381,18 +381,6 @@ bool arch_pte_access_permitted(u64 pte, bool write, bool execute)
  * So do not enforce things if the VMA is not from the current mm, or if we are
  * in a kernel thread.
  */
-static inline bool vma_is_foreign(struct vm_area_struct *vma)
-{
-	if (!current->mm)
-		return true;
-
-	/* if it is not our ->mm, it has to be foreign */
-	if (current->mm != vma->vm_mm)
-		return true;
-
-	return false;
-}
-
 bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write,
 			       bool execute, bool foreign)
 {
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index b538d9ddee9c..4e55370e48e8 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -213,21 +213,6 @@ static inline void arch_unmap(struct mm_struct *mm, unsigned long start,
  * So do not enforce things if the VMA is not from the current
  * mm, or if we are in a kernel thread.
  */
-static inline bool vma_is_foreign(struct vm_area_struct *vma)
-{
-	if (!current->mm)
-		return true;
-	/*
-	 * Should PKRU be enforced on the access to this VMA? If
-	 * the VMA is from another process, then PKRU has no
-	 * relevance and should not be enforced.
-	 */
-	if (current->mm != vma->vm_mm)
-		return true;
-
-	return false;
-}
-
 static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
 		bool write, bool execute, bool foreign)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6f7e400e6ea3..2fd4b9bec4be 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 
 struct mempolicy;
 struct anon_vma;
@@ -542,6 +543,16 @@ static inline bool vma_is_anonymous(struct vm_area_struct *vma)
 	return !vma->vm_ops;
 }
 
+static inline bool vma_is_foreign(struct vm_area_struct *vma)
+{
+	if (!current->mm)
+		return true;
+
+	if (current->mm != vma->vm_mm)
+		return true;
+
+	return false;
+}
 #ifdef CONFIG_SHMEM
 /*
  * The vma_is_shmem is not inline because it is used only by slow
-- 
2.20.1
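To make the "foreign VMA" rule concrete, here is a user-space sketch of the same check. The structs are minimal mock-ups, not the kernel definitions, and vma_is_foreign_for() is a hypothetical variant that takes the task explicitly instead of reading `current`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal mock-ups of the kernel structures, for illustration only. */
struct mm_struct { int id; };
struct vm_area_struct { struct mm_struct *vm_mm; };
struct task_struct { struct mm_struct *mm; };

/*
 * Hypothetical variant of vma_is_foreign() with the task passed in.
 * A task without an mm (a kernel thread) treats every VMA as foreign;
 * otherwise a VMA is foreign when it belongs to a different mm.
 */
static bool vma_is_foreign_for(const struct task_struct *task,
			       const struct vm_area_struct *vma)
{
	if (!task->mm)
		return true;

	return task->mm != vma->vm_mm;
}
```

The two removed arch copies differ only in their comments, which is what makes a single generic definition possible.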
[RFC PATCH] Use IS_ENABLED() instead of #ifdefs
--- This works for me. Only had to leave the #ifdef around the map_mem_in_cams() Also had to set linear_sz and ram for the alternative case, otherwise I get arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_early_init': arch/powerpc/mm/nohash/kaslr_booke.c:355:33: error: 'linear_sz' may be used uninitialized in this function [-Werror=maybe-uninitialized] regions.pa_end = memstart_addr + linear_sz; ~~^~~ arch/powerpc/mm/nohash/kaslr_booke.c:315:21: note: 'linear_sz' was declared here unsigned long ram, linear_sz; ^ arch/powerpc/mm/nohash/kaslr_booke.c:187:8: error: 'ram' may be used uninitialized in this function [-Werror=maybe-uninitialized] ret = parse_crashkernel(boot_command_line, size, &crash_size, ^~~ &crash_base); arch/powerpc/mm/nohash/kaslr_booke.c:315:16: note: 'ram' was declared here unsigned long ram, linear_sz; --- arch/powerpc/mm/mmu_decl.h | 2 +- arch/powerpc/mm/nohash/kaslr_booke.c | 97 +++- 2 files changed, 52 insertions(+), 47 deletions(-) diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h index b869ea893301..3700e7c04e51 100644 --- a/arch/powerpc/mm/mmu_decl.h +++ b/arch/powerpc/mm/mmu_decl.h @@ -139,9 +139,9 @@ extern unsigned long calc_cam_sz(unsigned long ram, unsigned long virt, extern void adjust_total_lowmem(void); extern int switch_to_as1(void); extern void restore_to_as0(int esel, int offset, void *dt_ptr, int bootcpu); +#endif void create_kaslr_tlb_entry(int entry, unsigned long virt, phys_addr_t phys); extern int is_second_reloc; -#endif void reloc_kernel_entry(void *fdt, long addr); extern void loadcam_entry(unsigned int index); diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c b/arch/powerpc/mm/nohash/kaslr_booke.c index c6f5c1db1394..bf69cece9b8c 100644 --- a/arch/powerpc/mm/nohash/kaslr_booke.c +++ b/arch/powerpc/mm/nohash/kaslr_booke.c @@ -267,35 +267,37 @@ static unsigned long __init kaslr_legal_offset(void *dt_ptr, unsigned long rando unsigned long start; unsigned long offset; -#ifdef CONFIG_PPC32 - /* -* 
Decide which 64M we want to start -* Only use the low 8 bits of the random seed -*/ - unsigned long index = random & 0xFF; - index %= regions.linear_sz / SZ_64M; - - /* Decide offset inside 64M */ - offset = random % (SZ_64M - regions.kernel_size); - offset = round_down(offset, SZ_16K); + if (IS_ENABLED(CONFIG_PPC32)) { + unsigned long index; + + /* +* Decide which 64M we want to start +* Only use the low 8 bits of the random seed +*/ + index = random & 0xFF; + index %= regions.linear_sz / SZ_64M; + + /* Decide offset inside 64M */ + offset = random % (SZ_64M - regions.kernel_size); + offset = round_down(offset, SZ_16K); + + while ((long)index >= 0) { + offset = memstart_addr + index * SZ_64M + offset; + start = memstart_addr + index * SZ_64M; + koffset = get_usable_address(dt_ptr, start, offset); + if (koffset) + break; + index--; + } + } else { + /* Decide kernel offset inside 1G */ + offset = random % (SZ_1G - regions.kernel_size); + offset = round_down(offset, SZ_64K); - while ((long)index >= 0) { - offset = memstart_addr + index * SZ_64M + offset; - start = memstart_addr + index * SZ_64M; + start = memstart_addr; + offset = memstart_addr + offset; koffset = get_usable_address(dt_ptr, start, offset); - if (koffset) - break; - index--; } -#else - /* Decide kernel offset inside 1G */ - offset = random % (SZ_1G - regions.kernel_size); - offset = round_down(offset, SZ_64K); - - start = memstart_addr; - offset = memstart_addr + offset; - koffset = get_usable_address(dt_ptr, start, offset); -#endif if (koffset != 0) koffset -= memstart_addr; @@ -342,6 +344,8 @@ static unsigned long __init kaslr_choose_location(void *dt_ptr, phys_addr_t size /* If the linear size is smaller than 64M, do not randmize */ if (linear_sz < SZ_64M) return 0; +#else + linear_sz = ram = size; #endif /* check for a reserved-memory node and record its cell sizes */ @@ -373,17 +377,19 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size) { unsigned long offset; unsigned long 
kernel_sz; + unsigned int *__kaslr_offset; + unsigned int *__
Re: [PATCH v3 10/27] powerpc: Add driver for OpenCAPI Persistent Memory
On 21/2/20 2:27 pm, Alastair D'Silva wrote: From: Alastair D'Silva This driver exposes LPC memory on OpenCAPI pmem cards as an NVDIMM, allowing the existing nvram infrastructure to be used. Namespace metadata is stored on the media itself, so scm_reserve_metadata() maps 1 section's worth of PMEM storage at the start to hold this. The rest of the PMEM range is registered with libnvdimm as an nvdimm. scm_ndctl_config_read/write/size() provide callbacks to libnvdimm to access the metadata. Signed-off-by: Alastair D'Silva I'm not particularly familiar with the nvdimm subsystem, so the scope of my review is more on the ocxl + misc issues side. A few minor checkpatch warnings that don't matter all that much: https://openpower.xyz/job/snowpatch/job/snowpatch-linux-checkpatch/11786//artifact/linux/checkpatch.log A few other comments below. diff --git a/arch/powerpc/platforms/powernv/pmem/ocxl.c b/arch/powerpc/platforms/powernv/pmem/ocxl.c new file mode 100644 index ..3c4eeb5dcc0f --- /dev/null +++ b/arch/powerpc/platforms/powernv/pmem/ocxl.c @@ -0,0 +1,473 @@ +// SPDX-License-Id +// Copyright 2019 IBM Corp. + +/* + * A driver for OpenCAPI devices that implement the Storage Class + * Memory specification. 
+ */ + +#include +#include +#include +#include +#include +#include "ocxl_internal.h" + + +static const struct pci_device_id ocxlpmem_pci_tbl[] = { + { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0625), }, + { } +}; + +MODULE_DEVICE_TABLE(pci, ocxlpmem_pci_tbl); + +#define NUM_MINORS 256 // Total to reserve + +static dev_t ocxlpmem_dev; +static struct class *ocxlpmem_class; +static struct mutex minors_idr_lock; +static struct idr minors_idr; + +/** + * ndctl_config_write() - Handle a ND_CMD_SET_CONFIG_DATA command from ndctl + * @ocxlpmem: the device metadata + * @command: the incoming data to write + * Return: 0 on success, negative on failure + */ +static int ndctl_config_write(struct ocxlpmem *ocxlpmem, + struct nd_cmd_set_config_hdr *command) +{ + if (command->in_offset + command->in_length > LABEL_AREA_SIZE) + return -EINVAL; + + memcpy_flushcache(ocxlpmem->metadata_addr + command->in_offset, command->in_buf, + command->in_length); Out of scope for this patch - given that we use memcpy_mcsafe in the config read, does it make sense to change memcpy_flushcache to be mcsafe as well? 
+ + return 0; +} + +/** + * ndctl_config_read() - Handle a ND_CMD_GET_CONFIG_DATA command from ndctl + * @ocxlpmem: the device metadata + * @command: the read request + * Return: 0 on success, negative on failure + */ +static int ndctl_config_read(struct ocxlpmem *ocxlpmem, +struct nd_cmd_get_config_data_hdr *command) +{ + if (command->in_offset + command->in_length > LABEL_AREA_SIZE) + return -EINVAL; + + memcpy_mcsafe(command->out_buf, ocxlpmem->metadata_addr + command->in_offset, + command->in_length); + + return 0; +} + +/** + * ndctl_config_size() - Handle a ND_CMD_GET_CONFIG_SIZE command from ndctl + * @command: the read request + * Return: 0 on success, negative on failure + */ +static int ndctl_config_size(struct nd_cmd_get_config_size *command) +{ + command->status = 0; + command->config_size = LABEL_AREA_SIZE; + command->max_xfer = PAGE_SIZE; + + return 0; +} + +static int ndctl(struct nvdimm_bus_descriptor *nd_desc, +struct nvdimm *nvdimm, +unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc) +{ + struct ocxlpmem *ocxlpmem = container_of(nd_desc, struct ocxlpmem, bus_desc); + + switch (cmd) { + case ND_CMD_GET_CONFIG_SIZE: + *cmd_rc = ndctl_config_size(buf); + return 0; + + case ND_CMD_GET_CONFIG_DATA: + *cmd_rc = ndctl_config_read(ocxlpmem, buf); + return 0; + + case ND_CMD_SET_CONFIG_DATA: + *cmd_rc = ndctl_config_write(ocxlpmem, buf); + return 0; + + default: + return -ENOTTY; + } +} + +/** + * reserve_metadata() - Reserve space for nvdimm metadata + * @ocxlpmem: the device metadata + * @lpc_mem: The resource representing the LPC memory of the OpenCAPI device + */ +static int reserve_metadata(struct ocxlpmem *ocxlpmem, + struct resource *lpc_mem) +{ + ocxlpmem->metadata_addr = devm_memremap(&ocxlpmem->dev, lpc_mem->start, + LABEL_AREA_SIZE, MEMREMAP_WB); + if (IS_ERR(ocxlpmem->metadata_addr)) + return PTR_ERR(ocxlpmem->metadata_addr); + + return 0; +} + +/** + * register_lpc_mem() - Discover persistent memory on a device and register it 
with the NVDIMM subsystem + * @ocxlpmem: the device metadata + * Return: 0 on success + */ +static int register_lpc_mem(struct ocxlpmem *ocxlpmem) +{ + struct nd_region_de
Re: [PATCH v3 3/6] powerpc/fsl_booke/64: implement KASLR for fsl_booke64
On 26/02/2020 at 03:40, Jason Yan wrote:
> On 2020/2/20 21:48, Christophe Leroy wrote:
> > On 06/02/2020 at 03:58, Jason Yan wrote:
> > > 	/*
> > > 	 * Decide which 64M we want to start
> > > 	 * Only use the low 8 bits of the random seed
> > > 	 */
> > > -	index = random & 0xFF;
> > > +	unsigned long index = random & 0xFF;
> >
> > That's not good in terms of readability, index declaration should
> > remain at the top of the function, should be possible if using
> > IS_ENABLED() instead
>
> I'm wondering how to declare a variable inside a code block such as
> if (IS_ENABLED(CONFIG_PPC32)) at the top of the function and use the
> variable in another if (IS_ENABLED(CONFIG_PPC32)). Is there any good
> idea?

You declare it outside the block as usual:

	unsigned long some_var;

	if (condition) {
		some_var = something;
	}
	do_many_things();
	do_other_things();

	if (condition)
		return some_var;
	else
		return 0;

Christophe
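The advice in the reply above can be tried in user space with a compile-time constant standing in for IS_ENABLED(CONFIG_PPC32). Everything here is a hypothetical sketch: DEMO_IS_32BIT, DEMO_SZ_64M and pick_start() are made-up names, and the arithmetic only loosely mirrors kaslr_legal_offset().

```c
#include <assert.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_PPC32): a constant 0 or 1. */
#ifndef DEMO_IS_32BIT
#define DEMO_IS_32BIT 1
#endif

#define DEMO_SZ_64M (64UL * 1024 * 1024)

static unsigned long pick_start(unsigned long random, unsigned long linear_sz,
				unsigned long memstart)
{
	/* Declared at the top, even though only the 32-bit paths use it. */
	unsigned long index = 0;
	unsigned long start = memstart;

	/* Mirrors the early return for small linear mappings. */
	if (linear_sz < DEMO_SZ_64M)
		return memstart;

	if (DEMO_IS_32BIT) {
		/* Only use the low 8 bits of the random seed. */
		index = random & 0xFF;
		index %= linear_sz / DEMO_SZ_64M;
	}

	/* ... unrelated work would go here ... */

	if (DEMO_IS_32BIT)
		start = memstart + index * DEMO_SZ_64M;

	return start;
}
```

Because the condition is a compile-time constant, the compiler can drop the dead branch, yet both branches are still parsed and type-checked, which is the usual argument for this style over #ifdef.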
Re: [PATCH v3 3/6] powerpc/fsl_booke/64: implement KASLR for fsl_booke64
On 26/02/2020 at 04:33, Jason Yan wrote:
> On 2020/2/26 10:40, Jason Yan wrote:
> > On 2020/2/20 21:48, Christophe Leroy wrote:
> > > On 06/02/2020 at 03:58, Jason Yan wrote:
>
> Hi Christophe,
>
> When using a standard C if/else, all the code is compiled for both
> PPC32 and PPC64, but this causes build errors because not all
> variables are defined for both PPC32 and PPC64.
>
> [yanaijie@138 linux]$ sh ppc64build.sh
>   CALL    scripts/atomic/check-atomics.sh
>   CALL    scripts/checksyscalls.sh
>   CHK     include/generated/compile.h
>   CC      arch/powerpc/mm/nohash/kaslr_booke.o
> arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_choose_location':
> arch/powerpc/mm/nohash/kaslr_booke.c:341:30: error: 'CONFIG_LOWMEM_CAM_NUM' undeclared (first use in this function); did you mean 'CONFIG_FLATMEM_MANUAL'?
>    ram = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, true);
>                               ^
>                               CONFIG_FLATMEM_MANUAL

This one has to remain inside an #ifdef. That's the only one that has to
remain.

> arch/powerpc/mm/nohash/kaslr_booke.c:341:30: note: each undeclared identifier is reported only once for each function it appears in
> arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_early_init':
> arch/powerpc/mm/nohash/kaslr_booke.c:404:3: error: 'is_second_reloc' undeclared (first use in this function); did you mean '__cond_lock'?
>    is_second_reloc = 1;
>    ^~~
>    __cond_lock

In mmu_decl.h, put the declaration outside the #ifdef CONFIG_PPC32

> arch/powerpc/mm/nohash/kaslr_booke.c:411:4: error: implicit declaration of function 'create_kaslr_tlb_entry'; did you mean 'reloc_kernel_entry'? [-Werror=implicit-function-declaration]
>     create_kaslr_tlb_entry(1, tlb_virt, tlb_phys);
>     ^~
>     reloc_kernel_entry

Same, put the declaration outside of the #ifdef

> cc1: all warnings being treated as errors
> make[3]: *** [scripts/Makefile.build:268: arch/powerpc/mm/nohash/kaslr_booke.o] Error 1
> make[2]: *** [scripts/Makefile.build:505: arch/powerpc/mm/nohash] Error 2
> make[1]: *** [scripts/Makefile.build:505: arch/powerpc/mm] Error 2
> make: *** [Makefile:1681: arch/powerpc] Error 2

See the patch I sent you. It builds ok for me.

Christophe
Re: [PATCH v3 10/27] powerpc: Add driver for OpenCAPI Persistent Memory
On Wed, 2020-02-26 at 16:07 +1100, Andrew Donnellan wrote: > On 21/2/20 2:27 pm, Alastair D'Silva wrote: > > From: Alastair D'Silva > > > > This driver exposes LPC memory on OpenCAPI pmem cards > > as an NVDIMM, allowing the existing nvram infrastructure > > to be used. > > > > Namespace metadata is stored on the media itself, so > > scm_reserve_metadata() maps 1 section's worth of PMEM storage > > at the start to hold this. The rest of the PMEM range is registered > > with libnvdimm as an nvdimm. scm_ndctl_config_read/write/size() > > provide > > callbacks to libnvdimm to access the metadata. > > > > Signed-off-by: Alastair D'Silva > > I'm not particularly familiar with the nvdimm subsystem, so the scope > of > my review is more on the ocxl + misc issues side. > > A few minor checkpatch warnings that don't matter all that much: > > https://openpower.xyz/job/snowpatch/job/snowpatch-linux-checkpatch/11786//artifact/linux/checkpatch.log > > A few other comments below. > > > diff --git a/arch/powerpc/platforms/powernv/pmem/ocxl.c > > b/arch/powerpc/platforms/powernv/pmem/ocxl.c > > new file mode 100644 > > index ..3c4eeb5dcc0f > > --- /dev/null > > +++ b/arch/powerpc/platforms/powernv/pmem/ocxl.c > > @@ -0,0 +1,473 @@ > > +// SPDX-License-Id > > +// Copyright 2019 IBM Corp. > > + > > +/* > > + * A driver for OpenCAPI devices that implement the Storage Class > > + * Memory specification. 
> > + */ > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include "ocxl_internal.h" > > + > > + > > +static const struct pci_device_id ocxlpmem_pci_tbl[] = { > > + { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0625), }, > > + { } > > +}; > > + > > +MODULE_DEVICE_TABLE(pci, ocxlpmem_pci_tbl); > > + > > +#define NUM_MINORS 256 // Total to reserve > > + > > +static dev_t ocxlpmem_dev; > > +static struct class *ocxlpmem_class; > > +static struct mutex minors_idr_lock; > > +static struct idr minors_idr; > > + > > +/** > > + * ndctl_config_write() - Handle a ND_CMD_SET_CONFIG_DATA command > > from ndctl > > + * @ocxlpmem: the device metadata > > + * @command: the incoming data to write > > + * Return: 0 on success, negative on failure > > + */ > > +static int ndctl_config_write(struct ocxlpmem *ocxlpmem, > > + struct nd_cmd_set_config_hdr *command) > > +{ > > + if (command->in_offset + command->in_length > LABEL_AREA_SIZE) > > + return -EINVAL; > > + > > + memcpy_flushcache(ocxlpmem->metadata_addr + command->in_offset, > > command->in_buf, > > + command->in_length); > > Out of scope for this patch - given that we use memcpy_mcsafe in the > config read, does it make sense to change memcpy_flushcache to be > mcsafe > as well? > Aneesh has confirmed that stores don't generate machine checks. 
> > + > > + return 0; > > +} > > + > > +/** > > + * ndctl_config_read() - Handle a ND_CMD_GET_CONFIG_DATA command > > from ndctl > > + * @ocxlpmem: the device metadata > > + * @command: the read request > > + * Return: 0 on success, negative on failure > > + */ > > +static int ndctl_config_read(struct ocxlpmem *ocxlpmem, > > +struct nd_cmd_get_config_data_hdr > > *command) > > +{ > > + if (command->in_offset + command->in_length > LABEL_AREA_SIZE) > > + return -EINVAL; > > + > > + memcpy_mcsafe(command->out_buf, ocxlpmem->metadata_addr + > > command->in_offset, > > + command->in_length); > > + > > + return 0; > > +} > > + > > +/** > > + * ndctl_config_size() - Handle a ND_CMD_GET_CONFIG_SIZE command > > from ndctl > > + * @command: the read request > > + * Return: 0 on success, negative on failure > > + */ > > +static int ndctl_config_size(struct nd_cmd_get_config_size > > *command) > > +{ > > + command->status = 0; > > + command->config_size = LABEL_AREA_SIZE; > > + command->max_xfer = PAGE_SIZE; > > + > > + return 0; > > +} > > + > > +static int ndctl(struct nvdimm_bus_descriptor *nd_desc, > > +struct nvdimm *nvdimm, > > +unsigned int cmd, void *buf, unsigned int buf_len, int > > *cmd_rc) > > +{ > > + struct ocxlpmem *ocxlpmem = container_of(nd_desc, struct > > ocxlpmem, bus_desc); > > + > > + switch (cmd) { > > + case ND_CMD_GET_CONFIG_SIZE: > > + *cmd_rc = ndctl_config_size(buf); > > + return 0; > > + > > + case ND_CMD_GET_CONFIG_DATA: > > + *cmd_rc = ndctl_config_read(ocxlpmem, buf); > > + return 0; > > + > > + case ND_CMD_SET_CONFIG_DATA: > > + *cmd_rc = ndctl_config_write(ocxlpmem, buf); > > + return 0; > > + > > + default: > > + return -ENOTTY; > > + } > > +} > > + > > +/** > > + * reserve_metadata() - Reserve space for nvdimm metadata > > + * @ocxlpmem: the device metadata > > + * @lpc_mem: The resource representing the LPC memory of the > > OpenCAPI device > > + */ > > +static int reserve_metadata(struct ocxlpmem *ocxlpmem, > > +
[PATCH] powerpc: fix emulate_step std test
Signed-off-by: Nicholas Piggin
---
 arch/powerpc/lib/test_emulate_step.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index 42347067739c..00d70253cb5b 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -160,7 +160,7 @@ static void __init test_std(void)
 
 	/* std r5, 0(r3) */
 	stepped = emulate_step(&regs, TEST_STD(5, 3, 0));
-	if (stepped == 1 || regs.gpr[5] == a)
+	if (stepped == 1 && regs.gpr[5] == a)
 		show_result("std", "PASS");
 	else
 		show_result("std", "FAIL");
-- 
2.23.0
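The patch carries no description, so as a reader's note: with `||` the test reports PASS when only one of the two conditions holds, for example when the instruction was emulated but the wrong value was stored. A small user-space sketch of the two predicates (both function names are hypothetical) shows the difference.

```c
#include <assert.h>
#include <stdbool.h>

/* Fixed check: the step must succeed and the stored value must match. */
static bool std_test_passes(int stepped, unsigned long stored,
			    unsigned long expected)
{
	return stepped == 1 && stored == expected;
}

/* Pre-patch check: either condition alone is enough to report PASS. */
static bool std_test_passes_buggy(int stepped, unsigned long stored,
				  unsigned long expected)
{
	return stepped == 1 || stored == expected;
}
```

The two predicates agree when both conditions hold or both fail. They differ exactly in the mixed cases, which are the failures the test exists to catch.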
[PATCH v4 2/8] powerpc/kprobes: Mark newly allocated probes as RO
From: Christophe Leroy

With CONFIG_STRICT_KERNEL_RWX=y and CONFIG_KPROBES=y, there will be one
W+X page at boot by default. This can be tested with
CONFIG_PPC_PTDUMP=y and CONFIG_PPC_DEBUG_WX=y set, and checking the
kernel log during boot.

powerpc doesn't implement its own alloc() for kprobes like other
architectures do, but we couldn't immediately mark RO anyway since we do
a memcpy to the page we allocate later. After that, nothing should be
allowed to modify the page, and write permissions are removed well
before the kprobe is armed.

The memcpy() would fail if >1 probes were allocated, so use
patch_instruction() instead which is safe for RO.

Reviewed-by: Daniel Axtens
Signed-off-by: Russell Currey
Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/kprobes.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 2d27ec4feee4..bfab91ded234 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -24,6 +24,8 @@
 #include
 #include
 #include
+#include
+#include
 
 DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
 DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
@@ -102,6 +104,16 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
 	return addr;
 }
 
+void *alloc_insn_page(void)
+{
+	void *page = vmalloc_exec(PAGE_SIZE);
+
+	if (page)
+		set_memory_ro((unsigned long)page, 1);
+
+	return page;
+}
+
 int arch_prepare_kprobe(struct kprobe *p)
 {
 	int ret = 0;
@@ -124,11 +136,8 @@ int arch_prepare_kprobe(struct kprobe *p)
 	}
 
 	if (!ret) {
-		memcpy(p->ainsn.insn, p->addr,
-				MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+		patch_instruction(p->ainsn.insn, *p->addr);
 		p->opcode = *p->addr;
-		flush_icache_range((unsigned long)p->ainsn.insn,
-			(unsigned long)p->ainsn.insn + sizeof(kprobe_opcode_t));
 	}
 
 	p->ainsn.boostable = 0;
-- 
2.25.1
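The "allocate, drop write access, then write only through a dedicated helper" pattern described in this patch can be approximated in user space. The sketch below is an analogy only: `toy_alloc_insn_page()`, `toy_patch_instruction()` and `toy_kprobe_demo()` are hypothetical names, mmap()/mprotect() stand in for vmalloc_exec()/set_memory_ro(), and the write path here briefly re-enables write access, which is not necessarily how the kernel's patch_instruction() works internally.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical analogue of alloc_insn_page(): map a page, then make it read-only. */
void *toy_alloc_insn_page(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	void *page;

	if (psz <= 0)
		return NULL;

	page = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return NULL;

	if (mprotect(page, (size_t)psz, PROT_READ) != 0) {
		munmap(page, (size_t)psz);
		return NULL;
	}
	return page;
}

/*
 * Hypothetical analogue of patch_instruction(): a plain store or memcpy()
 * into the read-only page would fault, so every write goes through this
 * helper, which restores read-only protection before returning.
 */
int toy_patch_instruction(uint32_t *addr, uint32_t insn)
{
	long psz = sysconf(_SC_PAGESIZE);
	uintptr_t base;

	if (psz <= 0)
		return -1;
	base = (uintptr_t)addr & ~((uintptr_t)psz - 1);

	if (mprotect((void *)base, (size_t)psz, PROT_READ | PROT_WRITE) != 0)
		return -1;
	*addr = insn;
	return mprotect((void *)base, (size_t)psz, PROT_READ);
}

/* Usage: 0x60000000 is the powerpc nop encoding, used here only as sample data. */
void toy_kprobe_demo(void)
{
	uint32_t *slot = toy_alloc_insn_page();

	assert(slot != NULL);
	assert(toy_patch_instruction(slot, 0x60000000u) == 0);
	assert(slot[0] == 0x60000000u);
	munmap(slot, (size_t)sysconf(_SC_PAGESIZE));
}
```

The point of the analogy is the ordering: once the page is read-only from the moment it is allocated, every later write has to use the patching helper, which is why the memcpy() in arch_prepare_kprobe() is replaced.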
[PATCH v4 0/8] set_memory() routines and STRICT_MODULE_RWX
Picking up from Christophe's last series, including the following changes:

- [6/8] Cast "data" to unsigned long instead of int to fix build
- [8/8] New, to fix an issue reported by Jordan Niethe

Christophe's last series is here:
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=156428

Christophe Leroy (4):
  powerpc/mm: Implement set_memory() routines
  powerpc/kprobes: Mark newly allocated probes as RO
  powerpc/mm: implement set_memory_attr()
  powerpc/32: use set_memory_attr()

Russell Currey (4):
  powerpc/mm/ptdump: debugfs handler for W+X checks at runtime
  powerpc: Set ARCH_HAS_STRICT_MODULE_RWX
  powerpc/configs: Enable STRICT_MODULE_RWX in skiroot_defconfig
  powerpc/mm: Disable set_memory() routines when strict RWX isn't enabled

 arch/powerpc/Kconfig                   |   2 +
 arch/powerpc/Kconfig.debug             |   6 +-
 arch/powerpc/configs/skiroot_defconfig |   1 +
 arch/powerpc/include/asm/set_memory.h  |  34
 arch/powerpc/kernel/kprobes.c          |  17 +++-
 arch/powerpc/mm/Makefile               |   2 +-
 arch/powerpc/mm/pageattr.c             | 112 +
 arch/powerpc/mm/pgtable_32.c           |  95 +++--
 arch/powerpc/mm/ptdump/ptdump.c        |  21 -
 9 files changed, 197 insertions(+), 93 deletions(-)
 create mode 100644 arch/powerpc/include/asm/set_memory.h
 create mode 100644 arch/powerpc/mm/pageattr.c

-- 
2.25.1
[PATCH v4 1/8] powerpc/mm: Implement set_memory() routines
From: Christophe Leroy

The set_memory_{ro/rw/nx/x}() functions are required for
STRICT_MODULE_RWX, and are generally useful primitives to have. This
implementation is designed to be completely generic across powerpc's
many MMUs.

It's possible that this could be optimised to be faster for specific
MMUs, but the focus is on having a generic and safe implementation for
now.

This implementation does not handle cases where the caller is attempting
to change the mapping of the page it is executing from, or if another
CPU is concurrently using the page being altered. These cases likely
shouldn't happen, but a more complex implementation with MMU-specific
code could safely handle them, so that is left as a TODO for now.

Signed-off-by: Russell Currey
Signed-off-by: Christophe Leroy
---
 arch/powerpc/Kconfig                  |  1 +
 arch/powerpc/include/asm/set_memory.h | 32
 arch/powerpc/mm/Makefile              |  2 +-
 arch/powerpc/mm/pageattr.c            | 74 +++
 4 files changed, 108 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/set_memory.h
 create mode 100644 arch/powerpc/mm/pageattr.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 497b7d0b2d7e..bd074246e34e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -129,6 +129,7 @@ config PPC
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_MEMBARRIER_CALLBACKS
 	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
+	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_STRICT_KERNEL_RWX	if ((PPC_BOOK3S_64 || PPC32) && !HIBERNATION)
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UACCESS_FLUSHCACHE
diff --git a/arch/powerpc/include/asm/set_memory.h b/arch/powerpc/include/asm/set_memory.h
new file mode 100644
index 000000000000..64011ea444b4
--- /dev/null
+++ b/arch/powerpc/include/asm/set_memory.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_SET_MEMORY_H
+#define _ASM_POWERPC_SET_MEMORY_H
+
+#define SET_MEMORY_RO	0
+#define SET_MEMORY_RW	1
+#define SET_MEMORY_NX	2
+#define SET_MEMORY_X	3
+
+int change_memory_attr(unsigned long addr, int numpages, long action);
+
+static inline int set_memory_ro(unsigned long addr, int numpages)
+{
+	return change_memory_attr(addr, numpages, SET_MEMORY_RO);
+}
+
+static inline int set_memory_rw(unsigned long addr, int numpages)
+{
+	return change_memory_attr(addr, numpages, SET_MEMORY_RW);
+}
+
+static inline int set_memory_nx(unsigned long addr, int numpages)
+{
+	return change_memory_attr(addr, numpages, SET_MEMORY_NX);
+}
+
+static inline int set_memory_x(unsigned long addr, int numpages)
+{
+	return change_memory_attr(addr, numpages, SET_MEMORY_X);
+}
+
+#endif
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 5e147986400d..a998fdac52f9 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -5,7 +5,7 @@
 
 ccflags-$(CONFIG_PPC64)	:= $(NO_MINIMAL_TOC)
 
-obj-y				:= fault.o mem.o pgtable.o mmap.o \
+obj-y				:= fault.o mem.o pgtable.o mmap.o pageattr.o \
 				   init_$(BITS).o pgtable_$(BITS).o \
 				   pgtable-frag.o ioremap.o ioremap_$(BITS).o \
 				   init-common.o mmu_context.o drmem.o
diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
new file mode 100644
index 000000000000..2b573768a7f7
--- /dev/null
+++ b/arch/powerpc/mm/pageattr.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * MMU-generic set_memory implementation for powerpc
+ *
+ * Copyright 2019, IBM Corporation.
+ */
+
+#include
+#include
+
+#include
+#include
+#include
+
+
+/*
+ * Updates the attributes of a page in three steps:
+ *
+ * 1. invalidate the page table entry
+ * 2. flush the TLB
+ * 3. install the new entry with the updated attributes
+ *
+ * This is unsafe if the caller is attempting to change the mapping of the
+ * page it is executing from, or if another CPU is concurrently using the
+ * page being altered.
+ *
+ * TODO make the implementation resistant to this.
+ */
+static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
+{
+	long action = (long)data;
+	pte_t pte;
+
+	spin_lock(&init_mm.page_table_lock);
+
+	/* invalidate the PTE so it's safe to modify */
+	pte = ptep_get_and_clear(&init_mm, addr, ptep);
+	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+
+	/* modify the PTE bits as desired, then apply */
+	switch (action) {
+	case SET_MEMORY_RO:
+		pte = pte_wrprotect(pte);
+		break;
+	case SET_MEMORY_RW:
+		pte = pte_mkwrite(pte);
+		break;
+	case SET_MEMORY_NX:
+		pte = pte_exprotect(pte);
+		break;
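The three-step update described in the comment above (invalidate, flush, install) can be modelled in user space with a toy page table entry. Everything prefixed `toy_`/`TOY_` below is hypothetical: the bit layout is invented, the TLB flush is only counted, locking is omitted, and the handling of an unknown action is an assumption, since the quoted hunk is cut off before that point.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Action codes as defined in the patch's set_memory.h. */
#define SET_MEMORY_RO	0
#define SET_MEMORY_RW	1
#define SET_MEMORY_NX	2
#define SET_MEMORY_X	3

/* Hypothetical permission bits for a toy PTE; real powerpc layouts differ per MMU. */
#define TOY_PTE_WRITE	0x1u
#define TOY_PTE_EXEC	0x2u

/* Counts simulated TLB flushes so the ordering can be observed. */
static unsigned int toy_tlb_flushes;

int toy_change_page_attr(uint32_t *ptep, long action)
{
	uint32_t old = *ptep;
	uint32_t pte = old;

	/* 1. invalidate the entry so stale permissions cannot be used */
	*ptep = 0;
	/* 2. flush the TLB (simulated by a counter) */
	toy_tlb_flushes++;

	/* 3. compute the new permission bits and install the entry */
	switch (action) {
	case SET_MEMORY_RO:
		pte &= ~TOY_PTE_WRITE;
		break;
	case SET_MEMORY_RW:
		pte |= TOY_PTE_WRITE;
		break;
	case SET_MEMORY_NX:
		pte &= ~TOY_PTE_EXEC;
		break;
	case SET_MEMORY_X:
		pte |= TOY_PTE_EXEC;
		break;
	default:
		/* Assumption: put the old entry back and report an error. */
		*ptep = old;
		return -EINVAL;
	}

	*ptep = pte;
	return 0;
}

/* Usage: strip write, then exec, from an entry that starts out W+X. */
void toy_pte_demo(void)
{
	uint32_t pte = TOY_PTE_WRITE | TOY_PTE_EXEC;

	assert(toy_change_page_attr(&pte, SET_MEMORY_RO) == 0);
	assert(pte == TOY_PTE_EXEC);
	assert(toy_change_page_attr(&pte, SET_MEMORY_NX) == 0);
	assert(pte == 0);
	assert(toy_change_page_attr(&pte, 42) == -EINVAL);
}
```

Between steps 1 and 3 the entry is invalid, which is why the patch notes that the scheme is unsafe for the page the caller is executing from or for a page another CPU is using concurrently.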
[PATCH v4 3/8] powerpc/mm/ptdump: debugfs handler for W+X checks at runtime
Very rudimentary, just

	echo 1 > [debugfs]/check_wx_pages

and check the kernel log. Useful for testing strict module RWX.

Updated the Kconfig entry to reflect this.

Also fixed a typo.

Signed-off-by: Russell Currey
---
 arch/powerpc/Kconfig.debug      |  6 --
 arch/powerpc/mm/ptdump/ptdump.c | 21 -
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 0b063830eea8..e37960ef68c6 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -370,7 +370,7 @@ config PPC_PTDUMP
 	  If you are unsure, say N.
 
 config PPC_DEBUG_WX
-	bool "Warn on W+X mappings at boot"
+	bool "Warn on W+X mappings at boot & enable manual checks at runtime"
 	depends on PPC_PTDUMP && STRICT_KERNEL_RWX
 	help
 	  Generate a warning if any W+X mappings are found at boot.
@@ -384,7 +384,9 @@ config PPC_DEBUG_WX
 	  of other unfixed kernel bugs easier.
 
 	  There is no runtime or memory usage effect of this option
-	  once the kernel has booted up - it's a one time check.
+	  once the kernel has booted up, it only automatically checks once.
+
+	  Enables the "check_wx_pages" debugfs entry for checking at runtime.
 
 	  If in doubt, say "Y".
 
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index 206156255247..a15e19a3b14e 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -4,7 +4,7 @@
  *
  * This traverses the kernel pagetables and dumps the
  * information about the used sections of memory to
- * /sys/kernel/debug/kernel_pagetables.
+ * /sys/kernel/debug/kernel_page_tables.
  *
  * Derived from the arm64 implementation:
  * Copyright (c) 2014, The Linux Foundation, Laura Abbott.
@@ -413,6 +413,25 @@ void ptdump_check_wx(void)
 	else
 		pr_info("Checked W+X mappings: passed, no W+X pages found\n");
 }
+
+static int check_wx_debugfs_set(void *data, u64 val)
+{
+	if (val != 1ULL)
+		return -EINVAL;
+
+	ptdump_check_wx();
+
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(check_wx_fops, NULL, check_wx_debugfs_set, "%llu\n");
+
+static int ptdump_check_wx_init(void)
+{
+	return debugfs_create_file("check_wx_pages", 0200, NULL,
+				   NULL, &check_wx_fops) ? 0 : -ENOMEM;
+}
+device_initcall(ptdump_check_wx_init);
 #endif
 
 static int ptdump_init(void)
-- 
2.25.1
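For anyone who wants to trigger the check from a test program rather than a shell, a minimal user-space sketch follows. `trigger_wx_check()` and `trigger_wx_demo()` are hypothetical helpers, and the path passed in is an assumption: with debugfs mounted at /sys/kernel/debug the file created by this patch would presumably be /sys/kernel/debug/check_wx_pages, and writing to it normally needs root.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "1" to the check_wx_pages file at 'path'.
 * Returns 0 on success or a negative errno value on failure.
 * The handler in the patch rejects any value other than 1 with -EINVAL.
 */
int trigger_wx_check(const char *path)
{
	static const char one[] = "1\n";
	const ssize_t len = (ssize_t)(sizeof(one) - 1);
	ssize_t n;
	int fd, err = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, one, (size_t)len);
	if (n < 0)
		err = -errno;
	else if (n != len)
		err = -EIO;

	close(fd);
	return err;
}

/* Usage: a missing path is reported as -ENOENT rather than crashing. */
void trigger_wx_demo(void)
{
	assert(trigger_wx_check("/nonexistent-dir/check_wx_pages") == -ENOENT);
}
```

The result of the check itself is not returned through the file; as the commit message says, it has to be read from the kernel log afterwards.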
[PATCH v4 4/8] powerpc: Set ARCH_HAS_STRICT_MODULE_RWX
To enable strict module RWX on powerpc, set:

    CONFIG_STRICT_MODULE_RWX=y

You should also have CONFIG_STRICT_KERNEL_RWX=y set to have any real
security benefit.

ARCH_HAS_STRICT_MODULE_RWX is set to require ARCH_HAS_STRICT_KERNEL_RWX.
This is due to a quirk in arch/Kconfig and arch/powerpc/Kconfig that
makes STRICT_MODULE_RWX *on by default* in configurations where
STRICT_KERNEL_RWX is *unavailable*.

Since this doesn't make much sense, and module RWX without kernel RWX
doesn't make much sense, having the same dependencies as kernel RWX
works around this problem.

Signed-off-by: Russell Currey
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index bd074246e34e..e1fc7fba10bf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -131,6 +131,7 @@ config PPC
 	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_STRICT_KERNEL_RWX	if ((PPC_BOOK3S_64 || PPC32) && !HIBERNATION)
+	select ARCH_HAS_STRICT_MODULE_RWX	if ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UACCESS_FLUSHCACHE
 	select ARCH_HAS_UACCESS_MCSAFE		if PPC64
-- 
2.25.1