Re: [PATCH v2 4/8] KVM: arm64: Generate hyp relocation data
On Fri, 29 Jan 2021 21:43:25 +, Guenter Roeck wrote:
> Hi,
>
> On Tue, Jan 05, 2021 at 06:05:37PM +, David Brazdil wrote:
> > Add a post-processing step to compilation of KVM nVHE hyp code which
> > calls a custom host tool (gen-hyprel) on the partially linked object
> > file (hyp sections' names prefixed).
> >
> > The tool lists all R_AARCH64_ABS64 data relocations targeting hyp
> > sections and generates an assembly file that will form a new section
> > .hyp.reloc in the kernel binary. The new section contains an array of
> > 32-bit offsets to the positions targeted by these relocations.
> >
> > Since the addresses of those positions will not be determined until
> > linking of `vmlinux`, each 32-bit entry carries a R_AARCH64_PREL32
> > relocation with addend + . The linker of `vmlinux` will therefore
> > fill the slot accordingly.
> >
> > This relocation data will be used at runtime to convert the kernel VAs
> > at those positions to hyp VAs.
> >
> > Signed-off-by: David Brazdil
>
> This patch results in the following error for me.
>
> error: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o: assertion elf.ehdr->e_ident[5]
> == 1 failed (lhs=2, rhs=1, line=250)
>
> The problem is seen when trying to build aarch64 images in big endian
> mode.
>
> The script used to reproduce the problem as well as bisect results are
> attached.

I came up with the following patch, which allows the kernel to link and
boot. I don't have any BE userspace, so I didn't verify that I could
boot a guest (the hypervisor does correctly initialise though).

It's not exactly pretty, but it does the job...

Thanks,

	M.
>From d80ca05b2ed90fc30d328041692fa80f525c8d12 Mon Sep 17 00:00:00 2001
From: Marc Zyngier
Date: Sat, 30 Jan 2021 13:07:51 +
Subject: [PATCH] KVM: arm64: Make gen-hyprel endianness agnostic

gen-hyprel is, for better or worse, a native-endian program: it assumes
that the ELF data structures are in the host's endianness, and even
assumes that the compiled kernel is little-endian in one particular
case.

None of these assumptions hold true, though: people actually build
(use?) BE arm64 kernels, and seem to avoid doing so on BE hosts.
Madness!

In order to solve this, wrap each access to the ELF data structures
with the required byte-swapping magic. This requires obtaining the
kernel configuration (the code keys off CONFIG_CPU_*_ENDIAN), and
providing per-endianness wrappers. This results in a kernel that links
and even boots in a model.

Fixes: 8c49b5d43d4c ("KVM: arm64: Generate hyp relocation data")
Reported-by: Guenter Roeck
Signed-off-by: Marc Zyngier
---
 arch/arm64/kvm/hyp/nvhe/Makefile     |  1 +
 arch/arm64/kvm/hyp/nvhe/gen-hyprel.c | 57 
 2 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 268be1376f74..09d04dd50eb8 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -7,6 +7,7 @@ asflags-y := -D__KVM_NVHE_HYPERVISOR__
 ccflags-y := -D__KVM_NVHE_HYPERVISOR__
 
 hostprogs := gen-hyprel
+HOST_EXTRACFLAGS += -I$(srctree)/include
 
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
	 hyp-main.o hyp-smp.o psci-relay.o
diff --git a/arch/arm64/kvm/hyp/nvhe/gen-hyprel.c b/arch/arm64/kvm/hyp/nvhe/gen-hyprel.c
index 58fe31fdba8e..ead02c6a7628 100644
--- a/arch/arm64/kvm/hyp/nvhe/gen-hyprel.c
+++ b/arch/arm64/kvm/hyp/nvhe/gen-hyprel.c
@@ -25,6 +25,7 @@
  */
 #include
+#include
 #include
 #include
 #include
@@ -36,6 +37,8 @@
 #include
 #include
 
+#include
+
 #define HYP_SECTION_PREFIX		".hyp"
 #define HYP_RELOC_SECTION		".hyp.reloc"
 #define HYP_SECTION_SYMBOL_PREFIX	"__hyp_section_"
@@ -121,6 +124,28 @@ static struct {
 	const char *sh_string;
 } elf;
 
+#if defined(CONFIG_CPU_LITTLE_ENDIAN)
+
+#define elf16toh(x)	le16toh(x)
+#define elf32toh(x)	le32toh(x)
+#define elf64toh(x)	le64toh(x)
+
+#define ELFENDIAN	ELFDATA2LSB
+
+#elif defined(CONFIG_CPU_BIG_ENDIAN)
+
+#define elf16toh(x)	be16toh(x)
+#define elf32toh(x)	be32toh(x)
+#define elf64toh(x)	be64toh(x)
+
+#define ELFENDIAN	ELFDATA2MSB
+
+#else
+
+#error PDP-endian sadly unsupported...
+
+#endif
+
 #define fatal_error(fmt, ...)					\
 ({								\
	fprintf(stderr, "error: %s: " fmt "\n",			\
@@ -162,12 +187,12 @@ static struct {
 /* Iterate over all sections in the ELF. */
 #define for_each_section(var) \
-	for (var = elf.sh_table; var < elf.sh_table + elf.ehdr->e_shnum; ++var)
+	for (var = elf.sh_table; var &
Re: [PATCH v2 4/8] KVM: arm64: Generate hyp relocation data
Hi Guenter,

Thanks a lot for the heads up.

On 2021-01-29 21:43, Guenter Roeck wrote:
> Hi,
>
> On Tue, Jan 05, 2021 at 06:05:37PM +, David Brazdil wrote:
> > Add a post-processing step to compilation of KVM nVHE hyp code which
> > calls a custom host tool (gen-hyprel) on the partially linked object
> > file (hyp sections' names prefixed).
> >
> > The tool lists all R_AARCH64_ABS64 data relocations targeting hyp
> > sections and generates an assembly file that will form a new section
> > .hyp.reloc in the kernel binary. The new section contains an array of
> > 32-bit offsets to the positions targeted by these relocations.
> >
> > Since the addresses of those positions will not be determined until
> > linking of `vmlinux`, each 32-bit entry carries a R_AARCH64_PREL32
> > relocation with addend + . The linker of `vmlinux` will therefore
> > fill the slot accordingly.
> >
> > This relocation data will be used at runtime to convert the kernel VAs
> > at those positions to hyp VAs.
> >
> > Signed-off-by: David Brazdil
>
> This patch results in the following error for me.
>
> error: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o: assertion elf.ehdr->e_ident[5]
> == 1 failed (lhs=2, rhs=1, line=250)
>
> The problem is seen when trying to build aarch64 images in big endian
> mode.

Ah, big-endian. Of course: the ELF header is in native endianness, and
the sanity checks explode (still much better than generating crap).

I'll have a look shortly. It shouldn't be too hard to fix, just a bit
invasive...

Thanks again,

	M.
-- 
Jazz is not dead. It just smells funny...
[tip: irq/urgent] genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set
The following commit has been merged into the irq/urgent branch of tip:

Commit-ID:     4c457e8cb75eda91906a4f89fc39bde3f9a43922
Gitweb:        https://git.kernel.org/tip/4c457e8cb75eda91906a4f89fc39bde3f9a43922
Author:        Marc Zyngier
AuthorDate:    Sat, 23 Jan 2021 12:27:59 
Committer:     Thomas Gleixner
CommitterDate: Sat, 30 Jan 2021 01:22:31 +01:00

genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set

When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI),
__msi_domain_alloc_irqs() performs the activation of the interrupt
(which in the case of PCI results in the endpoint being programmed) as
soon as the interrupt is allocated.

But it appears that this is only done for the first vector, introducing
an inconsistent behaviour for PCI Multi-MSI.

Fix it by iterating over the number of vectors allocated to each MSI
descriptor. This is easily achieved by introducing a new
"for_each_msi_vector" iterator, together with a tiny bit of
refactoring.

Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early")
Reported-by: Shameer Kolothum
Signed-off-by: Marc Zyngier
Signed-off-by: Thomas Gleixner
Tested-by: Shameer Kolothum
Cc: sta...@vger.kernel.org
Link: https://lore.kernel.org/r/20210123122759.1781359-1-...@kernel.org
---
 include/linux/msi.h |  6 ++-
 kernel/irq/msi.c    | 44 
 2 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 360a0a7..aef35fd 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -178,6 +178,12 @@ struct msi_desc {
	list_for_each_entry((desc), dev_to_msi_list((dev)), list)
 #define for_each_msi_entry_safe(desc, tmp, dev)	\
	list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list)
+#define for_each_msi_vector(desc, __irq, dev)			\
+	for_each_msi_entry((desc), (dev))			\
+		if ((desc)->irq)				\
+			for (__irq = (desc)->irq;		\
+			     __irq < ((desc)->irq + (desc)->nvec_used); \
+			     __irq++)
 
 #ifdef CONFIG_IRQ_MSI_IOMMU
 static inline const void *msi_desc_get_iommu_cookie(struct msi_desc 
*desc) diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c index dc0e2d7..b338d62 100644 --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -436,22 +436,22 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, can_reserve = msi_check_reservation_mode(domain, info, dev); - for_each_msi_entry(desc, dev) { - virq = desc->irq; - if (desc->nvec_used == 1) - dev_dbg(dev, "irq %d for MSI\n", virq); - else + /* +* This flag is set by the PCI layer as we need to activate +* the MSI entries before the PCI layer enables MSI in the +* card. Otherwise the card latches a random msi message. +*/ + if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY)) + goto skip_activate; + + for_each_msi_vector(desc, i, dev) { + if (desc->irq == i) { + virq = desc->irq; dev_dbg(dev, "irq [%d-%d] for MSI\n", virq, virq + desc->nvec_used - 1); - /* -* This flag is set by the PCI layer as we need to activate -* the MSI entries before the PCI layer enables MSI in the -* card. Otherwise the card latches a random msi message. -*/ - if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY)) - continue; + } - irq_data = irq_domain_get_irq_data(domain, desc->irq); + irq_data = irq_domain_get_irq_data(domain, i); if (!can_reserve) { irqd_clr_can_reserve(irq_data); if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK) @@ -462,28 +462,24 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, goto cleanup; } +skip_activate: /* * If these interrupts use reservation mode, clear the activated bit * so request_irq() will assign the final vector. */ if (can_reserve) { - for_each_msi_entry(desc, dev) { - irq_data = irq_domain_get_irq_data(domain, desc->irq); + for_each_msi_vector(desc, i, dev) { + irq_data = irq_domain_get_irq_data(domain, i); irqd_clr_activated(irq_data); } } return 0; cleanup: - for_each_msi_entry(desc, dev) { - struct irq_data *irq
Re: [PATCH V3 10/14] arm64: nvhe: Allow TRBE access at EL1
On 2021-01-28 09:34, Suzuki K Poulose wrote: On 1/27/21 9:58 AM, Marc Zyngier wrote: On 2021-01-27 08:55, Anshuman Khandual wrote: From: Suzuki K Poulose When the kernel is booted at EL2 in a nvhe configuration, enable the TRBE access to the EL1. The EL1 still can't trace EL2, unless EL2 permits explicitly via TRFCR_EL2.E2TRE. Cc: Will Deacon Cc: Catalin Marinas Cc: Marc Zyngier Cc: Mark Rutland cc: Anshuman Khandual Signed-off-by: Suzuki K Poulose Signed-off-by: Anshuman Khandual Acked-by: Marc Zyngier One comment below, though: --- arch/arm64/include/asm/el2_setup.h | 19 +++ arch/arm64/include/asm/kvm_arm.h | 2 ++ 2 files changed, 21 insertions(+) diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h index a7f5a1b..05ecce9 100644 --- a/arch/arm64/include/asm/el2_setup.h +++ b/arch/arm64/include/asm/el2_setup.h @@ -72,6 +72,25 @@ .endif 3: + +.ifeqs "\mode", "nvhe" + /* + * If the Trace Buffer is available, allow + * the EL1 to own it. Note that EL1 cannot + * trace the EL2, as it is prevented by + * TRFCR_EL2.E2TRE == 0. + */ + ubfx x0, x1, #ID_AA64DFR0_TRBE_SHIFT, #4 + cbz x0, 1f + + mrs_s x0, SYS_TRBIDR_EL1 + and x0, x0, TRBIDR_PROG + cbnz x0, 1f + mov x0, #(MDCR_EL2_E2TB_EL1_OWN << MDCR_EL2_E2TB_SHIFT) + orr x2, x2, x0 +.endif + +1: Note that this will (badly) conflict with the late-VHE patches[1], where this code path has been reworked. Thanks for the heads up. We will need to see how things get merged. Ideally this patch and the previous one (TRBE definitions could go via the arm64 tree / kvm tree), in which case we could rebase these two patches on the respective tree. I think the current plan of action is to go via the arm64 tree, given that there is nothing really KVM specific there. I'll respin the series one last (hopefully!) time on Monday. Let me know if you need a hand with the rebasing. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [v2] irqchip: ls-extirq: add flag IRQCHIP_SKIP_SET_WAKE to remove call trace
On 2021-01-28 02:37, Biwen Li (OSS) wrote:
>> -----Original Message-----
>> From: Marc Zyngier
>> Sent: 27 January 2021 19:38
>> To: Biwen Li (OSS)
>> Cc: mark.rutl...@arm.com; Leo Li ; t...@linutronix.de;
>> ja...@lakedaemon.net; linux-kernel@vger.kernel.org; Jiafei Pan ;
>> linux-arm-ker...@lists.infradead.org; Ran Wang ; Biwen Li
>> Subject: Re: [v2] irqchip: ls-extirq: add flag IRQCHIP_SKIP_SET_WAKE to
>> remove call trace
>>
>> On 2021-01-27 08:58, Biwen Li wrote:
>> > From: Biwen Li
>> >
>> > Add flag IRQCHIP_SKIP_SET_WAKE to remove call trace as follow,
>> > ...
>> > [ 45.605239] Unbalanced IRQ 120 wake disable
>> > [ 45.609445] WARNING: CPU: 0 PID: 1124 at kernel/irq/manage.c:800
>> > irq_set_irq_wake+0x154/0x1a0
>> > ...
>> > [ 45.645141] pstate: 6085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
>> > [ 45.651144] pc : irq_set_irq_wake+0x154/0x1a0
>> > [ 45.655497] lr : irq_set_irq_wake+0x154/0x1a0
>> > ...
>> > [ 45.742825] Call trace:
>> > [ 45.745268] irq_set_irq_wake+0x154/0x1a0
>> > [ 45.749278] ds3232_resume+0x38/0x50
>> >
>> > On ls2088ardb:
>> > In suspend progress (# echo mem > /sys/power/state),
>> > pm_suspend()->suspend_devices_and_enter()->dpm_suspend()->device_suspend()
>> > ->ds3232_suspend()->enable_irq_wake()->irq_set_irq_wake()
>> > ->set_irq_wake_real() returns -ENXIO, so we get
>> > "Cannot set wakeup source" in ds3232_suspend().
>> >
>> > In resume progress (wakeup by flextimer),
>> > dpm_resume_end()->dpm_resume()->device_resume()->ds3232_resume()
>> > ->disable_irq_wake()->irq_set_irq_wake()->set_irq_wake_real()
>> > triggers the kernel call trace (Unbalanced IRQ 120 wake disable).
>>
>> This is again paraphrasing the stack trace instead of explaining the
>> problem it fixes. How about:
>>
>> "The ls-extirq driver doesn't implement the irq_set_wake() callback,
>> while being wake-up capable. This results in ugly behaviours across
>> suspend/resume cycles.
>>
>> Advertise this by adding IRQCHIP_SKIP_SET_WAKE to the irqchip flags"
>>
>> The subject line should be fixed along the same lines, and a Fixes:
>> tag added.
> Okay, got it. Thanks. Will update in v3.

... and v3 still doesn't have a Fixes: tag.
Frankly, if you can't be bothered to do this, why should I worry about your platform being broken? M. -- Jazz is not dead. It just smells funny...
Re: [v7,5/7] PCI: mediatek-gen3: Add MSI support
On 2021-01-27 12:31, Jianjun Wang wrote: On Tue, 2021-01-26 at 13:57 +, Marc Zyngier wrote: On 2021-01-13 11:39, Jianjun Wang wrote: > Add MSI support for MediaTek Gen3 PCIe controller. > > This PCIe controller supports up to 256 MSI vectors, the MSI hardware > block diagram is as follows: > > +-+ > | GIC | > +-+ > ^ > | > port->irq > | > +-+-+-+-+-+-+-+-+ > |0|1|2|3|4|5|6|7| (PCIe intc) > +-+-+-+-+-+-+-+-+ > ^ ^ ^ > | |...| > +---+ +--++---+ > ||| > +-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+ > |0|1|...|30|31| |0|1|...|30|31| |0|1|...|30|31| (MSI sets) > +-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+ > ^ ^ ^ ^^ ^ ^ ^^ ^ ^ ^ > | | | || | | || | | | (MSI vectors) > | | | || | | || | | | > > (MSI SET0) (MSI SET1) ... (MSI SET7) > > With 256 MSI vectors supported, the MSI vectors are composed of 8 sets, > each set has its own address for MSI message, and supports 32 MSI > vectors > to generate interrupt. > > Signed-off-by: Jianjun Wang > Acked-by: Ryder Lee > --- > drivers/pci/controller/pcie-mediatek-gen3.c | 261 > 1 file changed, 261 insertions(+) > > diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c > b/drivers/pci/controller/pcie-mediatek-gen3.c > index 7979a2856c35..471d97cd1ef9 100644 > --- a/drivers/pci/controller/pcie-mediatek-gen3.c > +++ b/drivers/pci/controller/pcie-mediatek-gen3.c > @@ -14,6 +14,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -52,11 +53,28 @@ > #define PCIE_LINK_STATUS_REG 0x154 > #define PCIE_PORT_LINKUP BIT(8) > > +#define PCIE_MSI_SET_NUM 8 > +#define PCIE_MSI_IRQS_PER_SET 32 > +#define PCIE_MSI_IRQS_NUM \ > + (PCIE_MSI_IRQS_PER_SET * (PCIE_MSI_SET_NUM)) Spurious inner bracketing. 
> + > #define PCIE_INT_ENABLE_REG 0x180 > +#define PCIE_MSI_ENABLE GENMASK(PCIE_MSI_SET_NUM + 8 - 1, 8) > +#define PCIE_MSI_SHIFT8 > #define PCIE_INTX_SHIFT 24 > #define PCIE_INTX_MASKGENMASK(27, 24) > > #define PCIE_INT_STATUS_REG 0x184 > +#define PCIE_MSI_SET_ENABLE_REG 0x190 > +#define PCIE_MSI_SET_ENABLE GENMASK(PCIE_MSI_SET_NUM - 1, 0) > + > +#define PCIE_MSI_SET_BASE_REG 0xc00 > +#define PCIE_MSI_SET_OFFSET 0x10 > +#define PCIE_MSI_SET_STATUS_OFFSET0x04 > +#define PCIE_MSI_SET_ENABLE_OFFSET0x08 > + > +#define PCIE_MSI_SET_ADDR_HI_BASE 0xc80 > +#define PCIE_MSI_SET_ADDR_HI_OFFSET 0x04 > > #define PCIE_TRANS_TABLE_BASE_REG 0x800 > #define PCIE_ATR_SRC_ADDR_MSB_OFFSET 0x4 > @@ -76,6 +94,18 @@ > #define PCIE_ATR_TLP_TYPE_MEM PCIE_ATR_TLP_TYPE(0) > #define PCIE_ATR_TLP_TYPE_IO PCIE_ATR_TLP_TYPE(2) > > +/** > + * struct mtk_pcie_msi - MSI information for each set > + * @dev: pointer to PCIe device > + * @base: IO mapped register base > + * @msg_addr: MSI message address > + */ > +struct mtk_msi_set { > + struct device *dev; > + void __iomem *base; > + phys_addr_t msg_addr; > +}; > + > /** > * struct mtk_pcie_port - PCIe port information > * @dev: pointer to PCIe device > @@ -88,6 +118,11 @@ > * @num_clks: PCIe clocks count for this port > * @irq: PCIe controller interrupt number > * @intx_domain: legacy INTx IRQ domain > + * @msi_domain: MSI IRQ domain > + * @msi_bottom_domain: MSI IRQ bottom domain > + * @msi_sets: MSI sets information > + * @lock: lock protecting IRQ bit map > + * @msi_irq_in_use: bit map for assigned MSI IRQ > */ > struct mtk_pcie_port { >struct device *dev; > @@ -101,6 +136,11 @@ struct mtk_pcie_port { > >int irq; >struct irq_domain *intx_domain; > + struct irq_domain *msi_domain; > + struct irq_domain *msi_bottom_domain; > + struct mtk_msi_set msi_sets[PCIE_MSI_SET_NUM]; > + struct mutex lock; > + DECLARE_BITMAP(msi_irq_in_use, PCIE_MSI_IRQS_NUM); > }; > > /** > @@ -243,6 +283,15 @@ static int mtk_pcie_startup_port(struct > mtk_pcie_port 
*port) >return err; >} > > + /* Enable MSI */ > + val = readl_relaxed(port->base + PCIE_MSI_SET_ENABLE_REG); > + val |= PCIE_MSI_SET_ENABLE;
Re: [v2] irqchip: ls-extirq: add flag IRQCHIP_SKIP_SET_WAKE to remove call trace
On 2021-01-27 08:58, Biwen Li wrote: From: Biwen Li Add flag IRQCHIP_SKIP_SET_WAKE to remove call trace as follow, ... [ 45.605239] Unbalanced IRQ 120 wake disable [ 45.609445] WARNING: CPU: 0 PID: 1124 at kernel/irq/manage.c:800 irq_set_irq_wake+0x154/0x1a0 ... [ 45.645141] pstate: 6085 (nZCv daIf -PAN -UAO -TCO BTYPE=--) [ 45.651144] pc : irq_set_irq_wake+0x154/0x1a0 [ 45.655497] lr : irq_set_irq_wake+0x154/0x1a0 ... [ 45.742825] Call trace: [ 45.745268] irq_set_irq_wake+0x154/0x1a0 [ 45.749278] ds3232_resume+0x38/0x50 On ls2088ardb: In suspend progress(# echo mem > /sys/power/state), pm_suspend()->suspend_devices_and_enter()->dpm_suspend()->device_suspend() ->ds3232_suspend()->enable_irq_wake()->irq_set_irq_wake() ->set_irq_wake_real(), return -ENXIO, there get "Cannot set wakeup source" in ds3232_suspend(). In resume progress(wakeup by flextimer) dpm_resume_end()->dpm_resume() ->device_resume()->ds3232_resume() ->disable_irq_wake()->irq_set_irq_wake() ->set_irq_wake_real(), there get kernel call trace(Unbalanced IRQ 120 wake disable) This is again paraphrasing the stack trace instead of explaining the problem it fixes. How about: "The ls-extirq driver doesn't implement the irq_set_wake() callback, while being wake-up capable. This results in ugly behaviours across suspend/resume cycles. Advertise this by adding IRQCHIP_SKIP_SET_WAKE to the irqchip flags" The subject line should be fixed along the same lines, and a Fixes: tag added. M. -- Jazz is not dead. It just smells funny...
Re: [PATCH V3 10/14] arm64: nvhe: Allow TRBE access at EL1
On 2021-01-27 08:55, Anshuman Khandual wrote: From: Suzuki K Poulose When the kernel is booted at EL2 in a nvhe configuration, enable the TRBE access to the EL1. The EL1 still can't trace EL2, unless EL2 permits explicitly via TRFCR_EL2.E2TRE. Cc: Will Deacon Cc: Catalin Marinas Cc: Marc Zyngier Cc: Mark Rutland cc: Anshuman Khandual Signed-off-by: Suzuki K Poulose Signed-off-by: Anshuman Khandual Acked-by: Marc Zyngier One comment below, though: --- arch/arm64/include/asm/el2_setup.h | 19 +++ arch/arm64/include/asm/kvm_arm.h | 2 ++ 2 files changed, 21 insertions(+) diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h index a7f5a1b..05ecce9 100644 --- a/arch/arm64/include/asm/el2_setup.h +++ b/arch/arm64/include/asm/el2_setup.h @@ -72,6 +72,25 @@ .endif 3: + +.ifeqs "\mode", "nvhe" + /* +* If the Trace Buffer is available, allow +* the EL1 to own it. Note that EL1 cannot +* trace the EL2, as it is prevented by +* TRFCR_EL2.E2TRE == 0. +*/ + ubfxx0, x1, #ID_AA64DFR0_TRBE_SHIFT, #4 + cbz x0, 1f + + mrs_s x0, SYS_TRBIDR_EL1 + and x0, x0, TRBIDR_PROG + cbnzx0, 1f + mov x0, #(MDCR_EL2_E2TB_EL1_OWN << MDCR_EL2_E2TB_SHIFT) + orr x2, x2, x0 +.endif + +1: Note that this will (badly) conflict with the late-VHE patches[1], where this code path has been reworked. Thanks, M. [1] https://lore.kernel.org/r/20210125105019.2946057-1-...@kernel.org -- Jazz is not dead. It just smells funny...
[PATCH v5 06/21] arm64: Move VHE-specific SPE setup to mutate_to_vhe()
There isn't much that a VHE kernel needs on top of whatever has been done for nVHE, so let's move the little we need to the VHE stub (the SPE setup), and drop the init_el2_state macro. No expected functional change. Signed-off-by: Marc Zyngier Acked-by: David Brazdil --- arch/arm64/kernel/hyp-stub.S | 28 +--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S index 373ed2213e1d..6b5c73cf9d52 100644 --- a/arch/arm64/kernel/hyp-stub.S +++ b/arch/arm64/kernel/hyp-stub.S @@ -92,9 +92,6 @@ SYM_CODE_START_LOCAL(mutate_to_vhe) msr hcr_el2, x0 isb - // Doesn't do much on VHE, but still, worth a shot - init_el2_state vhe - // Use the EL1 allocated stack, per-cpu offset mrs x0, sp_el1 mov sp, x0 @@ -107,6 +104,31 @@ SYM_CODE_START_LOCAL(mutate_to_vhe) mrs_s x0, SYS_VBAR_EL12 msr vbar_el1, x0 + // Fixup SPE configuration, if supported... + mrs x1, id_aa64dfr0_el1 + ubfxx1, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4 + mov x2, xzr + cbz x1, skip_spe + + // ... and not owned by EL3 + mrs_s x0, SYS_PMBIDR_EL1 + and x0, x0, #(1 << SYS_PMBIDR_EL1_P_SHIFT) + cbnzx0, skip_spe + + // Let the SPE driver in control of the sampling + mrs_s x0, SYS_PMSCR_EL1 + bic x0, x0, #(1 << SYS_PMSCR_EL2_PCT_SHIFT) + bic x0, x0, #(1 << SYS_PMSCR_EL2_PA_SHIFT) + msr_s SYS_PMSCR_EL1, x0 + mov x2, #MDCR_EL2_TPMS + +skip_spe: + // For VHE, use EL2 translation and disable access from EL1 + mrs x0, mdcr_el2 + bic x0, x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT) + orr x0, x0, x2 + msr mdcr_el2, x0 + // Transfer the MM state from EL1 to EL2 mrs_s x0, SYS_TCR_EL12 msr tcr_el1, x0 -- 2.29.2
[PATCH v5 09/21] arm64: cpufeature: Add global feature override facility
Add a facility to globally override a feature, no matter what the HW says. Yes, this sounds dangerous, but we do respect the "safe" value for a given feature. This doesn't mean the user doesn't need to know what they are doing. Nothing uses this yet, so we are pretty safe. For now. Signed-off-by: Marc Zyngier Reviewed-by: Suzuki K Poulose Acked-by: David Brazdil --- arch/arm64/include/asm/cpufeature.h | 6 + arch/arm64/kernel/cpufeature.c | 42 - 2 files changed, 42 insertions(+), 6 deletions(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 9a555809b89c..fe469389068e 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -63,6 +63,11 @@ struct arm64_ftr_bits { s64 safe_val; /* safe value for FTR_EXACT features */ }; +struct arm64_ftr_override { + u64 val; + u64 mask; +}; + /* * @arm64_ftr_reg - Feature register * @strict_maskBits which should match across all CPUs for sanity. @@ -74,6 +79,7 @@ struct arm64_ftr_reg { u64 user_mask; u64 sys_val; u64 user_val; + struct arm64_ftr_override *override; const struct arm64_ftr_bits *ftr_bits; }; diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index e99eddec0a46..cb58c7c991ef 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -352,9 +352,12 @@ static const struct arm64_ftr_bits ftr_ctr[] = { ARM64_FTR_END, }; +static struct arm64_ftr_override no_override = { }; + struct arm64_ftr_reg arm64_ftr_reg_ctrel0 = { .name = "SYS_CTR_EL0", - .ftr_bits = ftr_ctr + .ftr_bits = ftr_ctr, + .override = &no_override, }; static const struct arm64_ftr_bits ftr_id_mmfr0[] = { @@ -544,13 +547,16 @@ static const struct arm64_ftr_bits ftr_raz[] = { ARM64_FTR_END, }; -#define ARM64_FTR_REG(id, table) { \ - .sys_id = id, \ - .reg = &(struct arm64_ftr_reg){\ - .name = #id,\ - .ftr_bits = &((table)[0]), \ +#define ARM64_FTR_REG_OVERRIDE(id, table, ovr) { \ + .sys_id = id, \ + .reg = &(struct 
arm64_ftr_reg){\ + .name = #id,\ + .override = (ovr), \ + .ftr_bits = &((table)[0]), \ }} +#define ARM64_FTR_REG(id, table) ARM64_FTR_REG_OVERRIDE(id, table, &no_override) + static const struct __ftr_reg_entry { u32 sys_id; struct arm64_ftr_reg*reg; @@ -770,6 +776,30 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) for (ftrp = reg->ftr_bits; ftrp->width; ftrp++) { u64 ftr_mask = arm64_ftr_mask(ftrp); s64 ftr_new = arm64_ftr_value(ftrp, new); + s64 ftr_ovr = arm64_ftr_value(ftrp, reg->override->val); + + if ((ftr_mask & reg->override->mask) == ftr_mask) { + s64 tmp = arm64_ftr_safe_value(ftrp, ftr_ovr, ftr_new); + char *str = NULL; + + if (ftr_ovr != tmp) { + /* Unsafe, remove the override */ + reg->override->mask &= ~ftr_mask; + reg->override->val &= ~ftr_mask; + tmp = ftr_ovr; + str = "ignoring override"; + } else if (ftr_new != tmp) { + /* Override was valid */ + ftr_new = tmp; + str = "forced"; + } + + if (str) + pr_warn("%s[%d:%d]: %s to %llx\n", + reg->name, + ftrp->shift + ftrp->width - 1, + ftrp->shift, str, tmp); + } val = arm64_ftr_set_value(ftrp, val, ftr_new); -- 2.29.2
[PATCH v5 03/21] arm64: Turn the MMU-on sequence into a macro
Turning the MMU on is a popular sport in the arm64 kernel, and we do it more than once, or even twice. As we are about to add even more, let's turn it into a macro. No expected functional change. Signed-off-by: Marc Zyngier Acked-by: Catalin Marinas Acked-by: David Brazdil --- arch/arm64/include/asm/assembler.h | 17 + arch/arm64/kernel/head.S | 19 --- arch/arm64/mm/proc.S | 12 +--- 3 files changed, 22 insertions(+), 26 deletions(-) diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index bf125c591116..8cded93f99c3 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -675,6 +675,23 @@ USER(\label, icivau, \tmp2)// invalidate I line PoU .endif .endm +/* + * Set SCTLR_EL1 to the passed value, and invalidate the local icache + * in the process. This is called when setting the MMU on. + */ +.macro set_sctlr_el1, reg + msr sctlr_el1, \reg + isb + /* +* Invalidate the local I-cache so that any instructions fetched +* speculatively from the PoC are discarded, since they may have +* been dynamically patched at the PoU. +*/ + ic iallu + dsb nsh + isb +.endm + /* * Check whether to yield to another runnable task from kernel mode NEON code * (which runs with preemption disabled). diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index a0dc987724ed..28e9735302df 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -703,16 +703,9 @@ SYM_FUNC_START(__enable_mmu) offset_ttbr1 x1, x3 msr ttbr1_el1, x1 // load TTBR1 isb - msr sctlr_el1, x0 - isb - /* -* Invalidate the local I-cache so that any instructions fetched -* speculatively from the PoC are discarded, since they may have -* been dynamically patched at the PoU. 
-*/ - ic iallu - dsb nsh - isb + + set_sctlr_el1 x0 + ret SYM_FUNC_END(__enable_mmu) @@ -883,11 +876,7 @@ SYM_FUNC_START_LOCAL(__primary_switch) tlbivmalle1 // Remove any stale TLB entries dsb nsh - msr sctlr_el1, x19 // re-enable the MMU - isb - ic iallu // flush instructions fetched - dsb nsh // via old mapping - isb + set_sctlr_el1 x19 // re-enable the MMU bl __relocate_kernel #endif diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index ece785477bdc..c967bfd30d2b 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -291,17 +291,7 @@ skip_pgd: /* We're done: fire up the MMU again */ mrs x17, sctlr_el1 orr x17, x17, #SCTLR_ELx_M - msr sctlr_el1, x17 - isb - - /* -* Invalidate the local I-cache so that any instructions fetched -* speculatively from the PoC are discarded, since they may have -* been dynamically patched at the PoU. -*/ - ic iallu - dsb nsh - isb + set_sctlr_el1 x17 /* Set the flag to zero to indicate that we're all done */ str wzr, [flag_ptr] -- 2.29.2
[PATCH v5 04/21] arm64: Provide an 'upgrade to VHE' stub hypercall
As we are about to change the way a VHE system boots, let's provide the core helper, in the form of a stub hypercall that enables VHE and replicates the full EL1 context at EL2, thanks to EL1 and VHE-EL2 being extremely similar. On exception return, the kernel carries on at EL2. Fancy! Nothing calls this new hypercall yet, so no functional change. Signed-off-by: Marc Zyngier Acked-by: David Brazdil --- arch/arm64/include/asm/virt.h | 7 +++- arch/arm64/kernel/hyp-stub.S | 76 ++- 2 files changed, 80 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h index ee6a48df89d9..7379f35ae2c6 100644 --- a/arch/arm64/include/asm/virt.h +++ b/arch/arm64/include/asm/virt.h @@ -35,8 +35,13 @@ */ #define HVC_RESET_VECTORS 2 +/* + * HVC_VHE_RESTART - Upgrade the CPU from EL1 to EL2, if possible + */ +#define HVC_VHE_RESTART3 + /* Max number of HYP stub hypercalls */ -#define HVC_STUB_HCALL_NR 3 +#define HVC_STUB_HCALL_NR 4 /* Error returned when an invalid stub number is passed into x0 */ #define HVC_STUB_ERR 0xbadca11 diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S index 160f5881a0b7..3f3dbbe8914d 100644 --- a/arch/arm64/kernel/hyp-stub.S +++ b/arch/arm64/kernel/hyp-stub.S @@ -8,9 +8,9 @@ #include #include -#include #include +#include #include #include #include @@ -47,10 +47,13 @@ SYM_CODE_END(__hyp_stub_vectors) SYM_CODE_START_LOCAL(el1_sync) cmp x0, #HVC_SET_VECTORS - b.ne2f + b.ne1f msr vbar_el2, x1 b 9f +1: cmp x0, #HVC_VHE_RESTART + b.eqmutate_to_vhe + 2: cmp x0, #HVC_SOFT_RESTART b.ne3f mov x0, x2 @@ -70,6 +73,75 @@ SYM_CODE_START_LOCAL(el1_sync) eret SYM_CODE_END(el1_sync) +// nVHE? No way! Give me the real thing! 
+SYM_CODE_START_LOCAL(mutate_to_vhe) + // Be prepared to fail + mov_q x0, HVC_STUB_ERR + + // Sanity check: MMU *must* be off + mrs x1, sctlr_el2 + tbnzx1, #0, 1f + + // Needs to be VHE capable, obviously + mrs x1, id_aa64mmfr1_el1 + ubfxx1, x1, #ID_AA64MMFR1_VHE_SHIFT, #4 + cbz x1, 1f + + // Engage the VHE magic! + mov_q x0, HCR_HOST_VHE_FLAGS + msr hcr_el2, x0 + isb + + // Doesn't do much on VHE, but still, worth a shot + init_el2_state vhe + + // Use the EL1 allocated stack, per-cpu offset + mrs x0, sp_el1 + mov sp, x0 + mrs x0, tpidr_el1 + msr tpidr_el2, x0 + + // FP configuration, vectors + mrs_s x0, SYS_CPACR_EL12 + msr cpacr_el1, x0 + mrs_s x0, SYS_VBAR_EL12 + msr vbar_el1, x0 + + // Transfer the MM state from EL1 to EL2 + mrs_s x0, SYS_TCR_EL12 + msr tcr_el1, x0 + mrs_s x0, SYS_TTBR0_EL12 + msr ttbr0_el1, x0 + mrs_s x0, SYS_TTBR1_EL12 + msr ttbr1_el1, x0 + mrs_s x0, SYS_MAIR_EL12 + msr mair_el1, x0 + isb + + // Invalidate TLBs before enabling the MMU + tlbivmalle1 + dsb nsh + + // Enable the EL2 S1 MMU, as set up from EL1 + mrs_s x0, SYS_SCTLR_EL12 + set_sctlr_el1 x0 + + // Disable the EL1 S1 MMU for a good measure + mov_q x0, INIT_SCTLR_EL1_MMU_OFF + msr_s SYS_SCTLR_EL12, x0 + + // Hack the exception return to stay at EL2 + mrs x0, spsr_el1 + and x0, x0, #~PSR_MODE_MASK + mov x1, #PSR_MODE_EL2h + orr x0, x0, x1 + msr spsr_el1, x0 + + mov x0, xzr + +1: eret +SYM_CODE_END(mutate_to_vhe) + .macro invalid_vector label SYM_CODE_START_LOCAL(\label) b \label -- 2.29.2
[PATCH v5 00/21] arm64: Early CPU feature override, and applications to VHE, BTI and PAuth
It recently came to light that there is a need to be able to override some CPU features very early on, before the kernel is fully up and running. The reasons for this range from specific feature support (such as using Protected KVM on VHE HW, which is the main motivation for this work) to errata workarounds (a feature is broken on a CPU and needs to be turned off, or rather not enabled).

This series tries to offer a limited framework for this kind of problem, by allowing a set of options to be passed on the command-line and altering the feature set that the cpufeature subsystem exposes to the rest of the kernel. Note that this doesn't change anything for code that directly uses the CPU ID registers.

The series completely changes the way a VHE-capable system boots, by *always* booting non-VHE first, and then upgrading to VHE when deemed capable. Although it sounds scary, this is actually simple to implement (and I wish I had done that five years ago). The "upgrade to VHE" path is then conditioned on the VHE feature not being disabled from the command-line.

Said command-line parsing borrows a lot from the kaslr code, and subsequently allows the "nokaslr" option to be moved to the new infrastructure (though it all looks a bit... odd).

Further patches now add support for disabling BTI and PAuth, the latter being based on an initial series by Srinivas Ramana[0]. There are some ongoing discussions about being able to disable MTE, but no clear resolution on that subject yet.

This has been tested on multiple VHE and non-VHE systems.
* From v4 [4]:
  - Documentation fixes
  - Moved the val/mask pair into an arm64_ftr_override structure,
    leading to simpler code
  - All arm64_ftr_reg now have a default override, which simplifies
    the code a bit further
  - Dropped some of the "const" attributes
  - Renamed init_shadow_regs() to init_feature_override()
  - Renamed struct reg_desc to struct ftr_set_desc
  - Refactored command-line parsing
  - Simplified handling of VHE being disabled on the cmdline
  - Turn EL1 S1 MMU off on switch to VHE
  - HVC_VHE_RESTART now returns an error code on failure
  - Added missing asmlinkage and dummy prototypes
  - Collected Acks and RBs from David, Catalin and Suzuki

* From v3 [3]:
  - Fixed the VHE_RESTART stub (duh!)
  - Switched to using arm64_ftr_safe_value() instead of the user
    provided value
  - Per-feature override warning

* From v2 [2]:
  - Simplify the VHE_RESTART stub
  - Fixed a number of spelling mistakes, and hopefully introduced a
    few more
  - Override features in __read_sysreg_by_encoding()
  - Allow both BTI and PAuth to be overridden on the command line
  - Rebased on -rc3

* From v1 [1]:
  - Fix SPE init on VHE when EL2 doesn't own SPE
  - Fix re-init when KASLR is used
  - Handle the resume path
  - Rebased to 5.11-rc2

[0] https://lore.kernel.org/r/1610152163-16554-1-git-send-email-sram...@codeaurora.org
[1] https://lore.kernel.org/r/20201228104958.1848833-1-...@kernel.org
[2] https://lore.kernel.org/r/20210104135011.2063104-1-...@kernel.org
[3] https://lore.kernel.org/r/2021032811.2455113-1-...@kernel.org
[4] https://lore.kernel.org/r/20210118094533.2874082-1-...@kernel.org

Marc Zyngier (20):
  arm64: Fix labels in el2_setup macros
  arm64: Fix outdated TCR setup comment
  arm64: Turn the MMU-on sequence into a macro
  arm64: Provide an 'upgrade to VHE' stub hypercall
  arm64: Initialise as nVHE before switching to VHE
  arm64: Move VHE-specific SPE setup to mutate_to_vhe()
  arm64: Simplify init_el2_state to be non-VHE only
  arm64: Move SCTLR_EL1 initialisation to EL-agnostic code
  arm64: cpufeature: Add global feature override facility
  arm64: cpufeature: Use IDreg override in __read_sysreg_by_encoding()
  arm64: Extract early FDT mapping from kaslr_early_init()
  arm64: cpufeature: Add an early command-line cpufeature override facility
  arm64: Allow ID_AA64MMFR1_EL1.VH to be overridden from the command line
  arm64: Honor VHE being disabled from the command-line
  arm64: Add an aliasing facility for the idreg override
  arm64: Make kvm-arm.mode={nvhe, protected} an alias of id_aa64mmfr1.vh=0
  KVM: arm64: Document HVC_VHE_RESTART stub hypercall
  arm64: Move "nokaslr" over to the early cpufeature infrastructure
  arm64: cpufeatures: Allow disabling of BTI from the command-line
  arm64: cpufeatures: Allow disabling of Pointer Auth from the command-line

Srinivas Ramana (1):
  arm64: Defer enabling pointer authentication on boot core

 .../admin-guide/kernel-parameters.txt  |  9 +
 Documentation/virt/kvm/arm/hyp-abi.rst |  9 +
 arch/arm64/include/asm/assembler.h     | 17 ++
 arch/arm64/include/asm/cpufeature.h    | 11 +
 arch/arm64/include/asm/el2_setup.h     | 60 ++
 arch/arm64/include/asm/pointer_auth.h  | 10 +
 arch/arm64/include/asm/setup.h         | 11 +
 arch/arm64/include/asm/s
Re: [PATCH 1/2] [REPOST] dt-bindings: qcom,pdc: Add compatible for SM8250
On Fri, 15 Jan 2021 14:39:40 +0530, Vinod Koul wrote: > Add the compatible string for SM8250 SoC from Qualcomm. This compatible > is used already in DTS files but not documented yet Applied to irq/irqchip-5.12, thanks! [1/2] dt-bindings: qcom,pdc: Add compatible for SM8250 commit: e6f93c0115cb24ae4b473f28a27294e99faf129a [2/2] dt-bindings: qcom,pdc: Add compatible for SM8350 commit: 9eaad15e5a409f59660f9fdf867f7d3e6e3db15a Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [RFC PATCH v1 0/5] Enable CPU TTRem feature for stage-2
Hi Yanan,

On 2021-01-26 13:41, Yanan Wang wrote:

Hi all, This series enable CPU TTRem feature for stage-2 page table and a RFC is sent for some comments, thanks. The ARMv8.4 TTRem feature offers 3 levels of support when changing block size without changing any other parameters that are listed as requiring use of break-before-make. And I found that maybe we can use this feature to make some improvement for stage-2 page table and the following explains what TTRem exactly does for the improvement. If migration of a VM with hugepages is canceled midway, KVM will adjust the stage-2 table mappings back to block mappings. We currently use BBM to replace the table entry with a block entry. Take adjustment of 1G block mapping as an example, with BBM procedures, we have to invalidate the old table entry first, flush TLB and unmap the old table mappings, right before installing the new block entry.

In all honesty, I think the amount of work that is getting added to support this "migration cancelled mid-way" use case is getting out of control. This is adding complexity and corner cases for a use case that really shouldn't happen that often. And it is adding it at the worst possible place, where we really should keep things as straightforward as possible.

I would expect userspace to have a good enough knowledge of whether the migration is likely to succeed, and not to attempt it if it is likely to fail. And yes, it will fail sometimes. But it should be so rare that adding these various stages of BBM support shouldn't be that useful.

Or is there something else that I am missing?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH] irqchip: ls-extirq: add flag IRQCHIP_SKIP_SET_WAKE to remove call trace
On 2021-01-26 11:00, Biwen Li wrote: From: Biwen Li Add flag IRQCHIP_SKIP_SET_WAKE to remove call trace as follow, [useless trace] More importantly, what is the bug that you are fixing? M. -- Jazz is not dead. It just smells funny...
Re: [v7,5/7] PCI: mediatek-gen3: Add MSI support
On 2021-01-13 11:39, Jianjun Wang wrote:

Add MSI support for MediaTek Gen3 PCIe controller.

This PCIe controller supports up to 256 MSI vectors, the MSI
hardware block diagram is as follows:

              +-----+
              | GIC |
              +-----+
                 ^
                 | port->irq
         +-+-+-+-+-+-+-+-+
         |0|1|2|3|4|5|6|7| (PCIe intc)
         +-+-+-+-+-+-+-+-+
          ^ ^           ^
          | |    ...    |
  +-------+ +------+    +-----------+
  |                |                |
+-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+
|0|1|...|30|31| |0|1|...|30|31| |0|1|...|30|31| (MSI sets)
+-+-+---+--+--+ +-+-+---+--+--+ +-+-+---+--+--+
 ^ ^      ^  ^   ^ ^      ^  ^   ^ ^      ^  ^
 | |      |  |   | |      |  |   | |      |  |  (MSI vectors)
 | |      |  |   | |      |  |   | |      |  |

  (MSI SET0)       (MSI SET1)  ...   (MSI SET7)

With 256 MSI vectors supported, the MSI vectors are composed of 8
sets, each set has its own address for MSI message, and supports
32 MSI vectors to generate interrupt.

Signed-off-by: Jianjun Wang
Acked-by: Ryder Lee
---
 drivers/pci/controller/pcie-mediatek-gen3.c | 261 ++++++++++++++++++++
 1 file changed, 261 insertions(+)

diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c
index 7979a2856c35..471d97cd1ef9 100644
--- a/drivers/pci/controller/pcie-mediatek-gen3.c
+++ b/drivers/pci/controller/pcie-mediatek-gen3.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -52,11 +53,28 @@
 #define PCIE_LINK_STATUS_REG		0x154
 #define PCIE_PORT_LINKUP		BIT(8)

+#define PCIE_MSI_SET_NUM		8
+#define PCIE_MSI_IRQS_PER_SET		32
+#define PCIE_MSI_IRQS_NUM \
+	(PCIE_MSI_IRQS_PER_SET * (PCIE_MSI_SET_NUM))

Spurious inner bracketing.
+ #define PCIE_INT_ENABLE_REG0x180 +#define PCIE_MSI_ENABLEGENMASK(PCIE_MSI_SET_NUM + 8 - 1, 8) +#define PCIE_MSI_SHIFT 8 #define PCIE_INTX_SHIFT24 #define PCIE_INTX_MASK GENMASK(27, 24) #define PCIE_INT_STATUS_REG0x184 +#define PCIE_MSI_SET_ENABLE_REG0x190 +#define PCIE_MSI_SET_ENABLEGENMASK(PCIE_MSI_SET_NUM - 1, 0) + +#define PCIE_MSI_SET_BASE_REG 0xc00 +#define PCIE_MSI_SET_OFFSET0x10 +#define PCIE_MSI_SET_STATUS_OFFSET 0x04 +#define PCIE_MSI_SET_ENABLE_OFFSET 0x08 + +#define PCIE_MSI_SET_ADDR_HI_BASE 0xc80 +#define PCIE_MSI_SET_ADDR_HI_OFFSET0x04 #define PCIE_TRANS_TABLE_BASE_REG 0x800 #define PCIE_ATR_SRC_ADDR_MSB_OFFSET 0x4 @@ -76,6 +94,18 @@ #define PCIE_ATR_TLP_TYPE_MEM PCIE_ATR_TLP_TYPE(0) #define PCIE_ATR_TLP_TYPE_IO PCIE_ATR_TLP_TYPE(2) +/** + * struct mtk_pcie_msi - MSI information for each set + * @dev: pointer to PCIe device + * @base: IO mapped register base + * @msg_addr: MSI message address + */ +struct mtk_msi_set { + struct device *dev; + void __iomem *base; + phys_addr_t msg_addr; +}; + /** * struct mtk_pcie_port - PCIe port information * @dev: pointer to PCIe device @@ -88,6 +118,11 @@ * @num_clks: PCIe clocks count for this port * @irq: PCIe controller interrupt number * @intx_domain: legacy INTx IRQ domain + * @msi_domain: MSI IRQ domain + * @msi_bottom_domain: MSI IRQ bottom domain + * @msi_sets: MSI sets information + * @lock: lock protecting IRQ bit map + * @msi_irq_in_use: bit map for assigned MSI IRQ */ struct mtk_pcie_port { struct device *dev; @@ -101,6 +136,11 @@ struct mtk_pcie_port { int irq; struct irq_domain *intx_domain; + struct irq_domain *msi_domain; + struct irq_domain *msi_bottom_domain; + struct mtk_msi_set msi_sets[PCIE_MSI_SET_NUM]; + struct mutex lock; + DECLARE_BITMAP(msi_irq_in_use, PCIE_MSI_IRQS_NUM); }; /** @@ -243,6 +283,15 @@ static int mtk_pcie_startup_port(struct mtk_pcie_port *port) return err; } + /* Enable MSI */ + val = readl_relaxed(port->base + PCIE_MSI_SET_ENABLE_REG); + val |= PCIE_MSI_SET_ENABLE; + 
writel_relaxed(val, port->base + PCIE_MSI_SET_ENABLE_REG); + + val = readl_relaxed(port->base + PCIE_INT_ENABLE_REG); + val |= PCIE_MSI_ENABLE; + writel_relaxed(val, port->base + PCIE_INT_ENABLE_REG); + /* Set PCIe translation windows */ resource_list_for_each_entry(entry, &host->windows) { struct resource *res = entry->res; @@ -286,6 +335,129 @@ static int mtk_pcie_set_affinity(struct irq_data *data, return -EINVAL; } +static struct irq_chip mtk_msi_irq_chip = { + .name = "MSI", + .irq_ack = irq_chip_ack_parent, +}; + +static struct msi_domain_info mtk_msi_domain_info = { + .flags = (MS
Re: [v7,4/7] PCI: mediatek-gen3: Add INTx support
On 2021-01-13 11:39, Jianjun Wang wrote: Add INTx support for MediaTek Gen3 PCIe controller. Signed-off-by: Jianjun Wang Acked-by: Ryder Lee --- drivers/pci/controller/pcie-mediatek-gen3.c | 163 1 file changed, 163 insertions(+) diff --git a/drivers/pci/controller/pcie-mediatek-gen3.c b/drivers/pci/controller/pcie-mediatek-gen3.c index c00ea7c167de..7979a2856c35 100644 --- a/drivers/pci/controller/pcie-mediatek-gen3.c +++ b/drivers/pci/controller/pcie-mediatek-gen3.c @@ -9,6 +9,9 @@ #include #include #include +#include +#include +#include #include #include #include @@ -49,6 +52,12 @@ #define PCIE_LINK_STATUS_REG 0x154 #define PCIE_PORT_LINKUP BIT(8) +#define PCIE_INT_ENABLE_REG0x180 +#define PCIE_INTX_SHIFT24 +#define PCIE_INTX_MASK GENMASK(27, 24) I guess this '24' is actually PCIE_INTX_SHIFT? In this case, please write it as GENMASK(PCIE_INTX_SHIFT + PCI_NUM_INTX - 1, PCIE_INTX_SHIFT) to make it clear that you are dealing with one bit per INTx. + +#define PCIE_INT_STATUS_REG0x184 + #define PCIE_TRANS_TABLE_BASE_REG 0x800 #define PCIE_ATR_SRC_ADDR_MSB_OFFSET 0x4 #define PCIE_ATR_TRSL_ADDR_LSB_OFFSET 0x8 @@ -77,6 +86,8 @@ * @phy: PHY controller block * @clks: PCIe clocks * @num_clks: PCIe clocks count for this port + * @irq: PCIe controller interrupt number + * @intx_domain: legacy INTx IRQ domain */ struct mtk_pcie_port { struct device *dev; @@ -87,6 +98,9 @@ struct mtk_pcie_port { struct phy *phy; struct clk_bulk_data *clks; int num_clks; + + int irq; + struct irq_domain *intx_domain; }; /** @@ -266,6 +280,149 @@ static int mtk_pcie_startup_port(struct mtk_pcie_port *port) return 0; } +static int mtk_pcie_set_affinity(struct irq_data *data, +const struct cpumask *mask, bool force) +{ + return -EINVAL; +} + +static void mtk_intx_mask(struct irq_data *data) +{ + struct mtk_pcie_port *port = irq_data_get_irq_chip_data(data); + u32 val; + + val = readl_relaxed(port->base + PCIE_INT_ENABLE_REG); + val &= ~BIT(data->hwirq + PCIE_INTX_SHIFT); + writel_relaxed(val, 
port->base + PCIE_INT_ENABLE_REG); This is missing some locking. Otherwise, two concurrent mask/unmask for different interrupts will corrupt each other's state. +} + +static void mtk_intx_unmask(struct irq_data *data) +{ + struct mtk_pcie_port *port = irq_data_get_irq_chip_data(data); + u32 val; + + val = readl_relaxed(port->base + PCIE_INT_ENABLE_REG); + val |= BIT(data->hwirq + PCIE_INTX_SHIFT); + writel_relaxed(val, port->base + PCIE_INT_ENABLE_REG); Same thing here. +} + +/** + * mtk_intx_eoi + * @data: pointer to chip specific data + * + * As an emulated level IRQ, its interrupt status will remain + * until the corresponding de-assert message is received; hence that + * the status can only be cleared when the interrupt has been serviced. + */ +static void mtk_intx_eoi(struct irq_data *data) +{ + struct mtk_pcie_port *port = irq_data_get_irq_chip_data(data); + unsigned long hwirq; + + hwirq = data->hwirq + PCIE_INTX_SHIFT; + writel_relaxed(BIT(hwirq), port->base + PCIE_INT_STATUS_REG); +} + +static struct irq_chip mtk_intx_irq_chip = { + .irq_mask = mtk_intx_mask, + .irq_unmask = mtk_intx_unmask, + .irq_eoi= mtk_intx_eoi, + .irq_set_affinity = mtk_pcie_set_affinity, + .name = "PCIe", nit: "PCIe" is not really descriptive. "INTx" would be a bit better. +}; + +static int mtk_pcie_intx_map(struct irq_domain *domain, unsigned int irq, +irq_hw_number_t hwirq) +{ + irq_set_chip_and_handler_name(irq, &mtk_intx_irq_chip, + handle_fasteoi_irq, "INTx"); + irq_set_chip_data(irq, domain->host_data); You probably want to set the chip_data *before* wiring the handler, as otherwise you could end-up with a NULL pointer in any of the callbacks if the interrupt fires between the two. 
+ + return 0; +} + +static const struct irq_domain_ops intx_domain_ops = { + .map = mtk_pcie_intx_map, +}; + +static int mtk_pcie_init_irq_domains(struct mtk_pcie_port *port, +struct device_node *node) +{ + struct device *dev = port->dev; + struct device_node *intc_node; + + /* Setup INTx */ + intc_node = of_get_child_by_name(node, "interrupt-controller"); + if (!intc_node) { + dev_err(dev, "missing PCIe Intc node\n"); + return -ENODEV; + } + + port->intx_domain = irq_domain_add_linear(intc_node, PCI_NUM_INTX, + &intx_domain_ops, port); + if (!port->intx_domain) { + dev_err(dev, "f
Re: [PATCH v3 0/3] Some optimization for stage-2 translation
On Thu, 14 Jan 2021 20:13:47 +0800, Yanan Wang wrote: > This patch series(v3) make some optimization for stage-2 translation. > > About patch-1: > Procedures of hyp stage-1 map and guest stage-2 map are quite different, > but they are now tied closely by function kvm_set_valid_leaf_pte(). > So adjust the relative code for ease of code maintenance in the future. > > [...] Applied to kvm-arm64/concurrent-translation-fault, thanks! [1/3] KVM: arm64: Adjust partial code of hyp stage-1 map and guest stage-2 map commit: 8ed80051c8c31d1587722fdb3af16677eba9d693 [2/3] KVM: arm64: Filter out the case of only changing permissions from stage-2 map path commit: 694d071f8d85d504055540a27f0dbe9dbf44584e [3/3] KVM: arm64: Mark the page dirty only if the fault is handled successfully commit: 509552e65ae8287178a5cdea2d734dcd2d6380ab Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH] genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set
On 2021-01-25 14:39, Shameerali Kolothum Thodi wrote: -Original Message- From: Marc Zyngier [mailto:m...@kernel.org] Sent: 23 January 2021 12:28 To: linux-kernel@vger.kernel.org Cc: Thomas Gleixner ; Bjorn Helgaas ; Shameerali Kolothum Thodi ; sta...@vger.kernel.org Subject: [PATCH] genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI), we perform the activation of the interrupt (which in the case of PCI results in the endpoint being programmed) as soon as the interrupt is allocated. But it appears that this is only done for the first vector, introducing an inconsistent behaviour for PCI Multi-MSI. Fix it by iterating over the number of vectors allocated to each MSI descriptor. This is easily achieved by introducing a new "for_each_msi_vector" iterator, together with a tiny bit of refactoring. Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early") Reported-by: Shameer Kolothum Signed-off-by: Marc Zyngier Cc: sta...@vger.kernel.org --- include/linux/msi.h | 6 ++ kernel/irq/msi.c| 44 2 files changed, 26 insertions(+), 24 deletions(-) diff --git a/include/linux/msi.h b/include/linux/msi.h index 360a0a7e7341..aef35fd1cf11 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -178,6 +178,12 @@ struct msi_desc { list_for_each_entry((desc), dev_to_msi_list((dev)), list) #define for_each_msi_entry_safe(desc, tmp, dev)\ list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list) +#define for_each_msi_vector(desc, __irq, dev) \ + for_each_msi_entry((desc), (dev)) \ + if ((desc)->irq) \ + for (__irq = (desc)->irq;\ +__irq < ((desc)->irq + (desc)->nvec_used);\ +__irq++) #ifdef CONFIG_IRQ_MSI_IOMMU static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c index 2c0c4d6d0f83..d924676c8781 100644 --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -436,22 +436,22 @@ int 
__msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, can_reserve = msi_check_reservation_mode(domain, info, dev); - for_each_msi_entry(desc, dev) { - virq = desc->irq; - if (desc->nvec_used == 1) - dev_dbg(dev, "irq %d for MSI\n", virq); - else + /* +* This flag is set by the PCI layer as we need to activate +* the MSI entries before the PCI layer enables MSI in the +* card. Otherwise the card latches a random msi message. +*/ + if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY)) + goto skip_activate; This will change the dbg print behavior. From the commit f3b0946d629c, it looks like the below dev_dbg() code was there for !MSI_FLAG_ACTIVATE_EARLY case as well. Not sure how much this matters though. I'm not sure this matters either. We may have relied on these statements some 6/7 years ago, as the whole hierarchy stuff was brand new, but we now have a much better debug infrastructure thanks to Thomas. I'd be totally in favour of dropping it. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v5 18/21] arm64: Move "nokaslr" over to the early cpufeature infrastructure
On 2021-01-25 14:19, Ard Biesheuvel wrote:

On Mon, 25 Jan 2021 at 14:54, Marc Zyngier wrote:

On 2021-01-25 12:54, Ard Biesheuvel wrote:

[...]

> This struct now takes up
> - ~100 bytes for the characters themselves (which btw are not emitted
> into __initdata or __initconst)
> - 6x8 bytes for the char pointers
> - 6x24 bytes for the RELA relocations that annotate these pointers as
> quantities that need to be relocated at boot (on a kernel built with
> KASLR)
>
> I know it's only a drop in the ocean, but in this case, where the
> struct is statically declared and defined only once, and in the same
> place, we could easily turn this into
>
> static const struct {
>     char alias[24];
>     char param[20];
> };
>
> and get rid of all the overhead. The only slightly annoying thing is
> that the array sizes need to be kept in sync with the largest instance
> appearing in the array, but this is easy when the struct type is
> declared in the same place where its only instance is defined.

Fair enough. I personally find the result butt-ugly, but I agree that it certainly saves some memory. Does the following work for you? I can even give symbolic names to the various constants (how generous of me! ;-).

To be honest, I was anticipating more of a discussion, but this looks reasonable to me.

It looked like a reasonable ask: all the strings are completely useless once the kernel has booted, and I'm the first to moan that I can't boot an arm64 kernel with less than 60MB of RAM (OK, it's a pretty bloated kernel...).

Does 'char feature[80];' really need 80 bytes though?

It really needs 75 bytes, because of this:

	{ "arm64.nopauth",
	  "id_aa64isar1.gpi=0 id_aa64isar1.gpa=0 "
	  "id_aa64isar1.api=0 id_aa64isar1.apa=0"},

80 is a round enough number.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH v5 18/21] arm64: Move "nokaslr" over to the early cpufeature infrastructure
On 2021-01-25 12:54, Ard Biesheuvel wrote: On Mon, 25 Jan 2021 at 11:53, Marc Zyngier wrote: Given that the early cpufeature infrastructure has borrowed quite a lot of code from the kaslr implementation, let's reimplement the matching of the "nokaslr" option with it. Signed-off-by: Marc Zyngier Acked-by: Catalin Marinas Acked-by: David Brazdil --- arch/arm64/kernel/idreg-override.c | 15 + arch/arm64/kernel/kaslr.c | 36 ++ 2 files changed, 17 insertions(+), 34 deletions(-) diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index cbb8eaa48742..3ccf51b84ba4 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -31,8 +31,22 @@ static const struct ftr_set_desc mmfr1 __initdata = { }, }; +extern struct arm64_ftr_override kaslr_feature_override; + +static const struct ftr_set_desc kaslr __initdata = { This should be __initconst not __initdata (below too) + .name = "kaslr", +#ifdef CONFIG_RANDOMIZE_BASE + .override = &kaslr_feature_override, +#endif + .fields = { + { "disabled", 0 }, + {} + }, +}; + static const struct ftr_set_desc * const regs[] __initdata = { &mmfr1, + &kaslr, }; static const struct { @@ -41,6 +55,7 @@ static const struct { } aliases[] __initdata = { { "kvm-arm.mode=nvhe", "id_aa64mmfr1.vh=0" }, { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" }, + { "nokaslr","kaslr.disabled=1" }, }; This struct now takes up - ~100 bytes for the characters themselves (which btw are not emitted into __initdata or __initconst) - 6x8 bytes for the char pointers - 6x24 bytes for the RELA relocations that annotate these pointers as quantities that need to be relocated at boot (on a kernel built with KASLR) I know it's only a drop in the ocean, but in this case, where the struct is statically declared and defined only once, and in the same place, we could easily turn this into static const struct { char alias[24]; char param[20]; }; and get rid of all the overhead. 
The only slightly annoying thing is that the array sizes need to be kept in sync with the largest instance appearing in the array, but this is easy when the struct type is declared in the same place where its only instance is defined. Fair enough. I personally find the result butt-ugly, but I agree that it certainly saves some memory. Does the following work for you? I can even give symbolic names to the various constants (how generous of me! ;-). Thanks, M. diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index d1310438d95c..9e7043bdc808 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -14,15 +14,15 @@ #include struct ftr_set_desc { - const char *name; + charname[20]; struct arm64_ftr_override *override; struct { - const char *name; + charname[20]; u8 shift; } fields[]; }; -static const struct ftr_set_desc mmfr1 __initdata = { +static const struct ftr_set_desc mmfr1 __initconst = { .name = "id_aa64mmfr1", .override = &id_aa64mmfr1_override, .fields = { @@ -31,7 +31,7 @@ static const struct ftr_set_desc mmfr1 __initdata = { }, }; -static const struct ftr_set_desc pfr1 __initdata = { +static const struct ftr_set_desc pfr1 __initconst = { .name = "id_aa64pfr1", .override = &id_aa64pfr1_override, .fields = { @@ -40,7 +40,7 @@ static const struct ftr_set_desc pfr1 __initdata = { }, }; -static const struct ftr_set_desc isar1 __initdata = { +static const struct ftr_set_desc isar1 __initconst = { .name = "id_aa64isar1", .override = &id_aa64isar1_override, .fields = { @@ -54,7 +54,7 @@ static const struct ftr_set_desc isar1 __initdata = { extern struct arm64_ftr_override kaslr_feature_override; -static const struct ftr_set_desc kaslr __initdata = { +static const struct ftr_set_desc kaslr __initconst = { .name = "kaslr", #ifdef CONFIG_RANDOMIZE_BASE .override = &kaslr_feature_override, @@ -65,7 +65,7 @@ static const struct ftr_set_desc kaslr __initdata = { }, }; -static const struct ftr_set_desc 
* const regs[] __initdata = { +static const struct ftr_set_desc * const regs[] __initconst = { &mmfr1, &pfr1, &isar1, @@ -73,9 +73,9 @@ st
[PATCH v5 01/21] arm64: Fix labels in el2_setup macros
If someone happens to write the following code:

	b	1f
	init_el2_state	vhe
1:
	[...]

they will be in for a long debugging session, as the label "1f" will be
resolved *inside* the init_el2_state macro instead of after it. Not
really what one expects.

Instead, rewrite the EL2 setup macros to use unambiguous labels, thanks
to the usual macro counter trick.

Acked-by: Catalin Marinas
Signed-off-by: Marc Zyngier
Acked-by: David Brazdil
---
 arch/arm64/include/asm/el2_setup.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index a7f5a1bbc8ac..540116de80bf 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -45,24 +45,24 @@
 	mrs	x1, id_aa64dfr0_el1
 	sbfx	x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
 	cmp	x0, #1
-	b.lt	1f				// Skip if no PMU present
+	b.lt	.Lskip_pmu_\@			// Skip if no PMU present
 	mrs	x0, pmcr_el0			// Disable debug access traps
 	ubfx	x0, x0, #11, #5			// to EL2 and allow access to
-1:
+.Lskip_pmu_\@:
 	csel	x2, xzr, x0, lt			// all PMU counters from EL1

 	/* Statistical profiling */
 	ubfx	x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
-	cbz	x0, 3f				// Skip if SPE not present
+	cbz	x0, .Lskip_spe_\@		// Skip if SPE not present

 .ifeqs "\mode", "nvhe"
 	mrs_s	x0, SYS_PMBIDR_EL1		// If SPE available at EL2,
 	and	x0, x0, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
-	cbnz	x0, 2f				// then permit sampling of physical
+	cbnz	x0, .Lskip_spe_el2_\@		// then permit sampling of physical
 	mov	x0, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
 		      1 << SYS_PMSCR_EL2_PA_SHIFT)
 	msr_s	SYS_PMSCR_EL2, x0		// addresses and physical counter
-2:
+.Lskip_spe_el2_\@:
 	mov	x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
 	orr	x2, x2, x0			// If we don't have VHE, then
 						// use EL1&0 translation.
@@ -71,7 +71,7 @@
 						// and disable access from EL1
 .endif

-3:
+.Lskip_spe_\@:
 	msr	mdcr_el2, x2			// Configure debug traps
 .endm

@@ -79,9 +79,9 @@
 .macro __init_el2_lor
 	mrs	x1, id_aa64mmfr1_el1
 	ubfx	x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
-	cbz	x0, 1f
+	cbz	x0, .Lskip_lor_\@
 	msr_s	SYS_LORC_EL1, xzr
-1:
+.Lskip_lor_\@:
 .endm

 /* Stage-2 translation */
@@ -93,7 +93,7 @@
 .macro __init_el2_gicv3
 	mrs	x0, id_aa64pfr0_el1
 	ubfx	x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
-	cbz	x0, 1f
+	cbz	x0, .Lskip_gicv3_\@

 	mrs_s	x0, SYS_ICC_SRE_EL2
 	orr	x0, x0, #ICC_SRE_EL2_SRE	// Set ICC_SRE_EL2.SRE==1
@@ -103,7 +103,7 @@
 	mrs_s	x0, SYS_ICC_SRE_EL2		// Read SRE back,
 	tbz	x0, #0, 1f			// and check that it sticks
 	msr_s	SYS_ICH_HCR_EL2, xzr		// Reset ICC_HCR_EL2 to defaults
-1:
+.Lskip_gicv3_\@:
 .endm

 .macro __init_el2_hstr
@@ -128,14 +128,14 @@
 .macro __init_el2_nvhe_sve
 	mrs	x1, id_aa64pfr0_el1
 	ubfx	x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
-	cbz	x1, 1f
+	cbz	x1, .Lskip_sve_\@
 	bic	x0, x0, #CPTR_EL2_TZ		// Also disable SVE traps
 	msr	cptr_el2, x0			// Disable copro. traps to EL2
 	isb
 	mov	x1, #ZCR_ELx_LEN_MASK		// SVE: Enable full vector
 	msr_s	SYS_ZCR_EL2, x1			// length for EL1.
-1:
+.Lskip_sve_\@:
 .endm

 .macro __init_el2_nvhe_prepare_eret
-- 
2.29.2
[PATCH v5 02/21] arm64: Fix outdated TCR setup comment
The arm64 kernel has long been able to use more than 39bit VAs.
Since day one, actually. Let's rewrite the offending comment.

Signed-off-by: Marc Zyngier
Acked-by: Catalin Marinas
Acked-by: David Brazdil
---
 arch/arm64/mm/proc.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 1f7ee8c8b7b8..ece785477bdc 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -464,8 +464,8 @@ SYM_FUNC_START(__cpu_setup)
 #endif
 	msr	mair_el1, x5
 	/*
-	 * Set/prepare TCR and TTBR. We use 512GB (39-bit) address range for
-	 * both user and kernel.
+	 * Set/prepare TCR and TTBR. TCR_EL1.T1SZ gets further
+	 * adjusted if the kernel is compiled with 52bit VA support.
 	 */
 	mov_q	x10, TCR_TxSZ(VA_BITS) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \
 			TCR_TG_FLAGS | TCR_KASLR_FLAGS | TCR_ASID16 | \
-- 
2.29.2
Re: [PATCH v5 13/21] arm64: Allow ID_AA64MMFR1_EL1.VH to be overridden from the command line
On 2021-01-25 13:15, Suzuki K Poulose wrote: On 1/25/21 10:50 AM, Marc Zyngier wrote: As we want to be able to disable VHE at runtime, let's match "id_aa64mmfr1.vh=" from the command line as an override. This doesn't have much effect yet as our boot code doesn't look at the cpufeature, but only at the HW registers. Signed-off-by: Marc Zyngier Acked-by: David Brazdil --- arch/arm64/include/asm/cpufeature.h | 2 ++ arch/arm64/kernel/cpufeature.c | 5 - arch/arm64/kernel/idreg-override.c | 11 +++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 4179cfc8ed57..b0ed37da4067 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -818,6 +818,8 @@ static inline unsigned int get_vmid_bits(u64 mmfr1) return 8; } +extern struct arm64_ftr_override id_aa64mmfr1_override; + u32 get_kvm_ipa_limit(void); void dump_cpu_features(void); diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 4b84a1e1dc51..c1d6712c4249 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -557,6 +557,8 @@ static const struct arm64_ftr_bits ftr_raz[] = { #define ARM64_FTR_REG(id, table) ARM64_FTR_REG_OVERRIDE(id, table, &no_override) +struct arm64_ftr_override id_aa64mmfr1_override; Does this need to be ro_after_init ? Could do, together with the other two override targeting system registers. Otherwise, looks good to me: Acked-by: Suzuki K Poulose Thanks, M. -- Jazz is not dead. It just smells funny...
Re: (subset) [PATCH v6 0/5] ARM: arm64: Add SMCCC TRNG entropy service
On Wed, 6 Jan 2021 10:34:48 +, Andre Przywara wrote: > a fix to v5, now *really* fixing the wrong priority of SMCCC vs. RNDR > in arch_get_random_seed_long_early(). Apologies for messing this up > in v5 and thanks to broonie for being on the watch! > > Will, Catalin: it would be much appreciated if you could consider taking > patch 1/5. This contains the common definitions, and is a prerequisite > for every other patch, although they are somewhat independent and likely > will need to go through different subsystems. > > [...] Applied to kvm-arm64/rng-5.12, thanks! [5/5] KVM: arm64: implement the TRNG hypervisor call commit: a8e190cdae1bf8e9e490776b8179babc1962bb25 Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [BUG]: KVM: arm64: Fix symbol dependency in __hyp_call_panic_nvhe
Hi all, On Mon, 25 Jan 2021 20:07:56 +, Oliver Upton wrote: > > > That means we have two options: > > (a) define __hyp_panic_string in a different .c file in all pre-5.9 > > branches, or > > (b) revert the backported patch. > > > > The patch was needed in 5.9 and should stay there. It wasn't needed in > > earlier > > versions because the symbol was being kept alive by another user. It did > > "fix" > > the inline asm semantics, but the problem was never triggered in pre-5.9. > > > > Sasha, with this and the GCC bug in mind, would you agree that (b) is the > > better course of action? > > Sasha, > > Any chance we can get this patch reverted as David has suggested? It > was backported to 5.4 LTS in commit 653ae33b030b ("KVM: arm64: Fix > symbol dependency in __hyp_call_panic_nvhe") and is causing build > issues with a 4.9.4 vintage of GCC. Absolutely. This issue has been lingering for some time now, and needs addressing. I'm definitely in favour of reverting this patch, although there are two possible alternatives: - Cherry-pick 9fd339a45be5 ("arm64: Work around broken GCC 4.9 handling of "S" constraint"), which works around this particular GCC bug - Cherry-pick dca5244d2f5b ("compiler.h: Raise minimum version of GCC to 5.1 for arm64"), which forbids GCC 4.9 as it has been demonstrated to mis-compile the kernel (and this patch is targeting stable anyway) Thanks, M. -- Without deviation from the norm, progress is not possible.
[PATCH v5 13/21] arm64: Allow ID_AA64MMFR1_EL1.VH to be overridden from the command line
As we want to be able to disable VHE at runtime, let's match "id_aa64mmfr1.vh=" from the command line as an override. This doesn't have much effect yet as our boot code doesn't look at the cpufeature, but only at the HW registers. Signed-off-by: Marc Zyngier Acked-by: David Brazdil --- arch/arm64/include/asm/cpufeature.h | 2 ++ arch/arm64/kernel/cpufeature.c | 5 - arch/arm64/kernel/idreg-override.c | 11 +++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 4179cfc8ed57..b0ed37da4067 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -818,6 +818,8 @@ static inline unsigned int get_vmid_bits(u64 mmfr1) return 8; } +extern struct arm64_ftr_override id_aa64mmfr1_override; + u32 get_kvm_ipa_limit(void); void dump_cpu_features(void); diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 4b84a1e1dc51..c1d6712c4249 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -557,6 +557,8 @@ static const struct arm64_ftr_bits ftr_raz[] = { #define ARM64_FTR_REG(id, table) ARM64_FTR_REG_OVERRIDE(id, table, &no_override) +struct arm64_ftr_override id_aa64mmfr1_override; + static const struct __ftr_reg_entry { u32 sys_id; struct arm64_ftr_reg*reg; @@ -604,7 +606,8 @@ static const struct __ftr_reg_entry { /* Op1 = 0, CRn = 0, CRm = 7 */ ARM64_FTR_REG(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0), - ARM64_FTR_REG(SYS_ID_AA64MMFR1_EL1, ftr_id_aa64mmfr1), + ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64MMFR1_EL1, ftr_id_aa64mmfr1, + &id_aa64mmfr1_override), ARM64_FTR_REG(SYS_ID_AA64MMFR2_EL1, ftr_id_aa64mmfr2), /* Op1 = 0, CRn = 1, CRm = 2 */ diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 2d184d6674fd..489aa274e3ec 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -10,6 +10,7 @@ #include #include +#include #include struct 
ftr_set_desc { @@ -21,7 +22,17 @@ struct ftr_set_desc { } fields[]; }; +static const struct ftr_set_desc mmfr1 __initdata = { + .name = "id_aa64mmfr1", + .override = &id_aa64mmfr1_override, + .fields = { + { "vh", ID_AA64MMFR1_VHE_SHIFT }, + {} + }, +}; + static const struct ftr_set_desc * const regs[] __initdata = { + &mmfr1, }; static char *cmdline_contains_option(const char *cmdline, const char *option) -- 2.29.2
[PATCH v5 20/21] arm64: Defer enabling pointer authentication on boot core
From: Srinivas Ramana Defer enabling pointer authentication on the boot core until after it's required to be enabled by the cpufeature framework. This will help in controlling the feature dynamically with a boot parameter. Signed-off-by: Ajay Patil Signed-off-by: Prasad Sodagudi Signed-off-by: Srinivas Ramana Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/1610152163-16554-2-git-send-email-sram...@codeaurora.org Reviewed-by: Catalin Marinas Acked-by: David Brazdil --- arch/arm64/include/asm/pointer_auth.h | 10 ++ arch/arm64/include/asm/stackprotector.h | 1 + arch/arm64/kernel/head.S| 4 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/arch/arm64/include/asm/pointer_auth.h b/arch/arm64/include/asm/pointer_auth.h index c6b4f0603024..b112a11e9302 100644 --- a/arch/arm64/include/asm/pointer_auth.h +++ b/arch/arm64/include/asm/pointer_auth.h @@ -76,6 +76,15 @@ static inline unsigned long ptrauth_strip_insn_pac(unsigned long ptr) return ptrauth_clear_pac(ptr); } +static __always_inline void ptrauth_enable(void) +{ + if (!system_supports_address_auth()) + return; + sysreg_clear_set(sctlr_el1, 0, (SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | + SCTLR_ELx_ENDA | SCTLR_ELx_ENDB)); + isb(); +} + #define ptrauth_thread_init_user(tsk) \ ptrauth_keys_init_user(&(tsk)->thread.keys_user) #define ptrauth_thread_init_kernel(tsk) \ @@ -84,6 +93,7 @@ static inline unsigned long ptrauth_strip_insn_pac(unsigned long ptr) ptrauth_keys_switch_kernel(&(tsk)->thread.keys_kernel) #else /* CONFIG_ARM64_PTR_AUTH */ +#define ptrauth_enable() #define ptrauth_prctl_reset_keys(tsk, arg) (-EINVAL) #define ptrauth_strip_insn_pac(lr) (lr) #define ptrauth_thread_init_user(tsk) diff --git a/arch/arm64/include/asm/stackprotector.h b/arch/arm64/include/asm/stackprotector.h index 7263e0bac680..33f1bb453150 100644 --- a/arch/arm64/include/asm/stackprotector.h +++ b/arch/arm64/include/asm/stackprotector.h @@ -41,6 +41,7 @@ static __always_inline void boot_init_stack_canary(void) #endif
ptrauth_thread_init_kernel(current); ptrauth_thread_switch_kernel(current); + ptrauth_enable(); } #endif /* _ASM_STACKPROTECTOR_H */ diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 3243e3ae9bd8..2e116ef255e1 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -404,10 +404,6 @@ SYM_FUNC_START_LOCAL(__primary_switched) adr_l x5, init_task msr sp_el0, x5 // Save thread_info -#ifdef CONFIG_ARM64_PTR_AUTH - __ptrauth_keys_init_cpu x5, x6, x7, x8 -#endif - adr_l x8, vectors // load VBAR_EL1 with virtual msr vbar_el1, x8// vector table address isb -- 2.29.2
[PATCH v5 16/21] arm64: Make kvm-arm.mode={nvhe, protected} an alias of id_aa64mmfr1.vh=0
Admittedly, passing id_aa64mmfr1.vh=0 on the command-line isn't that easy to understand, and it is likely that users would much prefer to write "kvm-arm.mode=nvhe", or "...=protected". So here you go. This has the added advantage that we can now always honor the "kvm-arm.mode=protected" option, even when booting on a VHE system. Signed-off-by: Marc Zyngier Acked-by: Catalin Marinas Acked-by: David Brazdil --- Documentation/admin-guide/kernel-parameters.txt | 3 +++ arch/arm64/kernel/idreg-override.c | 2 ++ arch/arm64/kvm/arm.c| 3 +++ 3 files changed, 8 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 9e3cdb271d06..2786fd39a047 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2257,6 +2257,9 @@ kvm-arm.mode= [KVM,ARM] Select one of KVM/arm64's modes of operation. + nvhe: Standard nVHE-based mode, without support for + protected guests. + protected: nVHE-based mode with support for guests whose state is kept private from the host. Not valid if the kernel is running in EL2. 
diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 4fad3fc4d104..cbb8eaa48742 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -39,6 +39,8 @@ static const struct { const char *alias; const char *feature; } aliases[] __initdata = { + { "kvm-arm.mode=nvhe", "id_aa64mmfr1.vh=0" }, + { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" }, }; static char *cmdline_contains_option(const char *cmdline, const char *option) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 04c44853b103..597565a65ca2 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -1966,6 +1966,9 @@ static int __init early_kvm_mode_cfg(char *arg) return 0; } + if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) + return 0; + return -EINVAL; } early_param("kvm-arm.mode", early_kvm_mode_cfg); -- 2.29.2
[PATCH v5 15/21] arm64: Add an aliasing facility for the idreg override
In order to map the override of idregs to options that a user can easily understand, let's introduce yet another option array, which maps an option to the corresponding idreg options. Signed-off-by: Marc Zyngier Reviewed-by: Catalin Marinas Acked-by: David Brazdil --- arch/arm64/kernel/idreg-override.c | 17 + 1 file changed, 17 insertions(+) diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 489aa274e3ec..4fad3fc4d104 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -35,6 +35,12 @@ static const struct ftr_set_desc * const regs[] __initdata = { &mmfr1, }; +static const struct { + const char *alias; + const char *feature; +} aliases[] __initdata = { +}; + static char *cmdline_contains_option(const char *cmdline, const char *option) { char *str = strstr(cmdline, option); @@ -88,6 +94,15 @@ static void __init match_options(const char *cmdline) } } +static __init void match_aliases(const char *cmdline) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(aliases); i++) + if (cmdline_contains_option(cmdline, aliases[i].alias)) + match_options(aliases[i].feature); +} + static __init void parse_cmdline(void) { if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) { @@ -108,6 +123,7 @@ static __init void parse_cmdline(void) goto out; match_options(prop); + match_aliases(prop); if (!IS_ENABLED(CONFIG_CMDLINE_EXTEND)) return; @@ -115,6 +131,7 @@ static __init void parse_cmdline(void) out: match_options(CONFIG_CMDLINE); + match_aliases(CONFIG_CMDLINE); } /* Keep checkers quiet */ -- 2.29.2
[PATCH v5 14/21] arm64: Honor VHE being disabled from the command-line
Finally we can check whether VHE is disabled on the command line, and not enable it if that's the user's wish. Signed-off-by: Marc Zyngier Acked-by: David Brazdil --- arch/arm64/kernel/asm-offsets.c | 3 +++ arch/arm64/kernel/hyp-stub.S| 11 +++ 2 files changed, 14 insertions(+) diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index f42fd9e33981..1add0f21bffe 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -99,6 +99,9 @@ int main(void) DEFINE(CPU_BOOT_STACK, offsetof(struct secondary_data, stack)); DEFINE(CPU_BOOT_TASK,offsetof(struct secondary_data, task)); BLANK(); + DEFINE(FTR_OVR_VAL_OFFSET, offsetof(struct arm64_ftr_override, val)); + DEFINE(FTR_OVR_MASK_OFFSET, offsetof(struct arm64_ftr_override, mask)); + BLANK(); #ifdef CONFIG_KVM DEFINE(VCPU_CONTEXT, offsetof(struct kvm_vcpu, arch.ctxt)); DEFINE(VCPU_FAULT_DISR, offsetof(struct kvm_vcpu, arch.fault.disr_el1)); diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S index 6b5c73cf9d52..aa7e8d592295 100644 --- a/arch/arm64/kernel/hyp-stub.S +++ b/arch/arm64/kernel/hyp-stub.S @@ -87,6 +87,17 @@ SYM_CODE_START_LOCAL(mutate_to_vhe) ubfxx1, x1, #ID_AA64MMFR1_VHE_SHIFT, #4 cbz x1, 1f + // Check whether VHE is disabled from the command line + adr_l x1, id_aa64mmfr1_override + ldr x2, [x1, FTR_OVR_VAL_OFFSET] + ldr x1, [x1, FTR_OVR_MASK_OFFSET] + ubfxx2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4 + ubfxx1, x1, #ID_AA64MMFR1_VHE_SHIFT, #4 + cmp x1, xzr + and x2, x2, x1 + csinv x2, x2, xzr, ne + cbz x2, 1f + // Engage the VHE magic! mov_q x0, HCR_HOST_VHE_FLAGS msr hcr_el2, x0 -- 2.29.2
[PATCH v5 21/21] arm64: cpufeatures: Allow disabling of Pointer Auth from the command-line
In order to be able to disable Pointer Authentication at runtime, whether it is for testing purposes, or to work around HW issues, let's add support for overriding the ID_AA64ISAR1_EL1.{GPI,GPA,API,APA} fields. This is further mapped on the arm64.nopauth command-line alias. Signed-off-by: Marc Zyngier Reviewed-by: Catalin Marinas Acked-by: David Brazdil --- Documentation/admin-guide/kernel-parameters.txt | 3 +++ arch/arm64/include/asm/cpufeature.h | 1 + arch/arm64/kernel/cpufeature.c | 4 +++- arch/arm64/kernel/idreg-override.c | 16 4 files changed, 23 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 7599fd0f1ad7..f9cb28a39bd0 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -376,6 +376,9 @@ arm64.nobti [ARM64] Unconditionally disable Branch Target Identification support + arm64.nopauth [ARM64] Unconditionally disable Pointer Authentication + support + ataflop=[HW,M68k] atarimouse= [HW,MOUSE] Atari Mouse diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 4e2f2de9d0d7..ec6311903ad4 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -820,6 +820,7 @@ static inline unsigned int get_vmid_bits(u64 mmfr1) extern struct arm64_ftr_override id_aa64mmfr1_override; extern struct arm64_ftr_override id_aa64pfr1_override; +extern struct arm64_ftr_override id_aa64isar1_override; u32 get_kvm_ipa_limit(void); void dump_cpu_features(void); diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index bb99ddb212b5..954a2b7e5d45 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -559,6 +559,7 @@ static const struct arm64_ftr_bits ftr_raz[] = { struct arm64_ftr_override id_aa64mmfr1_override; struct arm64_ftr_override id_aa64pfr1_override; +struct arm64_ftr_override id_aa64isar1_override; 
static const struct __ftr_reg_entry { u32 sys_id; @@ -604,7 +605,8 @@ static const struct __ftr_reg_entry { /* Op1 = 0, CRn = 0, CRm = 6 */ ARM64_FTR_REG(SYS_ID_AA64ISAR0_EL1, ftr_id_aa64isar0), - ARM64_FTR_REG(SYS_ID_AA64ISAR1_EL1, ftr_id_aa64isar1), + ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64ISAR1_EL1, ftr_id_aa64isar1, + &id_aa64isar1_override), /* Op1 = 0, CRn = 0, CRm = 7 */ ARM64_FTR_REG(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0), diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 71349b644246..d1310438d95c 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -40,6 +40,18 @@ static const struct ftr_set_desc pfr1 __initdata = { }, }; +static const struct ftr_set_desc isar1 __initdata = { + .name = "id_aa64isar1", + .override = &id_aa64isar1_override, + .fields = { + { "gpi", ID_AA64ISAR1_GPI_SHIFT }, + { "gpa", ID_AA64ISAR1_GPA_SHIFT }, + { "api", ID_AA64ISAR1_API_SHIFT }, + { "apa", ID_AA64ISAR1_APA_SHIFT }, + {} + }, +}; + extern struct arm64_ftr_override kaslr_feature_override; static const struct ftr_set_desc kaslr __initdata = { @@ -56,6 +68,7 @@ static const struct ftr_set_desc kaslr __initdata = { static const struct ftr_set_desc * const regs[] __initdata = { &mmfr1, &pfr1, + &isar1, &kaslr, }; @@ -66,6 +79,9 @@ static const struct { { "kvm-arm.mode=nvhe", "id_aa64mmfr1.vh=0" }, { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" }, { "arm64.nobti","id_aa64pfr1.bt=0" }, + { "arm64.nopauth", + "id_aa64isar1.gpi=0 id_aa64isar1.gpa=0 " + "id_aa64isar1.api=0 id_aa64isar1.apa=0" }, { "nokaslr","kaslr.disabled=1" }, }; -- 2.29.2
[PATCH v5 11/21] arm64: Extract early FDT mapping from kaslr_early_init()
As we want to parse more options very early in the kernel lifetime, let's always map the FDT early. This is achieved by moving that code out of kaslr_early_init(). No functional change expected. Signed-off-by: Marc Zyngier Acked-by: Catalin Marinas Acked-by: David Brazdil --- arch/arm64/include/asm/setup.h | 11 +++ arch/arm64/kernel/head.S | 3 ++- arch/arm64/kernel/kaslr.c | 7 +++ arch/arm64/kernel/setup.c | 15 +++ 4 files changed, 31 insertions(+), 5 deletions(-) create mode 100644 arch/arm64/include/asm/setup.h diff --git a/arch/arm64/include/asm/setup.h b/arch/arm64/include/asm/setup.h new file mode 100644 index ..d3320618ed14 --- /dev/null +++ b/arch/arm64/include/asm/setup.h @@ -0,0 +1,11 @@ +// SPDX-License-Identifier: GPL-2.0 + +#ifndef __ARM64_ASM_SETUP_H +#define __ARM64_ASM_SETUP_H + +#include + +void *get_early_fdt_ptr(void); +void early_fdt_map(u64 dt_phys); + +#endif diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index b425d2587cdb..d74e5f84042e 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -433,6 +433,8 @@ SYM_FUNC_START_LOCAL(__primary_switched) bl __pi_memset dsb ishst // Make zero page visible to PTW + mov x0, x21 // pass FDT address in x0 + bl early_fdt_map // Try mapping the FDT early bl switch_to_vhe #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) bl kasan_early_init @@ -440,7 +442,6 @@ SYM_FUNC_START_LOCAL(__primary_switched) #ifdef CONFIG_RANDOMIZE_BASE tst x23, ~(MIN_KIMG_ALIGN - 1) // already running randomized? b.ne0f - mov x0, x21 // pass FDT address in x0 bl kaslr_early_init// parse FDT for KASLR options cbz x0, 0f // KASLR disabled? 
just proceed orr x23, x23, x0// record KASLR offset diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c index 1c74c45b9494..5fc86e7d01a1 100644 --- a/arch/arm64/kernel/kaslr.c +++ b/arch/arm64/kernel/kaslr.c @@ -19,6 +19,7 @@ #include #include #include +#include enum kaslr_status { KASLR_ENABLED, @@ -92,12 +93,11 @@ static __init bool is_kaslr_disabled_cmdline(void *fdt) * containing function pointers) to be reinitialized, and zero-initialized * .bss variables will be reset to 0. */ -u64 __init kaslr_early_init(u64 dt_phys) +u64 __init kaslr_early_init(void) { void *fdt; u64 seed, offset, mask, module_range; unsigned long raw; - int size; /* * Set a reasonable default for module_alloc_base in case @@ -111,8 +111,7 @@ u64 __init kaslr_early_init(u64 dt_phys) * and proceed with KASLR disabled. We will make another * attempt at mapping the FDT in setup_machine() */ - early_fixmap_init(); - fdt = fixmap_remap_fdt(dt_phys, &size, PAGE_KERNEL); + fdt = get_early_fdt_ptr(); if (!fdt) { kaslr_status = KASLR_DISABLED_FDT_REMAP; return 0; diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index c18aacde8bb0..61845c0821d9 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -168,6 +168,21 @@ static void __init smp_build_mpidr_hash(void) pr_warn("Large number of MPIDR hash buckets detected\n"); } +static void *early_fdt_ptr __initdata; + +void __init *get_early_fdt_ptr(void) +{ + return early_fdt_ptr; +} + +asmlinkage void __init early_fdt_map(u64 dt_phys) +{ + int fdt_size; + + early_fixmap_init(); + early_fdt_ptr = fixmap_remap_fdt(dt_phys, &fdt_size, PAGE_KERNEL); +} + static void __init setup_machine_fdt(phys_addr_t dt_phys) { int size; -- 2.29.2
[PATCH v5 19/21] arm64: cpufeatures: Allow disabling of BTI from the command-line
In order to be able to disable BTI at runtime, whether it is for testing purposes, or to work around HW issues, let's add support for overriding the ID_AA64PFR1_EL1.BTI field. This is further mapped on the arm64.nobti command-line alias. Signed-off-by: Marc Zyngier Reviewed-by: Catalin Marinas Acked-by: David Brazdil --- Documentation/admin-guide/kernel-parameters.txt | 3 +++ arch/arm64/include/asm/cpufeature.h | 1 + arch/arm64/kernel/cpufeature.c | 4 +++- arch/arm64/kernel/idreg-override.c | 11 +++ arch/arm64/mm/mmu.c | 2 +- 5 files changed, 19 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 2786fd39a047..7599fd0f1ad7 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -373,6 +373,9 @@ arcrimi=[HW,NET] ARCnet - "RIM I" (entirely mem-mapped) cards Format: ,, + arm64.nobti [ARM64] Unconditionally disable Branch Target + Identification support + ataflop=[HW,M68k] atarimouse= [HW,MOUSE] Atari Mouse diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index b0ed37da4067..4e2f2de9d0d7 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -819,6 +819,7 @@ static inline unsigned int get_vmid_bits(u64 mmfr1) } extern struct arm64_ftr_override id_aa64mmfr1_override; +extern struct arm64_ftr_override id_aa64pfr1_override; u32 get_kvm_ipa_limit(void); void dump_cpu_features(void); diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index c1d6712c4249..bb99ddb212b5 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -558,6 +558,7 @@ static const struct arm64_ftr_bits ftr_raz[] = { #define ARM64_FTR_REG(id, table) ARM64_FTR_REG_OVERRIDE(id, table, &no_override) struct arm64_ftr_override id_aa64mmfr1_override; +struct arm64_ftr_override id_aa64pfr1_override; static const struct 
__ftr_reg_entry { u32 sys_id; @@ -593,7 +594,8 @@ static const struct __ftr_reg_entry { /* Op1 = 0, CRn = 0, CRm = 4 */ ARM64_FTR_REG(SYS_ID_AA64PFR0_EL1, ftr_id_aa64pfr0), - ARM64_FTR_REG(SYS_ID_AA64PFR1_EL1, ftr_id_aa64pfr1), + ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64PFR1_EL1, ftr_id_aa64pfr1, + &id_aa64pfr1_override), ARM64_FTR_REG(SYS_ID_AA64ZFR0_EL1, ftr_id_aa64zfr0), /* Op1 = 0, CRn = 0, CRm = 5 */ diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 3ccf51b84ba4..71349b644246 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -31,6 +31,15 @@ static const struct ftr_set_desc mmfr1 __initdata = { }, }; +static const struct ftr_set_desc pfr1 __initdata = { + .name = "id_aa64pfr1", + .override = &id_aa64pfr1_override, + .fields = { + { "bt", ID_AA64PFR1_BT_SHIFT }, + {} + }, +}; + extern struct arm64_ftr_override kaslr_feature_override; static const struct ftr_set_desc kaslr __initdata = { @@ -46,6 +55,7 @@ static const struct ftr_set_desc kaslr __initdata = { static const struct ftr_set_desc * const regs[] __initdata = { &mmfr1, + &pfr1, &kaslr, }; @@ -55,6 +65,7 @@ static const struct { } aliases[] __initdata = { { "kvm-arm.mode=nvhe", "id_aa64mmfr1.vh=0" }, { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" }, + { "arm64.nobti","id_aa64pfr1.bt=0" }, { "nokaslr","kaslr.disabled=1" }, }; diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index ae0c3d023824..617e704c980b 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -628,7 +628,7 @@ static bool arm64_early_this_cpu_has_bti(void) if (!IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)) return false; - pfr1 = read_sysreg_s(SYS_ID_AA64PFR1_EL1); + pfr1 = __read_sysreg_by_encoding(SYS_ID_AA64PFR1_EL1); return cpuid_feature_extract_unsigned_field(pfr1, ID_AA64PFR1_BT_SHIFT); } -- 2.29.2
[PATCH v5 10/21] arm64: cpufeature: Use IDreg override in __read_sysreg_by_encoding()
__read_sysreg_by_encoding() is used by a bunch of cpufeature helpers, which should take the feature override into account. Let's do that. For good measure (and because we are likely to need it further down the line), make this helper available to the rest of the non-modular kernel. Code that needs to know the *real* features of a CPU can still use read_sysreg_s(), and find the bare, ugly truth. Signed-off-by: Marc Zyngier Reviewed-by: Suzuki K Poulose Acked-by: David Brazdil --- arch/arm64/include/asm/cpufeature.h | 1 + arch/arm64/kernel/cpufeature.c | 15 +-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index fe469389068e..4179cfc8ed57 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -606,6 +606,7 @@ void __init setup_cpu_features(void); void check_local_cpu_capabilities(void); u64 read_sanitised_ftr_reg(u32 id); +u64 __read_sysreg_by_encoding(u32 sys_id); static inline bool cpu_supports_mixed_endian_el0(void) { diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index cb58c7c991ef..4b84a1e1dc51 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1145,14 +1145,17 @@ u64 read_sanitised_ftr_reg(u32 id) EXPORT_SYMBOL_GPL(read_sanitised_ftr_reg); #define read_sysreg_case(r)\ - case r: val = read_sysreg_s(r); break; /* * __read_sysreg_by_encoding() - Used by a STARTING cpu before cpuinfo is populated. 
* Read the system register on the current CPU */ -static u64 __read_sysreg_by_encoding(u32 sys_id) +u64 __read_sysreg_by_encoding(u32 sys_id) { + struct arm64_ftr_reg *regp; + u64 val; + switch (sys_id) { read_sysreg_case(SYS_ID_PFR0_EL1); read_sysreg_case(SYS_ID_PFR1_EL1); @@ -1195,6 +1198,14 @@ static u64 __read_sysreg_by_encoding(u32 sys_id) BUG(); return 0; } + + regp = get_arm64_ftr_reg(sys_id); + if (regp) { + val &= ~regp->override->mask; + val |= (regp->override->val & regp->override->mask); + } + + return val; } #include -- 2.29.2
[PATCH v5 17/21] KVM: arm64: Document HVC_VHE_RESTART stub hypercall
For completeness, let's document the HVC_VHE_RESTART stub. Signed-off-by: Marc Zyngier Acked-by: David Brazdil --- Documentation/virt/kvm/arm/hyp-abi.rst | 9 + 1 file changed, 9 insertions(+) diff --git a/Documentation/virt/kvm/arm/hyp-abi.rst b/Documentation/virt/kvm/arm/hyp-abi.rst index 83cadd8186fa..4d43fbc25195 100644 --- a/Documentation/virt/kvm/arm/hyp-abi.rst +++ b/Documentation/virt/kvm/arm/hyp-abi.rst @@ -58,6 +58,15 @@ these functions (see arch/arm{,64}/include/asm/virt.h): into place (arm64 only), and jump to the restart address while at HYP/EL2. This hypercall is not expected to return to its caller. +* :: + +x0 = HVC_VHE_RESTART (arm64 only) + + Attempt to upgrade the kernel's exception level from EL1 to EL2 by enabling + the VHE mode. This is conditioned by the CPU supporting VHE, the EL2 MMU + being off, and VHE not being disabled by any other means (command line + option, for example). + Any other value of r0/x0 triggers a hypervisor-specific handling, which is not documented here. -- 2.29.2
[PATCH v5 18/21] arm64: Move "nokaslr" over to the early cpufeature infrastructure
Given that the early cpufeature infrastructure has borrowed quite a lot of code from the kaslr implementation, let's reimplement the matching of the "nokaslr" option with it. Signed-off-by: Marc Zyngier Acked-by: Catalin Marinas Acked-by: David Brazdil --- arch/arm64/kernel/idreg-override.c | 15 + arch/arm64/kernel/kaslr.c | 36 ++ 2 files changed, 17 insertions(+), 34 deletions(-) diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index cbb8eaa48742..3ccf51b84ba4 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -31,8 +31,22 @@ static const struct ftr_set_desc mmfr1 __initdata = { }, }; +extern struct arm64_ftr_override kaslr_feature_override; + +static const struct ftr_set_desc kaslr __initdata = { + .name = "kaslr", +#ifdef CONFIG_RANDOMIZE_BASE + .override = &kaslr_feature_override, +#endif + .fields = { + { "disabled", 0 }, + {} + }, +}; + static const struct ftr_set_desc * const regs[] __initdata = { &mmfr1, + &kaslr, }; static const struct { @@ -41,6 +55,7 @@ static const struct { } aliases[] __initdata = { { "kvm-arm.mode=nvhe", "id_aa64mmfr1.vh=0" }, { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" }, + { "nokaslr","kaslr.disabled=1" }, }; static char *cmdline_contains_option(const char *cmdline, const char *option) diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c index 5fc86e7d01a1..27f8939deb1b 100644 --- a/arch/arm64/kernel/kaslr.c +++ b/arch/arm64/kernel/kaslr.c @@ -51,39 +51,7 @@ static __init u64 get_kaslr_seed(void *fdt) return ret; } -static __init bool cmdline_contains_nokaslr(const u8 *cmdline) -{ - const u8 *str; - - str = strstr(cmdline, "nokaslr"); - return str == cmdline || (str > cmdline && *(str - 1) == ' '); -} - -static __init bool is_kaslr_disabled_cmdline(void *fdt) -{ - if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) { - int node; - const u8 *prop; - - node = fdt_path_offset(fdt, "/chosen"); - if (node < 0) - goto out; - - prop = fdt_getprop(fdt, node, 
"bootargs", NULL); - if (!prop) - goto out; - - if (cmdline_contains_nokaslr(prop)) - return true; - - if (IS_ENABLED(CONFIG_CMDLINE_EXTEND)) - goto out; - - return false; - } -out: - return cmdline_contains_nokaslr(CONFIG_CMDLINE); -} +struct arm64_ftr_override kaslr_feature_override __initdata; /* * This routine will be executed with the kernel mapped at its default virtual @@ -126,7 +94,7 @@ u64 __init kaslr_early_init(void) * Check if 'nokaslr' appears on the command line, and * return 0 if that is the case. */ - if (is_kaslr_disabled_cmdline(fdt)) { + if (kaslr_feature_override.val & kaslr_feature_override.mask & 0xf) { kaslr_status = KASLR_DISABLED_CMDLINE; return 0; } -- 2.29.2
[PATCH v5 12/21] arm64: cpufeature: Add an early command-line cpufeature override facility
In order to be able to override CPU features at boot time, let's add a command line parser that matches options of the form "cpureg.feature=value", and store the corresponding value into the override val/mask pair. No features are currently defined, so no expected change in functionality. Signed-off-by: Marc Zyngier Acked-by: David Brazdil --- arch/arm64/kernel/Makefile | 2 +- arch/arm64/kernel/head.S | 1 + arch/arm64/kernel/idreg-override.c | 130 + 3 files changed, 132 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/kernel/idreg-override.c diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 86364ab6f13f..2262f0392857 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -17,7 +17,7 @@ obj-y := debug-monitors.o entry.o irq.o fpsimd.o \ return_address.o cpuinfo.o cpu_errata.o \ cpufeature.o alternative.o cacheinfo.o \ smp.o smp_spin_table.o topology.o smccc-call.o \ - syscall.o proton-pack.o + syscall.o proton-pack.o idreg-override.o targets+= efi-entry.o diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index d74e5f84042e..3243e3ae9bd8 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -435,6 +435,7 @@ SYM_FUNC_START_LOCAL(__primary_switched) mov x0, x21 // pass FDT address in x0 bl early_fdt_map // Try mapping the FDT early + bl init_feature_override bl switch_to_vhe #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) bl kasan_early_init diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c new file mode 100644 index ..2d184d6674fd --- /dev/null +++ b/arch/arm64/kernel/idreg-override.c @@ -0,0 +1,130 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Early cpufeature override framework + * + * Copyright (C) 2020 Google LLC + * Author: Marc Zyngier + */ + +#include +#include + +#include +#include + +struct ftr_set_desc { + const char *name; + struct arm64_ftr_override *override; + struct { + const char *name; + u8 shift; + } 
fields[]; +}; + +static const struct ftr_set_desc * const regs[] __initdata = { +}; + +static char *cmdline_contains_option(const char *cmdline, const char *option) +{ + char *str = strstr(cmdline, option); + + if ((str == cmdline || (str > cmdline && *(str - 1) == ' '))) + return str; + + return NULL; +} + +static int __init find_field(const char *cmdline, +const struct ftr_set_desc *reg, int f, u64 *v) +{ + char buf[256], *str; + size_t len; + + snprintf(buf, ARRAY_SIZE(buf), "%s.%s=", reg->name, reg->fields[f].name); + + str = cmdline_contains_option(cmdline, buf); + if (!str) + return -1; + + str += strlen(buf); + len = strcspn(str, " "); + len = min(len, ARRAY_SIZE(buf) - 1); + strncpy(buf, str, len); + buf[len] = 0; + + return kstrtou64(buf, 0, v); +} + +static void __init match_options(const char *cmdline) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(regs); i++) { + int f; + + if (!regs[i]->override) + continue; + + for (f = 0; regs[i]->fields[f].name; f++) { + u64 v; + + if (find_field(cmdline, regs[i], f, &v)) + continue; + + regs[i]->override->val |= (v & 0xf) << regs[i]->fields[f].shift; + regs[i]->override->mask |= 0xfUL << regs[i]->fields[f].shift; + } + } +} + +static __init void parse_cmdline(void) +{ + if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) { + const u8 *prop; + void *fdt; + int node; + + fdt = get_early_fdt_ptr(); + if (!fdt) + goto out; + + node = fdt_path_offset(fdt, "/chosen"); + if (node < 0) + goto out; + + prop = fdt_getprop(fdt, node, "bootargs", NULL); + if (!prop) + goto out; + + match_options(prop); + + if (!IS_ENABLED(CONFIG_CMDLINE_EXTEND)) + return; + } + +out: + match_options(CONFIG_CMDLINE); +} + +/* Keep checkers quiet */ +void init_feature_o
[PATCH v5 07/21] arm64: Simplify init_el2_state to be non-VHE only
As init_el2_state is now nVHE only, let's simplify it and drop the VHE setup. Signed-off-by: Marc Zyngier Acked-by: David Brazdil --- arch/arm64/include/asm/el2_setup.h | 36 +++--- arch/arm64/kernel/head.S | 2 +- arch/arm64/kvm/hyp/nvhe/hyp-init.S | 2 +- 3 files changed, 10 insertions(+), 30 deletions(-) diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h index 540116de80bf..d77d358f9395 100644 --- a/arch/arm64/include/asm/el2_setup.h +++ b/arch/arm64/include/asm/el2_setup.h @@ -32,16 +32,14 @@ * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in * EL2. */ -.macro __init_el2_timers mode -.ifeqs "\mode", "nvhe" +.macro __init_el2_timers mrs x0, cnthctl_el2 orr x0, x0, #3 // Enable EL1 physical timers msr cnthctl_el2, x0 -.endif msr cntvoff_el2, xzr// Clear virtual offset .endm -.macro __init_el2_debug mode +.macro __init_el2_debug mrs x1, id_aa64dfr0_el1 sbfxx0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4 cmp x0, #1 @@ -55,7 +53,6 @@ ubfxx0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4 cbz x0, .Lskip_spe_\@ // Skip if SPE not present -.ifeqs "\mode", "nvhe" mrs_s x0, SYS_PMBIDR_EL1 // If SPE available at EL2, and x0, x0, #(1 << SYS_PMBIDR_EL1_P_SHIFT) cbnzx0, .Lskip_spe_el2_\@ // then permit sampling of physical @@ -66,10 +63,6 @@ mov x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT) orr x2, x2, x0 // If we don't have VHE, then // use EL1&0 translation. -.else - orr x2, x2, #MDCR_EL2_TPMS // For VHE, use EL2 translation - // and disable access from EL1 -.endif .Lskip_spe_\@: msr mdcr_el2, x2// Configure debug traps @@ -145,37 +138,24 @@ /** * Initialize EL2 registers to sane values. This should be called early on all - * cores that were booted in EL2. + * cores that were booted in EL2. Note that everything gets initialised as + * if VHE was not available. The kernel context will be upgraded to VHE + * if possible later on in the boot process. * * Regs: x0, x1 and x2 are clobbered.
*/ -.macro init_el2_state mode -.ifnes "\mode", "vhe" -.ifnes "\mode", "nvhe" -.error "Invalid 'mode' argument" -.endif -.endif - +.macro init_el2_state __init_el2_sctlr - __init_el2_timers \mode - __init_el2_debug \mode + __init_el2_timers + __init_el2_debug __init_el2_lor __init_el2_stage2 __init_el2_gicv3 __init_el2_hstr - - /* -* When VHE is not in use, early init of EL2 needs to be done here. -* When VHE _is_ in use, EL1 will not be used in the host and -* requires no configuration, and all non-hyp-specific EL2 setup -* will be done via the _EL1 system register aliases in __cpu_setup. -*/ -.ifeqs "\mode", "nvhe" __init_el2_nvhe_idregs __init_el2_nvhe_cptr __init_el2_nvhe_sve __init_el2_nvhe_prepare_eret -.endif .endm #endif /* __ARM_KVM_INIT_H__ */ diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 07445fd976ef..36212c05df42 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -501,7 +501,7 @@ SYM_INNER_LABEL(init_el2, SYM_L_LOCAL) msr hcr_el2, x0 isb - init_el2_state nvhe + init_el2_state /* Hypervisor stub */ adr_l x0, __hyp_stub_vectors diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S index 31b060a44045..222cfc3e7190 100644 --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S @@ -189,7 +189,7 @@ SYM_CODE_START_LOCAL(__kvm_hyp_init_cpu) 2: msr SPsel, #1 // We want to use SP_EL{1,2} /* Initialize EL2 CPU state to sane values. */ - init_el2_state nvhe // Clobbers x0..x2 + init_el2_state // Clobbers x0..x2 /* Enable MMU, set vectors and stack. */ mov x0, x28 -- 2.29.2
[PATCH v5 05/21] arm64: Initialise as nVHE before switching to VHE
As we are aiming to be able to control whether we enable VHE or not, let's always drop down to EL1 first, and only then upgrade to VHE if at all possible. This means that if the kernel is booted at EL2, we always start with a nVHE init, drop to EL1 to initialise the kernel, and only then upgrade the kernel EL to EL2 if possible (the process is obviously shortened for secondary CPUs). The resume path is handled similarly to a secondary CPU boot. Signed-off-by: Marc Zyngier Acked-by: David Brazdil --- arch/arm64/kernel/head.S | 38 ++-- arch/arm64/kernel/hyp-stub.S | 24 +++ arch/arm64/kernel/sleep.S| 1 + 3 files changed, 27 insertions(+), 36 deletions(-) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 28e9735302df..07445fd976ef 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -433,6 +433,7 @@ SYM_FUNC_START_LOCAL(__primary_switched) bl __pi_memset dsb ishst // Make zero page visible to PTW + bl switch_to_vhe #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) bl kasan_early_init #endif @@ -493,42 +494,6 @@ SYM_INNER_LABEL(init_el1, SYM_L_LOCAL) eret SYM_INNER_LABEL(init_el2, SYM_L_LOCAL) -#ifdef CONFIG_ARM64_VHE - /* -* Check for VHE being present. x2 being non-zero indicates that we -* do have VHE, and that the kernel is intended to run at EL2. -*/ - mrs x2, id_aa64mmfr1_el1 - ubfxx2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4 -#else - mov x2, xzr -#endif - cbz x2, init_el2_nvhe - - /* -* When VHE _is_ in use, EL1 will not be used in the host and -* requires no configuration, and all non-hyp-specific EL2 setup -* will be done via the _EL1 system register aliases in __cpu_setup. -*/ - mov_q x0, HCR_HOST_VHE_FLAGS - msr hcr_el2, x0 - isb - - init_el2_state vhe - - isb - - mov_q x0, INIT_PSTATE_EL2 - msr spsr_el2, x0 - msr elr_el2, lr - mov w0, #BOOT_CPU_MODE_EL2 - eret - -SYM_INNER_LABEL(init_el2_nvhe, SYM_L_LOCAL) - /* -* When VHE is not in use, early init of EL2 and EL1 needs to be -* done here.
-*/ mov_q x0, INIT_SCTLR_EL1_MMU_OFF msr sctlr_el1, x0 @@ -623,6 +588,7 @@ SYM_FUNC_START_LOCAL(secondary_startup) /* * Common entry point for secondary CPUs. */ + bl switch_to_vhe bl __cpu_secondary_check52bitva bl __cpu_setup // initialise processor adrpx1, swapper_pg_dir diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S index 3f3dbbe8914d..373ed2213e1d 100644 --- a/arch/arm64/kernel/hyp-stub.S +++ b/arch/arm64/kernel/hyp-stub.S @@ -190,3 +190,27 @@ SYM_FUNC_START(__hyp_reset_vectors) hvc #0 ret SYM_FUNC_END(__hyp_reset_vectors) + +/* + * Entry point to switch to VHE if deemed capable + */ +SYM_FUNC_START(switch_to_vhe) +#ifdef CONFIG_ARM64_VHE + // Need to have booted at EL2 + adr_l x1, __boot_cpu_mode + ldr w0, [x1] + cmp w0, #BOOT_CPU_MODE_EL2 + b.ne1f + + // and still be at EL1 + mrs x0, CurrentEL + cmp x0, #CurrentEL_EL1 + b.ne1f + + // Turn the world upside down + mov x0, #HVC_VHE_RESTART + hvc #0 +1: +#endif + ret +SYM_FUNC_END(switch_to_vhe) diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S index 6bdef7362c0e..5bfd9b87f85d 100644 --- a/arch/arm64/kernel/sleep.S +++ b/arch/arm64/kernel/sleep.S @@ -100,6 +100,7 @@ SYM_FUNC_END(__cpu_suspend_enter) .pushsection ".idmap.text", "awx" SYM_CODE_START(cpu_resume) bl init_kernel_el + bl switch_to_vhe bl __cpu_setup /* enable the MMU early - so we can access sleep_save_stash by va */ adrpx1, swapper_pg_dir -- 2.29.2
[PATCH v5 08/21] arm64: Move SCTLR_EL1 initialisation to EL-agnostic code
We can now move the initial SCTLR_EL1 setup to be used for both EL1 and EL2 setup. Signed-off-by: Marc Zyngier Acked-by: Catalin Marinas Acked-by: David Brazdil --- arch/arm64/kernel/head.S | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 36212c05df42..b425d2587cdb 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -479,13 +479,14 @@ EXPORT_SYMBOL(kimage_vaddr) * booted in EL1 or EL2 respectively. */ SYM_FUNC_START(init_kernel_el) + mov_q x0, INIT_SCTLR_EL1_MMU_OFF + msr sctlr_el1, x0 + mrs x0, CurrentEL cmp x0, #CurrentEL_EL2 b.eqinit_el2 SYM_INNER_LABEL(init_el1, SYM_L_LOCAL) - mov_q x0, INIT_SCTLR_EL1_MMU_OFF - msr sctlr_el1, x0 isb mov_q x0, INIT_PSTATE_EL1 msr spsr_el1, x0 @@ -494,9 +495,6 @@ SYM_INNER_LABEL(init_el1, SYM_L_LOCAL) eret SYM_INNER_LABEL(init_el2, SYM_L_LOCAL) - mov_q x0, INIT_SCTLR_EL1_MMU_OFF - msr sctlr_el1, x0 - mov_q x0, HCR_HOST_NVHE_FLAGS msr hcr_el2, x0 isb -- 2.29.2
Re: [PATCH v4 15/21] arm64: Add an aliasing facility for the idreg override
On Mon, 18 Jan 2021 13:18:39 +, David Brazdil wrote: > > On Mon, Jan 18, 2021 at 09:45:27AM +, Marc Zyngier wrote: > > In order to map the override of idregs to options that a user > > can easily understand, let's introduce yet another option > > array, which maps an option to the corresponding idreg options. > > > > Signed-off-by: Marc Zyngier > Acked-by: David Brazdil > > > --- > > arch/arm64/kernel/idreg-override.c | 20 > > 1 file changed, 20 insertions(+) > > > > diff --git a/arch/arm64/kernel/idreg-override.c > > b/arch/arm64/kernel/idreg-override.c > > index 75d9845f489b..16bc8b3b93ae 100644 > > --- a/arch/arm64/kernel/idreg-override.c > > +++ b/arch/arm64/kernel/idreg-override.c > > @@ -37,6 +37,12 @@ static const struct reg_desc * const regs[] __initdata = > > { > > &mmfr1, > > }; > > > > +static const struct { > > + const char * const alias; > > + const char * const feature; > > +} aliases[] __initdata = { > > +}; > > + > > static int __init find_field(const char *cmdline, const struct reg_desc > > *reg, > > int f, u64 *v) > > { > > @@ -80,6 +86,18 @@ static void __init match_options(const char *cmdline) > > } > > } > > > > +static __init void match_aliases(const char *cmdline) > > +{ > > + int i; > > + > > + for (i = 0; i < ARRAY_SIZE(aliases); i++) { > > + char *str = strstr(cmdline, aliases[i].alias); > > + > > + if ((str == cmdline || (str > cmdline && *(str - 1) == ' '))) > > nit: Extract to a 'cmdline_contains' helper? Took me a good few seconds to > parse this in the previous patch. Giving it a name would help, and now it's > also shared. Good point. Adopted! Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v4 04/21] arm64: Provide an 'upgrade to VHE' stub hypercall
On Mon, 18 Jan 2021 11:25:16 +, David Brazdil wrote: > > On Mon, Jan 18, 2021 at 09:45:16AM +, Marc Zyngier wrote: > > As we are about to change the way a VHE system boots, let's > > provide the core helper, in the form of a stub hypercall that > > enables VHE and replicates the full EL1 context at EL2, thanks > > to EL1 and VHE-EL2 being extremely similar. > > > > On exception return, the kernel carries on at EL2. Fancy! > > > > Nothing calls this new hypercall yet, so no functional change. > > > > Signed-off-by: Marc Zyngier > > --- > > arch/arm64/include/asm/virt.h | 7 +++- > > arch/arm64/kernel/hyp-stub.S | 67 +-- > > 2 files changed, 71 insertions(+), 3 deletions(-) > > > > diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h > > index ee6a48df89d9..7379f35ae2c6 100644 > > --- a/arch/arm64/include/asm/virt.h > > +++ b/arch/arm64/include/asm/virt.h > > @@ -35,8 +35,13 @@ > > */ > > #define HVC_RESET_VECTORS 2 > > > > +/* > > + * HVC_VHE_RESTART - Upgrade the CPU from EL1 to EL2, if possible > > + */ > > +#define HVC_VHE_RESTART3 > > + > > /* Max number of HYP stub hypercalls */ > > -#define HVC_STUB_HCALL_NR 3 > > +#define HVC_STUB_HCALL_NR 4 > > > > /* Error returned when an invalid stub number is passed into x0 */ > > #define HVC_STUB_ERR 0xbadca11 > > diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S > > index 160f5881a0b7..fb12398b5c28 100644 > > --- a/arch/arm64/kernel/hyp-stub.S > > +++ b/arch/arm64/kernel/hyp-stub.S > > @@ -8,9 +8,9 @@ > > > > #include > > #include > > -#include > > > > #include > > +#include > > #include > > #include > > #include > > @@ -47,10 +47,13 @@ SYM_CODE_END(__hyp_stub_vectors) > > > > SYM_CODE_START_LOCAL(el1_sync) > > cmp x0, #HVC_SET_VECTORS > > - b.ne2f > > + b.ne1f > > msr vbar_el2, x1 > > b 9f > > > > +1: cmp x0, #HVC_VHE_RESTART > > + b.eqmutate_to_vhe > > + > > 2: cmp x0, #HVC_SOFT_RESTART > > b.ne3f > > mov x0, x2 > > @@ -70,6 +73,66 @@ SYM_CODE_START_LOCAL(el1_sync) > > eret 
> > SYM_CODE_END(el1_sync) > > > > +// nVHE? No way! Give me the real thing! > > +SYM_CODE_START_LOCAL(mutate_to_vhe) > > + // Sanity check: MMU *must* be off > > + mrs x0, sctlr_el2 > > + tbnzx0, #0, 1f > > + > > + // Needs to be VHE capable, obviously > > + mrs x0, id_aa64mmfr1_el1 > > + ubfxx0, x0, #ID_AA64MMFR1_VHE_SHIFT, #4 > > + cbz x0, 1f > > nit: There is a HVC_STUB_ERR that you could return if these sanity > checks fail. The documentation also states that it should be > returned on error. Good point. I've now added it, but how the error can be handled is still up in the air. For now, I've decided to let the kernel continue its (probably doomed) course. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v4 18/21] arm64: Move "nokaslr" over to the early cpufeature infrastructure
On Mon, 18 Jan 2021 14:46:36 +, David Brazdil wrote: > > On Mon, Jan 18, 2021 at 09:45:30AM +, Marc Zyngier wrote: > > Given that the early cpufeature infrastructure has borrowed quite > > a lot of code from the kaslr implementation, let's reimplement > > the matching of the "nokaslr" option with it. > > > > Signed-off-by: Marc Zyngier > Acked-by: David Brazdil [...] > > @@ -126,7 +95,7 @@ u64 __init kaslr_early_init(void) > > * Check if 'nokaslr' appears on the command line, and > > * return 0 if that is the case. > > */ > > - if (is_kaslr_disabled_cmdline(fdt)) { > > + if (kaslr_feature_val & kaslr_feature_mask & 0xf) { > > nit: Isn't the 0xf redundant here? You don't re-mask for VH either. Actually, I do. See the two back to back ubfx that extract both the mask and the feature. The "& 0xf" here serves the same purpose. Is it redundant? At the moment, quite possibly. But since we have space for 16 "features", this is an indication that we are only using the first one. I expect that eventually, we'll use it for other things. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v4 12/21] arm64: cpufeature: Add an early command-line cpufeature override facility
On Sat, 23 Jan 2021 13:43:52 +, Catalin Marinas wrote: > > On Mon, Jan 18, 2021 at 09:45:24AM +, Marc Zyngier wrote: > > +struct reg_desc { > > + const char * const name; > > + u64 * const val; > > + u64 * const mask; > > + struct { > > + const char * const name; > > + u8 shift; > > + } fields[]; > > +}; > > Sorry, I didn't see this earlier. Do we need to add all these consts > here? So you want the pointers to be const but why is 'shift' special > and not a const then? Is it modified later? > > Would this not work: > > struct reg_desc { > const char *name; > u64 *val; > u64 *mask; > struct { > const char *name; > u8 shift; > } fields[]; > }; > > > +static const struct reg_desc * const regs[] __initdata = { > > as we already declare the whole struct reg_desc pointers here as const. > I may have confused myself... It definitely is better. Especially given that we throw all of this away right after boot, there is no harm in keeping it simple. I've also renamed "reg_desc" to "ftr_set_desc", because we don't always describe a register (like for kaslr). Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v4 14/21] arm64: Honor VHE being disabled from the command-line
On Sat, 23 Jan 2021 14:07:53 +, Catalin Marinas wrote: > > On Mon, Jan 18, 2021 at 09:45:26AM +, Marc Zyngier wrote: > > diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S > > index 59820f9b8522..bbab2148a2a2 100644 > > --- a/arch/arm64/kernel/hyp-stub.S > > +++ b/arch/arm64/kernel/hyp-stub.S > > @@ -77,13 +77,24 @@ SYM_CODE_END(el1_sync) > > SYM_CODE_START_LOCAL(mutate_to_vhe) > > // Sanity check: MMU *must* be off > > mrs x0, sctlr_el2 > > - tbnzx0, #0, 1f > > + tbnzx0, #0, 2f > > > > // Needs to be VHE capable, obviously > > mrs x0, id_aa64mmfr1_el1 > > ubfxx0, x0, #ID_AA64MMFR1_VHE_SHIFT, #4 > > - cbz x0, 1f > > + cbz x0, 2f > > > > + // Check whether VHE is disabled from the command line > > + adr_l x1, id_aa64mmfr1_val > > + ldr x0, [x1] > > + adr_l x1, id_aa64mmfr1_mask > > + ldr x1, [x1] > > + ubfxx0, x0, #ID_AA64MMFR1_VHE_SHIFT, #4 > > + ubfxx1, x1, #ID_AA64MMFR1_VHE_SHIFT, #4 > > + cbz x1, 1f > > + and x0, x0, x1 > > + cbz x0, 2f > > +1: > > I can see the advantage here in separate id_aa64mmfr1_val/mask but we > could use some asm offsets here and keep the pointer indirection simpler > in C code. You'd just need something like 'adr_l mmfr1_ovrd + VAL_OFFSET'. > > Anyway, if you have a strong preference for the current approach, leave > it as is. I've now moved over to a structure containing both val/mask, meaning that we only need to keep a single pointer around in the various feature descriptors. It certainly looks better. Thanks, M. -- Without deviation from the norm, progress is not possible.
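As an aside for readers following the thread: the single val/mask structure Marc describes can be illustrated with a small C sketch. The struct name matches the kernel's arm64_ftr_override, but everything else here is a simplified illustration. Each overridden 4-bit ID register field sets its nibble in mask, and applying the override replaces only the masked bits of the sanitised register value:

```c
#include <stdint.h>

/* Matches the shape discussed in the thread: one val/mask pair per
 * overridden register.  Purely illustrative, not the kernel code. */
struct arm64_ftr_override {
	uint64_t val;
	uint64_t mask;
};

/* Record a user override of one 4-bit ID register field */
static void set_field(struct arm64_ftr_override *o, unsigned int shift,
		      uint64_t v)
{
	o->val  |= (v & 0xf) << shift;
	o->mask |= 0xfULL << shift;
}

/* Apply the override: only the masked bits are replaced */
static uint64_t apply_override(uint64_t reg,
			       const struct arm64_ftr_override *o)
{
	return (reg & ~o->mask) | (o->val & o->mask);
}
```

Keeping val and mask side by side is what lets assembly code (as in the hyp-stub hunk above) reach both with a single pointer.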
Re: [PATCH v2 0/8] arm64: Relocate absolute hyp VAs
On Tue, 5 Jan 2021 18:05:33 +, David Brazdil wrote: > nVHE hyp code is linked into the same kernel binary but executes under > different memory mappings. If the compiler of hyp code chooses absolute > addressing for accessing a symbol, the kernel linker will relocate that > address to a kernel image virtual address, causing a runtime exception. > > So far the strategy has been to force PC-relative addressing by wrapping > all symbol references with the hyp_symbol_addr macro. This is error > prone and developer unfriendly. > > [...] Applied to kvm-arm64/hyp-reloc, thanks! [1/8] KVM: arm64: Rename .idmap.text in hyp linker script commit: eceaf38f521982bad6dbac1c02becdd80fd6af7c [2/8] KVM: arm64: Set up .hyp.rodata ELF section commit: 16174eea2e4fe8247e04c17da682f2034fec0369 [3/8] KVM: arm64: Add symbol at the beginning of each hyp section commit: f7a4825d9569593b9a81f0768313b86175691ef1 [4/8] KVM: arm64: Generate hyp relocation data commit: 8c49b5d43d4c45ca0bb0d1faa23feef2e76e89fa [5/8] KVM: arm64: Apply hyp relocations at runtime commit: 6ec6259d7084ed32e164c9f7b69049464dd90fa5 [6/8] KVM: arm64: Fix constant-pool users in hyp commit: 97cbd2fc0257c6af7036a9a6415ca8ad43535d6b [7/8] KVM: arm64: Remove patching of fn pointers in hyp commit: 537db4af26e3f2e0f304f2032bc593f7e2a54938 [8/8] KVM: arm64: Remove hyp_symbol_addr commit: 247bc166e6b3b1e4068f120f55582a3aa210cc2d Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v2 8/8] KVM: arm64: Remove hyp_symbol_addr
On Tue, 05 Jan 2021 18:05:41 +, David Brazdil wrote: > > Hyp code used the hyp_symbol_addr helper to force PC-relative addressing > because absolute addressing results in kernel VAs due to the way hyp > code is linked. This is not true anymore, so remove the helper and > update all of its users. > > Acked-by: Ard Biesheuvel > Signed-off-by: David Brazdil > --- > arch/arm64/include/asm/kvm_asm.h | 26 > arch/arm64/kvm/hyp/include/hyp/switch.h | 4 ++-- > arch/arm64/kvm/hyp/nvhe/hyp-smp.c| 4 ++-- > arch/arm64/kvm/hyp/nvhe/psci-relay.c | 24 +++--- > arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c | 2 +- > 5 files changed, 17 insertions(+), 43 deletions(-) > > diff --git a/arch/arm64/include/asm/kvm_asm.h > b/arch/arm64/include/asm/kvm_asm.h > index 8a33d83ea843..22d933e9b59e 100644 > --- a/arch/arm64/include/asm/kvm_asm.h > +++ b/arch/arm64/include/asm/kvm_asm.h > @@ -199,32 +199,6 @@ extern void __vgic_v3_init_lrs(void); > > extern u32 __kvm_get_mdcr_el2(void); > > -#if defined(GCC_VERSION) && GCC_VERSION < 5 > -#define SYM_CONSTRAINT "i" > -#else > -#define SYM_CONSTRAINT "S" > -#endif > - > -/* > - * Obtain the PC-relative address of a kernel symbol > - * s: symbol > - * > - * The goal of this macro is to return a symbol's address based on a > - * PC-relative computation, as opposed to a loading the VA from a > - * constant pool or something similar. This works well for HYP, as an > - * absolute VA is guaranteed to be wrong. Only use this if trying to > - * obtain the address of a symbol (i.e. not something you obtained by > - * following a pointer). > - */ > -#define hyp_symbol_addr(s) \ > - ({ \ > - typeof(s) *addr;\ > - asm("adrp %0, %1\n" \ > - "add%0, %0, :lo12:%1\n" \ > - : "=r" (addr) : SYM_CONSTRAINT (&s)); \ > - addr; \ > - }) > - This hunk is going to conflict in a fairly benign way with the removal of the GCC workaround which I think Will queued for 5.12. I'll work something out... Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v2 3/8] KVM: arm64: Add symbol at the beginning of each hyp section
On Tue, 05 Jan 2021 18:05:36 +, David Brazdil wrote: > > Generating hyp relocations will require referencing positions at a given > offset from the beginning of hyp sections. Since the final layout will > not be determined until the linking of `vmlinux`, modify the hyp linker > script to insert a symbol at the first byte of each hyp section to use > as an anchor. The linker of `vmlinux` will place the symbols together > with the sections. > > Signed-off-by: David Brazdil > --- > arch/arm64/include/asm/hyp_image.h | 29 +++-- > arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 4 ++-- > 2 files changed, 29 insertions(+), 4 deletions(-) > > diff --git a/arch/arm64/include/asm/hyp_image.h > b/arch/arm64/include/asm/hyp_image.h > index daa1a1da539e..f97b774b58f4 100644 > --- a/arch/arm64/include/asm/hyp_image.h > +++ b/arch/arm64/include/asm/hyp_image.h > @@ -7,6 +7,9 @@ > #ifndef __ARM64_HYP_IMAGE_H__ > #define __ARM64_HYP_IMAGE_H__ > > +#define HYP_CONCAT(a, b) __HYP_CONCAT(a, b) > +#define __HYP_CONCAT(a, b) a ## b > + OK, this may seem like a completely pointless comment, but I have a total mental block when I see macros written upside down. I'll fix that myself, no need to resend just for that. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH] genirq/msi: Make sure early activation of all PCI MSIs
Hi Shameer, On 2021-01-22 09:21, Shameerali Kolothum Thodi wrote: Hi Marc, [...] I find this pretty complicated, and I'd like to avoid injecting the PCI MSI-vs-MSI-X concept in something that is supposed to be bus-agnostic. Agree. That’s mainly because I was very skeptical (TBH, very limited in my knowledge of this msi core code) about changing the MSI-X activation logic here and just thought of limiting the impact to the MSI case as a first attempt. What's wrong with the following (untested) patch, which looks much simpler? Yes, I had tried this as one of the early fixes, but as said above was not very sure of the impact on other platforms. Tested this again and it works. Please send it. Actually, there is a better way, which is to have a proper iterator on vectors instead of descriptors. With that and a bit of cleanup, the patch[1] looks pretty neat. Please give it a go. Thanks, M. [1] https://lore.kernel.org/r/20210123122759.1781359-1-...@kernel.org -- Jazz is not dead. It just smells funny...
[PATCH] genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set
When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI), we perform the activation of the interrupt (which in the case of PCI results in the endpoint being programmed) as soon as the interrupt is allocated. But it appears that this is only done for the first vector, introducing an inconsistent behaviour for PCI Multi-MSI. Fix it by iterating over the number of vectors allocated to each MSI descriptor. This is easily achieved by introducing a new "for_each_msi_vector" iterator, together with a tiny bit of refactoring. Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early") Reported-by: Shameer Kolothum Signed-off-by: Marc Zyngier Cc: sta...@vger.kernel.org --- include/linux/msi.h | 6 ++ kernel/irq/msi.c| 44 2 files changed, 26 insertions(+), 24 deletions(-) diff --git a/include/linux/msi.h b/include/linux/msi.h index 360a0a7e7341..aef35fd1cf11 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -178,6 +178,12 @@ struct msi_desc { list_for_each_entry((desc), dev_to_msi_list((dev)), list) #define for_each_msi_entry_safe(desc, tmp, dev)\ list_for_each_entry_safe((desc), (tmp), dev_to_msi_list((dev)), list) +#define for_each_msi_vector(desc, __irq, dev) \ + for_each_msi_entry((desc), (dev)) \ + if ((desc)->irq)\ + for (__irq = (desc)->irq; \ +__irq < ((desc)->irq + (desc)->nvec_used); \ +__irq++) #ifdef CONFIG_IRQ_MSI_IOMMU static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c index 2c0c4d6d0f83..d924676c8781 100644 --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -436,22 +436,22 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, can_reserve = msi_check_reservation_mode(domain, info, dev); - for_each_msi_entry(desc, dev) { - virq = desc->irq; - if (desc->nvec_used == 1) - dev_dbg(dev, "irq %d for MSI\n", virq); - else + /* +* This flag is set by the PCI layer as we need to activate +* the MSI entries before the PCI layer enables MSI in 
the +* card. Otherwise the card latches a random msi message. +*/ + if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY)) + goto skip_activate; + + for_each_msi_vector(desc, i, dev) { + if (desc->irq == i) { + virq = desc->irq; dev_dbg(dev, "irq [%d-%d] for MSI\n", virq, virq + desc->nvec_used - 1); - /* -* This flag is set by the PCI layer as we need to activate -* the MSI entries before the PCI layer enables MSI in the -* card. Otherwise the card latches a random msi message. -*/ - if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY)) - continue; + } - irq_data = irq_domain_get_irq_data(domain, desc->irq); + irq_data = irq_domain_get_irq_data(domain, i); if (!can_reserve) { irqd_clr_can_reserve(irq_data); if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK) @@ -462,28 +462,24 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, goto cleanup; } +skip_activate: /* * If these interrupts use reservation mode, clear the activated bit * so request_irq() will assign the final vector. */ if (can_reserve) { - for_each_msi_entry(desc, dev) { - irq_data = irq_domain_get_irq_data(domain, desc->irq); + for_each_msi_vector(desc, i, dev) { + irq_data = irq_domain_get_irq_data(domain, i); irqd_clr_activated(irq_data); } } return 0; cleanup: - for_each_msi_entry(desc, dev) { - struct irq_data *irqd; - - if (desc->irq == virq) - break; - - irqd = irq_domain_get_irq_data(domain, desc->irq); - if (irqd_is_activated(irqd)) - irq_domain_deactivate_irq(irqd); + for_each_msi_vector(desc, i, dev) { + irq_data = irq_domain_get_irq_data(domain, i); + if (irqd_is_activated(irq_data)) + irq_domain_deactivate_irq(irq_data); } msi_domain_free_irqs(domain, dev); return ret; -- 2.29.2
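For readers: the behaviour the new iterator fixes is easy to demonstrate with a toy model. The old loop visited one irq per msi_desc, but a Multi-MSI descriptor stands for nvec_used consecutive irqs. The sketch below mimics the iterator's shape; the kernel's version walks dev_to_msi_list(dev), whereas a plain linked list is assumed here for illustration:

```c
#include <stddef.h>

/* Minimal stand-in for struct msi_desc: with Multi-MSI, one descriptor
 * covers nvec_used consecutive Linux irq numbers starting at irq.
 * The list representation is simplified for illustration. */
struct msi_desc {
	int irq;		/* first irq of the block, 0 if unallocated */
	int nvec_used;		/* number of consecutive vectors */
	struct msi_desc *next;
};

/* Same shape as the kernel's for_each_msi_vector: walk every
 * descriptor, and for each descriptor every vector it covers */
#define for_each_msi_vector(desc, __irq, head)			\
	for ((desc) = (head); (desc); (desc) = (desc)->next)	\
		if ((desc)->irq)				\
			for (__irq = (desc)->irq;		\
			     __irq < (desc)->irq + (desc)->nvec_used; \
			     __irq++)

/* Count the vectors the iterator visits */
static int count_vectors(struct msi_desc *head)
{
	struct msi_desc *desc;
	int irq, n = 0;

	for_each_msi_vector(desc, irq, head)
		n++;
	return n;
}
```

A descriptor with nvec_used == 4 contributes four vectors, so activation (and cleanup on failure) now covers every vector rather than just the base irq of each descriptor.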
Re: [RFC PATCH v4 0/2] Some optimization for stage-2 translation
Hi Yanan, On 2021-01-22 10:13, Yanan Wang wrote: Hi, Will, Marc, Is there any further comment on the v3 series I posted previously? None, I was planning to queue them for 5.12 over the weekend. If they are not fine with you, then I think maybe we should just turn back to the original solution in v1, where I suggested filtering out the case of only updating access permissions in the map handler and handling it right there. Here are the reasons for my current opinion: With an errno returned from the map handler for this single case, there will be one more vcpu exit from the guest and we also have to consider the spurious dirty pages. Besides, it seems that the EAGAIN errno has been chosen specially for this case and cannot be used elsewhere for other reasons, as we will change this errno to zero at the end of the function. The v1 solution looks more concise in the end, so I refined the diff and posted the v4 with two patches here, just for a contrast. Which solution will you prefer now? Could you please let me know. I'm still very much opposed to mixing mapping and permission changes. How bad is the spurious return to a vcpu? Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [RFC PATCH] kvm: arm64: Try stage2 block mapping for host device MMIO
On 2021-01-22 08:36, Keqian Zhu wrote: The MMIO region of a device may be huge (GB level), so try to use block mapping in stage2 to speed up both map and unmap. Especially for unmap, it performs a TLBI right after each invalidation of a PTE. If all mappings are of PAGE_SIZE, it takes much time to handle a GB-level range. This is only on VM teardown, right? Or do you unmap the device more often? Can you please quantify the speedup and the conditions this occurs in? I have the feeling that we are just circling around another problem, which is that we could rely on a VM-wide TLBI when tearing down the guest. I worked on something like that[1] a long while ago, and parked it for some reason. Maybe it is worth reviving. [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/elide-cmo-tlbi Signed-off-by: Keqian Zhu --- arch/arm64/include/asm/kvm_pgtable.h | 11 +++ arch/arm64/kvm/hyp/pgtable.c | 15 +++ arch/arm64/kvm/mmu.c | 12 3 files changed, 34 insertions(+), 4 deletions(-) diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h index 52ab38db04c7..2266ac45f10c 100644 --- a/arch/arm64/include/asm/kvm_pgtable.h +++ b/arch/arm64/include/asm/kvm_pgtable.h @@ -82,6 +82,17 @@ struct kvm_pgtable_walker { const enum kvm_pgtable_walk_flags flags; }; +/** + * kvm_supported_pgsize() - Get the max supported page size of a mapping. + * @pgt: Initialised page-table structure. + * @addr: Virtual address at which to place the mapping. + * @end: End virtual address of the mapping. + * @phys: Physical address of the memory to map. + * + * The smallest return value is PAGE_SIZE. + */ +u64 kvm_supported_pgsize(struct kvm_pgtable *pgt, u64 addr, u64 end, u64 phys); + /** * kvm_pgtable_hyp_init() - Initialise a hypervisor stage-1 page-table. * @pgt: Uninitialised page-table structure to initialise.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index bdf8e55ed308..ab11609b9b13 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -81,6 +81,21 @@ static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level) return IS_ALIGNED(addr, granule) && IS_ALIGNED(phys, granule); } +u64 kvm_supported_pgsize(struct kvm_pgtable *pgt, u64 addr, u64 end, u64 phys) +{ + u32 lvl; + u64 pgsize = PAGE_SIZE; + + for (lvl = pgt->start_level; lvl < KVM_PGTABLE_MAX_LEVELS; lvl++) { + if (kvm_block_mapping_supported(addr, end, phys, lvl)) { + pgsize = kvm_granule_size(lvl); + break; + } + } + + return pgsize; +} + static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level) { u64 shift = kvm_granule_shift(level); diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 7d2257cc5438..80b403fc8e64 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -499,7 +499,8 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu) int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, phys_addr_t pa, unsigned long size, bool writable) { - phys_addr_t addr; + phys_addr_t addr, end; + unsigned long pgsize; int ret = 0; struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, }; struct kvm_pgtable *pgt = kvm->arch.mmu.pgt; @@ -509,21 +510,24 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, size += offset_in_page(guest_ipa); guest_ipa &= PAGE_MASK; + end = guest_ipa + size; - for (addr = guest_ipa; addr < guest_ipa + size; addr += PAGE_SIZE) { + for (addr = guest_ipa; addr < end; addr += pgsize) { ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm)); if (ret) break; + pgsize = kvm_supported_pgsize(pgt, addr, end, pa); + spin_lock(&kvm->mmu_lock); - ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot, + ret = kvm_pgtable_stage2_map(pgt, addr, pgsize, pa, prot, &cache); spin_unlock(&kvm->mmu_lock); if (ret) break; - pa += PAGE_SIZE; + pa += pgsize; } 
kvm_mmu_free_memory_cache(&cache); This otherwise looks neat enough. Thanks, M. -- Jazz is not dead. It just smells funny...
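For context on what kvm_supported_pgsize() computes: with 4KiB granules, stage-2 levels 1-3 map 1GiB, 2MiB and 4KiB blocks, and a block can be used only if the IPA, the PA and the remaining size all satisfy its alignment. A hedged stand-alone sketch of that selection follows; the constants assume a 4KiB granule with start level 1, whereas the real code derives them from the page-table configuration:

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumed: 4KiB granule */

/* Block size mapped at a given level: shift = (3 - level) * 9 + 12,
 * i.e. 1GiB at level 1, 2MiB at level 2, 4KiB at level 3 */
static uint64_t granule_size(unsigned int level)
{
	return 1ULL << ((3 - level) * 9 + PAGE_SHIFT);
}

/* Largest block usable for [addr, end) -> phys, as in the patch:
 * both addresses must be block-aligned and the range must cover
 * a whole block.  Falls back to PAGE_SIZE. */
static uint64_t supported_pgsize(uint64_t addr, uint64_t end, uint64_t phys)
{
	unsigned int level;

	for (level = 1; level <= 3; level++) {
		uint64_t granule = granule_size(level);

		if (addr % granule == 0 && phys % granule == 0 &&
		    granule <= end - addr)
			return granule;
	}
	return 1ULL << PAGE_SHIFT;
}
```

This is why a GB-scale, naturally aligned MMIO window collapses from hundreds of thousands of 4KiB mappings to a handful of block mappings.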
Re: [PATCH] genirq/msi: Make sure early activation of all PCI MSIs
Hi Shameer, On Thu, 21 Jan 2021 11:02:47 +, Shameer Kolothum wrote: > > We currently do early activation of MSI irqs for PCI/MSI based on > the MSI_FLAG_ACTIVATE_EARLY flag. Though this activates all the > allocated MSIs in the case of MSI-X, it only does so for the > base irq in the case of MSI. This is because, for MSI, there > is only one msi_desc entry for all the 32 irqs it can support > and the current implementation iterates over the msi entries and > ends up activating the base irq only. > > The above creates an issue on platforms where the msi controller > supports direct injection of vLPIs(eg: ARM GICv4 ITS). On these > platforms, upon irq activation, ITS driver maps the event to an > ITT entry. And for Guest pass-through to work, early mapping of > all the dev MSI vectors is required. Otherwise, the vfio irq > bypass manager registration will fail. eg, On a HiSilicon D06 > platform with GICv4 enabled, Guest boot with zip dev pass-through > reports, > > "vfio-pci :75:00.1: irq bypass producer (token 06e5176a) > registration fails: 66311" > > and Guest boot fails. > > This is traced to, > kvm_arch_irq_bypass_add_producer > kvm_vgic_v4_set_forwarding > vgic_its_resolve_lpi --> returns E_ITS_INT_UNMAPPED_INTERRUPT > > Hence make sure we activate all the irqs for both MSI and MSI-x cases. > > Signed-off-by: Shameer Kolothum > --- > It is unclear to me whether not performing the early activation of all > MSI irqs was deliberate and has consequences on any other platforms. > Please let me know. Probably just an oversight. 
> > Thanks, > Shameer > --- > kernel/irq/msi.c | 114 +-- > 1 file changed, 90 insertions(+), 24 deletions(-) > > diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c > index 2c0c4d6d0f83..eec187fc32a9 100644 > --- a/kernel/irq/msi.c > +++ b/kernel/irq/msi.c > @@ -395,6 +395,78 @@ static bool msi_check_reservation_mode(struct irq_domain > *domain, > return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit; > } > > +static void msi_domain_deactivate_irq(struct irq_domain *domain, int irq) > +{ > + struct irq_data *irqd; > + > + irqd = irq_domain_get_irq_data(domain, irq); > + if (irqd_is_activated(irqd)) > + irq_domain_deactivate_irq(irqd); > +} > + > +static int msi_domain_activate_irq(struct irq_domain *domain, > +int irq, bool can_reserve) > +{ > + struct irq_data *irqd; > + > + irqd = irq_domain_get_irq_data(domain, irq); > + if (!can_reserve) { > + irqd_clr_can_reserve(irqd); > + if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK) > + irqd_set_msi_nomask_quirk(irqd); > + } > + return irq_domain_activate_irq(irqd, can_reserve); > +} > + > +static int msi_domain_activate_msix_irqs(struct irq_domain *domain, > + struct device *dev, bool can_reserve) > +{ > + struct msi_desc *desc; > + int ret, irq; > + > + for_each_msi_entry(desc, dev) { > + irq = desc->irq; > + ret = msi_domain_activate_irq(domain, irq, can_reserve); > + if (ret) > + goto out; > + } > + return 0; > + > +out: > + for_each_msi_entry(desc, dev) { > + if (irq == desc->irq) > + break; > + msi_domain_deactivate_irq(domain, desc->irq); > + } > + return ret; > +} > + > +static int msi_domain_activate_msi_irqs(struct irq_domain *domain, > + struct device *dev, bool can_reserve) > +{ > + struct msi_desc *desc; > + int i, ret, base, irq; > + > + desc = first_msi_entry(dev); > + base = desc->irq; > + > + for (i = 0; i < desc->nvec_used; i++) { > + irq = base + i; > + ret = msi_domain_activate_irq(domain, irq, can_reserve); > + if (ret) > + goto out; > + } > + return 0; > + > +out: > + for (i = 0; i < 
desc->nvec_used; i++) { > + if (irq == base + i) > + break; > + msi_domain_deactivate_irq(domain, base + i); > + } > + return ret; > +} > + > int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, > int nvec) > { > @@ -443,21 +515,25 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, > struct device *dev, > else > dev_dbg(dev, "irq [%d-%d] for MSI\n", > virq, virq + desc->nvec_used - 1); > - /* > - * This flag is set by the PCI layer as we need to activate > - * the MSI entries before the PCI layer enables MSI in the > - * card. Otherwise the card latches a random msi message. > - */ > - if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY)) > - continue; > + } > > -
Re: [PATCH 0/2] irqchip: Remove obsolete drivers
On Wed, 20 Jan 2021 14:30:06 +0100, Arnd Bergmann wrote: > A few Arm platforms are getting removed in v5.12, this removes > the corresponding irqchip drivers. > > Link: > https://lore.kernel.org/linux-arm-kernel/20210120124812.2800027-1-a...@kernel.org/T/ > > > Arnd Bergmann (2): > irqchip: remove sigma tango driver > irqchip: remove sirfsoc driver > > [...] Applied to irq/irqchip-5.12, thanks! [1/2] irqchip: remove sigma tango driver commit: 00e772c4929257b11b51d47e4645f67826ded0fc [2/2] irqchip: remove sirfsoc driver commit: 5c1ea0d842b1e73ae04870527ec29d5479c35041 Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v5 00/10] sunxi: Support IRQ wakeup from deep sleep
On Sun, 17 Jan 2021 23:50:30 -0600, Samuel Holland wrote: > Allwinner sun6i/sun8i/sun50i SoCs (A31 and newer) have two interrupt > controllers: GIC and R_INTC. GIC does not support wakeup. R_INTC handles > the external NMI pin, and provides 32+ IRQs to the ARISC. The first 16 > of these correspond 1:1 to a block of GIC IRQs starting with the NMI. > The last 13-16 multiplex the first (up to) 128 GIC SPIs. > > This series replaces the existing chained irqchip driver that could only > control the NMI, with a stacked irqchip driver that also provides wakeup > capability for those multiplexed SPI IRQs. The idea is to preconfigure > the ARISC's IRQ controller, and then the ARISC firmware knows to wake up > as soon as it receives an IRQ. It can also decide how deep it can > suspend based on the enabled wakeup IRQs. > > [...] Applied to irq/irqchip-5.12, thanks! [01/10] dt-bindings: irq: sun6i-r: Split the binding from sun7i-nmi commit: ad6b47cdef760410311f41876b21eb0c6fda4717 [02/10] dt-bindings: irq: sun6i-r: Add a compatible for the H3 commit: 6436eb4417094ea3308b33d8392fc02a1068dc78 [03/10] irqchip/sun6i-r: Use a stacked irqchip driver commit: 4e34614636b31747b190488240a95647c227021f [04/10] irqchip/sun6i-r: Add wakeup support commit: 7ab365f6cd6de1e2b0cb1e1e3873dbf68e6f1003 Please route the dts patches via the soc tree. Also, I had to manually fix the first patch as it wouldn't apply on top of 5.11-rc4 (which tree has it been diffed against?). Please check that the resolution is correct. Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v3 2/2] irqchip: Add support for Realtek RTL838x/RTL839x interrupt controller
On Wed, 20 Jan 2021 10:10:18 +, Bert Vermeulen wrote: > > Signed-off-by: Bert Vermeulen Please write a decent commit message. > --- > drivers/irqchip/Makefile | 1 + > drivers/irqchip/irq-realtek-rtl.c | 180 ++ > 2 files changed, 181 insertions(+) > create mode 100644 drivers/irqchip/irq-realtek-rtl.c > > diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile > index 0ac93bfaec61..4fc1086bed7e 100644 > --- a/drivers/irqchip/Makefile > +++ b/drivers/irqchip/Makefile > @@ -113,3 +113,4 @@ obj-$(CONFIG_LOONGSON_PCH_PIC)+= > irq-loongson-pch-pic.o > obj-$(CONFIG_LOONGSON_PCH_MSI) += irq-loongson-pch-msi.o > obj-$(CONFIG_MST_IRQ)+= irq-mst-intc.o > obj-$(CONFIG_SL28CPLD_INTC) += irq-sl28cpld.o > +obj-$(CONFIG_MACH_REALTEK_RTL) += irq-realtek-rtl.o > diff --git a/drivers/irqchip/irq-realtek-rtl.c > b/drivers/irqchip/irq-realtek-rtl.c > new file mode 100644 > index ..bafe9ee4a85a > --- /dev/null > +++ b/drivers/irqchip/irq-realtek-rtl.c > @@ -0,0 +1,180 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Copyright (C) 2006-2012 Tony Wu > + * Copyright (C) 2020 Birger Koblitz > + * Copyright (C) 2020 Bert Vermeulen > + * Copyright (C) 2020 John Crispin Given the list of copyright owners, shouldn't this be reflected in the SoB chain one way or another? 
> + */ > + > +#include > +#include > +#include > +#include > +#include > + > +/* Global Interrupt Mask Register */ > +#define RTL_ICTL_GIMR0x00 > +/* Global Interrupt Status Register */ > +#define RTL_ICTL_GISR0x04 > +/* Interrupt Routing Registers */ > +#define RTL_ICTL_IRR00x08 > +#define RTL_ICTL_IRR10x0c > +#define RTL_ICTL_IRR20x10 > +#define RTL_ICTL_IRR30x14 > + > +#define REG(x) (realtek_ictl_base + x) > + > +static DEFINE_RAW_SPINLOCK(irq_lock); > +static void __iomem *realtek_ictl_base; > + > +static void realtek_ictl_unmask_irq(struct irq_data *i) > +{ > + unsigned long flags; > + u32 value; > + > + raw_spin_lock_irqsave(&irq_lock, flags); > + > + value = readl(REG(RTL_ICTL_GIMR)); > + value |= BIT(i->hwirq); > + writel(value, REG(RTL_ICTL_GIMR)); > + > + raw_spin_unlock_irqrestore(&irq_lock, flags); > +} > + > +static void realtek_ictl_mask_irq(struct irq_data *i) > +{ > + unsigned long flags; > + u32 value; > + > + raw_spin_lock_irqsave(&irq_lock, flags); > + > + value = readl(REG(RTL_ICTL_GIMR)); > + value &= ~BIT(i->hwirq); > + writel(value, REG(RTL_ICTL_GIMR)); > + > + raw_spin_unlock_irqrestore(&irq_lock, flags); > +} > + > +static struct irq_chip realtek_ictl_irq = { > + .name = "realtek-rtl-intc", > + .irq_mask = realtek_ictl_mask_irq, > + .irq_unmask = realtek_ictl_unmask_irq, > +}; > + > +static int intc_map(struct irq_domain *d, unsigned int irq, irq_hw_number_t > hw) > +{ > + irq_set_chip_and_handler(hw, &realtek_ictl_irq, handle_level_irq); > + > + return 0; > +} > + > +static const struct irq_domain_ops irq_domain_ops = { > + .map = intc_map, > + .xlate = irq_domain_xlate_onecell, > +}; > + > +static void realtek_irq_dispatch(struct irq_desc *desc) > +{ > + struct irq_chip *chip = irq_desc_get_chip(desc); > + struct irq_domain *domain; > + unsigned int pending; > + > + chained_irq_enter(chip, desc); > + pending = readl(REG(RTL_ICTL_GIMR)) & readl(REG(RTL_ICTL_GISR)); > + if (unlikely(!pending)) { > + spurious_interrupt(); > + goto out; > + 
} > + domain = irq_desc_get_handler_data(desc); > + generic_handle_irq(irq_find_mapping(domain, __ffs(pending))); > + > +out: > + chained_irq_exit(chip, desc); > +} > + > +/* > + * SoC interrupts are cascaded to MIPS CPU interrupts according to the > + * interrupt-map in the device tree. Each SoC interrupt gets 4 bits for > + * the CPU interrupt in an Interrupt Routing Register. Max 32 SoC interrupts > + * thus go into 4 IRRs. > + */ > +static int __init map_interrupts(struct device_node *node) > +{ > + struct device_node *cpu_ictl; > + const __be32 *imap; > + u32 imaplen, soc_int, cpu_int, tmp, regs[4]; > + int ret, i, irr_regs[] = { > + RTL_ICTL_IRR3, > + RTL_ICTL_IRR2, > + RTL_ICTL_IRR1, > + RTL_ICTL_IRR0, > + }; > + > + ret = of_property_read_u32(node, "#address-cells", &tmp); > + if (ret || tmp) > + return -EINVAL; > + > + imap = of_get_property(node, "interrupt-map", &imaplen); > + if (!imap || imaplen % 3) > + return -EINVAL; > + > + memset(regs, 0, sizeof(regs)); > + for (i = 0; i < imaplen; i += 3 * sizeof(u32)) { > + soc_int = be32_to_cpup(imap); > + if (soc_int > 31) > + return -EINVAL; > + > + cpu_ictl = o
Re: [RFC PATCH 3/7] arm64: mm: use nGnRnE instead of nGnRE on Apple processors
On 2021-01-21 17:55, Will Deacon wrote:
> On Thu, Jan 21, 2021 at 04:25:54PM +, Marc Zyngier wrote:
>> On 2021-01-21 15:12, Mohamed Mediouni wrote:
>>> Please ignore that patch.
>>>
>>> It turns out that the PCIe controller on Apple M1 expects posted
>>> writes and so the memory range for it ought to be set nGnRE.
>>> So, we need to use nGnRnE for on-chip MMIO and nGnRE for PCIe BARs.
>>>
>>> The MAIR approach isn’t adequate for such a thing, so we’ll have to
>>> look elsewhere.
>>
>> Well, there aren't many alternatives to having a memory type defined
>> in MAIR if you want to access your PCIe devices with specific
>> semantics. It probably means defining a memory type for PCI only, but:
>>
>> - we only have a single free MT entry, and I'm not sure we can afford
>>   to waste this on a specific platform (can we re-purpose GRE instead?),
>
> We already have an nGnRnE MAIR for config space accesses.

I'm confused. If M1 needs nGnRE for PCI, and overrides nGnRE to nE for
its in-SoC accesses, where does nGnRE go? Or do you propose that it is
the page tables that get a different MT index?

M.

--
Jazz is not dead. It just smells funny...
Re: [RFC PATCH 3/7] arm64: mm: use nGnRnE instead of nGnRE on Apple processors
On 2021-01-21 15:12, Mohamed Mediouni wrote:
> Please ignore that patch.
>
> It turns out that the PCIe controller on Apple M1 expects posted
> writes and so the memory range for it ought to be set nGnRE.
> So, we need to use nGnRnE for on-chip MMIO and nGnRE for PCIe BARs.
>
> The MAIR approach isn’t adequate for such a thing, so we’ll have to
> look elsewhere.

Well, there aren't many alternatives to having a memory type defined
in MAIR if you want to access your PCIe devices with specific
semantics. It probably means defining a memory type for PCI only, but:

- we only have a single free MT entry, and I'm not sure we can afford
  to waste this on a specific platform (can we re-purpose GRE instead?),
- we'd need to teach the PCI code to use this...

Thanks,

M.

--
Jazz is not dead. It just smells funny...
Re: [RFC PATCH 7/7] irqchip/apple-aic: add SMP support to the Apple AIC driver.
On 2021-01-21 13:34, Mohamed Mediouni wrote: On 21 Jan 2021, at 14:22, Marc Zyngier wrote: On 2021-01-21 12:50, Mohamed Mediouni wrote: On 21 Jan 2021, at 13:44, Arnd Bergmann wrote: On Wed, Jan 20, 2021 at 2:27 PM Mohamed Mediouni [...] + aic.fast_ipi = of_property_read_bool(node, "fast-ipi"); Where is this property documented, and what decides which one to use? It’s getting documented in the next patch set. This property is there to enable support for older iPhone processors later on, some of which do not have fast IPI support. On Apple M1, fast-ipi is always on. Then please focus on a single implementation. Additional features can always be merged later once something is up and running. Also, there sysregs can be detected by matching the MIDR, so I don't think we need a DT property for that. Thanks, Because UART access adapters for the new M1 Macs aren’t plentiful at all, I actually use this for development, with iPhones which have much more easy to buy Lightning-to-UART adapters. (That’s why the old implementation is there too) Might be worth splitting the new one to a new commit though... This series is supposed to cover M1 only, and adding extra support as part of it is only likely to make the code harder to review. I'd rather you focus on a single IPI interface (fast or slow, I don't really care). Extra features can come in later. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [RFC PATCH 7/7] irqchip/apple-aic: add SMP support to the Apple AIC driver.
On 2021-01-21 13:32, Mark Rutland wrote: On Thu, Jan 21, 2021 at 01:22:37PM +, Marc Zyngier wrote: On 2021-01-21 12:50, Mohamed Mediouni wrote: > > On 21 Jan 2021, at 13:44, Arnd Bergmann wrote: > > > > On Wed, Jan 20, 2021 at 2:27 PM Mohamed Mediouni [...] > > > + aic.fast_ipi = of_property_read_bool(node, "fast-ipi"); > > > > Where is this property documented, and what decides which one to use? > It’s getting documented in the next patch set. > > This property is there to enable support for older iPhone processors > later on, some of which do not have fast IPI support. > > On Apple M1, fast-ipi is always on. Then please focus on a single implementation. Additional features can always be merged later once something is up and running. Also, there sysregs can be detected by matching the MIDR, so I don't think we need a DT property for that. Generally we do not detect IMP-DEF sysregs based on MIDR because they won't necessarily be exposed to a VM, so I suspect that we do need DT properties to describe that IMP-DEF sysregs are accessible, and should not rely on the MIDR alone. Maybe that's implicit in another property, but worth bearing in mind. Hmm. That's a good point. I think this could be keyed off the compatible property, which should accurately reflect the version of the interrupt controller. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [RFC PATCH 7/7] irqchip/apple-aic: add SMP support to the Apple AIC driver.
On 2021-01-21 12:50, Mohamed Mediouni wrote:
>> On 21 Jan 2021, at 13:44, Arnd Bergmann wrote:
>>
>> On Wed, Jan 20, 2021 at 2:27 PM Mohamed Mediouni [...]
>>
>>> +	aic.fast_ipi = of_property_read_bool(node, "fast-ipi");
>>
>> Where is this property documented, and what decides which one to use?
>
> It’s getting documented in the next patch set.
>
> This property is there to enable support for older iPhone processors
> later on, some of which do not have fast IPI support.
>
> On Apple M1, fast-ipi is always on.

Then please focus on a single implementation. Additional features can
always be merged later once something is up and running.

Also, these sysregs can be detected by matching the MIDR, so I don't
think we need a DT property for that.

Thanks,

M.

--
Jazz is not dead. It just smells funny...
Re: [RFC PATCH 3/7] arm64: mm: use nGnRnE instead of nGnRE on Apple processors
On 2021-01-21 11:27, Will Deacon wrote:
> On Wed, Jan 20, 2021 at 02:27:13PM +0100, Mohamed Mediouni wrote:
>> Use nGnRnE instead of nGnRE on Apple SoCs to workaround a serious
>> hardware quirk.
>>
>> On Apple processors, writes using the nGnRE device memory type get
>> dropped in flight, getting to nowhere.
>>
>> Signed-off-by: Stan Skowronek
>> Signed-off-by: Mohamed Mediouni
>> ---
>>  arch/arm64/mm/proc.S | 26 ++
>>  1 file changed, 26 insertions(+)
>>
>> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
>> index 1f7ee8c8b7b8..06436916f137 100644
>> --- a/arch/arm64/mm/proc.S
>> +++ b/arch/arm64/mm/proc.S
>> @@ -51,6 +51,25 @@
>>  #define TCR_KASAN_HW_FLAGS 0
>>  #endif
>>
>> +#ifdef CONFIG_ARCH_APPLE
>> +
>> +/*
>> + * Apple cores appear to black-hole writes done with nGnRE.
>> + * We settled on a work-around that uses MAIR vs changing every single
>> + * user of nGnRE across the arm64 code.
>> + */
>> +
>> +#define MAIR_EL1_SET_APPLE \
>> +	(MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) |	\
>> +	 MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRE) |	\
>> +	 MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) |		\
>> +	 MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) |		\
>> +	 MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) |			\
>> +	 MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) |		\
>> +	 MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
>> +
>> +#endif
>> +
>>  /*
>>   * Default MAIR_EL1. MT_NORMAL_TAGGED is initially mapped as Normal memory and
>>   * changed during __cpu_setup to Normal Tagged if the system supports MTE.
>> @@ -432,6 +451,13 @@ SYM_FUNC_START(__cpu_setup)
>>  	 * Memory region attributes
>>  	 */
>>  	mov_q	x5, MAIR_EL1_SET
>> +#ifdef CONFIG_ARCH_APPLE
>> +	mrs	x0, MIDR_EL1
>> +	lsr	w0, w0, #24
>> +	mov_q	x1, MAIR_EL1_SET_APPLE
>> +	cmp	x0, #0x61		// 0x61 = Implementer: Apple
>> +	csel	x5, x1, x5, eq
>
> Why does this need to be done so early? It would be a lot cleaner if we
> could detect this in a similar fashion to other errata and update the
> MAIR appropriately.
>
> If that's not possible because of early IO mappings (which ones?), then
> we could instead initialise to nGnRnE unconditionally, but relax it to
> nGnRE if we detect that we _don't_ have the erratum.

Would that imply another round-trip into the idmap, much like we do
when we switch to non-global mappings? Or do you expect that we can
change the MAIR with live mappings?

M.

--
Jazz is not dead. It just smells funny...
Re: [PATCH v2] gpiolib: Bind gpio_device to a driver to enable fw_devlink=on by default
On 2021-01-20 15:48, Greg Kroah-Hartman wrote: On Wed, Jan 20, 2021 at 03:39:30PM +, Marc Zyngier wrote: Anyway, I said what I had to say. If platforms break with this change, I'll expect it to be disabled in 5.12. I'm thinking we can not change the default and will probably revert this patch "soon". I think there is a lot of value in keeping this enabled for a bit, so that we can work out what breaks, and find solutions that scale a bit better. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v2] gpiolib: Bind gpio_device to a driver to enable fw_devlink=on by default
On 2021-01-18 20:38, Saravana Kannan wrote: On Sun, Jan 17, 2021 at 4:02 AM Marc Zyngier wrote: Hi Saravana, Thanks for posting this, much appreciated. On Sat, 16 Jan 2021 01:14:11 +, Saravana Kannan wrote: > > There are multiple instances of GPIO devictree nodes of the form: > > foo { > compatible = "acme,foo"; > ... > > gpio0: gpio0@ { > compatible = "acme,bar"; > ... > gpio-controller; > }; > > gpio1: gpio1@ { > compatible = "acme,bar"; > ... > gpio-controller; > }; > > ... > } > > bazz { > my-gpios = <&gpio0 ...>; > } > > Case 1: The driver for "foo" populates struct device for these gpio* > nodes and then probes them using a driver that binds with "acme,bar". > This lines up with how DT nodes with the "compatible" property are > generally converted to struct devices and then registered with driver > core to probe them. This also allows the gpio* devices to hook into all > the driver core capabilities like runtime PM, probe deferral, > suspend/resume ordering, device links, etc. > > Case 2: The driver for "foo" doesn't populate its child device nodes > with "compatible" property and instead just loops through its child > nodes and directly registers the GPIOs with gpiolib without ever > populating a struct device or binding a driver to it. That's not quite an accurate description. The gpiolib subsystem does create a struct device. It doesn't register a driver though, which is what causes the issue with fr_devlink (more on that below). The devices created by gpiolib care are created for case 1 and case 2. They are just devices gpiolib uses to represent a virtual software device to hook different attributes to and expose them in sysfs. I'm not talking about those devices here. The devices I'm referring to are devices that represent the actual HW IP -- so what I'm saying is accurate. Please read what you wrote : "without ever populating a struct device". I stand by the "not quite accurate". > > Drivers that follow the case 2 cause problems with fw_devlink=on. 
This > is because fw_devlink will prevent bazz from probing until there's a > struct device that has gpio0 as its fwnode (because bazz lists gpio0 as > a GPIO supplier). Once the struct device is available, fw_devlink will > create a device link between with gpio0 as the supplier and bazz as the > consumer. After this point, the device link will prevent bazz from > probing until its supplier (the gpio0 device) has bound to a driver. > Once the supplier is bound to a driver, the probe of bazz is triggered > automatically. > > Finding and refactoring all the instances of drivers that follow case 2 > will cause a lot of code churn and it not something that can be done in > one shot. Examples of such instances are [1] [2]. > > This patch works around this problem and avoids all the code churn by > simply creating a stub driver to bind to the gpio_device. Since the > gpio_device already points to the GPIO device tree node, this allows all > the consumers to continue probing when the driver follows case 2. > > If/when all the old drivers are refactored, we can revert this > patch. My personal gripe with this approach is that it is an abrupt change in the way DT and device model are mapped onto each other. As far as I know (and someone please correct me if I am wrong), there is zero expectation that a device/subdevice/random IP block described by a node with a "compatible" property will end-up being managed by a driver that is bound to that particular node. The node/subnode division is a good way to express some HW boundaries, but doesn't say anything about the way this should be handled in the kernel. Assuming that everything containing a "compatible" string will eventually be bound to a driver is IMO pretty limiting. The default behavior of of_platform_populate() is to create a struct device for every node with "compatible" property. That's how top level devices (or children of simple-bus devices) are populated. IIRC, that's what a lot of other busses do too. 
So, if anything, not having a struct device for nodes with "compatible" property is an exception. Again, that's not a description of the reality. The case we are talking about here does have a struct device. The misconception you keep repeating is that binding it to a driver is the only valid approach. Honestly, if one has a driver that supports a HW IP, I don't see any reason it should operate outside of the device-driver model support
Re: [PATCH v6 0/5] ARM: arm64: Add SMCCC TRNG entropy service
On 2021-01-20 13:49, Will Deacon wrote: On Wed, Jan 20, 2021 at 01:45:24PM +, Andre Przywara wrote: On Wed, 20 Jan 2021 13:26:26 + Marc Zyngier wrote: Hi, > On 2021-01-20 13:01, Will Deacon wrote: > > On Wed, 6 Jan 2021 10:34:48 +, Andre Przywara wrote: > >> a fix to v5, now *really* fixing the wrong priority of SMCCC vs. > >> RNDR in arch_get_random_seed_long_early(). Apologies for messing > >> this up in v5 and thanks to broonie for being on the watch! > >> > >> Will, Catalin: it would be much appreciated if you could consider > >> taking > >> patch 1/5. This contains the common definitions, and is a > >> prerequisite for every other patch, although they are somewhat > >> independent and likely > >> will need to go through different subsystems. > >> > >> [...] > > > > Applied the first patch only to arm64 (for-next/rng), thanks! > > > > [1/5] firmware: smccc: Add SMCCC TRNG function call IDs > > https://git.kernel.org/arm64/c/67c6bb56b649 > > I can't see how the rest of the patches can go via any other tree > if all the definitions are in the first one. > > Andre, can you explain what your plan is? Well, I don't really have a great solution for that, other than hoping that 1/5 makes it into Linus' master at some point. I see that it's a stretch, but pulling 1/5 into 5.11 now would prepare the stage for the others to go via any tree, into 5.12-rc1? Or you could maybe take both 1/5 and 5/5 into your kvm-arm tree, and would hope that a git rebase later would sort this out for you? But I think you are much more experienced in those kind of issues, so happy to hear about any other solutions. for-next/rng is a stable branch, so anybody who wants the first patch can just pull it (without anything I queue on top). OK. I'll pull that branch and stash the KVM stuff on top. Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v6 0/5] ARM: arm64: Add SMCCC TRNG entropy service
On 2021-01-20 13:01, Will Deacon wrote: On Wed, 6 Jan 2021 10:34:48 +, Andre Przywara wrote: a fix to v5, now *really* fixing the wrong priority of SMCCC vs. RNDR in arch_get_random_seed_long_early(). Apologies for messing this up in v5 and thanks to broonie for being on the watch! Will, Catalin: it would be much appreciated if you could consider taking patch 1/5. This contains the common definitions, and is a prerequisite for every other patch, although they are somewhat independent and likely will need to go through different subsystems. [...] Applied the first patch only to arm64 (for-next/rng), thanks! [1/5] firmware: smccc: Add SMCCC TRNG function call IDs https://git.kernel.org/arm64/c/67c6bb56b649 I can't see how the rest of the patches can go via any other tree if all the definitions are in the first one. Andre, can you explain what your plan is? Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH 1/3] arm64/kernel: FIQ support
Hi Mohamed, On 2021-01-20 11:36, Mohamed Mediouni wrote: From: Stan Skowronek On Apple processors, the timer is wired through FIQ. Which timer? There are at least 3, potentially 4 timers per CPU that can fire. As such, add FIQ support to the kernel. Signed-off-by: Stan Skowronek Missing SoB from the sender. --- arch/arm64/include/asm/arch_gicv3.h | 2 +- arch/arm64/include/asm/assembler.h | 8 ++-- arch/arm64/include/asm/daifflags.h | 4 +- arch/arm64/include/asm/irq.h| 4 ++ arch/arm64/include/asm/irqflags.h | 6 +-- arch/arm64/kernel/entry.S | 74 ++--- arch/arm64/kernel/irq.c | 14 ++ arch/arm64/kernel/process.c | 2 +- 8 files changed, 97 insertions(+), 17 deletions(-) diff --git a/arch/arm64/include/asm/arch_gicv3.h b/arch/arm64/include/asm/arch_gicv3.h index 880b9054d75c..934b9be582d2 100644 --- a/arch/arm64/include/asm/arch_gicv3.h +++ b/arch/arm64/include/asm/arch_gicv3.h @@ -173,7 +173,7 @@ static inline void gic_pmr_mask_irqs(void) static inline void gic_arch_enable_irqs(void) { - asm volatile ("msr daifclr, #2" : : : "memory"); + asm volatile ("msr daifclr, #3" : : : "memory"); If I trust the persistent rumour, this system doesn't have a GIC. Why this change? } #endif /* __ASSEMBLY__ */ diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index bf125c591116..6fe55713dfe0 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -40,9 +40,9 @@ msr daif, \flags .endm - /* IRQ is the lowest priority flag, unconditionally unmask the rest. */ - .macro enable_da_f - msr daifclr, #(8 | 4 | 1) + /* IRQ/FIQ is the lowest priority flag, unconditionally unmask the rest. */ + .macro enable_da + msr daifclr, #(8 | 4) This cannot be unconditional. This potentially changes existing behaviours, and I'd feel a lot safer if FIQ was only messed with on that specific HW. I have the feeling that this should be detected on the boot CPU and patched before any interrupt can fire. 
.endm /* @@ -50,7 +50,7 @@ */ .macro save_and_disable_irq, flags mrs \flags, daif - msr daifset, #2 + msr daifset, #3 .endm .macro restore_irq, flags diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h index 1c26d7baa67f..44de96c7fb1a 100644 --- a/arch/arm64/include/asm/daifflags.h +++ b/arch/arm64/include/asm/daifflags.h @@ -13,8 +13,8 @@ #include #define DAIF_PROCCTX 0 -#define DAIF_PROCCTX_NOIRQ PSR_I_BIT -#define DAIF_ERRCTX(PSR_I_BIT | PSR_A_BIT) +#define DAIF_PROCCTX_NOIRQ (PSR_I_BIT | PSR_F_BIT) +#define DAIF_ERRCTX(PSR_I_BIT | PSR_F_BIT | PSR_A_BIT) #define DAIF_MASK (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT) diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h index b2b0c6405eb0..2d1537d3a245 100644 --- a/arch/arm64/include/asm/irq.h +++ b/arch/arm64/include/asm/irq.h @@ -13,5 +13,9 @@ static inline int nr_legacy_irqs(void) return 0; } +int set_handle_fiq(void (*handle_fiq)(struct pt_regs *)); + +extern void (*handle_arch_fiq)(struct pt_regs *) __ro_after_init; I guess this is set from the root interrupt controller, which also will set handle_arch_irq? Why do we need two entry points? We have ISR_EL1 to find out what is pending. Isn't that enough? 
+ #endif /* !__ASSEMBLER__ */ #endif diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h index ff328e5bbb75..26d7f378113e 100644 --- a/arch/arm64/include/asm/irqflags.h +++ b/arch/arm64/include/asm/irqflags.h @@ -35,7 +35,7 @@ static inline void arch_local_irq_enable(void) } asm volatile(ALTERNATIVE( - "msr daifclr, #2 // arch_local_irq_enable", + "msr daifclr, #3 // arch_local_irq_enable", __msr_s(SYS_ICC_PMR_EL1, "%0"), ARM64_HAS_IRQ_PRIO_MASKING) : @@ -54,7 +54,7 @@ static inline void arch_local_irq_disable(void) } asm volatile(ALTERNATIVE( - "msr daifset, #2 // arch_local_irq_disable", + "msr daifset, #3 // arch_local_irq_disable", __msr_s(SYS_ICC_PMR_EL1, "%0"), ARM64_HAS_IRQ_PRIO_MASKING) : @@ -85,7 +85,7 @@ static inline int arch_irqs_disabled_flags(unsigned long flags) int res; asm volatile(ALTERNATIVE( - "and %w0, %w1, #" __stringify(PSR_I_BIT), + "and %w0, %w1, #" __stringify(PSR_I_BIT | PSR_F_BIT), "eor %w0, %w1, #" __stringify(GIC_PRIO_IRQON), ARM64_HAS_IRQ_PRIO_MASKING) : "=&r" (res) diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/ker
Re: [RFC PATCH 0/2] Avoid booting stall caused by idmap_kpti_install_ng_mappings
Hi Justin, On 2021-01-20 04:51, Justin He wrote: Hi, Kindly ping 😊 -Original Message- From: Jia He Sent: Wednesday, January 13, 2021 9:41 AM To: Catalin Marinas ; Will Deacon ; linux-arm-ker...@lists.infradead.org; linux- ker...@vger.kernel.org Cc: Anshuman Khandual ; Suzuki Poulose ; Justin He ; Mark Rutland ; Gustavo A. R. Silva ; Richard Henderson ; Dave P Martin ; Steven Price ; Andrew Morton ; Mike Rapoport ; Ard Biesheuvel ; Gavin Shan ; Kefeng Wang ; Mark Brown ; Marc Zyngier ; Cristian Marussi Subject: [RFC PATCH 0/2] Avoid booting stall caused by There is a 10s stall in idmap_kpti_install_ng_mappings when kernel boots on a Ampere EMAG server. Commit f992b4dfd58b ("arm64: kpti: Add ->enable callback to remap swapper using nG mappings") updates the nG bit runtime if kpti is required. But things get worse if rodata=full in map_mem(). NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS is required when creating pagetable mapping. Hence all ptes are fully mapped in this case. On a Ampere EMAG server with 256G memory(pagesize=4k), it causes the 10s stall. After moving init_cpu_features() ahead of early_fixmap_init(), we can use cpu_have_const_cap earlier than before. Hence we can avoid this stall by updating arm64_use_ng_mappings. After this patch series, it reduces the kernel boot time from 14.7s to 4.1s: Before: [ 14.757569] Freeing initrd memory: 60752K After: [4.138819] Freeing initrd memory: 60752K Set it as RFC because I want to resolve any other points which I have misconerned. But you don't really explain *why* having the CPU Feature discovery early helps at all. Is that so that you can bypass the idmap mapping? I'd expect something that explain the problem instead of paraphrasing the patches. Another thing is whether you have tested this on some ThunderX HW (the first version, not TX2), as this is the whole reason for this code... Thanks, M. -- Jazz is not dead. It just smells funny...
Re: [PATCH] of: property: Add device link support for interrupts
On 2021-01-20 09:53, Geert Uytterhoeven wrote: Hi Saravana, On Fri, Dec 18, 2020 at 10:11 PM Saravana Kannan wrote: Add support for creating device links out of the interrupts property. Cc: Marc Zyngier Cc: Kevin Hilman Signed-off-by: Saravana Kannan Thanks for your patch! This does not seem to add all links. I see links being created to the secondary interrupt controller (e61c "renesas,irqc"), but not to the primary interrupt controller (GIC) Which is good, as the GIC driver is not a platform_driver, and thus creating links would break everything ;-) BTW, I'd _love_ the GIC driver to be a platform_driver, as the GIC is part of a power and/or clock domain on Renesas SoCs... The trouble is that we need the GIC much earlier than the device model is available, because the timer needs to be available super early. This isn't specific to the GIC, by the way, but also applies to all root interrupt controllers that end up controlling the per-CPU interrupts. If you try to relax this constraint, you'll end up observing some of the very weird dependencies between sysfs, sched, and the device model (I went there a few years back, wasted a week on it, did a reset --hard on the branch, and swore never to look at this again! ;-) But for a start, I'd like the ITS code to be turned into a platform driver, as this would potentially allow for the various domains to be instantiated dynamically. One day. M. -- Jazz is not dead. It just smells funny...
[PATCH v4 16/21] arm64: Make kvm-arm.mode={nvhe,protected} an alias of id_aa64mmfr1.vh=0
Admittedly, passing id_aa64mmfr1.vh=0 on the command-line isn't that easy to understand, and it is likely that users would much prefer to write "kvm-arm.mode=nvhe", or "...=protected". So here you go. This has the added advantage that we can now always honor the "kvm-arm.mode=protected" option, even when booting on a VHE system. Signed-off-by: Marc Zyngier --- Documentation/admin-guide/kernel-parameters.txt | 3 +++ arch/arm64/kernel/idreg-override.c | 2 ++ arch/arm64/kvm/arm.c| 3 +++ 3 files changed, 8 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 9e3cdb271d06..2786fd39a047 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2257,6 +2257,9 @@ kvm-arm.mode= [KVM,ARM] Select one of KVM/arm64's modes of operation. + nvhe: Standard nVHE-based mode, without support for + protected guests. + protected: nVHE-based mode with support for guests whose state is kept private from the host. Not valid if the kernel is running in EL2. diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 16bc8b3b93ae..1db54878b2c4 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -41,6 +41,8 @@ static const struct { const char * const alias; const char * const feature; } aliases[] __initdata = { + { "kvm-arm.mode=nvhe", "id_aa64mmfr1.vh=0" }, + { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" }, }; static int __init find_field(const char *cmdline, const struct reg_desc *reg, diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 04c44853b103..597565a65ca2 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -1966,6 +1966,9 @@ static int __init early_kvm_mode_cfg(char *arg) return 0; } + if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) + return 0; + return -EINVAL; } early_param("kvm-arm.mode", early_kvm_mode_cfg); -- 2.29.2
[PATCH v4 13/21] arm64: Allow ID_AA64MMFR1_EL1.VH to be overridden from the command line
As we want to be able to disable VHE at runtime, let's match "id_aa64mmfr1.vh=" from the command line as an override. This doesn't have much effect yet as our boot code doesn't look at the cpufeature, but only at the HW registers. Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/cpufeature.h | 3 +++ arch/arm64/kernel/cpufeature.c | 6 +- arch/arm64/kernel/idreg-override.c | 12 3 files changed, 20 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index fe0130d6c0ff..80a5f423444e 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -814,6 +814,9 @@ static inline unsigned int get_vmid_bits(u64 mmfr1) return 8; } +extern u64 id_aa64mmfr1_val; +extern u64 id_aa64mmfr1_mask; + u32 get_kvm_ipa_limit(void); void dump_cpu_features(void); diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 48a011935d8c..5b9343d2e9f0 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -555,6 +555,9 @@ static const struct arm64_ftr_bits ftr_raz[] = { #define ARM64_FTR_REG(id, table) ARM64_FTR_REG_OVERRIDE(id, table, NULL, NULL) +u64 id_aa64mmfr1_val; +u64 id_aa64mmfr1_mask; + static const struct __ftr_reg_entry { u32 sys_id; struct arm64_ftr_reg*reg; @@ -602,7 +605,8 @@ static const struct __ftr_reg_entry { /* Op1 = 0, CRn = 0, CRm = 7 */ ARM64_FTR_REG(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0), - ARM64_FTR_REG(SYS_ID_AA64MMFR1_EL1, ftr_id_aa64mmfr1), + ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64MMFR1_EL1, ftr_id_aa64mmfr1, + &id_aa64mmfr1_val, &id_aa64mmfr1_mask), ARM64_FTR_REG(SYS_ID_AA64MMFR2_EL1, ftr_id_aa64mmfr2), /* Op1 = 0, CRn = 1, CRm = 2 */ diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 392f93b67103..75d9845f489b 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -10,6 +10,7 @@ #include #include +#include #include struct reg_desc { @@ -22,7 
+23,18 @@ struct reg_desc { } fields[]; }; +static const struct reg_desc mmfr1 __initdata = { + .name = "id_aa64mmfr1", + .val= &id_aa64mmfr1_val, + .mask = &id_aa64mmfr1_mask, + .fields = { + { "vh", ID_AA64MMFR1_VHE_SHIFT }, + {} + }, +}; + static const struct reg_desc * const regs[] __initdata = { + &mmfr1, }; static int __init find_field(const char *cmdline, const struct reg_desc *reg, -- 2.29.2
[PATCH v4 20/21] arm64: Defer enabling pointer authentication on boot core
From: Srinivas Ramana Defer enabling pointer authentication on the boot core until it is required to be enabled by the cpufeature framework. This will help in controlling the feature dynamically with a boot parameter. Signed-off-by: Ajay Patil Signed-off-by: Prasad Sodagudi Signed-off-by: Srinivas Ramana Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/1610152163-16554-2-git-send-email-sram...@codeaurora.org --- arch/arm64/include/asm/pointer_auth.h | 10 ++ arch/arm64/include/asm/stackprotector.h | 1 + arch/arm64/kernel/head.S| 4 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/arch/arm64/include/asm/pointer_auth.h b/arch/arm64/include/asm/pointer_auth.h index c6b4f0603024..b112a11e9302 100644 --- a/arch/arm64/include/asm/pointer_auth.h +++ b/arch/arm64/include/asm/pointer_auth.h @@ -76,6 +76,15 @@ static inline unsigned long ptrauth_strip_insn_pac(unsigned long ptr) return ptrauth_clear_pac(ptr); } +static __always_inline void ptrauth_enable(void) +{ + if (!system_supports_address_auth()) + return; + sysreg_clear_set(sctlr_el1, 0, (SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | + SCTLR_ELx_ENDA | SCTLR_ELx_ENDB)); + isb(); +} + #define ptrauth_thread_init_user(tsk) \ ptrauth_keys_init_user(&(tsk)->thread.keys_user) #define ptrauth_thread_init_kernel(tsk) \ @@ -84,6 +93,7 @@ static inline unsigned long ptrauth_strip_insn_pac(unsigned long ptr) ptrauth_keys_switch_kernel(&(tsk)->thread.keys_kernel) #else /* CONFIG_ARM64_PTR_AUTH */ +#define ptrauth_enable() #define ptrauth_prctl_reset_keys(tsk, arg) (-EINVAL) #define ptrauth_strip_insn_pac(lr) (lr) #define ptrauth_thread_init_user(tsk) diff --git a/arch/arm64/include/asm/stackprotector.h b/arch/arm64/include/asm/stackprotector.h index 7263e0bac680..33f1bb453150 100644 --- a/arch/arm64/include/asm/stackprotector.h +++ b/arch/arm64/include/asm/stackprotector.h @@ -41,6 +41,7 @@ static __always_inline void boot_init_stack_canary(void) #endif ptrauth_thread_init_kernel(current);
ptrauth_thread_switch_kernel(current); + ptrauth_enable(); } #endif /* _ASM_STACKPROTECTOR_H */ diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index b3c4dd04f74b..2a152d96d767 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -404,10 +404,6 @@ SYM_FUNC_START_LOCAL(__primary_switched) adr_l x5, init_task msr sp_el0, x5 // Save thread_info -#ifdef CONFIG_ARM64_PTR_AUTH - __ptrauth_keys_init_cpu x5, x6, x7, x8 -#endif - adr_l x8, vectors // load VBAR_EL1 with virtual msr vbar_el1, x8// vector table address isb -- 2.29.2
[PATCH v4 21/21] arm64: cpufeatures: Allow disabling of Pointer Auth from the command-line
In order to be able to disable Pointer Authentication at runtime, whether it is for testing purposes, or to work around HW issues, let's add support for overriding the ID_AA64ISAR1_EL1.{GPI,GPA,API,APA} fields. This is further mapped on the arm64.nopauth command-line alias. Signed-off-by: Marc Zyngier --- Documentation/admin-guide/kernel-parameters.txt | 3 +++ arch/arm64/include/asm/cpufeature.h | 2 ++ arch/arm64/kernel/cpufeature.c | 5 - arch/arm64/kernel/idreg-override.c | 17 + 4 files changed, 26 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 7599fd0f1ad7..f9cb28a39bd0 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -376,6 +376,9 @@ arm64.nobti [ARM64] Unconditionally disable Branch Target Identification support + arm64.nopauth [ARM64] Unconditionally disable Pointer Authentication + support + ataflop=[HW,M68k] atarimouse= [HW,MOUSE] Atari Mouse diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index d3e0f6dd43c4..9d8dcf380ad5 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -818,6 +818,8 @@ extern u64 id_aa64mmfr1_val; extern u64 id_aa64mmfr1_mask; extern u64 id_aa64pfr1_val; extern u64 id_aa64pfr1_mask; +extern u64 id_aa64isar1_val; +extern u64 id_aa64isar1_mask; u32 get_kvm_ipa_limit(void); void dump_cpu_features(void); diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index f223171a7c34..f5ba7fd615b5 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -559,6 +559,8 @@ u64 id_aa64mmfr1_val; u64 id_aa64mmfr1_mask; u64 id_aa64pfr1_val; u64 id_aa64pfr1_mask; +u64 id_aa64isar1_val; +u64 id_aa64isar1_mask; static const struct __ftr_reg_entry { u32 sys_id; @@ -604,7 +606,8 @@ static const struct __ftr_reg_entry { /* Op1 = 0, CRn = 0, CRm = 6 */ 
ARM64_FTR_REG(SYS_ID_AA64ISAR0_EL1, ftr_id_aa64isar0), - ARM64_FTR_REG(SYS_ID_AA64ISAR1_EL1, ftr_id_aa64isar1), + ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64ISAR1_EL1, ftr_id_aa64isar1, + &id_aa64isar1_val, &id_aa64isar1_mask), /* Op1 = 0, CRn = 0, CRm = 7 */ ARM64_FTR_REG(SYS_ID_AA64MMFR0_EL1, ftr_id_aa64mmfr0), diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index a9e3ed193fd4..7037c9b214d0 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -43,6 +43,19 @@ static const struct reg_desc pfr1 __initdata = { }, }; +static const struct reg_desc isar1 __initdata = { + .name = "id_aa64isar1", + .val= &id_aa64isar1_val, + .mask = &id_aa64isar1_mask, + .fields = { + { "gpi", ID_AA64ISAR1_GPI_SHIFT }, + { "gpa", ID_AA64ISAR1_GPA_SHIFT }, + { "api", ID_AA64ISAR1_API_SHIFT }, + { "apa", ID_AA64ISAR1_APA_SHIFT }, + {} + }, +}; + extern u64 kaslr_feature_val; extern u64 kaslr_feature_mask; @@ -61,6 +74,7 @@ static const struct reg_desc kaslr __initdata = { static const struct reg_desc * const regs[] __initdata = { &mmfr1, &pfr1, + &isar1, &kaslr, }; @@ -71,6 +85,9 @@ static const struct { { "kvm-arm.mode=nvhe", "id_aa64mmfr1.vh=0" }, { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" }, { "arm64.nobti","id_aa64pfr1.bt=0" }, + { "arm64.nopauth", + "id_aa64isar1.gpi=0 id_aa64isar1.gpa=0 " + "id_aa64isar1.api=0 id_aa64isar1.apa=0" }, { "nokaslr","kaslr.disabled=1" }, }; -- 2.29.2
[PATCH v4 17/21] KVM: arm64: Document HVC_VHE_RESTART stub hypercall
For completeness, let's document the HVC_VHE_RESTART stub. Signed-off-by: Marc Zyngier --- Documentation/virt/kvm/arm/hyp-abi.rst | 9 + 1 file changed, 9 insertions(+) diff --git a/Documentation/virt/kvm/arm/hyp-abi.rst b/Documentation/virt/kvm/arm/hyp-abi.rst index 83cadd8186fa..1ba628baf11b 100644 --- a/Documentation/virt/kvm/arm/hyp-abi.rst +++ b/Documentation/virt/kvm/arm/hyp-abi.rst @@ -58,6 +58,15 @@ these functions (see arch/arm{,64}/include/asm/virt.h): into place (arm64 only), and jump to the restart address while at HYP/EL2. This hypercall is not expected to return to its caller. +* :: + +x0 = HVC_VHE_RESTART (arm64 only) + + Attempt to upgrade the kernel's exception level from EL1 to EL2 by enabling + the VHE mode. This is conditioned by the CPU supporting VHE, the EL2 MMU + being off, and VHE not being disabled by any other means (command line option, + for example). + Any other value of r0/x0 triggers a hypervisor-specific handling, which is not documented here. -- 2.29.2
[PATCH v4 01/21] arm64: Fix labels in el2_setup macros
If someone happens to write the following code: b 1f init_el2_state vhe 1: [...] they will be in for a long debugging session, as the label "1f" will be resolved *inside* the init_el2_state macro instead of after it. Not really what one expects. Instead, rewrite the EL2 setup macros to use unambiguous labels, thanks to the usual macro counter trick. Acked-by: Catalin Marinas Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/el2_setup.h | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h index a7f5a1bbc8ac..540116de80bf 100644 --- a/arch/arm64/include/asm/el2_setup.h +++ b/arch/arm64/include/asm/el2_setup.h @@ -45,24 +45,24 @@ mrs x1, id_aa64dfr0_el1 sbfxx0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4 cmp x0, #1 - b.lt1f // Skip if no PMU present + b.lt.Lskip_pmu_\@ // Skip if no PMU present mrs x0, pmcr_el0// Disable debug access traps ubfxx0, x0, #11, #5 // to EL2 and allow access to -1: +.Lskip_pmu_\@: cselx2, xzr, x0, lt // all PMU counters from EL1 /* Statistical profiling */ ubfxx0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4 - cbz x0, 3f // Skip if SPE not present + cbz x0, .Lskip_spe_\@ // Skip if SPE not present .ifeqs "\mode", "nvhe" mrs_s x0, SYS_PMBIDR_EL1 // If SPE available at EL2, and x0, x0, #(1 << SYS_PMBIDR_EL1_P_SHIFT) - cbnzx0, 2f // then permit sampling of physical + cbnzx0, .Lskip_spe_el2_\@ // then permit sampling of physical mov x0, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \ 1 << SYS_PMSCR_EL2_PA_SHIFT) msr_s SYS_PMSCR_EL2, x0 // addresses and physical counter -2: +.Lskip_spe_el2_\@: mov x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT) orr x2, x2, x0 // If we don't have VHE, then // use EL1&0 translation.
@@ -71,7 +71,7 @@ // and disable access from EL1 .endif -3: +.Lskip_spe_\@: msr mdcr_el2, x2// Configure debug traps .endm @@ -79,9 +79,9 @@ .macro __init_el2_lor mrs x1, id_aa64mmfr1_el1 ubfxx0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4 - cbz x0, 1f + cbz x0, .Lskip_lor_\@ msr_s SYS_LORC_EL1, xzr -1: +.Lskip_lor_\@: .endm /* Stage-2 translation */ @@ -93,7 +93,7 @@ .macro __init_el2_gicv3 mrs x0, id_aa64pfr0_el1 ubfxx0, x0, #ID_AA64PFR0_GIC_SHIFT, #4 - cbz x0, 1f + cbz x0, .Lskip_gicv3_\@ mrs_s x0, SYS_ICC_SRE_EL2 orr x0, x0, #ICC_SRE_EL2_SRE// Set ICC_SRE_EL2.SRE==1 @@ -103,7 +103,7 @@ mrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back, tbz x0, #0, 1f // and check that it sticks msr_s SYS_ICH_HCR_EL2, xzr// Reset ICC_HCR_EL2 to defaults -1: +.Lskip_gicv3_\@: .endm .macro __init_el2_hstr @@ -128,14 +128,14 @@ .macro __init_el2_nvhe_sve mrs x1, id_aa64pfr0_el1 ubfxx1, x1, #ID_AA64PFR0_SVE_SHIFT, #4 - cbz x1, 1f + cbz x1, .Lskip_sve_\@ bic x0, x0, #CPTR_EL2_TZ// Also disable SVE traps msr cptr_el2, x0// Disable copro. traps to EL2 isb mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector msr_s SYS_ZCR_EL2, x1 // length for EL1. -1: +.Lskip_sve_\@: .endm .macro __init_el2_nvhe_prepare_eret -- 2.29.2
[PATCH v4 12/21] arm64: cpufeature: Add an early command-line cpufeature override facility
In order to be able to override CPU features at boot time, let's add a command line parser that matches options of the form "cpureg.feature=value", and store the corresponding value into the override val/mask pair. No features are currently defined, so no expected change in functionality. Signed-off-by: Marc Zyngier --- arch/arm64/kernel/Makefile | 2 +- arch/arm64/kernel/head.S | 1 + arch/arm64/kernel/idreg-override.c | 119 + 3 files changed, 121 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/kernel/idreg-override.c diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 86364ab6f13f..2262f0392857 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -17,7 +17,7 @@ obj-y := debug-monitors.o entry.o irq.o fpsimd.o \ return_address.o cpuinfo.o cpu_errata.o \ cpufeature.o alternative.o cacheinfo.o \ smp.o smp_spin_table.o topology.o smccc-call.o \ - syscall.o proton-pack.o + syscall.o proton-pack.o idreg-override.o targets+= efi-entry.o diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index d74e5f84042e..b3c4dd04f74b 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -435,6 +435,7 @@ SYM_FUNC_START_LOCAL(__primary_switched) mov x0, x21 // pass FDT address in x0 bl early_fdt_map // Try mapping the FDT early + bl init_shadow_regs bl switch_to_vhe #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) bl kasan_early_init diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c new file mode 100644 index ..392f93b67103 --- /dev/null +++ b/arch/arm64/kernel/idreg-override.c @@ -0,0 +1,119 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Early cpufeature override framework + * + * Copyright (C) 2020 Google LLC + * Author: Marc Zyngier + */ + +#include +#include + +#include +#include + +struct reg_desc { + const char * const name; + u64 * const val; + u64 * const mask; + struct { + const char * const name; + u8 shift; + } fields[]; +}; + +static 
const struct reg_desc * const regs[] __initdata = { +}; + +static int __init find_field(const char *cmdline, const struct reg_desc *reg, +int f, u64 *v) +{ + char buf[256], *str; + size_t len; + + snprintf(buf, ARRAY_SIZE(buf), "%s.%s=", reg->name, reg->fields[f].name); + + str = strstr(cmdline, buf); + if (!(str == cmdline || (str > cmdline && *(str - 1) == ' '))) + return -1; + + str += strlen(buf); + len = strcspn(str, " "); + len = min(len, ARRAY_SIZE(buf) - 1); + strncpy(buf, str, len); + buf[len] = 0; + + return kstrtou64(buf, 0, v); +} + +static void __init match_options(const char *cmdline) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(regs); i++) { + int f; + + if (!regs[i]->val || !regs[i]->mask) + continue; + + for (f = 0; regs[i]->fields[f].name; f++) { + u64 v; + + if (find_field(cmdline, regs[i], f, &v)) + continue; + + *regs[i]->val |= (v & 0xf) << regs[i]->fields[f].shift; + *regs[i]->mask |= 0xfUL << regs[i]->fields[f].shift; + } + } +} + +static __init void parse_cmdline(void) +{ + if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) { + const u8 *prop; + void *fdt; + int node; + + fdt = get_early_fdt_ptr(); + if (!fdt) + goto out; + + node = fdt_path_offset(fdt, "/chosen"); + if (node < 0) + goto out; + + prop = fdt_getprop(fdt, node, "bootargs", NULL); + if (!prop) + goto out; + + match_options(prop); + + if (!IS_ENABLED(CONFIG_CMDLINE_EXTEND)) + return; + } + +out: + match_options(CONFIG_CMDLINE); +} + +void __init init_shadow_regs(void) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(regs); i++) { + if (regs[i]->val) + *regs[i]->val = 0; + if (regs[i]->mask) + *regs[i]->mask = 0; + } + + parse_cmdline(
[PATCH v4 10/21] arm64: cpufeature: Use IDreg override in __read_sysreg_by_encoding()
__read_sysreg_by_encoding() is used by a bunch of cpufeature helpers, which should take the feature override into account. Let's do that. For good measure (and because we are likely to need it further down the line), make this helper available to the rest of the non-modular kernel. Code that needs to know the *real* features of a CPU can still use read_sysreg_s(), and find the bare, ugly truth. Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/cpufeature.h | 1 + arch/arm64/kernel/cpufeature.c | 15 +-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 465d2cb63bfc..fe0130d6c0ff 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -602,6 +602,7 @@ void __init setup_cpu_features(void); void check_local_cpu_capabilities(void); u64 read_sanitised_ftr_reg(u32 id); +u64 __read_sysreg_by_encoding(u32 sys_id); static inline bool cpu_supports_mixed_endian_el0(void) { diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index aaa075c6f029..48a011935d8c 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1149,14 +1149,17 @@ u64 read_sanitised_ftr_reg(u32 id) EXPORT_SYMBOL_GPL(read_sanitised_ftr_reg); #define read_sysreg_case(r)\ - case r: return read_sysreg_s(r) + case r: val = read_sysreg_s(r); break; /* * __read_sysreg_by_encoding() - Used by a STARTING cpu before cpuinfo is populated.
* Read the system register on the current CPU */ -static u64 __read_sysreg_by_encoding(u32 sys_id) +u64 __read_sysreg_by_encoding(u32 sys_id) { + struct arm64_ftr_reg *regp; + u64 val; + switch (sys_id) { read_sysreg_case(SYS_ID_PFR0_EL1); read_sysreg_case(SYS_ID_PFR1_EL1); @@ -1199,6 +1202,14 @@ static u64 __read_sysreg_by_encoding(u32 sys_id) BUG(); return 0; } + + regp = get_arm64_ftr_reg(sys_id); + if (regp && regp->override_mask && regp->override_val) { + val &= ~*regp->override_mask; + val |= (*regp->override_val & *regp->override_mask); + } + + return val; } #include -- 2.29.2
[PATCH v4 14/21] arm64: Honor VHE being disabled from the command-line
Finally we can check whether VHE is disabled on the command line, and not enable it if that's the user's wish. Signed-off-by: Marc Zyngier --- arch/arm64/kernel/hyp-stub.S | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S index 59820f9b8522..bbab2148a2a2 100644 --- a/arch/arm64/kernel/hyp-stub.S +++ b/arch/arm64/kernel/hyp-stub.S @@ -77,13 +77,24 @@ SYM_CODE_END(el1_sync) SYM_CODE_START_LOCAL(mutate_to_vhe) // Sanity check: MMU *must* be off mrs x0, sctlr_el2 - tbnzx0, #0, 1f + tbnzx0, #0, 2f // Needs to be VHE capable, obviously mrs x0, id_aa64mmfr1_el1 ubfxx0, x0, #ID_AA64MMFR1_VHE_SHIFT, #4 - cbz x0, 1f + cbz x0, 2f + // Check whether VHE is disabled from the command line + adr_l x1, id_aa64mmfr1_val + ldr x0, [x1] + adr_l x1, id_aa64mmfr1_mask + ldr x1, [x1] + ubfxx0, x0, #ID_AA64MMFR1_VHE_SHIFT, #4 + ubfxx1, x1, #ID_AA64MMFR1_VHE_SHIFT, #4 + cbz x1, 1f + and x0, x0, x1 + cbz x0, 2f +1: // Engage the VHE magic! mov_q x0, HCR_HOST_VHE_FLAGS msr hcr_el2, x0 @@ -152,7 +163,7 @@ skip_spe: orr x0, x0, x1 msr spsr_el1, x0 -1: eret +2: eret SYM_CODE_END(mutate_to_vhe) .macro invalid_vector label -- 2.29.2
[PATCH v4 18/21] arm64: Move "nokaslr" over to the early cpufeature infrastructure
Given that the early cpufeature infrastructure has borrowed quite a lot of code from the kaslr implementation, let's reimplement the matching of the "nokaslr" option with it. Signed-off-by: Marc Zyngier --- arch/arm64/kernel/idreg-override.c | 17 ++ arch/arm64/kernel/kaslr.c | 37 +++--- 2 files changed, 20 insertions(+), 34 deletions(-) diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 1db54878b2c4..143fe7b8e3ce 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -33,8 +33,24 @@ static const struct reg_desc mmfr1 __initdata = { }, }; +extern u64 kaslr_feature_val; +extern u64 kaslr_feature_mask; + +static const struct reg_desc kaslr __initdata = { + .name = "kaslr", +#ifdef CONFIG_RANDOMIZE_BASE + .val= &kaslr_feature_val, + .mask = &kaslr_feature_mask, +#endif + .fields = { + { "disabled", 0 }, + {} + }, +}; + static const struct reg_desc * const regs[] __initdata = { &mmfr1, + &kaslr, }; static const struct { @@ -43,6 +59,7 @@ static const struct { } aliases[] __initdata = { { "kvm-arm.mode=nvhe", "id_aa64mmfr1.vh=0" }, { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" }, + { "nokaslr","kaslr.disabled=1" }, }; static int __init find_field(const char *cmdline, const struct reg_desc *reg, diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c index 5fc86e7d01a1..dcc4a5aadbe2 100644 --- a/arch/arm64/kernel/kaslr.c +++ b/arch/arm64/kernel/kaslr.c @@ -51,39 +51,8 @@ static __init u64 get_kaslr_seed(void *fdt) return ret; } -static __init bool cmdline_contains_nokaslr(const u8 *cmdline) -{ - const u8 *str; - - str = strstr(cmdline, "nokaslr"); - return str == cmdline || (str > cmdline && *(str - 1) == ' '); -} - -static __init bool is_kaslr_disabled_cmdline(void *fdt) -{ - if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) { - int node; - const u8 *prop; - - node = fdt_path_offset(fdt, "/chosen"); - if (node < 0) - goto out; - - prop = fdt_getprop(fdt, node, "bootargs", NULL); - if (!prop) - 
goto out; - - if (cmdline_contains_nokaslr(prop)) - return true; - - if (IS_ENABLED(CONFIG_CMDLINE_EXTEND)) - goto out; - - return false; - } -out: - return cmdline_contains_nokaslr(CONFIG_CMDLINE); -} +u64 kaslr_feature_val __initdata; +u64 kaslr_feature_mask __initdata; /* * This routine will be executed with the kernel mapped at its default virtual @@ -126,7 +95,7 @@ u64 __init kaslr_early_init(void) * Check if 'nokaslr' appears on the command line, and * return 0 if that is the case. */ - if (is_kaslr_disabled_cmdline(fdt)) { + if (kaslr_feature_val & kaslr_feature_mask & 0xf) { kaslr_status = KASLR_DISABLED_CMDLINE; return 0; } -- 2.29.2
[PATCH v4 11/21] arm64: Extract early FDT mapping from kaslr_early_init()
As we want to parse more options very early in the kernel lifetime, let's always map the FDT early. This is achieved by moving that code out of kaslr_early_init(). No functional change expected. Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/setup.h | 11 +++ arch/arm64/kernel/head.S | 3 ++- arch/arm64/kernel/kaslr.c | 7 +++ arch/arm64/kernel/setup.c | 15 +++ 4 files changed, 31 insertions(+), 5 deletions(-) create mode 100644 arch/arm64/include/asm/setup.h diff --git a/arch/arm64/include/asm/setup.h b/arch/arm64/include/asm/setup.h new file mode 100644 index ..d3320618ed14 --- /dev/null +++ b/arch/arm64/include/asm/setup.h @@ -0,0 +1,11 @@ +// SPDX-License-Identifier: GPL-2.0 + +#ifndef __ARM64_ASM_SETUP_H +#define __ARM64_ASM_SETUP_H + +#include + +void *get_early_fdt_ptr(void); +void early_fdt_map(u64 dt_phys); + +#endif diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index b425d2587cdb..d74e5f84042e 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -433,6 +433,8 @@ SYM_FUNC_START_LOCAL(__primary_switched) bl __pi_memset dsb ishst // Make zero page visible to PTW + mov x0, x21 // pass FDT address in x0 + bl early_fdt_map // Try mapping the FDT early bl switch_to_vhe #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) bl kasan_early_init @@ -440,7 +442,6 @@ SYM_FUNC_START_LOCAL(__primary_switched) #ifdef CONFIG_RANDOMIZE_BASE tst x23, ~(MIN_KIMG_ALIGN - 1) // already running randomized? b.ne0f - mov x0, x21 // pass FDT address in x0 bl kaslr_early_init// parse FDT for KASLR options cbz x0, 0f // KASLR disabled?
just proceed orr x23, x23, x0// record KASLR offset diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c index 1c74c45b9494..5fc86e7d01a1 100644 --- a/arch/arm64/kernel/kaslr.c +++ b/arch/arm64/kernel/kaslr.c @@ -19,6 +19,7 @@ #include #include #include +#include enum kaslr_status { KASLR_ENABLED, @@ -92,12 +93,11 @@ static __init bool is_kaslr_disabled_cmdline(void *fdt) * containing function pointers) to be reinitialized, and zero-initialized * .bss variables will be reset to 0. */ -u64 __init kaslr_early_init(u64 dt_phys) +u64 __init kaslr_early_init(void) { void *fdt; u64 seed, offset, mask, module_range; unsigned long raw; - int size; /* * Set a reasonable default for module_alloc_base in case @@ -111,8 +111,7 @@ u64 __init kaslr_early_init(u64 dt_phys) * and proceed with KASLR disabled. We will make another * attempt at mapping the FDT in setup_machine() */ - early_fixmap_init(); - fdt = fixmap_remap_fdt(dt_phys, &size, PAGE_KERNEL); + fdt = get_early_fdt_ptr(); if (!fdt) { kaslr_status = KASLR_DISABLED_FDT_REMAP; return 0; diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index c18aacde8bb0..01a994730754 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -168,6 +168,21 @@ static void __init smp_build_mpidr_hash(void) pr_warn("Large number of MPIDR hash buckets detected\n"); } +static void *early_fdt_ptr __initdata; + +void __init *get_early_fdt_ptr(void) +{ + return early_fdt_ptr; +} + +void __init early_fdt_map(u64 dt_phys) +{ + int fdt_size; + + early_fixmap_init(); + early_fdt_ptr = fixmap_remap_fdt(dt_phys, &fdt_size, PAGE_KERNEL); +} + static void __init setup_machine_fdt(phys_addr_t dt_phys) { int size; -- 2.29.2
[PATCH v4 15/21] arm64: Add an aliasing facility for the idreg override
In order to map the override of idregs to options that a user can easily understand, let's introduce yet another option array, which maps an option to the corresponding idreg options. Signed-off-by: Marc Zyngier --- arch/arm64/kernel/idreg-override.c | 20 1 file changed, 20 insertions(+) diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 75d9845f489b..16bc8b3b93ae 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -37,6 +37,12 @@ static const struct reg_desc * const regs[] __initdata = { &mmfr1, }; +static const struct { + const char * const alias; + const char * const feature; +} aliases[] __initdata = { +}; + static int __init find_field(const char *cmdline, const struct reg_desc *reg, int f, u64 *v) { @@ -80,6 +86,18 @@ static void __init match_options(const char *cmdline) } } +static __init void match_aliases(const char *cmdline) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(aliases); i++) { + char *str = strstr(cmdline, aliases[i].alias); + + if ((str == cmdline || (str > cmdline && *(str - 1) == ' '))) + match_options(aliases[i].feature); + } +} + static __init void parse_cmdline(void) { if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) { @@ -100,6 +118,7 @@ static __init void parse_cmdline(void) goto out; match_options(prop); + match_aliases(prop); if (!IS_ENABLED(CONFIG_CMDLINE_EXTEND)) return; @@ -107,6 +126,7 @@ static __init void parse_cmdline(void) out: match_options(CONFIG_CMDLINE); + match_aliases(CONFIG_CMDLINE); } void __init init_shadow_regs(void) -- 2.29.2
Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
On 2021-01-18 19:16, Geert Uytterhoeven wrote: Hi Marc, On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier wrote: On 2021-01-18 17:39, Geert Uytterhoeven wrote: > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan > wrote: >> Cyclic dependencies in some firmware was one of the last remaining >> reasons fw_devlink=on couldn't be set by default. Now that cyclic >> dependencies don't block probing, set fw_devlink=on by default. >> >> Setting fw_devlink=on by default brings a bunch of benefits >> (currently, >> only for systems with device tree firmware): >> * Significantly cuts down deferred probes. >> * Device probe is effectively attempted in graph order. >> * Makes it much easier to load drivers as modules without having to >> worry about functional dependencies between modules (depmod is still >> needed for symbol dependencies). >> >> If this patch prevents some devices from probing, it's very likely due >> to the system having one or more device drivers that "probe"/set up a >> device (DT node with compatible property) without creating a struct >> device for it. If we hit such cases, the device drivers need to be >> fixed so that they populate struct devices and probe them like normal >> device drivers so that the driver core is aware of the devices and >> their >> status. See [1] for an example of such a case. >> >> [1] - >> https://lore.kernel.org/lkml/CAGETcx9PiX==mlxb9po8myyk6u2vhpvwtmsa5nkd-ywh5xh...@mail.gmail.com/ >> Signed-off-by: Saravana Kannan > > Shimoda-san reported that next-20210111 and later fail to boot > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon > is enabled. > > I have bisected this to commit e590474768f1cc04 ("driver core: Set > fw_devlink=on by default"). There is a tentative patch from Saravana here[1], which works around some issues on my RK3399 platform, and it'd be interesting to find out whether that helps on your system. Thanks, M. 
[1] https://lore.kernel.org/r/20210116011412.3211292-1-sarava...@google.com Thanks for the suggestion, but given no devices probe (incl. GPIO providers), I'm afraid it won't help. [testing] Indeed. With the debug prints in device_links_check_suppliers enabled, and some postprocessing, I get: 255 supplier e618.system-controller not ready 9 supplier fe99.iommu not ready 9 supplier fe98.iommu not ready 6 supplier febd.iommu not ready 6 supplier ec67.iommu not ready 3 supplier febe.iommu not ready 3 supplier e774.iommu not ready 3 supplier e674.iommu not ready 3 supplier e65ee000.usb-phy not ready 3 supplier e657.iommu not ready 3 supplier e6054000.gpio not ready 3 supplier e6053000.gpio not ready As everything is part of a PM Domain, the (lack of the) system controller must be the culprit. What's wrong with it? It is registered very early in the boot: [0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0 Yeah, this looks like the exact same problem. The devlink stuff assumes that because there is a "compatible" property, there will be a driver directly associated with the node containing this property. If any other node has a reference to that first node, the dependency will only get resolved if/when that first node is bound to a driver. Trouble is, there are *tons* of code in the tree that invalidate this heuristic, and for each occurrence of this we get another failure. The patch I referred to papers over it by registering a dummy driver, but that doesn't scale easily... M. -- Jazz is not dead. It just smells funny...
Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default
Hi Geert, On 2021-01-18 17:39, Geert Uytterhoeven wrote: Hi Saravana, On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan wrote: Cyclic dependencies in some firmware was one of the last remaining reasons fw_devlink=on couldn't be set by default. Now that cyclic dependencies don't block probing, set fw_devlink=on by default. Setting fw_devlink=on by default brings a bunch of benefits (currently, only for systems with device tree firmware): * Significantly cuts down deferred probes. * Device probe is effectively attempted in graph order. * Makes it much easier to load drivers as modules without having to worry about functional dependencies between modules (depmod is still needed for symbol dependencies). If this patch prevents some devices from probing, it's very likely due to the system having one or more device drivers that "probe"/set up a device (DT node with compatible property) without creating a struct device for it. If we hit such cases, the device drivers need to be fixed so that they populate struct devices and probe them like normal device drivers so that the driver core is aware of the devices and their status. See [1] for an example of such a case. [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mlxb9po8myyk6u2vhpvwtmsa5nkd-ywh5xh...@mail.gmail.com/ Signed-off-by: Saravana Kannan Shimoda-san reported that next-20210111 and later fail to boot on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon is enabled. I have bisected this to commit e590474768f1cc04 ("driver core: Set fw_devlink=on by default"). There is a tentative patch from Saravana here[1], which works around some issues on my RK3399 platform, and it'd be interesting to find out whether that helps on your system. Thanks, M. [1] https://lore.kernel.org/r/20210116011412.3211292-1-sarava...@google.com -- Jazz is not dead. It just smells funny...
Re: [PATCH] KVM: arm64: Fix the return value of smp_call_function_single()
On 2021-01-18 09:31, Yejune Deng wrote: In smp_call_function_single(), the 3rd parameter isn't the return value and it's always positive. But it may return a negative value. So 'ret' should be the return value of the smp_call_function_single(). In check_kvm_target_cpu(), 'phys_target' is more readable than 'ret'. Signed-off-by: Yejune Deng --- arch/arm64/kvm/arm.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 04c44853b103..5fa5c04106de 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -1815,9 +1815,9 @@ static int init_hyp_mode(void) return err; } -static void check_kvm_target_cpu(void *ret) +static void check_kvm_target_cpu(void *phys_target) { - *(int *)ret = kvm_target_cpu(); + *(int *)phys_target = kvm_target_cpu(); } struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr) @@ -1879,7 +1879,7 @@ void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *cons) int kvm_arch_init(void *opaque) { int err; - int ret, cpu; + int ret, cpu, phys_target; bool in_hyp_mode; if (!is_hyp_mode_available()) { @@ -1900,7 +1900,7 @@ int kvm_arch_init(void *opaque) "Only trusted guests should be used on this system.\n"); for_each_online_cpu(cpu) { - smp_call_function_single(cpu, check_kvm_target_cpu, &ret, 1); + ret = smp_call_function_single(cpu, check_kvm_target_cpu, &phys_target, 1); if (ret < 0) { kvm_err("Error, CPU %d not supported!\n", cpu); return -ENODEV; That's not the purpose of this code. We check the target CPU so that we can decide to *fail* the KVM init if there is a CPU we do not support (we definitely used to do that with 32bit), and the error message clearly states this. So if you want to handle a cross-call failure, please do that. But don't change the existing semantics of this code. M. -- Jazz is not dead. It just smells funny...
[PATCH v4 19/21] arm64: cpufeatures: Allow disabling of BTI from the command-line
In order to be able to disable BTI at runtime, whether it is for testing purposes, or to work around HW issues, let's add support for overriding the ID_AA64PFR1_EL1.BTI field. This is further mapped on the arm64.nobti command-line alias. Signed-off-by: Marc Zyngier --- Documentation/admin-guide/kernel-parameters.txt | 3 +++ arch/arm64/include/asm/cpufeature.h | 2 ++ arch/arm64/kernel/cpufeature.c | 5 - arch/arm64/kernel/idreg-override.c | 12 arch/arm64/mm/mmu.c | 2 +- 5 files changed, 22 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 2786fd39a047..7599fd0f1ad7 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -373,6 +373,9 @@ arcrimi=[HW,NET] ARCnet - "RIM I" (entirely mem-mapped) cards Format: ,, + arm64.nobti [ARM64] Unconditionally disable Branch Target + Identification support + ataflop=[HW,M68k] atarimouse= [HW,MOUSE] Atari Mouse diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 80a5f423444e..d3e0f6dd43c4 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -816,6 +816,8 @@ static inline unsigned int get_vmid_bits(u64 mmfr1) extern u64 id_aa64mmfr1_val; extern u64 id_aa64mmfr1_mask; +extern u64 id_aa64pfr1_val; +extern u64 id_aa64pfr1_mask; u32 get_kvm_ipa_limit(void); void dump_cpu_features(void); diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 5b9343d2e9f0..f223171a7c34 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -557,6 +557,8 @@ static const struct arm64_ftr_bits ftr_raz[] = { u64 id_aa64mmfr1_val; u64 id_aa64mmfr1_mask; +u64 id_aa64pfr1_val; +u64 id_aa64pfr1_mask; static const struct __ftr_reg_entry { u32 sys_id; @@ -592,7 +594,8 @@ static const struct __ftr_reg_entry { /* Op1 = 0, CRn = 0, CRm = 4 */ ARM64_FTR_REG(SYS_ID_AA64PFR0_EL1, 
ftr_id_aa64pfr0), - ARM64_FTR_REG(SYS_ID_AA64PFR1_EL1, ftr_id_aa64pfr1), + ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64PFR1_EL1, ftr_id_aa64pfr1, + &id_aa64pfr1_val, &id_aa64pfr1_mask), ARM64_FTR_REG(SYS_ID_AA64ZFR0_EL1, ftr_id_aa64zfr0), /* Op1 = 0, CRn = 0, CRm = 5 */ diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index 143fe7b8e3ce..a9e3ed193fd4 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -33,6 +33,16 @@ static const struct reg_desc mmfr1 __initdata = { }, }; +static const struct reg_desc pfr1 __initdata = { + .name = "id_aa64pfr1", + .val= &id_aa64pfr1_val, + .mask = &id_aa64pfr1_mask, + .fields = { + { "bt", ID_AA64PFR1_BT_SHIFT }, + {} + }, +}; + extern u64 kaslr_feature_val; extern u64 kaslr_feature_mask; @@ -50,6 +60,7 @@ static const struct reg_desc kaslr __initdata = { static const struct reg_desc * const regs[] __initdata = { &mmfr1, + &pfr1, &kaslr, }; @@ -59,6 +70,7 @@ static const struct { } aliases[] __initdata = { { "kvm-arm.mode=nvhe", "id_aa64mmfr1.vh=0" }, { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" }, + { "arm64.nobti","id_aa64pfr1.bt=0" }, { "nokaslr","kaslr.disabled=1" }, }; diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index ae0c3d023824..617e704c980b 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -628,7 +628,7 @@ static bool arm64_early_this_cpu_has_bti(void) if (!IS_ENABLED(CONFIG_ARM64_BTI_KERNEL)) return false; - pfr1 = read_sysreg_s(SYS_ID_AA64PFR1_EL1); + pfr1 = __read_sysreg_by_encoding(SYS_ID_AA64PFR1_EL1); return cpuid_feature_extract_unsigned_field(pfr1, ID_AA64PFR1_BT_SHIFT); } -- 2.29.2
[PATCH v4 04/21] arm64: Provide an 'upgrade to VHE' stub hypercall
As we are about to change the way a VHE system boots, let's provide the core helper, in the form of a stub hypercall that enables VHE and replicates the full EL1 context at EL2, thanks to EL1 and VHE-EL2 being extremely similar. On exception return, the kernel carries on at EL2. Fancy! Nothing calls this new hypercall yet, so no functional change. Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/virt.h | 7 +++- arch/arm64/kernel/hyp-stub.S | 67 +-- 2 files changed, 71 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h index ee6a48df89d9..7379f35ae2c6 100644 --- a/arch/arm64/include/asm/virt.h +++ b/arch/arm64/include/asm/virt.h @@ -35,8 +35,13 @@ */ #define HVC_RESET_VECTORS 2 +/* + * HVC_VHE_RESTART - Upgrade the CPU from EL1 to EL2, if possible + */ +#define HVC_VHE_RESTART3 + /* Max number of HYP stub hypercalls */ -#define HVC_STUB_HCALL_NR 3 +#define HVC_STUB_HCALL_NR 4 /* Error returned when an invalid stub number is passed into x0 */ #define HVC_STUB_ERR 0xbadca11 diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S index 160f5881a0b7..fb12398b5c28 100644 --- a/arch/arm64/kernel/hyp-stub.S +++ b/arch/arm64/kernel/hyp-stub.S @@ -8,9 +8,9 @@ #include #include -#include #include +#include #include #include #include @@ -47,10 +47,13 @@ SYM_CODE_END(__hyp_stub_vectors) SYM_CODE_START_LOCAL(el1_sync) cmp x0, #HVC_SET_VECTORS - b.ne2f + b.ne1f msr vbar_el2, x1 b 9f +1: cmp x0, #HVC_VHE_RESTART + b.eqmutate_to_vhe + 2: cmp x0, #HVC_SOFT_RESTART b.ne3f mov x0, x2 @@ -70,6 +73,66 @@ SYM_CODE_START_LOCAL(el1_sync) eret SYM_CODE_END(el1_sync) +// nVHE? No way! Give me the real thing! +SYM_CODE_START_LOCAL(mutate_to_vhe) + // Sanity check: MMU *must* be off + mrs x0, sctlr_el2 + tbnzx0, #0, 1f + + // Needs to be VHE capable, obviously + mrs x0, id_aa64mmfr1_el1 + ubfxx0, x0, #ID_AA64MMFR1_VHE_SHIFT, #4 + cbz x0, 1f + + // Engage the VHE magic! 
+ mov_q x0, HCR_HOST_VHE_FLAGS + msr hcr_el2, x0 + isb + + // Doesn't do much on VHE, but still, worth a shot + init_el2_state vhe + + // Use the EL1 allocated stack, per-cpu offset + mrs x0, sp_el1 + mov sp, x0 + mrs x0, tpidr_el1 + msr tpidr_el2, x0 + + // FP configuration, vectors + mrs_s x0, SYS_CPACR_EL12 + msr cpacr_el1, x0 + mrs_s x0, SYS_VBAR_EL12 + msr vbar_el1, x0 + + // Transfer the MM state from EL1 to EL2 + mrs_s x0, SYS_TCR_EL12 + msr tcr_el1, x0 + mrs_s x0, SYS_TTBR0_EL12 + msr ttbr0_el1, x0 + mrs_s x0, SYS_TTBR1_EL12 + msr ttbr1_el1, x0 + mrs_s x0, SYS_MAIR_EL12 + msr mair_el1, x0 + isb + + // Invalidate TLBs before enabling the MMU + tlbivmalle1 + dsb nsh + + // Enable the EL2 S1 MMU, as set up from EL1 + mrs_s x0, SYS_SCTLR_EL12 + set_sctlr_el1 x0 + + // Hack the exception return to stay at EL2 + mrs x0, spsr_el1 + and x0, x0, #~PSR_MODE_MASK + mov x1, #PSR_MODE_EL2h + orr x0, x0, x1 + msr spsr_el1, x0 + +1: eret +SYM_CODE_END(mutate_to_vhe) + .macro invalid_vector label SYM_CODE_START_LOCAL(\label) b \label -- 2.29.2
[PATCH v4 03/21] arm64: Turn the MMU-on sequence into a macro
Turning the MMU on is a popular sport in the arm64 kernel, and we do it more than once, or even twice. As we are about to add even more, let's turn it into a macro. No expected functional change. Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/assembler.h | 17 + arch/arm64/kernel/head.S | 19 --- arch/arm64/mm/proc.S | 12 +--- 3 files changed, 22 insertions(+), 26 deletions(-) diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index bf125c591116..8cded93f99c3 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -675,6 +675,23 @@ USER(\label, icivau, \tmp2)// invalidate I line PoU .endif .endm +/* + * Set SCTLR_EL1 to the passed value, and invalidate the local icache + * in the process. This is called when setting the MMU on. + */ +.macro set_sctlr_el1, reg + msr sctlr_el1, \reg + isb + /* +* Invalidate the local I-cache so that any instructions fetched +* speculatively from the PoC are discarded, since they may have +* been dynamically patched at the PoU. +*/ + ic iallu + dsb nsh + isb +.endm + /* * Check whether to yield to another runnable task from kernel mode NEON code * (which runs with preemption disabled). diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index a0dc987724ed..28e9735302df 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -703,16 +703,9 @@ SYM_FUNC_START(__enable_mmu) offset_ttbr1 x1, x3 msr ttbr1_el1, x1 // load TTBR1 isb - msr sctlr_el1, x0 - isb - /* -* Invalidate the local I-cache so that any instructions fetched -* speculatively from the PoC are discarded, since they may have -* been dynamically patched at the PoU. 
-*/ - ic iallu - dsb nsh - isb + + set_sctlr_el1 x0 + ret SYM_FUNC_END(__enable_mmu) @@ -883,11 +876,7 @@ SYM_FUNC_START_LOCAL(__primary_switch) tlbivmalle1 // Remove any stale TLB entries dsb nsh - msr sctlr_el1, x19 // re-enable the MMU - isb - ic iallu // flush instructions fetched - dsb nsh // via old mapping - isb + set_sctlr_el1 x19 // re-enable the MMU bl __relocate_kernel #endif diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index ece785477bdc..c967bfd30d2b 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -291,17 +291,7 @@ skip_pgd: /* We're done: fire up the MMU again */ mrs x17, sctlr_el1 orr x17, x17, #SCTLR_ELx_M - msr sctlr_el1, x17 - isb - - /* -* Invalidate the local I-cache so that any instructions fetched -* speculatively from the PoC are discarded, since they may have -* been dynamically patched at the PoU. -*/ - ic iallu - dsb nsh - isb + set_sctlr_el1 x17 /* Set the flag to zero to indicate that we're all done */ str wzr, [flag_ptr] -- 2.29.2
[PATCH v4 02/21] arm64: Fix outdated TCR setup comment
The arm64 kernel has long been able to use more than 39bit VAs. Since day one, actually. Let's rewrite the offending comment. Signed-off-by: Marc Zyngier --- arch/arm64/mm/proc.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index 1f7ee8c8b7b8..ece785477bdc 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -464,8 +464,8 @@ SYM_FUNC_START(__cpu_setup) #endif msr mair_el1, x5 /* -* Set/prepare TCR and TTBR. We use 512GB (39-bit) address range for -* both user and kernel. +* Set/prepare TCR and TTBR. TCR_EL1.T1SZ gets further +* adjusted if the kernel is compiled with 52bit VA support. */ mov_q x10, TCR_TxSZ(VA_BITS) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \ TCR_TG_FLAGS | TCR_KASLR_FLAGS | TCR_ASID16 | \ -- 2.29.2
[PATCH v4 07/21] arm64: Simplify init_el2_state to be non-VHE only
As init_el2_state is now nVHE only, let's simplify it and drop the VHE setup. Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/el2_setup.h | 36 +++--- arch/arm64/kernel/head.S | 2 +- arch/arm64/kvm/hyp/nvhe/hyp-init.S | 2 +- 3 files changed, 10 insertions(+), 30 deletions(-) diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h index 540116de80bf..d77d358f9395 100644 --- a/arch/arm64/include/asm/el2_setup.h +++ b/arch/arm64/include/asm/el2_setup.h @@ -32,16 +32,14 @@ * to transparently mess with the EL0 bits via CNTKCTL_EL1 access in * EL2. */ -.macro __init_el2_timers mode -.ifeqs "\mode", "nvhe" +.macro __init_el2_timers mrs x0, cnthctl_el2 orr x0, x0, #3 // Enable EL1 physical timers msr cnthctl_el2, x0 -.endif msr cntvoff_el2, xzr// Clear virtual offset .endm -.macro __init_el2_debug mode +.macro __init_el2_debug mrs x1, id_aa64dfr0_el1 sbfxx0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4 cmp x0, #1 @@ -55,7 +53,6 @@ ubfxx0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4 cbz x0, .Lskip_spe_\@ // Skip if SPE not present -.ifeqs "\mode", "nvhe" mrs_s x0, SYS_PMBIDR_EL1 // If SPE available at EL2, and x0, x0, #(1 << SYS_PMBIDR_EL1_P_SHIFT) cbnzx0, .Lskip_spe_el2_\@ // then permit sampling of physical @@ -66,10 +63,6 @@ mov x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT) orr x2, x2, x0 // If we don't have VHE, then // use EL1&0 translation. -.else - orr x2, x2, #MDCR_EL2_TPMS // For VHE, use EL2 translation - // and disable access from EL1 -.endif .Lskip_spe_\@: msr mdcr_el2, x2// Configure debug traps @@ -145,37 +138,24 @@ /** * Initialize EL2 registers to sane values. This should be called early on all - * cores that were booted in EL2. + * cores that were booted in EL2. Note that everything gets initialised as + * if VHE was not available. The kernel context will be upgraded to VHE + * if possible later on in the boot process * * Regs: x0, x1 and x2 are clobbered. 
*/ -.macro init_el2_state mode -.ifnes "\mode", "vhe" -.ifnes "\mode", "nvhe" -.error "Invalid 'mode' argument" -.endif -.endif - +.macro init_el2_state __init_el2_sctlr - __init_el2_timers \mode - __init_el2_debug \mode + __init_el2_timers + __init_el2_debug __init_el2_lor __init_el2_stage2 __init_el2_gicv3 __init_el2_hstr - - /* -* When VHE is not in use, early init of EL2 needs to be done here. -* When VHE _is_ in use, EL1 will not be used in the host and -* requires no configuration, and all non-hyp-specific EL2 setup -* will be done via the _EL1 system register aliases in __cpu_setup. -*/ -.ifeqs "\mode", "nvhe" __init_el2_nvhe_idregs __init_el2_nvhe_cptr __init_el2_nvhe_sve __init_el2_nvhe_prepare_eret -.endif .endm #endif /* __ARM_KVM_INIT_H__ */ diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 07445fd976ef..36212c05df42 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -501,7 +501,7 @@ SYM_INNER_LABEL(init_el2, SYM_L_LOCAL) msr hcr_el2, x0 isb - init_el2_state nvhe + init_el2_state /* Hypervisor stub */ adr_l x0, __hyp_stub_vectors diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S index 31b060a44045..222cfc3e7190 100644 --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S @@ -189,7 +189,7 @@ SYM_CODE_START_LOCAL(__kvm_hyp_init_cpu) 2: msr SPsel, #1 // We want to use SP_EL{1,2} /* Initialize EL2 CPU state to sane values. */ - init_el2_state nvhe // Clobbers x0..x2 + init_el2_state // Clobbers x0..x2 /* Enable MMU, set vectors and stack. */ mov x0, x28 -- 2.29.2
[PATCH v4 08/21] arm64: Move SCTLR_EL1 initialisation to EL-agnostic code
We can now move the initial SCTLR_EL1 setup to be used for both EL1 and EL2 setup. Signed-off-by: Marc Zyngier --- arch/arm64/kernel/head.S | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 36212c05df42..b425d2587cdb 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -479,13 +479,14 @@ EXPORT_SYMBOL(kimage_vaddr) * booted in EL1 or EL2 respectively. */ SYM_FUNC_START(init_kernel_el) + mov_q x0, INIT_SCTLR_EL1_MMU_OFF + msr sctlr_el1, x0 + mrs x0, CurrentEL cmp x0, #CurrentEL_EL2 b.eqinit_el2 SYM_INNER_LABEL(init_el1, SYM_L_LOCAL) - mov_q x0, INIT_SCTLR_EL1_MMU_OFF - msr sctlr_el1, x0 isb mov_q x0, INIT_PSTATE_EL1 msr spsr_el1, x0 @@ -494,9 +495,6 @@ SYM_INNER_LABEL(init_el1, SYM_L_LOCAL) eret SYM_INNER_LABEL(init_el2, SYM_L_LOCAL) - mov_q x0, INIT_SCTLR_EL1_MMU_OFF - msr sctlr_el1, x0 - mov_q x0, HCR_HOST_NVHE_FLAGS msr hcr_el2, x0 isb -- 2.29.2
[PATCH v4 05/21] arm64: Initialise as nVHE before switching to VHE
As we are aiming to be able to control whether we enable VHE or not, let's always drop down to EL1 first, and only then upgrade to VHE if at all possible. This means that if the kernel is booted at EL2, we always start with a nVHE init, drop to EL1 to initialise the kernel, and only then upgrade the kernel EL to EL2 if possible (the process is obviously shortened for secondary CPUs). The resume path is handled similarly to a secondary CPU boot. Signed-off-by: Marc Zyngier --- arch/arm64/kernel/head.S | 38 ++-- arch/arm64/kernel/hyp-stub.S | 24 +++ arch/arm64/kernel/sleep.S| 1 + 3 files changed, 27 insertions(+), 36 deletions(-) diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 28e9735302df..07445fd976ef 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -433,6 +433,7 @@ SYM_FUNC_START_LOCAL(__primary_switched) bl __pi_memset dsb ishst // Make zero page visible to PTW + bl switch_to_vhe #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) bl kasan_early_init #endif @@ -493,42 +494,6 @@ SYM_INNER_LABEL(init_el1, SYM_L_LOCAL) eret SYM_INNER_LABEL(init_el2, SYM_L_LOCAL) -#ifdef CONFIG_ARM64_VHE - /* -* Check for VHE being present. x2 being non-zero indicates that we -* do have VHE, and that the kernel is intended to run at EL2. -*/ - mrs x2, id_aa64mmfr1_el1 - ubfxx2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4 -#else - mov x2, xzr -#endif - cbz x2, init_el2_nvhe - - /* -* When VHE _is_ in use, EL1 will not be used in the host and -* requires no configuration, and all non-hyp-specific EL2 setup -* will be done via the _EL1 system register aliases in __cpu_setup. -*/ - mov_q x0, HCR_HOST_VHE_FLAGS - msr hcr_el2, x0 - isb - - init_el2_state vhe - - isb - - mov_q x0, INIT_PSTATE_EL2 - msr spsr_el2, x0 - msr elr_el2, lr - mov w0, #BOOT_CPU_MODE_EL2 - eret - -SYM_INNER_LABEL(init_el2_nvhe, SYM_L_LOCAL) - /* -* When VHE is not in use, early init of EL2 and EL1 needs to be -* done here. 
-*/ mov_q x0, INIT_SCTLR_EL1_MMU_OFF msr sctlr_el1, x0 @@ -623,6 +588,7 @@ SYM_FUNC_START_LOCAL(secondary_startup) /* * Common entry point for secondary CPUs. */ + bl switch_to_vhe bl __cpu_secondary_check52bitva bl __cpu_setup // initialise processor adrpx1, swapper_pg_dir diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S index fb12398b5c28..52b29c54a9f7 100644 --- a/arch/arm64/kernel/hyp-stub.S +++ b/arch/arm64/kernel/hyp-stub.S @@ -181,3 +181,27 @@ SYM_FUNC_START(__hyp_reset_vectors) hvc #0 ret SYM_FUNC_END(__hyp_reset_vectors) + +/* + * Entry point to switch to VHE if deemed capable + */ +SYM_FUNC_START(switch_to_vhe) +#ifdef CONFIG_ARM64_VHE + // Need to have booted at EL2 + adr_l x1, __boot_cpu_mode + ldr w0, [x1] + cmp w0, #BOOT_CPU_MODE_EL2 + b.ne1f + + // and still be at EL1 + mrs x0, CurrentEL + cmp x0, #CurrentEL_EL1 + b.ne1f + + // Turn the world upside down + mov x0, #HVC_VHE_RESTART + hvc #0 +1: +#endif + ret +SYM_FUNC_END(switch_to_vhe) diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S index 6bdef7362c0e..5bfd9b87f85d 100644 --- a/arch/arm64/kernel/sleep.S +++ b/arch/arm64/kernel/sleep.S @@ -100,6 +100,7 @@ SYM_FUNC_END(__cpu_suspend_enter) .pushsection ".idmap.text", "awx" SYM_CODE_START(cpu_resume) bl init_kernel_el + bl switch_to_vhe bl __cpu_setup /* enable the MMU early - so we can access sleep_save_stash by va */ adrpx1, swapper_pg_dir -- 2.29.2
[PATCH v4 06/21] arm64: Move VHE-specific SPE setup to mutate_to_vhe()
There isn't much that a VHE kernel needs on top of whatever has been done for nVHE, so let's move the little we need to the VHE stub (the SPE setup), and drop the init_el2_state macro. No expected functional change. Signed-off-by: Marc Zyngier --- arch/arm64/kernel/hyp-stub.S | 28 +--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S index 52b29c54a9f7..59820f9b8522 100644 --- a/arch/arm64/kernel/hyp-stub.S +++ b/arch/arm64/kernel/hyp-stub.S @@ -89,9 +89,6 @@ SYM_CODE_START_LOCAL(mutate_to_vhe) msr hcr_el2, x0 isb - // Doesn't do much on VHE, but still, worth a shot - init_el2_state vhe - // Use the EL1 allocated stack, per-cpu offset mrs x0, sp_el1 mov sp, x0 @@ -104,6 +101,31 @@ SYM_CODE_START_LOCAL(mutate_to_vhe) mrs_s x0, SYS_VBAR_EL12 msr vbar_el1, x0 + // Fixup SPE configuration, if supported... + mrs x1, id_aa64dfr0_el1 + ubfxx1, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4 + mov x2, xzr + cbz x1, skip_spe + + // ... and not owned by EL3 + mrs_s x0, SYS_PMBIDR_EL1 + and x0, x0, #(1 << SYS_PMBIDR_EL1_P_SHIFT) + cbnzx0, skip_spe + + // Let the SPE driver in control of the sampling + mrs_s x0, SYS_PMSCR_EL1 + bic x0, x0, #(1 << SYS_PMSCR_EL2_PCT_SHIFT) + bic x0, x0, #(1 << SYS_PMSCR_EL2_PA_SHIFT) + msr_s SYS_PMSCR_EL1, x0 + mov x2, #MDCR_EL2_TPMS + +skip_spe: + // For VHE, use EL2 translation and disable access from EL1 + mrs x0, mdcr_el2 + bic x0, x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT) + orr x0, x0, x2 + msr mdcr_el2, x0 + // Transfer the MM state from EL1 to EL2 mrs_s x0, SYS_TCR_EL12 msr tcr_el1, x0 -- 2.29.2
[PATCH v4 09/21] arm64: cpufeature: Add global feature override facility
Add a facility to globally override a feature, no matter what the HW says. Yes, this sounds dangerous, but we do respect the "safe" value for a given feature. This doesn't mean the user doesn't need to know what they are doing. Nothing uses this yet, so we are pretty safe. For now. Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/cpufeature.h | 2 ++ arch/arm64/kernel/cpufeature.c | 44 + 2 files changed, 41 insertions(+), 5 deletions(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 9a555809b89c..465d2cb63bfc 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -75,6 +75,8 @@ struct arm64_ftr_reg { u64 sys_val; u64 user_val; const struct arm64_ftr_bits *ftr_bits; + u64 *override_val; + u64 *override_mask; }; extern struct arm64_ftr_reg arm64_ftr_reg_ctrel0; diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index e99eddec0a46..aaa075c6f029 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -544,13 +544,17 @@ static const struct arm64_ftr_bits ftr_raz[] = { ARM64_FTR_END, }; -#define ARM64_FTR_REG(id, table) { \ - .sys_id = id, \ - .reg = &(struct arm64_ftr_reg){\ - .name = #id,\ - .ftr_bits = &((table)[0]), \ +#define ARM64_FTR_REG_OVERRIDE(id, table, v, m) { \ + .sys_id = id, \ + .reg = &(struct arm64_ftr_reg){\ + .name = #id,\ + .ftr_bits = &((table)[0]), \ + .override_val = v, \ + .override_mask = m, \ }} +#define ARM64_FTR_REG(id, table) ARM64_FTR_REG_OVERRIDE(id, table, NULL, NULL) + static const struct __ftr_reg_entry { u32 sys_id; struct arm64_ftr_reg*reg; @@ -760,6 +764,7 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) u64 strict_mask = ~0x0ULL; u64 user_mask = 0; u64 valid_mask = 0; + u64 override_val = 0, override_mask = 0; const struct arm64_ftr_bits *ftrp; struct arm64_ftr_reg *reg = get_arm64_ftr_reg(sys_reg); @@ -767,9 +772,38 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) if 
(!reg) return; + if (reg->override_mask && reg->override_val) { + override_mask = *reg->override_mask; + override_val = *reg->override_val; + } + for (ftrp = reg->ftr_bits; ftrp->width; ftrp++) { u64 ftr_mask = arm64_ftr_mask(ftrp); s64 ftr_new = arm64_ftr_value(ftrp, new); + s64 ftr_ovr = arm64_ftr_value(ftrp, override_val); + + if ((ftr_mask & override_mask) == ftr_mask) { + s64 tmp = arm64_ftr_safe_value(ftrp, ftr_ovr, ftr_new); + char *str = NULL; + + if (ftr_ovr != tmp) { + /* Unsafe, remove the override */ + *reg->override_mask &= ~ftr_mask; + *reg->override_val &= ~ftr_mask; + tmp = ftr_ovr; + str = "ignoring override"; + } else if (ftr_new != tmp) { + /* Override was valid */ + ftr_new = tmp; + str = "forced"; + } + + if (str) + pr_warn("%s[%d:%d]: %s to %llx\n", + reg->name, + ftrp->shift + ftrp->width - 1, + ftrp->shift, str, tmp); + } val = arm64_ftr_set_value(ftrp, val, ftr_new); -- 2.29.2
[PATCH v4 00/21] arm64: Early CPU feature override, and applications to VHE, BTI and PAuth
It recently came to light that there is a need to be able to override some CPU features very early on, before the kernel is fully up and running. The reasons for this range from specific feature support (such as using Protected KVM on VHE HW, which is the main motivation for this work) to errata workarounds (a feature is broken on a CPU and needs to be turned off, or rather not enabled). This series tries to offer a limited framework for this kind of problem, by allowing a set of options to be passed on the command-line and altering the feature set that the cpufeature subsystem exposes to the rest of the kernel. Note that this doesn't change anything for code that directly uses the CPU ID registers. The series completely changes the way a VHE-capable system boots, by *always* booting non-VHE first, and then upgrading to VHE when deemed capable. Although it sounds scary, this is actually simple to implement (and I wish I had done that five years ago). The "upgrade to VHE" path is then conditioned on the VHE feature not being disabled from the command-line. Said command-line parsing borrows a lot from the kaslr code, and subsequently allows the "nokaslr" option to be moved to the new infrastructure (though it all looks a bit... odd). Further patches now add support for disabling BTI and PAuth, the latter being based on an initial series by Srinivas Ramana[0]. There are some ongoing discussions about being able to disable MTE, but no clear resolution on that subject yet. This has been tested on multiple VHE and non-VHE systems. * From v3 [3]: - Fixed the VHE_RESTART stub (duh!) 
- Switched to using arm64_ftr_safe_value() instead of the user provided value
- Per-feature override warning

* From v2 [2]:
- Simplify the VHE_RESTART stub
- Fixed a number of spelling mistakes, and hopefully introduced a few more
- Override features in __read_sysreg_by_encoding()
- Allow both BTI and PAuth to be overridden on the command line
- Rebased on -rc3

* From v1 [1]:
- Fix SPE init on VHE when EL2 doesn't own SPE
- Fix re-init when KASLR is used
- Handle the resume path
- Rebased to 5.11-rc2

[0] https://lore.kernel.org/r/1610152163-16554-1-git-send-email-sram...@codeaurora.org
[1] https://lore.kernel.org/r/20201228104958.1848833-1-...@kernel.org
[2] https://lore.kernel.org/r/20210104135011.2063104-1-...@kernel.org
[3] https://lore.kernel.org/r/2021032811.2455113-1-...@kernel.org

Marc Zyngier (20):
  arm64: Fix labels in el2_setup macros
  arm64: Fix outdated TCR setup comment
  arm64: Turn the MMU-on sequence into a macro
  arm64: Provide an 'upgrade to VHE' stub hypercall
  arm64: Initialise as nVHE before switching to VHE
  arm64: Move VHE-specific SPE setup to mutate_to_vhe()
  arm64: Simplify init_el2_state to be non-VHE only
  arm64: Move SCTLR_EL1 initialisation to EL-agnostic code
  arm64: cpufeature: Add global feature override facility
  arm64: cpufeature: Use IDreg override in __read_sysreg_by_encoding()
  arm64: Extract early FDT mapping from kaslr_early_init()
  arm64: cpufeature: Add an early command-line cpufeature override facility
  arm64: Allow ID_AA64MMFR1_EL1.VH to be overridden from the command line
  arm64: Honor VHE being disabled from the command-line
  arm64: Add an aliasing facility for the idreg override
  arm64: Make kvm-arm.mode={nvhe,protected} an alias of id_aa64mmfr1.vh=0
  KVM: arm64: Document HVC_VHE_RESTART stub hypercall
  arm64: Move "nokaslr" over to the early cpufeature infrastructure
  arm64: cpufeatures: Allow disabling of BTI from the command-line
  arm64: cpufeatures: Allow disabling of Pointer Auth from the command-line

Srinivas Ramana (1):
  arm64: Defer enabling pointer authentication on boot core

 .../admin-guide/kernel-parameters.txt   |   9 +
 Documentation/virt/kvm/arm/hyp-abi.rst  |   9 +
 arch/arm64/include/asm/assembler.h      |  17 ++
 arch/arm64/include/asm/cpufeature.h     |  10 +
 arch/arm64/include/asm/el2_setup.h      |  60 ++
 arch/arm64/include/asm/pointer_auth.h   |  10 +
 arch/arm64/include/asm/setup.h          |  11 +
 arch/arm64/include/asm/stackprotector.h |   1 +
 arch/arm64/include/asm/virt.h           |   7 +-
 arch/arm64/kernel/Makefile              |   2 +-
 arch/arm64/kernel/cpufeature.c          |  75 ++-
 arch/arm64/kernel/head.S                |  75 ++-
 arch/arm64/kernel/hyp-stub.S            | 124 ++-
 arch/arm64/kernel/idreg-override.c      | 199 ++
 arch/arm64/kernel/kaslr.c               |  44 +---
 arch/arm64/kernel/setup.c               |  15 ++
 arch/arm64/kernel/sleep.S               |   1 +
 arch/arm64/kvm/arm.c                    |   3 +
 arch/arm64/kvm/hyp/nvhe/hyp-init.S      |   2 +-
 arch/arm64/mm/mmu.c                     |   2 +-
 arch/arm64/mm/proc.S
Re: [PATCH v2] gpiolib: Bind gpio_device to a driver to enable fw_devlink=on by default
Hi Saravana,

Thanks for posting this, much appreciated.

On Sat, 16 Jan 2021 01:14:11 +, Saravana Kannan wrote:
>
> There are multiple instances of GPIO devicetree nodes of the form:
>
> foo {
> 	compatible = "acme,foo";
> 	...
>
> 	gpio0: gpio0@ {
> 		compatible = "acme,bar";
> 		...
> 		gpio-controller;
> 	};
>
> 	gpio1: gpio1@ {
> 		compatible = "acme,bar";
> 		...
> 		gpio-controller;
> 	};
>
> 	...
> }
>
> bazz {
> 	my-gpios = <&gpio0 ...>;
> }
>
> Case 1: The driver for "foo" populates struct device for these gpio*
> nodes and then probes them using a driver that binds with "acme,bar".
> This lines up with how DT nodes with the "compatible" property are
> generally converted to struct devices and then registered with driver
> core to probe them. This also allows the gpio* devices to hook into all
> the driver core capabilities like runtime PM, probe deferral,
> suspend/resume ordering, device links, etc.
>
> Case 2: The driver for "foo" doesn't populate its child device nodes
> with a "compatible" property and instead just loops through its child
> nodes and directly registers the GPIOs with gpiolib without ever
> populating a struct device or binding a driver to it.

That's not quite an accurate description. The gpiolib subsystem does
create a struct device. It doesn't register a driver though, which is
what causes the issue with fw_devlink (more on that below).

>
> Drivers that follow case 2 cause problems with fw_devlink=on. This
> is because fw_devlink will prevent bazz from probing until there's a
> struct device that has gpio0 as its fwnode (because bazz lists gpio0 as
> a GPIO supplier). Once the struct device is available, fw_devlink will
> create a device link with gpio0 as the supplier and bazz as the
> consumer. After this point, the device link will prevent bazz from
> probing until its supplier (the gpio0 device) has bound to a driver.
> Once the supplier is bound to a driver, the probe of bazz is triggered
> automatically.
>
> Finding and refactoring all the instances of drivers that follow case 2
> will cause a lot of code churn and is not something that can be done in
> one shot. Examples of such instances are [1] [2].
>
> This patch works around this problem and avoids all the code churn by
> simply creating a stub driver to bind to the gpio_device. Since the
> gpio_device already points to the GPIO device tree node, this allows all
> the consumers to continue probing when the driver follows case 2.
>
> If/when all the old drivers are refactored, we can revert this
> patch.

My personal gripe with this approach is that it is an abrupt change in
the way DT and the device model are mapped onto each other.

As far as I know (and someone please correct me if I am wrong), there
is zero expectation that a device/subdevice/random IP block described
by a node with a "compatible" property will end up being managed by a
driver that is bound to that particular node. The node/subnode division
is a good way to express some HW boundaries, but doesn't say anything
about the way this should be handled in the kernel.

Assuming that everything containing a "compatible" string will
eventually be bound to a driver is IMO pretty limiting.
>
> [1] - https://lore.kernel.org/lkml/20201014191235.7f71fcb4@xhacker.debian/
> [2] - https://lore.kernel.org/lkml/e28e1f38d87c12a3c714a6573beba...@kernel.org/
>
> Cc: Marc Zyngier
> Cc: Jisheng Zhang
> Cc: Kever Yang
> Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default")
> Signed-off-by: Saravana Kannan
> ---
> v1 -> v2:
> - Fixed up compilation errors that were introduced accidentally
> - Fixed a missing put_device()
>
> drivers/gpio/gpiolib.c | 37 +
> 1 file changed, 37 insertions(+)
>
> diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
> index b02cc2abd3b6..12c579a953b0 100644
> --- a/drivers/gpio/gpiolib.c
> +++ b/drivers/gpio/gpiolib.c
> @@ -574,6 +574,9 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
>  	unsigned	i;
>  	int base = gc->base;
>  	struct gpio_device *gdev;
> +	struct device_node *of_node;
> +	struct fwnode_handle *fwnode;
> +	struct device *fwnode_dev;
>
>  	/*
>  	 * First: allocate and populate the internal stat container, and
> @@ -596,6 +599,22 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
>
Re: [PATCH] KVM: arm64: Allow PSCI SYSTEM_OFF/RESET to return
On Tue, 29 Dec 2020 16:00:59 +, David Brazdil wrote:
> The KVM/arm64 PSCI relay assumes that SYSTEM_OFF and SYSTEM_RESET should
> not return, as dictated by the PSCI spec. However, there is firmware out
> there which breaks this assumption, leading to a hyp panic. Make KVM
> more robust to broken firmware by allowing these to return.

Applied to next, thanks!

[1/1] KVM: arm64: Allow PSCI SYSTEM_OFF/RESET to return
      commit: 2c91ef39216149df6703c3fa6a47dd9a1e6091c1

Cheers,

M.
--
Without deviation from the norm, progress is not possible.
Re: [PATCH v4 03/10] irqchip/sun6i-r: Use a stacked irqchip driver
On 2021-01-15 04:01, Samuel Holland wrote:
> Hello,
>
> On 1/14/21 3:06 PM, Marc Zyngier wrote:
>> Hi Samuel,
>>
>> On 2021-01-12 05:59, Samuel Holland wrote:

[...]

>>> +static void sun6i_r_intc_ack_nmi(void)
>>> +{
>>> +	writel(SUN6I_NMI_BIT, base + SUN6I_IRQ_PENDING(0));
>>
>> writel_relaxed()
>
> irq_chip_unmask_parent(), which calls gic_unmask_irq(), is called
> immediately after this in .irq_unmask. Since gic_unmask_irq() also
> uses writel_relaxed(), the GIC write could be ordered before the
> write here.

That's odd. writel() places a barrier *before* the actual write,
ensuring that this write is ordered w.r.t. previous accesses. If you
are trying to ensure ordering with what follows, you need an explicit
barrier after this access.

I guess that in the end, you may need both, as what you have orders
the access to GICC_AIR to take place before the write to this pending
register, and you also need to provide the ordering you just described.

> I was getting occasional spurious interrupts (1 out of each 20-25)
> when using a level trigger, which were resolved by switching to
> writel() here. I mentioned this in the changelog, but it probably
> deserves a comment in the code as well. Or maybe I should use an
> explicit barrier somewhere?

Please document it in the code. This is subtle enough to warrant a
good description.

>>> +}
>>> +
>>> +static void sun6i_r_intc_nmi_ack(struct irq_data *data)
>>> +{
>>> +	if (irqd_get_trigger_type(data) & IRQ_TYPE_EDGE_BOTH)
>>> +		sun6i_r_intc_ack_nmi();
>>> +	else
>>> +		data->chip_data = SUN6I_NMI_NEEDS_ACK;
>>> +}
>>> +
>>> +static void sun6i_r_intc_nmi_eoi(struct irq_data *data)
>>> +{
>>> +	/* For oneshot IRQs, delay the ack until the IRQ is unmasked. */
>>> +	if (data->chip_data == SUN6I_NMI_NEEDS_ACK && !irqd_irq_masked(data)) {
>>> +		sun6i_r_intc_ack_nmi();
>>> +		data->chip_data = 0;
>>
>> nit: NULL rather than 0?
>
> NULL seemed less appropriate since I'm not using the field as a
> pointer, but I don't have a strong opinion about it.

chip_data *is* a pointer, which is why we conventionally use NULL
rather than an integer value. Up to you.

[...]
>>> +static struct irq_chip sun6i_r_intc_nmi_chip = {
>>> +	.name			= "sun6i-r-intc",
>>> +	.irq_ack		= sun6i_r_intc_nmi_ack,
>>> +	.irq_mask		= irq_chip_mask_parent,
>>> +	.irq_unmask		= sun6i_r_intc_nmi_unmask,
>>> +	.irq_eoi		= sun6i_r_intc_nmi_eoi,
>>> +	.irq_set_affinity	= irq_chip_set_affinity_parent,
>>> +	.irq_set_type		= sun6i_r_intc_nmi_set_type,
>>> +	.irq_set_irqchip_state	= sun6i_r_intc_nmi_set_irqchip_state,
>>
>> You probably also want to wire irq_get_irqchip_state(), while you're
>> at it.
>
> I thought if the interrupt was pending here, it would necessarily
> also be pending at the GIC, so adding a separate layer would be
> redundant.
>
> irq_set_vcpu_affinity(), __irq_get_irqchip_state(), and
> irq_set_irqchip_state() [the functions, not the callbacks] have the
> interesting property that they search up the irqdomain hierarchy for
> the first irqdomain with the callback. So if all the callback would
> do is defer to its parent, it doesn't need to be provided at all*.

Ah, of course... I even wrote that code!

Thanks,

M.
--
Jazz is not dead. It just smells funny...
Re: [PATCH v4 04/10] irqchip/sun6i-r: Add wakeup support
On Tue, 12 Jan 2021 05:59:44 +, Samuel Holland wrote:
>
> Maintain bitmaps of wake-enabled IRQs and mux inputs, and program them
> to the hardware during the syscore phase of suspend and shutdown. Then
> restore the original set of enabled IRQs (only the NMI) during resume.
>
> This serves two purposes. First, it lets power management firmware
> running on the ARISC coprocessor know which wakeup sources Linux wants
> to have enabled. That way, it can avoid turning them off when it shuts
> down the remainder of the clock tree. Second, it preconfigures the
> coprocessor's interrupt controller, so the firmware's wakeup logic
> is as simple as waiting for an interrupt to arrive.
>
> The suspend/resume logic is not conditional on PM_SLEEP because it is
> identical to the init/shutdown logic. Wake IRQs may be enabled during
> shutdown to allow powering the board back on. As an example, see
> commit a5c5e50cce9d ("Input: gpio-keys - add shutdown callback").
>
> Signed-off-by: Samuel Holland
> ---
> drivers/irqchip/irq-sun6i-r.c | 107 --
> 1 file changed, 101 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/irqchip/irq-sun6i-r.c b/drivers/irqchip/irq-sun6i-r.c
> index d04d067423f4..a1b58c98d6ca 100644
> --- a/drivers/irqchip/irq-sun6i-r.c
> +++ b/drivers/irqchip/irq-sun6i-r.c
> @@ -39,6 +39,7 @@
>   * set of 128 mux bits. This requires a second set of top-level registers.
>   */
>
> +#include
>  #include
>  #include
>  #include
> @@ -46,6 +47,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>
> @@ -67,8 +69,17 @@
>  #define SUN6I_NR_DIRECT_IRQS		16
>  #define SUN6I_NR_MUX_BITS		128
>
> +struct sun6i_r_intc_variant {
> +	u32		first_mux_irq;
> +	u32		nr_mux_irqs;
> +	u32		mux_valid[BITS_TO_U32(SUN6I_NR_MUX_BITS)];
> +};
> +
>  static void __iomem *base;
>  static irq_hw_number_t nmi_hwirq;
> +static DECLARE_BITMAP(wake_irq_enabled, SUN6I_NR_TOP_LEVEL_IRQS);
> +static DECLARE_BITMAP(wake_mux_enabled, SUN6I_NR_MUX_BITS);
> +static DECLARE_BITMAP(wake_mux_valid, SUN6I_NR_MUX_BITS);
>
>  static void sun6i_r_intc_ack_nmi(void)
>  {
> @@ -145,6 +156,21 @@ static int sun6i_r_intc_nmi_set_irqchip_state(struct irq_data *data,
>  	return irq_chip_set_parent_state(data, which, state);
>  }
>
> +static int sun6i_r_intc_irq_set_wake(struct irq_data *data, unsigned int on)
> +{
> +	unsigned long offset_from_nmi = data->hwirq - nmi_hwirq;
> +
> +	if (offset_from_nmi < SUN6I_NR_DIRECT_IRQS)
> +		assign_bit(offset_from_nmi, wake_irq_enabled, on);
> +	else if (test_bit(data->hwirq, wake_mux_valid))
> +		assign_bit(data->hwirq, wake_mux_enabled, on);
> +	else
> +		/* Not wakeup capable. */
> +		return -EPERM;
> +
> +	return 0;
> +}
> +
>  static struct irq_chip sun6i_r_intc_nmi_chip = {
>  	.name			= "sun6i-r-intc",
>  	.irq_ack		= sun6i_r_intc_nmi_ack,
> @@ -154,8 +180,19 @@ static struct irq_chip sun6i_r_intc_nmi_chip = {
>  	.irq_set_affinity	= irq_chip_set_affinity_parent,
>  	.irq_set_type		= sun6i_r_intc_nmi_set_type,
>  	.irq_set_irqchip_state	= sun6i_r_intc_nmi_set_irqchip_state,
> -	.flags			= IRQCHIP_SET_TYPE_MASKED |
> -				  IRQCHIP_SKIP_SET_WAKE,
> +	.irq_set_wake		= sun6i_r_intc_irq_set_wake,
> +	.flags			= IRQCHIP_SET_TYPE_MASKED,
>  };
> +
> +static struct irq_chip sun6i_r_intc_wakeup_chip = {
> +	.name			= "sun6i-r-intc",
> +	.irq_mask		= irq_chip_mask_parent,
> +	.irq_unmask		= irq_chip_unmask_parent,
> +	.irq_eoi		= irq_chip_eoi_parent,
> +	.irq_set_affinity	= irq_chip_set_affinity_parent,
> +	.irq_set_type		= irq_chip_set_type_parent,
> +	.irq_set_wake		= sun6i_r_intc_irq_set_wake,
> +	.flags			= IRQCHIP_SET_TYPE_MASKED,

Worth implementing irq_get/set_irqchip_state() using the _parent
helper, I guess.

Thanks,

M.
--
Without deviation from the norm, progress is not possible.