Re: KVM: arm64: A new approach for SPE support

2023-01-04 Thread Mark Rutland
On Wed, Jan 04, 2023 at 11:04:41AM +, Alexandru Elisei wrote:
> Hi Mark,
> 
> Thank you for having a look!
> 
> On Wed, Jan 04, 2023 at 09:19:25AM +0000, Mark Rutland wrote:
> > On Tue, Jan 03, 2023 at 02:27:59PM +, Alexandru Elisei wrote:
> > > Hi,
> > > 
> > > Gentle ping regarding this.
> > 
> > Hi Alexandru,
> > 
> > Sorry for the delay; things were a bit hectic at the end of last year, and 
> > this
> > is still on my queue of things to look at.
> > 
> > > Thanks,
> > > Alex
> > > 
> > > On Wed, Nov 23, 2022 at 11:40:45AM +, Alexandru Elisei wrote:
> > > > The previous discussion about how best to add SPE support to KVM [1] is
> > > > heading in the direction of pinning at EL2 only the buffer, when the 
> > > > guest
> > > > enables profiling, instead of pinning the entire VM memory. Although 
> > > > better
> > > > than pinning the entire VM at EL2, it still has some disadvantages:
> > > > 
> > > > 1. Pinning memory at stage 2 goes against the design principle of 
> > > > secondary
> > > > MMUs, which must reflect all changes in the primary (host's stage 1) 
> > > > page
> > > > tables. This means a mechanism by which to pin VM memory at stage 2 
> > > > must be
> > > > created from scratch just for SPE. Although I haven't done this yet, 
> > > > I'm a
> > > > bit concerned that this will turn out to be fragile and/or complicated.
> > > > 
> > > > 2. The architecture allows software to change the VA to IPA translations
> > > > for the profiling buffer when the buffer is enabled if profiling is
> > > > disabled (the buffer is enabled, but sampling is disabled). Since SPE 
> > > > can
> > > > be programmed to profile EL0 only, and there is no easy way for KVM to 
> > > > trap
> > > > the exact moment when profiling becomes enabled in this scenario to
> > > > translate the buffer's guest VAs to IPA, to pin the IPAs at stage 2, it 
> > > > is
> > > > required for KVM to impose limitations on how a guest uses SPE for 
> > > > emulation
> > > > to work.
> > > > 
> > > > I've prototyped a new approach [2] which eliminates both disadvantages, 
> > > > but
> > > > comes with its own set of drawbacks. The approach I've been working on 
> > > > is
> > > > to have KVM allocate a buffer in the kernel address space to profile the
> > > > guest, and when the buffer becomes full (or profiling is disabled for 
> > > > other
> > > > reasons), to copy the contents of the buffer to guest memory.
> > 
> > This sounds neat!
> > 
> > I have a few comments below, I'll try to take a more in-depth look shortly.
> > 
> > > > I'll start with the advantages:
> > > > 
> > > > 1. No memory pinning at stage 2.
> > > > 
> > > > 2. No meaningful restrictions on how the guest programs SPE, since the
> > > > translation of the guest VAs to IPAs is done by KVM when profiling has 
> > > > been
> > > > completed.
> > > > 
> > > > 3. Neoverse N1 errata 1978083 ("Incorrect programming of PMBPTR_EL1 
> > > > might
> > > > result in a deadlock") [6] is handled without any extra work.
> > > > 
> > > > As I see it, there are two main disadvantages:
> > > > 
> > > > 1. The contents of the KVM buffer must be copied to the guest. In the
> > > > prototype this is done all at once, when profiling is stopped [3].
> > > > Presumably this can be amortized by unmapping the pages corresponding to
> > > > the guest buffer from stage 2 (or marking them as invalid) and copying 
> > > > the
> > > > data when the guest reads from those pages. Needs investigating.
> > 
> > I don't think we need to mess with the translation tables here; for a guest 
> > to
> > look at the buffer it's going to have to look at PMBPTR_EL1 (and a guest 
> > could
> > poll that and issue barriers without ever stopping SPE), so we could also 
> > force
> > writebacks when the guest reads PMBPTR_EL1.
> 
> I'm confused about this statement: are you saying that the guest must
> necessarily read PMBPTR_EL1 before accessing the buffer, and therefore KVM
> can defer all writebacks when PMBPTR_EL1 is read, 

Re: KVM: arm64: A new approach for SPE support

2023-01-04 Thread Mark Rutland
On Tue, Jan 03, 2023 at 02:27:59PM +, Alexandru Elisei wrote:
> Hi,
> 
> Gentle ping regarding this.

Hi Alexandru,

Sorry for the delay; things were a bit hectic at the end of last year, and this
is still on my queue of things to look at.

> Thanks,
> Alex
> 
> On Wed, Nov 23, 2022 at 11:40:45AM +, Alexandru Elisei wrote:
> > The previous discussion about how best to add SPE support to KVM [1] is
> > heading in the direction of pinning at EL2 only the buffer, when the guest
> > enables profiling, instead of pinning the entire VM memory. Although better
> > than pinning the entire VM at EL2, it still has some disadvantages:
> > 
> > 1. Pinning memory at stage 2 goes against the design principle of secondary
> > MMUs, which must reflect all changes in the primary (host's stage 1) page
> > tables. This means a mechanism by which to pin VM memory at stage 2 must be
> > created from scratch just for SPE. Although I haven't done this yet, I'm a
> > bit concerned that this will turn out to be fragile and/or complicated.
> > 
> > 2. The architecture allows software to change the VA to IPA translations
> > for the profiling buffer when the buffer is enabled if profiling is
> > disabled (the buffer is enabled, but sampling is disabled). Since SPE can
> > be programmed to profile EL0 only, and there is no easy way for KVM to trap
> > the exact moment when profiling becomes enabled in this scenario to
> > translate the buffer's guest VAs to IPA, to pin the IPAs at stage 2, it is
> > required for KVM to impose limitations on how a guest uses SPE for emulation
> > to work.
> > 
> > I've prototyped a new approach [2] which eliminates both disadvantages, but
> > comes with its own set of drawbacks. The approach I've been working on is
> > to have KVM allocate a buffer in the kernel address space to profile the
> > guest, and when the buffer becomes full (or profiling is disabled for other
> > reasons), to copy the contents of the buffer to guest memory.

This sounds neat!

I have a few comments below, I'll try to take a more in-depth look shortly.

> > I'll start with the advantages:
> > 
> > 1. No memory pinning at stage 2.
> > 
> > 2. No meaningful restrictions on how the guest programs SPE, since the
> > translation of the guest VAs to IPAs is done by KVM when profiling has been
> > completed.
> > 
> > 3. Neoverse N1 errata 1978083 ("Incorrect programming of PMBPTR_EL1 might
> > result in a deadlock") [6] is handled without any extra work.
> > 
> > As I see it, there are two main disadvantages:
> > 
> > 1. The contents of the KVM buffer must be copied to the guest. In the
> > prototype this is done all at once, when profiling is stopped [3].
> > Presumably this can be amortized by unmapping the pages corresponding to
> > the guest buffer from stage 2 (or marking them as invalid) and copying the
> > data when the guest reads from those pages. Needs investigating.

I don't think we need to mess with the translation tables here; for a guest to
look at the buffer it's going to have to look at PMBPTR_EL1 (and a guest could
poll that and issue barriers without ever stopping SPE), so we could also force
writebacks when the guest reads PMBPTR_EL1.
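
As a rough illustration of that idea, the sketch below shows a sys_regs.c-style
handler which flushes KVM's intermediate buffer whenever the guest reads
PMBPTR_EL1. It is only a sketch: kvm_spe_flush_buffer_to_guest() and the
PMBPTR_EL1 vcpu register index are hypothetical names, not existing KVM code.

/*
 * Hypothetical sketch: force a writeback of KVM's intermediate SPE
 * buffer into guest memory whenever the guest reads PMBPTR_EL1, so a
 * guest that polls the write pointer (and issues barriers) sees
 * up-to-date records without ever stopping SPE.
 */
static bool access_pmbptr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
			  const struct sys_reg_desc *r)
{
	if (p->is_write) {
		__vcpu_sys_reg(vcpu, PMBPTR_EL1) = p->regval;
		return true;
	}

	/* Copy any records buffered by KVM out to the guest's buffer. */
	kvm_spe_flush_buffer_to_guest(vcpu);

	p->regval = __vcpu_sys_reg(vcpu, PMBPTR_EL1);
	return true;
}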

> > 2. When KVM profiles the guest, the KVM buffer owning exception level must
> > necessarily be EL2. This means that while profiling is happening,
> > PMBIDR_EL1.P = 1 (programming of the buffer is not allowed). PMBIDR_EL1
> > cannot be trapped without FEAT_FGT, so a guest that reads the register
> > after profiling becomes enabled will read the P bit as 1. I cannot think of
> > any valid reason for a guest to look at the bit after enabling profiling.
> > With FEAT_FGT, KVM would be able to trap accesses to the register.

This is unfortunate. :/

I agree it's unlikely that a guest would look at this, but I could imagine some
OSs doing this as a sanity-check, since they never expect this to change, and
if it suddenly becomes 1 they might treat this as an error.

Can we require FGT for guest SPE usage?

> > 3. In the worst case scenario, when the entire VM memory is mapped in the
> > host, this approach consumes more memory because the memory for the buffer
> > is separate from the memory allocated to the VM. On the plus side, there
> > will always be less memory pinned in the host for the VM process, since
> > only the buffer has to be pinned, instead of the buffer plus the guest's
> > stage 1 translation tables (to avoid SPE encountering a stage 2 fault on a
> > stage 1 translation table walk). Could be mitigated by providing an ioctl
> > to userspace to set the maximum size for the buffer.

It's a shame we don't have a mechanism to raise an interrupt prior to the SPE
buffer becoming full, or we could force a writeback each time we hit a
watermark.

I suspect having a maximum size set ahead of time (and pre-allocating the
buffer?) is the right thing to do. As long as it's set to a reasonably large
value we can treat filling the buffer as a collision.

> > I prefer this new ap

Re: [PATCH v3 7/8] perf: Add perf_event_attr::config3

2022-12-06 Thread Mark Rutland
Peter, it looks like this series is blocked on the below now; what would you
prefer out of:

(a) Take this as is, and look at adding additional validation on top.

(b) Add some flag to indicate a PMU driver supports config3, and have the core
code check that, but leave the existing fields as-is for now (and hopefully
follow up with further validation later for the existing fields).

(c) Go audit all the existing drivers, add flags to indicate support for
existing fields, and have the core code check that. Atop that, add support
for config3 with the same sort of flag check.

I suspect that'd end up needing to go check more than config1/config2 given
all the filter controls and so on that drivers aren't great at checking,
and that might be fairly invasive.

(d) Something else?

I think we want to get to a point where drivers indicate what they actually
support and the core code rejects stuff drivers don't support or recognise, but
I think it'd be a little unreasonable to delay this series on cleaning up all
the existing issues.

I'm tempted to say (b) as that shouldn't introduce any regressions, should be a
relatively simple change to this series, and doesn't preclude making the rest
stricter as a follow-up. I'm happy to take a look at that (and IIUC Rob is
too).
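
As a rough sketch of what (b) could look like in the core code -- the flag name
and helper below are invented for illustration, not an existing perf API:

/*
 * Hypothetical capability flag: a PMU driver sets this in pmu->capabilities
 * to declare that it actually interprets perf_event_attr::config3.
 * (Name and value are illustrative only.)
 */
#define PERF_PMU_CAP_CONFIG3		0x0400

static int perf_validate_config3(struct pmu *pmu, struct perf_event_attr *attr)
{
	/* Reject a non-zero config3 for PMUs that never declared support. */
	if (attr->config3 && !(pmu->capabilities & PERF_PMU_CAP_CONFIG3))
		return -EINVAL;

	return 0;
}

Equivalent flags for config1/config2 could then be added as a follow-up without
blocking this series.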

What's your preference?

Thanks,
Mark.

On Mon, Nov 28, 2022 at 11:15:21AM -0600, Rob Herring wrote:
> On Mon, Nov 28, 2022 at 10:36 AM Alexander Shishkin
>  wrote:
> >
> > Rob Herring  writes:
> >
> > > On Fri, Nov 18, 2022 at 10:49 AM Will Deacon  wrote:
> > >>
> > >> On Fri, Nov 04, 2022 at 10:55:07AM -0500, Rob Herring wrote:
> > >> > @@ -515,6 +516,8 @@ struct perf_event_attr {
> > >> >* truncated accordingly on 32 bit architectures.
> > >> >*/
> > >> >   __u64   sig_data;
> > >> > +
> > >> > + __u64   config3; /* extension of config2 */
> > >>
> > >> I need an ack from the perf core maintainers before I can take this.
> > >
> > > Peter, Arnaldo, Ingo,
> > >
> > > Can I get an ack on this please.
> >
> > It appears that PMUs that don't use config{1,2} and now config3 allow
> > them to be whatever without any validation, whereas in reality we should
> > probably -EINVAL in those cases. Should something be done about that?
> 
> Always the 3rd occurrence that gets to clean-up things. ;)
> 
> I think we'd have to add some capability flags for PMU drivers to set
> to enable configN usage and then use those to validate configN is 0.
> Wouldn't be too hard to do for config3 as we know there's exactly 1
> user, but for 1,2 there's about 80 PMU drivers to check.
> 
> Rob
> 


Re: [REPOST][URGENT] kvmarm mailing list migration

2022-10-14 Thread Mark Rutland
On Thu, Oct 13, 2022 at 04:09:20PM +0100, Marc Zyngier wrote:
> [Reposting this, as it has been almost two weeks since the initial
>  announcement and we're still at sub-10% of the users having
>  subscribed to the new list]

FWIW, I didn't subscribe until just now because there weren't clear
instructions on the linked page. For everyone else's benefit, to subscribe you
need to send a mail to:

  kvmarm+subscr...@lists.linux.dev

... with any subject and body. You'll then get a confirmation email that you
need to reply to.

Thanks,
Mark.

>  
> Hi all,
> 
> As you probably all know, the kvmarm mailing has been hosted on
> Columbia's machines for as long as the project existed (over 13
> years). After all this time, the university has decided to retire the
> list infrastructure and asked us to find a new hosting.
> 
> A new mailing list has been created on lists.linux.dev[1], and I'm
> kindly asking everyone interested in following the KVM/arm64
> developments to start subscribing to it (and start posting your
> patches there). I hope that people will move over to it quickly enough
> that we can soon give Columbia the green light to turn their systems
> off.
> 
> Note that the new list will only get archived automatically once we
> fully switch over, but I'll make sure we fill any gap and not lose any
> message. In the meantime, please Cc both lists.
> 
> I would like to thank Columbia University for their long lasting
> support and willingness to help during this transition, as well as
> Konstantin (and the kernel.org crew) for quickly stepping up to the
> challenge and giving us a new home!
> 
> Thanks,
> 
>   M.
> 
> [1] https://subspace.kernel.org/lists.linux.dev.html
> 
> -- 
> Without deviation from the norm, progress is not possible.


Re: [PATCH v6 7/8] KVM: arm64: Unwind and dump nVHE HYP stacktrace

2022-04-21 Thread Mark Rutland
On Tue, Apr 19, 2022 at 10:37:56AM -0700, Kalesh Singh wrote:
> On Wed, Apr 13, 2022 at 6:59 AM Mark Rutland  wrote:
> > I'm fine with the concept of splitting the unwind and logging steps; this is
> > akin to doing:
> >
> > stack_trace_save_tsk(...);
> > ...
> > stack_trace_print(...);
> >
> > ... and I'm fine with having a stack_trace_save_hyp(...) variant.
> >
> > However, I would like to ensure that we're reusing logic rather than
> > duplicating it wholesale.
> 
> Agreed. Although some reimplementation may be unavoidable, as we can't
> safely link against kernel code from the protected KVM hypervisor.

Sure; I just mean that we have one implementation, even if that gets recompiled
in separate objects for different contexts.

> Perhaps we can move some of the common logic to a shared header that
> can be included in both places (host, hyp), WDYT?

My rough thinking was that we'd build the same stacktrace.c file (reworked from
the current one) as stacktrace.o and stacktrace.nvhe.o, but moving things
around into headers is also an option. Either way will need some
experimentation.

Thanks,
Mark.


Re: [PATCH v6 7/8] KVM: arm64: Unwind and dump nVHE HYP stacktrace

2022-04-13 Thread Mark Rutland
Hi Kalesh,

Sorry for the radio silence.

I see that in v7 you've dropped the stacktrace bits for now; I'm just
commenting here for future reference.

On Thu, Mar 31, 2022 at 12:22:05PM -0700, Kalesh Singh wrote:
> Hi everyone,
> 
> There has been expressed interest in having hypervisor stack unwinding
> in production Android builds.
> 
> The current design targets NVHE_EL2_DEBUG enabled builds and is not
> suitable for production environments, since this config disables host
> stage-2 protection on hyp_panic() which breaks security guarantees.
> The benefit of this approach is that the stack unwinding can happen at
> EL1 and allows us to reuse most of the unwinding logic from the host
> kernel unwinder.
> 
> Proposal for how this can be done without disabling host stage-2 protection:
>   - The host allocates a "panic_info" page and shares it with the hypervisor.
>   - On hyp_panic(), the hypervisor can unwind and dump its stack
> addresses to the shared page.
>   - The host can read out this information and symbolize these addresses.
> 
> This would allow for getting hyp stack traces in production while
> preserving the security model. The downside being that the core
> unwinding logic would be duplicated at EL2.
> 
> Are there any objections to making this change?

I'm fine with the concept of splitting the unwind and logging steps; this is
akin to doing:

stack_trace_save_tsk(...);
...
stack_trace_print(...);

... and I'm fine with having a stack_trace_save_hyp(...) variant.

However, I would like to ensure that we're reusing logic rather than
duplicating it wholesale. There are some changes I would like to make to the
stacktrace code in the near future that might make that a bit easier, e.g.
reworking the stack transition checks to be table-driven, and factoring out the
way we handle return trampolines.
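
To illustrate the table-driven idea (names invented, only a sketch of the
shape, not the eventual implementation): each stack the unwinder accepts is
described by one table entry, so the host and hyp unwinders could share the
same walking logic and differ only in the table they pass in.

struct stack_range {
	unsigned long low;	/* inclusive */
	unsigned long high;	/* exclusive */
};

static bool on_accepted_stack(unsigned long sp,
			      const struct stack_range *table,
			      unsigned int nr_ranges)
{
	unsigned int i;

	/* Accept sp if it falls within any of the known stack ranges. */
	for (i = 0; i < nr_ranges; i++) {
		if (sp >= table[i].low && sp < table[i].high)
			return true;
	}

	return false;
}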

I'll Cc you on changes to the stacktrace code. There are some preparatory
cleanups I'd like to get out of the way first which I'll send shortly.

Thanks,
Mark.


Re: [PATCH] KVM: arm64: Only open the interrupt window on exit due to an interrupt

2022-03-04 Thread Mark Rutland
On Fri, Mar 04, 2022 at 01:59:14PM +, Marc Zyngier wrote:
> Now that we properly account for interrupts taken whilst the guest
> was running, it becomes obvious that there is no need to open
> this accounting window if we didn't exit because of an interrupt.
> 
> This saves a number of system register accesses and other barriers
> if we exited for any other reason (such as a trap, for example).
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/kvm/arm.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index fefd5774ab55..f49ebdd9c990 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -887,9 +887,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>* context synchronization event) is necessary to ensure that
>* pending interrupts are taken.
>*/
> - local_irq_enable();
> - isb();
> - local_irq_disable();
> + if (ARM_EXCEPTION_CODE(ret) == ARM_EXCEPTION_IRQ) {
> + local_irq_enable();
> + isb();
> + local_irq_disable();
> + }
>  
>   guest_timing_exit_irqoff();
>  
> -- 
> 2.34.1
> 


Re: [PATCH v2 4/9] KVM: arm64: Add guard pages for pKVM (protected nVHE) hypervisor stack

2022-02-22 Thread Mark Rutland
On Tue, Feb 22, 2022 at 08:51:05AM -0800, Kalesh Singh wrote:
> Maps the stack pages in the flexible private VA range and allocates
> guard pages below the stack as unbacked VA space. The stack is aligned
> to twice its size to aid overflow detection (implemented in a subsequent
> patch in the series).
> 
> Signed-off-by: Kalesh Singh 
> ---
>  arch/arm64/kvm/hyp/nvhe/setup.c | 25 +
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 27af337f9fea..69df21320b09 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -105,11 +105,28 @@ static int recreate_hyp_mappings(phys_addr_t phys, 
> unsigned long size,
>   if (ret)
>   return ret;
>  
> - end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va;
> + /*
> +  * Private mappings are allocated upwards from __io_map_base
> +  * so allocate the guard page first then the stack.
> +  */
> + start = (void *)pkvm_alloc_private_va_range(PAGE_SIZE, 
> PAGE_SIZE);
> + if (IS_ERR_OR_NULL(start))
> + return PTR_ERR(start);

As on a prior patch, this usage of PTR_ERR() pattern is wrong when the
ptr is NULL.

> + /*
> +  * The stack is aligned to twice its size to facilitate overflow
> +  * detection.
> +  */
> + end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_pa;
>   start = end - PAGE_SIZE;
> - ret = pkvm_create_mappings(start, end, PAGE_HYP);
> - if (ret)
> - return ret;
> + start = (void 
> *)__pkvm_create_private_mapping((phys_addr_t)start,
> + PAGE_SIZE, PAGE_SIZE * 2, PAGE_HYP);
> + if (IS_ERR_OR_NULL(start))
> + return PTR_ERR(start);

Likewise.

Thanks,
Mark.

> + end = start + PAGE_SIZE;
> +
> + /* Update stack_hyp_va to end of the stack's private VA range */
> + per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va = (unsigned 
> long) end;
>   }
>  
>   /*
> -- 
> 2.35.1.473.g83b2b277ed-goog
> 


Re: [PATCH v2 1/9] KVM: arm64: Introduce hyp_alloc_private_va_range()

2022-02-22 Thread Mark Rutland
On Tue, Feb 22, 2022 at 08:51:02AM -0800, Kalesh Singh wrote:
> hyp_alloc_private_va_range() can be used to reserve private VA ranges
> in the nVHE hypervisor. Also update  __create_hyp_private_mapping()
> to allow specifying an alignment for the private VA mapping.
> 
> These will be used to implement stack guard pages for KVM nVHE hypervisor
> (nVHE Hyp mode / not pKVM), in a subsequent patch in the series.
> 
> Signed-off-by: Kalesh Singh 
> ---
>  arch/arm64/include/asm/kvm_mmu.h |  4 +++
>  arch/arm64/kvm/mmu.c | 61 +---
>  2 files changed, 44 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
> b/arch/arm64/include/asm/kvm_mmu.h
> index 81839e9a8a24..0b0c71302b92 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -153,6 +153,10 @@ static __always_inline unsigned long 
> __kern_hyp_va(unsigned long v)
>  int kvm_share_hyp(void *from, void *to);
>  void kvm_unshare_hyp(void *from, void *to);
>  int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> +unsigned long hyp_alloc_private_va_range(size_t size, size_t align);
> +int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> + size_t align, unsigned long *haddr,
> + enum kvm_pgtable_prot prot);
>  int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>  void __iomem **kaddr,
>  void __iomem **haddr);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index bc2aba953299..e5abcce44ad0 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -457,22 +457,16 @@ int create_hyp_mappings(void *from, void *to, enum 
> kvm_pgtable_prot prot)
>   return 0;
>  }
>  
> -static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> - unsigned long *haddr,
> - enum kvm_pgtable_prot prot)
> +
> +/*
> + * Allocates a private VA range below io_map_base.
> + *
> + * @size:The size of the VA range to reserve.
> + * @align:   The required alignment for the allocation.
> + */
> +unsigned long hyp_alloc_private_va_range(size_t size, size_t align)
>  {
>   unsigned long base;
> - int ret = 0;
> -
> - if (!kvm_host_owns_hyp_mappings()) {
> - base = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> -  phys_addr, size, prot);
> - if (IS_ERR_OR_NULL((void *)base))
> - return PTR_ERR((void *)base);

There is a latent bug here; PTR_ERR() is not valid for NULL.

Today on arm64 that will happen to return 0, which may or may not be
what you want, but it's a bad pattern regardless.

That applies to the two copies below that this has been transformed
into.
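
One way to restructure the check so that NULL and error pointers are handled
explicitly, rather than relying on PTR_ERR(NULL) (a sketch against the hunk
quoted above):

	addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
				 phys_addr, size, prot);
	if (!addr)
		return -ENOMEM;	/* don't let PTR_ERR(NULL) read as success */
	if (IS_ERR((void *)addr))
		return PTR_ERR((void *)addr);

	*haddr = addr;
	return 0;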

Thanks,
Mark

> - *haddr = base;
> -
> - return 0;
> - }
>  
>   mutex_lock(&kvm_hyp_pgd_mutex);
>  
> @@ -484,8 +478,8 @@ static int __create_hyp_private_mapping(phys_addr_t 
> phys_addr, size_t size,
>*
>* The allocated size is always a multiple of PAGE_SIZE.
>*/
> - size = PAGE_ALIGN(size + offset_in_page(phys_addr));
> - base = io_map_base - size;
> + base = io_map_base - PAGE_ALIGN(size);
> + base = ALIGN_DOWN(base, align);
>  
>   /*
>* Verify that BIT(VA_BITS - 1) hasn't been flipped by
> @@ -493,20 +487,45 @@ static int __create_hyp_private_mapping(phys_addr_t 
> phys_addr, size_t size,
>* overflowed the idmap/IO address range.
>*/
>   if ((base ^ io_map_base) & BIT(VA_BITS - 1))
> - ret = -ENOMEM;
> + base = (unsigned long)ERR_PTR(-ENOMEM);
>   else
>   io_map_base = base;
>  
>   mutex_unlock(&kvm_hyp_pgd_mutex);
>  
> + return base;
> +}
> +
> +int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> + size_t align, unsigned long *haddr,
> + enum kvm_pgtable_prot prot)
> +{
> + unsigned long addr;
> + int ret = 0;
> +
> + if (!kvm_host_owns_hyp_mappings()) {
> + addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> +  phys_addr, size, prot);
> + if (IS_ERR_OR_NULL((void *)addr))
> + return PTR_ERR((void *)addr);
> + *haddr = addr;
> +
> + return 0;
> + }
> +
> + size += offset_in_page(phys_addr);
> + addr = hyp_alloc_private_va_range(size, align);
> + if (IS_ERR_OR_NULL((void *)addr))
> + return PTR_ERR((void *)addr);
> +
>   if (ret)
>   goto out;
>  
> - ret = __create_hyp_mappings(base, size, phys_addr, prot);
> + ret = __create_hyp_mappings(addr, size, phys_addr, prot);
>   if (ret)
>   goto out;
>  
> - *haddr = base + offset_in_page(phys_addr

Re: [PATCH v2 5/9] arm64: asm: Introduce test_sp_overflow macro

2022-02-22 Thread Mark Rutland
On Tue, Feb 22, 2022 at 08:51:06AM -0800, Kalesh Singh wrote:
> From: Quentin Perret 
> 
> The asm entry code in the kernel uses a trick to check if VMAP'd stacks
> have overflowed by aligning them at THREAD_SHIFT * 2 granularity and
> checking the SP's THREAD_SHIFT bit.
> 
> Protected KVM will soon make use of a similar trick to detect stack
> overflows, so factor out the asm code in a re-usable macro.
> 
> Signed-off-by: Quentin Perret 
> [Kalesh - Resolve minor conflicts]
> Signed-off-by: Kalesh Singh 
> ---
>  arch/arm64/include/asm/assembler.h | 11 +++
>  arch/arm64/kernel/entry.S  |  7 +--
>  2 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/assembler.h 
> b/arch/arm64/include/asm/assembler.h
> index e8bd0af0141c..ad40eb0eee83 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -850,4 +850,15 @@ alternative_endif
>  
>  #endif /* GNU_PROPERTY_AARCH64_FEATURE_1_DEFAULT */
>  
> +/*
> + * Test whether the SP has overflowed, without corrupting a GPR.
> + */
> +.macro test_sp_overflow shift, label
> + add sp, sp, x0  // sp' = sp + x0
> + sub x0, sp, x0  // x0' = sp' - x0 = (sp + x0) - 
> x0 = sp
> + tbnzx0, #\shift, \label
> + sub x0, sp, x0  // x0'' = sp' - x0' = (sp + x0) 
> - sp = x0
> + sub sp, sp, x0  // sp'' = sp' - x0 = (sp + x0) 
> - x0 = sp
> +.endm

I'm a little unhappy about factoring this out, since it's not really
self-contained and leaves sp and x0 partially-swapped when it branches
to the label. You can't really make that clear with comments on the
macro, and you need comments at each use-site, so I'd rather we just
open-coded a copy of this.

> +
>  #endif   /* __ASM_ASSEMBLER_H */
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 772ec2ecf488..ce99ee30c77e 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -53,15 +53,10 @@ alternative_else_nop_endif
>   sub sp, sp, #PT_REGS_SIZE
>  #ifdef CONFIG_VMAP_STACK
>   /*
> -  * Test whether the SP has overflowed, without corrupting a GPR.
>* Task and IRQ stacks are aligned so that SP & (1 << THREAD_SHIFT)
>* should always be zero.
>*/
> - add sp, sp, x0  // sp' = sp + x0
> - sub x0, sp, x0  // x0' = sp' - x0 = (sp + x0) - 
> x0 = sp
> - tbnzx0, #THREAD_SHIFT, 0f
> - sub x0, sp, x0  // x0'' = sp' - x0' = (sp + x0) 
> - sp = x0
> - sub sp, sp, x0  // sp'' = sp' - x0 = (sp + x0) 
> - x0 = sp
> + test_sp_overflow THREAD_SHIFT, 0f
>   b   el\el\ht\()_\regsize\()_\label
>  
>  0:

Further to my comment above, immediately after this we have:

/* Stash the original SP (minus PT_REGS_SIZE) in tpidr_el0. */
msr tpidr_el0, x0

/* Recover the original x0 value and stash it in tpidrro_el0 */
sub x0, sp, x0
msr tpidrro_el0, x0

... which is really surprising with the `test_sp_overflow` macro because
it's not clear that it modifies x0 and sp in this way.

Thanks,
Mark.
... 

> -- 
> 2.35.1.473.g83b2b277ed-goog
> 


Re: Possible nohz-full/RCU issue in arm64 KVM

2022-01-11 Thread Mark Rutland
On Tue, Jan 11, 2022 at 12:32:38PM +0100, Nicolas Saenz Julienne wrote:
> Hi Mark,
> 
> On Tue, 2022-01-04 at 16:39 +0000, Mark Rutland wrote:
> > On Fri, Dec 17, 2021 at 04:54:22PM +0100, Paolo Bonzini wrote:
> > > On 12/17/21 15:38, Mark Rutland wrote:
> > > > For example kvm_guest_enter_irqoff() calls guest_enter_irq_off() which 
> > > > calls
> > > > vtime_account_guest_enter(), but kvm_guest_exit_irqoff() doesn't call
> > > > guest_exit_irq_off() and the call to vtime_account_guest_exit() is 
> > > > open-coded
> > > > elsewhere. Also, guest_enter_irq_off() conditionally calls
> > > > rcu_virt_note_context_switch(), but I can't immediately spot anything 
> > > > on the
> > > > exit side that corresponded with that, which looks suspicious.
> > > 
> > > rcu_note_context_switch() is a point-in-time notification; it's not 
> > > strictly
> > > necessary, but it may improve performance a bit by avoiding unnecessary 
> > > IPIs
> > > from the RCU subsystem.
> > > 
> > > There's no benefit from doing it when you're back from the guest, because 
> > > at
> > > that point the CPU is just running normal kernel code.
> > 
> > I see.
> > 
> > My main issue here was just that it's really difficult to see how the
> > entry/exit logic is balanced, and I reckon we can solve that by splitting
> > guest_{enter,exit}_irqoff() into helper functions to handle the vtime
> > accounting separately from the context tracking, so that arch code can do
> > something like:
> > 
> >   guest_timing_enter_irqoff();
> >   
> >   guest_eqs_enter_irqoff();
> >   < actually run vCPU here >
> >   guest_eqs_exit_irqoff();
> >   
> >   < handle pending IRQs here >
> >   
> >   guest_timing_exit_irqoff();
> > 
> > ... which I hope should work for RISC-V too.
> > 
> > I've had a go, and I've pushed out a WIP to:
> > 
> >   
> > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/kvm/rcu
> 
> Had a look at the patches and they seeem OK to me.
> 
> Thanks!

Cool.

FWIW I have an updated version at:

  
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=kvm/entry-rework

... which is largely the same approach, but the helpers got renamed, the
lockdep/tracing bits got fixed, and I've aligned mips, riscv, and x86 on the
same approach.

Once I get a free hour or so I intend to rebase that atop v5.16 and post that
out. I'll start a new thread with that, and rope in the relevant arch
maintainers (since e.g. I'm not sure what to do for ppc and s390).

Thanks,
Mark.

> 
> -- 
> Nicolás Sáenz
> 


Re: Possible nohz-full/RCU issue in arm64 KVM

2022-01-04 Thread Mark Rutland
On Fri, Dec 17, 2021 at 04:54:22PM +0100, Paolo Bonzini wrote:
> On 12/17/21 15:38, Mark Rutland wrote:
> > For example kvm_guest_enter_irqoff() calls guest_enter_irq_off() which calls
> > vtime_account_guest_enter(), but kvm_guest_exit_irqoff() doesn't call
> > guest_exit_irq_off() and the call to vtime_account_guest_exit() is 
> > open-coded
> > elsewhere. Also, guest_enter_irq_off() conditionally calls
> > rcu_virt_note_context_switch(), but I can't immediately spot anything on the
> > exit side that corresponded with that, which looks suspicious.
> 
> rcu_note_context_switch() is a point-in-time notification; it's not strictly
> necessary, but it may improve performance a bit by avoiding unnecessary IPIs
> from the RCU subsystem.
> 
> There's no benefit from doing it when you're back from the guest, because at
> that point the CPU is just running normal kernel code.

I see.

My main issue here was just that it's really difficult to see how the
entry/exit logic is balanced, and I reckon we can solve that by splitting
guest_{enter,exit}_irqoff() into helper functions to handle the vtime
accounting separately from the context tracking, so that arch code can do
something like:

  guest_timing_enter_irqoff();
  
  guest_eqs_enter_irqoff();
  < actually run vCPU here >
  guest_eqs_exit_irqoff();
  
  < handle pending IRQs here >
  
  guest_timing_exit_irqoff();

... which I hope should work for RISC-V too.
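
Roughly, the arch run loop would then look something like the below, using the
placeholder names from the sketch above (the eventual helpers may end up named
differently):

/*
 * Rough sketch: the timing/vtime section brackets the whole run, while
 * the RCU EQS (context tracking) section covers only the time actually
 * spent in the guest, so IRQs handled after the exit are visible to RCU.
 */
static int vcpu_run_once(struct kvm_vcpu *vcpu)
{
	int ret;

	local_irq_disable();

	guest_timing_enter_irqoff();		/* start guest time accounting */

	guest_eqs_enter_irqoff();		/* enter RCU EQS */
	ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
	guest_eqs_exit_irqoff();		/* exit RCU EQS */

	/* Handle pending IRQs so their ticks are accounted to the guest. */
	local_irq_enable();
	isb();
	local_irq_disable();

	guest_timing_exit_irqoff();		/* stop guest time accounting */

	local_irq_enable();

	return ret;
}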

I've had a go, and I've pushed out a WIP to:

  
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/kvm/rcu

I also see we'll need to add some lockdep/irq-tracing management to arm64, and
it probably makes sense to fold that into common helpers, so I'll have a play
with that tomorrow.

Thanks,
Mark.


Re: Possible nohz-full/RCU issue in arm64 KVM

2022-01-04 Thread Mark Rutland
On Mon, Dec 20, 2021 at 05:10:14PM +0100, Frederic Weisbecker wrote:
> On Fri, Dec 17, 2021 at 01:21:39PM +0000, Mark Rutland wrote:
> > On Fri, Dec 17, 2021 at 12:51:57PM +0100, Nicolas Saenz Julienne wrote:
> > > Hi All,
> > 
> > Hi,
> > 
> > > arm64's guest entry code does the following:
> > > 
> > > int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> > > {
> > >   [...]
> > > 
> > >   guest_enter_irqoff();
> > > 
> > >   ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
> > > 
> > >   [...]
> > > 
> > >   local_irq_enable();
> > > 
> > >   /*
> > >* We do local_irq_enable() before calling guest_exit() so
> > >* that if a timer interrupt hits while running the guest we
> > >* account that tick as being spent in the guest.  We enable
> > >* preemption after calling guest_exit() so that if we get
> > >* preempted we make sure ticks after that is not counted as
> > >* guest time.
> > >*/
> > >   guest_exit();
> > >   [...]
> > > }
> > > 
> > > 
> > > On a nohz-full CPU, guest_{enter,exit}() delimit an RCU extended quiescent
> > > state (EQS). Any interrupt happening between local_irq_enable() and
> > > guest_exit() should disable that EQS. Now, AFAICT all el0 interrupt 
> > > handlers
> > > do the right thing if triggered in this context, but el1's won't. Is it
> > > possible to hit an el1 handler (for example __el1_irq()) there?
> > 
> > I think you're right that the EL1 handlers can trigger here and won't exit 
> > the
> > EQS.
> > 
> > I'm not immediately sure what we *should* do here. What does x86 do for an 
> > IRQ
> > taken from a guest mode? I couldn't spot any handling of that case, but I'm 
> > not
> > familiar enough with the x86 exception model to know if I'm looking in the
> > right place.
> 
> This is one of the purposes of rcu_irq_enter(). el1 handlers don't call 
> irq_enter()?

Due to lockdep/tracing/etc ordering, we don't use irq_enter() directly and
instead call rcu_irq_enter() and irq_enter_rcu() separately. Critically we only
call rcu_irq_enter() for IRQs taken from the idle thread, as this was
previously thought to be the only place where we could take an IRQ from an EL1
EQS.

See __el1_irq(), __enter_from_kernel_mode(), and __exit_to_kernel_mode() in
arch/arm64/kernel/entry-common.c. The latter two are largely analogous to the
common irqentry_enter() and irqentry_exit() helpers in kernel/entry/common.c.
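
For reference, that logic is roughly of the following shape (a simplified
sketch, not the exact entry-common.c code): only IRQs taken from the idle
thread get the rcu_irq_enter() treatment, so an IRQ taken in the window after
__kvm_vcpu_run() returns, while the CPU is still in the guest's EQS, is not
recognised here.

static void enter_from_kernel_mode_sketch(struct pt_regs *regs)
{
	if (is_idle_task(current)) {
		/* Idle sits in an EQS: tell RCU we need it watching again. */
		lockdep_hardirqs_off(CALLER_ADDR0);
		rcu_irq_enter();
		trace_hardirqs_off_finish();
		return;
	}

	/* Ordinary kernel context is assumed *not* to be in an EQS. */
	lockdep_hardirqs_off(CALLER_ADDR0);
	rcu_irq_enter_check_tick();
	trace_hardirqs_off_finish();
}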

We need to either rework the KVM code or that entry code. I'll dig into this a
bit more...

Thanks,
Mark.


Re: Possible nohz-full/RCU issue in arm64 KVM

2021-12-17 Thread Mark Rutland
On Fri, Dec 17, 2021 at 03:15:29PM +0100, Nicolas Saenz Julienne wrote:
> On Fri, 2021-12-17 at 13:21 +0000, Mark Rutland wrote:
> > On Fri, Dec 17, 2021 at 12:51:57PM +0100, Nicolas Saenz Julienne wrote:
> > > Hi All,
> > 
> > Hi,
> > 
> > > arm64's guest entry code does the following:
> > > 
> > > int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> > > {
> > >   [...]
> > > 
> > >   guest_enter_irqoff();
> > > 
> > >   ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
> > > 
> > >   [...]
> > > 
> > >   local_irq_enable();
> > > 
> > >   /*
> > >* We do local_irq_enable() before calling guest_exit() so
> > >* that if a timer interrupt hits while running the guest we
> > >* account that tick as being spent in the guest.  We enable
> > >* preemption after calling guest_exit() so that if we get
> > >* preempted we make sure ticks after that is not counted as
> > >* guest time.
> > >*/
> > >   guest_exit();
> > >   [...]
> > > }
> > > 
> > > 
> > > On a nohz-full CPU, guest_{enter,exit}() delimit an RCU extended quiescent
> > > state (EQS). Any interrupt happening between local_irq_enable() and
> > > guest_exit() should disable that EQS. Now, AFAICT all el0 interrupt 
> > > handlers
> > > do the right thing if triggered in this context, but el1's won't. Is it
> > > possible to hit an el1 handler (for example __el1_irq()) there?
> > 
> > I think you're right that the EL1 handlers can trigger here and won't exit 
> > the
> > EQS.
> > 
> > I'm not immediately sure what we *should* do here. What does x86 do for an 
> > IRQ
> > taken from a guest mode? I couldn't spot any handling of that case, but I'm 
> > not
> > familiar enough with the x86 exception model to know if I'm looking in the
> > right place.
> 
> Well x86 has its own private KVM guest context exit function
> 'kvm_guest_exit_irqoff()', which allows it to do the right thing (simplifying
> things):
> 
>   local_irq_disable();
>   kvm_guest_enter_irqoff() // Inform CT, enter EQS
>   __vmx_kvm_run()
>   kvm_guest_exit_irqoff() // Inform CT, exit EQS, task still marked with 
> PF_VCPU
> 
>   /*
>* Consume any pending interrupts, including the possible source of
>* VM-Exit on SVM and any ticks that occur between VM-Exit and now.
>* An instruction is required after local_irq_enable() to fully unblock
>* interrupts on processors that implement an interrupt shadow, the
>* stat.exits increment will do nicely.
>*/
>   local_irq_enable();
>   ++vcpu->stat.exits;
>   local_irq_disable();
> 
>   /*
>* Wait until after servicing IRQs to account guest time so that any
>* ticks that occurred while running the guest are properly accounted
>* to the guest.  Waiting until IRQs are enabled degrades the accuracy
>* of accounting via context tracking, but the loss of accuracy is
>* acceptable for all known use cases.
>*/
>   vtime_account_guest_exit(); // current->flags &= ~PF_VCPU

I see.

The abstraction's really messy here on x86, and the enter/exit sides aren't
clearly balanced.

For example kvm_guest_enter_irqoff() calls guest_enter_irq_off() which calls
vtime_account_guest_enter(), but kvm_guest_exit_irqoff() doesn't call
guest_exit_irq_off() and the call to vtime_account_guest_exit() is open-coded
elsewhere. Also, guest_enter_irq_off() conditionally calls
rcu_virt_note_context_switch(), but I can't immediately spot anything on the
exit side that corresponded with that, which looks suspicious.

> So I guess we should convert to x86's scheme, and maybe create another generic
> guest_{enter,exit}() flavor for virtualization schemes that run with 
> interrupts
> disabled.

I think we might need to do some preparatory refactoring here so that this is
all clearly balanced even on x86, e.g. splitting the enter/exit steps into
multiple phases.

> > Note that the EL0 handlers *cannot* trigger for an exception taken from a
> > guest. We use separate vectors while running a guest (for both VHE and nVHE
> > modes), and from the main kernel's PoV we return from kvm_call_hyp_ret(). We
> > can only take IRQ from EL1 *after* that returns. We
> > 
> > We *might* need to audit the KVM vector handlers to make sure they're not
> > dependent on RCU protection (I assume they're not, but it's p

Re: Possible nohz-full/RCU issue in arm64 KVM

2021-12-17 Thread Mark Rutland
On Fri, Dec 17, 2021 at 12:51:57PM +0100, Nicolas Saenz Julienne wrote:
> Hi All,

Hi,

> arm64's guest entry code does the following:
> 
> int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> {
>   [...]
> 
>   guest_enter_irqoff();
> 
>   ret = kvm_call_hyp_ret(__kvm_vcpu_run, vcpu);
> 
>   [...]
> 
>   local_irq_enable();
> 
>   /*
>* We do local_irq_enable() before calling guest_exit() so
>* that if a timer interrupt hits while running the guest we
>* account that tick as being spent in the guest.  We enable
>* preemption after calling guest_exit() so that if we get
>* preempted we make sure ticks after that is not counted as
>* guest time.
>*/
>   guest_exit();
>   [...]
> }
> 
> 
> On a nohz-full CPU, guest_{enter,exit}() delimit an RCU extended quiescent
> state (EQS). Any interrupt happening between local_irq_enable() and
> guest_exit() should disable that EQS. Now, AFAICT all el0 interrupt handlers
> do the right thing if triggered in this context, but el1's won't. Is it
> possible to hit an el1 handler (for example __el1_irq()) there?

I think you're right that the EL1 handlers can trigger here and won't exit the
EQS.

I'm not immediately sure what we *should* do here. What does x86 do for an IRQ
taken from a guest mode? I couldn't spot any handling of that case, but I'm not
familiar enough with the x86 exception model to know if I'm looking in the
right place.

Note that the EL0 handlers *cannot* trigger for an exception taken from a
guest. We use separate vectors while running a guest (for both VHE and nVHE
modes), and from the main kernel's PoV we return from kvm_call_hyp_ret(). We
can only take IRQ from EL1 *after* that returns. We

We *might* need to audit the KVM vector handlers to make sure they're not
dependent on RCU protection (I assume they're not, but it's possible something
has leaked into the VHE code).

Thanks,
Mark.


Re: [PATCH v4 1/6] KVM: arm64: Correctly treat writes to OSLSR_EL1 as undefined

2021-12-15 Thread Mark Rutland
On Wed, Dec 15, 2021 at 01:09:28PM +, Oliver Upton wrote:
> Hi Mark,
> 
> On Wed, Dec 15, 2021 at 11:39:58AM +0000, Mark Rutland wrote:
> > Hi Oliver,
> > 
> > On Tue, Dec 14, 2021 at 05:28:07PM +, Oliver Upton wrote:
> > > Any valid implementation of the architecture should generate an
> > > undefined exception for writes to a read-only register, such as
> > > OSLSR_EL1. Nonetheless, the KVM handler actually implements write-ignore
> > > behavior.
> > > 
> > > Align the trap handler for OSLSR_EL1 with hardware behavior. If such a
> > > write ever traps to EL2, inject an undef into the guest and print a
> > > warning.
> > 
> > I think this can still be read ambiguously, since we don't explicitly state
> > that writes to OSLSR_EL1 should never trap (and the implications of being
> > UNDEFINED are subtle). How about:
> > 
> > | Writes to OSLSR_EL1 are UNDEFINED and should never trap from EL1 to EL2, 
> > but
> > | the KVM trap handler for OSLSR_EL1 handlees writes via ignore_write(). 
> > This

Whoops, with s/handlees/handles/

> > | is confusing to readers of the code, but shouldn't have any functional 
> > impact.
> > |
> > | For clarity, use write_to_read_only() rather than ignore_write(). If a 
> > trap
> > | is unexpectedly taken to EL2 in violation of the architecture, this will
> > | WARN_ONCE() and inject an undef into the guest.
> 
> Agreed, I like your suggested changelog better :-)

Cool!

Mark.

> 
> > With that:
> > 
> > Reviewed-by: Mark Rutland 
> 
> Thanks!
> 
> --
> Best,
> Oliver


Re: [PATCH v4 3/6] KVM: arm64: Allow guest to set the OSLK bit

2021-12-15 Thread Mark Rutland
On Tue, Dec 14, 2021 at 05:28:09PM +, Oliver Upton wrote:
> Allow writes to OSLAR and forward the OSLK bit to OSLSR. Do nothing with
> the value for now.
> 
> Reviewed-by: Reiji Watanabe 
> Signed-off-by: Oliver Upton 
> ---
>  arch/arm64/include/asm/sysreg.h |  9 
>  arch/arm64/kvm/sys_regs.c   | 39 ++---
>  2 files changed, 40 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 16b3f1a1d468..46f800bda045 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -129,7 +129,16 @@
>  #define SYS_DBGWCRn_EL1(n)   sys_reg(2, 0, 0, n, 7)
>  #define SYS_MDRAR_EL1sys_reg(2, 0, 1, 0, 0)
>  #define SYS_OSLAR_EL1sys_reg(2, 0, 1, 0, 4)
> +
> +#define SYS_OSLAR_OSLK   BIT(0)
> +
>  #define SYS_OSLSR_EL1sys_reg(2, 0, 1, 1, 4)
> +
> +#define SYS_OSLSR_OSLK   BIT(1)
> +
> +#define SYS_OSLSR_OSLM_MASK  (BIT(3) | BIT(0))
> +#define SYS_OSLSR_OSLM   BIT(3)

Since `OSLM` is the field as a whole, I think this should have another level of
hierarchy, e.g.

#define SYS_OSLSR_OSLM_MASK (BIT(3) | BIT(0))
#define SYS_OSLSR_OSLM_NI   0
#define SYS_OSLSR_OSLM_OSLK BIT(3)

[...]

> +static bool trap_oslar_el1(struct kvm_vcpu *vcpu,
> +struct sys_reg_params *p,
> +const struct sys_reg_desc *r)
> +{
> + u64 oslsr;
> +
> + if (!p->is_write)
> + return read_from_write_only(vcpu, p, r);
> +
> + /* Forward the OSLK bit to OSLSR */
> + oslsr = __vcpu_sys_reg(vcpu, OSLSR_EL1) & ~SYS_OSLSR_OSLK;
> + if (p->regval & SYS_OSLAR_OSLK)
> + oslsr |= SYS_OSLSR_OSLK;
> +
> + __vcpu_sys_reg(vcpu, OSLSR_EL1) = oslsr;
> + return true;
> +}

Does changing this affect existing userspace? Previously it could read
OSLAR_EL1 as 0, whereas now that should be rejected.

That might be fine, and if so, it would be good to call that out in the commit
message.

[...]

> @@ -309,9 +331,14 @@ static int set_oslsr_el1(struct kvm_vcpu *vcpu, const 
> struct sys_reg_desc *rd,
>   if (err)
>   return err;
>  
> - if (val != rd->val)
> + /*
> +  * The only modifiable bit is the OSLK bit. Refuse the write if
> +  * userspace attempts to change any other bit in the register.
> +  */
> + if ((val & ~SYS_OSLSR_OSLK) != SYS_OSLSR_OSLM)
>   return -EINVAL;

How about:

if ((val ^ rd->val) & ~SYS_OSLSR_OSLK)
return -EINVAL;

... so that we don't need to hard-code the expected value here, and can more
easily change it in future?

[...]

> @@ -1463,8 +1486,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>   DBG_BCR_BVR_WCR_WVR_EL1(15),
>  
>   { SYS_DESC(SYS_MDRAR_EL1), trap_raz_wi },
> - { SYS_DESC(SYS_OSLAR_EL1), trap_raz_wi },
> - { SYS_DESC(SYS_OSLSR_EL1), trap_oslsr_el1, reset_val, OSLSR_EL1, 
> 0x0008,
> + { SYS_DESC(SYS_OSLAR_EL1), trap_oslar_el1 },
> + { SYS_DESC(SYS_OSLSR_EL1), trap_oslsr_el1, reset_val, OSLSR_EL1, 
> SYS_OSLSR_OSLM,
>   .set_user = set_oslsr_el1, },
>   { SYS_DESC(SYS_OSDLR_EL1), trap_raz_wi },
>   { SYS_DESC(SYS_DBGPRCR_EL1), trap_raz_wi },
> @@ -1937,7 +1960,7 @@ static const struct sys_reg_desc cp14_regs[] = {
>  
>   DBGBXVR(0),
>   /* DBGOSLAR */
> - { Op1( 0), CRn( 1), CRm( 0), Op2( 4), trap_raz_wi },
> + { Op1( 0), CRn( 1), CRm( 0), Op2( 4), trap_oslar_el1 },

As above, I have a slight concern that this could adversely affect existing
userspace, but I can also believe that's fine.

Thanks,
Mark.


Re: [PATCH v4 2/6] KVM: arm64: Stash OSLSR_EL1 in the cpu context

2021-12-15 Thread Mark Rutland
On Tue, Dec 14, 2021 at 05:28:08PM +, Oliver Upton wrote:
> An upcoming change to KVM will context switch the OS Lock status between
> guest/host. Add OSLSR_EL1 to the cpu context and handle guest reads
> using the stored value.

The "context switch" wording is stale here, since later patches emulate the
behaviour of the OS lock (and explain why a context switch isn't appropriate).

That first sentence needs to change to something like:

| An upcoming change to KVM will emulate the OS Lock from the PoV of the guest.

> Wire up a custom handler for writes from userspace and prevent any of
> the invariant bits from changing.
> 
> Reviewed-by: Reiji Watanabe 
> Signed-off-by: Oliver Upton 
> ---
>  arch/arm64/include/asm/kvm_host.h |  2 ++
>  arch/arm64/kvm/sys_regs.c | 31 ---
>  2 files changed, 26 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 2a5f7f38006f..53fc8a6eaf1c 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -172,8 +172,10 @@ enum vcpu_sysreg {
>   PAR_EL1,/* Physical Address Register */
>   MDSCR_EL1,  /* Monitor Debug System Control Register */
>   MDCCINT_EL1,/* Monitor Debug Comms Channel Interrupt Enable Reg */
> + OSLSR_EL1,  /* OS Lock Status Register */
>   DISR_EL1,   /* Deferred Interrupt Status Register */
>  
> +

I don't think this whitespace needed to change.

>   /* Performance Monitors Registers */
>   PMCR_EL0,   /* Control Register */
>   PMSELR_EL0, /* Event Counter Selection Register */
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 11b4212c2036..7bf350b3d9cd 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -291,12 +291,28 @@ static bool trap_oslsr_el1(struct kvm_vcpu *vcpu,
>  struct sys_reg_params *p,
>  const struct sys_reg_desc *r)
>  {
> - if (p->is_write) {
> + if (p->is_write)
>   return write_to_read_only(vcpu, p, r);
> - } else {
> - p->regval = (1 << 3);
> - return true;
> - }
> +
> + p->regval = __vcpu_sys_reg(vcpu, r->reg);
> + return true;
> +}
> +
> +static int set_oslsr_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc 
> *rd,
> +  const struct kvm_one_reg *reg, void __user *uaddr)
> +{
> + u64 id = sys_reg_to_index(rd);
> + u64 val;
> + int err;
> +
> + err = reg_from_user(&val, uaddr, id);
> + if (err)
> + return err;
> +
> + if (val != rd->val)
> + return -EINVAL;

Bit 1 isn't invariant; why can't the user set that? If there's a rationale,
that needs to be stated in the commit message.

> +
> + return 0;
>  }
>  
>  static bool trap_dbgauthstatus_el1(struct kvm_vcpu *vcpu,
> @@ -1448,7 +1464,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  
>   { SYS_DESC(SYS_MDRAR_EL1), trap_raz_wi },
>   { SYS_DESC(SYS_OSLAR_EL1), trap_raz_wi },
> - { SYS_DESC(SYS_OSLSR_EL1), trap_oslsr_el1 },
> + { SYS_DESC(SYS_OSLSR_EL1), trap_oslsr_el1, reset_val, OSLSR_EL1, 
> 0x0008,

Could we add mnemonics for this to <asm/sysreg.h>, e.g.

#define OSLSR_EL1_OSLM_LOCK_NI  0
#define OSLSR_EL1_OSLM_LOCK_IMPLEMENTED BIT(3)

... and use that here for clarity?

Thanks,
Mark.

> + .set_user = set_oslsr_el1, },
>   { SYS_DESC(SYS_OSDLR_EL1), trap_raz_wi },
>   { SYS_DESC(SYS_DBGPRCR_EL1), trap_raz_wi },
>   { SYS_DESC(SYS_DBGCLAIMSET_EL1), trap_raz_wi },
> @@ -1923,7 +1940,7 @@ static const struct sys_reg_desc cp14_regs[] = {
>   { Op1( 0), CRn( 1), CRm( 0), Op2( 4), trap_raz_wi },
>   DBGBXVR(1),
>   /* DBGOSLSR */
> - { Op1( 0), CRn( 1), CRm( 1), Op2( 4), trap_oslsr_el1 },
> + { Op1( 0), CRn( 1), CRm( 1), Op2( 4), trap_oslsr_el1, NULL, OSLSR_EL1 },
>   DBGBXVR(2),
>   DBGBXVR(3),
>   /* DBGOSDLR */
> -- 
> 2.34.1.173.g76aa8bc2d0-goog
> 


Re: [PATCH v4 1/6] KVM: arm64: Correctly treat writes to OSLSR_EL1 as undefined

2021-12-15 Thread Mark Rutland
Hi Oliver,

On Tue, Dec 14, 2021 at 05:28:07PM +, Oliver Upton wrote:
> Any valid implementation of the architecture should generate an
> undefined exception for writes to a read-only register, such as
> OSLSR_EL1. Nonetheless, the KVM handler actually implements write-ignore
> behavior.
> 
> Align the trap handler for OSLSR_EL1 with hardware behavior. If such a
> write ever traps to EL2, inject an undef into the guest and print a
> warning.

I think this can still be read ambiguously, since we don't explicitly state
that writes to OSLSR_EL1 should never trap (and the implications of being
UNDEFINED are subtle). How about:

| Writes to OSLSR_EL1 are UNDEFINED and should never trap from EL1 to EL2, but
| the KVM trap handler for OSLSR_EL1 handlees writes via ignore_write(). This
| is confusing to readers of the code, but shouldn't have any functional impact.
|
| For clarity, use write_to_read_only() rather than ignore_write(). If a trap
| is unexpectedly taken to EL2 in violation of the architecture, this will
| WARN_ONCE() and inject an undef into the guest.

With that:

Reviewed-by: Mark Rutland 

Mark.

> Reviewed-by: Reiji Watanabe 
> Signed-off-by: Oliver Upton 
> ---
>  arch/arm64/kvm/sys_regs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index e3ec1a44f94d..11b4212c2036 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -292,7 +292,7 @@ static bool trap_oslsr_el1(struct kvm_vcpu *vcpu,
>  const struct sys_reg_desc *r)
>  {
>   if (p->is_write) {
> - return ignore_write(vcpu, p);
> + return write_to_read_only(vcpu, p, r);
>   } else {
>   p->regval = (1 << 3);
>   return true;
> -- 
> 2.34.1.173.g76aa8bc2d0-goog
> 


Re: [PATCH 1/2] ACPI/AEST: Initial AEST driver

2021-11-24 Thread Mark Rutland
Hi,

I haven't looked at this in great detail, but I spotted a few issues
from an initial scan.

On Wed, Nov 24, 2021 at 12:07:07PM -0500, Tyler Baicar wrote:
> Add support for parsing the ARM Error Source Table and basic handling of
> errors reported through both memory mapped and system register interfaces.
> 
> Assume system register interfaces are only registered with private
> peripheral interrupts (PPIs); otherwise there is no guarantee the
> core handling the error is the core which took the error and has the
> syndrome info in its system registers.

Can we actually assume that? What does the specification mandate?

> Add logging for all detected errors and trigger a kernel panic if there is
> any uncorrected error present.

Has this been tested on any hardware or software platform?

[...]

> +#define ERRDEVARCH_REV_SHIFT 0x16

IIUC this should be 16, not 0x16 (i.e. 22).

> +#define ERRDEVARCH_REV_MASK  0xf
> +
> +#define RAS_REV_v1_1 0x1
> +
> +struct ras_ext_regs {
> + u64 err_fr;
> + u64 err_ctlr;
> + u64 err_status;
> + u64 err_addr;
> + u64 err_misc0;
> + u64 err_misc1;
> + u64 err_misc2;
> + u64 err_misc3;
> +};

These last four might be better as an array.
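
i.e. something along these lines (sketch only):

struct ras_ext_regs {
	u64 err_fr;
	u64 err_ctlr;
	u64 err_status;
	u64 err_addr;
	u64 err_misc[4];	/* MISC0..MISC3 */
};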

[...]

> +static bool ras_extn_v1p1(void)
> +{
> + unsigned long fld, reg = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
> +
> + fld = cpuid_feature_extract_unsigned_field(reg, ID_AA64PFR0_RAS_SHIFT);
> +
> + return fld >= ID_AA64PFR0_RAS_V1P1;
> +}

I suspect it'd be better to pass this value around directly as
`version`, rather than dropping this into a `misc23_present` temporary
variable, as that would be a little clearer, and future-proof if/when
more registers get added.

[...]

> +void arch_arm_ras_report_error(u64 implemented, bool clear_misc)
> +{
> + struct ras_ext_regs regs = {0};
> + unsigned int i, cpu_num;
> + bool misc23_present;
> + bool fatal = false;
> + u64 num_records;
> +
> + if (!this_cpu_has_cap(ARM64_HAS_RAS_EXTN))
> + return;
> +
> + cpu_num = get_cpu();

Why get_cpu() here? Do you just need smp_processor_id()?

The commit message explained that this would be PE-local (e.g. in a PPI
handler), and we've already checked this_cpu_has_cap() which assumes
we're not preemptible.

So I don't see why we should use get_cpu() here -- any case where it would
make a difference implies something has already gone wrong.

> + num_records = read_sysreg_s(SYS_ERRIDR_EL1) & ERRIDR_NUM_MASK;
> +
> + for (i = 0; i < num_records; i++) {
> + if (!(implemented & BIT(i)))
> + continue;
> +
> + write_sysreg_s(i, SYS_ERRSELR_EL1);
> + isb();
> + regs.err_status = read_sysreg_s(SYS_ERXSTATUS_EL1);
> +
> + if (!(regs.err_status & ERR_STATUS_V))
> + continue;
> +
> + pr_err("error from processor 0x%x\n", cpu_num);

Why in hex? We normally print 'cpu%d' or 'CPU%d', since this is a
logical ID anyway.

> +
> + if (regs.err_status & ERR_STATUS_AV)
> + regs.err_addr = read_sysreg_s(SYS_ERXADDR_EL1);
> +
> + misc23_present = ras_extn_v1p1();

As above, I reckon it's better to have this as 'version' or
'ras_version', and have the checks below be:

	if (version >= ID_AA64PFR0_RAS_V1P1) {
		// poke SYS_ERXMISC2_EL1
		// poke SYS_ERXMISC3_EL1
	}

> +
> + if (regs.err_status & ERR_STATUS_MV) {
> + regs.err_misc0 = read_sysreg_s(SYS_ERXMISC0_EL1);
> + regs.err_misc1 = read_sysreg_s(SYS_ERXMISC1_EL1);
> +
> + if (misc23_present) {
> + regs.err_misc2 = 
> read_sysreg_s(SYS_ERXMISC2_EL1);
> + regs.err_misc3 = 
> read_sysreg_s(SYS_ERXMISC3_EL1);
> + }
> + }
> +
> + arch_arm_ras_print_error(®s, i, misc23_present);
> +
> + /*
> +  * In the future, we will treat UER conditions as potentially
> +  * recoverable.
> +  */
> + if (regs.err_status & ERR_STATUS_UE)
> + fatal = true;
> +
> + regs.err_status = 
> arch_arm_ras_get_status_clear_value(regs.err_status);
> + write_sysreg_s(regs.err_status, SYS_ERXSTATUS_EL1);
> +
> + if (clear_misc) {
> + write_sysreg_s(0x0, SYS_ERXMISC0_EL1);
> + write_sysreg_s(0x0, SYS_ERXMISC1_EL1);
> +
> + if (misc23_present) {
> + write_sysreg_s(0x0, SYS_ERXMISC2_EL1);
> + write_sysreg_s(0x0, SYS_ERXMISC3_EL1);
> + }
> + }

Any reason not to clear when we read, above? e.g.

#define READ_CLEAR_MISC(nr, clear)  \
({  \

Re: [PATCH 1/5] arm64: Prevent kexec and hibernation if is_protected_kvm_enabled()

2021-09-23 Thread Mark Rutland
On Thu, Sep 23, 2021 at 12:22:52PM +0100, Will Deacon wrote:
> When pKVM is enabled, the hypervisor code at EL2 and its data structures
> are inaccessible to the host kernel and cannot be torn down or replaced
> as this would defeat the integrity properties which pKVM aims to provide.
> Furthermore, the ABI between the host and EL2 is flexible and private to
> whatever the current implementation of KVM requires and so booting a new
> kernel with an old EL2 component is very likely to end in disaster.
> 
> In preparation for uninstalling the hyp stub calls which are relied upon
> to reset EL2, disable kexec and hibernation in the host when protected
> KVM is enabled.
> 
> Cc: Marc Zyngier 
> Cc: Quentin Perret 
> Signed-off-by: Will Deacon 
> ---
>  arch/arm64/kernel/smp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 6f6ff072acbd..44369b99a57e 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -1128,5 +1128,6 @@ bool cpus_are_stuck_in_kernel(void)
>  {
>   bool smp_spin_tables = (num_possible_cpus() > 1 && !have_cpu_die());
>  
> - return !!cpus_stuck_in_kernel || smp_spin_tables;
> + return !!cpus_stuck_in_kernel || smp_spin_tables ||
> + is_protected_kvm_enabled();
>  }

IIUC you'll also need to do something to prevent kdump, since even with
CPUs stuck in the kernel that will try to do a kexec on the crashed CPU
and __cpu_soft_restart() won't be able to return to EL2.

You could fiddle with the BUG_ON() in machine_kexec() to die in this
case too.
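
e.g. (untested sketch, assuming machine_kexec() keeps its current
in_kexec_crash/stuck_cpus locals):

	/* refuse both regular kexec and kdump when pKVM owns EL2 */
	BUG_ON(is_protected_kvm_enabled() ||
	       (!in_kexec_crash && (stuck_cpus || (num_online_cpus() > 1))));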

Thanks,
Mark.

> -- 
> 2.33.0.464.g1972c5931b-goog
> 


Re: Any way to disable KVM VHE extension?

2021-07-15 Thread Mark Rutland
On Thu, Jul 15, 2021 at 11:00:42AM +0100, Robin Murphy wrote:
> On 2021-07-15 10:44, Qu Wenruo wrote:
> > 
> > 
> > On 2021/7/15 下午5:28, Robin Murphy wrote:
> > > On 2021-07-15 09:55, Qu Wenruo wrote:
> > > > Hi,
> > > > 
> > > > Recently I'm playing around the Nvidia Xavier AGX board, which
> > > > has VHE extension support.
> > > > 
> > > > In theory, considering the CPU and memory, it should be pretty
> > > > powerful compared to boards like RPI CM4.
> > > > 
> > > > But to my surprise, KVM runs pretty poor on Xavier.
> > > > 
> > > > Just booting the edk2 firmware could take over 10s, and 20s to
> > > > fully boot the kernel.
> > > > Even my VM on RPI CM4 has way faster boot time, even just
> > > > running on PCIE2.0 x1 lane NVME, and just 4 2.1Ghz A72 core.
> > > > 
> > > > This is definitely out of my expectation, I double checked to be
> > > > sure that it's running in KVM mode.
> > > > 
> > > > But further digging shows that, since Xavier AGX CPU supports
> > > > VHE, kvm is running in VHE mode other than HYP mode on CM4.
> > > > 
> > > > Is there anyway to manually disable VHE mode to test the more
> > > > common HYP mode on Xavier?
> > > 
> > > According to kernel-parameters.txt, "kvm-arm.mode=nvhe" (or its
> > > low-level equivalent "id_aa64mmfr1.vh=0") on the command line should
> > > do that.
> > 
> > Thanks for this one, I stupidly only searched modinfo of kvm, and didn't
> > even bother to search arch/arm64/kvm...
> > 
> > > 
> > > However I'd imagine the discrepancy is likely to be something more
> > > fundamental to the wildly different microarchitectures. There's
> > > certainly no harm in giving non-VHE a go for comparison, but I
> > > wouldn't be surprised if it turns out even slower...
> > 
> > You're totally right, with nvhe mode, it's still the same slow speed.
> > 
> > BTW, what did you mean by the "wildly different microarch"?
> > Is ARMv8.2 arch that different from ARMv8 of RPI4?
> 
> I don't mean Armv8.x architectural features, I mean the actual
> implementation of NVIDIA's Carmel core is very, very different from
> Cortex-A72 or indeed our newer v8.2 Cortex-A designs.
> 
> > And any extra methods I could try to explore the reason of the slowness?
> 
> I guess the first check would be whether you're trapping and exiting the VM
> significantly more. I believe there are stats somewhere, but I don't know
> exactly where, sorry - I know very little about actually *using* KVM :)
> 
> If it's not that, then it might just be that EDK2 is doing a lot of cache
> maintenance or system register modification or some other operation that
> happens to be slower on Carmel compared to Cortex-A72.

It would also be worth checking that the CPUs are running at the speed you
expect, in case e.g. the lack of a DVFS driver means they're running
slow, and this just happens to be more noticeable in a VM.

You can estimate that by using `perf stat` on the host on a busy loop,
and looking at what the cpu-cycles count implies.

Thanks,
Mark.


Re: [PATCH V7 01/18] perf/core: Use static_call to optimize perf_guest_info_callbacks

2021-07-02 Thread Mark Rutland
On Fri, Jul 02, 2021 at 09:00:22AM -0700, Joe Perches wrote:
> On Fri, 2021-07-02 at 13:22 +0200, Peter Zijlstra wrote:
> > On Tue, Jun 22, 2021 at 05:42:49PM +0800, Zhu Lingshan wrote:
> > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> []
> > > @@ -90,6 +90,27 @@ DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_aliases, 
> > > *x86_pmu.pebs_aliases);
> > >   */
> > >  DEFINE_STATIC_CALL_RET0(x86_pmu_guest_get_msrs, *x86_pmu.guest_get_msrs);
> > >  
> > > 
> > > +DEFINE_STATIC_CALL_RET0(x86_guest_state, *(perf_guest_cbs->state));
> > > +DEFINE_STATIC_CALL_RET0(x86_guest_get_ip, *(perf_guest_cbs->get_ip));
> > > +DEFINE_STATIC_CALL_RET0(x86_guest_handle_intel_pt_intr, 
> > > *(perf_guest_cbs->handle_intel_pt_intr));
> > > +
> > > +void arch_perf_update_guest_cbs(void)
> > > +{
> > > + static_call_update(x86_guest_state, (void *)&__static_call_return0);
> > > + static_call_update(x86_guest_get_ip, (void *)&__static_call_return0);
> > > + static_call_update(x86_guest_handle_intel_pt_intr, (void 
> > > *)&__static_call_return0);
> > > +
> > > + if (perf_guest_cbs && perf_guest_cbs->state)
> > > + static_call_update(x86_guest_state, perf_guest_cbs->state);
> > > +
> > > + if (perf_guest_cbs && perf_guest_cbs->get_ip)
> > > + static_call_update(x86_guest_get_ip, perf_guest_cbs->get_ip);
> > > +
> > > + if (perf_guest_cbs && perf_guest_cbs->handle_intel_pt_intr)
> > > + static_call_update(x86_guest_handle_intel_pt_intr,
> > > +perf_guest_cbs->handle_intel_pt_intr);
> > > +}
> > 
> > Coding style wants { } on that last if().
> 
> That's just your personal preference.
> 
> The coding-style document doesn't require that.
> 
> It just says single statement.  It's not the number of
> vertical lines or characters required for the statement.
> 
> --
> 
> Do not unnecessarily use braces where a single statement will do.
> 
> .. code-block:: c
> 
>   if (condition)
>   action();
> 
> and
> 
> .. code-block:: none
> 
>   if (condition)
>   do_this();
>   else
>   do_that();
> 
> This does not apply if only one branch of a conditional statement is a single
> statement; in the latter case use braces in both branches:

Immediately after this, we say:

| Also, use braces when a loop contains more than a single simple statement:
|
| .. code-block:: c
| 
| while (condition) {
| if (test)
| do_something();
| }
| 

... and while that says "a loop", the principle is obviously supposed to
apply to conditionals too; structurally they're no different. We should
just fix the documentation to say "a loop or conditional", or something
to that effect.
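
i.e. for the hunk above, the braced form being asked for would be
(illustrative):

	if (perf_guest_cbs && perf_guest_cbs->handle_intel_pt_intr) {
		static_call_update(x86_guest_handle_intel_pt_intr,
				   perf_guest_cbs->handle_intel_pt_intr);
	}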

Mark.


Re: [RFC PATCH 4/4] KVM: arm64: Introduce KVM_CAP_ARM_PROTECTED_VM

2021-06-04 Thread Mark Rutland
On Thu, Jun 03, 2021 at 07:33:47PM +0100, Will Deacon wrote:
> Introduce a new VM capability, KVM_CAP_ARM_PROTECTED_VM, which can be
> used to isolate guest memory from the host. For now, the EL2 portion is
> missing, so this documents and exposes the user ABI for the host.
> 
> Signed-off-by: Will Deacon 
> ---
>  Documentation/virt/kvm/api.rst|  69 
>  arch/arm64/include/asm/kvm_host.h |  10 +++
>  arch/arm64/include/uapi/asm/kvm.h |   9 +++
>  arch/arm64/kvm/arm.c  |  18 +++---
>  arch/arm64/kvm/mmu.c  |   3 +
>  arch/arm64/kvm/pkvm.c | 104 ++
>  include/uapi/linux/kvm.h  |   1 +
>  7 files changed, 205 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 7fcb2fd38f42..dfbaf905c435 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6362,6 +6362,75 @@ default.
>  
>  See Documentation/x86/sgx/2.Kernel-internals.rst for more details.
>  
> +7.26 KVM_CAP_ARM_PROTECTED_VM
> +-
> +
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: flags is a single KVM_CAP_ARM_PROTECTED_VM_FLAGS_* value
> +
> +The presence of this capability indicates that KVM supports running in a
> +configuration where the host Linux kernel does not have access to guest 
> memory.
> +On such a system, a small hypervisor layer at EL2 can configure the stage-2
> +page tables for both the CPU and any DMA-capable devices to protect guest
> +memory pages so that they are inaccessible to the host unless access is 
> granted
> +explicitly by the guest.
> +
> +The 'flags' parameter is defined as follows:
> +
> +7.26.1 KVM_CAP_ARM_PROTECTED_VM_FLAGS_ENABLE
> +
> +
> +:Capability: 'flag' parameter to KVM_CAP_ARM_PROTECTED_VM
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: args[0] contains memory slot ID to hold guest firmware
> +:Returns: 0 on success; negative error code on failure
> +
> +Enabling this capability causes all memory slots of the specified VM to be
> +unmapped from the host system and put into a state where they are no longer
> +configurable. The memory slot corresponding to the ID passed in args[0] is
> +populated with the guest firmware image provided by the host firmware.

As on the prior patch, I don't quite follow the rationale for the guest
fw coming from the host fw, and it seems to go against the usual design
for VM contents, so I fear it could be a problem in future (even if not
in android's specific model for usage).

> +The first vCPU to enter the guest is defined to be the primary vCPU. All 
> other
> +vCPUs belonging to the VM are secondary vCPUs.
> +
> +All vCPUs belonging to a VM with this capability enabled are initialised to a
> +pre-determined reset state

What is that "pre-determined reset state"? e.g. is that just what KVM
does today, or is there something more specific (e.g. that might change
with the "Boot protocol version" below)?

> irrespective of any prior configuration according to
> +the KVM_ARM_VCPU_INIT ioctl, with the following exceptions for the primary
> +vCPU:
> +
> + === ===
> + Register(s) Reset value
> + === ===
> + X0-X14: Preserved (see KVM_SET_ONE_REG)
> + X15:Boot protocol version (0)

What's the "Boot protocol" in this context? Is that just referring to
this handover state, or is that something more involved?

> + X16-X30:Reserved (0)
> + PC: IPA base of firmware memory slot
> + SP: IPA end of firmware memory slot
> + === ===
> +
> +Secondary vCPUs belonging to a VM with this capability enabled will return
> +-EPERM in response to a KVM_RUN ioctl() if the vCPU was not initialised with
> +the KVM_ARM_VCPU_POWER_OFF feature.

I assume this means that protected VMs always get a trusted PSCI
implementation? It might be worth mentioning so (and worth considering
if that should always have the SMCCC bits too).

> +
> +There is no support for AArch32 at any exception level.

Is this only going to run on CPUs without AArch32 EL0? ... or does this
mean behaviour will be erratic if someone tries to run AArch32 EL0?

> +
> +It is an error to enable this capability on a VM after issuing a KVM_RUN
> +ioctl() on one of its vCPUs.
> +
> +7.26.2 KVM_CAP_ARM_PROTECTED_VM_FLAGS_INFO
> +--
> +
> +:Capability: 'flag' parameter to KVM_CAP_ARM_PROTECTED_VM
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: args[0] contains pointer to 'struct kvm_protected_vm_info'
> +:Returns: 0 on success; negative error code on failure
> +
> +Populates the 'struct kvm_protected_vm_info' pointed to by args[0] with
> +information about the protected environment for the VM.
> +
>  8. Other capabilities.
>  ==
>  
> diff --git a/arch/arm

Re: [PATCH 3/4] KVM: arm64: Parse reserved-memory node for pkvm guest firmware region

2021-06-04 Thread Mark Rutland
On Thu, Jun 03, 2021 at 07:33:46PM +0100, Will Deacon wrote:
> Add support for a "linux,pkvm-guest-firmware-memory" reserved memory
> region, which can be used to identify a firmware image for protected
> VMs.

The idea that the guest's FW comes from the host's FW strikes me as
unusual; what's the rationale for this coming from the host FW? IIUC
other confidential compute VM environments allow you to load up whatever
virtual FW you want, but this is measured such that the virtual FW used
can be attested.

Thanks,
Mark.

> 
> Signed-off-by: Will Deacon 
> ---
>  arch/arm64/kvm/Makefile |  2 +-
>  arch/arm64/kvm/pkvm.c   | 52 +
>  2 files changed, 53 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm64/kvm/pkvm.c
> 
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 589921392cb1..61e054411831 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -14,7 +14,7 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o 
> $(KVM)/eventfd.o \
>$(KVM)/vfio.o $(KVM)/irqchip.o \
>arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
>inject_fault.o va_layout.o handle_exit.o \
> -  guest.o debug.o reset.o sys_regs.o \
> +  guest.o debug.o pkvm.o reset.o sys_regs.o \
>vgic-sys-reg-v3.o fpsimd.o pmu.o \
>arch_timer.o trng.o\
>vgic/vgic.o vgic/vgic-init.o \
> diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> new file mode 100644
> index ..7af5d03a3941
> --- /dev/null
> +++ b/arch/arm64/kvm/pkvm.c
> @@ -0,0 +1,52 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * KVM host (EL1) interface to Protected KVM (pkvm) code at EL2.
> + *
> + * Copyright (C) 2021 Google LLC
> + * Author: Will Deacon 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static struct reserved_mem *pkvm_firmware_mem;
> +
> +static int __init pkvm_firmware_rmem_err(struct reserved_mem *rmem,
> +  const char *reason)
> +{
> + phys_addr_t end = rmem->base + rmem->size;
> +
> + kvm_err("Ignoring pkvm guest firmware memory reservation [%pa - %pa]: 
> %s\n",
> + &rmem->base, &end, reason);
> + return -EINVAL;
> +}
> +
> +static int __init pkvm_firmware_rmem_init(struct reserved_mem *rmem)
> +{
> + unsigned long node = rmem->fdt_node;
> +
> + if (kvm_get_mode() != KVM_MODE_PROTECTED)
> + return pkvm_firmware_rmem_err(rmem, "protected mode not 
> enabled");
> +
> + if (pkvm_firmware_mem)
> + return pkvm_firmware_rmem_err(rmem, "duplicate reservation");
> +
> + if (!of_get_flat_dt_prop(node, "no-map", NULL))
> + return pkvm_firmware_rmem_err(rmem, "missing \"no-map\" 
> property");
> +
> + if (of_get_flat_dt_prop(node, "reusable", NULL))
> + return pkvm_firmware_rmem_err(rmem, "\"reusable\" property 
> unsupported");
> +
> + if (!PAGE_ALIGNED(rmem->base))
> + return pkvm_firmware_rmem_err(rmem, "base is not page-aligned");
> +
> + if (!PAGE_ALIGNED(rmem->size))
> + return pkvm_firmware_rmem_err(rmem, "size is not page-aligned");
> +
> + pkvm_firmware_mem = rmem;
> + return 0;
> +}
> +RESERVEDMEM_OF_DECLARE(pkvm_firmware, "linux,pkvm-guest-firmware-memory",
> +pkvm_firmware_rmem_init);
> -- 
> 2.32.0.rc0.204.g9fa02ecfa5-goog
> 


Re: [PATCH 2/4] KVM: arm64: Extend comment in has_vhe()

2021-06-04 Thread Mark Rutland
On Thu, Jun 03, 2021 at 07:33:45PM +0100, Will Deacon wrote:
> has_vhe() expands to a compile-time constant when evaluated from the VHE
> or nVHE code, alternatively checking a static key when called from
> elsewhere in the kernel. On face value, this looks like a case of
> premature optimization, but in fact this allows symbol references on
> VHE-specific code paths to be dropped from the nVHE object.
> 
> Expand the comment in has_vhe() to make this clearer, hopefully
> discouraging anybody from simplifying the code.
> 
> Cc: David Brazdil 
> Signed-off-by: Will Deacon 

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/include/asm/virt.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
> index 7379f35ae2c6..3218ca17f819 100644
> --- a/arch/arm64/include/asm/virt.h
> +++ b/arch/arm64/include/asm/virt.h
> @@ -111,6 +111,9 @@ static __always_inline bool has_vhe(void)
>   /*
>* Code only run in VHE/NVHE hyp context can assume VHE is present or
>* absent. Otherwise fall back to caps.
> +  * This allows the compiler to discard VHE-specific code from the
> +  * nVHE object, reducing the number of external symbol references
> +  * needed to link.
>*/
>   if (is_vhe_hyp_code())
>   return true;
> -- 
> 2.32.0.rc0.204.g9fa02ecfa5-goog
> 


Re: [PATCH 1/4] KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE

2021-06-04 Thread Mark Rutland
On Thu, Jun 03, 2021 at 07:33:44PM +0100, Will Deacon wrote:
> Ignore 'kvm-arm.mode=protected' when using VHE so that kvm_get_mode()
> only returns KVM_MODE_PROTECTED on systems where the feature is available.

IIUC, since the introduction of the idreg-override code, and the
mutate_to_vhe stuff, passing 'kvm-arm.mode=protected' should make the
kernel stick to EL1, right? So this should only affect M1 (or other HW
with a similar impediment).

One minor comment below; otherwise:

Acked-by: Mark Rutland 

> 
> Cc: David Brazdil 
> Signed-off-by: Will Deacon 
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  1 -
>  arch/arm64/kernel/cpufeature.c  | 10 +-
>  arch/arm64/kvm/arm.c|  6 +-
>  3 files changed, 6 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index cb89dbdedc46..e85dbdf1ee8e 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2300,7 +2300,6 @@
>  
>   protected: nVHE-based mode with support for guests whose
>  state is kept private from the host.
> -Not valid if the kernel is running in EL2.
>  
>   Defaults to VHE/nVHE based on hardware support.
>  
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index efed2830d141..dc1f2e747828 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1773,15 +1773,7 @@ static void cpu_enable_mte(struct 
> arm64_cpu_capabilities const *cap)
>  #ifdef CONFIG_KVM
>  static bool is_kvm_protected_mode(const struct arm64_cpu_capabilities 
> *entry, int __unused)
>  {
> - if (kvm_get_mode() != KVM_MODE_PROTECTED)
> - return false;
> -
> - if (is_kernel_in_hyp_mode()) {
> - pr_warn("Protected KVM not available with VHE\n");
> - return false;
> - }
> -
> - return true;
> + return kvm_get_mode() == KVM_MODE_PROTECTED;
>  }
>  #endif /* CONFIG_KVM */
>  
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 1cb39c0803a4..8d5e23198dfd 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2121,7 +2121,11 @@ static int __init early_kvm_mode_cfg(char *arg)
>   return -EINVAL;
>  
>   if (strcmp(arg, "protected") == 0) {
> - kvm_mode = KVM_MODE_PROTECTED;
> + if (!is_kernel_in_hyp_mode())
> + kvm_mode = KVM_MODE_PROTECTED;
> + else
> + pr_warn_once("Protected KVM not available with VHE\n");

... assuming this is only for M1, it might be better to say:

Protected KVM not available on this hardware

... since that doesn't suggest that other VHE-capable HW is also not
PKVM-capable.

Thanks,
Mark.


Re: [PATCH v2] KVM: arm64: Prevent mixed-width VM creation

2021-05-25 Thread Mark Rutland
On Mon, May 24, 2021 at 06:07:52PM +0100, Marc Zyngier wrote:
> It looks like we have tolerated creating mixed-width VMs since...
> forever. However, that was never the intention, and we'd rather
> not have to support that pointless complexity.
> 
> Forbid such a setup by making sure all the vcpus have the same
> register width.
> 
> Reported-by: Steven Price 
> Signed-off-by: Marc Zyngier 
> Cc: sta...@vger.kernel.org

Looks good to me!

Acked-by: Mark Rutland 

Mark.

> ---
> 
> Notes:
> v2: Fix missing check against ARM64_HAS_32BIT_EL1 (Mark)
> 
>  arch/arm64/include/asm/kvm_emulate.h |  5 +
>  arch/arm64/kvm/reset.c   | 28 
>  2 files changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index f612c090f2e4..01b9857757f2 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -463,4 +463,9 @@ static __always_inline void kvm_incr_pc(struct kvm_vcpu 
> *vcpu)
>   vcpu->arch.flags |= KVM_ARM64_INCREMENT_PC;
>  }
>  
> +static inline bool vcpu_has_feature(struct kvm_vcpu *vcpu, int feature)
> +{
> + return test_bit(feature, vcpu->arch.features);
> +}
> +
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 956cdc240148..d37ebee085cf 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -166,6 +166,25 @@ static int kvm_vcpu_enable_ptrauth(struct kvm_vcpu *vcpu)
>   return 0;
>  }
>  
> +static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_vcpu *tmp;
> + bool is32bit;
> + int i;
> +
> + is32bit = vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT);
> + if (!cpus_have_const_cap(ARM64_HAS_32BIT_EL1) && is32bit)
> + return false;
> +
> + /* Check that the vcpus are either all 32bit or all 64bit */
> + kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> + if (vcpu_has_feature(tmp, KVM_ARM_VCPU_EL1_32BIT) != is32bit)
> + return false;
> + }
> +
> + return true;
> +}
> +
>  /**
>   * kvm_reset_vcpu - sets core registers and sys_regs to reset value
>   * @vcpu: The VCPU pointer
> @@ -217,13 +236,14 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>   }
>   }
>  
> + if (!vcpu_allowed_register_width(vcpu)) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
>   switch (vcpu->arch.target) {
>   default:
>   if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
> - if (!cpus_have_const_cap(ARM64_HAS_32BIT_EL1)) {
> - ret = -EINVAL;
> - goto out;
> - }
>   pstate = VCPU_RESET_PSTATE_SVC;
>   } else {
>   pstate = VCPU_RESET_PSTATE_EL1;
> -- 
> 2.30.2
> 


Re: [PATCH] KVM: arm64: Prevent mixed-width VM creation

2021-05-20 Thread Mark Rutland
On Thu, May 20, 2021 at 01:58:55PM +0100, Marc Zyngier wrote:
> On Thu, 20 May 2021 13:44:34 +0100,
> Mark Rutland  wrote:
> > 
> > On Thu, May 20, 2021 at 01:22:53PM +0100, Marc Zyngier wrote:
> > > It looks like we have tolerated creating mixed-width VMs since...
> > > forever. However, that was never the intention, and we'd rather
> > > not have to support that pointless complexity.
> > > 
> > > Forbid such a setup by making sure all the vcpus have the same
> > > register width.
> > > 
> > > Reported-by: Steven Price 
> > > Signed-off-by: Marc Zyngier 
> > > Cc: sta...@vger.kernel.org
> > > ---
> > >  arch/arm64/kvm/reset.c | 28 
> > >  1 file changed, 24 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > > index 956cdc240148..1cf308be6ef3 100644
> > > --- a/arch/arm64/kvm/reset.c
> > > +++ b/arch/arm64/kvm/reset.c
> > > @@ -166,6 +166,25 @@ static int kvm_vcpu_enable_ptrauth(struct kvm_vcpu 
> > > *vcpu)
> > >   return 0;
> > >  }
> > >  
> > > +static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
> > > +{
> > > + struct kvm_vcpu *tmp;
> > > + int i;
> > > +
> > > + /* Check that the vcpus are either all 32bit or all 64bit */
> > > + kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> > > + bool w;
> > > +
> > > + w  = test_bit(KVM_ARM_VCPU_EL1_32BIT, tmp->arch.features);
> > > + w ^= test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features);
> > > +
> > > + if (w)
> > > + return false;
> > > + }
> > 
> > I think this is wrong for a single-cpu VM. In that case, the loop will
> > have a single iteration, and tmp == vcpu, so w must be 0 regardless of
> > the value of arch.features.
> 
> I don't immediately see what is wrong with a single-cpu VM. 'w' will
> be zero indeed, and we'll return that this is allowed. After all, each
> VM starts by being a single-CPU VM.

Sorry; I should have been clearer. I had assumed that this was trying to
rely on a difference across vcpus implicitly providing an equivalent of
the removed check for the KVM_ARM_VCPU_EL1_32BIT cap. I guess from the
below that was not the case. :)

Thanks,
Mark.

> But of course...
> 
> > IIUC that doesn't prevent KVM_ARM_VCPU_EL1_32BIT being set when we don't
> > have the ARM64_HAS_32BIT_EL1 cap, unless that's checked elsewhere?
> 
> ... I mistakenly removed the check against ARM64_HAS_32BIT_EL1...
> 
> > 
> > How about something like:
> > 
> > | static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
> > | {
> > |   bool is_32bit = vcpu_features_32bit(vcpu);
> > |   struct kvm_vcpu *tmp;
> > |   int i;
> > | 
> > |   if (!cpus_have_const_cap(ARM64_HAS_32BIT_EL1) && is_32bit)
> > |   return false;
> > | 
> > |   kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> > |   if (is_32bit != vcpu_features_32bit(tmp))
> > |   return false;
> > |   }
> > | 
> > |   return true;
> > | }
> > 
> > ... with a helper in  like:
> > 
> > | static bool vcpu_features_32bit(struct kvm_vcpu *vcpu)
> > | {
> > |   return test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features);
> > | }
> > 
> > ... or
> > 
> > | static inline bool vcpu_has_feature(struct kvm_vcpu *vcpu, int feature)
> > | {
> > |   return test_bit(feature, vcpu->arch.features);
> > | }
> > 
> > ... so that we can avoid the line splitting required by the length of
> > the test_bit() expression?
> 
> Yup, looks OK to me (with a preference for the latter).
> 
> Thanks,
> 
>   M.
> 
> -- 
> Without deviation from the norm, progress is not possible.


Re: [PATCH] KVM: arm64: Prevent mixed-width VM creation

2021-05-20 Thread Mark Rutland
On Thu, May 20, 2021 at 01:22:53PM +0100, Marc Zyngier wrote:
> It looks like we have tolerated creating mixed-width VMs since...
> forever. However, that was never the intention, and we'd rather
> not have to support that pointless complexity.
> 
> Forbid such a setup by making sure all the vcpus have the same
> register width.
> 
> Reported-by: Steven Price 
> Signed-off-by: Marc Zyngier 
> Cc: sta...@vger.kernel.org
> ---
>  arch/arm64/kvm/reset.c | 28 
>  1 file changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 956cdc240148..1cf308be6ef3 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -166,6 +166,25 @@ static int kvm_vcpu_enable_ptrauth(struct kvm_vcpu *vcpu)
>   return 0;
>  }
>  
> +static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_vcpu *tmp;
> + int i;
> +
> + /* Check that the vcpus are either all 32bit or all 64bit */
> + kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> + bool w;
> +
> + w  = test_bit(KVM_ARM_VCPU_EL1_32BIT, tmp->arch.features);
> + w ^= test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features);
> +
> + if (w)
> + return false;
> + }

I think this is wrong for a single-cpu VM. In that case, the loop will
have a single iteration, and tmp == vcpu, so w must be 0 regardless of
the value of arch.features.

IIUC that doesn't prevent KVM_ARM_VCPU_EL1_32BIT being set when we don't
have the ARM64_HAS_32BIT_EL1 cap, unless that's checked elsewhere?

How about something like:

| static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
| {
|   bool is_32bit = vcpu_features_32bit(vcpu);
|   struct kvm_vcpu *tmp;
|   int i;
| 
|   if (!cpus_have_const_cap(ARM64_HAS_32BIT_EL1) && is_32bit)
|   return false;
| 
|   kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
|   if (is_32bit != vcpu_features_32bit(tmp))
|   return false;
|   }
| 
|   return true;
| }

... with a helper in  like:

| static bool vcpu_features_32bit(struct kvm_vcpu *vcpu)
| {
|   return test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features);
| }

... or

| static inline bool vcpu_has_feature(struct kvm_vcpu *vcpu, int feature)
| {
|   return test_bit(feature, vcpu->arch.features);
| }

... so that we can avoid the line splitting required by the length of
the test_bit() expression?

Thanks,
Mark.

> +
> + return true;
> +}
> +
>  /**
>   * kvm_reset_vcpu - sets core registers and sys_regs to reset value
>   * @vcpu: The VCPU pointer
> @@ -217,13 +236,14 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>   }
>   }
>  
> + if (!vcpu_allowed_register_width(vcpu)) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
>   switch (vcpu->arch.target) {
>   default:
>   if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
> - if (!cpus_have_const_cap(ARM64_HAS_32BIT_EL1)) {
> - ret = -EINVAL;
> - goto out;
> - }
>   pstate = VCPU_RESET_PSTATE_SVC;
>   } else {
>   pstate = VCPU_RESET_PSTATE_EL1;
> -- 
> 2.30.2
> 


Re: [PATCH v3 3/9] KVM: arm64: vgic: Be tolerant to the lack of maintenance interrupt

2021-05-11 Thread Mark Rutland
On Mon, May 10, 2021 at 06:44:49PM +0100, Marc Zyngier wrote:
> On Mon, 10 May 2021 17:19:07 +0100,
> Mark Rutland  wrote:
> > 
> > On Mon, May 10, 2021 at 02:48:18PM +0100, Marc Zyngier wrote:
> > > As it turns out, not all the interrupt controllers are able to
> > > expose a vGIC maintenance interrupt as a discrete signal.
> > > And to be fair, it doesn't really matter as all we require is
> > > for *something* to kick us out of guest mode one way or another.
> > > 
> > > On systems that do not expose a maintenance interrupt as such,
> > > there are two outcomes:
> > > 
> > > - either the virtual CPUIF does generate an interrupt, and
> > >   by the time we are back to the host the interrupt will have long
> > >   been disabled (as we set ICH_HCR_EL2.EN to 0 on exit). In this case,
> > >   interrupt latency is as good as it gets.
> > > 
> > > - or some other event (physical timer) will take us out of the guest
> > >   anyway, and the only drawback is a bad interrupt latency.
> > 
> > IIRC we won't have a guaranteed scheduler tick for NO_HZ_FULL, so in
> > that case we'll either need to set a periodic software maintenance
> > interrupt, or reject this combination at runtime (either when trying to
> > isolate the dynticks CPUs, or when trying to create a VM).
> 
> That's a good point.
> 
> On sensible systems, the maintenance interrupt is a standard GIC PPI
> that requires enabling, and that is all that KVM requires (the
> maintenance interrupt is only used as an exit mechanism and will be
> disabled before reaching the handler).
> 
> On the M1, owing to the lack of a per-CPU interrupt controller, there
> is nothing to enable. The virtual CPU interface will fire at will and
> take us out of the guest in a timely manner.

Ah, so the M1 does have a maintenance interrupt, but you can't silence
it at the irqchip level.

> So maybe instead of relaxing the requirement for a maintenance
> interrupt, we should only bypass the checks if the root interrupt
> controller advertises that it is safe to do so, making it a
> M1-specific hack.

That certainly sounds safer than permitting running without any
maintenance interrupt at all.

Thanks,
Mark.


Re: [PATCH v3 3/9] KVM: arm64: vgic: Be tolerant to the lack of maintenance interrupt

2021-05-10 Thread Mark Rutland
On Mon, May 10, 2021 at 02:48:18PM +0100, Marc Zyngier wrote:
> As it turns out, not all the interrupt controllers are able to
> expose a vGIC maintenance interrupt as a discrete signal.
> And to be fair, it doesn't really matter as all we require is
> for *something* to kick us out of guest mode one way or another.
> 
> On systems that do not expose a maintenance interrupt as such,
> there are two outcomes:
> 
> - either the virtual CPUIF does generate an interrupt, and
>   by the time we are back to the host the interrupt will have long
>   been disabled (as we set ICH_HCR_EL2.EN to 0 on exit). In this case,
>   interrupt latency is as good as it gets.
> 
> - or some other event (physical timer) will take us out of the guest
>   anyway, and the only drawback is a bad interrupt latency.

IIRC we won't have a guaranteed scheduler tick for NO_HZ_FULL, so in
that case we'll either need to set a periodic software maintenance
interrupt, or reject this combination at runtime (either when trying to
isolate the dynticks CPUs, or when trying to create a VM).

Otherwise, it's very likely that something will take us out of the guest
from time to time, but we won't have a strict guarantee (e.g. if all
guest memory is pinned).

Thanks,
Mark.

> 
> So let's be tolerant to the lack of maintenance interrupt, and just let
> the user know that their mileage may vary...
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/vgic/vgic-init.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> index 2fdb65529594..9fd23f32aa54 100644
> --- a/arch/arm64/kvm/vgic/vgic-init.c
> +++ b/arch/arm64/kvm/vgic/vgic-init.c
> @@ -524,11 +524,6 @@ int kvm_vgic_hyp_init(void)
>   if (!gic_kvm_info)
>   return -ENODEV;
>  
> - if (!gic_kvm_info->maint_irq) {
> - kvm_err("No vgic maintenance irq\n");
> - return -ENXIO;
> - }
> -
>   switch (gic_kvm_info->type) {
>   case GIC_V2:
>   ret = vgic_v2_probe(gic_kvm_info);
> @@ -552,6 +547,11 @@ int kvm_vgic_hyp_init(void)
>   if (ret)
>   return ret;
>  
> + if (!kvm_vgic_global_state.maint_irq) {
> + kvm_err("No maintenance interrupt available, fingers 
> crossed...\n");
> + return 0;
> + }
> +
>   ret = request_percpu_irq(kvm_vgic_global_state.maint_irq,
>vgic_maintenance_handler,
>"vgic", kvm_get_running_vcpus());
> -- 
> 2.29.2
> 


Re: [PATCH 3/4] KVM: arm64: Rename SCTLR_ELx_FLAGS to SCTLR_EL2_FLAGS

2021-03-11 Thread Mark Rutland
On Thu, Mar 11, 2021 at 11:35:29AM +, Mark Rutland wrote:
> Acked-by: Mark Rutland 

Upon reflection, maybe I should spell my own name correctly:

Acked-by: Mark Rutland 

... lest you decide to add a Mocked-by tag instead ;)

Mark.


Re: [PATCH 3/4] KVM: arm64: Rename SCTLR_ELx_FLAGS to SCTLR_EL2_FLAGS

2021-03-11 Thread Mark Rutland
On Wed, Mar 10, 2021 at 06:20:22PM +, Will Deacon wrote:
> On Wed, Mar 10, 2021 at 05:49:17PM +, Marc Zyngier wrote:
> > On Wed, 10 Mar 2021 16:15:47 +,
> > Will Deacon  wrote:
> > > On Wed, Mar 10, 2021 at 04:05:17PM +, Marc Zyngier wrote:
> > > > On Wed, 10 Mar 2021 15:46:26 +,
> > > > Will Deacon  wrote:
> > > > > On Wed, Mar 10, 2021 at 03:26:55PM +, Marc Zyngier wrote:
> > > > > > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S 
> > > > > > b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > > > > > index 4eb584ae13d9..7423f4d961a4 100644
> > > > > > --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > > > > > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > > > > > @@ -122,7 +122,7 @@ alternative_else_nop_endif
> > > > > >  * as well as the EE bit on BE. Drop the A flag since the 
> > > > > > compiler
> > > > > >  * is allowed to generate unaligned accesses.
> > > > > >  */
> > > > > > -   mov_q   x0, (SCTLR_EL2_RES1 | (SCTLR_ELx_FLAGS & ~SCTLR_ELx_A))
> > > > > > +   mov_q   x0, (SCTLR_EL2_RES1 | (SCTLR_EL2_FLAGS & ~SCTLR_ELx_A))
> > > > > 
> > > > > Can we just drop SCTLR_ELx_A from SCTLR_EL2_FLAGS instead of clearing 
> > > > > it
> > > > > here?
> > > > 
> > > > Absolutely. That'd actually be an improvement.
> > > 
> > > In fact, maybe just define INIT_SCTLR_EL2_MMU_ON to mirror what we do for
> > > EL1 (i.e. including the RES1 bits) and then use that here?
> > 
> > Like this?
> > 
> > diff --git a/arch/arm64/include/asm/sysreg.h 
> > b/arch/arm64/include/asm/sysreg.h
> > index dfd4edbfe360..593b9bf91bbd 100644
> > --- a/arch/arm64/include/asm/sysreg.h
> > +++ b/arch/arm64/include/asm/sysreg.h
> > @@ -579,9 +579,6 @@
> >  #define SCTLR_ELx_A(BIT(1))
> >  #define SCTLR_ELx_M(BIT(0))
> >  
> > -#define SCTLR_ELx_FLAGS(SCTLR_ELx_M  | SCTLR_ELx_A | SCTLR_ELx_C | \
> > -SCTLR_ELx_SA | SCTLR_ELx_I | SCTLR_ELx_IESB)
> > -
> >  /* SCTLR_EL2 specific flags. */
> >  #define SCTLR_EL2_RES1 ((BIT(4))  | (BIT(5))  | (BIT(11)) | (BIT(16)) 
> > | \
> >  (BIT(18)) | (BIT(22)) | (BIT(23)) | (BIT(28)) | \
> > @@ -593,6 +590,10 @@
> >  #define ENDIAN_SET_EL2 0
> >  #endif
> >  
> > +#define INIT_SCTLR_EL2_MMU_ON  
> > \
> > +   (SCTLR_ELx_M  | SCTLR_ELx_C | SCTLR_ELx_SA | SCTLR_ELx_I |  \
> > +SCTLR_ELx_IESB | ENDIAN_SET_EL2 | SCTLR_EL2_RES1)
> > +
> >  #define INIT_SCTLR_EL2_MMU_OFF \
> > (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
> >  
> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S 
> > b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > index 4eb584ae13d9..2e16b2098bbd 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> > @@ -117,13 +117,7 @@ alternative_else_nop_endif
> > tlbialle2
> > dsb sy
> >  
> > -   /*
> > -    * Preserve all the RES1 bits while setting the default flags,
> > -* as well as the EE bit on BE. Drop the A flag since the compiler
> > -* is allowed to generate unaligned accesses.
> > -*/
> > -   mov_q   x0, (SCTLR_EL2_RES1 | (SCTLR_ELx_FLAGS & ~SCTLR_ELx_A))
> > -CPU_BE(orr x0, x0, #SCTLR_ELx_EE)
> > +   mov_q   x0, INIT_SCTLR_EL2_MMU_ON
> >  alternative_if ARM64_HAS_ADDRESS_AUTH
> > mov_q   x1, (SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | \
> >  SCTLR_ELx_ENDA | SCTLR_ELx_ENDB)
> 
> Beautiful!
> 
> With that, you can have my ack on the whole series:
> 
> Acked-by: Will Deacon 

FWIW, likewise:

Acked-by: Mark Rutland 

This is really nice!

Thanks,
Mark.


Re: [PATCH] arm64/mm: Fix __enable_mmu() for new TGRAN range values

2021-03-08 Thread Mark Rutland
On Mon, Mar 08, 2021 at 01:30:53PM +, Will Deacon wrote:
> On Sun, Mar 07, 2021 at 05:24:21PM +0530, Anshuman Khandual wrote:
> > On 3/5/21 8:21 PM, Mark Rutland wrote:
> > > On Fri, Mar 05, 2021 at 08:06:09PM +0530, Anshuman Khandual wrote:

> > >> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT  0x0
> > >> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE 0x1
> > >> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN  0x2
> > >> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX  0x7
> > >
> > > The TGRAN2 fields don't quite follow the usual ID scheme rules, so how
> > > do we determine the max value? Does the ARM ARM say anything in
> > > particular about them, like we do for some of the PMU ID fields?
> > 
> > Did not find anything in ARM ARM, regarding what scheme TGRAN2 fields
> > actually follow. I had arrived at more restrictive 0x7 value, like the
> > usual signed fields as the TGRAN4 fields definitely do not follow the
> > unsigned ID scheme. Would restricting max value to 0x3 (i.e LPA2) be a
> > better option instead ?
> 
> I don't think it helps much, as TGRAN64_2 doesn't even define 0x3.
> 
> So I think this patch is probably the best we can do, but the Arm ARM could
> really do with describing the scheme here.

I agree, and I've filed a ticket internally to try to get this cleaned
up.

I suspect that the answer is that these are basically unsigned, with
0x2-0xf indicating presence, but I can't guarantee that.

Thanks,
Mark.


Re: [PATCH] arm64/mm: Fix __enable_mmu() for new TGRAN range values

2021-03-05 Thread Mark Rutland
On Fri, Mar 05, 2021 at 08:06:09PM +0530, Anshuman Khandual wrote:
> From: James Morse 
> 
> As per ARM ARM DDI 0487G.a, when FEAT_LPA2 is implemented, ID_AA64MMFR0_EL1
> might contain a range of values to describe supported translation granules
> (4K and 16K pages sizes in particular) instead of just enabled or disabled
> values. This changes __enable_mmu() function to handle complete acceptable
> range of values (depending on whether the field is signed or unsigned) now
> represented with ID_AA64MMFR0_TGRAN_SUPPORTED_[MIN..MAX] pair. While here,
> also fix similar situations in EFI stub and KVM as well.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Marc Zyngier 
> Cc: James Morse 
> Cc: Suzuki K Poulose 
> Cc: Ard Biesheuvel 
> Cc: Mark Rutland 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: kvmarm@lists.cs.columbia.edu
> Cc: linux-...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: James Morse 
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/arm64/include/asm/sysreg.h   | 20 ++--
>  arch/arm64/kernel/head.S  |  6 --
>  arch/arm64/kvm/reset.c| 23 ---
>  drivers/firmware/efi/libstub/arm64-stub.c |  2 +-
>  4 files changed, 31 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index dfd4edb..d4a5fca9 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -796,6 +796,11 @@
>  #define ID_AA64MMFR0_PARANGE_48  0x5
>  #define ID_AA64MMFR0_PARANGE_52  0x6
>  
> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT   0x0
> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE  0x1
> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN   0x2
> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX   0x7

The TGRAN2 fields don't quite follow the usual ID scheme rules, so how
do we determine the max value? Does the ARM ARM say anything in
particular about them, like we do for some of the PMU ID fields?

Otherwise, this patch looks correct to me.

Thanks,
Mark.

> +
>  #ifdef CONFIG_ARM64_PA_BITS_52
>  #define ID_AA64MMFR0_PARANGE_MAX ID_AA64MMFR0_PARANGE_52
>  #else
> @@ -961,14 +966,17 @@
>  #define ID_PFR1_PROGMOD_SHIFT0
>  
>  #if defined(CONFIG_ARM64_4K_PAGES)
> -#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN4_SHIFT
> -#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN4_SUPPORTED
> +#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN4_SHIFT
> +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN4_SUPPORTED
> +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0x7
>  #elif defined(CONFIG_ARM64_16K_PAGES)
> -#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN16_SHIFT
> -#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN16_SUPPORTED
> +#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN16_SHIFT
> +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN16_SUPPORTED
> +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0xF
>  #elif defined(CONFIG_ARM64_64K_PAGES)
> -#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN64_SHIFT
> -#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN64_SUPPORTED
> +#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN64_SHIFT
> +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN64_SUPPORTED
> +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0x7
>  #endif
>  
>  #define MVFR2_FPMISC_SHIFT   4
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 66b0e0b..8b469f1 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -655,8 +655,10 @@ SYM_FUNC_END(__secondary_too_slow)
>  SYM_FUNC_START(__enable_mmu)
>   mrs x2, ID_AA64MMFR0_EL1
>   ubfxx2, x2, #ID_AA64MMFR0_TGRAN_SHIFT, 4
> - cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
> - b.ne__no_granule_support
> + cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MIN
> + b.lt__no_granule_support
> + cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MAX
> + b.gt__no_granule_support
>   update_early_cpu_boot_status 0, x2, x3
>   adrpx2, idmap_pg_dir
>   phys_to_ttbr x1, x1
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 47f3f03..fe72bfb 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -286,7 +286,7 @@ u32 get_kvm_ipa_limit(void)
>  
>  int kvm_set_ipa_limit(void)
>  {
> - unsigned int parange, tgran_2;
> + unsigned int parange, tgran_2_shift, tgran_2;
>   u64 mmfr0;
>  
>   mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
> @@ -300,27 +300,28 @@ 

Re: [PATCH 1/6] kvm: arm64: Prevent use of invalid PSCI v0.1 function IDs

2020-12-08 Thread Mark Rutland
On Tue, Dec 08, 2020 at 03:56:39PM +, Marc Zyngier wrote:
> On 2020-12-08 14:24, David Brazdil wrote:
> > PSCI driver exposes a struct containing the PSCI v0.1 function IDs
> > configured in the DT. However, the struct does not convey the
> > information whether these were set from DT or contain the default value
> > zero. This could be a problem for PSCI proxy in KVM protected mode.
> > 
> > Extend config passed to KVM with a bit mask with individual bits set
> > depending on whether the corresponding function pointer in psci_ops is
> > set, eg. set bit for PSCI_CPU_SUSPEND if psci_ops.cpu_suspend != NULL.
> > 
> > Previously config was split into multiple global variables. Put
> > everything into a single struct for convenience.
> > 
> > Reported-by: Mark Rutland 
> > Signed-off-by: David Brazdil 
> > ---
> >  arch/arm64/include/asm/kvm_host.h| 20 +++
> >  arch/arm64/kvm/arm.c | 14 +---
> >  arch/arm64/kvm/hyp/nvhe/psci-relay.c | 53 +---
> >  3 files changed, 70 insertions(+), 17 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h
> > b/arch/arm64/include/asm/kvm_host.h
> > index 11beda85ee7e..828d50d40dc2 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -17,6 +17,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -240,6 +241,25 @@ struct kvm_host_data {
> > struct kvm_pmu_events pmu_events;
> >  };
> > 
> > +#define KVM_HOST_PSCI_0_1_CPU_SUSPEND  BIT(0)
> > +#define KVM_HOST_PSCI_0_1_CPU_ON   BIT(1)
> > +#define KVM_HOST_PSCI_0_1_CPU_OFF  BIT(2)
> > +#define KVM_HOST_PSCI_0_1_MIGRATE  BIT(3)
> > +
> > +struct kvm_host_psci_config {
> > +   /* PSCI version used by host. */
> > +   u32 version;
> > +
> > +   /* Function IDs used by host if version is v0.1. */
> > +   struct psci_0_1_function_ids function_ids_0_1;
> > +
> > +   /* Bitmask of functions enabled for v0.1, bits KVM_HOST_PSCI_0_1_*. */
> > +   unsigned int enabled_functions_0_1;
> 
> Nit: the conventional type for bitmaps is 'unsigned long'.
> Also, "enabled" seems odd. Isn't it actually "available"?

Sure, that or "implemented" works here.

Since there are only 4 functions here, it might make sense to use
independent bools rather than a bitmap, which might make this a bit
simpler...

> > get_psci_0_1_function_ids();
> > +   kvm_host_psci_config.version = psci_ops.get_version();
> > +
> > +   if (kvm_host_psci_config.version == PSCI_VERSION(0, 1)) {
> > +   kvm_host_psci_config.function_ids_0_1 = 
> > get_psci_0_1_function_ids();
> > +   kvm_host_psci_config.enabled_functions_0_1 =
> > +   (psci_ops.cpu_suspend ? KVM_HOST_PSCI_0_1_CPU_SUSPEND : 
> > 0) |
> > +   (psci_ops.cpu_off ? KVM_HOST_PSCI_0_1_CPU_OFF : 0) |
> > +   (psci_ops.cpu_on ? KVM_HOST_PSCI_0_1_CPU_ON : 0) |
> > +   (psci_ops.migrate ? KVM_HOST_PSCI_0_1_MIGRATE : 0);

... since e.g. this could be roughly:

kvm_host_psci_config.cpu_suspend_implemented = psci_ops.cpu_suspend;
kvm_host_psci_config.cpu_off_implemented = psci_ops.cpu_off;
kvm_host_psci_config.cpu_on_implemented = psci_ops.cpu_on;
kvm_host_psci_config.migrate_implemented = psci_ops.migrate;
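
... with kvm_host_psci_config growing something like (sketch only, field
names matching the assignments above):

	struct kvm_host_psci_config {
		/* PSCI version used by host. */
		u32 version;

		/* Function IDs used by host if version is v0.1. */
		struct psci_0_1_function_ids function_ids_0_1;

		bool cpu_suspend_implemented;
		bool cpu_on_implemented;
		bool cpu_off_implemented;
		bool migrate_implemented;
	};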

> > +static inline bool is_psci_0_1_cpu_suspend(u64 func_id)
> > +{
> > +   return is_psci_0_1_function_enabled(KVM_HOST_PSCI_0_1_CPU_SUSPEND) &&
> > +  (func_id == kvm_host_psci_config.function_ids_0_1.cpu_suspend);
> > +}

...and similarly:

	return kvm_host_psci_config.cpu_suspend_implemented &&
	       func_id == kvm_host_psci_config.function_ids_0_1.cpu_suspend;

> Otherwise looks OK. Don't bother respinning the series for my
> comments, I can tidy things up as I apply it if there are no other
> issues.

FWIW, I'm happy with whatever you choose to do here, so don't feel like you
have to follow my suggestions above.

Thanks,
Mark.


Re: [RFC PATCH 16/27] KVM: arm64: Prepare Hyp memory protection

2020-12-07 Thread Mark Rutland
On Mon, Dec 07, 2020 at 10:20:03AM +, Will Deacon wrote:
> On Fri, Dec 04, 2020 at 06:01:52PM +, Quentin Perret wrote:
> > On Thursday 03 Dec 2020 at 12:57:33 (+), Fuad Tabba wrote:
> > 
> > > > +SYM_FUNC_START(__kvm_init_switch_pgd)
> > > > +   /* Turn the MMU off */
> > > > +   pre_disable_mmu_workaround
> > > > +   mrs x2, sctlr_el2
> > > > +   bic x3, x2, #SCTLR_ELx_M
> > > > +   msr sctlr_el2, x3
> > > > +   isb
> > > > +
> > > > +   tlbialle2
> > > > +
> > > > +   /* Install the new pgtables */
> > > > +   ldr x3, [x0, #NVHE_INIT_PGD_PA]
> > > > +   phys_to_ttbr x4, x3
> > > > +alternative_if ARM64_HAS_CNP
> > > > +   orr x4, x4, #TTBR_CNP_BIT
> > > > +alternative_else_nop_endif
> > > > +   msr ttbr0_el2, x4
> > > > +
> > > > +   /* Set the new stack pointer */
> > > > +   ldr x0, [x0, #NVHE_INIT_STACK_HYP_VA]
> > > > +   mov sp, x0
> > > > +
> > > > +   /* And turn the MMU back on! */
> > > > +   dsb nsh
> > > > +   isb
> > > > +   msr sctlr_el2, x2
> > > > +   isb
> > > > +   ret x1
> > > > +SYM_FUNC_END(__kvm_init_switch_pgd)
> > > > +
> > > 
> > > Should the instruction cache be flushed here (ic iallu), to discard
> > > speculatively fetched instructions?
> > 
> > Hmm, Will? Thoughts?
> 
> The I-cache is physically tagged, so not sure what invalidation would
> achieve here. Fuad -- what do you think could go wrong specifically?

While the MMU is off, instruction fetches can be made from the PoC
rather than the PoU, so where instructions have been modified/copied and
not cleaned to the PoC, it's possible to fetch stale copies into the
I-caches. The physical tag doesn't prevent that.

In the regular CPU boot paths, __enable_mmu() has an IC IALLU after
enabling the MMU to ensure that we get rid of anything stale (e.g. so
secondaries don't miss ftrace patching, which is only cleaned to the
PoU).

That might not be a problem here, if things are suitably padded and
never dynamically patched, but if so it's probably worth a comment.

Fuad, is that the sort of thing you were considering, or did you have
additional concerns?

Thanks,
Mark.


Re: [PATCH v6 0/2] MTE support for KVM guest

2020-12-03 Thread Mark Rutland
On Thu, Dec 03, 2020 at 04:49:49PM +, Steven Price wrote:
> On 03/12/2020 16:09, Mark Rutland wrote:
> > On Fri, Nov 27, 2020 at 03:21:11PM +, Steven Price wrote:
> > > It's been a week, and I think the comments on v5 made it clear that
> > > enforcing PROT_MTE requirements on the VMM was probably the wrong
> > > approach. So since I've got swap working correctly without that I
> > > thought I'd post a v6 which hopefully addresses all the comments so far.
> > > 
> > > This series adds support for Arm's Memory Tagging Extension (MTE) to
> > > KVM, allowing KVM guests to make use of it. This builds on the existing
> > > user space support already in v5.10-rc4, see [1] for an overview.
> > 
> > >   arch/arm64/include/asm/kvm_emulate.h   |  3 +++
> > >   arch/arm64/include/asm/kvm_host.h  |  8 
> > >   arch/arm64/include/asm/pgtable.h   |  2 +-
> > >   arch/arm64/include/asm/sysreg.h|  3 ++-
> > >   arch/arm64/kernel/mte.c| 18 +-
> > >   arch/arm64/kvm/arm.c   |  9 +
> > >   arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++
> > >   arch/arm64/kvm/mmu.c   | 16 
> > >   arch/arm64/kvm/sys_regs.c  | 20 +++-
> > >   include/uapi/linux/kvm.h   |  1 +
> > >   10 files changed, 82 insertions(+), 12 deletions(-)
> > 
> > I note that doesn't fixup arch/arm64/kvm/inject_fault.c, where in
> > enter_exception64() we have:
> > 
> > | // TODO: TCO (if/when ARMv8.5-MemTag is exposed to guests)
> > 
> > ... and IIUC when MTE is present, TCO should be set when delivering an
> > exception, so I believe that needs to be updated to set TCO.
> 
> Well spotted! As you say TCO should be set when delivering an exception, so
> we need the following:
> 
> -   // TODO: TCO (if/when ARMv8.5-MemTag is exposed to guests)
> +   if (kvm_has_mte(vcpu->kvm))
> +   new |= PSR_TCO_BIT;

Something of that sort, yes.

It'd be worth a look for any mention of TCO or MTE in case there are
other bits that need a fixup.

> > Given that MTE-capable HW does that unconditionally, this is going to be
> > a mess for big.LITTLE. :/
> 
> I'm not sure I follow. Either all CPUs support MTE in which this isn't a
> problem, or the MTE feature just isn't exposed. We don't support a mix of
> MTE and non-MTE CPUs. There are several aspects of MTE which effective mean
> it's an all-or-nothing feature for the system.

So long as the host requires uniform MTE support, I agree that's not a
problem.

The fun is that the CPUs themselves will set TCO upon a real exception
regardless of whether the host is aware, and on a mismatched system some
CPUs will do that while others will not. In such a case the host and
guest will end up seeing the SPSR TCO bit set sometimes upon exceptions
from EL1 or EL2, and I hope that MTE-unaware CPUs ignore the bit upon
ERET, or we're going to have significant problems.

Thanks,
Mark.


Re: [PATCH v6 0/2] MTE support for KVM guest

2020-12-03 Thread Mark Rutland
On Fri, Nov 27, 2020 at 03:21:11PM +, Steven Price wrote:
> It's been a week, and I think the comments on v5 made it clear that
> enforcing PROT_MTE requirements on the VMM was probably the wrong
> approach. So since I've got swap working correctly without that I
> thought I'd post a v6 which hopefully addresses all the comments so far.
> 
> This series adds support for Arm's Memory Tagging Extension (MTE) to
> KVM, allowing KVM guests to make use of it. This builds on the existing
> user space support already in v5.10-rc4, see [1] for an overview.

>  arch/arm64/include/asm/kvm_emulate.h   |  3 +++
>  arch/arm64/include/asm/kvm_host.h  |  8 
>  arch/arm64/include/asm/pgtable.h   |  2 +-
>  arch/arm64/include/asm/sysreg.h|  3 ++-
>  arch/arm64/kernel/mte.c| 18 +-
>  arch/arm64/kvm/arm.c   |  9 +
>  arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 14 ++
>  arch/arm64/kvm/mmu.c   | 16 
>  arch/arm64/kvm/sys_regs.c  | 20 +++-
>  include/uapi/linux/kvm.h   |  1 +
>  10 files changed, 82 insertions(+), 12 deletions(-)

I note that doesn't fixup arch/arm64/kvm/inject_fault.c, where in
enter_exception64() we have:

| // TODO: TCO (if/when ARMv8.5-MemTag is exposed to guests)

... and IIUC when MTE is present, TCO should be set when delivering an
exception, so I believe that needs to be updated to set TCO.

Given that MTE-capable HW does that unconditionally, this is going to be
a mess for big.LITTLE. :/

Thanks,
Mark.


Re: [PATCH v4 16/26] kvm: arm64: Bootstrap PSCI SMC handler in nVHE EL2

2020-12-03 Thread Mark Rutland
On Wed, Dec 02, 2020 at 06:41:12PM +, David Brazdil wrote:
> Add a handler of PSCI SMCs in nVHE hyp code. The handler is initialized
> with the version used by the host's PSCI driver and the function IDs it
> was configured with. If the SMC function ID matches one of the
> configured PSCI calls (for v0.1) or falls into the PSCI function ID
> range (for v0.2+), the SMC is handled by the PSCI handler. For now, all
> SMCs return PSCI_RET_NOT_SUPPORTED.
> 
> Signed-off-by: David Brazdil 

> +static bool is_psci_0_1_call(u64 func_id)
> +{
> + return (func_id == kvm_host_psci_0_1_function_ids.cpu_suspend) ||
> +(func_id == kvm_host_psci_0_1_function_ids.cpu_on) ||
> +(func_id == kvm_host_psci_0_1_function_ids.cpu_off) ||
> +(func_id == kvm_host_psci_0_1_function_ids.migrate);
> +}

One minor thing, as I just spotted on an earlier patch: if FW doesn't
implement one of these, the ID will be 0, so we might need to snapshot
whether or not the function is enabled to stop spurious calls to FID 0.

To be clear, that can be done in a follow-up if necessary.
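For illustration, a minimal sketch (not from the series) of one way to
snapshot that. The struct and the 'valid' flags below are hypothetical,
and would be filled in wherever the host's v0.1 function IDs are copied
from psci_ops:

/* Hypothetical per-function snapshot: the ID plus whether FW provides it. */
struct kvm_host_psci_0_1_fn {
	u32 fn_id;
	bool valid;
};

static struct kvm_host_psci_0_1_fn host_cpu_suspend, host_cpu_on,
				   host_cpu_off, host_migrate;

static bool psci_0_1_fn_matches(const struct kvm_host_psci_0_1_fn *fn,
				u64 func_id)
{
	/* A call to FID 0 no longer matches an unimplemented slot. */
	return fn->valid && fn->fn_id == func_id;
}

static bool is_psci_0_1_call(u64 func_id)
{
	return psci_0_1_fn_matches(&host_cpu_suspend, func_id) ||
	       psci_0_1_fn_matches(&host_cpu_on, func_id) ||
	       psci_0_1_fn_matches(&host_cpu_off, func_id) ||
	       psci_0_1_fn_matches(&host_migrate, func_id);
}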

Thanks,
Mark.


Re: [PATCH v4 06/26] psci: Add accessor for psci_0_1_function_ids

2020-12-03 Thread Mark Rutland
On Wed, Dec 02, 2020 at 06:41:02PM +, David Brazdil wrote:
> Make it possible to retrieve a copy of the psci_0_1_function_ids struct.
> This is useful for KVM if it is configured to intercept host's PSCI SMCs.
> 
> Signed-off-by: David Brazdil 

Acked-by: Mark Rutland 

... just to check, does KVM snapshot which function IDs are valid, or do
we want to add that state here too? That can be a follow-up if
necessary.

Thanks,
Mark.

> ---
>  drivers/firmware/psci/psci.c | 12 +---
>  include/linux/psci.h |  9 +
>  2 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 593fdd0e09a2..f5fc429cae3f 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -58,15 +58,13 @@ typedef unsigned long (psci_fn)(unsigned long, unsigned 
> long,
>   unsigned long, unsigned long);
>  static psci_fn *invoke_psci_fn;
>  
> -struct psci_0_1_function_ids {
> - u32 cpu_suspend;
> - u32 cpu_on;
> - u32 cpu_off;
> - u32 migrate;
> -};
> -
>  static struct psci_0_1_function_ids psci_0_1_function_ids;
>  
> +struct psci_0_1_function_ids get_psci_0_1_function_ids(void)
> +{
> + return psci_0_1_function_ids;
> +}
> +
>  #define PSCI_0_2_POWER_STATE_MASK\
>   (PSCI_0_2_POWER_STATE_ID_MASK | \
>   PSCI_0_2_POWER_STATE_TYPE_MASK | \
> diff --git a/include/linux/psci.h b/include/linux/psci.h
> index 2a1bfb890e58..4ca0060a3fc4 100644
> --- a/include/linux/psci.h
> +++ b/include/linux/psci.h
> @@ -34,6 +34,15 @@ struct psci_operations {
>  
>  extern struct psci_operations psci_ops;
>  
> +struct psci_0_1_function_ids {
> + u32 cpu_suspend;
> + u32 cpu_on;
> + u32 cpu_off;
> + u32 migrate;
> +};
> +
> +struct psci_0_1_function_ids get_psci_0_1_function_ids(void);
> +
>  #if defined(CONFIG_ARM_PSCI_FW)
>  int __init psci_dt_init(void);
>  #else
> -- 
> 2.29.2.454.gaff20da3a2-goog
> 


Re: [PATCH v4 05/26] psci: Replace psci_function_id array with a struct

2020-12-03 Thread Mark Rutland
On Wed, Dec 02, 2020 at 06:41:01PM +, David Brazdil wrote:
> Small refactor that replaces array of v0.1 function IDs indexed by an
> enum of function-name constants with a struct of function IDs "indexed"
> by field names. This is done in preparation for exposing the IDs to
> other parts of the kernel. Exposing a struct avoids the need for
> bounds checking.
> 
> Signed-off-by: David Brazdil 

Acked-by: Mark Rutland 

Mark.

> ---
>  drivers/firmware/psci/psci.c | 29 ++---
>  1 file changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 13b9ed71b446..593fdd0e09a2 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -58,15 +58,14 @@ typedef unsigned long (psci_fn)(unsigned long, unsigned 
> long,
>   unsigned long, unsigned long);
>  static psci_fn *invoke_psci_fn;
>  
> -enum psci_function {
> - PSCI_FN_CPU_SUSPEND,
> - PSCI_FN_CPU_ON,
> - PSCI_FN_CPU_OFF,
> - PSCI_FN_MIGRATE,
> - PSCI_FN_MAX,
> +struct psci_0_1_function_ids {
> + u32 cpu_suspend;
> + u32 cpu_on;
> + u32 cpu_off;
> + u32 migrate;
>  };
>  
> -static u32 psci_function_id[PSCI_FN_MAX];
> +static struct psci_0_1_function_ids psci_0_1_function_ids;
>  
>  #define PSCI_0_2_POWER_STATE_MASK\
>   (PSCI_0_2_POWER_STATE_ID_MASK | \
> @@ -178,7 +177,7 @@ static int __psci_cpu_suspend(u32 fn, u32 state, unsigned 
> long entry_point)
>  
>  static int psci_0_1_cpu_suspend(u32 state, unsigned long entry_point)
>  {
> - return __psci_cpu_suspend(psci_function_id[PSCI_FN_CPU_SUSPEND],
> + return __psci_cpu_suspend(psci_0_1_function_ids.cpu_suspend,
> state, entry_point);
>  }
>  
> @@ -198,7 +197,7 @@ static int __psci_cpu_off(u32 fn, u32 state)
>  
>  static int psci_0_1_cpu_off(u32 state)
>  {
> - return __psci_cpu_off(psci_function_id[PSCI_FN_CPU_OFF], state);
> + return __psci_cpu_off(psci_0_1_function_ids.cpu_off, state);
>  }
>  
>  static int psci_0_2_cpu_off(u32 state)
> @@ -216,7 +215,7 @@ static int __psci_cpu_on(u32 fn, unsigned long cpuid, 
> unsigned long entry_point)
>  
>  static int psci_0_1_cpu_on(unsigned long cpuid, unsigned long entry_point)
>  {
> - return __psci_cpu_on(psci_function_id[PSCI_FN_CPU_ON], cpuid, 
> entry_point);
> + return __psci_cpu_on(psci_0_1_function_ids.cpu_on, cpuid, entry_point);
>  }
>  
>  static int psci_0_2_cpu_on(unsigned long cpuid, unsigned long entry_point)
> @@ -234,7 +233,7 @@ static int __psci_migrate(u32 fn, unsigned long cpuid)
>  
>  static int psci_0_1_migrate(unsigned long cpuid)
>  {
> - return __psci_migrate(psci_function_id[PSCI_FN_MIGRATE], cpuid);
> + return __psci_migrate(psci_0_1_function_ids.migrate, cpuid);
>  }
>  
>  static int psci_0_2_migrate(unsigned long cpuid)
> @@ -548,22 +547,22 @@ static int __init psci_0_1_init(struct device_node *np)
>   psci_ops.get_version = psci_0_1_get_version;
>  
>   if (!of_property_read_u32(np, "cpu_suspend", &id)) {
> - psci_function_id[PSCI_FN_CPU_SUSPEND] = id;
> + psci_0_1_function_ids.cpu_suspend = id;
>   psci_ops.cpu_suspend = psci_0_1_cpu_suspend;
>   }
>  
>   if (!of_property_read_u32(np, "cpu_off", &id)) {
> - psci_function_id[PSCI_FN_CPU_OFF] = id;
> + psci_0_1_function_ids.cpu_off = id;
>   psci_ops.cpu_off = psci_0_1_cpu_off;
>   }
>  
>   if (!of_property_read_u32(np, "cpu_on", &id)) {
> - psci_function_id[PSCI_FN_CPU_ON] = id;
> + psci_0_1_function_ids.cpu_on = id;
>   psci_ops.cpu_on = psci_0_1_cpu_on;
>   }
>  
>   if (!of_property_read_u32(np, "migrate", &id)) {
> - psci_function_id[PSCI_FN_MIGRATE] = id;
> + psci_0_1_function_ids.migrate = id;
>   psci_ops.migrate = psci_0_1_migrate;
>   }
>  
> -- 
> 2.29.2.454.gaff20da3a2-goog
> 


Re: [PATCH v4 04/26] psci: Split functions to v0.1 and v0.2+ variants

2020-12-03 Thread Mark Rutland
On Wed, Dec 02, 2020 at 06:41:00PM +, David Brazdil wrote:
> Refactor implementation of v0.1+ functions (CPU_SUSPEND, CPU_OFF,
> CPU_ON, MIGRATE) to have two functions psci_0_1_foo / psci_0_2_foo that
> select the function ID and call a common helper __psci_foo.
> 
> This is a small cleanup so that the function ID array is only used for
> v0.1 configurations.
> 
> Signed-off-by: David Brazdil 

Acked-by: Mark Rutland 

Mark.

> ---
>  drivers/firmware/psci/psci.c | 94 +++-
>  1 file changed, 60 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index ace5b9ac676c..13b9ed71b446 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -168,46 +168,80 @@ int psci_set_osi_mode(bool enable)
>   return psci_to_linux_errno(err);
>  }
>  
> -static int psci_cpu_suspend(u32 state, unsigned long entry_point)
> +static int __psci_cpu_suspend(u32 fn, u32 state, unsigned long entry_point)
>  {
>   int err;
> - u32 fn;
>  
> - fn = psci_function_id[PSCI_FN_CPU_SUSPEND];
>   err = invoke_psci_fn(fn, state, entry_point, 0);
>   return psci_to_linux_errno(err);
>  }
>  
> -static int psci_cpu_off(u32 state)
> +static int psci_0_1_cpu_suspend(u32 state, unsigned long entry_point)
> +{
> + return __psci_cpu_suspend(psci_function_id[PSCI_FN_CPU_SUSPEND],
> +   state, entry_point);
> +}
> +
> +static int psci_0_2_cpu_suspend(u32 state, unsigned long entry_point)
> +{
> + return __psci_cpu_suspend(PSCI_FN_NATIVE(0_2, CPU_SUSPEND),
> +   state, entry_point);
> +}
> +
> +static int __psci_cpu_off(u32 fn, u32 state)
>  {
>   int err;
> - u32 fn;
>  
> - fn = psci_function_id[PSCI_FN_CPU_OFF];
>   err = invoke_psci_fn(fn, state, 0, 0);
>   return psci_to_linux_errno(err);
>  }
>  
> -static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> +static int psci_0_1_cpu_off(u32 state)
> +{
> + return __psci_cpu_off(psci_function_id[PSCI_FN_CPU_OFF], state);
> +}
> +
> +static int psci_0_2_cpu_off(u32 state)
> +{
> + return __psci_cpu_off(PSCI_0_2_FN_CPU_OFF, state);
> +}
> +
> +static int __psci_cpu_on(u32 fn, unsigned long cpuid, unsigned long 
> entry_point)
>  {
>   int err;
> - u32 fn;
>  
> - fn = psci_function_id[PSCI_FN_CPU_ON];
>   err = invoke_psci_fn(fn, cpuid, entry_point, 0);
>   return psci_to_linux_errno(err);
>  }
>  
> -static int psci_migrate(unsigned long cpuid)
> +static int psci_0_1_cpu_on(unsigned long cpuid, unsigned long entry_point)
> +{
> + return __psci_cpu_on(psci_function_id[PSCI_FN_CPU_ON], cpuid, 
> entry_point);
> +}
> +
> +static int psci_0_2_cpu_on(unsigned long cpuid, unsigned long entry_point)
> +{
> + return __psci_cpu_on(PSCI_FN_NATIVE(0_2, CPU_ON), cpuid, entry_point);
> +}
> +
> +static int __psci_migrate(u32 fn, unsigned long cpuid)
>  {
>   int err;
> - u32 fn;
>  
> - fn = psci_function_id[PSCI_FN_MIGRATE];
>   err = invoke_psci_fn(fn, cpuid, 0, 0);
>   return psci_to_linux_errno(err);
>  }
>  
> +static int psci_0_1_migrate(unsigned long cpuid)
> +{
> + return __psci_migrate(psci_function_id[PSCI_FN_MIGRATE], cpuid);
> +}
> +
> +static int psci_0_2_migrate(unsigned long cpuid)
> +{
> + return __psci_migrate(PSCI_FN_NATIVE(0_2, MIGRATE), cpuid);
> +}
> +
>  static int psci_affinity_info(unsigned long target_affinity,
>   unsigned long lowest_affinity_level)
>  {
> @@ -352,7 +386,7 @@ static void __init psci_init_system_suspend(void)
>  
>  static void __init psci_init_cpu_suspend(void)
>  {
> - int feature = psci_features(psci_function_id[PSCI_FN_CPU_SUSPEND]);
> + int feature = psci_features(PSCI_FN_NATIVE(0_2, CPU_SUSPEND));
>  
>   if (feature != PSCI_RET_NOT_SUPPORTED)
>   psci_cpu_suspend_feature = feature;
> @@ -426,24 +460,16 @@ static void __init psci_init_smccc(void)
>  static void __init psci_0_2_set_functions(void)
>  {
>   pr_info("Using standard PSCI v0.2 function IDs\n");
> - psci_ops.get_version = psci_0_2_get_version;
> -
> - psci_function_id[PSCI_FN_CPU_SUSPEND] =
> - PSCI_FN_NATIVE(0_2, CPU_SUSPEND);
> - psci_ops.cpu_suspend = psci_cpu_suspend;
> -
> - psci_function_id[PSCI_FN_CPU_OFF] = PSCI_0_2_FN_CPU_OFF;
> - psci_ops.cpu_off = psci_cpu_off;
> -
> - psci_function_id[PSCI_FN_CPU_ON] = PSCI_FN_NATIVE(0_2, CPU_ON);
> - psci_ops.cpu_on = psci_cpu_on;
&

Re: [PATCH v3 06/23] kvm: arm64: Add kvm-arm.protected early kernel parameter

2020-12-01 Thread Mark Rutland
On Tue, Dec 01, 2020 at 02:43:49PM +, David Brazdil wrote:
> > > > be just me, but if you agree please update so that it doesn't give 
> > > > remote
> > > > idea that it is not valid on VHE enabled hardware.
> > > > 
> > > > I was trying to run this on the hardware and was trying to understand 
> > > > the
> > > > details on how to do that.
> > > 
> > > I see what you're saying, but !CONFIG_ARM64_VHE isn't accurate either. The
> > > option makes sense if:
> > >   1) all cores booted in EL2
> > >  == is_hyp_mode_available()
> > >   2) ID_AA64MMFR1_EL1.VH=0 or !CONFIG_ARM64_VHE
> > >  == !is_kernel_in_hyp_mode()
> > > 
> > > The former feels implied for KVM, the latter could be 'Valid if the kernel
> > > is running in EL1'? WDYT?
> > 
> > I reckon we can avoid the restriction if we instead add an early stub
> > like with have for KASLR. That way we could parse the command line
> > early, and if necessary re-initialize EL2 and drop to EL1 before the
> > main kernel has to make any decisions about how to initialize things.
> > That would allow us to have a more general kvm-arm.mode option where a
> > single kernel Image could support:
> > 
> > * "protected" mode on nVHE or VHE HW
> > * "nvhe" mode on nVHE or VHE HW
> > * "vhe" mode on VHE HW
> > 
> > ... defaulting to VHE/nVHE modes depending on HW support.
> > 
> > That would also be somewhat future-proof if we have to add other
> > variants of protected mode in future, as we could extend the mode option
> > with parameters for each mode.
> 
> Agreed that 'mode' is a more future-proof flag and I would very much love to
> have an option to force nVHE on VHE HW. I however expect that the early stub
> would not be a trivial addition and would not want to get into that in this
> series. Could we agree on 'protected' as the only supported value for the time
> being?

Sure, that works for me.

Thanks,
Mark. 


Re: [PATCH v3 20/23] kvm: arm64: Intercept host's CPU_SUSPEND PSCI SMCs

2020-12-01 Thread Mark Rutland
On Thu, Nov 26, 2020 at 03:54:18PM +, David Brazdil wrote:
> Add a handler of CPU_SUSPEND host PSCI SMCs. The SMC can either enter
> a sleep state indistinguishable from a WFI or a deeper sleep state that
> behaves like a CPU_OFF+CPU_ON except that the core is still considered
> online when asleep.
> 
> The handler saves r0,pc of the host and makes the same call to EL3 with
> the hyp CPU entry point. It either returns back to the handler and then
> back to the host, or wakes up into the entry point and initializes EL2
> state before dropping back to EL1.

For those CPU_SUSPEND calls which lose context, is there no EL2 state
that you need to save/restore, or is that all saved elsewhere already?

The usual suspects are PMU, debug, and timers, so maybe not. It'd be
nice to have a statement in the commit message if we're certain there's
no state that we need to save.

> A core can only suspend itself but other cores can concurrently invoke
> CPU_ON with this core as target. To avoid racing them for the same
> boot args struct, CPU_SUSPEND uses a different struct instance and entry
> point. Each entry point selects the corresponding struct to restore host
> boot args from. This avoids the need for locking in CPU_SUSPEND.

I found this a bit confusing since the first sentence can be read to
mean that CPU_ON is expected to compose with CPU_SUSPEND, whereas what
this is actually saying is the implementation ensures they don't
interact. How about:

| CPU_ON and CPU_SUSPEND are both implemented using struct cpu_boot_args
| to store the state upon powerup, with each CPU having separate structs
| for CPU_ON and CPU_SUSPEND so that CPU_SUSPEND can operate locklessly
| and so that a CPU_ON call targeting a CPU cannot interfere with a
| concurrent CPU_SUSPEND call on that CPU.

The patch itself looks fine to me.

Thanks,
Mark.


Re: [PATCH v3 06/23] kvm: arm64: Add kvm-arm.protected early kernel parameter

2020-12-01 Thread Mark Rutland
On Tue, Dec 01, 2020 at 01:19:13PM +, David Brazdil wrote:
> Hey Sudeep,
> 
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> > > b/Documentation/admin-guide/kernel-parameters.txt
> > > index 526d65d8573a..06c89975c29c 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -2259,6 +2259,11 @@
> > >   for all guests.
> > >   Default is 1 (enabled) if in 64-bit or 32-bit PAE mode.
> > >  
> > > + kvm-arm.protected=
> > > + [KVM,ARM] Allow spawning protected guests whose state
> > > + is kept private from the host. Only valid for non-VHE.
> > > + Default is 0 (disabled).
> > > +
> > 
> > Sorry for being pedantic. Can we reword this to say valid for
> > !CONFIG_ARM64_VHE ? I read this as valid only for non-VHE hardware, it may
> > be just me, but if you agree please update so that it doesn't give remote
> > idea that it is not valid on VHE enabled hardware.
> > 
> > I was trying to run this on the hardware and was trying to understand the
> > details on how to do that.
> 
> I see what you're saying, but !CONFIG_ARM64_VHE isn't accurate either. The
> option makes sense if:
>   1) all cores booted in EL2
>  == is_hyp_mode_available()
>   2) ID_AA64MMFR1_EL1.VH=0 or !CONFIG_ARM64_VHE
>  == !is_kernel_in_hyp_mode()
> 
> The former feels implied for KVM, the latter could be 'Valid if the kernel
> is running in EL1'? WDYT?

I reckon we can avoid the restriction if we instead add an early stub
like with have for KASLR. That way we could parse the command line
early, and if necessary re-initialize EL2 and drop to EL1 before the
main kernel has to make any decisions about how to initialize things.
That would allow us to have a more general kvm-arm.mode option where a
single kernel Image could support:

* "protected" mode on nVHE or VHE HW
* "nvhe" mode on nVHE or VHE HW
* "vhe" mode on VHE HW

... defaulting to VHE/nVHE modes depending on HW support.

That would also be somewhat future-proof if we have to add other
variants of protected mode in future, as we could extend the mode option
with parameters for each mode.
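
For illustration, a rough sketch of the parameter-parsing side of that;
the early-stub/EL2 re-initialization part is the hard bit and is not
shown, and the names below are hypothetical rather than taken from the
series:

/* Hypothetical early parsing of a kvm-arm.mode option. */
enum kvm_mode {
	KVM_MODE_DEFAULT,
	KVM_MODE_PROTECTED,
	KVM_MODE_NVHE,
	KVM_MODE_VHE,
};

static enum kvm_mode kvm_mode __ro_after_init = KVM_MODE_DEFAULT;

static int __init early_kvm_mode_cfg(char *arg)
{
	if (!arg)
		return -EINVAL;

	if (!strcmp(arg, "protected"))
		kvm_mode = KVM_MODE_PROTECTED;
	else if (!strcmp(arg, "nvhe"))
		kvm_mode = KVM_MODE_NVHE;
	else if (!strcmp(arg, "vhe"))
		kvm_mode = KVM_MODE_VHE;
	else
		return -EINVAL;

	return 0;
}
early_param("kvm-arm.mode", early_kvm_mode_cfg);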

Thanks,
Mark.


Re: [PATCH v3 05/23] arm64: Extract parts of el2_setup into a macro

2020-11-26 Thread Mark Rutland
On Thu, Nov 26, 2020 at 03:54:03PM +, David Brazdil wrote:
> When a CPU is booted in EL2, the kernel checks for VHE support and
> initializes the CPU core accordingly. For nVHE it also installs the stub
> vectors and drops down to EL1.
> 
> Once KVM gains the ability to boot cores without going through the
> kernel entry point, it will need to initialize the CPU the same way.
> Extract the relevant bits of el2_setup into an init_el2_state macro
> with an argument specifying whether to initialize for VHE or nVHE.
> 
> No functional change. Size of el2_setup increased by 148 bytes due
> to duplication.

As a heads-up, this will conflict with my rework which is queued in the
arm64 for-next/uaccess branch. I reworked and renamed el2_setup to
initialize SCTLR_ELx and PSTATE more consistently as a prerequisite for
the set_fs() removal.

I'm afraid this is going to conflict, and I reckon this needs to be
rebased atop that. I think the actual conflicts are logically trivial,
but the diff is going to be painful.

I'm certainly in favour of breaking this down into manageable chunks,
especially as that makes the branch naming easier to follow, but I have
a couple of concerns below.

> +/* GICv3 system register access */
> +.macro __init_el2_gicv3
> +	mrs	x0, id_aa64pfr0_el1
> +	ubfx	x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
> +	cbz	x0, 1f
> +
> +	mrs_s	x0, SYS_ICC_SRE_EL2
> +	orr	x0, x0, #ICC_SRE_EL2_SRE	// Set ICC_SRE_EL2.SRE==1
> +	orr	x0, x0, #ICC_SRE_EL2_ENABLE	// Set ICC_SRE_EL2.Enable==1
> +	msr_s	SYS_ICC_SRE_EL2, x0
> +	isb					// Make sure SRE is now set
> +	mrs_s	x0, SYS_ICC_SRE_EL2		// Read SRE back,
> +	tbz	x0, #0, 1f			// and check that it sticks
> +	msr_s	SYS_ICH_HCR_EL2, xzr		// Reset ICC_HCR_EL2 to defaults
> +1:
> +.endm

In the head.S code, this was under an ifdef CONFIG_ARM_GIC_V3, but that
ifdef wasn't carried into the macro here, or into its use below. I'm not
sure of the impact, but that does seem to be a functional change.

> +
> +.macro __init_el2_hstr
> + msr hstr_el2, xzr   // Disable CP15 traps to EL2
> +.endm

Likewise, this used to be be guarded by CONFIG_COMPAT, but that's not
carried into the macro or its use.

If the intent was to remove the conditionality, then that should be
mentioned in the commit message, since it is a potential functional
change.

Thanks,
Mark.


Re: [PATCH v3 04/23] arm64: Move MAIR_EL1_SET to asm/memory.h

2020-11-26 Thread Mark Rutland
On Thu, Nov 26, 2020 at 03:54:02PM +, David Brazdil wrote:
> KVM currently initializes MAIR_EL2 to the value of MAIR_EL1. In
> preparation for initializing MAIR_EL2 before MAIR_EL1, move the constant
> into a shared header file. Since it is used for EL1 and EL2, rename to
> MAIR_ELx_SET.
> 
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/memory.h | 13 +
>  arch/arm64/mm/proc.S| 15 +--
>  2 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index cd61239bae8c..54a22cb5b17b 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -152,6 +152,19 @@
>  #define MT_S2_FWB_NORMAL 6
>  #define MT_S2_FWB_DEVICE_nGnRE   1
>  
> +/*
> + * Default MAIR_ELx. MT_NORMAL_TAGGED is initially mapped as Normal memory 
> and
> + * changed during __cpu_setup to Normal Tagged if the system supports MTE.
> + */
> +#define MAIR_ELx_SET \
> + (MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) |  \
> +  MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) |\
> +  MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) |\
> +  MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) |  \
> +  MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) |\
> +  MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) |  \
> +  MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))

Patch 7 initializes MAIR_EL2 with this directly rather than copying it
from MAIR_EL1, which means that MT_NORMAL_TAGGED will never be tagged
within the nVHE hyp code.

Is that expected? I suspect it's worth a comment here (introduced in
patch 7), just to make that clear.

Otherwise this looks fine to me.

Thanks,
Mark.


> +
>  #ifdef CONFIG_ARM64_4K_PAGES
>  #define IOREMAP_MAX_ORDER(PUD_SHIFT)
>  #else
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 23c326a06b2d..e3b9aa372b96 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -45,19 +45,6 @@
>  #define TCR_KASAN_FLAGS 0
>  #endif
>  
> -/*
> - * Default MAIR_EL1. MT_NORMAL_TAGGED is initially mapped as Normal memory 
> and
> - * changed during __cpu_setup to Normal Tagged if the system supports MTE.
> - */
> -#define MAIR_EL1_SET \
> - (MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRnE, MT_DEVICE_nGnRnE) |  \
> -  MAIR_ATTRIDX(MAIR_ATTR_DEVICE_nGnRE, MT_DEVICE_nGnRE) |\
> -  MAIR_ATTRIDX(MAIR_ATTR_DEVICE_GRE, MT_DEVICE_GRE) |\
> -  MAIR_ATTRIDX(MAIR_ATTR_NORMAL_NC, MT_NORMAL_NC) |  \
> -  MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL) |\
> -  MAIR_ATTRIDX(MAIR_ATTR_NORMAL_WT, MT_NORMAL_WT) |  \
> -  MAIR_ATTRIDX(MAIR_ATTR_NORMAL, MT_NORMAL_TAGGED))
> -
>  #ifdef CONFIG_CPU_PM
>  /**
>   * cpu_do_suspend - save CPU registers context
> @@ -425,7 +412,7 @@ SYM_FUNC_START(__cpu_setup)
>   /*
>* Memory region attributes
>*/
> - mov_q   x5, MAIR_EL1_SET
> + mov_q   x5, MAIR_ELx_SET
>  #ifdef CONFIG_ARM64_MTE
>   /*
>* Update MAIR_EL1, GCR_EL1 and TFSR*_EL1 if MTE is supported
> -- 
> 2.29.2.454.gaff20da3a2-goog
> 


Re: [PATCH v3 03/23] arm64: Make cpu_logical_map() take unsigned int

2020-11-26 Thread Mark Rutland
On Thu, Nov 26, 2020 at 03:54:01PM +, David Brazdil wrote:
> CPU index should never be negative. Change the signature of
> (set_)cpu_logical_map to take an unsigned int.
> 
> Signed-off-by: David Brazdil 

Is there a functional problem here, or is this just cleanup from
inspection?

Core code including the cpuhp_*() callbacks uses an int, so if there's a
strong justification to change this, it suggests there's some treewide
cleanup that should be done.

I don't have strong feelings on the matter, but I'd like to understand
the rationale.

Thanks,
Mark.

> ---
>  arch/arm64/include/asm/smp.h | 4 ++--
>  arch/arm64/kernel/setup.c| 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> index 2e7f529ec5a6..bcb01ca15325 100644
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -46,9 +46,9 @@ DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
>   * Logical CPU mapping.
>   */
>  extern u64 __cpu_logical_map[NR_CPUS];
> -extern u64 cpu_logical_map(int cpu);
> +extern u64 cpu_logical_map(unsigned int cpu);
>  
> -static inline void set_cpu_logical_map(int cpu, u64 hwid)
> +static inline void set_cpu_logical_map(unsigned int cpu, u64 hwid)
>  {
>   __cpu_logical_map[cpu] = hwid;
>  }
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 133257ffd859..2f2973bc67c7 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -276,7 +276,7 @@ arch_initcall(reserve_memblock_reserved_regions);
>  
>  u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
>  
> -u64 cpu_logical_map(int cpu)
> +u64 cpu_logical_map(unsigned int cpu)
>  {
>   return __cpu_logical_map[cpu];
>  }
> -- 
> 2.29.2.454.gaff20da3a2-goog
> 


Re: [PATCH v3 02/23] psci: Accessor for configured PSCI function IDs

2020-11-26 Thread Mark Rutland
On Thu, Nov 26, 2020 at 03:54:00PM +, David Brazdil wrote:
> Function IDs used by PSCI are configurable for v0.1 via DT/ACPI. If the
> host is using PSCI v0.1, KVM's host PSCI proxy needs to use the same IDs.
> Expose the array holding the information with a read-only accessor.
> 
> Signed-off-by: David Brazdil 
> ---
>  drivers/firmware/psci/psci.c | 16 
>  include/linux/psci.h | 10 ++
>  2 files changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 213c68418a65..40609564595e 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -58,16 +58,16 @@ typedef unsigned long (psci_fn)(unsigned long, unsigned 
> long,
>   unsigned long, unsigned long);
>  static psci_fn *invoke_psci_fn;
>  
> -enum psci_function {
> - PSCI_FN_CPU_SUSPEND,
> - PSCI_FN_CPU_ON,
> - PSCI_FN_CPU_OFF,
> - PSCI_FN_MIGRATE,
> - PSCI_FN_MAX,
> -};
> -
>  static u32 psci_function_id[PSCI_FN_MAX];
>  
> +u32 psci_get_function_id(enum psci_function fn)
> +{
> + if (WARN_ON_ONCE(fn < 0 || fn >= PSCI_FN_MAX))
> + return 0;
> +
> + return psci_function_id[fn];
> +}

I'd really like if we could namespace this with a psci_0_1_* prefix
before we expose it outside of the PSCI code. I appreciate that's a
larger change, but I reckon we only need a couple of new patches:

1) Split the ops which consume the FN ids into separate psci_0_1_*() and
   psci_0_2_*() variants, with a common __psci_*() helper that takes the
   function ID as an argument. The 0_1 variants would read the function
   ID from a variable, and the 0_2 variants would hard-code the id.

2) Replace the psci_function_id array with:

   struct psci_0_1_function_ids {
u32 suspend;
u32 cpu_on;
u32 cpu_off;
u32 migrate;
   };

   ... and remove enum psci_function entirely.

3) Add a helper which returns the entire psci_0_1_function_ids struct in
   one go. No warnings necessary.

Does that sound OK to you?

Thanks,
Mark.

> +
>  #define PSCI_0_2_POWER_STATE_MASK\
>   (PSCI_0_2_POWER_STATE_ID_MASK | \
>   PSCI_0_2_POWER_STATE_TYPE_MASK | \
> diff --git a/include/linux/psci.h b/include/linux/psci.h
> index 2a1bfb890e58..5b49a5c82d6f 100644
> --- a/include/linux/psci.h
> +++ b/include/linux/psci.h
> @@ -21,6 +21,16 @@ bool psci_power_state_is_valid(u32 state);
>  int psci_set_osi_mode(bool enable);
>  bool psci_has_osi_support(void);
>  
> +enum psci_function {
> + PSCI_FN_CPU_SUSPEND,
> + PSCI_FN_CPU_ON,
> + PSCI_FN_CPU_OFF,
> + PSCI_FN_MIGRATE,
> + PSCI_FN_MAX,
> +};
> +
> +u32 psci_get_function_id(enum psci_function fn);
> +
>  struct psci_operations {
>   u32 (*get_version)(void);
>   int (*cpu_suspend)(u32 state, unsigned long entry_point);
> -- 
> 2.29.2.454.gaff20da3a2-goog
> 


Re: [PATCH v3 01/23] psci: Support psci_ops.get_version for v0.1

2020-11-26 Thread Mark Rutland
On Thu, Nov 26, 2020 at 03:53:59PM +, David Brazdil wrote:
> KVM's host PSCI SMC filter needs to be aware of the PSCI version of the
> system but currently it is impossible to distinguish between v0.1 and
> PSCI disabled because both have get_version == NULL.
> 
> Populate get_version for v0.1 with a function that returns a constant.
> 
> psci_ops.get_version is currently unused, so this has no effect on
> existing functionality.
> 
> Signed-off-by: David Brazdil 
> ---
>  drivers/firmware/psci/psci.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index 00af99b6f97c..213c68418a65 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -146,6 +146,11 @@ static int psci_to_linux_errno(int errno)
>   return -EINVAL;
>  }
>  
> +static u32 psci_get_version_0_1(void)
> +{
> + return PSCI_VERSION(0, 1);
> +}

Elsewhere in this file we've used a psci_${MAJOR}_${MINOR}_* naming
scheme.

To match that, I'd prefer we call this psci_0_1_get_version(), and
rename psci_get_version() to psci_0_2_get_version().

With that:

Acked-by: Mark Rutland 

Thanks,
Mark.

> +
>  static u32 psci_get_version(void)
>  {
>   return invoke_psci_fn(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
> @@ -514,6 +519,8 @@ static int __init psci_0_1_init(struct device_node *np)
>  
>   pr_info("Using PSCI v0.1 Function IDs from DT\n");
>  
> + psci_ops.get_version = psci_get_version_0_1;
> +
>   if (!of_property_read_u32(np, "cpu_suspend", &id)) {
>   psci_function_id[PSCI_FN_CPU_SUSPEND] = id;
>   psci_ops.cpu_suspend = psci_cpu_suspend;
> -- 
> 2.29.2.454.gaff20da3a2-goog
> 


Re: [PATCH v2 4/5] arm64: Add support for SMCCC TRNG entropy source

2020-11-05 Thread Mark Rutland
On Thu, Nov 05, 2020 at 03:34:01PM +0100, Ard Biesheuvel wrote:
> On Thu, 5 Nov 2020 at 15:30, Mark Rutland  wrote:
> > On Thu, Nov 05, 2020 at 03:04:57PM +0100, Ard Biesheuvel wrote:
> > > On Thu, 5 Nov 2020 at 15:03, Mark Rutland  wrote:
> >
> > > > That said, I'm not sure it's great to plumb this under the
> > > > arch_get_random*() interfaces, e.g. given this means that
> > > > add_interrupt_randomness() will end up trapping to the host all the time
> > > > when it calls arch_get_random_seed_long().
> > >
> > > As it turns out, add_interrupt_randomness() isn't actually used on ARM.
> >
> > It's certainly called on arm64, per a warning I just hacked in:

[...]

> > ... and I couldn't immediately spot why 32-bit arm  would be different.
> 
> Hmm, I actually meant both arm64 and ARM.
> 
> Marc looked into this at my request a while ago, and I had a look
> myself as well at the time, and IIRC, we both concluded that we don't
> hit that code path. Darn.
> 
> In any case, the way add_interrupt_randomness() calls
> arch_get_random_seed_long() is absolutely insane, so we should try to
> fix that in any case.

I have no strong opinion there, and I'm happy with that getting cleaned
up.

Regardless, I do think it's reasonable for the common code to expect
arch_get_random_*() to be roughly as expensive as "most other
instructions" (since even if RNDR* is expensive, the CPU might be able
to do useful speculative work in the meantime), whereas a trap to the
host is always liable to be expensive, as no useful work can be done
while the host is handling it, so I think it makes sense to distinguish
the two.

Thanks,
Mark.


Re: [PATCH v2 4/5] arm64: Add support for SMCCC TRNG entropy source

2020-11-05 Thread Mark Rutland
On Thu, Nov 05, 2020 at 02:29:49PM +, Mark Brown wrote:
> On Thu, Nov 05, 2020 at 02:03:22PM +0000, Mark Rutland wrote:
> > On Thu, Nov 05, 2020 at 01:41:42PM +, Mark Brown wrote:
> 
> > > It isn't obvious to me why we don't fall through to trying the SMCCC
> > > TRNG here if for some reason the v8.5-RNG didn't give us something.
> > > Definitely an obscure possibility but still...
> 
> > I think it's better to assume that if we have a HW RNG and it's not
> > giving us entropy, it's not worthwhile trapping to the host, which might
> > encounter the exact same issue.
> 
> There's definitely a good argument for that, but OTOH it's possible the
> SMCCC implementation is doing something else (it'd be an interesting
> implementation decision but...).  That said I don't really mind, I think
> my comment was more that if we're doing this the code should be explicit
> about what the intent is since right now it isn't obvious.  Either a
> comment or having an explicit "what method are we choosing" thing.
> 
> > That said, I'm not sure it's great to plumb this under the
> > arch_get_random*() interfaces, e.g. given this means that
> > add_interrupt_randomness() will end up trapping to the host all the time
> > when it calls arch_get_random_seed_long().
> 
> > Is there an existing interface for "slow" runtime entropy that we can
> > plumb this into instead?
> 
> Yeah, I was wondering about this myself - it seems like a better fit for
> hwrng rather than the arch interfaces but that's not used until
> userspace comes up, the arch stuff is all expected to be quick.  I
> suppose we could implement the SMCCC stuff for the early variants of the
> API you added so it gets used for bootstrapping purposes and then we
> rely on userspace keeping things topped up by fetching entropy through
> hwrng or otherwise but that feels confused so I have a hard time getting
> enthusiastic about it.

I'm perfectly happy for the early functions to call this, or for us to
add some new firmware_get_random_*() functions that we can call
early (and potentially at runtime, but less often than
arch_get_random_*()).

I suspect the easy thing to do for now is plumb this into the existing
early arch functions and hwrng.
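
For illustration only, a rough sketch of the hwrng side of that. The
ARM_SMCCC_TRNG_RND64 function ID and the "low 64 bits returned in x3"
register layout are assumptions here, not taken from the posted patches:

#include <linux/arm-smccc.h>
#include <linux/hw_random.h>

/*
 * Sketch: expose the "slow" firmware TRNG via hwrng rather than the
 * arch_get_random_*() fast path.
 */
static int smccc_trng_read(struct hwrng *rng, void *data, size_t max, bool wait)
{
	struct arm_smccc_res res;
	size_t n = min_t(size_t, max, sizeof(u64));

	arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_RND64, n * 8, &res);
	if ((long)res.a0 < 0)
		return 0;		/* no entropy available right now */

	memcpy(data, &res.a3, n);	/* assumes low bits land in x3 */
	return n;
}

static struct hwrng smccc_trng = {
	.name = "smccc-trng",
	.read = smccc_trng_read,
};

/* hwrng_register(&smccc_trng) would then be called from an initcall. */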

Thanks,
Mark.


Re: [PATCH v2 4/5] arm64: Add support for SMCCC TRNG entropy source

2020-11-05 Thread Mark Rutland
On Thu, Nov 05, 2020 at 03:04:57PM +0100, Ard Biesheuvel wrote:
> On Thu, 5 Nov 2020 at 15:03, Mark Rutland  wrote:
> > On Thu, Nov 05, 2020 at 01:41:42PM +, Mark Brown wrote:
> > > On Thu, Nov 05, 2020 at 12:56:55PM +, Andre Przywara wrote:

> > That said, I'm not sure it's great to plumb this under the
> > arch_get_random*() interfaces, e.g. given this means that
> > add_interrupt_randomness() will end up trapping to the host all the time
> > when it calls arch_get_random_seed_long().
> 
> As it turns out, add_interrupt_randomness() isn't actually used on ARM.

It's certainly called on arm64, per a warning I just hacked in:

[1.083802] [ cut here ]
[1.084802] add_interrupt_randomness called
[1.085685] WARNING: CPU: 1 PID: 0 at drivers/char/random.c:1267 
add_interrupt_randomness+0x2e8/0x318
[1.087599] Modules linked in:
[1.088258] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.0-rc2-dirty #13
[1.089672] Hardware name: linux,dummy-virt (DT)
[1.090659] pstate: 60400085 (nZCv daIf +PAN -UAO -TCO BTYPE=--)
[1.091910] pc : add_interrupt_randomness+0x2e8/0x318
[1.092965] lr : add_interrupt_randomness+0x2e8/0x318
[1.094021] sp : 80001000be80
[1.094732] x29: 80001000be80 x28: 2d0c80209840 
[1.095859] x27: 137c3e3a x26: 8000100abdd0 
[1.096978] x25: 0035 x24: 67918bda8000 
[1.098100] x23: c57c31923fe8 x22: fffedc14 
[1.099224] x21: 2d0dbef796a0 x20: c57c331d16a0 
[1.100339] x19: c57c33720a48 x18: 0010 
[1.101459] x17:  x16: 0002 
[1.102578] x15: 00e7 x14: 80001000bb20 
[1.103706] x13: ffea x12: c57c337b56e8 
[1.104821] x11: 0003 x10: c57c3379d6a8 
[1.105944] x9 : c57c3379d700 x8 : 00017fe8 
[1.107073] x7 : c000efff x6 : 0001 
[1.108186] x5 : 00057fa8 x4 :  
[1.109305] x3 :  x2 : c57c337455d0 
[1.110428] x1 : db8dc9c2a1e0f600 x0 :  
[1.111552] Call trace:
[1.112083]  add_interrupt_randomness+0x2e8/0x318
[1.113074]  handle_irq_event_percpu+0x48/0x90
[1.114016]  handle_irq_event+0x48/0xf8
[1.114826]  handle_fasteoi_irq+0xa4/0x130
[1.115689]  generic_handle_irq+0x30/0x48
[1.116528]  __handle_domain_irq+0x64/0xc0
[1.117392]  gic_handle_irq+0xc0/0x138
[1.118194]  el1_irq+0xbc/0x180
[1.118870]  arch_cpu_idle+0x20/0x30
[1.119630]  default_idle_call+0x8c/0x350
[1.120479]  do_idle+0x224/0x298
[1.121163]  cpu_startup_entry+0x28/0x70
[1.121994]  secondary_start_kernel+0x184/0x198

... and I couldn't immediately spot why 32-bit arm  would be different.

Thanks,
Mark.


Re: [PATCH v2 4/5] arm64: Add support for SMCCC TRNG entropy source

2020-11-05 Thread Mark Rutland
On Thu, Nov 05, 2020 at 01:41:42PM +, Mark Brown wrote:
> On Thu, Nov 05, 2020 at 12:56:55PM +, Andre Przywara wrote:
> 
> >  static inline bool __must_check arch_get_random_seed_int(unsigned int *v)
> >  {
> > +   struct arm_smccc_res res;
> > unsigned long val;
> > -   bool ok = arch_get_random_seed_long(&val);
> >  
> > -   *v = val;
> > -   return ok;
> > +   if (cpus_have_const_cap(ARM64_HAS_RNG)) {
> > +   if (arch_get_random_seed_long(&val)) {
> > +   *v = val;
> > +   return true;
> > +   }
> > +   return false;
> > +   }
> 
> It isn't obvious to me why we don't fall through to trying the SMCCC
> TRNG here if for some reason the v8.5-RNG didn't give us something.
> Definitely an obscure possibility but still...

I think it's better to assume that if we have a HW RNG and it's not
giving us entropy, it's not worthwhile trapping to the host, which might
encounter the exact same issue.

I'd rather we have one RNG source that we trust works, and use that
exclusively.

That said, I'm not sure it's great to plumb this under the
arch_get_random*() interfaces, e.g. given this means that
add_interrupt_randomness() will end up trapping to the host all the time
when it calls arch_get_random_seed_long().

Is there an existing interface for "slow" runtime entropy that we can
plumb this into instead?

Thanks,
Mark.


Re: [PATCHv2 2/3] arm64: cpufeature: reorder cpus_have_{const,final}_cap()

2020-10-30 Thread Mark Rutland
On Fri, Oct 30, 2020 at 08:20:14AM +, Will Deacon wrote:
> On Fri, Oct 30, 2020 at 08:18:48AM +, Will Deacon wrote:
> > On Mon, Oct 26, 2020 at 01:49:30PM +0000, Mark Rutland wrote:
> > > In a subsequent patch we'll modify cpus_have_const_cap() to call
> > > cpus_have_final_cap(), and hence we need to define cpus_have_final_cap()
> > > first.
> > > 
> > > To make subsequent changes easier to follow, this patch reorders the two
> > > without making any other changes.
> > > 
> > > There should be no functional change as a result of this patch.
> > 
> > You say this...

[...]

> > > -static __always_inline bool cpus_have_const_cap(int num)
> > > +static __always_inline bool cpus_have_final_cap(int num)
> > >  {
> > >   if (system_capabilities_finalized())
> > >   return __cpus_have_const_cap(num);
> > >   else
> > > - return cpus_have_cap(num);
> > > + BUG();
> > 
> > ... but isn't the failure case of calling cpus_have_final_cap() early now
> > different? What does BUG() do at EL2 w/ nVHE?
> 
> Ah no, sorry, I see you're just moving things around and the diff makes it
> look confusing (that and I've been up since 5:30 for KVM Forum).

Indeed; the diff was even more confusing before I split this from the
changes in the next patch!

> So on closer inspection:
> 
> Acked-by: Will Deacon 

Cheers!

Mark.


Re: [PATCH 09/11] KVM: arm64: Remove SPSR manipulation primitives

2020-10-26 Thread Mark Rutland
On Mon, Oct 26, 2020 at 01:34:48PM +, Marc Zyngier wrote:
> The SPSR setting code is now completely unused, including that dealing
> with banked AArch32 SPSRs. Cleanup time.
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/include/asm/kvm_emulate.h | 26 
>  arch/arm64/kvm/regmap.c  | 96 
>  2 files changed, 122 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index 736a342dadf7..5d957d0e7b69 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -34,8 +34,6 @@ enum exception_type {
>  };
>  
>  unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
> -unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu);
> -void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v);
>  
>  bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
>  void kvm_skip_instr32(struct kvm_vcpu *vcpu);
> @@ -180,30 +178,6 @@ static __always_inline void vcpu_set_reg(struct kvm_vcpu 
> *vcpu, u8 reg_num,
>   vcpu_gp_regs(vcpu)->regs[reg_num] = val;
>  }
>  
> -static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
> -{
> - if (vcpu_mode_is_32bit(vcpu))
> - return vcpu_read_spsr32(vcpu);
> -
> - if (vcpu->arch.sysregs_loaded_on_cpu)
> - return read_sysreg_el1(SYS_SPSR);
> - else
> - return __vcpu_sys_reg(vcpu, SPSR_EL1);
> -}
> -
> -static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
> -{
> - if (vcpu_mode_is_32bit(vcpu)) {
> - vcpu_write_spsr32(vcpu, v);
> - return;
> - }
> -
> - if (vcpu->arch.sysregs_loaded_on_cpu)
> - write_sysreg_el1(v, SYS_SPSR);
> - else
> - __vcpu_sys_reg(vcpu, SPSR_EL1) = v;
> -}
> -
>  /*
>   * The layout of SPSR for an AArch32 state is different when observed from an
>   * AArch64 SPSR_ELx or an AArch32 SPSR_*. This function generates the AArch32
> diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
> index accc1d5fba61..ae7e290bb017 100644
> --- a/arch/arm64/kvm/regmap.c
> +++ b/arch/arm64/kvm/regmap.c
> @@ -126,99 +126,3 @@ unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, 
> u8 reg_num)
>  
>   return reg_array + vcpu_reg_offsets[mode][reg_num];
>  }
> -
> -/*
> - * Return the SPSR for the current mode of the virtual CPU.
> - */
> -static int vcpu_spsr32_mode(const struct kvm_vcpu *vcpu)
> -{
> - unsigned long mode = *vcpu_cpsr(vcpu) & PSR_AA32_MODE_MASK;
> - switch (mode) {
> - case PSR_AA32_MODE_SVC: return KVM_SPSR_SVC;
> - case PSR_AA32_MODE_ABT: return KVM_SPSR_ABT;
> - case PSR_AA32_MODE_UND: return KVM_SPSR_UND;
> - case PSR_AA32_MODE_IRQ: return KVM_SPSR_IRQ;
> - case PSR_AA32_MODE_FIQ: return KVM_SPSR_FIQ;
> - default: BUG();
> - }
> -}
> -
> -unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
> -{
> - int spsr_idx = vcpu_spsr32_mode(vcpu);
> -
> - if (!vcpu->arch.sysregs_loaded_on_cpu) {
> - switch (spsr_idx) {
> - case KVM_SPSR_SVC:
> - return __vcpu_sys_reg(vcpu, SPSR_EL1);
> - case KVM_SPSR_ABT:
> - return vcpu->arch.ctxt.spsr_abt;
> - case KVM_SPSR_UND:
> - return vcpu->arch.ctxt.spsr_und;
> - case KVM_SPSR_IRQ:
> - return vcpu->arch.ctxt.spsr_irq;
> - case KVM_SPSR_FIQ:
> - return vcpu->arch.ctxt.spsr_fiq;
> - }
> - }
> -
> - switch (spsr_idx) {
> - case KVM_SPSR_SVC:
> - return read_sysreg_el1(SYS_SPSR);
> - case KVM_SPSR_ABT:
> - return read_sysreg(spsr_abt);
> - case KVM_SPSR_UND:
> - return read_sysreg(spsr_und);
> - case KVM_SPSR_IRQ:
> - return read_sysreg(spsr_irq);
> - case KVM_SPSR_FIQ:
> - return read_sysreg(spsr_fiq);
> - default:
> - BUG();
> - }
> -}
> -
> -void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
> -{
> - int spsr_idx = vcpu_spsr32_mode(vcpu);
> -
> - if (!vcpu->arch.sysregs_loaded_on_cpu) {
> - switch (spsr_idx) {
> - case KVM_SPSR_SVC:
> - __vcpu_sys_reg(vcpu, SPSR_EL1) = v;
> - break;
> - case KVM_SPSR_ABT:
> - vcpu->arch.ctxt.spsr_abt = v;
> - break;
> - case KV

Re: [PATCH 08/11] KVM: arm64: Inject AArch32 exceptions from HYP

2020-10-26 Thread Mark Rutland
On Mon, Oct 26, 2020 at 01:34:47PM +, Marc Zyngier wrote:
> Similarly to what has been done for AArch64, move the AArch32 exception
> injection to HYP.
> 
> In order to not use the regmap selection code at EL2, simplify the code
> populating the target mode's LR register by hardcoding the two possible
> LR registers (LR_abt in X20, LR_und in X22).
> 
> We also introduce new accessors for SPSR and CP15 registers.
> 
> Signed-off-by: Marc Zyngier 

Modulo comments on the prior patch for the AArch64 exception bits that
get carried along:

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/kvm/aarch32.c   | 149 +-
>  arch/arm64/kvm/hyp/exception.c | 221 ++---
>  2 files changed, 212 insertions(+), 158 deletions(-)
> 
> diff --git a/arch/arm64/kvm/aarch32.c b/arch/arm64/kvm/aarch32.c
> index 40a62a99fbf8..ad453b47c517 100644
> --- a/arch/arm64/kvm/aarch32.c
> +++ b/arch/arm64/kvm/aarch32.c
> @@ -19,20 +19,6 @@
>  #define DFSR_FSC_EXTABT_nLPAE0x08
>  #define DFSR_LPAEBIT(9)
>  
> -/*
> - * Table taken from ARMv8 ARM DDI0487B-B, table G1-10.
> - */
> -static const u8 return_offsets[8][2] = {
> - [0] = { 0, 0 }, /* Reset, unused */
> - [1] = { 4, 2 }, /* Undefined */
> - [2] = { 0, 0 }, /* SVC, unused */
> - [3] = { 4, 4 }, /* Prefetch abort */
> - [4] = { 8, 8 }, /* Data abort */
> - [5] = { 0, 0 }, /* HVC, unused */
> - [6] = { 4, 4 }, /* IRQ, unused */
> - [7] = { 4, 4 }, /* FIQ, unused */
> -};
> -
>  static bool pre_fault_synchronize(struct kvm_vcpu *vcpu)
>  {
>   preempt_disable();
> @@ -53,132 +39,10 @@ static void post_fault_synchronize(struct kvm_vcpu 
> *vcpu, bool loaded)
>   }
>  }
>  
> -/*
> - * When an exception is taken, most CPSR fields are left unchanged in the
> - * handler. However, some are explicitly overridden (e.g. M[4:0]).
> - *
> - * The SPSR/SPSR_ELx layouts differ, and the below is intended to work with
> - * either format. Note: SPSR.J bit doesn't exist in SPSR_ELx, but this bit 
> was
> - * obsoleted by the ARMv7 virtualization extensions and is RES0.
> - *
> - * For the SPSR layout seen from AArch32, see:
> - * - ARM DDI 0406C.d, page B1-1148
> - * - ARM DDI 0487E.a, page G8-6264
> - *
> - * For the SPSR_ELx layout for AArch32 seen from AArch64, see:
> - * - ARM DDI 0487E.a, page C5-426
> - *
> - * Here we manipulate the fields in order of the AArch32 SPSR_ELx layout, 
> from
> - * MSB to LSB.
> - */
> -static unsigned long get_except32_cpsr(struct kvm_vcpu *vcpu, u32 mode)
> -{
> - u32 sctlr = vcpu_cp15(vcpu, c1_SCTLR);
> - unsigned long old, new;
> -
> - old = *vcpu_cpsr(vcpu);
> - new = 0;
> -
> - new |= (old & PSR_AA32_N_BIT);
> - new |= (old & PSR_AA32_Z_BIT);
> - new |= (old & PSR_AA32_C_BIT);
> - new |= (old & PSR_AA32_V_BIT);
> - new |= (old & PSR_AA32_Q_BIT);
> -
> - // CPSR.IT[7:0] are set to zero upon any exception
> - // See ARM DDI 0487E.a, section G1.12.3
> - // See ARM DDI 0406C.d, section B1.8.3
> -
> - new |= (old & PSR_AA32_DIT_BIT);
> -
> - // CPSR.SSBS is set to SCTLR.DSSBS upon any exception
> - // See ARM DDI 0487E.a, page G8-6244
> - if (sctlr & BIT(31))
> - new |= PSR_AA32_SSBS_BIT;
> -
> - // CPSR.PAN is unchanged unless SCTLR.SPAN == 0b0
> - // SCTLR.SPAN is RES1 when ARMv8.1-PAN is not implemented
> - // See ARM DDI 0487E.a, page G8-6246
> - new |= (old & PSR_AA32_PAN_BIT);
> - if (!(sctlr & BIT(23)))
> - new |= PSR_AA32_PAN_BIT;
> -
> - // SS does not exist in AArch32, so ignore
> -
> - // CPSR.IL is set to zero upon any exception
> - // See ARM DDI 0487E.a, page G1-5527
> -
> - new |= (old & PSR_AA32_GE_MASK);
> -
> - // CPSR.IT[7:0] are set to zero upon any exception
> - // See prior comment above
> -
> - // CPSR.E is set to SCTLR.EE upon any exception
> - // See ARM DDI 0487E.a, page G8-6245
> - // See ARM DDI 0406C.d, page B4-1701
> - if (sctlr & BIT(25))
> - new |= PSR_AA32_E_BIT;
> -
> - // CPSR.A is unchanged upon an exception to Undefined, Supervisor
> - // CPSR.A is set upon an exception to other modes
> - // See ARM DDI 0487E.a, pages G1-5515 to G1-5516
> - // See ARM DDI 0406C.d, page B1-1182
> - new |= (old & PSR_AA32_A_BIT);
> - if (mode != PSR_AA32_MODE_UND && mode != PSR_AA32_MODE_SVC)
> - new |= PSR_AA32_A_BIT;
> -
> - // CPSR.I is

Re: [PATCH 01/11] KVM: arm64: Don't adjust PC on SError during SMC trap

2020-10-26 Thread Mark Rutland
On Mon, Oct 26, 2020 at 02:08:35PM +, Marc Zyngier wrote:
> On 2020-10-26 13:53, Mark Rutland wrote:
> > Assuming that there is no 16-bit HVC:
> 
> It is actually impossible to have a 16bit encoding for HVC, as
> it always conveys a 16bit immediate, and you need some space
> to encode the instruction itself!

Ah, of course!

Mark.


Re: [PATCH 07/11] KVM: arm64: Inject AArch64 exceptions from HYP

2020-10-26 Thread Mark Rutland
On Mon, Oct 26, 2020 at 01:34:46PM +, Marc Zyngier wrote:
> Move the AArch64 exception injection code from EL1 to HYP, leaving
> only the ESR_EL1 updates to EL1. In order to cope with the differences
> between VHE and nVHE, two sets of system register accessors are provided.
> 
> SPSR, ELR, PC and PSTATE are now completely handled in the hypervisor.
> 
> Signed-off-by: Marc Zyngier 

>  void kvm_inject_exception(struct kvm_vcpu *vcpu)
>  {
> + switch (vcpu->arch.flags & KVM_ARM64_EXCEPT_MASK) {
> + case KVM_ARM64_EXCEPT_AA64_EL1_SYNC:
> + enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
> + break;
> + case KVM_ARM64_EXCEPT_AA64_EL1_IRQ:
> + enter_exception64(vcpu, PSR_MODE_EL1h, except_type_irq);
> + break;
> + case KVM_ARM64_EXCEPT_AA64_EL1_FIQ:
> + enter_exception64(vcpu, PSR_MODE_EL1h, except_type_fiq);
> + break;
> + case KVM_ARM64_EXCEPT_AA64_EL1_SERR:
> + enter_exception64(vcpu, PSR_MODE_EL1h, except_type_serror);
> + break;
> + default:
> + /* EL2 are unimplemented until we get NV. One day. */
> + break;
> + }
>  }

Huh, we're going to allow EL1 to inject IRQ/FIQ/SERROR *exceptions*
directly, rather than pending those via HCR_EL2.{VI,VF,VSE}? We never
used to have code to do that.

If we're going to support that we'll need to check against the DAIF bits
to make sure we don't inject an exception that can't be architecturally
taken. 

I guess we'll tighten that up along with the synchronous exception
checks, but given those three cases aren't needed today it might be
worth removing them from the switch for now and/or adding a comment to
that effect.
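
For illustration, a minimal sketch (not from this series) of the kind of
PSTATE mask check that direct IRQ/FIQ/SError injection would need before
vectoring the exception:

/*
 * Sketch only: an asynchronous exception can only be architecturally
 * taken if the corresponding PSTATE mask bit (A, I or F) is clear.
 */
static bool sketch_can_inject_async(struct kvm_vcpu *vcpu, unsigned long mask_bit)
{
	return !(*vcpu_cpsr(vcpu) & mask_bit);
}

e.g. a direct IRQ injection would first check
sketch_can_inject_async(vcpu, PSR_I_BIT), and otherwise pend the
interrupt via HCR_EL2.VI instead.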

Thanks,
Mark.


Re: [PATCH 05/11] KVM: arm64: Move VHE direct sysreg accessors into kvm_host.h

2020-10-26 Thread Mark Rutland
On Mon, Oct 26, 2020 at 01:34:44PM +, Marc Zyngier wrote:
> As we are about to need to access system registers from the HYP
> code based on their internal encoding, move the direct sysreg
> accessors to a common include file.
> 
> No functional change.
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/include/asm/kvm_host.h | 85 +++
>  arch/arm64/kvm/sys_regs.c | 81 -
>  2 files changed, 85 insertions(+), 81 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 9a75de3ad8da..0ae51093013d 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -438,6 +438,91 @@ struct kvm_vcpu_arch {
>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg);
>  void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg);
>  
> +static inline bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
> +{
> + /*
> +  * *** VHE ONLY ***
> +  *
> +  * System registers listed in the switch are not saved on every
> +  * exit from the guest but are only saved on vcpu_put.
> +  *
> +  * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
> +  * should never be listed below, because the guest cannot modify its
> +  * own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
> +  * thread when emulating cross-VCPU communication.
> +  */
> + switch (reg) {
> + case CSSELR_EL1:*val = read_sysreg_s(SYS_CSSELR_EL1);   break;
> + case SCTLR_EL1: *val = read_sysreg_s(SYS_SCTLR_EL12);   break;
> + case CPACR_EL1: *val = read_sysreg_s(SYS_CPACR_EL12);   break;
> + case TTBR0_EL1: *val = read_sysreg_s(SYS_TTBR0_EL12);   break;
> + case TTBR1_EL1: *val = read_sysreg_s(SYS_TTBR1_EL12);   break;
> + case TCR_EL1:   *val = read_sysreg_s(SYS_TCR_EL12); break;
> + case ESR_EL1:   *val = read_sysreg_s(SYS_ESR_EL12); break;
> + case AFSR0_EL1: *val = read_sysreg_s(SYS_AFSR0_EL12);   break;
> + case AFSR1_EL1: *val = read_sysreg_s(SYS_AFSR1_EL12);   break;
> + case FAR_EL1:   *val = read_sysreg_s(SYS_FAR_EL12); break;
> + case MAIR_EL1:  *val = read_sysreg_s(SYS_MAIR_EL12);break;
> + case VBAR_EL1:  *val = read_sysreg_s(SYS_VBAR_EL12);break;
> + case CONTEXTIDR_EL1:*val = read_sysreg_s(SYS_CONTEXTIDR_EL12);break;
> + case TPIDR_EL0: *val = read_sysreg_s(SYS_TPIDR_EL0);break;
> + case TPIDRRO_EL0:   *val = read_sysreg_s(SYS_TPIDRRO_EL0);  break;
> + case TPIDR_EL1: *val = read_sysreg_s(SYS_TPIDR_EL1);break;
> + case AMAIR_EL1: *val = read_sysreg_s(SYS_AMAIR_EL12);   break;
> + case CNTKCTL_EL1:   *val = read_sysreg_s(SYS_CNTKCTL_EL12); break;
> + case ELR_EL1:   *val = read_sysreg_s(SYS_ELR_EL12); break;
> + case PAR_EL1:   *val = read_sysreg_s(SYS_PAR_EL1);  break;
> + case DACR32_EL2:*val = read_sysreg_s(SYS_DACR32_EL2);   break;
> + case IFSR32_EL2:*val = read_sysreg_s(SYS_IFSR32_EL2);   break;
> + case DBGVCR32_EL2:  *val = read_sysreg_s(SYS_DBGVCR32_EL2); break;
> + default:return false;
> + }
> +
> + return true;
> +}
> +
> +static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
> +{
> + /*
> +  * *** VHE ONLY ***
> +  *
> +  * System registers listed in the switch are not restored on every
> +  * entry to the guest but are only restored on vcpu_load.
> +  *
> +  * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
> +  * should never be listed below, because the MPIDR should only be set
> +  * once, before running the VCPU, and never changed later.
> +  */
> + switch (reg) {
> + case CSSELR_EL1:write_sysreg_s(val, SYS_CSSELR_EL1);break;
> + case SCTLR_EL1: write_sysreg_s(val, SYS_SCTLR_EL12);break;
> + case CPACR_EL1: write_sysreg_s(val, SYS_CPACR_EL12);break;
> + case TTBR0_EL1: write_sysreg_s(val, SYS_TTBR0_EL12);break;
> + case TTBR1_EL1: write_sysreg_s(val, SYS_TTBR1_EL12);break;
> + case TCR_EL1:   write_sysreg_s(val, SYS_TCR_EL12);  break;
> + case ESR_EL1:   write_sysreg_s(val, SYS_ESR_EL12);  break;
> + case AFSR0_EL1: write_sysreg_s(val, SYS_AFSR0_EL12);break;
> + case AFSR1_EL1: write_sysreg_s(val, SYS_AFSR1_EL12);break;
> + case FAR_EL1:   write_sysreg_s(va

Re: [PATCH 04/11] KVM: arm64: Move PC rollback on SError to HYP

2020-10-26 Thread Mark Rutland
On Mon, Oct 26, 2020 at 01:34:43PM +, Marc Zyngier wrote:
> Instead of handling the "PC rollback on SError during HVC" at EL1 (which
> requires disclosing PC to a potentially untrusted kernel), let's move
> this fixup to ... fixup_guest_exit(), which is where we do all fixups.
> 
> Isn't that neat?
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/kvm/handle_exit.c| 17 -
>  arch/arm64/kvm/hyp/include/hyp/switch.h | 15 +++
>  2 files changed, 15 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index d4e00a864ee6..f79137ee4274 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -241,23 +241,6 @@ int handle_exit(struct kvm_vcpu *vcpu, int 
> exception_index)
>  {
>   struct kvm_run *run = vcpu->run;
>  
> - if (ARM_SERROR_PENDING(exception_index)) {
> - u8 esr_ec = ESR_ELx_EC(kvm_vcpu_get_esr(vcpu));
> -
> - /*
> -  * HVC already have an adjusted PC, which we need to
> -  * correct in order to return to after having injected
> -  * the SError.
> -  *
> -  * SMC, on the other hand, is *trapped*, meaning its
> -  * preferred return address is the SMC itself.
> -  */
> - if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64)
> - *vcpu_pc(vcpu) -= 4;
> -
> - return 1;
> - }
> -
>   exception_index = ARM_EXCEPTION_CODE(exception_index);
>  
>   switch (exception_index) {
> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h 
> b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index d687e574cde5..668f02c7b0b3 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -411,6 +411,21 @@ static inline bool fixup_guest_exit(struct kvm_vcpu 
> *vcpu, u64 *exit_code)
>   if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
>   vcpu->arch.fault.esr_el2 = read_sysreg_el2(SYS_ESR);
>  
> + if (ARM_SERROR_PENDING(*exit_code)) {
> + u8 esr_ec = kvm_vcpu_trap_get_class(vcpu);
> +
> + /*
> +  * HVC already have an adjusted PC, which we need to
> +  * correct in order to return to after having injected
> +  * the SError.
> +  *
> +  * SMC, on the other hand, is *trapped*, meaning its
> +  * preferred return address is the SMC itself.
> +  */
> + if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64)
> + *vcpu_pc(vcpu) -= 4;
> + }
> +
>   /*
>* We're using the raw exception code in order to only process
>* the trap if no SError is pending. We will come back to the
> -- 
> 2.28.0
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP

2020-10-26 Thread Mark Rutland
On Mon, Oct 26, 2020 at 01:34:42PM +, Marc Zyngier wrote:
> In an effort to remove the vcpu PC manipulations from EL1 on nVHE
> systems, move kvm_skip_instr() to be HYP-specific. EL1's intent
> to increment PC post emulation is now signalled via a flag in the
> vcpu structure.
> 
> Signed-off-by: Marc Zyngier 

[...]

> +/*
> + * Adjust the guest PC on entry, depending on flags provided by EL1
> + * for the purpose of emulation (MMIO, sysreg).
> + */
> +static inline void __adjust_pc(struct kvm_vcpu *vcpu)
> +{
> + if (vcpu->arch.flags & KVM_ARM64_INCREMENT_PC) {
> + kvm_skip_instr(vcpu);
> + vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
> + }
> +}

What's your plan for restricting *when* EL1 can ask for the PC to be
adjusted?

I'm assuming that either:

1. You have EL2 sanity-check that all responses from EL1 are permitted for
   the current state. e.g. if EL1 asks to increment the PC, EL2 must
   check that that was a sane response for the current state (see the
   sketch below).

2. You raise the level of abstraction at the EL2/EL1 boundary, such that
   EL2 simply knows. e.g. if emulating a memory access, EL1 can either
   provide the response or signal an abort, but doesn't choose to
   manipulate the PC as EL2 will infer the right thing to do.
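
For #1, a minimal sketch of the kind of check EL2 could make (the helper
name and the exact condition are hypothetical, not taken from this
series) might look like:

| /*
|  * Hypothetical: only honour EL1's request when the exit was a trap,
|  * i.e. when skipping an instruction is a meaningful thing to do.
|  */
| static inline void __adjust_pc_checked(struct kvm_vcpu *vcpu, u64 exit_code)
| {
| 	if (!(vcpu->arch.flags & KVM_ARM64_INCREMENT_PC))
| 		return;
|
| 	if (ARM_EXCEPTION_CODE(exit_code) == ARM_EXCEPTION_TRAP)
| 		kvm_skip_instr(vcpu);
|
| 	vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
| }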

I know that either is tricky in practice, so I'm curious what your view
is. Generally option #2 is easier to fortify, but I guess we might have
to do #1 since we also have to support unprotected VMs?

Thanks,
Mark.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 02/11] KVM: arm64: Move kvm_vcpu_trap_il_is32bit into kvm_skip_instr32()

2020-10-26 Thread Mark Rutland
On Mon, Oct 26, 2020 at 01:34:41PM +, Marc Zyngier wrote:
> There is no need to feed the result of kvm_vcpu_trap_il_is32bit()
> to kvm_skip_instr(), as only AArch32 has a variable lenght ISA, and

Typo: s/lenght/length/

If there are more typos in the series, I'll ignore them. I assume you
know how to drive your favourite spellchecker. ;)

> this helper can equally be called from kvm_skip_instr32(), reducing
> the complexity at all the call sites.
> 
> Signed-off-by: Marc Zyngier 

Looks nice!

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/include/asm/kvm_emulate.h | 8 
>  arch/arm64/kvm/handle_exit.c | 6 +++---
>  arch/arm64/kvm/hyp/aarch32.c | 4 ++--
>  arch/arm64/kvm/mmio.c| 2 +-
>  arch/arm64/kvm/mmu.c | 2 +-
>  arch/arm64/kvm/sys_regs.c| 2 +-
>  6 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index 5ef2669ccd6c..0864f425547d 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -26,7 +26,7 @@ unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu);
>  void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v);
>  
>  bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
> -void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr);
> +void kvm_skip_instr32(struct kvm_vcpu *vcpu);
>  
>  void kvm_inject_undefined(struct kvm_vcpu *vcpu);
>  void kvm_inject_vabt(struct kvm_vcpu *vcpu);
> @@ -472,10 +472,10 @@ static inline unsigned long 
> vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>   return data;/* Leave LE untouched */
>  }
>  
> -static __always_inline void kvm_skip_instr(struct kvm_vcpu *vcpu, bool 
> is_wide_instr)
> +static __always_inline void kvm_skip_instr(struct kvm_vcpu *vcpu)
>  {
>   if (vcpu_mode_is_32bit(vcpu)) {
> - kvm_skip_instr32(vcpu, is_wide_instr);
> + kvm_skip_instr32(vcpu);
>   } else {
>   *vcpu_pc(vcpu) += 4;
>   *vcpu_cpsr(vcpu) &= ~PSR_BTYPE_MASK;
> @@ -494,7 +494,7 @@ static __always_inline void __kvm_skip_instr(struct 
> kvm_vcpu *vcpu)
>   *vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
>   vcpu_gp_regs(vcpu)->pstate = read_sysreg_el2(SYS_SPSR);
>  
> - kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> + kvm_skip_instr(vcpu);
>  
>   write_sysreg_el2(vcpu_gp_regs(vcpu)->pstate, SYS_SPSR);
>   write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 79a720657c47..30bf8e22df54 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -61,7 +61,7 @@ static int handle_smc(struct kvm_vcpu *vcpu)
>* otherwise return to the same address...
>*/
>   vcpu_set_reg(vcpu, 0, ~0UL);
> - kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> + kvm_skip_instr(vcpu);
>   return 1;
>  }
>  
> @@ -100,7 +100,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
>   kvm_clear_request(KVM_REQ_UNHALT, vcpu);
>   }
>  
> - kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> + kvm_skip_instr(vcpu);
>  
>   return 1;
>  }
> @@ -221,7 +221,7 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu)
>* that fail their condition code check"
>*/
>   if (!kvm_condition_valid(vcpu)) {
> - kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> + kvm_skip_instr(vcpu);
>   handled = 1;
>   } else {
>   exit_handle_fn exit_handler;
> diff --git a/arch/arm64/kvm/hyp/aarch32.c b/arch/arm64/kvm/hyp/aarch32.c
> index ae56d8a4b382..f98cbe2626a1 100644
> --- a/arch/arm64/kvm/hyp/aarch32.c
> +++ b/arch/arm64/kvm/hyp/aarch32.c
> @@ -123,13 +123,13 @@ static void kvm_adjust_itstate(struct kvm_vcpu *vcpu)
>   * kvm_skip_instr - skip a trapped instruction and proceed to the next
>   * @vcpu: The vcpu pointer
>   */
> -void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr)
> +void kvm_skip_instr32(struct kvm_vcpu *vcpu)
>  {
>   u32 pc = *vcpu_pc(vcpu);
>   bool is_thumb;
>  
>   is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_AA32_T_BIT);
> - if (is_thumb && !is_wide_instr)
> + if (is_thumb && !kvm_vcpu_trap_il_is32bit(vcpu))
>   pc += 2;
>   else
>   pc += 4;
> diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
> index 6a2826f1bf5e..7e8eb32ae7d2 100644
> --- a/arch/arm64/kvm/mmio.c
> +++ b/arch/arm64/kvm/mmio.c
> @@ -115,7

Re: [PATCH 01/11] KVM: arm64: Don't adjust PC on SError during SMC trap

2020-10-26 Thread Mark Rutland
On Mon, Oct 26, 2020 at 01:34:40PM +, Marc Zyngier wrote:
> On SMC trap, the prefered return address is set to that of the SMC
> instruction itself. It is thus wrong to tyr and roll it back when

Typo: s/tyr/try/

> an SError occurs while trapping on SMC. It is still necessary on
> HVC though, as HVC doesn't cause a trap, and sets ELR to returning
> *after* the HVC.
> 
> It also became apparent that the is 16bit encoding for an AArch32

I guess s/that the is/that there is no/ ?

> HVC instruction, meaning that the displacement is always 4 bytes,
> no matter what the ISA is. Take this opportunity to simplify it.
> 
> Signed-off-by: Marc Zyngier 

Assuming that there is no 16-bit HVC:

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/kvm/handle_exit.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 5d690d60ccad..79a720657c47 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -245,15 +245,15 @@ int handle_exit(struct kvm_vcpu *vcpu, int 
> exception_index)
>   u8 esr_ec = ESR_ELx_EC(kvm_vcpu_get_esr(vcpu));
>  
>   /*
> -  * HVC/SMC already have an adjusted PC, which we need
> -  * to correct in order to return to after having
> -  * injected the SError.
> +  * HVC already have an adjusted PC, which we need to
> +  * correct in order to return to after having injected
> +  * the SError.
> +  *
> +  * SMC, on the other hand, is *trapped*, meaning its
> +  * preferred return address is the SMC itself.
>*/
> - if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64 ||
> - esr_ec == ESR_ELx_EC_SMC32 || esr_ec == ESR_ELx_EC_SMC64) {
> - u32 adj =  kvm_vcpu_trap_il_is32bit(vcpu) ? 4 : 2;
> - *vcpu_pc(vcpu) -= adj;
> - }
> + if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64)
> + *vcpu_pc(vcpu) -= 4;
>  
>   return 1;
>   }
> -- 
> 2.28.0
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCHv2 2/3] arm64: cpufeature: reorder cpus_have_{const, final}_cap()

2020-10-26 Thread Mark Rutland
In a subsequent patch we'll modify cpus_have_const_cap() to call
cpus_have_final_cap(), and hence we need to define cpus_have_final_cap()
first.

To make subsequent changes easier to follow, this patch reorders the two
without making any other changes.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland 
Cc: David Brazdil 
Cc: Marc Zyngier 
Cc: Will Deacon 
---
 arch/arm64/include/asm/cpufeature.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index f7e7144af174c..5d18c54507e6a 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -428,35 +428,35 @@ static __always_inline bool __cpus_have_const_cap(int num)
 }
 
 /*
- * Test for a capability, possibly with a runtime check.
+ * Test for a capability without a runtime check.
  *
- * Before capabilities are finalized, this behaves as cpus_have_cap().
+ * Before capabilities are finalized, this will BUG().
  * After capabilities are finalized, this is patched to avoid a runtime check.
  *
  * @num must be a compile-time constant.
  */
-static __always_inline bool cpus_have_const_cap(int num)
+static __always_inline bool cpus_have_final_cap(int num)
 {
if (system_capabilities_finalized())
return __cpus_have_const_cap(num);
else
-   return cpus_have_cap(num);
+   BUG();
 }
 
 /*
- * Test for a capability without a runtime check.
+ * Test for a capability, possibly with a runtime check.
  *
- * Before capabilities are finalized, this will BUG().
+ * Before capabilities are finalized, this behaves as cpus_have_cap().
  * After capabilities are finalized, this is patched to avoid a runtime check.
  *
  * @num must be a compile-time constant.
  */
-static __always_inline bool cpus_have_final_cap(int num)
+static __always_inline bool cpus_have_const_cap(int num)
 {
if (system_capabilities_finalized())
return __cpus_have_const_cap(num);
else
-   BUG();
+   return cpus_have_cap(num);
 }
 
 static inline void cpus_set_cap(unsigned int num)
-- 
2.11.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCHv2 3/3] arm64: cpufeature: upgrade hyp caps to final

2020-10-26 Thread Mark Rutland
We finalize caps before initializing kvm hyp code, and any use of
cpus_have_const_cap() in kvm hyp code generates redundant and
potentially unsound code to read the cpu_hwcaps array.

A number of helper functions used in both hyp context and regular kernel
context use cpus_have_const_cap(), as some regular kernel code runs
before the capabilities are finalized. It's tedious and error-prone to
write separate copies of these for hyp and non-hyp code.

So that we can avoid the redundant code, let's automatically upgrade
cpus_have_const_cap() to cpus_have_final_cap() when used in hyp context.
With this change, there's never a reason to access the cpu_hwcaps array
from hyp code, and we don't need to create an NVHE alias for this.

This should have no effect on non-hyp code.

Signed-off-by: Mark Rutland 
Cc: David Brazdil 
Cc: Marc Zyngier 
Cc: Will Deacon 
---
 arch/arm64/include/asm/cpufeature.h | 26 --
 arch/arm64/include/asm/virt.h   | 12 
 arch/arm64/kernel/image-vars.h  |  1 -
 3 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 5d18c54507e6a..97244d4feca9c 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -375,6 +375,23 @@ cpucap_multi_entry_cap_matches(const struct 
arm64_cpu_capabilities *entry,
return false;
 }
 
+static __always_inline bool is_vhe_hyp_code(void)
+{
+   /* Only defined for code run in VHE hyp context */
+   return __is_defined(__KVM_VHE_HYPERVISOR__);
+}
+
+static __always_inline bool is_nvhe_hyp_code(void)
+{
+   /* Only defined for code run in NVHE hyp context */
+   return __is_defined(__KVM_NVHE_HYPERVISOR__);
+}
+
+static __always_inline bool is_hyp_code(void)
+{
+   return is_vhe_hyp_code() || is_nvhe_hyp_code();
+}
+
 extern DECLARE_BITMAP(cpu_hwcaps, ARM64_NCAPS);
 extern struct static_key_false cpu_hwcap_keys[ARM64_NCAPS];
 extern struct static_key_false arm64_const_caps_ready;
@@ -444,8 +461,11 @@ static __always_inline bool cpus_have_final_cap(int num)
 }
 
 /*
- * Test for a capability, possibly with a runtime check.
+ * Test for a capability, possibly with a runtime check for non-hyp code.
  *
+ * For hyp code, this behaves the same as cpus_have_final_cap().
+ *
+ * For non-hyp code:
  * Before capabilities are finalized, this behaves as cpus_have_cap().
  * After capabilities are finalized, this is patched to avoid a runtime check.
  *
@@ -453,7 +473,9 @@ static __always_inline bool cpus_have_final_cap(int num)
  */
 static __always_inline bool cpus_have_const_cap(int num)
 {
-   if (system_capabilities_finalized())
+   if (is_hyp_code())
+   return cpus_have_final_cap(num);
+   else if (system_capabilities_finalized())
return __cpus_have_const_cap(num);
else
return cpus_have_cap(num);
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 300be14ba77b2..6069be50baf9f 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -83,18 +83,6 @@ static inline bool is_kernel_in_hyp_mode(void)
return read_sysreg(CurrentEL) == CurrentEL_EL2;
 }
 
-static __always_inline bool is_vhe_hyp_code(void)
-{
-   /* Only defined for code run in VHE hyp context */
-   return __is_defined(__KVM_VHE_HYPERVISOR__);
-}
-
-static __always_inline bool is_nvhe_hyp_code(void)
-{
-   /* Only defined for code run in NVHE hyp context */
-   return __is_defined(__KVM_NVHE_HYPERVISOR__);
-}
-
 static __always_inline bool has_vhe(void)
 {
/*
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 61684a5009148..c615b285ff5b3 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -87,7 +87,6 @@ KVM_NVHE_ALIAS(__icache_flags);
 /* Kernel symbols needed for cpus_have_final/const_caps checks. */
 KVM_NVHE_ALIAS(arm64_const_caps_ready);
 KVM_NVHE_ALIAS(cpu_hwcap_keys);
-KVM_NVHE_ALIAS(cpu_hwcaps);
 
 /* Static keys which are set if a vGIC trap should be handled in hyp. */
 KVM_NVHE_ALIAS(vgic_v2_cpuif_trap);
-- 
2.11.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCHv2 1/3] arm64: kvm: factor out is_{vhe,nvhe}_hyp_code()

2020-10-26 Thread Mark Rutland
Currently has_vhe() detects whether it is being compiled for VHE/NVHE
hyp code based on preprocessor definitions, and uses this knowledge to
avoid redundant runtime checks.

There are other cases where we'd like to use this knowledge, so let's
factor the preprocessor checks out into separate helpers.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland 
Cc: David Brazdil 
Cc: Marc Zyngier 
Cc: Will Deacon 
---
 arch/arm64/include/asm/virt.h | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 09977acc007d1..300be14ba77b2 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -83,16 +83,27 @@ static inline bool is_kernel_in_hyp_mode(void)
return read_sysreg(CurrentEL) == CurrentEL_EL2;
 }
 
+static __always_inline bool is_vhe_hyp_code(void)
+{
+   /* Only defined for code run in VHE hyp context */
+   return __is_defined(__KVM_VHE_HYPERVISOR__);
+}
+
+static __always_inline bool is_nvhe_hyp_code(void)
+{
+   /* Only defined for code run in NVHE hyp context */
+   return __is_defined(__KVM_NVHE_HYPERVISOR__);
+}
+
 static __always_inline bool has_vhe(void)
 {
/*
-* The following macros are defined for code specic to VHE/nVHE.
-* If has_vhe() is inlined into those compilation units, it can
-* be determined statically. Otherwise fall back to caps.
+* Code only run in VHE/NVHE hyp context can assume VHE is present or
+* absent. Otherwise fall back to caps.
 */
-   if (__is_defined(__KVM_VHE_HYPERVISOR__))
+   if (is_vhe_hyp_code())
return true;
-   else if (__is_defined(__KVM_NVHE_HYPERVISOR__))
+   else if (is_nvhe_hyp_code())
return false;
else
return cpus_have_final_cap(ARM64_HAS_VIRT_HOST_EXTN);
-- 
2.11.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCHv2 0/3] arm64: kvm: avoid referencing cpu_hwcaps from hyp

2020-10-26 Thread Mark Rutland
In a few places we use cpus_have_const_cap() in hyp code, usually
because this is hidden within a helper that's also used in regular
kernel context. As cpus_have_const_cap() generates code to read the
cpu_hwcaps array before capabilities are finalized, this means we
generate some potentially-unsound references to regular kernel VAs, but
this these are redundant as capabilities are finalized before we
initialize the kvm hyp code.

This series gets rid of the redundant code by automatically upgrading
cpus_have_const_cap() to cpus_have_final_cap() when used in hyp code.
This allows us to avoid creating an NVHE alias for the cpu_hwcaps array,
so we can catch if we accidentally introduce a runtime reference to
this (e.g. via cpus_have_cap()).

Since v1 [1]:
* Trivial rebase to v5.10-rc1

[1] https://lore.kernel.org/r/20201007125211.30043-1-mark.rutl...@arm.com

Mark Rutland (3):
  arm64: kvm: factor out is_{vhe,nvhe}_hyp_code()
  arm64: cpufeature: reorder cpus_have_{const,final}_cap()
  arm64: cpufeature: upgrade hyp caps to final

 arch/arm64/include/asm/cpufeature.h | 40 -
 arch/arm64/include/asm/virt.h   |  9 -
 arch/arm64/kernel/image-vars.h  |  1 -
 3 files changed, 35 insertions(+), 15 deletions(-)

-- 
2.11.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] perf: arm_spe: Use Inner Shareable DSB when draining the buffer

2020-10-19 Thread Mark Rutland
On Tue, Oct 06, 2020 at 05:13:31PM +0100, Alexandru Elisei wrote:
> Hi Marc,
> 
> Thank you for having a look at the patch!
> 
> On 10/6/20 4:32 PM, Marc Zyngier wrote:
> > Hi Alex,
> >
> > On Tue, 06 Oct 2020 16:05:20 +0100,
> > Alexandru Elisei  wrote:
> >> From ARM DDI 0487F.b, page D9-2807:
> >>
> >> "Although the Statistical Profiling Extension acts as another observer in
> >> the system, for determining the Shareability domain of the DSB
> >> instructions, the writes of sample records are treated as coming from the
> >> PE that is being profiled."
> >>
> >> Similarly, on page D9-2801:
> >>
> >> "The memory type and attributes that are used for a write by the
> >> Statistical Profiling Extension to the Profiling Buffer is taken from the
> >> translation table entries for the virtual address being written to. That
> >> is:
> >> - The writes are treated as coming from an observer that is coherent with
> >>   all observers in the Shareability domain that is defined by the
> >>   translation tables."
> >>
> >> All the PEs are in the Inner Shareable domain, use a DSB ISH to make sure
> >> writes to the profiling buffer have completed.
> > I'm a bit sceptical of this change. The SPE writes are per-CPU, and
> > all we are trying to ensure is that the CPU we are running on has
> > drained its own queue of accesses.
> >
> > The accesses being made within the IS domain doesn't invalidate the
> > fact that they are still per-CPU, because "the writes of sample
> > records are treated as coming from the PE that is being profiled.".
> >
> > So why should we have an IS-wide synchronisation for accesses that are
> > purely local?
> 
> I think I might have misunderstood how perf spe works. Below is my original 
> train
> of thought.
> 
> In the buffer management event interrupt we drain the buffer, and if the 
> buffer is
> full, we call arm_spe_perf_aux_output_end() -> perf_aux_output_end(). The 
> comment
> for perf_aux_output_end() says "Commit the data written by hardware into the 
> ring
> buffer by adjusting aux_head and posting a PERF_RECORD_AUX into the perf 
> buffer.
> It is the pmu driver's responsibility to observe ordering rules of the 
> hardware,
> so that all the data is externally visible before this is called." My 
> conclusion
> was that after we drain the buffer, the data must be visible to all CPUs.

FWIW, this reasoning sounds correct to me. The DSB NSH will be
sufficient to drain the buffer, but we need the DSB ISH to ensure that
it's visible to other CPUs at the instant we call perf_aux_output_end().

Otherwise, if CPU x is reading the ring-buffer written by CPU y, it
might see the aux buffer pointers updated before the samples are
visible, and hence read junk from the buffer.
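
To make that concrete, a rough sketch of the drain path under that
scheme (call sites from memory, so treat this as illustrative rather
than a literal patch; 'handle' is the driver's perf_output_handle):

| 	/* wait for the profiling unit to finish writing out its samples */
| 	psb_csync();
| 	/* make the records visible IS-wide, not just to this PE ... */
| 	dsb(ish);
| 	/* ... before perf_aux_output_end() publishes the new aux_head */
| 	arm_spe_perf_aux_output_end(handle);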

We can add a comment to that effect (or rework perf_aux_output_end()
somehow to handle that ordering).

Thanks,
Mark.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support

2020-08-25 Thread Mark Rutland
On Wed, Aug 19, 2020 at 09:54:40AM +0100, Steven Price wrote:
> On 18/08/2020 15:41, Marc Zyngier wrote:
> > On 2020-08-17 09:41, Keqian Zhu wrote:

> We are discussing (re-)releasing the spec with the LPT parts added. If you
> have fundamental objections then please me know.

Like Marc, I argued strongly for the removal of the LPT bits on the
premise that it didn't really work (e.g. when transitioning between SW
agents) and so it was a pure maintenance burden.

I don't think the technical arguments have changed, and I don't think
it's a good idea to try to resurrect this. Please rope me in if
this comes up in internal discussions.

Mark.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 2/2] kvm/arm64: Detach ESR operator from vCPU struct

2020-06-30 Thread Mark Rutland
On Tue, Jun 30, 2020 at 10:16:07AM +1000, Gavin Shan wrote:
> Hi Mark,
> 
> On 6/29/20 9:00 PM, Mark Rutland wrote:
> > On Mon, Jun 29, 2020 at 07:18:41PM +1000, Gavin Shan wrote:
> > > There are a set of inline functions defined in kvm_emulate.h. Those
> > > functions reads ESR from vCPU fault information struct and then operate
> > > on it. So it's tied with vCPU fault information and vCPU struct. It
> > > limits their usage scope.
> > > 
> > > This detaches these functions from the vCPU struct by introducing an
> > > other set of inline functions in esr.h to manupulate the specified
> > > ESR value. With it, the inline functions defined in kvm_emulate.h
> > > can call these inline functions (in esr.h) instead. This shouldn't
> > > cause any functional changes.
> > > 
> > > Signed-off-by: Gavin Shan 
> > 
> > TBH, I'm not sure that this patch makes much sense on its own.
> > 
> > We already use vcpu_get_esr(), which is the bit that'd have to change if
> > we didn't pass the vcpu around, and the new helpers are just consuming
> > the value in a sifferent way rather than a necessarily simpler way.
> > 
> > Further comments on that front below.
> > 
> > > ---
> > >   arch/arm64/include/asm/esr.h | 32 +
> > >   arch/arm64/include/asm/kvm_emulate.h | 43 
> > >   2 files changed, 51 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> > > index 035003acfa87..950204c5fbe1 100644
> > > --- a/arch/arm64/include/asm/esr.h
> > > +++ b/arch/arm64/include/asm/esr.h
> > > @@ -326,6 +326,38 @@ static inline bool esr_is_data_abort(u32 esr)
> > >   return ec == ESR_ELx_EC_DABT_LOW || ec == ESR_ELx_EC_DABT_CUR;
> > >   }
> > > +#define ESR_DECLARE_CHECK_FUNC(name, field)  \
> > > +static inline bool esr_is_##name(u32 esr)\
> > > +{\
> > > + return !!(esr & (field));   \
> > > +}
> > > +#define ESR_DECLARE_GET_FUNC(name, mask, shift)  \
> > > +static inline u32 esr_get_##name(u32 esr)\
> > > +{\
> > > + return ((esr & (mask)) >> (shift)); \
> > > +}
> > > +
> > > +ESR_DECLARE_CHECK_FUNC(il_32bit,   ESR_ELx_IL);
> > > +ESR_DECLARE_CHECK_FUNC(condition,  ESR_ELx_CV);
> > > +ESR_DECLARE_CHECK_FUNC(dabt_valid, ESR_ELx_ISV);
> > > +ESR_DECLARE_CHECK_FUNC(dabt_sse,   ESR_ELx_SSE);
> > > +ESR_DECLARE_CHECK_FUNC(dabt_sf,ESR_ELx_SF);
> > > +ESR_DECLARE_CHECK_FUNC(dabt_s1ptw, ESR_ELx_S1PTW);
> > > +ESR_DECLARE_CHECK_FUNC(dabt_write, ESR_ELx_WNR);
> > > +ESR_DECLARE_CHECK_FUNC(dabt_cm,ESR_ELx_CM);
> > > +
> > > +ESR_DECLARE_GET_FUNC(class,ESR_ELx_EC_MASK,  
> > > ESR_ELx_EC_SHIFT);
> > > +ESR_DECLARE_GET_FUNC(fault,ESR_ELx_FSC,  0);
> > > +ESR_DECLARE_GET_FUNC(fault_type,   ESR_ELx_FSC_TYPE, 0);
> > > +ESR_DECLARE_GET_FUNC(condition,ESR_ELx_COND_MASK,
> > > ESR_ELx_COND_SHIFT);
> > > +ESR_DECLARE_GET_FUNC(hvc_imm,  ESR_ELx_xVC_IMM_MASK, 0);
> > > +ESR_DECLARE_GET_FUNC(dabt_iss_nisv_sanitized,
> > > +  (ESR_ELx_CM | ESR_ELx_WNR | ESR_ELx_FSC), 0);
> > > +ESR_DECLARE_GET_FUNC(dabt_rd,  ESR_ELx_SRT_MASK, 
> > > ESR_ELx_SRT_SHIFT);
> > > +ESR_DECLARE_GET_FUNC(dabt_as,  ESR_ELx_SAS,  
> > > ESR_ELx_SAS_SHIFT);
> > > +ESR_DECLARE_GET_FUNC(sys_rt,   ESR_ELx_SYS64_ISS_RT_MASK,
> > > +ESR_ELx_SYS64_ISS_RT_SHIFT);
> > 
> > I'm really not keen on this, as I think it's abstracting the problem at
> > the wrong level, hiding information and making things harder to reason
> > about rather than abstracting that.
> > 
> > I strongly suspect the right thing to do is use FIELD_GET() in-place in
> > the functions below, e.g.
> > 
> > !!FIELD_GET(esr, ESR_ELx_IL);
> > 
> > ... rather than:
> > 
> > esr_get_il_32bit(esr);
> > 
> > ... as that avoids the wrapper entirely, minimizing indirection and
> > making the codebase simpler to navigate.
> > 
> > For the cases where we *really* want a helper, i'd rather write those
> > out explicitly, e.g.
> 
> It will be no difference except to use FIELD_GET() to make the

Re: [PATCH 1/2] kvm/arm64: Rename HSR to ESR

2020-06-29 Thread Mark Rutland
On Mon, Jun 29, 2020 at 11:32:08AM +0100, Mark Rutland wrote:
> On Mon, Jun 29, 2020 at 07:18:40PM +1000, Gavin Shan wrote:
> > kvm/arm32 isn't supported since commit 541ad0150ca4 ("arm: Remove
> > 32bit KVM host support"). So HSR isn't meaningful since then. This
> > renames HSR to ESR accordingly. This shouldn't cause any functional
> > changes:
> > 
> >* Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr() to make the
> >  function names self-explanatory.
> >* Rename variables from @hsr to @esr to make them self-explanatory.
> > 
> > Signed-off-by: Gavin Shan 
> 
> At a high-level, I agree that we should move to the `esr` naming to
> match the architecture and minimize surprise. However, I think there are
> some ABI changes here, which *are* funcitonal changes, and those need to
> be avoided.
> 
> [...]
> 
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> > b/arch/arm64/include/uapi/asm/kvm.h
> > index ba85bb23f060..d54345573a88 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -140,7 +140,7 @@ struct kvm_guest_debug_arch {
> >  };
> >  
> >  struct kvm_debug_exit_arch {
> > -   __u32 hsr;
> > +   __u32 esr;
> > __u64 far;  /* used for watchpoints */
> >  };
> 
> This is userspace ABI, and changing this *will* break userspace. This
> *is* a functional change.

To be slightly clearer: while the structure isn't changed, any userspace
software consuming this header will fail to build after this change,
because there will no longer be a field called `hsr`.

Existing binaries will almost certainly not care, but regardless this is
a regression (when building userspace) that I don't think we can permit.
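
For example, a VMM handling KVM_EXIT_DEBUG along these lines (purely
illustrative code, assuming <linux/kvm.h> and <stdio.h>) stops compiling
once the field is renamed:

| 	static void handle_debug_exit(struct kvm_run *run)
| 	{
| 		struct kvm_debug_exit_arch *arch = &run->debug.arch;
|
| 		/* '.hsr' no longer exists after the rename, so this breaks */
| 		printf("debug exit: hsr=%#x, far=%#llx\n",
| 		       arch->hsr, (unsigned long long)arch->far);
| 	}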

Thanks,
Mark.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 2/2] kvm/arm64: Detach ESR operator from vCPU struct

2020-06-29 Thread Mark Rutland
On Mon, Jun 29, 2020 at 07:18:41PM +1000, Gavin Shan wrote:
> There are a set of inline functions defined in kvm_emulate.h. Those
> functions reads ESR from vCPU fault information struct and then operate
> on it. So it's tied with vCPU fault information and vCPU struct. It
> limits their usage scope.
> 
> This detaches these functions from the vCPU struct by introducing an
> other set of inline functions in esr.h to manupulate the specified
> ESR value. With it, the inline functions defined in kvm_emulate.h
> can call these inline functions (in esr.h) instead. This shouldn't
> cause any functional changes.
> 
> Signed-off-by: Gavin Shan 

TBH, I'm not sure that this patch makes much sense on its own.

We already use vcpu_get_esr(), which is the bit that'd have to change if
we didn't pass the vcpu around, and the new helpers are just consuming
the value in a different way rather than a necessarily simpler way.

Further comments on that front below.

> ---
>  arch/arm64/include/asm/esr.h | 32 +
>  arch/arm64/include/asm/kvm_emulate.h | 43 
>  2 files changed, 51 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index 035003acfa87..950204c5fbe1 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -326,6 +326,38 @@ static inline bool esr_is_data_abort(u32 esr)
>   return ec == ESR_ELx_EC_DABT_LOW || ec == ESR_ELx_EC_DABT_CUR;
>  }
>  
> +#define ESR_DECLARE_CHECK_FUNC(name, field)  \
> +static inline bool esr_is_##name(u32 esr)\
> +{\
> + return !!(esr & (field));   \
> +}
> +#define ESR_DECLARE_GET_FUNC(name, mask, shift)  \
> +static inline u32 esr_get_##name(u32 esr)\
> +{\
> + return ((esr & (mask)) >> (shift)); \
> +}
> +
> +ESR_DECLARE_CHECK_FUNC(il_32bit,   ESR_ELx_IL);
> +ESR_DECLARE_CHECK_FUNC(condition,  ESR_ELx_CV);
> +ESR_DECLARE_CHECK_FUNC(dabt_valid, ESR_ELx_ISV);
> +ESR_DECLARE_CHECK_FUNC(dabt_sse,   ESR_ELx_SSE);
> +ESR_DECLARE_CHECK_FUNC(dabt_sf,ESR_ELx_SF);
> +ESR_DECLARE_CHECK_FUNC(dabt_s1ptw, ESR_ELx_S1PTW);
> +ESR_DECLARE_CHECK_FUNC(dabt_write, ESR_ELx_WNR);
> +ESR_DECLARE_CHECK_FUNC(dabt_cm,ESR_ELx_CM);
> +
> +ESR_DECLARE_GET_FUNC(class,ESR_ELx_EC_MASK,  ESR_ELx_EC_SHIFT);
> +ESR_DECLARE_GET_FUNC(fault,ESR_ELx_FSC,  0);
> +ESR_DECLARE_GET_FUNC(fault_type,   ESR_ELx_FSC_TYPE, 0);
> +ESR_DECLARE_GET_FUNC(condition,ESR_ELx_COND_MASK,ESR_ELx_COND_SHIFT);
> +ESR_DECLARE_GET_FUNC(hvc_imm,  ESR_ELx_xVC_IMM_MASK, 0);
> +ESR_DECLARE_GET_FUNC(dabt_iss_nisv_sanitized,
> +  (ESR_ELx_CM | ESR_ELx_WNR | ESR_ELx_FSC), 0);
> +ESR_DECLARE_GET_FUNC(dabt_rd,  ESR_ELx_SRT_MASK, ESR_ELx_SRT_SHIFT);
> +ESR_DECLARE_GET_FUNC(dabt_as,  ESR_ELx_SAS,  ESR_ELx_SAS_SHIFT);
> +ESR_DECLARE_GET_FUNC(sys_rt,   ESR_ELx_SYS64_ISS_RT_MASK,
> +ESR_ELx_SYS64_ISS_RT_SHIFT);

I'm really not keen on this, as I think it's abstracting the problem at
the wrong level, hiding information and making things harder to reason
about rather than abstracting it away.

I strongly suspect the right thing to do is use FIELD_GET() in-place in
the functions below, e.g.

	!!FIELD_GET(ESR_ELx_IL, esr);

... rather than:

   esr_get_il_32bit(esr);

... as that avoids the wrapper entirely, minimizing indirection and
making the codebase simpler to navigate.

For the cases where we *really* want a helper, I'd rather write those
out explicitly, e.g.

#define esr_get_hvc_imm(esr)	FIELD_GET(ESR_ELx_xVC_IMM_MASK, esr)

... but I'm not sure if we really need those given these are mostly used
*once* below.
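
To illustrate (a sketch of what I mean, not something from this patch;
assumes <linux/bitfield.h> for FIELD_GET() and the kvm_vcpu_get_esr()
accessor, and is simplified vs the real helper), a kvm_emulate.h helper
could then just read the field in place:

| static __always_inline bool kvm_vcpu_dabt_iswrite(const struct kvm_vcpu *vcpu)
| {
| 	/* WNR bit read straight out of the ESR, without a dedicated wrapper */
| 	return !!FIELD_GET(ESR_ELx_WNR, kvm_vcpu_get_esr(vcpu));
| }
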

> +
>  const char *esr_get_class_string(u32 esr);
>  #endif /* __ASSEMBLY */
>  
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index c9ba0df47f7d..9337d90c517f 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -266,12 +266,8 @@ static __always_inline u32 kvm_vcpu_get_esr(const struct 
> kvm_vcpu *vcpu)
>  
>  static __always_inline int kvm_vcpu_get_condition(const struct kvm_vcpu 
> *vcpu)
>  {
> - u32 esr = kvm_vcpu_get_esr(vcpu);
> -
> - if (esr & ESR_ELx_CV)
> - return (esr & ESR_ELx_COND_MASK) >> ESR_ELx_COND_SHIFT;
> -
> - return -1;
> + return esr_is_condition(kvm_vcpu_get_esr(vcpu)) ?
> +esr_get_condition(kvm_vcpu_get_esr(vcpu)) : -1;
>  }

Do we really need to change the structure of this code? I thought this
was purely about decoupling helpers from the vcpu struct. This could
have stayed as:

static __always_inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
{
u32 esr = kvm_vcpu_get_esr(vcpu);

if (esr_is_condition(esr))
		return esr_get_condition(esr);

	return -1;
}

Re: [PATCH 1/2] kvm/arm64: Rename HSR to ESR

2020-06-29 Thread Mark Rutland
On Mon, Jun 29, 2020 at 07:18:40PM +1000, Gavin Shan wrote:
> kvm/arm32 isn't supported since commit 541ad0150ca4 ("arm: Remove
> 32bit KVM host support"). So HSR isn't meaningful since then. This
> renames HSR to ESR accordingly. This shouldn't cause any functional
> changes:
> 
>* Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr() to make the
>  function names self-explanatory.
>* Rename variables from @hsr to @esr to make them self-explanatory.
> 
> Signed-off-by: Gavin Shan 

At a high-level, I agree that we should move to the `esr` naming to
match the architecture and minimize surprise. However, I think there are
some ABI changes here, which *are* functional changes, and those need to
be avoided.

[...]

> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index ba85bb23f060..d54345573a88 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -140,7 +140,7 @@ struct kvm_guest_debug_arch {
>  };
>  
>  struct kvm_debug_exit_arch {
> - __u32 hsr;
> + __u32 esr;
>   __u64 far;  /* used for watchpoints */
>  };

This is userspace ABI, and changing this *will* break userspace. This
*is* a functional change.

NAK to this specifically. At best there should be a comment here that
this naming is legacy, but must stay for ABI reasons.

[...]

> diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
> index 4c71270cc097..ee4f691b16ff 100644
> --- a/arch/arm64/kvm/trace_arm.h
> +++ b/arch/arm64/kvm/trace_arm.h
> @@ -42,7 +42,7 @@ TRACE_EVENT(kvm_exit,
>   __entry->vcpu_pc= vcpu_pc;
>   ),
>  
> - TP_printk("%s: HSR_EC: 0x%04x (%s), PC: 0x%08lx",
> + TP_printk("%s: ESR_EC: 0x%04x (%s), PC: 0x%08lx",
> __print_symbolic(__entry->ret, kvm_arm_exception_type),
> __entry->esr_ec,
> __print_symbolic(__entry->esr_ec, kvm_arm_exception_class),

Likewise, isn't all the tracepoint format stuff ABI? I'm not comfortable
that we can change this.

Thanks,
Mark.

> @@ -50,27 +50,27 @@ TRACE_EVENT(kvm_exit,
>  );
>  
>  TRACE_EVENT(kvm_guest_fault,
> - TP_PROTO(unsigned long vcpu_pc, unsigned long hsr,
> + TP_PROTO(unsigned long vcpu_pc, unsigned long esr,
>unsigned long hxfar,
>unsigned long long ipa),
> - TP_ARGS(vcpu_pc, hsr, hxfar, ipa),
> + TP_ARGS(vcpu_pc, esr, hxfar, ipa),
>  
>   TP_STRUCT__entry(
>   __field(unsigned long,  vcpu_pc )
> - __field(unsigned long,  hsr )
> + __field(unsigned long,  esr )
>   __field(unsigned long,  hxfar   )
>   __field(   unsigned long long,  ipa )
>   ),
>  
>   TP_fast_assign(
>   __entry->vcpu_pc= vcpu_pc;
> - __entry->hsr= hsr;
> + __entry->esr= esr;
>   __entry->hxfar  = hxfar;
>   __entry->ipa= ipa;
>   ),
>  
> - TP_printk("ipa %#llx, hsr %#08lx, hxfar %#08lx, pc %#08lx",
> -   __entry->ipa, __entry->hsr,
> + TP_printk("ipa %#llx, esr %#08lx, hxfar %#08lx, pc %#08lx",
> +   __entry->ipa, __entry->esr,
> __entry->hxfar, __entry->vcpu_pc)
>  );
>  
> diff --git a/arch/arm64/kvm/trace_handle_exit.h 
> b/arch/arm64/kvm/trace_handle_exit.h
> index 2c56d1e0f5bd..94ef1a98e609 100644
> --- a/arch/arm64/kvm/trace_handle_exit.h
> +++ b/arch/arm64/kvm/trace_handle_exit.h
> @@ -139,18 +139,18 @@ TRACE_EVENT(trap_reg,
>  );
>  
>  TRACE_EVENT(kvm_handle_sys_reg,
> - TP_PROTO(unsigned long hsr),
> - TP_ARGS(hsr),
> + TP_PROTO(unsigned long esr),
> + TP_ARGS(esr),
>  
>   TP_STRUCT__entry(
> - __field(unsigned long,  hsr)
> + __field(unsigned long,  esr)
>   ),
>  
>   TP_fast_assign(
> - __entry->hsr = hsr;
> + __entry->esr = esr;
>   ),
>  
> - TP_printk("HSR 0x%08lx", __entry->hsr)
> + TP_printk("ESR 0x%08lx", __entry->esr)
>  );
>  
>  TRACE_EVENT(kvm_sys_access,
> -- 
> 2.23.0
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 5/5] KVM: arm64: Simplify PtrAuth alternative patching

2020-06-22 Thread Mark Rutland
On Mon, Jun 22, 2020 at 11:25:41AM +0100, Marc Zyngier wrote:
> On 2020-06-22 10:15, Mark Rutland wrote:
> > On Mon, Jun 22, 2020 at 09:06:43AM +0100, Marc Zyngier wrote:
> I have folded in the following patch:
> 
> diff --git a/arch/arm64/include/asm/kvm_ptrauth.h
> b/arch/arm64/include/asm/kvm_ptrauth.h
> index 7a72508a841b..0ddf98c3ba9f 100644
> --- a/arch/arm64/include/asm/kvm_ptrauth.h
> +++ b/arch/arm64/include/asm/kvm_ptrauth.h
> @@ -68,29 +68,29 @@
>   */
>  .macro ptrauth_switch_to_guest g_ctxt, reg1, reg2, reg3
>  alternative_if_not ARM64_HAS_ADDRESS_AUTH
> - b   1000f
> + b   .L__skip_switch\@
>  alternative_else_nop_endif
>   mrs \reg1, hcr_el2
>   and \reg1, \reg1, #(HCR_API | HCR_APK)
> - cbz \reg1, 1000f
> + cbz \reg1, .L__skip_switch\@
>   add \reg1, \g_ctxt, #CPU_APIAKEYLO_EL1
>   ptrauth_restore_state   \reg1, \reg2, \reg3
> -1000:
> +.L__skip_switch\@:
>  .endm
> 
>  .macro ptrauth_switch_to_host g_ctxt, h_ctxt, reg1, reg2, reg3
>  alternative_if_not ARM64_HAS_ADDRESS_AUTH
> - b   2000f
> + b   .L__skip_switch\@
>  alternative_else_nop_endif
>   mrs \reg1, hcr_el2
>   and \reg1, \reg1, #(HCR_API | HCR_APK)
> - cbz \reg1, 2000f
> + cbz \reg1, .L__skip_switch\@
>   add \reg1, \g_ctxt, #CPU_APIAKEYLO_EL1
>   ptrauth_save_state  \reg1, \reg2, \reg3
>   add \reg1, \h_ctxt, #CPU_APIAKEYLO_EL1
>   ptrauth_restore_state   \reg1, \reg2, \reg3
>   isb
> -2000:
> +.L__skip_switch\@:
>  .endm

Looks good to me; thanks!

Mark.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 5/5] KVM: arm64: Simplify PtrAuth alternative patching

2020-06-22 Thread Mark Rutland
On Mon, Jun 22, 2020 at 09:06:43AM +0100, Marc Zyngier wrote:
> We currently decide to execute the PtrAuth save/restore code based
> on a set of branches that evaluate as (ARM64_HAS_ADDRESS_AUTH_ARCH ||
> ARM64_HAS_ADDRESS_AUTH_IMP_DEF). This can be easily replaced by
> a much simpler test as the ARM64_HAS_ADDRESS_AUTH capability is
> exactly this expression.
> 
> Suggested-by: Mark Rutland 
> Signed-off-by: Marc Zyngier 

Looks good to me. One minor suggestion below, but either way:

Acked-by: Mark Rutland 

> ---
>  arch/arm64/include/asm/kvm_ptrauth.h | 26 +-
>  1 file changed, 9 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_ptrauth.h 
> b/arch/arm64/include/asm/kvm_ptrauth.h
> index f1830173fa9e..7a72508a841b 100644
> --- a/arch/arm64/include/asm/kvm_ptrauth.h
> +++ b/arch/arm64/include/asm/kvm_ptrauth.h
> @@ -61,44 +61,36 @@
>  
>  /*
>   * Both ptrauth_switch_to_guest and ptrauth_switch_to_host macros will
> - * check for the presence of one of the cpufeature flag
> - * ARM64_HAS_ADDRESS_AUTH_ARCH or ARM64_HAS_ADDRESS_AUTH_IMP_DEF and
> + * check for the presence ARM64_HAS_ADDRESS_AUTH, which is defined as
> + * (ARM64_HAS_ADDRESS_AUTH_ARCH || ARM64_HAS_ADDRESS_AUTH_IMP_DEF) and
>   * then proceed ahead with the save/restore of Pointer Authentication
> - * key registers.
> + * key registers if enabled for the guest.
>   */
>  .macro ptrauth_switch_to_guest g_ctxt, reg1, reg2, reg3
> -alternative_if ARM64_HAS_ADDRESS_AUTH_ARCH
> +alternative_if_not ARM64_HAS_ADDRESS_AUTH
>   b   1000f
>  alternative_else_nop_endif
> -alternative_if_not ARM64_HAS_ADDRESS_AUTH_IMP_DEF
> - b   1001f
> -alternative_else_nop_endif
> -1000:
>   mrs \reg1, hcr_el2
>   and \reg1, \reg1, #(HCR_API | HCR_APK)
> - cbz \reg1, 1001f
> + cbz \reg1, 1000f
>   add \reg1, \g_ctxt, #CPU_APIAKEYLO_EL1
>   ptrauth_restore_state   \reg1, \reg2, \reg3
> -1001:
> +1000:
>  .endm

Since these are in macros, we could use \@ to generate a macro-specific
label rather than a magic number, which would be less likely to conflict
with the surrounding environment and would be more descriptive. We do
that in a few places already, and here it could look something like:

| alternative_if_not ARM64_HAS_ADDRESS_AUTH
|   b   .L__skip_pauth_switch\@
| alternative_else_nop_endif
|   
|   ...
| 
| .L__skip_pauth_switch\@:

Per the gas documentation

| \@
|
|as maintains a counter of how many macros it has executed in this
|pseudo-variable; you can copy that number to your output with ‘\@’,
|but only within a macro definition.

No worries if you don't want to change that now; the Acked-by stands
either way.

Mark.

>  
>  .macro ptrauth_switch_to_host g_ctxt, h_ctxt, reg1, reg2, reg3
> -alternative_if ARM64_HAS_ADDRESS_AUTH_ARCH
> +alternative_if_not ARM64_HAS_ADDRESS_AUTH
>   b   2000f
>  alternative_else_nop_endif
> -alternative_if_not ARM64_HAS_ADDRESS_AUTH_IMP_DEF
> - b   2001f
> -alternative_else_nop_endif
> -2000:
>   mrs \reg1, hcr_el2
>   and \reg1, \reg1, #(HCR_API | HCR_APK)
> - cbz \reg1, 2001f
> + cbz \reg1, 2000f
>   add \reg1, \g_ctxt, #CPU_APIAKEYLO_EL1
>   ptrauth_save_state  \reg1, \reg2, \reg3
>   add \reg1, \h_ctxt, #CPU_APIAKEYLO_EL1
>   ptrauth_restore_state   \reg1, \reg2, \reg3
>   isb
> -2001:
> +2000:
>  .endm
>  
>  #else /* !CONFIG_ARM64_PTR_AUTH */
> -- 
> 2.27.0
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 1/5] KVM: arm64: Enable Address Authentication at EL2 if available

2020-06-22 Thread Mark Rutland
On Mon, Jun 22, 2020 at 09:06:39AM +0100, Marc Zyngier wrote:
> While initializing EL2, enable Address Authentication if detected
> from EL1. We still use the EL1-provided keys though.
> 
> Acked-by: Andrew Scull 
> Signed-off-by: Marc Zyngier 

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/kvm/hyp-init.S | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
> index 6e6ed5581eed..1587d146726a 100644
> --- a/arch/arm64/kvm/hyp-init.S
> +++ b/arch/arm64/kvm/hyp-init.S
> @@ -104,6 +104,11 @@ alternative_else_nop_endif
>*/
>   mov_q   x4, (SCTLR_EL2_RES1 | (SCTLR_ELx_FLAGS & ~SCTLR_ELx_A))
>  CPU_BE(  orr x4, x4, #SCTLR_ELx_EE)
> +alternative_if ARM64_HAS_ADDRESS_AUTH
> + mov_q   x5, (SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | \
> +  SCTLR_ELx_ENDA | SCTLR_ELx_ENDB)
> + orr x4, x4, x5
> +alternative_else_nop_endif
>   msr sctlr_el2, x4
>   isb
>  
> -- 
> 2.27.0
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 4/4] KVM: arm64: Check HCR_EL2 instead of shadow copy to swap PtrAuth registers

2020-06-15 Thread Mark Rutland
On Mon, Jun 15, 2020 at 09:19:54AM +0100, Marc Zyngier wrote:
> When save/restoring PtrAuth registers between host and guest, it is
> pretty useless to fetch the in-memory state, while we have the right
> state in the HCR_EL2 system register. Use that instead.
> 
> Signed-off-by: Marc Zyngier 

It took me a while to spot that we switched the guest/host hcr_el2 value
in the __activate_traps() and __deactivate_traps() paths, but given that
this is only called in the __kvm_vcpu_run_*() paths called between
those, I agree this is sound. Given that:

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/include/asm/kvm_ptrauth.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_ptrauth.h 
> b/arch/arm64/include/asm/kvm_ptrauth.h
> index 6301813dcace..f1830173fa9e 100644
> --- a/arch/arm64/include/asm/kvm_ptrauth.h
> +++ b/arch/arm64/include/asm/kvm_ptrauth.h
> @@ -74,7 +74,7 @@ alternative_if_not ARM64_HAS_ADDRESS_AUTH_IMP_DEF
>   b   1001f
>  alternative_else_nop_endif
>  1000:
> - ldr \reg1, [\g_ctxt, #(VCPU_HCR_EL2 - VCPU_CONTEXT)]
> + mrs \reg1, hcr_el2
>   and \reg1, \reg1, #(HCR_API | HCR_APK)
>   cbz \reg1, 1001f
>   add \reg1, \g_ctxt, #CPU_APIAKEYLO_EL1
> @@ -90,7 +90,7 @@ alternative_if_not ARM64_HAS_ADDRESS_AUTH_IMP_DEF
>   b   2001f
>  alternative_else_nop_endif
>  2000:
> - ldr \reg1, [\g_ctxt, #(VCPU_HCR_EL2 - VCPU_CONTEXT)]
> + mrs \reg1, hcr_el2
>   and \reg1, \reg1, #(HCR_API | HCR_APK)
>   cbz \reg1, 2001f
>   add \reg1, \g_ctxt, #CPU_APIAKEYLO_EL1
> -- 
> 2.27.0
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/4] KVM: arm64: Allow PtrAuth to be enabled from userspace on non-VHE systems

2020-06-15 Thread Mark Rutland
On Mon, Jun 15, 2020 at 09:19:53AM +0100, Marc Zyngier wrote:
> Now that the scene is set for enabling PtrAuth on non-VHE, drop
> the restrictions preventing userspace from enabling it.
> 
> Signed-off-by: Marc Zyngier 

Other than dropping the `has_vhe()` check this appears to be
functionally equivalent and easier to follow, so:

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/kvm/reset.c | 21 ++---
>  1 file changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index d3b209023727..2a929789fe2e 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -42,6 +42,11 @@ static u32 kvm_ipa_limit;
>  #define VCPU_RESET_PSTATE_SVC(PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
>PSR_AA32_I_BIT | PSR_AA32_F_BIT)
>  
> +static bool system_has_full_ptr_auth(void)
> +{
> + return system_supports_address_auth() && system_supports_generic_auth();
> +}
> +
>  /**
>   * kvm_arch_vm_ioctl_check_extension
>   *
> @@ -80,8 +85,7 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   break;
>   case KVM_CAP_ARM_PTRAUTH_ADDRESS:
>   case KVM_CAP_ARM_PTRAUTH_GENERIC:
> - r = has_vhe() && system_supports_address_auth() &&
> -  system_supports_generic_auth();
> + r = system_has_full_ptr_auth();
>   break;
>   default:
>   r = 0;
> @@ -205,19 +209,14 @@ static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
>  
>  static int kvm_vcpu_enable_ptrauth(struct kvm_vcpu *vcpu)
>  {
> - /* Support ptrauth only if the system supports these capabilities. */
> - if (!has_vhe())
> - return -EINVAL;
> -
> - if (!system_supports_address_auth() ||
> - !system_supports_generic_auth())
> - return -EINVAL;
>   /*
>* For now make sure that both address/generic pointer authentication
> -  * features are requested by the userspace together.
> +  * features are requested by the userspace together and the system
> +  * supports these capabilities.
>*/
>   if (!test_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, vcpu->arch.features) ||
> - !test_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, vcpu->arch.features))
> + !test_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, vcpu->arch.features) ||
> + !system_has_full_ptr_auth())
>   return -EINVAL;
>  
>   vcpu->arch.flags |= KVM_ARM64_GUEST_HAS_PTRAUTH;
> -- 
> 2.27.0
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 2/4] KVM: arm64: Allow ARM64_PTR_AUTH when ARM64_VHE=n

2020-06-15 Thread Mark Rutland
On Mon, Jun 15, 2020 at 09:19:52AM +0100, Marc Zyngier wrote:
> We currently prevent PtrAuth from even being built if KVM is selected,
> but VHE isn't. It is a bit of a pointless restriction, since we also
> check this at run time (rejecting the enabling of PtrAuth for the
> vcpu if we're not running with VHE).
> 
> Just drop this apparently useless restriction.
> 
> Signed-off-by: Marc Zyngier 

I can't recall exactly why we had this limitation to begin with, but
given we now save/restore the keys in common hyp code, I don't see a
reason to forbid this, and agree the limitation is pointless, so:

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/Kconfig | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 31380da53689..d719ea9c596d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1516,7 +1516,6 @@ menu "ARMv8.3 architectural features"
>  config ARM64_PTR_AUTH
>   bool "Enable support for pointer authentication"
>   default y
> - depends on !KVM || ARM64_VHE
>   depends on (CC_HAS_SIGN_RETURN_ADDRESS || CC_HAS_BRANCH_PROT_PAC_RET) 
> && AS_HAS_PAC
>   # GCC 9.1 and later inserts a .note.gnu.property section note for PAC
>   # which is only understood by binutils starting with version 2.33.1.
> @@ -1543,8 +1542,7 @@ config ARM64_PTR_AUTH
>  
> The feature is detected at runtime. If the feature is not present in
> hardware it will not be advertised to userspace/KVM guest nor will it
> -   be enabled. However, KVM guest also require VHE mode and hence
> -   CONFIG_ARM64_VHE=y option to use this feature.
> +   be enabled.
>  
> If the feature is present on the boot CPU but not on a late CPU, then
> the late CPU will be parked. Also, if the boot CPU does not have
> -- 
> 2.27.0
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/4] KVM: arm64: Enable Pointer Authentication at EL2 if available

2020-06-15 Thread Mark Rutland
On Mon, Jun 15, 2020 at 09:19:51AM +0100, Marc Zyngier wrote:
> While initializing EL2, switch Pointer Authentication if detected
> from EL1. We use the EL1-provided keys though.

Perhaps "enable address authentication", to avoid confusion with
context-switch, and since generic authentication cannot be disabled
locally at EL2.

> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp-init.S | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
> index 6e6ed5581eed..81732177507d 100644
> --- a/arch/arm64/kvm/hyp-init.S
> +++ b/arch/arm64/kvm/hyp-init.S
> @@ -104,6 +104,17 @@ alternative_else_nop_endif
>*/
>   mov_q   x4, (SCTLR_EL2_RES1 | (SCTLR_ELx_FLAGS & ~SCTLR_ELx_A))
>  CPU_BE(  orr x4, x4, #SCTLR_ELx_EE)
> +alternative_if ARM64_HAS_ADDRESS_AUTH_ARCH
> + b   1f
> +alternative_else_nop_endif
> +alternative_if_not ARM64_HAS_ADDRESS_AUTH_IMP_DEF
> + b   2f
> +alternative_else_nop_endif

I see this is the same pattern we use in the kvm context switch, but I
think we can use the ARM64_HAS_ADDRESS_AUTH cap instead (likewise in the
existing code).

AFAICT that won't permit mismatch given both ARM64_HAS_ADDRESS_AUTH_ARCH
and ARM64_HAS_ADDRESS_AUTH_IMP_DEF are dealt with as
ARM64_CPUCAP_BOOT_CPU_FEATURE.

> +1:
> + orr x4, x4, #(SCTLR_ELx_ENIA | SCTLR_ELx_ENIB)
> + orr x4, x4, #SCTLR_ELx_ENDA
> + orr x4, x4, #SCTLR_ELx_ENDB

Assuming we have a spare register, it would be nice if we could follow the same
pattern as in proc.S, where we do:

| ldr x2, =SCTLR_ELx_ENIA | SCTLR_ELx_ENIB | \
|  SCTLR_ELx_ENDA | SCTLR_ELx_ENDB
| orr x0, x0, x2

... though we could/should use mov_q rather than a load literal, here and in
proc.S.

... otherwise this looks sound to me.

Thanks,
Mark.

> +2:
>   msr sctlr_el2, x4
>   isb
>  
> -- 
> 2.27.0
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2] KVM: arm64: Remove host_cpu_context member from vcpu structure

2020-06-08 Thread Mark Rutland
On Mon, Jun 08, 2020 at 09:56:57AM +0100, Marc Zyngier wrote:
> For very long, we have kept this pointer back to the per-cpu
> host state, despite having working per-cpu accessors at EL2
> for some time now.
> 
> Recent investigations have shown that this pointer is easy
> to abuse in preemptible context, which is a sure sign that
> it would better be gone. Not to mention that a per-cpu
> pointer is faster to access at all times.
> 
> Reported-by: Andrew Scull 
> Signed-off-by: Marc Zyngier 

From a quick scan, this looks sane to me, so FWIW:

Acked-by: Mark Rutland 

Mark.

> ---
> 
> Notes:
> v2: Stick to this_cpu_ptr() in pmu.c, as this only used on the
> kernel side and not the hypervisor.
> 
>  arch/arm64/include/asm/kvm_host.h | 3 ---
>  arch/arm64/kvm/arm.c  | 3 ---
>  arch/arm64/kvm/hyp/debug-sr.c | 4 ++--
>  arch/arm64/kvm/hyp/switch.c   | 6 +++---
>  arch/arm64/kvm/hyp/sysreg-sr.c| 6 --
>  arch/arm64/kvm/pmu.c  | 8 ++--
>  6 files changed, 11 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 59029e90b557..ada1faa92211 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -284,9 +284,6 @@ struct kvm_vcpu_arch {
>   struct kvm_guest_debug_arch vcpu_debug_state;
>   struct kvm_guest_debug_arch external_debug_state;
>  
> - /* Pointer to host CPU context */
> - struct kvm_cpu_context *host_cpu_context;
> -
>   struct thread_info *host_thread_info;   /* hyp VA */
>   struct user_fpsimd_state *host_fpsimd_state;/* hyp VA */
>  
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 14b747266607..6ddaa23ef346 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -340,10 +340,8 @@ void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>   int *last_ran;
> - kvm_host_data_t *cpu_data;
>  
>   last_ran = this_cpu_ptr(vcpu->kvm->arch.last_vcpu_ran);
> - cpu_data = this_cpu_ptr(&kvm_host_data);
>  
>   /*
>* We might get preempted before the vCPU actually runs, but
> @@ -355,7 +353,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>   }
>  
>   vcpu->cpu = cpu;
> - vcpu->arch.host_cpu_context = &cpu_data->host_ctxt;
>  
>   kvm_vgic_load(vcpu);
>   kvm_timer_vcpu_load(vcpu);
> diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> index 0fc9872a1467..e95af204fec7 100644
> --- a/arch/arm64/kvm/hyp/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/debug-sr.c
> @@ -185,7 +185,7 @@ void __hyp_text __debug_switch_to_guest(struct kvm_vcpu 
> *vcpu)
>   if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
>   return;
>  
> - host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> + host_ctxt = &__hyp_this_cpu_ptr(kvm_host_data)->host_ctxt;
>   guest_ctxt = &vcpu->arch.ctxt;
>   host_dbg = &vcpu->arch.host_debug_state.regs;
>   guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> @@ -207,7 +207,7 @@ void __hyp_text __debug_switch_to_host(struct kvm_vcpu 
> *vcpu)
>   if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
>   return;
>  
> - host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> + host_ctxt = &__hyp_this_cpu_ptr(kvm_host_data)->host_ctxt;
>   guest_ctxt = &vcpu->arch.ctxt;
>   host_dbg = &vcpu->arch.host_debug_state.regs;
>   guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index fc09c3dfa466..fc671426c14b 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -544,7 +544,7 @@ static bool __hyp_text __hyp_handle_ptrauth(struct 
> kvm_vcpu *vcpu)
>   return false;
>   }
>  
> - ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> + ctxt = &__hyp_this_cpu_ptr(kvm_host_data)->host_ctxt;
>   __ptrauth_save_key(ctxt->sys_regs, APIA);
>   __ptrauth_save_key(ctxt->sys_regs, APIB);
>   __ptrauth_save_key(ctxt->sys_regs, APDA);
> @@ -715,7 +715,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>   struct kvm_cpu_context *guest_ctxt;
>   u64 exit_code;
>  
> - host_ctxt = vcpu->arch.host_cpu_context;
> + host_ctxt = &__hyp_this_cpu_ptr(kvm_host_data)->host_ctxt;
>   host_ctxt->__hyp_running_vcpu = vcpu;
>   guest_ctxt = &vcpu->arch.ctxt;
>  
> @@ -820,7 +820,7 @@ int __

Re: [PATCH 3/3] KVM: arm64: Enforce PtrAuth being disabled if not advertized

2020-06-04 Thread Mark Rutland
Hi Marc,

On Thu, Jun 04, 2020 at 02:33:54PM +0100, Marc Zyngier wrote:
> Even if we don't expose PtrAuth to a guest, the guest can still
> write to its SCTLR_EL1 register and set the En{I,D}{A,B} bits
> and execute PtrAuth instructions from the NOP space. This has
> the effect of trapping to EL2, and we currently inject an UNDEF.

I think it's worth noting that this is an ill-behaved guest, as those
bits are RES0 when pointer authentication isn't implemented.

The rationale for RES0/RES1 bits is that new HW can rely on old SW
programming them with the 0/1 as appropriate, and that old SW that does
not do so may encounter behaviour which from its PoV is UNPREDICTABLE.
The SW side of the contract is that you must program them as 0/1 unless
you know they're allocated with a specific meaning.

With that in mind I think the current behaviour is legitimate: from the
guest's PoV it's the same as there being a distinct extension, which it
is not aware of, where the En{I,D}{A,B} bits mean "trap some HINTs to
EL1".

I don't think that we should attempt to work around broken software here
unless we absolutely have to, as it only adds complexity for no real
gain.

Thanks,
Mark.

> This is definitely the wrong thing to do, as the architecture says
> that these instructions should behave as NOPs.
> 
> Instead, we can simply reset the offending SCTLR_EL1 bits to
> zero, and resume the guest. It can still observe the SCTLR bits
> being set and then being cleared by magic, but that's much better
> than delivering an unexpected extension.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/handle_exit.c | 12 
>  arch/arm64/kvm/hyp/switch.c  | 18 --
>  2 files changed, 16 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 5a02d4c90559..98d8adf6f865 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -162,17 +162,6 @@ static int handle_sve(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
>   return 1;
>  }
>  
> -/*
> - * Guest usage of a ptrauth instruction (which the guest EL1 did not turn 
> into
> - * a NOP). If we get here, it is that we didn't fixup ptrauth on exit, and 
> all
> - * that we can do is give the guest an UNDEF.
> - */
> -static int kvm_handle_ptrauth(struct kvm_vcpu *vcpu, struct kvm_run *run)
> -{
> - kvm_inject_undefined(vcpu);
> - return 1;
> -}
> -
>  static exit_handle_fn arm_exit_handlers[] = {
>   [0 ... ESR_ELx_EC_MAX]  = kvm_handle_unknown_ec,
>   [ESR_ELx_EC_WFx]= kvm_handle_wfx,
> @@ -195,7 +184,6 @@ static exit_handle_fn arm_exit_handlers[] = {
>   [ESR_ELx_EC_BKPT32] = kvm_handle_guest_debug,
>   [ESR_ELx_EC_BRK64]  = kvm_handle_guest_debug,
>   [ESR_ELx_EC_FP_ASIMD]   = handle_no_fpsimd,
> - [ESR_ELx_EC_PAC]= kvm_handle_ptrauth,
>  };
>  
>  static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 2a50b3771c3b..fc09c3dfa466 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -503,8 +503,22 @@ static bool __hyp_text __hyp_handle_ptrauth(struct 
> kvm_vcpu *vcpu)
>   struct kvm_cpu_context *ctxt;
>   u64 val;
>  
> - if (!vcpu_has_ptrauth(vcpu))
> - return false;
> + if (!vcpu_has_ptrauth(vcpu)) {
> + if (ec != ESR_ELx_EC_PAC)
> + return false;
> +
> + /*
> +  * Interesting situation: the guest has enabled PtrAuth,
> +  * despite KVM not advertising it. Fix SCTLR_El1 on behalf
> +  * of the guest (the bits should behave as RES0 anyway).
> +  */
> + val = read_sysreg_el1(SYS_SCTLR);
> + val &= ~(SCTLR_ELx_ENIA | SCTLR_ELx_ENIB |
> +  SCTLR_ELx_ENDA | SCTLR_ELx_ENDB);
> + write_sysreg_el1(val, SYS_SCTLR);
> +
> + return true;
> + }
>  
>   switch (ec) {
>   case ESR_ELx_EC_PAC:
> -- 
> 2.26.2
> 


Re: [PATCH 2/3] KVM: arm64: Handle PtrAuth traps early

2020-06-04 Thread Mark Rutland
 * - Either we re-execute the same key register access instruction
> -  *   after enabling ptrauth.
> -  * - Or an UNDEF is injected as ptrauth is not supported/enabled.
> +  * If we land here, that is because we didn't fixup the access on exit
> +  * by allowing the PtrAuth sysregs. The only way this happens is when
> +  * the guest does not have PtrAuth support enabled.
>*/
> + kvm_inject_undefined(vcpu);
> +
>   return false;
>  }
>  
> -- 
> 2.26.2
> 

Regardless of the suggestion above, this looks sound to me. I agree that
it's much nicer to handle this in hyp, and AFAICT the context switch
should do the right thing, so:

Reviewed-by: Mark Rutland 

Thanks,
Mark.


Re: [PATCH 1/3] KVM: arm64: Save the host's PtrAuth keys in non-preemptible context

2020-06-04 Thread Mark Rutland
On Thu, Jun 04, 2020 at 02:33:52PM +0100, Marc Zyngier wrote:
> When using the PtrAuth feature in a guest, we need to save the host's
> keys before allowing the guest to program them. For that, we dump
> them in a per-CPU data structure (the so called host context).
> 
> But both call sites that do this are in preemptible context,
> which may end up in disaster should the vcpu thread get preempted
> before reentering the guest.

Yuck!

> Instead, save the keys eagerly on each vcpu_load(). This has an
> increased overhead, but is at least safe.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Marc Zyngier 

This looks sound to me given kvm_arch_vcpu_load() is surrounded with
get_cpu() .. put_cpu() and gets called when the thread is preempted.
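(For reference, the generic caller is roughly the following -- a sketch
from memory rather than the exact code -- which is what makes the key
save safe against preemption:

        void vcpu_load(struct kvm_vcpu *vcpu)
        {
                int cpu = get_cpu();            /* preemption disabled here... */

                preempt_notifier_register(&vcpu->preempt_notifier);
                kvm_arch_vcpu_load(vcpu, cpu);
                put_cpu();                      /* ...until here */
        }

and the preempt notifier re-runs kvm_arch_vcpu_load() on sched-in, so the
keys are saved again into the new CPU's host context if the thread
migrates.)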

Reviewed-by: Mark Rutland 

Thanks,
Mark.

> ---
>  arch/arm64/include/asm/kvm_emulate.h |  6 --
>  arch/arm64/kvm/arm.c | 18 +-
>  arch/arm64/kvm/handle_exit.c | 19 ++-
>  3 files changed, 19 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index a30b4eec7cb4..977843e4d5fb 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -112,12 +112,6 @@ static inline void vcpu_ptrauth_disable(struct kvm_vcpu 
> *vcpu)
>   vcpu->arch.hcr_el2 &= ~(HCR_API | HCR_APK);
>  }
>  
> -static inline void vcpu_ptrauth_setup_lazy(struct kvm_vcpu *vcpu)
> -{
> - if (vcpu_has_ptrauth(vcpu))
> - vcpu_ptrauth_disable(vcpu);
> -}
> -
>  static inline unsigned long vcpu_get_vsesr(struct kvm_vcpu *vcpu)
>  {
>   return vcpu->arch.vsesr_el2;
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index d6988401c22a..152049c5055d 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -337,6 +337,12 @@ void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
>   preempt_enable();
>  }
>  
> +#define __ptrauth_save_key(regs, key)
> \
> +({   
> \
> + regs[key ## KEYLO_EL1] = read_sysreg_s(SYS_ ## key ## KEYLO_EL1);   
> \
> + regs[key ## KEYHI_EL1] = read_sysreg_s(SYS_ ## key ## KEYHI_EL1);   
> \
> +})
> +
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>   int *last_ran;
> @@ -370,7 +376,17 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>   else
>   vcpu_set_wfx_traps(vcpu);
>  
> - vcpu_ptrauth_setup_lazy(vcpu);
> + if (vcpu_has_ptrauth(vcpu)) {
> + struct kvm_cpu_context *ctxt = vcpu->arch.host_cpu_context;
> +
> + __ptrauth_save_key(ctxt->sys_regs, APIA);
> + __ptrauth_save_key(ctxt->sys_regs, APIB);
> + __ptrauth_save_key(ctxt->sys_regs, APDA);
> + __ptrauth_save_key(ctxt->sys_regs, APDB);
> + __ptrauth_save_key(ctxt->sys_regs, APGA);
> +
> + vcpu_ptrauth_disable(vcpu);
> + }
>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index eb194696ef62..065251efa2e6 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -162,31 +162,16 @@ static int handle_sve(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
>   return 1;
>  }
>  
> -#define __ptrauth_save_key(regs, key)
> \
> -({   
> \
> - regs[key ## KEYLO_EL1] = read_sysreg_s(SYS_ ## key ## KEYLO_EL1);   
> \
> - regs[key ## KEYHI_EL1] = read_sysreg_s(SYS_ ## key ## KEYHI_EL1);   
> \
> -})
> -
>  /*
>   * Handle the guest trying to use a ptrauth instruction, or trying to access 
> a
>   * ptrauth register.
>   */
>  void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu)
>  {
> - struct kvm_cpu_context *ctxt;
> -
> - if (vcpu_has_ptrauth(vcpu)) {
> + if (vcpu_has_ptrauth(vcpu))
>   vcpu_ptrauth_enable(vcpu);
> - ctxt = vcpu->arch.host_cpu_context;
> - __ptrauth_save_key(ctxt->sys_regs, APIA);
> - __ptrauth_save_key(ctxt->sys_regs, APIB);
> - __ptrauth_save_key(ctxt->sys_regs, APDA);
> - __ptrauth_save_key(ctxt->sys_regs, APDB);
> - __ptrauth_save_key(ctxt->sys_regs, APGA);
> - } else {
> + else
>   kvm_inject_undefined(vcpu);
> - }
>  }
>  
>  /*
> -- 
> 2.26.2
> 


Re: [PATCH 26/26] KVM: arm64: Parametrize exception entry with a target EL

2020-05-27 Thread Mark Rutland
On Wed, May 27, 2020 at 10:34:09AM +0100, Marc Zyngier wrote:
> HI Mark,
> 
> On 2020-05-19 11:44, Mark Rutland wrote:
> > On Wed, Apr 22, 2020 at 01:00:50PM +0100, Marc Zyngier wrote:
> > > -static unsigned long get_except64_pstate(struct kvm_vcpu *vcpu)
> > > +static void enter_exception(struct kvm_vcpu *vcpu, unsigned long
> > > target_mode,
> > > + enum exception_type type)
> > 
> > Since this is all for an AArch64 target, could we keep `64` in the name,
> > e.g enter_exception64? That'd mirror the callers below.
> > 
> > >  {
> > > - unsigned long sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
> > > - unsigned long old, new;
> > > + unsigned long sctlr, vbar, old, new, mode;
> > > + u64 exc_offset;
> > > +
> > > + mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
> > > +
> > > + if  (mode == target_mode)
> > > + exc_offset = CURRENT_EL_SP_ELx_VECTOR;
> > > + else if ((mode | 1) == target_mode)
> > > + exc_offset = CURRENT_EL_SP_EL0_VECTOR;
> > 
> > It would be nice if we could add a mnemonic for the `1` here, e.g.
> > PSR_MODE_SP0 or PSR_MODE_THREAD_BIT.
> 
> I've addressed both comments as follows:
> 
> diff --git a/arch/arm64/include/asm/ptrace.h
> b/arch/arm64/include/asm/ptrace.h
> index bf57308fcd63..953b6a1ce549 100644
> --- a/arch/arm64/include/asm/ptrace.h
> +++ b/arch/arm64/include/asm/ptrace.h
> @@ -35,6 +35,7 @@
>  #define GIC_PRIO_PSR_I_SET   (1 << 4)
> 
>  /* Additional SPSR bits not exposed in the UABI */
> +#define PSR_MODE_THREAD_BIT  (1 << 0)
>  #define PSR_IL_BIT   (1 << 20)
> 
>  /* AArch32-specific ptrace requests */
> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> index 3dbcbc839b9c..ebfdfc27b2bd 100644
> --- a/arch/arm64/kvm/inject_fault.c
> +++ b/arch/arm64/kvm/inject_fault.c
> @@ -43,8 +43,8 @@ enum exception_type {
>   * Here we manipulate the fields in order of the AArch64 SPSR_ELx layout,
> from
>   * MSB to LSB.
>   */
> -static void enter_exception(struct kvm_vcpu *vcpu, unsigned long
> target_mode,
> - enum exception_type type)
> +static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long
> target_mode,
> +   enum exception_type type)
>  {
>   unsigned long sctlr, vbar, old, new, mode;
>   u64 exc_offset;
> @@ -53,7 +53,7 @@ static void enter_exception(struct kvm_vcpu *vcpu,
> unsigned long target_mode,
> 
>   if  (mode == target_mode)
>   exc_offset = CURRENT_EL_SP_ELx_VECTOR;
> - else if ((mode | 1) == target_mode)
> + else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
>   exc_offset = CURRENT_EL_SP_EL0_VECTOR;
>   else if (!(mode & PSR_MODE32_BIT))
>   exc_offset = LOWER_EL_AArch64_VECTOR;
> @@ -126,7 +126,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool
> is_iabt, unsigned long addr
>   bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
>   u32 esr = 0;
> 
> - enter_exception(vcpu, PSR_MODE_EL1h, except_type_sync);
> + enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
> 
>   vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
> 
> @@ -156,7 +156,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
>  {
>   u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
> 
> - enter_exception(vcpu, PSR_MODE_EL1h, except_type_sync);
> + enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
> 
>   /*
>* Build an unknown exception, depending on the instruction

Thanks; that all looks good to me, and my R-b stands!

Mark.


Re: [PATCH RFCv2 0/9] kvm/arm64: Support Async Page Fault

2020-05-26 Thread Mark Rutland
Hi Gavin,

At a high-level I'm rather fearful of this series. I can see many ways
that this can break, and I can also see that even if/when we get things
into a working state, constant vigilance will be required for any
changes to the entry code.

I'm not keen on injecting non-architectural exceptions in this way, and
I'm also not keen on how deep the PV hooks are injected currently (e.g.
in the ret_to_user path).

I see a few patches have preparatory cleanup that I think would be
worthwhile regardless of this series; if you could factor those out and
send them on their own it would get that out of the way and make it
easier to review the series itself. Similarly, there's some duplication
of code from arch/x86 which I think can be factored out to virt/kvm
instead as preparatory work.

Generally, I also think that you need to spend some time on commit
messages and/or documentation to better explain the concepts and
expected usage. I had to reverse-engineer the series by reviewing it in its
entirety before I had an idea as to how the basic parts of it were strung
together, and a more thorough conceptual explanation would make it much
easier to critique the approach rather than the individual patches.

On Fri, May 08, 2020 at 01:29:10PM +1000, Gavin Shan wrote:
> Testing
> ===
> The tests are carried on the following machine. A guest with single vCPU
> and 4GB memory is started. Also, the QEMU process is put into memory cgroup
> (v1) whose memory limit is set to 2GB. In the guest, there are two threads,
> which are memory bound and CPU bound separately. The memory bound thread
> allocates all available memory, accesses and them free them. The CPU bound
> thread simply executes block of "nop".

I appreciate this is a microbenchmark, but that sounds far from
realistic.

Is there a specific real workload that this is expected to be
representative of?

Can you run tests with a real workload? For example, a kernel build
inside the VM?

> The test is carried out for 5 time
> continuously and the average number (per minute) of executed blocks in the
> CPU bound thread is taken as indicator of improvement.
> 
>Vendor: GIGABYTE   CPU: 224 x Cavium ThunderX2(R) CPU CN9975 v2.2 @ 2.0GHz
>Memory: 32GB   Disk: Fusion-MPT SAS-3 (PCIe3.0 x8)
> 
>Without-APF: 7029030180/minute = avg(7559625120 5962155840 7823208540
> 7629633480 6170527920)
>With-APF:8286827472/minute = avg(8464584540 8177073360 8262723180
> 8095084020 8434672260)
>Outcome: +17.8%
> 
> Another test case is to measure the time consumed by the application, but
> with the CPU-bound thread disabled.
> 
>Without-APF: 40.3s = avg(40.6 39.3 39.2 41.6 41.2)
>With-APF:40.8s = avg(40.6 41.1 40.9 41.0 40.7)
>Outcome: +1.2%

So this is pure overhead in that case?

I think we need to see a real workload that this benefits. As it stands
it seems that this is a lot of complexity to game a synthetic benchmark.

Thanks,
Mark.

> I also have some code in the host to capture the number of async page faults,
> time used to do swapin and its maximal/minimal values when async page fault
> is enabled. During the test, the CPU-bound thread is disabled. There is about
> 30% of the time used to do swapin.
> 
>Number of async page fault: 7555 times
>Total time used by application: 42.2 seconds
>Total time used by swapin:  12.7 seconds   (30%)
>  Minimal swapin time:  36.2 us
>  Maximal swapin time:  55.7 ms
> 
> Changelog
> =
> RFCv1 -> RFCv2
>* Rebase to 5.7.rc3
>    * Performance data   (Marc 
> Zyngier)
>* Replace IMPDEF system register with KVM vendor specific hypercall  (Mark 
> Rutland)
>* Based on Will's KVM vendor hypercall probe mechanism   (Will 
> Deacon)
>* Don't use IMPDEF DFSC (0x43). Async page fault reason is conveyed
>  by the control block   (Mark 
> Rutland)
>* Delayed wakeup mechanism in guest kernel   
> (Gavin Shan)
>* Stability improvement in the guest kernel: delayed wakeup mechanism,
>  external abort disallowed region, lazily clear async page fault,
>  disabled interrupt on acquiring the head's lock and so on  
> (Gavin Shan)
>* Stability improvement in the host kernel: serialized async page
>  faults etc.
> (Gavin Shan)
>* Performance improvement in guest kernel: percpu sleeper head   
> (Gavin Shan)
> 
> Gavin Shan (7):
>   kvm/arm64: Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr()
>   kvm/arm64: Detach 

Re: [PATCH RFCv2 9/9] arm64: Support async page fault

2020-05-26 Thread Mark Rutland
On Fri, May 08, 2020 at 01:29:19PM +1000, Gavin Shan wrote:
> This supports asynchronous page fault for the guest. The design is
> similar to what x86 has: on receiving a PAGE_NOT_PRESENT signal from
> the host, the current task is either rescheduled or put into power
> saving mode. The task will be waken up when PAGE_READY signal is
> received. The PAGE_READY signal might be received in the context
> of the suspended process, to be waken up. That means the suspended
> process has to wake up itself, but it's not safe and prone to cause
> dead-lock on CPU runqueue lock. So the wakeup is delayed on returning
> from kernel space to user space or idle process is picked for running.
> 
> The signals are conveyed through the async page fault control block,
> which was passed to host on enabling the functionality. On each page
> fault, the control block is checked and switch to the async page fault
> handling flow if any signals exist.
> 
> The feature is put into the CONFIG_KVM_GUEST umbrella, which is added
> by this patch. So we have inline functions implemented in kvm_para.h,
> like other architectures do, to check if async page fault (one of the
> KVM para-virtualized features) is available. Also, the kernel boot
> parameter "no-kvmapf" can be specified to disable the feature.
> 
> Signed-off-by: Gavin Shan 
> ---
>  arch/arm64/Kconfig |  11 +
>  arch/arm64/include/asm/exception.h |   3 +
>  arch/arm64/include/asm/kvm_para.h  |  27 +-
>  arch/arm64/kernel/entry.S  |  33 +++
>  arch/arm64/kernel/process.c|   4 +
>  arch/arm64/mm/fault.c  | 434 +
>  6 files changed, 505 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 40fb05d96c60..2d5e5ee62d6d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1045,6 +1045,17 @@ config PARAVIRT
> under a hypervisor, potentially improving performance significantly
> over full virtualization.
>  
> +config KVM_GUEST
> + bool "KVM Guest Support"
> + depends on PARAVIRT
> + default y
> + help
> +   This option enables various optimizations for running under the KVM
> +   hypervisor. Overhead for the kernel when not running inside KVM should
> +   be minimal.
> +
> +   In case of doubt, say Y
> +
>  config PARAVIRT_TIME_ACCOUNTING
>   bool "Paravirtual steal time accounting"
>   select PARAVIRT
> diff --git a/arch/arm64/include/asm/exception.h 
> b/arch/arm64/include/asm/exception.h
> index 7a6e81ca23a8..d878afa42746 100644
> --- a/arch/arm64/include/asm/exception.h
> +++ b/arch/arm64/include/asm/exception.h
> @@ -46,4 +46,7 @@ void bad_el0_sync(struct pt_regs *regs, int reason, 
> unsigned int esr);
>  void do_cp15instr(unsigned int esr, struct pt_regs *regs);
>  void do_el0_svc(struct pt_regs *regs);
>  void do_el0_svc_compat(struct pt_regs *regs);
> +#ifdef CONFIG_KVM_GUEST
> +void kvm_async_pf_delayed_wake(void);
> +#endif
>  #endif   /* __ASM_EXCEPTION_H */
> diff --git a/arch/arm64/include/asm/kvm_para.h 
> b/arch/arm64/include/asm/kvm_para.h
> index 0ea481dd1c7a..b2f8ef243df7 100644
> --- a/arch/arm64/include/asm/kvm_para.h
> +++ b/arch/arm64/include/asm/kvm_para.h
> @@ -3,6 +3,20 @@
>  #define _ASM_ARM_KVM_PARA_H
>  
>  #include 
> +#include 
> +#include 
> +
> +#ifdef CONFIG_KVM_GUEST
> +static inline int kvm_para_available(void)
> +{
> + return 1;
> +}
> +#else
> +static inline int kvm_para_available(void)
> +{
> + return 0;
> +}
> +#endif /* CONFIG_KVM_GUEST */

Please make these bool, and return true/false, as was the case with the
existing stub.
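
i.e. something like (untested, just to illustrate the shape):

        #ifdef CONFIG_KVM_GUEST
        static inline bool kvm_para_available(void)
        {
                return true;
        }
        #else
        static inline bool kvm_para_available(void)
        {
                return false;
        }
        #endif /* CONFIG_KVM_GUEST */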

>  
>  static inline bool kvm_check_and_clear_guest_paused(void)
>  {
> @@ -11,17 +25,16 @@ static inline bool kvm_check_and_clear_guest_paused(void)
>  
>  static inline unsigned int kvm_arch_para_features(void)
>  {
> - return 0;
> + unsigned int features = 0;
> +
> + if (kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_APF))
> + features |= (1 << KVM_FEATURE_ASYNC_PF);
> +
> + return features;
>  }
>  
>  static inline unsigned int kvm_arch_para_hints(void)
>  {
>   return 0;
>  }
> -
> -static inline bool kvm_para_available(void)
> -{
> - return false;
> -}
> -
>  #endif /* _ASM_ARM_KVM_PARA_H */
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index ddcde093c433..15efd57129ff 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -751,12 +751,45 @@ finish_ret_to_user:
>   enable_step_tsk x1, x2
>  #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
>   bl  stackleak_erase
> +#endif
> +#ifdef CONFIG_KVM_GUEST
> + bl  kvm_async_pf_delayed_wake
>  #endif

Yuck. I am very much not keen on this living in the entry assembly.

What precisely is this needed for?

>   kernel_exit 0
>  ENDPROC(ret_to_user)
>  
>   .popsection // .entry.text
>  
> +#ifdef CONFIG_KVM_GUEST
> + .pushsection ".rodata", "a"
> +SY

Re: [PATCH RFCv2 7/9] kvm/arm64: Support async page fault

2020-05-26 Thread Mark Rutland
On Fri, May 08, 2020 at 01:29:17PM +1000, Gavin Shan wrote:
> There are two stages of fault pages and the stage one page fault is
> handled by guest itself. The guest is trapped to host when the page
> fault is caused by stage 2 page table, for example missing. The guest
> is suspended until the requested page is populated. To populate the
> requested page can be related to IO activities if the page was swapped
> out previously. In this case, the guest has to suspend for a few of
> milliseconds at least, regardless of the overall system load. There
> is no useful work done during the suspended period from guest's view.

This is a bit difficult to read. How about:

| When a vCPU triggers a Stage-2 fault (e.g. when accessing a page that
| is not mapped at Stage-2), the vCPU is suspended until the host has
| handled the fault. It can take the host milliseconds or longer to
| handle the fault as this may require IO, and when the system load is
| low neither the host nor guest perform useful work during such
| periods.

> 
> This adds asychornous page fault to improve the situation. A signal

Nit: typo for `asynchronous` here, and there are a few other typos in
the patch itself. It would be nice if you could run a spellcheck over
that.

> (PAGE_NOT_PRESENT) is sent to guest if the requested page needs some time
> to be populated. Guest might reschedule to another running process if
> possible. Otherwise, the vCPU is put into power-saving mode, which is
> actually to cause vCPU reschedule from host's view. A followup signal
> (PAGE_READY) is sent to guest once the requested page is populated.
> The suspended task is waken up or scheduled when guest receives the
> signal. With this mechanism, the vCPU won't be stuck when the requested
> page is being populated by host.

It would probably be best to say 'notification' rather than 'signal'
here, and say 'the guest is notified', etc. As above, it seems that this
is per-vCPU, so it's probably better to say 'vCPU' rather than guest, to
make it clear which context this applies to.

> 
> There are more details highlighted as below. Note the implementation is
> similar to what x86 has to some extent:
> 
>* A dedicated SMCCC ID is reserved to enable, disable or configure
>  the functionality. The only 64-bits parameter is conveyed by two
>  registers (w2/w1). Bits[63:56] is the bitmap used to specify the
>  operated functionality like enabling/disabling/configuration. The
>  bits[55:6] is the physical address of control block or external
>  data abort injection disallowed region. Bit[5:0] are used to pass
>  control flags.
> 
>* Signal (PAGE_NOT_PRESENT) is sent to guest if the requested page
>  isn't ready. In the mean while, a work is started to populate the
>  page asynchronously in background. The stage 2 page table entry is
>  updated accordingly and another signal (PAGE_READY) is fired after
>  the request page is populted. The signals is notified by injected
>  data abort fault.
> 
>* The signals are fired and consumed in sequential fashion. It means
>  no more signals will be fired if there is pending one, awaiting the
>  guest to consume. It's because the injected data abort faults have
>  to be done in sequential fashion.
> 
> Signed-off-by: Gavin Shan 
> ---
>  arch/arm64/include/asm/kvm_host.h  |  43 
>  arch/arm64/include/asm/kvm_para.h  |  27 ++
>  arch/arm64/include/uapi/asm/Kbuild |   2 -
>  arch/arm64/include/uapi/asm/kvm_para.h |  22 ++
>  arch/arm64/kvm/Kconfig |   1 +
>  arch/arm64/kvm/Makefile|   2 +
>  include/linux/arm-smccc.h  |   6 +
>  virt/kvm/arm/arm.c |  36 ++-
>  virt/kvm/arm/async_pf.c| 335 +
>  virt/kvm/arm/hypercalls.c  |   8 +
>  virt/kvm/arm/mmu.c |  29 ++-
>  11 files changed, 506 insertions(+), 5 deletions(-)
>  create mode 100644 arch/arm64/include/asm/kvm_para.h
>  create mode 100644 arch/arm64/include/uapi/asm/kvm_para.h
>  create mode 100644 virt/kvm/arm/async_pf.c
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index f77c706777ec..a207728d6f3f 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -250,6 +250,23 @@ struct vcpu_reset_state {
>   boolreset;
>  };
>  
> +#ifdef CONFIG_KVM_ASYNC_PF
> +
> +/* Should be a power of two number */
> +#define ASYNC_PF_PER_VCPU64

What exactly is this number?

> +
> +/*
> + * The association of gfn and token. The token will be sent to guest as
> + * page fault address. Also, the guest could be in aarch32 mode. So its
> + * length should be 32-bits.
> + */

The length of what should be 32-bit? The token?

The guest sees the token as the fault address? How exactly is that
exposed to the guest, is that via a synthetic S1 fault?

> +struct kvm_arch_async_pf {
> + u32 

Re: [PATCH RFCv2 6/9] kvm/arm64: Export kvm_handle_user_mem_abort() with prefault mode

2020-05-26 Thread Mark Rutland
On Fri, May 08, 2020 at 01:29:16PM +1000, Gavin Shan wrote:
> This renames user_mem_abort() to kvm_handle_user_mem_abort(), and
> then export it. The function will be used in asynchronous page fault
> to populate a page table entry once the corresponding page is populated
> from the backup device (e.g. swap partition):
> 
>* Parameter @fault_status is replace by @esr.
>* The parameters are reorder based on their importance.

It seems like multiple changes are going on here, and it would be
clearer with separate patches.

Passing the ESR rather than the extracted fault status seems fine, but
for clarity it'd be nicer to do this in its own patch.

Why is it necessary to re-order the function parameters? Does that align
with other function prototypes?

What exactly is the `prefault` parameter meant to do? It doesn't do
anything currently, so it'd be better to introduce it later when logic
using it is introduced, or where callers will pass distinct values.

Thanks,
Mark.

> 
> This shouldn't cause any functional changes.
> 
> Signed-off-by: Gavin Shan 
> ---
>  arch/arm64/include/asm/kvm_host.h |  4 
>  virt/kvm/arm/mmu.c| 14 --
>  2 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 32c8a675e5a4..f77c706777ec 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -437,6 +437,10 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
> struct kvm_vcpu_events *events);
>  
>  #define KVM_ARCH_WANT_MMU_NOTIFIER
> +int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu, unsigned int esr,
> +   struct kvm_memory_slot *memslot,
> +   phys_addr_t fault_ipa, unsigned long hva,
> +   bool prefault);
>  int kvm_unmap_hva_range(struct kvm *kvm,
>   unsigned long start, unsigned long end);
>  int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index e462e0368fd9..95aaabb2b1fc 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1656,12 +1656,12 @@ static bool fault_supports_stage2_huge_mapping(struct 
> kvm_memory_slot *memslot,
>  (hva & ~(map_size - 1)) + map_size <= uaddr_end;
>  }
>  
> -static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> -   struct kvm_memory_slot *memslot, unsigned long hva,
> -   unsigned long fault_status)
> +int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu, unsigned int esr,
> +   struct kvm_memory_slot *memslot,
> +   phys_addr_t fault_ipa, unsigned long hva,
> +   bool prefault)
>  {
> - int ret;
> - u32 esr = kvm_vcpu_get_esr(vcpu);
> + unsigned int fault_status = kvm_vcpu_trap_get_fault_type(esr);
>   bool write_fault, writable, force_pte = false;
>   bool exec_fault, needs_exec;
>   unsigned long mmu_seq;
> @@ -1674,6 +1674,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
>   pgprot_t mem_type = PAGE_S2;
>   bool logging_active = memslot_is_logging(memslot);
>   unsigned long vma_pagesize, flags = 0;
> + int ret;
>  
>   write_fault = kvm_is_write_fault(esr);
>   exec_fault = kvm_vcpu_trap_is_iabt(esr);
> @@ -1995,7 +1996,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, 
> struct kvm_run *run)
>   goto out_unlock;
>   }
>  
> - ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
> + ret = kvm_handle_user_mem_abort(vcpu, esr, memslot,
> + fault_ipa, hva, false);
>   if (ret == 0)
>   ret = 1;
>  out:
> -- 
> 2.23.0
> 


Re: [PATCH RFCv2 4/9] kvm/arm64: Detach ESR operator from vCPU struct

2020-05-26 Thread Mark Rutland
On Fri, May 08, 2020 at 01:29:14PM +1000, Gavin Shan wrote:
> There are a set of inline functions defined in kvm_emulate.h. Those
> functions reads ESR from vCPU fault information struct and then operate
> on it. So it's tied with vCPU fault information and vCPU struct. It
> limits their usage scope.
> 
> This detaches these functions from the vCPU struct. With this, the
> caller has flexibility on where the ESR is read. It shouldn't cause
> any functional changes.
> 
> Signed-off-by: Gavin Shan 
> ---
>  arch/arm64/include/asm/kvm_emulate.h | 83 +++-
>  arch/arm64/kvm/handle_exit.c | 20 --
>  arch/arm64/kvm/hyp/switch.c  | 24 ---
>  arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c |  7 +-
>  arch/arm64/kvm/inject_fault.c|  4 +-
>  arch/arm64/kvm/sys_regs.c| 12 ++--
>  virt/kvm/arm/arm.c   |  4 +-
>  virt/kvm/arm/hyp/aarch32.c   |  2 +-
>  virt/kvm/arm/hyp/vgic-v3-sr.c|  5 +-
>  virt/kvm/arm/mmio.c  | 27 
>  virt/kvm/arm/mmu.c   | 22 ---
>  11 files changed, 112 insertions(+), 98 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index bd1a69e7c104..2873bf6dc85e 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -270,10 +270,8 @@ static __always_inline u32 kvm_vcpu_get_esr(const struct 
> kvm_vcpu *vcpu)
>   return vcpu->arch.fault.esr_el2;
>  }
>  
> -static __always_inline int kvm_vcpu_get_condition(const struct kvm_vcpu 
> *vcpu)
> +static __always_inline int kvm_vcpu_get_condition(u32 esr)

Given the `vcpu` argument has been removed, it's odd to keep `vcpu` in the
name, rather than `esr`.

e.g. this would make more sense as something like esr_get_condition().

... and if we did something like that, we could move most of the
extraction functions into <asm/esr.h>, and share them with non-KVM code.

Otherwise, do you need to extract all of these for your use-case, or do
you only need a few of the helpers? If you only need a few, it might be
better to only factor those out for now, and keep the existing API in
place with wrappers, e.g. have:

| esr_get_condition(u32 esr) {
|   ... 
| }
| 
| kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
| {
|   return esr_get_condition(kvm_vcpu_get_esr(vcpu));
| }
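
(The esr_get_condition() body would just be the existing extraction
logic, i.e. something like:

|	if (esr & ESR_ELx_CV)
|		return (esr & ESR_ELx_COND_MASK) >> ESR_ELx_COND_SHIFT;
|
|	return -1;

mirroring what kvm_vcpu_get_condition() does today.)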

Thanks,
Mark.


Re: [PATCH RFCv2 5/9] kvm/arm64: Replace hsr with esr

2020-05-26 Thread Mark Rutland
On Fri, May 08, 2020 at 01:29:15PM +1000, Gavin Shan wrote:
> This replace the variable names to make them self-explaining. The
> tracepoint isn't changed accordingly because they're part of ABI:
> 
>* @hsr to @esr
>* @hsr_ec to @ec
>* Use kvm_vcpu_trap_get_class() helper if possible
> 
> Signed-off-by: Gavin Shan 

As with patch 3, I think this cleanup makes sense independent from the
rest of the series, and I think it'd make sense to bundle all the
patches renaming hsr -> esr, and send those as a preparatory series.

Thanks,
Mark.

> ---
>  arch/arm64/kvm/handle_exit.c | 28 ++--
>  arch/arm64/kvm/hyp/switch.c  |  9 -
>  arch/arm64/kvm/sys_regs.c| 30 +++---
>  3 files changed, 33 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 00858db82a64..e3b3dcd5b811 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -123,13 +123,13 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
>   */
>  static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> - u32 hsr = kvm_vcpu_get_esr(vcpu);
> + u32 esr = kvm_vcpu_get_esr(vcpu);
>   int ret = 0;
>  
>   run->exit_reason = KVM_EXIT_DEBUG;
> - run->debug.arch.hsr = hsr;
> + run->debug.arch.hsr = esr;
>  
> - switch (ESR_ELx_EC(hsr)) {
> + switch (kvm_vcpu_trap_get_class(esr)) {
>   case ESR_ELx_EC_WATCHPT_LOW:
>   run->debug.arch.far = vcpu->arch.fault.far_el2;
>   /* fall through */
> @@ -139,8 +139,8 @@ static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu, 
> struct kvm_run *run)
>   case ESR_ELx_EC_BRK64:
>   break;
>   default:
> - kvm_err("%s: un-handled case hsr: %#08x\n",
> - __func__, (unsigned int) hsr);
> + kvm_err("%s: un-handled case esr: %#08x\n",
> + __func__, (unsigned int)esr);
>   ret = -1;
>   break;
>   }
> @@ -150,10 +150,10 @@ static int kvm_handle_guest_debug(struct kvm_vcpu 
> *vcpu, struct kvm_run *run)
>  
>  static int kvm_handle_unknown_ec(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> - u32 hsr = kvm_vcpu_get_esr(vcpu);
> + u32 esr = kvm_vcpu_get_esr(vcpu);
>  
> - kvm_pr_unimpl("Unknown exception class: hsr: %#08x -- %s\n",
> -   hsr, esr_get_class_string(hsr));
> + kvm_pr_unimpl("Unknown exception class: esr: %#08x -- %s\n",
> +   esr, esr_get_class_string(esr));
>  
>   kvm_inject_undefined(vcpu);
>   return 1;
> @@ -230,10 +230,10 @@ static exit_handle_fn arm_exit_handlers[] = {
>  
>  static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
>  {
> - u32 hsr = kvm_vcpu_get_esr(vcpu);
> - u8 hsr_ec = ESR_ELx_EC(hsr);
> + u32 esr = kvm_vcpu_get_esr(vcpu);
> + u8 ec = kvm_vcpu_trap_get_class(esr);
>  
> - return arm_exit_handlers[hsr_ec];
> + return arm_exit_handlers[ec];
>  }
>  
>  /*
> @@ -273,15 +273,15 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run 
> *run,
>  {
>   if (ARM_SERROR_PENDING(exception_index)) {
>   u32 esr = kvm_vcpu_get_esr(vcpu);
> - u8 hsr_ec = ESR_ELx_EC(esr);
> + u8 ec = kvm_vcpu_trap_get_class(esr);
>  
>   /*
>* HVC/SMC already have an adjusted PC, which we need
>* to correct in order to return to after having
>* injected the SError.
>*/
> - if (hsr_ec == ESR_ELx_EC_HVC32 || hsr_ec == ESR_ELx_EC_HVC64 ||
> - hsr_ec == ESR_ELx_EC_SMC32 || hsr_ec == ESR_ELx_EC_SMC64) {
> + if (ec == ESR_ELx_EC_HVC32 || ec == ESR_ELx_EC_HVC64 ||
> + ec == ESR_ELx_EC_SMC32 || ec == ESR_ELx_EC_SMC64) {
>   u32 adj =  kvm_vcpu_trap_il_is32bit(esr) ? 4 : 2;
>   *vcpu_pc(vcpu) -= adj;
>   }
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 369f22f49f3d..7bf4840bf90e 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -356,8 +356,8 @@ static bool __hyp_text __populate_fault_info(struct 
> kvm_vcpu *vcpu)
>  static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
>  {
>   u32 esr = kvm_vcpu_get_esr(vcpu);
> + u8 ec = kvm_vcpu_trap_get_class(esr);
>   bool vhe, sve_guest, sve_host;
> - u8 hsr_ec;
>  
>   if (!system_supports_fpsimd())
>   return false;
> @@ -372,14 +372,13 @@ static bool __hyp_text __hyp_handle_fpsimd(struct 
> kvm_vcpu *vcpu)
>   vhe = has_vhe();
>   }
>  
> - hsr_ec = kvm_vcpu_trap_get_class(esr);
> - if (hsr_ec != ESR_ELx_EC_FP_ASIMD &&
> - hsr_ec != ESR_ELx_EC_SVE)
> + if (ec != ESR_ELx_EC_FP_ASIMD &&
> + ec != ESR_ELx_EC_SVE)
>   return false;
>  
>   

Re: [PATCH RFCv2 3/9] kvm/arm64: Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr()

2020-05-26 Thread Mark Rutland
On Fri, May 08, 2020 at 01:29:13PM +1000, Gavin Shan wrote:
> Since kvm/arm32 was removed, this renames kvm_vcpu_get_hsr() to
> kvm_vcpu_get_esr() to it a bit more self-explaining because the
> functions returns ESR instead of HSR on aarch64. This shouldn't
> cause any functional changes.
> 
> Signed-off-by: Gavin Shan 

I think that this would be a nice cleanup on its own, and could be taken
independently of the rest of this series if it were rebased and sent as
a single patch.

Mark.

> ---
>  arch/arm64/include/asm/kvm_emulate.h | 36 +++-
>  arch/arm64/kvm/handle_exit.c | 12 +-
>  arch/arm64/kvm/hyp/switch.c  |  2 +-
>  arch/arm64/kvm/sys_regs.c|  6 ++---
>  virt/kvm/arm/hyp/aarch32.c   |  2 +-
>  virt/kvm/arm/hyp/vgic-v3-sr.c|  4 ++--
>  virt/kvm/arm/mmu.c   |  6 ++---
>  7 files changed, 35 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index a30b4eec7cb4..bd1a69e7c104 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -265,14 +265,14 @@ static inline bool vcpu_mode_priv(const struct kvm_vcpu 
> *vcpu)
>   return mode != PSR_MODE_EL0t;
>  }
>  
> -static __always_inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
> +static __always_inline u32 kvm_vcpu_get_esr(const struct kvm_vcpu *vcpu)
>  {
>   return vcpu->arch.fault.esr_el2;
>  }
>  
>  static __always_inline int kvm_vcpu_get_condition(const struct kvm_vcpu 
> *vcpu)
>  {
> - u32 esr = kvm_vcpu_get_hsr(vcpu);
> + u32 esr = kvm_vcpu_get_esr(vcpu);
>  
>   if (esr & ESR_ELx_CV)
>   return (esr & ESR_ELx_COND_MASK) >> ESR_ELx_COND_SHIFT;
> @@ -297,64 +297,66 @@ static inline u64 kvm_vcpu_get_disr(const struct 
> kvm_vcpu *vcpu)
>  
>  static inline u32 kvm_vcpu_hvc_get_imm(const struct kvm_vcpu *vcpu)
>  {
> - return kvm_vcpu_get_hsr(vcpu) & ESR_ELx_xVC_IMM_MASK;
> + return kvm_vcpu_get_esr(vcpu) & ESR_ELx_xVC_IMM_MASK;
>  }
>  
>  static __always_inline bool kvm_vcpu_dabt_isvalid(const struct kvm_vcpu 
> *vcpu)
>  {
> - return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_ISV);
> + return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_ISV);
>  }
>  
>  static inline unsigned long kvm_vcpu_dabt_iss_nisv_sanitized(const struct 
> kvm_vcpu *vcpu)
>  {
> - return kvm_vcpu_get_hsr(vcpu) & (ESR_ELx_CM | ESR_ELx_WNR | 
> ESR_ELx_FSC);
> + return kvm_vcpu_get_esr(vcpu) &
> +(ESR_ELx_CM | ESR_ELx_WNR | ESR_ELx_FSC);
>  }
>  
>  static inline bool kvm_vcpu_dabt_issext(const struct kvm_vcpu *vcpu)
>  {
> - return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SSE);
> + return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_SSE);
>  }
>  
>  static inline bool kvm_vcpu_dabt_issf(const struct kvm_vcpu *vcpu)
>  {
> - return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SF);
> + return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_SF);
>  }
>  
>  static __always_inline int kvm_vcpu_dabt_get_rd(const struct kvm_vcpu *vcpu)
>  {
> - return (kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SRT_MASK) >> ESR_ELx_SRT_SHIFT;
> + return (kvm_vcpu_get_esr(vcpu) & ESR_ELx_SRT_MASK) >> ESR_ELx_SRT_SHIFT;
>  }
>  
>  static __always_inline bool kvm_vcpu_dabt_iss1tw(const struct kvm_vcpu *vcpu)
>  {
> - return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_S1PTW);
> + return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_S1PTW);
>  }
>  
>  static __always_inline bool kvm_vcpu_dabt_iswrite(const struct kvm_vcpu 
> *vcpu)
>  {
> - return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WNR) ||
> + return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_WNR) ||
>   kvm_vcpu_dabt_iss1tw(vcpu); /* AF/DBM update */
>  }
>  
>  static inline bool kvm_vcpu_dabt_is_cm(const struct kvm_vcpu *vcpu)
>  {
> - return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_CM);
> + return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_CM);
>  }
>  
>  static __always_inline unsigned int kvm_vcpu_dabt_get_as(const struct 
> kvm_vcpu *vcpu)
>  {
> - return 1 << ((kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SAS) >> 
> ESR_ELx_SAS_SHIFT);
> + return 1 << ((kvm_vcpu_get_esr(vcpu) & ESR_ELx_SAS) >>
> +  ESR_ELx_SAS_SHIFT);
>  }
>  
>  /* This one is not specific to Data Abort */
>  static __always_inline bool kvm_vcpu_trap_il_is32bit(const struct kvm_vcpu 
> *vcpu)
>  {
> - return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_IL);
> + return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_IL);
>  }
>  
>  static __always_inline u8 kvm_vcpu_trap_get_class(const struct kvm_vcpu 
> *vcpu)
>  {
> - return ESR_ELx_EC(kvm_vcpu_get_hsr(vcpu));
> + return ESR_ELx_EC(kvm_vcpu_get_esr(vcpu));
>  }
>  
>  static inline bool kvm_vcpu_trap_is_iabt(const struct kvm_vcpu *vcpu)
> @@ -364,12 +366,12 @@ static inline bool kvm_vcpu_trap_is_iabt(const struct 
> kvm_vcpu *vcpu)
>  
>  static __always_inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu 
> *vcpu)
>  {
> - return kvm_vcpu_get_hsr(vcpu) & 

Re: [PATCH 26/26] KVM: arm64: Parametrize exception entry with a target EL

2020-05-19 Thread Mark Rutland
On Wed, Apr 22, 2020 at 01:00:50PM +0100, Marc Zyngier wrote:
> We currently assume that an exception is delivered to EL1, always.
> Once we emulate EL2, this no longer will be the case. To prepare
> for this, add a target_mode parameter.
> 
> While we're at it, merge the computing of the target PC and PSTATE in
> a single function that updates both PC and CPSR after saving their
> previous values in the corresponding ELR/SPSR. This ensures that they
> are updated in the correct order (a pretty common source of bugs...).
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/inject_fault.c | 75 ++-
>  1 file changed, 38 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> index d3ebf8bca4b89..3dbcbc839b9c3 100644
> --- a/arch/arm64/kvm/inject_fault.c
> +++ b/arch/arm64/kvm/inject_fault.c
> @@ -26,28 +26,12 @@ enum exception_type {
>   except_type_serror  = 0x180,
>  };
>  
> -static u64 get_except_vector(struct kvm_vcpu *vcpu, enum exception_type type)
> -{
> - u64 exc_offset;
> -
> - switch (*vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
> - case PSR_MODE_EL1t:
> - exc_offset = CURRENT_EL_SP_EL0_VECTOR;
> - break;
> - case PSR_MODE_EL1h:
> - exc_offset = CURRENT_EL_SP_ELx_VECTOR;
> - break;
> - case PSR_MODE_EL0t:
> - exc_offset = LOWER_EL_AArch64_VECTOR;
> - break;
> - default:
> - exc_offset = LOWER_EL_AArch32_VECTOR;
> - }
> -
> - return vcpu_read_sys_reg(vcpu, VBAR_EL1) + exc_offset + type;
> -}
> -
>  /*
> + * This performs the exception entry at a given EL (@target_mode), stashing 
> PC
> + * and PSTATE into ELR and SPSR respectively, and compute the new PC/PSTATE.
> + * The EL passed to this function *must* be a non-secure, privileged mode 
> with
> + * bit 0 being set (PSTATE.SP == 1).
> + *
>   * When an exception is taken, most PSTATE fields are left unchanged in the
>   * handler. However, some are explicitly overridden (e.g. M[4:0]). Luckily 
> all
>   * of the inherited bits have the same position in the AArch64/AArch32 
> SPSR_ELx
> @@ -59,10 +43,35 @@ static u64 get_except_vector(struct kvm_vcpu *vcpu, enum 
> exception_type type)
>   * Here we manipulate the fields in order of the AArch64 SPSR_ELx layout, 
> from
>   * MSB to LSB.
>   */
> -static unsigned long get_except64_pstate(struct kvm_vcpu *vcpu)
> +static void enter_exception(struct kvm_vcpu *vcpu, unsigned long target_mode,
> + enum exception_type type)

Since this is all for an AArch64 target, could we keep `64` in the name,
e.g enter_exception64? That'd mirror the callers below.

>  {
> - unsigned long sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
> - unsigned long old, new;
> + unsigned long sctlr, vbar, old, new, mode;
> + u64 exc_offset;
> +
> + mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +
> + if  (mode == target_mode)
> + exc_offset = CURRENT_EL_SP_ELx_VECTOR;
> + else if ((mode | 1) == target_mode)
> + exc_offset = CURRENT_EL_SP_EL0_VECTOR;

It would be nice if we could add a mnemonic for the `1` here, e.g.
PSR_MODE_SP0 or PSR_MODE_THREAD_BIT.

> + else if (!(mode & PSR_MODE32_BIT))
> + exc_offset = LOWER_EL_AArch64_VECTOR;
> + else
> + exc_offset = LOWER_EL_AArch32_VECTOR;

Other than the above, I couldn't think of a nicer way of writing this,
and AFAICT this is correct.

> +
> + switch (target_mode) {
> + case PSR_MODE_EL1h:
> + vbar = vcpu_read_sys_reg(vcpu, VBAR_EL1);
> + sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
> + vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
> + break;
> + default:
> + /* Don't do that */
> + BUG();
> + }
> +
> + *vcpu_pc(vcpu) = vbar + exc_offset + type;
>  
>   old = *vcpu_cpsr(vcpu);
>   new = 0;
> @@ -105,9 +114,10 @@ static unsigned long get_except64_pstate(struct kvm_vcpu 
> *vcpu)
>   new |= PSR_I_BIT;
>   new |= PSR_F_BIT;
>  
> - new |= PSR_MODE_EL1h;
> + new |= target_mode;

As a heads-up, some of the other bits will need to change for an EL2
target (e.g. SPAN will depend on HCR_EL2.E2H), but as-is this this is
fine.

Regardless of the above comments:

Reviewed-by: Mark Rutland 

Mark.


Re: [PATCH kvmtool] rtc: Generate fdt node for the real-time clock

2020-05-14 Thread Mark Rutland
Hi Andre,

On Thu, May 14, 2020 at 10:45:53AM +0100, Andre Przywara wrote:
> On arm and arm64 we expose the Motorola RTC emulation to the guest,
> but never advertised this in the device tree.
> 
> EDK-2 seems to rely on this device, but on its hardcoded address. To
> make this more future-proof, add a DT node with the address in it.
> EDK-2 can then read the proper address from there, and we can change
> this address later (with the flexible memory layout).
> 
> Please note that an arm64 Linux kernel is not ready to use this device,
> there are some include files missing under arch/arm64 to compile the
> driver. I hacked this up in the kernel, just to verify this DT snippet
> is correct, but don't see much value in enabling this properly in
> Linux.
>
> Signed-off-by: Andre Przywara 

With EFI at least, the expectation is that the RTC is accessed via the
runtime EFI services. So as long as EFI knows about the RTC and the
kernel knows about EFI, the kernel can use the RTC that way. It would be
problematic were the kernel to mess with the RTC behind the back of EFI
or vice-versa, so it doesn't make sense to expose both views to the
kernel simultaneously.

I don't think it makes sense to expose this in the DT unless EFI were
also clearing this from the DT before handing it on to Linux. If we
have that, I think it'd be fine, but on its own this patch introduces a
potential problem that I think we should avoid.

Thanks,
Mark.

> ---
>  hw/rtc.c | 44 ++--
>  1 file changed, 38 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/rtc.c b/hw/rtc.c
> index c1fa72f2..5483879f 100644
> --- a/hw/rtc.c
> +++ b/hw/rtc.c
> @@ -130,24 +130,56 @@ static struct ioport_operations 
> cmos_ram_index_ioport_ops = {
>   .io_out = cmos_ram_index_out,
>  };
>  
> +#ifdef CONFIG_HAS_LIBFDT
> +static void generate_rtc_fdt_node(void *fdt,
> +   struct device_header *dev_hdr,
> +   void (*generate_irq_prop)(void *fdt,
> + u8 irq,
> + enum irq_type))
> +{
> + u64 reg_prop[2] = { cpu_to_fdt64(0x70), cpu_to_fdt64(2) };
> +
> + _FDT(fdt_begin_node(fdt, "rtc"));
> + _FDT(fdt_property_string(fdt, "compatible", "motorola,mc146818"));
> + _FDT(fdt_property(fdt, "reg", reg_prop, sizeof(reg_prop)));
> + _FDT(fdt_end_node(fdt));
> +}
> +#else
> +#define generate_rtc_fdt_node NULL
> +#endif
> +
> +struct device_header rtc_dev_hdr = {
> + .bus_type = DEVICE_BUS_IOPORT,
> + .data = generate_rtc_fdt_node,
> +};
> +
>  int rtc__init(struct kvm *kvm)
>  {
> - int r = 0;
> + int r;
> +
> + r = device__register(&rtc_dev_hdr);
> + if (r < 0)
> + return r;
>  
>   /* PORT 0070-007F - CMOS RAM/RTC (REAL TIME CLOCK) */
>   r = ioport__register(kvm, 0x0070, &cmos_ram_index_ioport_ops, 1, NULL);
>   if (r < 0)
> - return r;
> + goto out_device;
>  
>   r = ioport__register(kvm, 0x0071, &cmos_ram_data_ioport_ops, 1, NULL);
> - if (r < 0) {
> - ioport__unregister(kvm, 0x0071);
> - return r;
> - }
> + if (r < 0)
> + goto out_ioport;
>  
>   /* Set the VRT bit in Register D to indicate valid RAM and time */
>   rtc.cmos_data[RTC_REG_D] = RTC_REG_D_VRT;
>  
> + return r;
> +
> +out_ioport:
> + ioport__unregister(kvm, 0x0070);
> +out_device:
> + device__unregister(&rtc_dev_hdr);
> +
>   return r;
>  }
>  dev_init(rtc__init);
> -- 
> 2.17.1
> 


Re: [PATCH] arm64/cpufeature: Add ID_AA64MMFR0_PARANGE_MASK

2020-05-12 Thread Mark Rutland
On Tue, May 12, 2020 at 11:53:43AM +0100, Mark Rutland wrote:
> >
> > /* Clamp the IPA limit to the PA size supported by the kernel */
> > ipa_max = (pa_max > PHYS_MASK_SHIFT) ? PHYS_MASK_SHIFT : pa_max;
> > @@ -411,7 +411,8 @@ int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long 
> > type)
> > phys_shift = KVM_PHYS_SHIFT;
> > }
> >  
> > -   parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 7;
> > +   parange = id_aa64mmfr0_parange(read_sanitised_ftr_reg
> > +   (SYS_ID_AA64MMFR0_EL1));
> 
> Can't we add a system_ipa_range() helper, and avoid more boilerplate in
> each of these?
> 
> e.g.
> 
> int system_ipa_range(void)
> {
>   u64 mmfr0;
>   int parange;
> 
>   mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
>   parange = cpuid_feature_extract_unsigned_field(mmfr0,
>   ID_AA64MMFR0_PARANGE_SHIFT);
>   
>   return parange;
> }

As per MarcZ's comments, that should be system_pa_range() rather than
system_ipa_range().
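
i.e. (untested):

        int system_pa_range(void)
        {
                u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);

                return cpuid_feature_extract_unsigned_field(mmfr0,
                                ID_AA64MMFR0_PARANGE_SHIFT);
        }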

Mark.


Re: [PATCH] arm64/cpufeature: Add ID_AA64MMFR0_PARANGE_MASK

2020-05-12 Thread Mark Rutland
On Tue, May 12, 2020 at 07:43:26AM +0530, Anshuman Khandual wrote:
> This replaces multiple open encoding (0x7) with ID_AA64MMFR0_PARANGE_MASK
> thus cleaning the clutter. It modifies an existing ID_AA64MMFR0 helper and
> introduces a new one i.e id_aa64mmfr0_iparange() and id_aa64mmfr0_parange()
> respectively.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Marc Zyngier 
> Cc: James Morse 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> Cc: kvmarm@lists.cs.columbia.edu
> 
> Signed-off-by: Anshuman Khandual 
> ---
> This applies after (https://patchwork.kernel.org/patch/11541893/).
> 
>  arch/arm64/include/asm/cpufeature.h | 11 ++-
>  arch/arm64/kernel/cpufeature.c  |  5 ++---
>  arch/arm64/kvm/reset.c  |  9 +
>  3 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/cpufeature.h 
> b/arch/arm64/include/asm/cpufeature.h
> index 1291ad5a9ccb..320cfc5b6025 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -706,8 +706,17 @@ void arm64_set_ssbd_mitigation(bool state);
>  
>  extern int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
>  
> -static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
> +#define ID_AA64MMFR0_PARANGE_MASK 0x7

We already have ID_AA64MMFR0_PARANGE_SHIFT in , so if we
need this it should live there too.

The ARM ARM tells me ID_AA64MMFR0_EL1.PARange is bits 3:0, so this
should be 0xf.

Given it's a standard 4-bit field, do we even need this? We have helpers
that assume 4 bits for standard fields, e.g.
cpuid_feature_extract_unsigned_field().

> +
> +static inline u32 id_aa64mmfr0_parange(u64 mmfr0)
>  {
> + return mmfr0 & ID_AA64MMFR0_PARANGE_MASK;
> +}

return cpuid_feature_extract_unsigned_field(mmfr0,
ID_AA64MMFR0_PARANGE_SHIFT);

> +
> +static inline u32 id_aa64mmfr0_iparange(u64 mmfr0)
> +{
> + int parange = id_aa64mmfr0_parange(mmfr0);
> +
>   switch (parange) {
>   case 0: return 32;
>   case 1: return 36;
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 30917fe7942a..2c62f7c64a3c 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2185,7 +2185,7 @@ static void verify_sve_features(void)
>  void verify_hyp_capabilities(void)
>  {
>   u64 safe_mmfr1, mmfr0, mmfr1;
> - int parange, ipa_max;
> + int ipa_max;
>   unsigned int safe_vmid_bits, vmid_bits;
>  
>   safe_mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> @@ -2201,8 +2201,7 @@ void verify_hyp_capabilities(void)
>   }
>  
>   /* Verify IPA range */
> - parange = mmfr0 & 0x7;
> - ipa_max = id_aa64mmfr0_parange_to_phys_shift(parange);
> + ipa_max = id_aa64mmfr0_iparange(mmfr0);

Why drop id_aa64mmfr0_parange_to_phys_shift()?

>   if (ipa_max < get_kvm_ipa_limit()) {
>   pr_crit("CPU%d: IPA range mismatch\n", smp_processor_id());
>   cpu_die_early();
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 841b492ff334..2e4da75d79ea 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -347,10 +347,10 @@ u32 get_kvm_ipa_limit(void)
>  
>  void kvm_set_ipa_limit(void)
>  {
> - unsigned int ipa_max, pa_max, va_max, parange;
> + unsigned int ipa_max, pa_max, va_max;
>  
> - parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 0x7;
> - pa_max = id_aa64mmfr0_parange_to_phys_shift(parange);
> + pa_max = id_aa64mmfr0_iparange(read_sanitised_ftr_reg
> + (SYS_ID_AA64MMFR0_EL1));

Weird style here. the '(' should be kept next to the function name.

>
>   /* Clamp the IPA limit to the PA size supported by the kernel */
>   ipa_max = (pa_max > PHYS_MASK_SHIFT) ? PHYS_MASK_SHIFT : pa_max;
> @@ -411,7 +411,8 @@ int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long 
> type)
>   phys_shift = KVM_PHYS_SHIFT;
>   }
>  
> - parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 7;
> + parange = id_aa64mmfr0_parange(read_sanitised_ftr_reg
> + (SYS_ID_AA64MMFR0_EL1));

Can't we add a system_ipa_range() helper, and avoid more boilerplate in
each of these?

e.g.

int system_ipa_range(void)
{
u64 mmfr0;
int parange;

mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
parange = cpuid_feature_extract_unsigned_field(mmfr0,
ID_AA64MMFR0_PARANGE_SHIFT);

return parange;
}

... we do similar for the system_supports_xxx() helpers.

Thanks,
Mark.


Re: [PATCH V3 04/16] arm64/cpufeature: Introduce ID_PFR2 CPU register

2020-05-05 Thread Mark Rutland
On Tue, May 05, 2020 at 01:12:39PM +0100, Will Deacon wrote:
> On Tue, May 05, 2020 at 12:50:54PM +0100, Mark Rutland wrote:
> > On Tue, May 05, 2020 at 12:27:19PM +0100, Will Deacon wrote:
> > > On Tue, May 05, 2020 at 12:16:07PM +0100, Mark Rutland wrote:
> > > > On Tue, May 05, 2020 at 12:12:41PM +0100, Will Deacon wrote:
> > > > > On Sat, May 02, 2020 at 07:03:53PM +0530, Anshuman Khandual wrote:
> > > > > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> > > > > > index e5317a6367b6..c977449e02db 100644
> > > > > > --- a/arch/arm64/include/asm/sysreg.h
> > > > > > +++ b/arch/arm64/include/asm/sysreg.h
> > > > > > @@ -153,6 +153,7 @@
> > > > > >  #define SYS_MVFR0_EL1  sys_reg(3, 0, 0, 3, 0)
> > > > > >  #define SYS_MVFR1_EL1  sys_reg(3, 0, 0, 3, 1)
> > > > > >  #define SYS_MVFR2_EL1  sys_reg(3, 0, 0, 3, 2)
> > > > > > +#define SYS_ID_PFR2_EL1sys_reg(3, 0, 0, 3, 4)
> > > > > 
> > > > > nit: but please group these defines by name rather than encoding.
> > > > 
> > > > So far we've *always* grouped these by encoding in this file, so can we
> > > > keep things that way for now? Otherwise we're inconsistent with both
> > > > schemes.
> > > 
> > > Hmm, but it's really hard to read sorted that way and we'll end up with
> > > duplicate definitions like we had for some of the field offsets already.
> > 
> > I appreciate that, and don't disagree that the current scheme is not
> > obvious.
> > 
> > I just want to ensure that we don't make things less consistent, and if
> > we're going to change the scheme in order to make that easier, it should
> > be a separate patch. There'll be other changes like MMFR4_EL1, and we
> > should probably add a comment as to what the policy is either way (e.g.
> > if we're just grouping at the top level, or if that should be sorted
> > too).
> 
> Ok, I added a comment below.

Thanks!

Acked-by: Mark Rutland 

Mark.

> 
> Will
> 
> --->8
> 
> commit be7ab6a6cdb0a6d7b10883094c2adf96f5d4e1e8
> Author: Will Deacon 
> Date:   Tue May 5 13:08:02 2020 +0100
> 
> arm64: cpufeature: Group indexed system register definitions by name
> 
> Some system registers contain an index in the name (e.g. ID_MMFR<n>_EL1)
> and, while this index often follows the register encoding, newer additions
> to the architecture are necessarily tacked on the end. Sorting these
> registers by encoding therefore becomes a bit of a mess.
> 
> Group the indexed system register definitions by name so that it's easier
> to read and will hopefully reduce the chance of us accidentally introducing
> duplicate definitions in the future.
> 
> Signed-off-by: Will Deacon 
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 2dd3f4ca9780..194684301df0 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -105,6 +105,10 @@
>  #define SYS_DC_CSW   sys_insn(1, 0, 7, 10, 2)
>  #define SYS_DC_CISW  sys_insn(1, 0, 7, 14, 2)
>  
> +/*
> + * System registers, organised loosely by encoding but grouped together
> + * where the architected name contains an index. e.g. ID_MMFR<n>_EL1.
> + */
>  #define SYS_OSDTRRX_EL1  sys_reg(2, 0, 0, 0, 2)
>  #define SYS_MDCCINT_EL1  sys_reg(2, 0, 0, 2, 0)
>  #define SYS_MDSCR_EL1sys_reg(2, 0, 0, 2, 2)
> @@ -140,6 +144,7 @@
>  #define SYS_ID_MMFR1_EL1 sys_reg(3, 0, 0, 1, 5)
>  #define SYS_ID_MMFR2_EL1 sys_reg(3, 0, 0, 1, 6)
>  #define SYS_ID_MMFR3_EL1 sys_reg(3, 0, 0, 1, 7)
> +#define SYS_ID_MMFR4_EL1 sys_reg(3, 0, 0, 2, 6)
>  
>  #define SYS_ID_ISAR0_EL1 sys_reg(3, 0, 0, 2, 0)
>  #define SYS_ID_ISAR1_EL1 sys_reg(3, 0, 0, 2, 1)
> @@ -147,7 +152,6 @@
>  #define SYS_ID_ISAR3_EL1 sys_reg(3, 0, 0, 2, 3)
>  #define SYS_ID_ISAR4_EL1 sys_reg(3, 0, 0, 2, 4)
>  #define SYS_ID_ISAR5_EL1 sys_reg(3, 0, 0, 2, 5)
> -#define SYS_ID_MMFR4_EL1 sys_reg(3, 0, 0, 2, 6)
>  #define SYS_ID_ISAR6_EL1 sys_reg(3, 0, 0, 2, 7)
>  
>  #define SYS_MVFR0_EL1sys_reg(3, 0, 0, 3, 0)
> 

