Re: [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM

2020-07-27 Thread zhukeqian
Hi Marc,

On 2020/7/13 22:53, Marc Zyngier wrote:
> Hi Keqian,
> 
> On Mon, 13 Jul 2020 03:47:25 +0100,
> zhukeqian  wrote:
>>
>> Hi Marc,
>>
>> Sorry for the delayed reply.
>>
>> On 2020/7/6 15:54, Marc Zyngier wrote:
>>> Hi Keqian,
>>>
>>> On 2020-07-06 02:28, zhukeqian wrote:
 Hi Catalin and Marc,

 On 2020/7/2 21:55, Keqian Zhu wrote:
> This patch series adds support for dirty log based on HW DBM.
>
> It works well under some migration test cases, including VMs with 4K
> pages or 2M THP. I checked the SHA256 hash digest of all memory and
> it stays the same for the source VM and the destination VM, which means
> no dirty pages are missed under hardware DBM.
>
> Some key points:
>
> 1. Only support hardware updates of dirty status for PTEs. PMDs and PUDs
>are not involved for now.
>
> 2. About *performance*: In the RFC patch, I mentioned that for every 64GB of
>memory, KVM consumes about 40ms to scan all PTEs to collect the dirty log.
>This patch solves the problem in two ways: HW/SW dynamic switch
>and multi-core offload.
>
>HW/SW dynamic switch: Give userspace the right to enable/disable hw dirty
>log. This adds a new KVM cap named KVM_CAP_ARM_HW_DIRTY_LOG. We can
>achieve this by changing the kvm->arch.vtcr value and kicking vCPUs out to
>reload this value into VTCR_EL2. Then userspace can enable hw dirty log
>at the beginning and disable it when few dirty pages remain and the VM is
>about to stop, so VM downtime is not affected.
>
>Multi-core offload: Offloading the PT scanning workload to multiple cores can
>greatly reduce scanning time. To ensure we can complete in time, I use
>smp_call_function to realize this policy, which uses IPIs to dispatch the
>workload to other CPUs. On a 128-core Kunpeng 920 platform, it takes only
>about 5ms to scan the PTs of 256GB of RAM (using mempress, with almost all
>PTs established). And we dispatch the workload iteratively (each CPU only
>scans the PTs of 512M of RAM per iteration), so it won't seriously affect
>the physical CPUs.

 What do you think of these two methods to solve high-cost PT scanning?
 Maybe you are waiting for a PML-like feature on ARM :-), but in my test, DBM is
 usable after these two methods are applied.
>>>
>>> Useable, maybe. But leaving to userspace the decision to switch from one
>>> mode to another isn't an acceptable outcome. Userspace doesn't need nor
>>> want to know about this.
>>>
>> OK, maybe this is worth discussing. The switch logic can be
>> encapsulated in QEMU and not be visible to VM users. Well, I
>> think it may be acceptable. :)
> 
> I'm sorry, but no, this isn't acceptable. Userspace is concerned with
> finding out about the dirty pages, and nothing else. The method by
> which you expose which pages are dirty is not the business of
> userspace *at all*.
OK.
> 
>>
>>> Another thing is that sending IPIs all over to trigger scanning may
>>> work well on a system that runs a limited number of guests (or some
>>> other userspace, actually), but I seriously doubt that it is impact
>>> free once you start doing this on an otherwise loaded system.
>>>
>> Yes, it is not suitable to send IPIs to all other physical
>> CPUs. Currently I just want to show you my idea and to prove it is
>> effective. In a real cloud product, we have a resource isolation
>> mechanism, so we will have a somewhat worse result (compared to 5ms) but
>> we won't affect other VMs.
> 
> If you limit the IPIs to the CPUs running the VM, then you are already
> in a situation where you effectively stop the guest to parse the S2
> page tables.
> 
>>
>>> You may have better results by having an alternative mapping of your
>>> S2 page tables so that they are accessible linearly, which would
>>> sidestep the PT parsing altogether, probably saving some cycles. But
>> Yeah, this is a good idea. But to my understanding, to make them
>> linear, we have to reserve enough physical memory at VM start (which may
>> waste much memory), and the effect of this optimization *may* not be
>> obvious.
> 
> Well, you'd have to have another set of PT to map the S2
> linearly. Yes, it consumes memory.
> 
>>
>>> this is still a marginal gain compared to the overall overhead of
>>> scanning 4kB of memory per 2MB of guest RAM, as opposed to 64 *bytes*
>>> per 2MB (assuming strict 4kB mappings at S2, no block mappings).
>>>
>> I once tested scanning PTs by reading only one byte of each PTE and
>> the test result stays the same.  So, when we scan PTs using just one
>> core, the bottleneck is CPU speed, not memory bandwidth.
> 
> But you are still reading the leaf entries of the PTs, hence defeating
> any sort of prefetch that the CPU could do for you. And my claim is
> that reading the bitmap is much faster than parsing the PTs. Are you
> saying that this isn't the case?
I am confused here. MMU DBM just updates the S2AP[1] 
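(For context, a rough sketch of the per-PTE dirty-log scan discussed above.
This is an illustration only: stage2_get_pte(), kvm_s2pte_dbm() and
kvm_s2pte_writable() are placeholder names, not code from the posted series.)

/* Illustrative walk of stage-2 leaf PTEs: a DBM-managed entry that the
 * hardware has made writable (S2AP[1] set) is treated as dirty. */
static void scan_stage2_dirty_log(struct kvm *kvm, gfn_t start, gfn_t end)
{
    gfn_t gfn;

    for (gfn = start; gfn < end; gfn++) {
        pte_t *ptep = stage2_get_pte(kvm, gfn << PAGE_SHIFT);  /* placeholder helper */

        if (!ptep || !pte_valid(*ptep))
            continue;

        if (kvm_s2pte_dbm(*ptep) && kvm_s2pte_writable(*ptep)) /* placeholder tests */
            mark_page_dirty(kvm, gfn);                         /* standard KVM helper */
    }
}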

RE: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC

2020-07-27 Thread Jianyong Wu



> -Original Message-
> From: Will Deacon 
> Sent: Monday, July 27, 2020 7:38 PM
> To: Jianyong Wu 
> Cc: net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org;
> t...@linutronix.de; pbonz...@redhat.com; sean.j.christopher...@intel.com;
> m...@kernel.org; richardcoch...@gmail.com; Mark Rutland
> ; Suzuki Poulose ;
> Steven Price ; linux-ker...@vger.kernel.org; linux-
> arm-ker...@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> k...@vger.kernel.org; Steve Capper ; Kaly Xin
> ; Justin He ; Wei Chen
> ; nd 
> Subject: Re: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests
> via SMCCC
> 
> On Mon, Jul 27, 2020 at 03:45:37AM +, Jianyong Wu wrote:
> > > From: Will Deacon 
> > >
> > > We can advertise ourselves to guests as KVM and provide a basic
> > > features bitmap for discoverability of future hypervisor services.
> > >
> > > Cc: Marc Zyngier 
> > > Signed-off-by: Will Deacon 
> > > Signed-off-by: Jianyong Wu 
> > > ---
> > >  arch/arm64/kvm/hypercalls.c | 29 +++--
> > >  1 file changed, 19 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/hypercalls.c
> > > b/arch/arm64/kvm/hypercalls.c index 550dfa3e53cd..db6dce3d0e23
> > > 100644
> > > --- a/arch/arm64/kvm/hypercalls.c
> > > +++ b/arch/arm64/kvm/hypercalls.c
> > > @@ -12,13 +12,13 @@
> > >  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)  {
> > >   u32 func_id = smccc_get_function(vcpu);
> > > - long val = SMCCC_RET_NOT_SUPPORTED;
> > > + u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
> >
> > There is a risk, as this u32 value is returned here but a u64 value
> > will be read in the guest. For example, val[0] is initialized to -1,
> > i.e. 0xffffffff, so the guest gets 0x00000000ffffffff, which is then
> > compared with -1 as 0xffffffffffffffff. The same problem also exists for
> > the transfer of addresses in u64 type. So the following assignment to "val"
> > should be split into two
> > u32 values and assigned to val[0] and val[1] respectively.
> > WDYT?
> 
> Yes, I think you're right that this is a bug, but isn't the solution just
> to make that an array of 'long'?
> 
>   long val [4];
> 
> That will sign-extend the negative error codes as required, while leaving the
> explicitly unsigned UID constants alone.

Ok, that's much better. I will fix it at next version.
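For illustration, a standalone user-space sketch of the truncation issue
(not kernel code; it just shows why u32 loses the sign of the error codes):

#include <stdint.h>
#include <stdio.h>

#define SMCCC_RET_NOT_SUPPORTED (-1)

int main(void)
{
    uint32_t val_u32 = SMCCC_RET_NOT_SUPPORTED; /* stored as 0xffffffff */
    long val_long = SMCCC_RET_NOT_SUPPORTED;    /* stored as all-ones on 64-bit */

    /* The guest reads the result from a 64-bit register (x0). */
    uint64_t x0_from_u32 = val_u32;   /* zero-extended: 0x00000000ffffffff */
    uint64_t x0_from_long = val_long; /* sign-extended: 0xffffffffffffffff */

    printf("u32:  x0 = %#llx, == -1? %d\n",
           (unsigned long long)x0_from_u32, x0_from_u32 == (uint64_t)-1);
    printf("long: x0 = %#llx, == -1? %d\n",
           (unsigned long long)x0_from_long, x0_from_long == (uint64_t)-1);
    return 0;
}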

By the way, I wonder when you will update this patch set. I see people like me
adopting this patch set as a code base who need to rebase it every time, so I am
looking forward to your update.

Thanks
Jianyong 
> 
> Will


Re: [PATCH v3 3/3] arm64: Add workaround for Arm Cortex-A77 erratum 1508412

2020-07-27 Thread Andrew Scull
Hi Rob, a couple of suggestions for the way this erratum is gated, but I
haven't delved into the details of the erratum itself.

On Fri, Jul 17, 2020 at 02:52:33PM -0600, Rob Herring wrote:
> On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load
> and a store exclusive or PAR_EL1 read can cause a deadlock.
> 
> The workaround requires a DMB SY before and after a PAR_EL1 register read.
> A deadlock is still possible with the workaround as KVM guests must also
> have the workaround. IOW, a malicious guest can deadlock an affected
> system.
> 
> This workaround also depends on a firmware counterpart to enable the h/w
> to insert DMB SY after load and store exclusive instructions. See the
> errata document SDEN-1152370 v10 [1] for more information.
> 
> [1] 
> https://static.docs.arm.com/101992/0010/Arm_Cortex_A77_MP074_Software_Developer_Errata_Notice_v10.pdf
> 
> Cc: Catalin Marinas 
> Cc: James Morse 
> Cc: Suzuki K Poulose 
> Cc: Will Deacon 
> Cc: Marc Zyngier 
> Cc: Julien Thierry 
> Cc: kvmarm@lists.cs.columbia.edu
> Signed-off-by: Rob Herring 
> ---
> v3:
> - Add dmbs around PAR reads in KVM code
> - Clean-up 'work-around' and 'errata'
> 
> v2:
> - Don't disable KVM, just print warning
> ---
>  Documentation/arm64/silicon-errata.rst |  2 ++
>  arch/arm64/Kconfig | 19 +++
>  arch/arm64/include/asm/cpucaps.h   |  3 ++-
>  arch/arm64/include/asm/kvm_hyp.h   | 11 +++
>  arch/arm64/kernel/cpu_errata.c | 10 ++
>  arch/arm64/kvm/arm.c   |  3 ++-
>  arch/arm64/kvm/hyp/switch.c|  7 ---
>  arch/arm64/kvm/hyp/sysreg-sr.c |  2 +-
>  arch/arm64/kvm/sys_regs.c  |  8 +++-
>  arch/arm64/mm/fault.c  | 10 ++
>  10 files changed, 68 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.rst 
> b/Documentation/arm64/silicon-errata.rst
> index 936cf2a59ca4..716b279e3b33 100644
> --- a/Documentation/arm64/silicon-errata.rst
> +++ b/Documentation/arm64/silicon-errata.rst
> @@ -90,6 +90,8 @@ stable kernels.
>  ++-+-+-+
>  | ARM| Cortex-A76  | #1463225| ARM64_ERRATUM_1463225   |
>  ++-+-+-+
> +| ARM| Cortex-A77  | #1508412| ARM64_ERRATUM_1508412   |
> +++-+-+-+
>  | ARM| Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040   |
>  ++-+-+-+
>  | ARM| Neoverse-N1 | #1349291| N/A   |
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a4a094bedcb2..6638444ce0d8 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -626,6 +626,25 @@ config ARM64_ERRATUM_1542419
>  
> If unsure, say Y.
>  
> +config ARM64_ERRATUM_1508412
> + bool "Cortex-A77: 1508412: workaround deadlock on sequence of NC/Device 
> load and store exclusive or PAR read"
> + default y
> + help
> +   This option adds a workaround for Arm Cortex-A77 erratum 1508412.
> +
> +   Affected Cortex-A77 cores (r0p0, r1p0) could deadlock on a sequence
> +   of a store-exclusive or read of PAR_EL1 and a load with device or
> +   non-cacheable memory attributes. The workaround depends on a firmware
> +   counterpart.
> +
> +   KVM guests must also have the workaround implemented or they can
> +   deadlock the system.
> +
> +   Work around the issue by inserting DMB SY barriers around PAR_EL1
> +   register reads and warning KVM users.
> +
> +   If unsure, say Y.
> +
>  config CAVIUM_ERRATUM_22375
>   bool "Cavium erratum 22375, 24313"
>   default y
> diff --git a/arch/arm64/include/asm/cpucaps.h 
> b/arch/arm64/include/asm/cpucaps.h
> index d7b3bb0cb180..2a2cdb4ced8b 100644
> --- a/arch/arm64/include/asm/cpucaps.h
> +++ b/arch/arm64/include/asm/cpucaps.h
> @@ -62,7 +62,8 @@
>  #define ARM64_HAS_GENERIC_AUTH   52
>  #define ARM64_HAS_32BIT_EL1  53
>  #define ARM64_BTI54
> +#define ARM64_WORKAROUND_1508412 55
>  
> -#define ARM64_NCAPS  55
> +#define ARM64_NCAPS  56
>  
>  #endif /* __ASM_CPUCAPS_H */
> diff --git a/arch/arm64/include/asm/kvm_hyp.h 
> b/arch/arm64/include/asm/kvm_hyp.h
> index ce3080834bfa..ce5b0d9b12bf 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -46,6 +46,17 @@
>  #define read_sysreg_el2(r)   read_sysreg_elx(r, _EL2, _EL1)
>  #define write_sysreg_el2(v,r)write_sysreg_elx(v, r, _EL2, _EL1)
>  
> +static inline u64 __hyp_text read_sysreg_par(void)
> +{
> + u64 par;
> 
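For reference, a helper like the read_sysreg_par() above would typically
bracket the PAR_EL1 read with the DMB SY barriers described in the commit
message. A sketch of that shape (an illustration based on the workaround
description, not necessarily the exact code in the patch):

static inline u64 __hyp_text read_sysreg_par(void)
{
    u64 par;

    /* Patch in a DMB SY before and after the PAR_EL1 read on affected cores. */
    asm(ALTERNATIVE("nop", "dmb sy", ARM64_WORKAROUND_1508412));
    par = read_sysreg(par_el1);
    asm(ALTERNATIVE("nop", "dmb sy", ARM64_WORKAROUND_1508412));

    return par;
}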

Re: [PATCH 0/5] KVM: arm64: pvtime: Fixes and a new cap

2020-07-27 Thread Marc Zyngier

On 2020-07-11 11:04, Andrew Jones wrote:

The first three patches in the series are fixes that come from testing
and reviewing pvtime code while writing the QEMU support (I'll reply
to this mail with a link to the QEMU patches after posting - which I'll
do shortly). The last patch is only a convenience for userspace, and I
wouldn't be heartbroken if it wasn't deemed worth it. The QEMU patches
I'll be posting are currently written without the cap. However, if the
cap is accepted, then I'll change the QEMU code to use it.

Thanks,
drew

Andrew Jones (5):
  KVM: arm64: pvtime: steal-time is only supported when configured
  KVM: arm64: pvtime: Fix potential loss of stolen time
  KVM: arm64: pvtime: Fix stolen time accounting across migration
  KVM: Documentation minor fixups
  arm64/x86: KVM: Introduce steal-time cap

 Documentation/virt/kvm/api.rst| 20 
 arch/arm64/include/asm/kvm_host.h |  2 +-
 arch/arm64/kvm/arm.c  |  3 +++
 arch/arm64/kvm/pvtime.c   | 31 +++
 arch/x86/kvm/x86.c|  3 +++
 include/linux/kvm_host.h  | 19 +++
 include/uapi/linux/kvm.h  |  1 +
 7 files changed, 58 insertions(+), 21 deletions(-)


Hi Andrew,

Sorry about the time it took to get to this series.
Although I had a number of comments, they are all easy to
address, and you will hopefully be able to respin it quickly
(assuming we agree that patch #1 is unnecessary).

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH 5/5] arm64/x86: KVM: Introduce steal-time cap

2020-07-27 Thread Marc Zyngier

On 2020-07-11 11:04, Andrew Jones wrote:

arm64 requires a vcpu fd (KVM_HAS_DEVICE_ATTR vcpu ioctl) to probe
support for steal-time. However this is unnecessary, as only a KVM
fd is required, and it complicates userspace (userspace may prefer
delaying vcpu creation until after feature probing). Introduce a cap
that can be checked instead. While x86 can already probe steal-time
support with a kvm fd (KVM_GET_SUPPORTED_CPUID), we add the cap there
too for consistency.

Reviewed-by: Steven Price 
Signed-off-by: Andrew Jones 
---
 Documentation/virt/kvm/api.rst| 11 +++
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/arm.c  |  3 +++
 arch/arm64/kvm/pvtime.c   |  2 +-
 arch/x86/kvm/x86.c|  3 +++
 include/uapi/linux/kvm.h  |  1 +
 6 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst 
b/Documentation/virt/kvm/api.rst

index 3bd96c1a3962..3716d1e29771 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6152,3 +6152,14 @@ KVM can therefore start protected VMs.
 This capability governs the KVM_S390_PV_COMMAND ioctl and the
 KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
 guests when the state change is invalid.
+
+8.24 KVM_CAP_STEAL_TIME
+---
+
+:Architectures: arm64, x86
+
+This capability indicates that KVM supports steal time accounting.
+When steal time accounting is supported it may be enabled with
+architecture-specific interfaces.  For x86 see
+Documentation/virt/kvm/msr.rst "MSR_KVM_STEAL_TIME".  For arm64 see
+Documentation/virt/kvm/devices/vcpu.rst "KVM_ARM_VCPU_PVTIME_CTRL".


Maybe add something that indicates that KVM_CAP_STEAL_TIME
and the architecture specific ioctl aren't allowed to disagree
about the support?


diff --git a/arch/arm64/include/asm/kvm_host.h
b/arch/arm64/include/asm/kvm_host.h
index b01f52b61572..8909c840ea91 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -500,6 +500,7 @@ long kvm_hypercall_pv_features(struct kvm_vcpu 
*vcpu);

 gpa_t kvm_init_stolen_time(struct kvm_vcpu *vcpu);
 void kvm_update_stolen_time(struct kvm_vcpu *vcpu);

+bool kvm_arm_pvtime_supported(void);
 int kvm_arm_pvtime_set_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
 int kvm_arm_pvtime_get_attr(struct kvm_vcpu *vcpu,
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 90cb90561446..0f7f8cd2f341 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -222,6 +222,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
long ext)

 */
r = 1;
break;
+   case KVM_CAP_STEAL_TIME:
+   r = kvm_arm_pvtime_supported();
+   break;
default:
r = kvm_arch_vm_ioctl_check_extension(kvm, ext);
break;
diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c
index 025b5f3a97ef..468bd1ef9c7e 100644
--- a/arch/arm64/kvm/pvtime.c
+++ b/arch/arm64/kvm/pvtime.c
@@ -73,7 +73,7 @@ gpa_t kvm_init_stolen_time(struct kvm_vcpu *vcpu)
return base;
 }

-static bool kvm_arm_pvtime_supported(void)
+bool kvm_arm_pvtime_supported(void)
 {
return !!sched_info_on();
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 88c593f83b28..27fc86bb28c4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3538,6 +3538,9 @@ int kvm_vm_ioctl_check_extension(struct kvm
*kvm, long ext)
case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
r = kvm_x86_ops.nested_ops->enable_evmcs != NULL;
break;
+   case KVM_CAP_STEAL_TIME:
+   r = sched_info_on();
+   break;
default:
break;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4fdf30316582..121fb29ac004 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_SECURE_GUEST 181
 #define KVM_CAP_HALT_POLL 182
 #define KVM_CAP_ASYNC_PF_INT 183
+#define KVM_CAP_STEAL_TIME 184

 #ifdef KVM_CAP_IRQ_ROUTING


Otherwise looks ok.
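
As an aside, with this cap userspace can probe steal-time support using only a
KVM/VM fd, roughly like this (an illustrative snippet, not part of the series;
error handling omitted):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
    int kvm_fd = open("/dev/kvm", O_RDWR);
    int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);

    /* No vcpu fd needed: KVM_CHECK_EXTENSION works on the VM (or /dev/kvm) fd. */
    int has_steal_time = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_STEAL_TIME);

    printf("steal-time %ssupported\n", has_steal_time > 0 ? "" : "not ");
    return 0;
}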

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH 4/5] KVM: Documentation minor fixups

2020-07-27 Thread Marc Zyngier

On 2020-07-11 11:04, Andrew Jones wrote:

Reviewed-by: Steven Price 
Signed-off-by: Andrew Jones 


It'd be good to have an actual commit message.


---
 Documentation/virt/kvm/api.rst | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst 
b/Documentation/virt/kvm/api.rst

index 320788f81a05..3bd96c1a3962 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6122,7 +6122,7 @@ HvCallSendSyntheticClusterIpi,
HvCallSendSyntheticClusterIpiEx.
 8.21 KVM_CAP_HYPERV_DIRECT_TLBFLUSH
 ---

-:Architecture: x86
+:Architectures: x86

 This capability indicates that KVM running on top of Hyper-V 
hypervisor

 enables Direct TLB flush for its guests meaning that TLB flush
@@ -6135,16 +6135,17 @@ in CPUID and only exposes Hyper-V
identification. In this case, guest
 thinks it's running on Hyper-V and only use Hyper-V hypercalls.

 8.22 KVM_CAP_S390_VCPU_RESETS
+-

-Architectures: s390
+:Architectures: s390

 This capability indicates that the KVM_S390_NORMAL_RESET and
 KVM_S390_CLEAR_RESET ioctls are available.

 8.23 KVM_CAP_S390_PROTECTED
+---

-Architecture: s390
-
+:Architectures: s390

 This capability indicates that the Ultravisor has been initialized and
 KVM can therefore start protected VMs.


But this seems to be an otherwise unrelated patch.
I'm happy to take it, but it seems odd here.

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH 3/5] KVM: arm64: pvtime: Fix stolen time accounting across migration

2020-07-27 Thread Marc Zyngier

On 2020-07-11 11:04, Andrew Jones wrote:

When updating the stolen time we should always read the current
stolen time from the user provided memory, not from a kernel
cache. If we use a cache then we'll end up resetting stolen time
to zero on the first update after migration.

Signed-off-by: Andrew Jones 
---
 arch/arm64/include/asm/kvm_host.h |  1 -
 arch/arm64/kvm/pvtime.c   | 23 +--
 include/linux/kvm_host.h  | 19 +++
 3 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h
b/arch/arm64/include/asm/kvm_host.h
index c3e6fcc664b1..b01f52b61572 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -343,7 +343,6 @@ struct kvm_vcpu_arch {

/* Guest PV state */
struct {
-   u64 steal;
u64 last_steal;
gpa_t base;
} steal;
diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c
index db5ef097a166..025b5f3a97ef 100644
--- a/arch/arm64/kvm/pvtime.c
+++ b/arch/arm64/kvm/pvtime.c
@@ -13,26 +13,22 @@
 void kvm_update_stolen_time(struct kvm_vcpu *vcpu)
 {
struct kvm *kvm = vcpu->kvm;
+   u64 base = vcpu->arch.steal.base;
u64 last_steal = vcpu->arch.steal.last_steal;
-   u64 steal;
-   __le64 steal_le;
-   u64 offset;
+   u64 offset = offsetof(struct pvclock_vcpu_stolen_time, stolen_time);
+   u64 steal = 0;
int idx;
-   u64 base = vcpu->arch.steal.base;

if (base == GPA_INVALID)
return;

-   /* Let's do the local bookkeeping */
-   steal = vcpu->arch.steal.steal;
-   vcpu->arch.steal.last_steal = current->sched_info.run_delay;
-   steal += vcpu->arch.steal.last_steal - last_steal;
-   vcpu->arch.steal.steal = steal;
-
-   steal_le = cpu_to_le64(steal);
idx = srcu_read_lock(&kvm->srcu);
-   offset = offsetof(struct pvclock_vcpu_stolen_time, stolen_time);
-   kvm_put_guest(kvm, base + offset, steal_le, u64);
+   if (!kvm_get_guest(kvm, base + offset, steal, u64)) {
+   steal = le64_to_cpu(steal);
+   vcpu->arch.steal.last_steal = current->sched_info.run_delay;
+   steal += vcpu->arch.steal.last_steal - last_steal;
+   kvm_put_guest(kvm, base + offset, cpu_to_le64(steal), u64);
+   }
srcu_read_unlock(&kvm->srcu, idx);
 }

@@ -68,7 +64,6 @@ gpa_t kvm_init_stolen_time(struct kvm_vcpu *vcpu)
 * Start counting stolen time from the time the guest requests
 * the feature enabled.
 */
-   vcpu->arch.steal.steal = 0;
vcpu->arch.steal.last_steal = current->sched_info.run_delay;

idx = srcu_read_lock(&kvm->srcu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d564855243d8..e2fc655f0b5b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -749,6 +749,25 @@ int kvm_write_guest_offset_cached(struct kvm
*kvm, struct gfn_to_hva_cache *ghc,
 int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache 
*ghc,

  gpa_t gpa, unsigned long len);

+#define __kvm_get_guest(kvm, gfn, offset, x, type) \
+({ \
+   unsigned long __addr = gfn_to_hva(kvm, gfn);\
+   type __user *__uaddr = (type __user *)(__addr + offset);\


Passing the type around is pretty ugly. Can't you use something like:

typeof(x) __user *__uaddr = (typeof(__uaddr))(__addr + offset);

which would avoid passing this type around? kvm_put_guest could
use the same treatment.

Yes, it forces the caller to rigorously type the inputs to the
macro. But they should do that anyway.
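
For illustration, the get_guest macros rewritten along those lines might look
roughly like this (a sketch of the suggestion, not a finished patch):

#define __kvm_get_guest(kvm, gfn, offset, x)				\
({									\
	unsigned long __addr = gfn_to_hva(kvm, gfn);			\
	typeof(x) __user *__uaddr = (typeof(__uaddr))(__addr + offset);	\
	int __ret = -EFAULT;						\
									\
	if (!kvm_is_error_hva(__addr))					\
		__ret = get_user(x, __uaddr);				\
	__ret;								\
})

#define kvm_get_guest(kvm, gpa, x)					\
({									\
	gpa_t __gpa = gpa;						\
	struct kvm *__kvm = kvm;					\
	__kvm_get_guest(__kvm, __gpa >> PAGE_SHIFT,			\
			offset_in_page(__gpa), x);			\
})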


+   int __ret = -EFAULT;\
+   \
+   if (!kvm_is_error_hva(__addr))  \
+   __ret = get_user(x, __uaddr);   \
+   __ret;  \
+})
+
+#define kvm_get_guest(kvm, gpa, x, type)   \
+({ \
+   gpa_t __gpa = gpa;  \
+   struct kvm *__kvm = kvm;\
+   __kvm_get_guest(__kvm, __gpa >> PAGE_SHIFT,   \
+   offset_in_page(__gpa), x, type);\
+})
+
 #define __kvm_put_guest(kvm, gfn, offset, value, type) \
 ({ \
unsigned long __addr = gfn_to_hva(kvm, gfn);\


Thanks,

M.
--
Jazz is not dead. It just smells funny...

Re: [PATCH 2/5] KVM: arm64: pvtime: Fix potential loss of stolen time

2020-07-27 Thread Marc Zyngier

On 2020-07-11 11:04, Andrew Jones wrote:

We should only check current->sched_info.run_delay once when
updating stolen time. Otherwise there's a chance there could
be a change between checks that we miss (preemption disabling
comes after vcpu request checks).

Signed-off-by: Andrew Jones 
---
 arch/arm64/kvm/pvtime.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c
index 2b22214909be..db5ef097a166 100644
--- a/arch/arm64/kvm/pvtime.c
+++ b/arch/arm64/kvm/pvtime.c
@@ -13,6 +13,7 @@
 void kvm_update_stolen_time(struct kvm_vcpu *vcpu)
 {
struct kvm *kvm = vcpu->kvm;
+   u64 last_steal = vcpu->arch.steal.last_steal;
u64 steal;
__le64 steal_le;
u64 offset;
@@ -24,8 +25,8 @@ void kvm_update_stolen_time(struct kvm_vcpu *vcpu)

/* Let's do the local bookkeeping */
steal = vcpu->arch.steal.steal;
-   steal += current->sched_info.run_delay - vcpu->arch.steal.last_steal;
vcpu->arch.steal.last_steal = current->sched_info.run_delay;
+   steal += vcpu->arch.steal.last_steal - last_steal;
vcpu->arch.steal.steal = steal;

steal_le = cpu_to_le64(steal);


Unless you read current->sched_info.run_delay using READ_ONCE,
there is no guarantee that this will do what you want. The
compiler is free to rejig this anyway it wants.
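
For illustration, something along these lines (a sketch of the suggestion;
the function name is only for the example):

/* Snapshot run_delay exactly once so both uses observe the same value. */
static void update_steal_sketch(struct kvm_vcpu *vcpu, u64 *steal)
{
	u64 last_steal = vcpu->arch.steal.last_steal;
	u64 run_delay = READ_ONCE(current->sched_info.run_delay);

	vcpu->arch.steal.last_steal = run_delay;
	*steal += run_delay - last_steal;
}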

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH 1/5] KVM: arm64: pvtime: steal-time is only supported when configured

2020-07-27 Thread Marc Zyngier

Hi Andrew,

On 2020-07-11 11:04, Andrew Jones wrote:

Don't confuse the guest by saying steal-time is supported when
it hasn't been configured by userspace and won't work.

Signed-off-by: Andrew Jones 
---
 arch/arm64/kvm/pvtime.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c
index f7b52ce1557e..2b22214909be 100644
--- a/arch/arm64/kvm/pvtime.c
+++ b/arch/arm64/kvm/pvtime.c
@@ -42,9 +42,12 @@ long kvm_hypercall_pv_features(struct kvm_vcpu 
*vcpu)


switch (feature) {
case ARM_SMCCC_HV_PV_TIME_FEATURES:
-   case ARM_SMCCC_HV_PV_TIME_ST:
val = SMCCC_RET_SUCCESS;
break;
+   case ARM_SMCCC_HV_PV_TIME_ST:
+   if (vcpu->arch.steal.base != GPA_INVALID)
+   val = SMCCC_RET_SUCCESS;
+   break;
}

return val;


I'm not so sure about this. I have always considered the
discovery interface to be "do you know about this SMCCC
function". And if you look at the spec, it says (4.2,
PV_TIME_FEATURES):


If PV_call_id identifies PV_TIME_FEATURES, this call returns:
• NOT_SUPPORTED (-1) to indicate that all
paravirtualized time functions in this specification are not
supported.
• SUCCESS (0) to indicate that all the paravirtualized time
functions in this specification are supported.


So the way I understand it, you cannot return "supported"
for PV_TIME_FEATURES, and yet return NOT_SUPPORTED for
PV_TIME_ST. It applies to *all* features.

Yes, this is very bizarre. But I don't think we can deviate
from it.

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [kvm-unit-tests PATCH] arm64: Compile with -mno-outline-atomics for GCC >= 10

2020-07-27 Thread Paolo Bonzini
On 27/07/20 14:30, Andrew Jones wrote:
>>> Looks much better than my version. Do you want me to spin a v2 or do you 
>>> want to
>>> send it as a separate patch? If that's the case, I tested the same way I 
>>> did my
>>> patch (gcc 10.1.0 and 5.4.0 for cross-compiling, 9.3.0 native):
>>>
>>> Tested-by: Alexandru Elisei 
>> Gentle ping regarding this.
>>
> Hi Alexandru,
> 
> I was on vacation all last week and have been digging myself out of email
> today. I'll send this as a proper patch with your T-b later today or
> tomorrow.

Same here; will wait for Andrew's patch.

Paolo



Re: [RESEND RFC PATCH v1] arm64: kvm: flush tlbs by range in unmap_stage2_range function

2020-07-27 Thread Marc Zyngier

Zhenyu,

On 2020-07-27 15:51, Zhenyu Ye wrote:

Hi Marc,

On 2020/7/26 1:40, Marc Zyngier wrote:

On 2020-07-24 14:43, Zhenyu Ye wrote:

Now in unmap_stage2_range(), we flush TLBs one by one just after the
corresponding pages are cleared.  However, this may cause some performance
problems when the unmap range is very large (such as when the VM
migration rolls back, this may cause VM downtime that is too long).


You keep resending this patch, but you don't give any numbers
that would back your assertion.


I have tested the downtime of vm migration rollback on arm64, and found
the downtime could even take up to 7s.  Then I traced the cost of
unmap_stage2_range() and found it could take a maximum of 1.2s.  The
vm configuration is as follows (with high memory pressure, the dirty
rate is about 500MB/s):

  [VM XML configuration snippet; only the values 192 and 48 remain, the tags were lost]

This means nothing to me, I'm afraid.



After this patch applied, the cost of unmap_stage2_range() can reduce 
to

16ms, and VM downtime can be less than 1s.

The following figure shows a clear comparison:

              | vm downtime  |  cost of unmap_stage2_range()
--------------+--------------+------------------------------
before change | 7s           |  1200 ms
after change  | 1s           |    16 ms
--------------+--------------+------------------------------


I don't see how you turn a 1.184s reduction into a 6s gain.
Surely there is more to it than what you posted.


+
+    if ((end - start) >= 512 << (PAGE_SHIFT - 12)) {
+    __tlbi(vmalls12e1is);


And what is this magic value based on? You don't even mention in the
commit log that you are taking this shortcut.




If the number of pages is bigger than 512, flush all TLBs of this VM to avoid
soft lockups on large TLB flushing ranges, just like what
flush_tlb_range() does.


I'm not sure this is applicable here, and it doesn't mean
this is as good on other systems.

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [RESEND RFC PATCH v1] arm64: kvm: flush tlbs by range in unmap_stage2_range function

2020-07-27 Thread Zhenyu Ye
Hi Marc,

On 2020/7/26 1:40, Marc Zyngier wrote:
> On 2020-07-24 14:43, Zhenyu Ye wrote:
>> Now in unmap_stage2_range(), we flush TLBs one by one just after the
>> corresponding pages are cleared.  However, this may cause some performance
>> problems when the unmap range is very large (such as when the VM
>> migration rolls back, this may cause VM downtime that is too long).
> 
> You keep resending this patch, but you don't give any numbers
> that would back your assertion.

I have tested the downtime of vm migration rollback on arm64, and found
the downtime could even take up to 7s.  Then I traced the cost of
unmap_stage2_range() and found it could take a maximum of 1.2s.  The
vm configuration is as follows (with high memory pressure, the dirty
rate is about 500MB/s):

  [VM XML configuration snippet; only the values 192 and 48 remain, the tags were lost]

After this patch applied, the cost of unmap_stage2_range() can reduce to
16ms, and VM downtime can be less than 1s.

The following figure shows a clear comparison:

              | vm downtime  |  cost of unmap_stage2_range()
--------------+--------------+------------------------------
before change | 7s           |  1200 ms
after change  | 1s           |    16 ms
--------------+--------------+------------------------------

>> +
>> +    if ((end - start) >= 512 << (PAGE_SHIFT - 12)) {
>> +    __tlbi(vmalls12e1is);
> 
> And what is this magic value based on? You don't even mention in the
> commit log that you are taking this shortcut.
> 


If the number of pages is bigger than 512, flush all TLBs of this VM to avoid
soft lockups on large TLB flushing ranges, just like what
flush_tlb_range() does.
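
For illustration, the intended shape is roughly the following (a sketch, not
the posted patch; the threshold and the wrapper name are illustrative):

/* Invalidate by IPA for small ranges; fall back to a full VMID flush for
 * large ranges to avoid soft lockups in long-running invalidation loops. */
static void stage2_flush_unmap_range(struct kvm *kvm, phys_addr_t start, phys_addr_t end)
{
	phys_addr_t addr;

	if ((end - start) >= (512UL * PAGE_SIZE)) {
		kvm_flush_remote_tlbs(kvm);		/* full stage-2 invalidation */
		return;
	}

	for (addr = start; addr < end; addr += PAGE_SIZE)
		kvm_tlb_flush_vmid_ipa(kvm, addr);	/* per-IPA invalidation */
}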


Thanks,
Zhenyu



Re: [kvm-unit-tests PATCH] arm64: Compile with -mno-outline-atomics for GCC >= 10

2020-07-27 Thread Alexandru Elisei
Hi Drew,

On 7/27/20 1:30 PM, Andrew Jones wrote:
> On Mon, Jul 27, 2020 at 01:21:03PM +0100, Alexandru Elisei wrote:
>> Hi Drew,
>>
>> On 7/18/20 2:50 PM, Alexandru Elisei wrote:
>>> Hi,
>>>
>>> On 7/18/20 10:11 AM, Andrew Jones wrote:
 On Fri, Jul 17, 2020 at 05:47:27PM +0100, Alexandru Elisei wrote:
> GCC 10.1.0 introduced the -m{,no-}outline-atomics flags which, according 
> to
> man 1 gcc:
>
> "Enable or disable calls to out-of-line helpers to implement atomic
> operations.  These helpers will, at runtime, determine if the LSE
> instructions from ARMv8.1-A can be used; if not, they will use the
> load/store-exclusive instructions that are present in the base ARMv8.0 
> ISA.
> [..] This option is on by default."
>
> Unfortunately the option causes the following error at compile time:
>
> aarch64-linux-gnu-ld -nostdlib -pie -n -o arm/spinlock-test.elf -T 
> /path/to/kvm-unit-tests/arm/flat.lds \
>   arm/spinlock-test.o arm/cstart64.o lib/libcflat.a lib/libfdt/libfdt.a 
> /usr/lib/gcc/aarch64-linux-gnu/10.1.0/libgcc.a lib/arm/libeabi.a 
> arm/spinlock-test.aux.o
> aarch64-linux-gnu-ld: 
> /usr/lib/gcc/aarch64-linux-gnu/10.1.0/libgcc.a(lse-init.o): in function 
> `init_have_lse_atomics':
> lse-init.c:(.text.startup+0xc): undefined reference to `__getauxval'
>
> This is happening because we are linking against our own libcflat which
> doesn't implement the function __getauxval().
>
> Disable the use of the out-of-line functions by compiling with
> -mno-outline-atomics if we detect a GCC version greater than 10.
>
> Signed-off-by: Alexandru Elisei 
> ---
>
> Tested with gcc versions 10.1.0 and 5.4.0 (cross-compilation), 9.3.0
> (native).
>
> I've been able to suss out the reason for the build failure from this
> rejected gcc patch [1].
>
> [1] https://patches.openembedded.org/patch/172460/
>
>  arm/Makefile.arm64 | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
> index dfd0c56fe8fb..3223cb966789 100644
> --- a/arm/Makefile.arm64
> +++ b/arm/Makefile.arm64
> @@ -9,6 +9,12 @@ ldarch = elf64-littleaarch64
>  arch_LDFLAGS = -pie -n
>  CFLAGS += -mstrict-align
>  
> +# The -mno-outline-atomics flag is only valid for GCC versions 10 and 
> greater.
> +GCC_MAJOR_VERSION=$(shell $(CC) -dumpversion 2> /dev/null | cut -f1 -d.)
> +ifeq ($(shell expr "$(GCC_MAJOR_VERSION)" ">=" "10"), 1)
> +CFLAGS += -mno-outline-atomics
> +endif
 How about this patch instead?

 diff --git a/Makefile b/Makefile
 index 3ff2f91600f6..0e21a49096ba 100644
 --- a/Makefile
 +++ b/Makefile
 @@ -17,6 +17,11 @@ DESTDIR := $(PREFIX)/share/kvm-unit-tests/
  
  .PHONY: arch_clean clean distclean cscope
  
 +# cc-option
 +# Usage: OP_CFLAGS+=$(call cc-option, -falign-functions=0, 
 -malign-functions=0)
 +cc-option = $(shell if $(CC) -Werror $(1) -S -o /dev/null -xc /dev/null \
 +  > /dev/null 2>&1; then echo "$(1)"; else echo "$(2)"; fi ;)
 +
  #make sure env CFLAGS variable is not used
  CFLAGS =
  
 @@ -43,12 +48,6 @@ OBJDIRS += $(LIBFDT_objdir)
  #include architecture specific make rules
  include $(SRCDIR)/$(TEST_DIR)/Makefile
  
 -# cc-option
 -# Usage: OP_CFLAGS+=$(call cc-option, -falign-functions=0, 
 -malign-functions=0)
 -
 -cc-option = $(shell if $(CC) -Werror $(1) -S -o /dev/null -xc /dev/null \
 -  > /dev/null 2>&1; then echo "$(1)"; else echo "$(2)"; fi ;)
 -
  COMMON_CFLAGS += -g $(autodepend-flags) -fno-strict-aliasing -fno-common
  COMMON_CFLAGS += -Wall -Wwrite-strings -Wempty-body -Wuninitialized
  COMMON_CFLAGS += -Wignored-qualifiers -Werror
 diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
 index dfd0c56fe8fb..dbc7524d3070 100644
 --- a/arm/Makefile.arm64
 +++ b/arm/Makefile.arm64
 @@ -9,6 +9,9 @@ ldarch = elf64-littleaarch64
  arch_LDFLAGS = -pie -n
  CFLAGS += -mstrict-align
  
 +mno_outline_atomics := $(call cc-option, -mno-outline-atomics, "")
 +CFLAGS += $(mno_outline_atomics)
 +
  define arch_elf_check =
$(if $(shell ! $(OBJDUMP) -R $(1) >&/dev/null && echo "nok"),
$(error $(shell $(OBJDUMP) -R $(1) 2>&1)))


 Thanks,
 drew
>>> Looks much better than my version. Do you want me to spin a v2 or do you 
>>> want to
>>> send it as a separate patch? If that's the case, I tested the same way I 
>>> did my
>>> patch (gcc 10.1.0 and 5.4.0 for cross-compiling, 9.3.0 native):
>>>
>>> Tested-by: Alexandru Elisei 
>> Gentle ping regarding this.
>>
> Hi Alexandru,
>
> I was on vacation all last week and have been digging myself out of email
> today. I'll send this as a proper patch with your T-b later today or

Re: [kvm-unit-tests PATCH] arm64: Compile with -mno-outline-atomics for GCC >= 10

2020-07-27 Thread Andrew Jones
On Mon, Jul 27, 2020 at 01:21:03PM +0100, Alexandru Elisei wrote:
> Hi Drew,
> 
> On 7/18/20 2:50 PM, Alexandru Elisei wrote:
> > Hi,
> >
> > On 7/18/20 10:11 AM, Andrew Jones wrote:
> >> On Fri, Jul 17, 2020 at 05:47:27PM +0100, Alexandru Elisei wrote:
> >>> GCC 10.1.0 introduced the -m{,no-}outline-atomics flags which, according 
> >>> to
> >>> man 1 gcc:
> >>>
> >>> "Enable or disable calls to out-of-line helpers to implement atomic
> >>> operations.  These helpers will, at runtime, determine if the LSE
> >>> instructions from ARMv8.1-A can be used; if not, they will use the
> >>> load/store-exclusive instructions that are present in the base ARMv8.0 
> >>> ISA.
> >>> [..] This option is on by default."
> >>>
> >>> Unfortunately the option causes the following error at compile time:
> >>>
> >>> aarch64-linux-gnu-ld -nostdlib -pie -n -o arm/spinlock-test.elf -T 
> >>> /path/to/kvm-unit-tests/arm/flat.lds \
> >>>   arm/spinlock-test.o arm/cstart64.o lib/libcflat.a lib/libfdt/libfdt.a 
> >>> /usr/lib/gcc/aarch64-linux-gnu/10.1.0/libgcc.a lib/arm/libeabi.a 
> >>> arm/spinlock-test.aux.o
> >>> aarch64-linux-gnu-ld: 
> >>> /usr/lib/gcc/aarch64-linux-gnu/10.1.0/libgcc.a(lse-init.o): in function 
> >>> `init_have_lse_atomics':
> >>> lse-init.c:(.text.startup+0xc): undefined reference to `__getauxval'
> >>>
> >>> This is happening because we are linking against our own libcflat which
> >>> doesn't implement the function __getauxval().
> >>>
> >>> Disable the use of the out-of-line functions by compiling with
> >>> -mno-outline-atomics if we detect a GCC version greater than 10.
> >>>
> >>> Signed-off-by: Alexandru Elisei 
> >>> ---
> >>>
> >>> Tested with gcc versions 10.1.0 and 5.4.0 (cross-compilation), 9.3.0
> >>> (native).
> >>>
> >>> I've been able to suss out the reason for the build failure from this
> >>> rejected gcc patch [1].
> >>>
> >>> [1] https://patches.openembedded.org/patch/172460/
> >>>
> >>>  arm/Makefile.arm64 | 6 ++
> >>>  1 file changed, 6 insertions(+)
> >>>
> >>> diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
> >>> index dfd0c56fe8fb..3223cb966789 100644
> >>> --- a/arm/Makefile.arm64
> >>> +++ b/arm/Makefile.arm64
> >>> @@ -9,6 +9,12 @@ ldarch = elf64-littleaarch64
> >>>  arch_LDFLAGS = -pie -n
> >>>  CFLAGS += -mstrict-align
> >>>  
> >>> +# The -mno-outline-atomics flag is only valid for GCC versions 10 and 
> >>> greater.
> >>> +GCC_MAJOR_VERSION=$(shell $(CC) -dumpversion 2> /dev/null | cut -f1 -d.)
> >>> +ifeq ($(shell expr "$(GCC_MAJOR_VERSION)" ">=" "10"), 1)
> >>> +CFLAGS += -mno-outline-atomics
> >>> +endif
> >> How about this patch instead?
> >>
> >> diff --git a/Makefile b/Makefile
> >> index 3ff2f91600f6..0e21a49096ba 100644
> >> --- a/Makefile
> >> +++ b/Makefile
> >> @@ -17,6 +17,11 @@ DESTDIR := $(PREFIX)/share/kvm-unit-tests/
> >>  
> >>  .PHONY: arch_clean clean distclean cscope
> >>  
> >> +# cc-option
> >> +# Usage: OP_CFLAGS+=$(call cc-option, -falign-functions=0, 
> >> -malign-functions=0)
> >> +cc-option = $(shell if $(CC) -Werror $(1) -S -o /dev/null -xc /dev/null \
> >> +  > /dev/null 2>&1; then echo "$(1)"; else echo "$(2)"; fi ;)
> >> +
> >>  #make sure env CFLAGS variable is not used
> >>  CFLAGS =
> >>  
> >> @@ -43,12 +48,6 @@ OBJDIRS += $(LIBFDT_objdir)
> >>  #include architecture specific make rules
> >>  include $(SRCDIR)/$(TEST_DIR)/Makefile
> >>  
> >> -# cc-option
> >> -# Usage: OP_CFLAGS+=$(call cc-option, -falign-functions=0, 
> >> -malign-functions=0)
> >> -
> >> -cc-option = $(shell if $(CC) -Werror $(1) -S -o /dev/null -xc /dev/null \
> >> -  > /dev/null 2>&1; then echo "$(1)"; else echo "$(2)"; fi ;)
> >> -
> >>  COMMON_CFLAGS += -g $(autodepend-flags) -fno-strict-aliasing -fno-common
> >>  COMMON_CFLAGS += -Wall -Wwrite-strings -Wempty-body -Wuninitialized
> >>  COMMON_CFLAGS += -Wignored-qualifiers -Werror
> >> diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
> >> index dfd0c56fe8fb..dbc7524d3070 100644
> >> --- a/arm/Makefile.arm64
> >> +++ b/arm/Makefile.arm64
> >> @@ -9,6 +9,9 @@ ldarch = elf64-littleaarch64
> >>  arch_LDFLAGS = -pie -n
> >>  CFLAGS += -mstrict-align
> >>  
> >> +mno_outline_atomics := $(call cc-option, -mno-outline-atomics, "")
> >> +CFLAGS += $(mno_outline_atomics)
> >> +
> >>  define arch_elf_check =
> >>$(if $(shell ! $(OBJDUMP) -R $(1) >&/dev/null && echo "nok"),
> >>$(error $(shell $(OBJDUMP) -R $(1) 2>&1)))
> >>
> >>
> >> Thanks,
> >> drew
> > Looks much better than my version. Do you want me to spin a v2 or do you 
> > want to
> > send it as a separate patch? If that's the case, I tested the same way I 
> > did my
> > patch (gcc 10.1.0 and 5.4.0 for cross-compiling, 9.3.0 native):
> >
> > Tested-by: Alexandru Elisei 
> 
> Gentle ping regarding this.
>

Hi Alexandru,

I was on vacation all last week and have been digging myself out of email
today. I'll send this as a proper patch with your T-b later today or
tomorrow.

Thanks,
drew


Re: [kvm-unit-tests PATCH] arm64: Compile with -mno-outline-atomics for GCC >= 10

2020-07-27 Thread Alexandru Elisei
Hi Drew,

On 7/18/20 2:50 PM, Alexandru Elisei wrote:
> Hi,
>
> On 7/18/20 10:11 AM, Andrew Jones wrote:
>> On Fri, Jul 17, 2020 at 05:47:27PM +0100, Alexandru Elisei wrote:
>>> GCC 10.1.0 introduced the -m{,no-}outline-atomics flags which, according to
>>> man 1 gcc:
>>>
>>> "Enable or disable calls to out-of-line helpers to implement atomic
>>> operations.  These helpers will, at runtime, determine if the LSE
>>> instructions from ARMv8.1-A can be used; if not, they will use the
>>> load/store-exclusive instructions that are present in the base ARMv8.0 ISA.
>>> [..] This option is on by default."
>>>
>>> Unfortunately the option causes the following error at compile time:
>>>
>>> aarch64-linux-gnu-ld -nostdlib -pie -n -o arm/spinlock-test.elf -T 
>>> /path/to/kvm-unit-tests/arm/flat.lds \
>>> arm/spinlock-test.o arm/cstart64.o lib/libcflat.a lib/libfdt/libfdt.a 
>>> /usr/lib/gcc/aarch64-linux-gnu/10.1.0/libgcc.a lib/arm/libeabi.a 
>>> arm/spinlock-test.aux.o
>>> aarch64-linux-gnu-ld: 
>>> /usr/lib/gcc/aarch64-linux-gnu/10.1.0/libgcc.a(lse-init.o): in function 
>>> `init_have_lse_atomics':
>>> lse-init.c:(.text.startup+0xc): undefined reference to `__getauxval'
>>>
>>> This is happening because we are linking against our own libcflat which
>>> doesn't implement the function __getauxval().
>>>
>>> Disable the use of the out-of-line functions by compiling with
>>> -mno-outline-atomics if we detect a GCC version greater than 10.
>>>
>>> Signed-off-by: Alexandru Elisei 
>>> ---
>>>
>>> Tested with gcc versions 10.1.0 and 5.4.0 (cross-compilation), 9.3.0
>>> (native).
>>>
>>> I've been able to suss out the reason for the build failure from this
>>> rejected gcc patch [1].
>>>
>>> [1] https://patches.openembedded.org/patch/172460/
>>>
>>>  arm/Makefile.arm64 | 6 ++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
>>> index dfd0c56fe8fb..3223cb966789 100644
>>> --- a/arm/Makefile.arm64
>>> +++ b/arm/Makefile.arm64
>>> @@ -9,6 +9,12 @@ ldarch = elf64-littleaarch64
>>>  arch_LDFLAGS = -pie -n
>>>  CFLAGS += -mstrict-align
>>>  
>>> +# The -mno-outline-atomics flag is only valid for GCC versions 10 and 
>>> greater.
>>> +GCC_MAJOR_VERSION=$(shell $(CC) -dumpversion 2> /dev/null | cut -f1 -d.)
>>> +ifeq ($(shell expr "$(GCC_MAJOR_VERSION)" ">=" "10"), 1)
>>> +CFLAGS += -mno-outline-atomics
>>> +endif
>> How about this patch instead?
>>
>> diff --git a/Makefile b/Makefile
>> index 3ff2f91600f6..0e21a49096ba 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -17,6 +17,11 @@ DESTDIR := $(PREFIX)/share/kvm-unit-tests/
>>  
>>  .PHONY: arch_clean clean distclean cscope
>>  
>> +# cc-option
>> +# Usage: OP_CFLAGS+=$(call cc-option, -falign-functions=0, 
>> -malign-functions=0)
>> +cc-option = $(shell if $(CC) -Werror $(1) -S -o /dev/null -xc /dev/null \
>> +  > /dev/null 2>&1; then echo "$(1)"; else echo "$(2)"; fi ;)
>> +
>>  #make sure env CFLAGS variable is not used
>>  CFLAGS =
>>  
>> @@ -43,12 +48,6 @@ OBJDIRS += $(LIBFDT_objdir)
>>  #include architecture specific make rules
>>  include $(SRCDIR)/$(TEST_DIR)/Makefile
>>  
>> -# cc-option
>> -# Usage: OP_CFLAGS+=$(call cc-option, -falign-functions=0, 
>> -malign-functions=0)
>> -
>> -cc-option = $(shell if $(CC) -Werror $(1) -S -o /dev/null -xc /dev/null \
>> -  > /dev/null 2>&1; then echo "$(1)"; else echo "$(2)"; fi ;)
>> -
>>  COMMON_CFLAGS += -g $(autodepend-flags) -fno-strict-aliasing -fno-common
>>  COMMON_CFLAGS += -Wall -Wwrite-strings -Wempty-body -Wuninitialized
>>  COMMON_CFLAGS += -Wignored-qualifiers -Werror
>> diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
>> index dfd0c56fe8fb..dbc7524d3070 100644
>> --- a/arm/Makefile.arm64
>> +++ b/arm/Makefile.arm64
>> @@ -9,6 +9,9 @@ ldarch = elf64-littleaarch64
>>  arch_LDFLAGS = -pie -n
>>  CFLAGS += -mstrict-align
>>  
>> +mno_outline_atomics := $(call cc-option, -mno-outline-atomics, "")
>> +CFLAGS += $(mno_outline_atomics)
>> +
>>  define arch_elf_check =
>>  $(if $(shell ! $(OBJDUMP) -R $(1) >&/dev/null && echo "nok"),
>>  $(error $(shell $(OBJDUMP) -R $(1) 2>&1)))
>>
>>
>> Thanks,
>> drew
> Looks much better than my version. Do you want me to spin a v2 or do you want 
> to
> send it as a separate patch? If that's the case, I tested the same way I did 
> my
> patch (gcc 10.1.0 and 5.4.0 for cross-compiling, 9.3.0 native):
>
> Tested-by: Alexandru Elisei 

Gentle ping regarding this.

Thanks,
Alex


Re: [PATCH v13 2/9] arm/arm64: KVM: Advertise KVM UID to guests via SMCCC

2020-07-27 Thread Will Deacon
On Mon, Jul 27, 2020 at 03:45:37AM +, Jianyong Wu wrote:
> > From: Will Deacon 
> > 
> > We can advertise ourselves to guests as KVM and provide a basic features
> > bitmap for discoverability of future hypervisor services.
> > 
> > Cc: Marc Zyngier 
> > Signed-off-by: Will Deacon 
> > Signed-off-by: Jianyong Wu 
> > ---
> >  arch/arm64/kvm/hypercalls.c | 29 +++--
> >  1 file changed, 19 insertions(+), 10 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > index 550dfa3e53cd..db6dce3d0e23 100644
> > --- a/arch/arm64/kvm/hypercalls.c
> > +++ b/arch/arm64/kvm/hypercalls.c
> > @@ -12,13 +12,13 @@
> >  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)  {
> > u32 func_id = smccc_get_function(vcpu);
> > -   long val = SMCCC_RET_NOT_SUPPORTED;
> > +   u32 val[4] = {SMCCC_RET_NOT_SUPPORTED};
> 
> There is a risk, as this u32 value is returned here but a u64 value will be
> read in the guest. For example, val[0] is initialized to -1, i.e.
> 0xffffffff, so the guest gets 0x00000000ffffffff, which is then compared
> with -1 as 0xffffffffffffffff. The same problem also exists for the transfer
> of addresses in u64 type. So the following assignment to "val" should be
> split into two u32 values and assigned to val[0] and val[1] respectively.
> WDYT?

Yes, I think you're right that this is a bug, but isn't the solution just
to make that an array of 'long'?

long val [4];

That will sign-extend the negative error codes as required, while leaving
the explicitly unsigned UID constants alone.

Will


Re: kvm-unit-tests: Question about the "no interrupt when timer is disabled" case

2020-07-27 Thread Alexandru Elisei
Hi Zenghui,

On 7/25/20 5:50 AM, Zenghui Yu wrote:
> Hi Alex,
>
> [+cc Nianyao]
>
> On 2020/7/24 19:08, Alexandru Elisei wrote:
>> Hi Zenghui,
>>
>> I don't believe this issue can be triggered by a Linux guest. Details below.
>>
>> On 7/23/20 9:56 AM, Zenghui Yu wrote:
>>> Hi Alexandru,
>>>
>>> I've noticed that the timer case will fail in the -stable 4.19 kernel.
>>> The log is as follows:
>>>
>>> FAIL: vtimer-busy-loop: no interrupt when timer is disabled
>>> FAIL: vtimer-busy-loop: interrupt signal no longer pending
>>>
>>> And it's because the related fix [16e604a437c8, "KVM: arm/arm64: vgic:
>>> Reevaluate level sensitive interrupts on enable"] hasn't been backported
>>> to the stable tree.
>>
>> This is not an actual fix (hence no "Fixes" tag), this is more like an 
>> improvement
>> of the behaviour of the GIC. Like the patch description says, this can 
>> happen even
>> on hardware if the GIC hasn't sampled the device interrupt state (or the 
>> device
>> itself hasn't updated it) before the CPU re-enables the interrupt.
>
> Fair enough.
>
>>>
>>> Just out of curiosity, _without_ this fix, had you actually seen the
>>> guest getting into trouble due to an un-retired level-sensitive
>>> interrupt and your patch fixed it? Or this was found by code inspection?
>>
>> This issue was found when running kvm-unit-tests on the model.
>>
>>>
>>> Take the exact vtimer case as an example, is it possible that the Linux
>>> guest would disable the vtimer (the input interrupt line is driven to 0
>>> but the old KVM doesn't take this into account) and potentially hit this
>>> issue? I'm not familiar with it.
>>
>> To trigger this, a guest has to do the following steps:
>>
>> 1. Disable the timer interrupt at the Redistributor level.
>> 2. Trigger the timer interrupt in the timer.
>> 3. Disable the timer entirely (CNT{P,V}_CTL_EL0.ENABLE = 0), which also 
>> disables
>> the timer interrupt.
>> 4. Enable the timer interrupt at the Redistributor level.
>>
>> I believe there are two reasons why this will never happen for a Linux guest:
>>
>> - This isn't the way Linux handles interrupts. Furthermore, I don't believe 
>> Linux
>> will ever disable a specific interrupt at the irqchip level.
>
> This can at least happen in arch_timer_stop() [arm_arch_timer.c], where
> the disable_percpu_irq() API will disable the interrupt (via irq_mask()
> callback which will in turn disable the interrupt at GIC level by
> programming the ICENABLER0).

Sorry, I missed that. Did a grep for arch_timer_stop(), the function is called
only when a CPU is offlined.

>
> What I'm worried about is something like:
>
> 1. Disable the timer interrupt (at RDist level by poking the ICENABLER0,
>    or at CPU level by masking PSTATE.I)

A CPU masking interrupts with PSTATE.I will not trigger the spurious interrupt.
KVM doesn't care if the guest is masking interrupts when it fills the LR
registers, and the interrupt state is updated for level-sensitive hardware
interrupts that are in the LR registers on a guest exit. As long as KVM keeps 
the
interrupt in a LR register, the interrupt state will be updated at guest exit
(writing to I{C,S}ENABLER0 triggers one).

>
>   [ timer interrupt is made pending, and remains pending in (v)GIC. ]
>
> 2. Disable the timer
> 3. Enable the timer interrupt (at RDist level by poking the ISENABLER0,
>    or at CPU level by unmasking PSTATE.I)
>
>   [ The interrupt is forwarded to (v)CPU, and we end-up re-enabling the
>     timer inside the timer IRQ handler, which may not be as expected. ]
>
> I'm just not sure if this will be a possible scenario in the Linux, and
> is it harmful if this would happen.

I'm not familiar with hotplug, but I assume that a CPU is offlined with a
PSCI_CPU_OFF call. KVM reset the vcpu when it's brought online again, and that
means updating the timer state in kvm_timer_vcpu_load() -> 
kvm_timer_vcpu_load_gic().

I've been thinking about it, and getting spurious timer interrupts can happen in
another, more common case, when the GIC retires the physical interrupt between
guest exit (when the interrupt state is sampled by KVM) and guest entry. There's
nothing that KVM can do in this case.

>
>> - The timer IRQ handler checks the ISTATUS flag in the timer control register
>> before handling the interrupt. The flag is unset if the timer is disabled.
>
> This actually doesn't match the spec. Arm ARM D13.8.25 has a description
> about the ISTATUS field as below,
>
> "When the value of the ENABLE bit is 0, the ISTATUS field is UNKNOWN."

My bad, when I looked at the Arm ARM I missed that part.

>
> I guess what Nianyao had posted [*] may address the concern above...
>
> [*]
> http://lore.kernel.org/r/1595584037-6877-1-git-send-email-zhangshao...@hisilicon.com/

I've read Marc's explanation, and I tend to agree. Spurious interrupts can 
happen
on real hardware too, and there's really nothing that KVM can do about it.

Thanks,

Alex

>
>>
>> I hope my explanation made sense, please chime in if I missed somethi

Re: [PATCH 0/5] KVM: arm64: pvtime: Fixes and a new cap

2020-07-27 Thread Marc Zyngier

On 2020-07-27 12:02, Andrew Jones wrote:

Hi Marc,

Ping?


On it! ;-)

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH 0/5] KVM: arm64: pvtime: Fixes and a new cap

2020-07-27 Thread Andrew Jones


Hi Marc,

Ping?

Thanks,
drew


On Sat, Jul 11, 2020 at 12:04:29PM +0200, Andrew Jones wrote:
> The first three patches in the series are fixes that come from testing
> and reviewing pvtime code while writing the QEMU support (I'll reply
> to this mail with a link to the QEMU patches after posting - which I'll
> do shortly). The last patch is only a convenience for userspace, and I
> wouldn't be heartbroken if it wasn't deemed worth it. The QEMU patches
> I'll be posting are currently written without the cap. However, if the
> cap is accepted, then I'll change the QEMU code to use it.
> 
> Thanks,
> drew
> 
> Andrew Jones (5):
>   KVM: arm64: pvtime: steal-time is only supported when configured
>   KVM: arm64: pvtime: Fix potential loss of stolen time
>   KVM: arm64: pvtime: Fix stolen time accounting across migration
>   KVM: Documentation minor fixups
>   arm64/x86: KVM: Introduce steal-time cap
> 
>  Documentation/virt/kvm/api.rst| 20 
>  arch/arm64/include/asm/kvm_host.h |  2 +-
>  arch/arm64/kvm/arm.c  |  3 +++
>  arch/arm64/kvm/pvtime.c   | 31 +++
>  arch/x86/kvm/x86.c|  3 +++
>  include/linux/kvm_host.h  | 19 +++
>  include/uapi/linux/kvm.h  |  1 +
>  7 files changed, 58 insertions(+), 21 deletions(-)
> 
> -- 
> 2.25.4
> 



Re: [PATCH 0/9] arm64: Stolen time support

2020-07-27 Thread Steven Price

On 21/07/2020 04:26, zhukeqian wrote:

Hi Steven,


Hi Keqian,


On 2019/8/2 22:50, Steven Price wrote:

This series add support for paravirtualized time for arm64 guests and
KVM hosts following the specification in Arm's document DEN 0057A:

https://developer.arm.com/docs/den0057/a

It implements support for stolen time, allowing the guest to
identify time when it is forcibly not executing.

It doesn't implement support for Live Physical Time (LPT) as there are
some concerns about the overheads and approach in the above

Do you plan to pick up LPT support? There is demand for cross-frequency
migration (from older platforms to newer platforms).


I don't have any plans to pick up the LPT support at the moment - feel 
free to pick it up! ;)



I am not clear about the overheads and the problem with the approach here;
could you please give some more detailed information? Maybe we can work
together to resolve these concerns. :-)


Fundamentally the issue here is that LPT only solves one small part of 
migration between different hosts. To successfully migrate between hosts 
with different CPU implementations it is also necessary to be able to 
virtualise various ID registers (e.g. MIDR_EL1, REVIDR_EL1, AIDR_EL1) 
which we have no support for currently.


The problem with just virtualising the registers is how you handle 
errata. The guest will currently use those (and other) ID registers to 
decide whether to enable specific errata workarounds. But what errata 
should be enabled for a guest which might migrate to another host?


What we ideally need is a mechanism to communicate to the guest what 
workarounds are required to successfully run on any of the hosts that 
the guest may be migrated to. You may also have the situation where the 
workarounds required for two hosts are mutually incompatible - something 
needs to understand this and do the "right thing" (most likely just 
reject this situation, i.e. prevent the migration).


There are various options here: e.g. a para-virtualised interface to 
describe the workarounds (but this is hard to do in an OS-agnostic way), 
or virtual-ID registers describing an idealised environment where no 
workarounds are required (and only hosts that have no errata affecting a 
guest would be able to provide this).


Given the above complexity and the fact that Armv8.6-A standardises the 
frequency to 1GHz this didn't seem worth continuing with. So LPT was 
dropped from the spec and patches to avoid holding up the stolen time 
support.


However, if you have a use case which doesn't require such a generic 
migration (e.g. perhaps old and new platforms are based on the same IP) 
then it might be worth looking at bringing this back. But to make the 
problem solvable it either needs to be restricted to platforms which are 
substantially the same (so the errata list will be identical), or there is 
preparatory work to be done to handle migrating a guest successfully 
between hosts with potentially different errata requirements.


Can you share more details about the hosts that you are interested in 
migrating between?


Thanks,

Steve


Re: [PATCH 0/3] KVM: arm64: Clean up memcache usage for page-table pages

2020-07-27 Thread Will Deacon
On Mon, Jul 27, 2020 at 09:45:39AM +0100, Marc Zyngier wrote:
> On 2020-07-23 12:02, Will Deacon wrote:
> > Here are some small cleanups I made to the memcache logic while hacking
> > on the page-table code. The ioremap() behaviour looks like a bug to me,
> > although it's just a performance thing so nothing urgent.
> > 
> > Cheers,
> > 
> > Will
> > 
> > --->8
> > 
> > Will Deacon (3):
> >   KVM: arm64: Don't free memcache pages in kvm_phys_addr_ioremap()
> >   KVM: arm64: Simplify mmu_topup_memory_cache()
> >   KVM: arm64: Remove mmu_free_memory_cache()
> > 
> >  arch/arm64/kvm/mmu.c | 34 ++
> >  1 file changed, 14 insertions(+), 20 deletions(-)
> 
> Although I'm OK with this series, it conflicts with the changes
> Sean did on the MMU memory cache in the core code, which also
> affects arm64.
> 
> I guess I'll queue patch 1 and 3 as fixes post -rc1. Patch 2 doesn't
> seem to make sense anymore in that context.

Cheers, that sounds good to me. None of this is urgent.

Will


Re: [PATCH 5/7] KVM: arm64: Move 'invalid syndrome' logic out of io_mem_abort()

2020-07-27 Thread Will Deacon
On Sun, Jul 26, 2020 at 12:55:16PM +0100, Marc Zyngier wrote:
> On Fri, 24 Jul 2020 15:35:04 +0100,
> Will Deacon  wrote:
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 73e62d360a36..adb933ecd177 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -2046,6 +2046,20 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> > kvm_set_pfn_accessed(pfn);
> >  }
> >  
> > +static int handle_error_invalid_dabt(struct kvm_vcpu *vcpu, struct kvm_run *run,
> 
> Nit: why the "_error_"? There isn't any error here, only an awkward
> part of the architecture. I'd rather see something like
> handle_nisv_dabt(), which matches what this function actually does.

I chose "_error_" because this handles the case when kvm_is_error_hva() is
true (but I agree that "error" is misleading in both cases).
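
For context, the logic being hoisted out is roughly the !ISV handling that
io_mem_abort() has today (quoting from memory, so treat the exact names as
approximate):

	if (!kvm_vcpu_dabt_isvalid(vcpu)) {
		if (vcpu->kvm->arch.return_nisv_io_abort_to_user) {
			/* Userspace opted in via KVM_CAP_ARM_NISV_TO_USER. */
			run->exit_reason = KVM_EXIT_ARM_NISV;
			run->arm_nisv.esr_iss = kvm_vcpu_dabt_iss_nisv_sanitized(vcpu);
			run->arm_nisv.fault_ipa = fault_ipa;
			return 0;
		}

		kvm_pr_unimpl("Data abort outside memslots with no valid syndrome info\n");
		return -ENOSYS;
	}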

Will


Re: [PATCH 1/7] KVM: arm64: Update comment when skipping guest MMIO access instruction

2020-07-27 Thread Will Deacon
On Sun, Jul 26, 2020 at 12:08:28PM +0100, Marc Zyngier wrote:
> On Fri, 24 Jul 2020 15:35:00 +0100,
> Will Deacon  wrote:
> > 
> > If a 32-bit guest accesses MMIO using a 16-bit Thumb-2 instruction that
> > is reported to the hypervisor without a valid syndrome (for example,
> > because of the addressing mode), then we may hand off the fault to
> > userspace. When resuming the guest, we unconditionally advance the PC
> > by 4 bytes, since ESR_EL2.IL is always 1 for data aborts generated without
> > a valid syndrome. This is a bit rubbish, but it's also difficult to see
> > how we can fix it without potentially introducing regressions in userspace
> > MMIO fault handling.
> 
> Not quite, see below.
> 
> > 
> > Update the comment when skipping a guest MMIO access instruction so that
> > this corner case is at least written down.
> > 
> > Cc: Marc Zyngier 
> > Cc: Quentin Perret 
> > Signed-off-by: Will Deacon 
> > ---
> >  arch/arm64/kvm/mmio.c | 7 +++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
> > index 4e0366759726..b54ea5aa6c06 100644
> > --- a/arch/arm64/kvm/mmio.c
> > +++ b/arch/arm64/kvm/mmio.c
> > @@ -113,6 +113,13 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > /*
> >  * The MMIO instruction is emulated and should not be re-executed
> >  * in the guest.
> > +*
> > +* Note: If user space handled the emulation because the abort
> > +* symdrome information was not valid (ISV set in the ESR), then
> 
> nits: syndrome, ISV *clear*.

Duh, thanks.

> > +* this will assume that the faulting instruction was 32-bit.
> > +* If the faulting instruction was a 16-bit Thumb instruction,
> > +* then userspace would need to rewind the PC by 2 bytes prior to
> > +* resuming the vCPU (yuck!).
> >  */
> > kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> >  
> 
> That's not how I read it. On ESR_EL2.ISV being clear, and in the
> absence of KVM_CAP_ARM_NISV_TO_USER being set, we return a -ENOSYS from
> io_mem_abort(), exiting back to userspace *without* advertising a MMIO
> access. The VMM is free to do whatever it can to handle it (i.e. not
> much), but crucially we don't go via kvm_handle_mmio_return() on
> resuming the vcpu (unless the VMM sets run->exit_reason to
> KVM_EXIT_MMIO, but that's clearly its own decision).
> 
> Instead, the expectation is that userspace willing to handle an exit
> resulting in ESR_EL2.ISV being clear would instead request a
> KVM_EXIT_ARM_NISV exit type (by enabling KVM_CAP_ARM_NISV_TO_USER),
> getting extra information in the process, such as the fault
> IPA. The documentation for KVM_EXIT_ARM_NISV clearly states:
> 
>   "Note that KVM does not skip the faulting instruction as it does for
>KVM_EXIT_MMIO, but userspace has to emulate any change to the
>processing state if it decides to decode and emulate the instruction."

Thanks, I think you're right. I _thought_ we always reported EXIT_MMIO
for write faults on read-only memslots (as per the documented behaviour),
but actually that goes down the io_mem_abort() path too and so the
skipping only ever occurs when the syndrome is valid.
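
For completeness, the userspace flow Marc describes looks roughly like this
(sketch only; error handling omitted and emulate_mmio_at() is a hypothetical
VMM helper):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

extern void emulate_mmio_at(uint64_t fault_ipa, uint64_t esr_iss);

static void run_vcpu_with_nisv(int vm_fd, int vcpu_fd, struct kvm_run *run)
{
	struct kvm_enable_cap cap = { .cap = KVM_CAP_ARM_NISV_TO_USER };

	/* Ask KVM to report "no valid syndrome" data aborts to userspace. */
	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	for (;;) {
		ioctl(vcpu_fd, KVM_RUN, 0);

		if (run->exit_reason == KVM_EXIT_ARM_NISV) {
			/*
			 * KVM does not skip the faulting instruction for this
			 * exit: after emulating the access, userspace must
			 * advance the PC itself (by 2 or 4 bytes depending on
			 * what it decoded), e.g. via KVM_SET_ONE_REG on the
			 * core regs.pc register.
			 */
			emulate_mmio_at(run->arm_nisv.fault_ipa,
					run->arm_nisv.esr_iss);
		}
	}
}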

I'll drop this patch.

Will


Re: [PATCH 2/7] KVM: arm64: Rename kvm_vcpu_dabt_isextabt()

2020-07-27 Thread Will Deacon
On Sun, Jul 26, 2020 at 12:15:44PM +0100, Marc Zyngier wrote:
> On Fri, 24 Jul 2020 15:35:01 +0100,
> Will Deacon  wrote:
> > 
> > kvm_vcpu_dabt_isextabt() is not specific to data aborts and has nothing
> > to do with sign extension.
> 
> Not sign extension, but external abort (it reads as "is external
> abort?").
> 
> This is in keeping with all the other helpers (kvm_vcpu_foo_isbar). If
> you want to drop the "data" part (which is indeed not correct), how
> about kvm_vcpu_abt_isexternal? or kvm_vcpu_abt_issea?

kvm_vcpu_abt_issea() would be great -- the problem is that I confused
kvm_vcpu_dabt_issext() and kvm_vcpu_dabt_isextabt() when I was hacking
and if you get the wrong one then you're really hosed!
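
For anyone else skimming the thread, the two helpers check very different
things; a simplified sketch of each against the data abort ISS (this omits
the parity/ECC external abort codes that the real helper also matches):

#include <stdbool.h>

static bool dabt_is_sign_extend(unsigned long esr)	/* ~ kvm_vcpu_dabt_issext() */
{
	return esr & (1UL << 21);	/* SSE: sign-extend the loaded value */
}

static bool dabt_is_external_abort(unsigned long esr)	/* ~ kvm_vcpu_dabt_isextabt() */
{
	switch (esr & 0x3f) {		/* FSC, ESR[5:0] */
	case 0x10:			/* synchronous external abort */
	case 0x14 ... 0x17:		/* SEA on a stage-1 table walk, levels 0-3 */
		return true;
	default:
		return false;
	}
}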

I'll update for v2.

Cheers,

Will


Re: [PATCH 6/7] KVM: arm64: Handle stage-2 faults on stage-1 page-table walks earlier

2020-07-27 Thread Will Deacon
On Sun, Jul 26, 2020 at 02:38:38PM +0100, Marc Zyngier wrote:
> On Fri, 24 Jul 2020 15:35:05 +0100,
> Will Deacon  wrote:
> > 
> > Stage-2 faults on stage-1 page-table walks can occur on both the I-side
> > and the D-side. It is IMPLEMENTATION DEFINED whether they are reported
> > as reads or writes and, in the case that they are generated by an AT
> > instruction, they are reported with the CM bit set.
> > 
> > All of this deeply confuses the logic in kvm_handle_guest_abort();
> > userspace may or may not see the fault, depending on whether it occurs
> > on the data or the instruction side, and an AT instruction may be skipped
> > if the translation tables are held in a read-only memslot.
> 
> Yuk, that's indeed ugly. Well spotted. I guess the saving grace is
> that a S2 trap caused by an ATS1 instruction will be reported as
> S1PTW+CM, while the fault caused by a CMO is reported as *either*
> S1PTW *or* CM, but never both.

Hmm, is that right? If the translation faults at S2 for a CM instruction,
wouldn't it have both bits set?
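
For reference, the ISS bits in question for a data abort, written out by
hand rather than via the ESR_ELx_* macros (positions from the Arm ARM):

#define DABT_ISS_ISV	(1UL << 24)	/* ISS[23:14] syndrome is valid      */
#define DABT_ISS_CM	(1UL << 8)	/* fault from a cache maintenance op */
#define DABT_ISS_S1PTW	(1UL << 7)	/* stage-2 fault on a stage-1 walk   */
#define DABT_ISS_WNR	(1UL << 6)	/* write, not read                   */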

> > Move the handling of stage-2 faults on stage-1 page-table walks earlier
> > so that they consistently result in either a data or an instruction abort
> > being re-injected back to the guest.
> 
> The instruction abort seems to be happening as the side effect of
> executing outside of a memslot, not really because of a S1PTW.

Not sure about that. If the instruction fetch generates an S2 abort during
translation, then we could be executing from within a memslot; it's the
location of the page-tables that matters.

However, I think that means things still aren't quite right with my patches
because we can end up calling io_mem_abort() from an instruction abort,
which won't have enough syndrome information to do anything useful. Hmm.

Stepping back, here's what I _think_ we want, although see the '(?)'
bits where I'm particularly unsure:


S2 instruction abort:
  * Not in memslot:           inject external iabt to guest
  * In R/O memslot:
    - S2 fault on S1 walk:    either EXIT_NISV or inject external iabt
                              to guest (?)

S2 data abort:
  * Not in memslot:
    - S2 fault on S1 walk:    inject external dabt to guest
    - Cache maintenance:      skip instr
    - Syndrome valid:         EXIT_MMIO
    - Syndrome invalid:       EXIT_NISV
  * In R/O memslot:
    - S2 fault on S1 walk:    either EXIT_NISV or inject external dabt
                              to guest (?)
    - Access is write (including cache maintenance (?)):
      - Syndrome valid:       EXIT_MMIO
      - Syndrome invalid:     EXIT_NISV


Everything else gets handled by handle_access_fault()/user_mem_abort().
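
The same table in rough code form, purely illustrative (names and structure
aren't from any real patch):

#include <stdbool.h>

enum abort_action {
	INJECT_EXT_ABORT,	/* external iabt/dabt back to the guest */
	EXIT_MMIO_ACTION,
	EXIT_NISV_ACTION,
	SKIP_CM_INSTR,
	USER_MEM_ABORT,		/* normal memslot handling */
};

static enum abort_action dispatch_s2_abort(bool is_iabt, bool in_memslot,
					   bool ro_memslot, bool s1ptw,
					   bool is_cm, bool is_write,
					   bool syndrome_valid)
{
	if (is_iabt) {
		if (!in_memslot)
			return INJECT_EXT_ABORT;
		if (ro_memslot && s1ptw)
			return EXIT_NISV_ACTION;	/* or inject iabt (?) */
		return USER_MEM_ABORT;
	}

	if (!in_memslot) {
		if (s1ptw)
			return INJECT_EXT_ABORT;
		if (is_cm)
			return SKIP_CM_INSTR;
		return syndrome_valid ? EXIT_MMIO_ACTION : EXIT_NISV_ACTION;
	}

	if (ro_memslot) {
		if (s1ptw)
			return EXIT_NISV_ACTION;	/* or inject dabt (?) */
		if (is_write || is_cm /* (?) */)
			return syndrome_valid ? EXIT_MMIO_ACTION : EXIT_NISV_ACTION;
	}

	return USER_MEM_ABORT;
}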

What do you think?

> I wonder whether these S1PTW faults should be classified as external
> aborts instead (because putting your page tables outside of a memslot
> seems a bit bonkers).

I think that's what this patch does, since we end up in kvm_inject_dabt().

Will


Re: [PATCH 0/3] KVM: arm64: Clean up memcache usage for page-table pages

2020-07-27 Thread Marc Zyngier

Hi Will,

On 2020-07-23 12:02, Will Deacon wrote:
> Hi all,
>
> Here are some small cleanups I made to the memcache logic while hacking
> on the page-table code. The ioremap() behaviour looks like a bug to me,
> although it's just a performance thing so nothing urgent.
>
> Cheers,
>
> Will
>
> --->8
>
> Will Deacon (3):
>   KVM: arm64: Don't free memcache pages in kvm_phys_addr_ioremap()
>   KVM: arm64: Simplify mmu_topup_memory_cache()
>   KVM: arm64: Remove mmu_free_memory_cache()
>
>  arch/arm64/kvm/mmu.c | 34 ++
>  1 file changed, 14 insertions(+), 20 deletions(-)


Although I'm OK with this series, it conflicts with the changes
Sean did on the MMU memory cache in the core code, which also
affects arm64.

I guess I'll queue patch 1 and 3 as fixes post -rc1. Patch 2 doesn't
seem to make sense anymore in that context.

Thanks,

M.
--
Jazz is not dead. It just smells funny...