Re: [PATCH v7 9/10] KVM: arm64: docs: document KVM support of pointer authentication

2019-03-20 Thread Amit Daniel Kachhap

Hi Julien/Kristina,

On 3/21/19 2:26 AM, Kristina Martsenko wrote:

On 20/03/2019 18:06, Julien Thierry wrote:



On 20/03/2019 15:04, Kristina Martsenko wrote:

On 20/03/2019 13:37, Julien Thierry wrote:

Hi Amit,

On 19/03/2019 08:30, Amit Daniel Kachhap wrote:

This adds sections for KVM API extension for pointer authentication.
A brief description about usage of pointer authentication for KVM guests
is added in the arm64 documentations.


[...]


diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 7de9eee..b5c66bc 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2659,6 +2659,12 @@ Possible features:
  Depends on KVM_CAP_ARM_PSCI_0_2.
- KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
  Depends on KVM_CAP_ARM_PMU_V3.
+   - KVM_ARM_VCPU_PTRAUTH_ADDRESS:
+   - KVM_ARM_VCPU_PTRAUTH_GENERIC:
+ Enables Pointer authentication for the CPU.
+ Depends on KVM_CAP_ARM_PTRAUTH and only on arm64 architecture. If
+ set, then the KVM guest allows the execution of pointer authentication
+ instructions. Otherwise, KVM treats these instructions as undefined.
  


Overall I feel one could easily get confused as to whether
PTRAUTH_ADDRESS/GENERIC are two individual features, whether one is a
superset of the other, if the names are just an alias of one another, etc...

I think the doc should at least stress that *both* flags are
required to enable ptrauth in a guest. However it raises the question,
if we don't plan to support the features individually (because we
can't), should we really expose two feature flags? It seems odd to
introduce two flags that only do something if used together...


Why can't we support the features individually? For example, if we ever
get a system where all CPUs support address authentication and none of
them support generic authentication, then we could still support address
authentication in the guest.




That's a good point, I didn't think of that.

Although, currently we don't have a way to detect that we are in such a
configuration. So as is, both flags are required to enable either
feature, and I feel the documentation should be clear on that aspect.


For now we only support enabling both features together, so both flags
need to be set. I agree that the documentation should be made clear on this.

In the future, if we need to, we can add "negative" cpucaps to detect
that a feature is absent on all CPUs.



Another option would be to introduce a flag that enables both for now,
and if one day we decide to support the configuration you mentioned we
could add "more modular" flags that allow you to control those features
individually. While a bit cumbersome, I would find that less awkward
than having two flags that only do something if both are present.


That would work too.

I find it more logical to have two flags since there are two features
(two ID register fields), and KVM enables two features for the guest.
The fact that KVM does not currently support enabling them separately is
a KVM implementation choice, and could change in the future.

Kristina, this comment of yours is actually what this patch series is
trying to do. I should have added more details about the necessity of
keeping two flags; enhancing them individually is future work.
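
To make this concrete, here is a minimal userspace sketch (not part of the
patch) of a VMM requesting both flags at vcpu init time. It assumes the
KVM_ARM_VCPU_PTRAUTH_ADDRESS/GENERIC feature bits proposed in this series;
the surrounding ioctls and struct kvm_vcpu_init are the existing KVM API:

/* Illustrative only: enable ptrauth for a vcpu from a VMM. */
#include <linux/kvm.h>
#include <sys/ioctl.h>

static int vcpu_init_with_ptrauth(int vm_fd, int vcpu_fd)
{
        struct kvm_vcpu_init init;

        if (ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init) < 0)
                return -1;

        /* Both flags must currently be set together (see discussion above). */
        init.features[0] |= 1U << KVM_ARM_VCPU_PTRAUTH_ADDRESS;
        init.features[0] |= 1U << KVM_ARM_VCPU_PTRAUTH_GENERIC;

        return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);
}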


Thanks,
Amit Daniel


Thanks,
Kristina




Re: [PATCH v7 7/10] KVM: arm/arm64: context-switch ptrauth registers

2019-03-20 Thread Amit Daniel Kachhap

Hi Julien,

On 3/20/19 5:43 PM, Julien Thierry wrote:

Hi Amit,

On 19/03/2019 08:30, Amit Daniel Kachhap wrote:

From: Mark Rutland 

When pointer authentication is supported, a guest may wish to use it.
This patch adds the necessary KVM infrastructure for this to work, with
a semi-lazy context switch of the pointer auth state.

Pointer authentication feature is only enabled when VHE is built
in the kernel and present in the CPU implementation so only VHE code
paths are modified.

When we schedule a vcpu, we disable guest usage of pointer
authentication instructions and accesses to the keys. While these are
disabled, we avoid context-switching the keys. When we trap the guest
trying to use pointer authentication functionality, we change to eagerly
context-switching the keys, and enable the feature. The next time the
vcpu is scheduled out/in, we start again. However the host key save is
optimized and implemented inside ptrauth instruction/register access
trap.

Pointer authentication consists of address authentication and generic
authentication, and CPUs in a system might have varied support for
either. Where support for either feature is not uniform, it is hidden
from guests via ID register emulation, as a result of the cpufeature
framework in the host.

Unfortunately, address authentication and generic authentication cannot
be trapped separately, as the architecture provides a single EL2 trap
covering both. If we wish to expose one without the other, we cannot
prevent a (badly-written) guest from intermittently using a feature
which is not uniformly supported (when scheduled on a physical CPU which
supports the relevant feature). Hence, this patch expects both types of
authentication to be present in a CPU.

This switch of key is done from guest enter/exit assembly as preparation
for the upcoming in-kernel pointer authentication support. Hence, these
key switching routines are not implemented in C code as they may cause
pointer authentication key signing error in some situations.

Signed-off-by: Mark Rutland 
[Only VHE, key switch in full assembly, vcpu_has_ptrauth checks
, save host key in ptrauth exception trap]
Signed-off-by: Amit Daniel Kachhap 
Reviewed-by: Julien Thierry 
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: kvmarm@lists.cs.columbia.edu
---
  arch/arm64/include/asm/kvm_host.h|  17 ++
  arch/arm64/include/asm/kvm_ptrauth_asm.h | 100 +++
  arch/arm64/kernel/asm-offsets.c  |   6 ++
  arch/arm64/kvm/guest.c   |  14 +
  arch/arm64/kvm/handle_exit.c |  24 +---
  arch/arm64/kvm/hyp/entry.S   |   7 +++
  arch/arm64/kvm/reset.c   |   7 +++
  arch/arm64/kvm/sys_regs.c|  46 +-
  virt/kvm/arm/arm.c   |   2 +
  9 files changed, 212 insertions(+), 11 deletions(-)
  create mode 100644 arch/arm64/include/asm/kvm_ptrauth_asm.h


[...]

+#ifdef CONFIG_ARM64_PTR_AUTH
+
+#define PTRAUTH_REG_OFFSET(x)  (x - CPU_APIAKEYLO_EL1)


I don't really see the point of this macro. You move the pointers of
kvm_cpu_contexts to point to where the ptr auth registers are (which is
in the middle of an array) by adding the offset of APIAKEYLO and then we
have to recompute all offsets with this macro.

Why not just pass the kvm_cpu_context pointers to
ptrauth_save/restore_state and use the already defined offsets
(CPU_AP*_EL1) directly?

I think this would also allow to use one less register for the
ptrauth_switch_to_* macros.
Actually the values of CPU_AP*_EL1 exceed the ldp/stp immediate range
(i.e. 512), so this was done to keep the immediate offset within the range.
The other way would have been to compute the destination address in a
register first, but that would add one more add instruction everywhere.

I should have mentioned them as comments somewhere.
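
To illustrate the constraint (the numeric value below is hypothetical, only
its size matters): stp/ldp of 64-bit registers take a signed, 8-byte-scaled
immediate limited to -512..504 bytes, so the raw asm-offsets values of the
key registers inside kvm_cpu_context cannot be encoded directly, while the
rebased offsets can:

/* Illustrative sketch only -- CPU_APIAKEYLO_EL1 here is a made-up
 * asm-offsets value, not the one generated by asm-offsets.c.
 */
#define CPU_APIAKEYLO_EL1       0x540   /* > 504: not encodable as an stp/ldp offset */
#define CPU_APDAKEYLO_EL1       (CPU_APIAKEYLO_EL1 + 32)

#define PTRAUTH_REG_OFFSET(x)   ((x) - CPU_APIAKEYLO_EL1)

/* Once CPU_APIAKEYLO_EL1 has been added to the base pointer, every key is
 * reachable with a small immediate: 32 here instead of 0x560.
 */
enum { APDA_OFFSET = PTRAUTH_REG_OFFSET(CPU_APDAKEYLO_EL1) };   /* == 32 */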



+
+.macro ptrauth_save_state base, reg1, reg2
+   mrs_s   \reg1, SYS_APIAKEYLO_EL1
+   mrs_s   \reg2, SYS_APIAKEYHI_EL1
+   stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIAKEYLO_EL1)]
+   mrs_s   \reg1, SYS_APIBKEYLO_EL1
+   mrs_s   \reg2, SYS_APIBKEYHI_EL1
+   stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIBKEYLO_EL1)]
+   mrs_s   \reg1, SYS_APDAKEYLO_EL1
+   mrs_s   \reg2, SYS_APDAKEYHI_EL1
+   stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDAKEYLO_EL1)]
+   mrs_s   \reg1, SYS_APDBKEYLO_EL1
+   mrs_s   \reg2, SYS_APDBKEYHI_EL1
+   stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDBKEYLO_EL1)]
+   mrs_s   \reg1, SYS_APGAKEYLO_EL1
+   mrs_s   \reg2, SYS_APGAKEYHI_EL1
+   stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APGAKEYLO_EL1)]
+.endm
+
+.macro ptrauth_restore_state base, reg1, reg2
+   ldp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIAKEYLO_EL1)]
+   msr_s   SYS_APIAKEYLO_EL1, \reg1
+   msr_s   SYS_APIAKEYHI_EL1, \reg2
+   ldp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIBKEYLO_EL1)]
+  

Re: [PATCH v7 3/10] KVM: arm64: Move hyp_symbol_addr to fix dependency

2019-03-20 Thread Amit Daniel Kachhap

Hi Julien,

On 3/20/19 2:19 PM, Julien Thierry wrote:

Hi Amit,

On 19/03/2019 08:30, Amit Daniel Kachhap wrote:

Currently hyp_symbol_addr is placed in kvm_mmu.h which is mostly
used by __hyp_this_cpu_ptr in kvm_asm.h but it cannot include
kvm_mmu.h directly as kvm_mmu.h uses kvm_ksym_ref which is
defined inside kvm_asm.h. Hence, hyp_symbol_addr is moved inside
kvm_asm.h to fix this dependency on each other.

Also kvm_ksym_ref is corresponding counterpart of hyp_symbol_addr
so should be ideally placed inside kvm_asm.h.



This part is a bit confusing; it led me to think that kvm_ksym_ref was
in kvm_mmu.h and should be moved to kvm_asm.h as well. I'd suggest
rephrasing it with something along the lines:

"Also, hyp_symbol_addr corresponding counterpart, kvm_ksym_ref, is
already in kvm_asm.h, making it more sensible to move kvm_symbol_addr to
the same file."

ok, will rephrase.


Otherwise:

Reviewed-by: Julien Thierry 


Thanks,
Amit


Cheers,

Julien


Suggested-by: James Morse 
Signed-off-by: Amit Daniel Kachhap 
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: kvmarm@lists.cs.columbia.edu
---
  arch/arm64/include/asm/kvm_asm.h | 20 
  arch/arm64/include/asm/kvm_mmu.h | 20 
  2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index f5b79e9..57a07e8 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -80,6 +80,26 @@ extern void __vgic_v3_init_lrs(void);
  
  extern u32 __kvm_get_mdcr_el2(void);
  
+/*

+ * Obtain the PC-relative address of a kernel symbol
+ * s: symbol
+ *
+ * The goal of this macro is to return a symbol's address based on a
+ * PC-relative computation, as opposed to a loading the VA from a
+ * constant pool or something similar. This works well for HYP, as an
+ * absolute VA is guaranteed to be wrong. Only use this if trying to
+ * obtain the address of a symbol (i.e. not something you obtained by
+ * following a pointer).
+ */
+#define hyp_symbol_addr(s) \
+   ({  \
+   typeof(s) *addr;\
+   asm("adrp  %0, %1\n"  \
+   "add   %0, %0, :lo12:%1\n"\
+   : "=r" (addr) : "S" (&s));  \
+   addr;   \
+   })
+
  /* Home-grown __this_cpu_{ptr,read} variants that always work at HYP */
  #define __hyp_this_cpu_ptr(sym)   
\
({  \
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index b0742a1..3dea6af 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -118,26 +118,6 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
  #define kern_hyp_va(v)((typeof(v))(__kern_hyp_va((unsigned long)(v
  
  /*

- * Obtain the PC-relative address of a kernel symbol
- * s: symbol
- *
- * The goal of this macro is to return a symbol's address based on a
- * PC-relative computation, as opposed to a loading the VA from a
- * constant pool or something similar. This works well for HYP, as an
- * absolute VA is guaranteed to be wrong. Only use this if trying to
- * obtain the address of a symbol (i.e. not something you obtained by
- * following a pointer).
- */
-#define hyp_symbol_addr(s) \
-   ({  \
-   typeof(s) *addr;\
-   asm("adrp  %0, %1\n"  \
-   "add   %0, %0, :lo12:%1\n"\
-   : "=r" (addr) : "S" (&s));  \
-   addr;   \
-   })
-
-/*
   * We currently support using a VM-specified IPA size. For backward
   * compatibility, the default IPA size is fixed to 40bits.
   */
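
For reference, a minimal usage sketch (not part of the patch; the symbol
name below is made up) of how hyp code typically consumes this macro:

/* Illustrative only: 'hyp_example_counter' is a hypothetical kernel symbol. */
extern unsigned long hyp_example_counter;

static unsigned long *hyp_counter_addr(void)
{
        /* Expands to adrp/add, i.e. a PC-relative address valid at EL2. */
        return hyp_symbol_addr(hyp_example_counter);
}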






Re: [PATCH v7 9/10] KVM: arm64: docs: document KVM support of pointer authentication

2019-03-20 Thread Kristina Martsenko
On 20/03/2019 18:06, Julien Thierry wrote:
> 
> 
> On 20/03/2019 15:04, Kristina Martsenko wrote:
>> On 20/03/2019 13:37, Julien Thierry wrote:
>>> Hi Amit,
>>>
>>> On 19/03/2019 08:30, Amit Daniel Kachhap wrote:
 This adds sections for KVM API extension for pointer authentication.
 A brief description about usage of pointer authentication for KVM guests
 is added in the arm64 documentations.
>>
>> [...]
>>
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 7de9eee..b5c66bc 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2659,6 +2659,12 @@ Possible features:
  Depends on KVM_CAP_ARM_PSCI_0_2.
- KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
  Depends on KVM_CAP_ARM_PMU_V3.
 +  - KVM_ARM_VCPU_PTRAUTH_ADDRESS:
 +  - KVM_ARM_VCPU_PTRAUTH_GENERIC:
 +Enables Pointer authentication for the CPU.
 +Depends on KVM_CAP_ARM_PTRAUTH and only on arm64 architecture. If
 +set, then the KVM guest allows the execution of pointer authentication
 +instructions. Otherwise, KVM treats these instructions as undefined.
  
>>>
>>> Overall I feel one could easily get confused to whether
>>> PTRAUTH_ADDRESS/GENERIC are two individual features, whether one is a
>>> superset of the other, if the names are just an alias of one another, etc...
>>>
>>> I think the doc should at least stress out that *both* flags are
>>> required to enable ptrauth in a guest. However it raises the question,
>>> if we don't plan to support the features individually (because we
>>> can't), should we really expose two feature flags? I seems odd to
>>> introduce two flags that only do something if used together...
>>
>> Why can't we support the features individually? For example, if we ever
>> get a system where all CPUs support address authentication and none of
>> them support generic authentication, then we could still support address
>> authentication in the guest.
>>
>>
> 
> That's a good point, I didn't think of that.
> 
> Although, currently we don't have a way to detect that we are in such a
> configuration. So as is, both flags are required to enable either
> feature, and I feel the documentation should be clear on that aspect.

For now we only support enabling both features together, so both flags
need to be set. I agree that the documentation should be made clear on this.

In the future, if we need to, we can add "negative" cpucaps to detect
that a feature is absent on all CPUs.

> 
> Another option would be to introduce a flag that enables both for now,
> and if one day we decide to support the configuration you mentioned we
> could add "more modular" flags that allow you to control those features
> individually. While a bit cumbersome, I would find that less awkward
> than having two flags that only do something if both are present.

That would work too.

I find it more logical to have two flags since there are two features
(two ID register fields), and KVM enables two features for the guest.
The fact that KVM does not currently support enabling them separately is
a KVM implementation choice, and could change in the future.

Thanks,
Kristina


Re: [PATCH v7 9/10] KVM: arm64: docs: document KVM support of pointer authentication

2019-03-20 Thread Julien Thierry



On 20/03/2019 15:04, Kristina Martsenko wrote:
> On 20/03/2019 13:37, Julien Thierry wrote:
>> Hi Amit,
>>
>> On 19/03/2019 08:30, Amit Daniel Kachhap wrote:
>>> This adds sections for KVM API extension for pointer authentication.
>>> A brief description about usage of pointer authentication for KVM guests
>>> is added in the arm64 documentations.
> 
> [...]
> 
>>> diff --git a/Documentation/virtual/kvm/api.txt 
>>> b/Documentation/virtual/kvm/api.txt
>>> index 7de9eee..b5c66bc 100644
>>> --- a/Documentation/virtual/kvm/api.txt
>>> +++ b/Documentation/virtual/kvm/api.txt
>>> @@ -2659,6 +2659,12 @@ Possible features:
>>>   Depends on KVM_CAP_ARM_PSCI_0_2.
>>> - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
>>>   Depends on KVM_CAP_ARM_PMU_V3.
>>> +   - KVM_ARM_VCPU_PTRAUTH_ADDRESS:
>>> +   - KVM_ARM_VCPU_PTRAUTH_GENERIC:
>>> + Enables Pointer authentication for the CPU.
>>> + Depends on KVM_CAP_ARM_PTRAUTH and only on arm64 architecture. If
>>> + set, then the KVM guest allows the execution of pointer authentication
>>> + instructions. Otherwise, KVM treats these instructions as undefined.
>>>  
>>
>> Overall I feel one could easily get confused to whether
>> PTRAUTH_ADDRESS/GENERIC are two individual features, whether one is a
>> superset of the other, if the names are just an alias of one another, etc...
>>
>> I think the doc should at least stress out that *both* flags are
>> required to enable ptrauth in a guest. However it raises the question,
>> if we don't plan to support the features individually (because we
>> can't), should we really expose two feature flags? I seems odd to
>> introduce two flags that only do something if used together...
> 
> Why can't we support the features individually? For example, if we ever
> get a system where all CPUs support address authentication and none of
> them support generic authentication, then we could still support address
> authentication in the guest.
> 
> 

That's a good point, I didn't think of that.

Although, currently we don't have a way to detect that we are in such a
configuration. So as is, both flags are required to enable either
feature, and I feel the documentation should be clear on that aspect.

Another option would be to introduce a flag that enables both for now,
and if one day we decide to support the configuration you mentioned we
could add "more modular" flags that allow you to control those features
individually. While a bit cumbersome, I would find that less awkward
than having two flags that only do something if both are present.

Thanks,

-- 
Julien Thierry


Re: [PATCH -next] KVM: arm: vgic: Make two functions static

2019-03-20 Thread Marc Zyngier
On Wed, 20 Mar 2019 22:18:13 +0800
Yue Haibing  wrote:

> From: YueHaibing 
> 
> Fix sparse warnings:
> 
> arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1732:5: warning:
>  symbol 'vgic_its_has_attr_regs' was not declared. Should it be static?
> arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1753:5: warning:
>  symbol 'vgic_its_attr_regs_access' was not declared. Should it be static?
> 
> Signed-off-by: YueHaibing 

Applied with a small tweak to the subject line.

Thanks,

M.
-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v2] kvm: arm: Fix handling of stage2 huge mappings

2019-03-20 Thread Marc Zyngier
On Wed, 20 Mar 2019 14:57:19 +
Suzuki K Poulose  wrote:

> We rely on the mmu_notifier call backs to handle the split/merge
> of huge pages and thus we are guaranteed that, while creating a
> block mapping, either the entire block is unmapped at stage2 or it
> is missing permission.
> 
> However, we miss a case where the block mapping is split for the dirty
> logging case and then could later be made a block mapping again, if we
> cancel the dirty logging. This not only creates inconsistent TLB entries
> for the pages in the block, but also leaks the table pages for the
> PMD level.
> 
> Handle this corner case for the huge mappings at stage2 by
> unmapping the non-huge mapping for the block. This could potentially
> release the upper level table. So we need to restart the table walk
> once we unmap the range.
> 
> Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages")
> Reported-by: Zheng Xiang 
> Cc: Zheng Xiang 
> Cc: Zhenghui Yu 
   ^
> Cc: Marc Zyngier 
> Cc: Christoffer Dall 
> Signed-off-by: Suzuki K Poulose 

Applied, with Zenghui's name fixed.

Thanks,

M.
-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH] kvm: arm: Fix handling of stage2 huge mappings

2019-03-20 Thread Marc Zyngier
On Wed, 20 Mar 2019 11:12:47 +
Suzuki K Poulose  wrote:

> Marc,
> 
> On 20/03/2019 10:35, Marc Zyngier wrote:
> > On Wed, 20 Mar 2019 10:23:39 +
> > Suzuki K Poulose  wrote:
> > 
> > Hi Suzuki,
> >   
> >> Marc,
> >>
> >> On 20/03/2019 10:11, Marc Zyngier wrote:  
> >>> On Wed, 20 Mar 2019 09:44:38 +
> >>> Suzuki K Poulose  wrote:  
> >>> Hi Marc,  
> 
>  On 20/03/2019 08:15, Marc Zyngier wrote:  
> > Hi Suzuki,
> >
> > On Tue, 19 Mar 2019 14:11:08 +,
> > Suzuki K Poulose  wrote:  
> 
> ...
> 
> >> +  if (!pmd_thp_or_huge(old_pmd)) {
> >> +  unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE);
> >> +  goto retry;
>
> >> +  if (!stage2_pud_huge(kvm, old_pud)) {
> >> +  unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE);
>
>  We should really get rid of the S2_P{U/M}D_* definitions, as they are
>  always the same as the host. The only thing that changes is the PGD size
>  which varies according to the IPA and the concatenation.
>
> >> Also what do you think about using P{M,U}D_* instead of S2_P{M,U}D_*
> >> above ? I could make that change with the respin.
> > 
> > Given that this is a fix, I'd like it to be as small and obvious as
> > possible, making it easier to backport.
> > 
> > I'm happy to take another patch for 5.2 that will drop the whole S2_P*
> > if we still think that this should be the case (though what I'd really
> > like is to have architectural levels instead of these arbitrary
> > definitions).  
> 
> I only meant the two new instances added above in the patch. Of course, I
> could send something to fix the existing ones.

I'd rather be consistent, and use the same names all over the code.
Once we decide to change, we do it all in one go.

Thanks,

M.
-- 
Without deviation from the norm, progress is not possible.


Re: [RFC] Question about enable doorbell irq and halt_poll process

2019-03-20 Thread Marc Zyngier
On Tue, 19 Mar 2019 21:25:47 +0800
"Tangnianyao (ICT)"  wrote:

> Hi, all
> 
> Using gicv4, when the guest is waiting for an irq, it executes wfi and
> traps to kvm. When a vlpi is forwarded to the PE after its_vpe_deschedule,
> before halt_poll in kvm_vcpu_block, halt_poll may increase the latency for
> this vlpi to reach the guest. The halt_poll process checks whether there is
> a pending irq for the vcpu using pending_last. However, the doorbell is not
> enabled at this point, so neither the vlpi nor the doorbell can set
> pending_last to true and stop halt_poll. Polling runs until the halt_poll
> time ends (if no other physical irq arrives in the meantime), and then the
> vcpu is scheduled out. The pending vlpi has to wait until the vcpu is
> scheduled in again.
> 
> Should we enable the doorbell before the halt_poll process?

Enabling doorbells can be quite expensive. Depending on the HW, this is
either:

- a write to memory (+DSB, potential cache maintenance), a write to the
  INVLPI register, and a poll of the SYNC register
- a write to memory (+DSB, potential cache maintenance), potentially
  a string of DISCARD+SYNC+MAPI+SYNC commands, and an INV+SYNC command

Frankly, you want to be careful with that. I'd rather enable them late
and have a chance of not blocking because of another (virtual)
interrupt, which saves us the doorbell business.

I wonder if you wouldn't be in a better position by drastically
reducing halt_poll_ns for vcpu that can have directly injected
interrupts.

In any case, this is something that we should measure, not guess.

Thanks,

M.
-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH v6 05/22] iommu: Introduce cache_invalidate API

2019-03-20 Thread Jean-Philippe Brucker
On 20/03/2019 16:37, Jacob Pan wrote:
[...]
>> +struct iommu_inv_addr_info {
>> +#define IOMMU_INV_ADDR_FLAGS_PASID  (1 << 0)
>> +#define IOMMU_INV_ADDR_FLAGS_ARCHID (1 << 1)
>> +#define IOMMU_INV_ADDR_FLAGS_LEAF   (1 << 2)
>> +__u32   flags;
>> +__u32   archid;
>> +__u64   pasid;
>> +__u64   addr;
>> +__u64   granule_size;
>> +__u64   nb_granules;
>> +};
>> +
>> +/**
>> + * First level/stage invalidation information
>> + * @cache: bitfield that allows to select which caches to invalidate
>> + * @granularity: defines the lowest granularity used for the invalidation:
>> + *               domain > pasid > addr
>> + *
>> + * Not all the combinations of cache/granularity make sense:
>> + *
>> + * type           |   DEV_IOTLB   |     IOTLB     |      PASID      |
>> + * granularity    |               |               |      cache      |
>> + * ---------------+---------------+---------------+-----------------+
>> + * DOMAIN         |      N/A      |       Y       |        Y        |
>> + * PASID          |       Y       |       Y       |        Y        |
>> + * ADDR           |       Y       |       Y       |       N/A       |
>> + */
>> +struct iommu_cache_invalidate_info {
>> +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
>> +__u32   version;
>> +/* IOMMU paging structure cache */
>> +#define IOMMU_CACHE_INV_TYPE_IOTLB  (1 << 0) /* IOMMU IOTLB */
>> +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB  (1 << 1) /* Device IOTLB */
>> +#define IOMMU_CACHE_INV_TYPE_PASID      (1 << 2) /* PASID cache */
> Just a clarification, this used to be an enum. You do intend to issue a
> single invalidation request on multiple cache types? Perhaps for
> virtio-IOMMU? I only see a single cache type in your patch #14. For VT-d
> we plan to issue one cache type at a time for now. So this format works
> for us.

Yes for virtio-iommu I'd like as little overhead as possible, which
means a single invalidation message to hit both IOTLB and ATC at once,
and the ability to specify multiple pages with @nb_granules.

> However, if multiple cache types are issued in a single invalidation.
> They must share a single granularity, not all combinations are valid.
> e.g. dev IOTLB does not support domain granularity. Just a reminder,
> not an issue. Driver could filter out invalid combinations.

Agreed. Even the core could filter out invalid combinations based on the
table above: DEV_IOTLB and domain granularity are N/A.
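
As a sketch of what such core-level filtering could look like (not existing
code; the flag and enum names come from the quoted header, the helper name
is made up):

/* Illustrative only: reject the combinations marked N/A in the table
 * above (DEV_IOTLB + DOMAIN, PASID cache + ADDR).
 */
static bool iommu_cache_inv_combination_valid(__u8 cache, __u8 granularity)
{
        if ((cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB) &&
            granularity == IOMMU_INV_GRANU_DOMAIN)
                return false;

        if ((cache & IOMMU_CACHE_INV_TYPE_PASID) &&
            granularity == IOMMU_INV_GRANU_ADDR)
                return false;

        return true;
}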

Thanks,
Jean

> 
>> +__u8cache;
>> +__u8granularity;
>> +__u8padding[2];
>> +union {
>> +__u64   pasid;
>> +struct iommu_inv_addr_info addr_info;
>> +};
>> +};
>> +
>> +
>>  #endif /* _UAPI_IOMMU_H */
> 
> [Jacob Pan]
> 



[PATCH -next] KVM: arm: vgic: Make two functions static

2019-03-20 Thread Yue Haibing
From: YueHaibing 

Fix sparse warnings:

arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1732:5: warning:
 symbol 'vgic_its_has_attr_regs' was not declared. Should it be static?
arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1753:5: warning:
 symbol 'vgic_its_attr_regs_access' was not declared. Should it be static?

Signed-off-by: YueHaibing 
---
 virt/kvm/arm/vgic/vgic-its.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index ab3f477..a97db69 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -1729,8 +1729,8 @@ static void vgic_its_destroy(struct kvm_device *kvm_dev)
kfree(its);
 }
 
-int vgic_its_has_attr_regs(struct kvm_device *dev,
-  struct kvm_device_attr *attr)
+static int vgic_its_has_attr_regs(struct kvm_device *dev,
+ struct kvm_device_attr *attr)
 {
const struct vgic_register_region *region;
gpa_t offset = attr->attr;
@@ -1750,9 +1750,9 @@ int vgic_its_has_attr_regs(struct kvm_device *dev,
return 0;
 }
 
-int vgic_its_attr_regs_access(struct kvm_device *dev,
- struct kvm_device_attr *attr,
- u64 *reg, bool is_write)
+static int vgic_its_attr_regs_access(struct kvm_device *dev,
+struct kvm_device_attr *attr,
+u64 *reg, bool is_write)
 {
const struct vgic_register_region *region;
struct vgic_its *its;
-- 
2.7.0




Re: [PATCH v6 05/22] iommu: Introduce cache_invalidate API

2019-03-20 Thread Jacob Pan
On Sun, 17 Mar 2019 18:22:15 +0100
Eric Auger  wrote:

> From: "Liu, Yi L" 
> 
> In any virtualization use case, when the first translation stage
> is "owned" by the guest OS, the host IOMMU driver has no knowledge
> of caching structure updates unless the guest invalidation activities
> are trapped by the virtualizer and passed down to the host.
> 
> Since the invalidation data are obtained from user space and will be
> written into physical IOMMU, we must allow security check at various
> layers. Therefore, generic invalidation data format are proposed here,
> model specific IOMMU drivers need to convert them into their own
> format.
> 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Eric Auger 
> 
> ---
> v5 -> v6:
> - fix merge issue
> 
> v3 -> v4:
> - full reshape of the API following Alex' comments
> 
> v1 -> v2:
> - add arch_id field
> - renamed tlb_invalidate into cache_invalidate as this API allows
>   to invalidate context caches on top of IOTLBs
> 
> v1:
> renamed sva_invalidate into tlb_invalidate and add iommu_ prefix in
> header. Commit message reworded.
> ---
>  drivers/iommu/iommu.c  | 14 
>  include/linux/iommu.h  | 15 
>  include/uapi/linux/iommu.h | 71
> ++ 3 files changed, 100
> insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 7d9285cea100..b72e326ddd41 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1544,6 +1544,20 @@ void iommu_detach_pasid_table(struct iommu_domain *domain)
>  }
>  EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>  
> +int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> +                           struct iommu_cache_invalidate_info *inv_info)
> +{
> +        int ret = 0;
> +
> +        if (unlikely(!domain->ops->cache_invalidate))
> +                return -ENODEV;
> +
> +        ret = domain->ops->cache_invalidate(domain, dev, inv_info);
> +
> +        return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>                                    struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index fb9b7a8de25f..7c7c6bad1420 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -191,6 +191,7 @@ struct iommu_resv_region {
>   *                     driver init to device driver init (default no)
>   * @attach_pasid_table: attach a pasid table
>   * @detach_pasid_table: detach the pasid table
> + * @cache_invalidate: invalidate translation caches
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>   */
>  struct iommu_ops {
> @@ -239,6 +240,9 @@ struct iommu_ops {
>                                    struct iommu_pasid_table_config *cfg);
>         void (*detach_pasid_table)(struct iommu_domain *domain);
>  
> +       int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
> +                               struct iommu_cache_invalidate_info *inv_info);
> +
>         unsigned long pgsize_bitmap;
>  };
>  
> @@ -349,6 +353,9 @@ extern void iommu_detach_device(struct iommu_domain *domain,
>  extern int iommu_attach_pasid_table(struct iommu_domain *domain,
>                                      struct iommu_pasid_table_config *cfg);
>  extern void iommu_detach_pasid_table(struct iommu_domain *domain);
> +extern int iommu_cache_invalidate(struct iommu_domain *domain,
> +                                  struct device *dev,
> +                                  struct iommu_cache_invalidate_info *inv_info);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -797,6 +804,14 @@ int iommu_attach_pasid_table(struct iommu_domain *domain,
>  static inline void iommu_detach_pasid_table(struct iommu_domain *domain) {}
>  
> +static inline int
> +iommu_cache_invalidate(struct iommu_domain *domain,
> +                       struct device *dev,
> +                       struct iommu_cache_invalidate_info *inv_info)
> +{
> +        return -ENODEV;
> +}
> +
>  #endif /* CONFIG_IOMMU_API */
>  
>  #ifdef CONFIG_IOMMU_DEBUGFS
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 532a64075f23..e4c6a447e85a 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -159,4 +159,75 @@ struct iommu_pasid_table_config {
>   };
>  };
>  
> +/* defines the granularity of the invalidation */
> +enum iommu_inv_granularity {
> +     IOMMU_INV_GRANU_DOMAIN, /* domain-selective invalidation */
> +     IOMMU_INV_GRANU_PASID,  /* pasid-selective invalidation */
> +     IOMMU_INV_GRANU_ADDR,   /* page-selective invalidation */
> +};
> +
> +/**
> + * Address Selective Invalidation Structure
> + *
> + * @flags indicates the granularity of the address-selective invalidat

Re: [RFC] arm/cpu: fix soft lockup panic after resuming from stop

2019-03-20 Thread Steven Price
On 19/03/2019 14:39, Heyi Guo wrote:
> Hi Christoffer and Steve,
> 
> 
> On 2019/3/15 16:59, Christoffer Dall wrote:
>> Hi Steve,
>>
>> On Wed, Mar 13, 2019 at 10:11:30AM +, Steven Price wrote:
>>> Personally I think what we need is:
>>>
>>> * Either a patch like the one from Heyi Guo (save/restore CNTVCT_EL0) or
>>> alternatively hooking up KVM_KVMCLOCK_CTRL to prevent the watchdog
>>> firing when user space explicitly stops scheduling the guest for a
>>> while.
>> If we save/restore CNTVCT_EL0 and the warning goes away, does the guest
>> wall clock timekeeping get all confused and does it figure this out
>> automagically somehow?
> What's the source for guest wall clock timekeeping in current ARM64
> implementation? Is it the value from CNTP_TVAL_EL0? Will guest OS keep
> track of this? Or is the wall clock simply platform RTC?
> 
> I tested the RFC change and did not see anything unusual. Did I miss
> someting?

Are you running ARM or ARM64? I'm assuming ARM64 here...

Unless I'm mistaken you should see the time reported by the guest not
progress when you have stopped it using the QEMU monitor console.

Running something like "while /bin/true; do date; sleep 1; done" should
allow you to see that without the patch time will jump in the guest
(i.e. the time reflects wall-clock time). With the patch I believe it
will not jump (i.e. the clock will be behind wall-clock time after the
pause).

However, this behaviour does depend on the exact system being emulated.
From drivers/clocksource/arm_arch_timer.c:

> static void __init arch_counter_register(unsigned type)
> {
>   u64 start_count;
> 
>   /* Register the CP15 based counter if we have one */
>   if (type & ARCH_TIMER_TYPE_CP15) {
>   if ((IS_ENABLED(CONFIG_ARM64) && !is_hyp_mode_available()) ||
>   arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
>   arch_timer_read_counter = arch_counter_get_cntvct;
>   else
>   arch_timer_read_counter = arch_counter_get_cntpct;
> 
>   clocksource_counter.archdata.vdso_direct = vdso_default;
>   } else {
>   arch_timer_read_counter = arch_counter_get_cntvct_mem;
>   }

So we can see here that there are three functions for reading the
counter - there's the memory interface for CNTVCT, the "CP15" interface
also for CNTVCT, and an interface for CNTPCT. CNTPCT is only used for
!ARM64 or if hyp mode is available (not relevant until nested
virtualisation is added).

The upshot is that on arm64 we're using the virtual counter (CNTVCT).

>>
>> Does KVM_KVMCLOCK_CTRL solve that problem?
> It seems KVM_KVMCLOCK_CTRL is dedicated for guest pause/resume scenario,
> but it relies on pvclock both in KVM and Guest and right now only X86
> supports it :(

KVM_KVMCLOCK_CTRL simply provides a mechanism for the host to inform the
guest that a vCPU hasn't been scheduled for "a while". The guest can
then use that information to avoid triggering the watchdog timeout. As
you note it is only currently implemented for X86.
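
For reference, the x86 usage is just a per-vcpu ioctl with no argument,
issued by the VMM while it has the guest stopped; a minimal sketch
(assuming an already-open vcpu fd):

/* Illustrative only: flag "the host stopped this vcpu" so the guest's
 * soft-lockup/watchdog logic can ignore the gap (x86 + pvclock only).
 */
#include <linux/kvm.h>
#include <sys/ioctl.h>

static void mark_vcpu_paused_by_host(int vcpu_fd)
{
        /* No argument; the guest sees it via the pvclock GUEST_STOPPED flag. */
        ioctl(vcpu_fd, KVM_KVMCLOCK_CTRL, 0);
}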

> Could Steve provide more insights about the whole thing?

I'll try - see below.

> Thanks,
> Heyi
> 
>>
>>> * KVM itself saving/restoring CNTVCT_EL0 during suspend/resume so the
>>> guest doesn't see time pass during a suspend.
>> This smells like policy to me so I'd much prefer keeping as much
>> functionality in user space as possible.  If we already have the APIs we
>> need from KVM, let's use them.

The problem with suspend/resume is that user space doesn't get a
look-in. There's no way (generically) for a user space application to
save/restore registers during the suspend. User space simply sees time
jump forwards - which is precisely what we're trying to hide from the
guest (as it otherwise gets upset that it appears a vCPU has deadlocked
for a while).

So while I agree this is "policy", it's something we need the kernel to
do on our behalf. Potentially we can put it behind an ioctl so that user
space can opt-in to it.

>>> * Something equivalent to MSR_KVM_WALL_CLOCK_NEW for arm which allows
>>> the guest to query the wall clock time from the host and provides an
>>> offset between CNTVCT_EL0 to wall clock time which the KVM can update
>>> during suspend/resume. This means that during a suspend/resume the guest
>>> can observe that wall clock time has passed, without having to be
>>> bothered about CNTVCT_EL0 jumping forwards.
>>>
>> Isn't the proper Arm architectural solution for this to read the
>> physical counter for wall clock time keeping ?
>>
>> (Yes that will require a trap on physical counter reads after migration
>> on current systems, but migration sucks in terms of timekeeping
>> already.)

The physical counter is certainly tempting as it largely does what we
want as a secondary counter. However I'm wary of using it because it
starts to get "really interesting" when dealing with nested
virtualisation. And by "really interesting" I of course mean horribly
broken :)

Basically the pro

Re: [PATCH v7 9/10] KVM: arm64: docs: document KVM support of pointer authentication

2019-03-20 Thread Kristina Martsenko
On 20/03/2019 13:37, Julien Thierry wrote:
> Hi Amit,
> 
> On 19/03/2019 08:30, Amit Daniel Kachhap wrote:
>> This adds sections for KVM API extension for pointer authentication.
>> A brief description about usage of pointer authentication for KVM guests
>> is added in the arm64 documentations.

[...]

>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index 7de9eee..b5c66bc 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -2659,6 +2659,12 @@ Possible features:
>>Depends on KVM_CAP_ARM_PSCI_0_2.
>>  - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
>>Depends on KVM_CAP_ARM_PMU_V3.
>> +- KVM_ARM_VCPU_PTRAUTH_ADDRESS:
>> +- KVM_ARM_VCPU_PTRAUTH_GENERIC:
>> +  Enables Pointer authentication for the CPU.
>> +  Depends on KVM_CAP_ARM_PTRAUTH and only on arm64 architecture. If
>> +  set, then the KVM guest allows the execution of pointer authentication
>> +  instructions. Otherwise, KVM treats these instructions as undefined.
>>  
> 
> Overall I feel one could easily get confused to whether
> PTRAUTH_ADDRESS/GENERIC are two individual features, whether one is a
> superset of the other, if the names are just an alias of one another, etc...
> 
> I think the doc should at least stress out that *both* flags are
> required to enable ptrauth in a guest. However it raises the question,
> if we don't plan to support the features individually (because we
> can't), should we really expose two feature flags? I seems odd to
> introduce two flags that only do something if used together...

Why can't we support the features individually? For example, if we ever
get a system where all CPUs support address authentication and none of
them support generic authentication, then we could still support address
authentication in the guest.

Thanks,
Kristina


[PATCH v2] kvm: arm: Fix handling of stage2 huge mappings

2019-03-20 Thread Suzuki K Poulose
We rely on the mmu_notifier call backs to handle the split/merge
of huge pages and thus we are guaranteed that, while creating a
block mapping, either the entire block is unmapped at stage2 or it
is missing permission.

However, we miss a case where the block mapping is split for the dirty
logging case and then could later be made a block mapping again, if we
cancel the dirty logging. This not only creates inconsistent TLB entries
for the pages in the block, but also leaks the table pages for the
PMD level.

Handle this corner case for the huge mappings at stage2 by
unmapping the non-huge mapping for the block. This could potentially
release the upper level table. So we need to restart the table walk
once we unmap the range.

Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages")
Reported-by: Zheng Xiang 
Cc: Zheng Xiang 
Cc: Zhenghui Yu 
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---

Changes since v1:
 - Fix build break on arm32
   - Add missing definitions for S2_PUD_*
   - Use kvm_pud_pfn() instead of pud_pfn()
 - Make PUD handling similar to PMD, by dropping the else case

 arch/arm/include/asm/stage2_pgtable.h |  2 ++
 virt/kvm/arm/mmu.c| 59 +--
 2 files changed, 45 insertions(+), 16 deletions(-)


diff --git a/arch/arm/include/asm/stage2_pgtable.h 
b/arch/arm/include/asm/stage2_pgtable.h
index de20895..9e11dce 100644
--- a/arch/arm/include/asm/stage2_pgtable.h
+++ b/arch/arm/include/asm/stage2_pgtable.h
@@ -75,6 +75,8 @@ static inline bool kvm_stage2_has_pud(struct kvm *kvm)
 
 #define S2_PMD_MASKPMD_MASK
 #define S2_PMD_SIZEPMD_SIZE
+#define S2_PUD_MASKPUD_MASK
+#define S2_PUD_SIZEPUD_SIZE
 
 static inline bool kvm_stage2_has_pmd(struct kvm *kvm)
 {
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index fce0983..97b5417 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1060,25 +1060,43 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 {
pmd_t *pmd, old_pmd;
 
+retry:
pmd = stage2_get_pmd(kvm, cache, addr);
VM_BUG_ON(!pmd);
 
old_pmd = *pmd;
+   /*
+* Multiple vcpus faulting on the same PMD entry, can
+* lead to them sequentially updating the PMD with the
+* same value. Following the break-before-make
+* (pmd_clear() followed by tlb_flush()) process can
+* hinder forward progress due to refaults generated
+* on missing translations.
+*
+* Skip updating the page table if the entry is
+* unchanged.
+*/
+   if (pmd_val(old_pmd) == pmd_val(*new_pmd))
+   return 0;
+
if (pmd_present(old_pmd)) {
/*
-* Multiple vcpus faulting on the same PMD entry, can
-* lead to them sequentially updating the PMD with the
-* same value. Following the break-before-make
-* (pmd_clear() followed by tlb_flush()) process can
-* hinder forward progress due to refaults generated
-* on missing translations.
+* If we already have PTE level mapping for this block,
+* we must unmap it to avoid inconsistent TLB state and
+* leaking the table page. We could end up in this situation
+* if the memory slot was marked for dirty logging and was
+* reverted, leaving PTE level mappings for the pages accessed
+* during the period. So, unmap the PTE level mapping for this
+* block and retry, as we could have released the upper level
+* table in the process.
 *
-* Skip updating the page table if the entry is
-* unchanged.
+* Normal THP split/merge follows mmu_notifier callbacks and do
+* get handled accordingly.
 */
-   if (pmd_val(old_pmd) == pmd_val(*new_pmd))
-   return 0;
-
+   if (!pmd_thp_or_huge(old_pmd)) {
+   unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE);
+   goto retry;
+   }
/*
 * Mapping in huge pages should only happen through a
 * fault.  If a page is merged into a transparent huge
@@ -1090,8 +1108,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 * should become splitting first, unmapped, merged,
 * and mapped back in on-demand.
 */
-   VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
-
+   WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
pmd_clear(pmd);
kvm_tlb_flush_vmid_ipa(kvm, addr);
} else {
@@ -1107,6 +1124,7 @@ 

Re: [RFC] Question about enable doorbell irq and halt_poll process

2019-03-20 Thread Heyi Guo

+cc Marc and Christoffer...


On 2019/3/19 21:25, Tangnianyao (ICT) wrote:

Hi, all

Using gicv4, when the guest is waiting for an irq, it executes wfi and traps
to kvm. When a vlpi is forwarded to the PE after its_vpe_deschedule, before
halt_poll in kvm_vcpu_block, halt_poll may increase the latency for this vlpi
to reach the guest. The halt_poll process checks whether there is a pending
irq for the vcpu using pending_last. However, the doorbell is not enabled at
this point, so neither the vlpi nor the doorbell can set pending_last to true
and stop halt_poll. Polling runs until the halt_poll time ends (if no other
physical irq arrives in the meantime), and then the vcpu is scheduled out.
The pending vlpi has to wait until the vcpu is scheduled in again.

Should we enable the doorbell before the halt_poll process?








Re: [PATCH v7 9/10] KVM: arm64: docs: document KVM support of pointer authentication

2019-03-20 Thread Julien Thierry
Hi Amit,

On 19/03/2019 08:30, Amit Daniel Kachhap wrote:
> This adds sections for KVM API extension for pointer authentication.
> A brief description about usage of pointer authentication for KVM guests
> is added in the arm64 documentations.
> 
> Signed-off-by: Amit Daniel Kachhap 
> Cc: Mark Rutland 
> Cc: Christoffer Dall 
> Cc: Marc Zyngier 
> Cc: kvmarm@lists.cs.columbia.edu
> ---
>  Documentation/arm64/pointer-authentication.txt | 15 +++
>  Documentation/virtual/kvm/api.txt  |  6 ++
>  2 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/arm64/pointer-authentication.txt 
> b/Documentation/arm64/pointer-authentication.txt
> index 5baca42..4b769e6 100644
> --- a/Documentation/arm64/pointer-authentication.txt
> +++ b/Documentation/arm64/pointer-authentication.txt
> @@ -87,7 +87,14 @@ used to get and set the keys for a thread.
>  Virtualization
>  --
>  
> -Pointer authentication is not currently supported in KVM guests. KVM
> -will mask the feature bits from ID_AA64ISAR1_EL1, and attempted use of
> -the feature will result in an UNDEFINED exception being injected into
> -the guest.
> +Pointer authentication is enabled in KVM guest when each virtual cpu is
> +initialised by passing flags KVM_ARM_VCPU_PTRAUTH_[ADDRESS/GENERIC] and
> +requesting this feature to be enabled. Without this flag, pointer

"Without these flags"*

> +authentication is not enabled in KVM guests and attempted use of the
> +feature will result in an UNDEFINED exception being injected into the
> +guest.
> +
> +Additionally, when these vcpu feature flags are not set then KVM will
> +filter out the Pointer Authentication system key registers from
> +KVM_GET/SET_REG_* ioctls and mask those features from cpufeature ID
> +register.
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 7de9eee..b5c66bc 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2659,6 +2659,12 @@ Possible features:
> Depends on KVM_CAP_ARM_PSCI_0_2.
>   - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
> Depends on KVM_CAP_ARM_PMU_V3.
> + - KVM_ARM_VCPU_PTRAUTH_ADDRESS:
> + - KVM_ARM_VCPU_PTRAUTH_GENERIC:
> +   Enables Pointer authentication for the CPU.
> +   Depends on KVM_CAP_ARM_PTRAUTH and only on arm64 architecture. If
> +   set, then the KVM guest allows the execution of pointer authentication
> +   instructions. Otherwise, KVM treats these instructions as undefined.
>  

Overall I feel one could easily get confused as to whether
PTRAUTH_ADDRESS/GENERIC are two individual features, whether one is a
superset of the other, if the names are just an alias of one another, etc...

I think the doc should at least stress that *both* flags are
required to enable ptrauth in a guest. However it raises the question,
if we don't plan to support the features individually (because we
can't), should we really expose two feature flags? It seems odd to
introduce two flags that only do something if used together...

Cheers,

-- 
Julien Thierry


Re: [PATCH v5 00/26] KVM: arm64: SVE guest support

2019-03-20 Thread Dave Martin
On Tue, Mar 19, 2019 at 05:51:51PM +, Dave Martin wrote:

[...]

> Known issues:
> 
>  * This update requires modifications to kvmtool that are not published
>yet.  I will reply to this cover letter with a link when those are
>available (hopefully within 24 hours of this posting).
> 
>The kvmtool branch referenced by the v5 cover letter **will not
>work** with v6...  Please be patient :)

I've now posted the updated kvmtool series [1].

I've also pushed the series to git for convenience [2].

Testing/comments welcome, as always.


[1] [PATCH kvmtool v2 0/3] arm64: Basic SVE guest support
https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035198.html

[2]
git://linux-arm.org/kvmtool-dm.git sve-linuxv6/v2/head
http://linux-arm.org/git?p=kvmtool-dm.git;a=shortlog;h=refs/heads/sve-linuxv6/v2/head

Cheers
---Dave


Re: [PATCH v7 7/10] KVM: arm/arm64: context-switch ptrauth registers

2019-03-20 Thread Julien Thierry
Hi Amit,

On 19/03/2019 08:30, Amit Daniel Kachhap wrote:
> From: Mark Rutland 
> 
> When pointer authentication is supported, a guest may wish to use it.
> This patch adds the necessary KVM infrastructure for this to work, with
> a semi-lazy context switch of the pointer auth state.
> 
> Pointer authentication feature is only enabled when VHE is built
> in the kernel and present in the CPU implementation so only VHE code
> paths are modified.
> 
> When we schedule a vcpu, we disable guest usage of pointer
> authentication instructions and accesses to the keys. While these are
> disabled, we avoid context-switching the keys. When we trap the guest
> trying to use pointer authentication functionality, we change to eagerly
> context-switching the keys, and enable the feature. The next time the
> vcpu is scheduled out/in, we start again. However the host key save is
> optimized and implemented inside ptrauth instruction/register access
> trap.
> 
> Pointer authentication consists of address authentication and generic
> authentication, and CPUs in a system might have varied support for
> either. Where support for either feature is not uniform, it is hidden
> from guests via ID register emulation, as a result of the cpufeature
> framework in the host.
> 
> Unfortunately, address authentication and generic authentication cannot
> be trapped separately, as the architecture provides a single EL2 trap
> covering both. If we wish to expose one without the other, we cannot
> prevent a (badly-written) guest from intermittently using a feature
> which is not uniformly supported (when scheduled on a physical CPU which
> supports the relevant feature). Hence, this patch expects both types of
> authentication to be present in a CPU.
> 
> This switch of key is done from guest enter/exit assembly as preparation
> for the upcoming in-kernel pointer authentication support. Hence, these
> key switching routines are not implemented in C code as they may cause
> pointer authentication key signing error in some situations.
> 
> Signed-off-by: Mark Rutland 
> [Only VHE, key switch in full assembly, vcpu_has_ptrauth checks
> , save host key in ptrauth exception trap]
> Signed-off-by: Amit Daniel Kachhap 
> Reviewed-by: Julien Thierry 
> Cc: Marc Zyngier 
> Cc: Christoffer Dall 
> Cc: kvmarm@lists.cs.columbia.edu
> ---
>  arch/arm64/include/asm/kvm_host.h|  17 ++
>  arch/arm64/include/asm/kvm_ptrauth_asm.h | 100 
> +++
>  arch/arm64/kernel/asm-offsets.c  |   6 ++
>  arch/arm64/kvm/guest.c   |  14 +
>  arch/arm64/kvm/handle_exit.c |  24 +---
>  arch/arm64/kvm/hyp/entry.S   |   7 +++
>  arch/arm64/kvm/reset.c   |   7 +++
>  arch/arm64/kvm/sys_regs.c|  46 +-
>  virt/kvm/arm/arm.c   |   2 +
>  9 files changed, 212 insertions(+), 11 deletions(-)
>  create mode 100644 arch/arm64/include/asm/kvm_ptrauth_asm.h
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 9dd2918..61239a6 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -152,6 +152,18 @@ enum vcpu_sysreg {
>   PMSWINC_EL0,/* Software Increment Register */
>   PMUSERENR_EL0,  /* User Enable Register */
>  
> + /* Pointer Authentication Registers */
> + APIAKEYLO_EL1,
> + APIAKEYHI_EL1,
> + APIBKEYLO_EL1,
> + APIBKEYHI_EL1,
> + APDAKEYLO_EL1,
> + APDAKEYHI_EL1,
> + APDBKEYLO_EL1,
> + APDBKEYHI_EL1,
> + APGAKEYLO_EL1,
> + APGAKEYHI_EL1,
> +
>   /* 32bit specific registers. Keep them at the end of the range */
>   DACR32_EL2, /* Domain Access Control Register */
>   IFSR32_EL2, /* Instruction Fault Status Register */
> @@ -497,6 +509,11 @@ static inline bool kvm_arch_requires_vhe(void)
>   test_bit(KVM_ARM_VCPU_PTRAUTH_ADDRESS, vcpu->arch.features) && \
>   test_bit(KVM_ARM_VCPU_PTRAUTH_GENERIC, vcpu->arch.features))
>  
> +void kvm_arm_vcpu_ptrauth_enable(struct kvm_vcpu *vcpu);
> +void kvm_arm_vcpu_ptrauth_disable(struct kvm_vcpu *vcpu);
> +void kvm_arm_vcpu_ptrauth_setup_lazy(struct kvm_vcpu *vcpu);
> +void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu);
> +
>  static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> diff --git a/arch/arm64/include/asm/kvm_ptrauth_asm.h 
> b/arch/arm64/include/asm/kvm_ptrauth_asm.h
> new file mode 100644
> index 000..97bb040
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_ptrauth_asm.h
> @@ -0,0 +1,100 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + * arch/arm64/include/asm/kvm_ptrauth_asm.h: Guest/host ptrauth save/restore
> + * Copyright 2019 Arm Limited
> + * Author: Mark Rutland 
> + * Amit Daniel Kachhap 
> + */
> +
> +#ifndef __ASM_KVM_ASM_PTRAUTH_H
> +#def

[PATCH kvmtool v2 3/3] arm64: Add command-line option to set the guest's SVE vector lengths

2019-03-20 Thread Dave Martin
In order to support use cases such as migration, it may be
important in some situations to restrict the set of SVE vector
lengths available to the guest.  It can also be useful to observe
the behaviour of guest OSes with different vector lengths.

To enable testing and experimentation for such configurations, this
patch adds a command-line option to allow setting of the set of
vector lengths to be made available to the guest.

For now, the setting is global: no means is offered to configure
individual guest vcpus independently of each other.

Signed-off-by: Dave Martin 
---

It wasn't clear to me what the most user-friendly form for the
 --sve-vls parameter would be.  Possibilities include:

--sve-max-vl=<vl>   # all host vector lengths not exceeding <vl>
--sve-vls=16,32,64  # i.e., byte ("vl") units, not 128-bit
--sve-vls=1-2,4 # i.e., similar syntax to taskset -c etc.
--sve-vls=16-32,64  # combination of the above
--sve-vqs=1,2,4 # for more consistent terminology

In the end, to avoid premature overengineering, I went with the
easiest choice.  (--sve-vls=1,2,4)

There's also the question of how to specify different features and
configurations per-vcpu.  I don't currently attempt this, but it would
be useful for KVM testing and development.  I think that any solution
to this shouldn't be SVE-specific: we should aim for a common approach
that can be reused for other features too.
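
For what it's worth, here is a small sketch of how the resulting bitmap
decodes back into vector lengths (same encoding that sve_vls_parser below
builds; the helper name and the printing are made up for illustration):

/* Illustrative only: bit (vq - SVE_VQ_MIN) selects a length of vq * 16 bytes. */
#include <stdio.h>

typedef unsigned long long u64;

#define SVE_VQ_MIN      1
#define SVE_VQ_MAX      512

static void print_selected_vls(const u64 *vqs)
{
        int vq;

        for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; vq++) {
                int bit = vq - SVE_VQ_MIN;

                if (vqs[bit / 64] & ((u64)1 << (bit % 64)))
                        printf("vector length: %d bytes\n", vq * 16);
        }
}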

---
 arm/aarch64/include/kvm/kvm-config-arch.h |  8 +++-
 arm/aarch64/kvm-cpu.c | 80 ++-
 arm/include/arm-common/kvm-config-arch.h  |  1 +
 3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/arm/aarch64/include/kvm/kvm-config-arch.h 
b/arm/aarch64/include/kvm/kvm-config-arch.h
index 50b7aae..1eb8cfc 100644
--- a/arm/aarch64/include/kvm/kvm-config-arch.h
+++ b/arm/aarch64/include/kvm/kvm-config-arch.h
@@ -1,6 +1,8 @@
 #ifndef KVM__KVM_CONFIG_ARCH_H
 #define KVM__KVM_CONFIG_ARCH_H
 
+int sve_vls_parser(const struct option *opt, const char *arg, int unset);
+
 #define ARM_OPT_ARCH_RUN(cfg)  \
OPT_BOOLEAN('\0', "aarch32", &(cfg)->aarch32_guest, \
"Run AArch32 guest"),   \
@@ -10,7 +12,11 @@
"Specify random seed for Kernel Address Space " \
"Layout Randomization (KASLR)"),\
OPT_BOOLEAN('\0', "sve", &(cfg)->has_sve,   \
-   "Enable SVE for the guest"),
+   "Enable SVE for the guest"),\
+   OPT_CALLBACK('\0', "sve-vls", &(cfg)->sve_vqs,  \
+"comma-separated list of vector lengths, in 128-bit 
units", \
+"Set of vector lengths to enable for the guest",   \
+sve_vls_parser, NULL),
 
 #include "arm-common/kvm-config-arch.h"
 
diff --git a/arm/aarch64/kvm-cpu.c b/arm/aarch64/kvm-cpu.c
index 43eb69e..d917c8b 100644
--- a/arm/aarch64/kvm-cpu.c
+++ b/arm/aarch64/kvm-cpu.c
@@ -1,8 +1,13 @@
+#include 
+#include 
+#include 
+
 #include "kvm/kvm-cpu.h"
 #include "kvm/kvm.h"
 #include "kvm/virtio.h"
 
 #include 
+#include 
 
 #define COMPAT_PSR_F_BIT   0x0040
 #define COMPAT_PSR_I_BIT   0x0080
@@ -12,6 +17,65 @@
 #define SCTLR_EL1_E0E_MASK (1 << 24)
 #define SCTLR_EL1_EE_MASK  (1 << 25)
 
+/*
+ * Work around old kernel headers that lack these definitions in
+ * :
+ */
+#ifndef SVE_VQ_MIN
+#define SVE_VQ_MIN 1
+#endif
+
+#ifndef SVE_VQ_MAX
+#define SVE_VQ_MAX 512
+#endif
+
+int sve_vls_parser(const struct option *opt, const char *arg, int unset)
+{
+   size_t offset = 0;
+   int vq, n, t;
+   u64 (*vqs)[(SVE_VQ_MAX + 1 - SVE_VQ_MIN + 63) / 64];
+   u64 **cfg_vqs = opt->value;
+
+   if (*cfg_vqs) {
+   pr_err("sve-vls: SVE vector lengths set may only be specified once");
+   return -1;
+   }
+
+   vqs = calloc(1, sizeof *vqs);
+   if (!vqs)
+   die("%s", strerror(errno));
+
+   offset = 0;
+   while (arg[offset]) {
+   n = -1;
+
+   t = sscanf(arg + offset,
+  offset == 0 ? "%i%n" : ",%i%n",
+  &vq, &n);
+   if (t == EOF || t < 1 || n <= 0) {
+   pr_err("sve-vls: Comma-separated list of vector lengths required");
+   goto error;
+   }
+
+   if (vq < SVE_VQ_MIN || vq > SVE_VQ_MAX) {
+   pr_err("sve-vls: Invalid vector length %d", vq);
+   goto error;
+   }
+
+   vq -= SVE_VQ_MIN;
+   (*vqs)[vq / 64] |= (u64)1 << (vq % 64);
+
+   offset += n;
+   }
+
+   *cfg_vqs = *vqs;
+   return 0;
+
+error:
+   free(vqs);
+   return -1;
+}
+
 static __u64 __co

[PATCH kvmtool v2 2/3] arm64: Add basic SVE support

2019-03-20 Thread Dave Martin
This patch adds an --sve command line option to allow the Scalable
Vector Extension to be enabled when creating a guest.

This requires use of the new KVM_ARM_VCPU_FINALIZE ioctl before the
vcpu is runnable, so a new hook kvm_cpu__configure_features() is
added to provide an appropriate place to do this work.

The kernel does not enable SVE by default, and for now kvmtool
adopts the same policy: without --sve, SVE is not enabled for the
guest even if the host supports it.

Signed-off-by: Dave Martin 
---
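For reviewers unfamiliar with the new ioctl: at the KVM interface level
this boils down to roughly the sequence sketched below (simplified,
error handling trimmed, helper name made up; not a verbatim copy of the
kvmtool code):

	/*
	 * Request SVE at vcpu init time, then finalize the vcpu
	 * configuration before it first runs.  Per the SVE KVM API, the
	 * SVE vector registers only become accessible after finalization.
	 */
	static int init_sve_vcpu(int vcpu_fd, struct kvm_vcpu_init *init)
	{
		int feature = KVM_ARM_VCPU_SVE;

		init->features[0] |= 1 << KVM_ARM_VCPU_SVE;

		if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, init))
			return -1;

		/* Vector lengths can be configured here, before finalization. */

		return ioctl(vcpu_fd, KVM_ARM_VCPU_FINALIZE, &feature);
	}
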
 arm/aarch32/include/kvm/kvm-cpu-arch.h|  5 +
 arm/aarch64/include/kvm/kvm-config-arch.h |  4 +++-
 arm/aarch64/include/kvm/kvm-cpu-arch.h|  7 ++-
 arm/aarch64/kvm-cpu.c | 19 +++
 arm/include/arm-common/kvm-config-arch.h  |  1 +
 arm/kvm-cpu.c |  3 +++
 6 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/arm/aarch32/include/kvm/kvm-cpu-arch.h 
b/arm/aarch32/include/kvm/kvm-cpu-arch.h
index d28ea67..04740d4 100644
--- a/arm/aarch32/include/kvm/kvm-cpu-arch.h
+++ b/arm/aarch32/include/kvm/kvm-cpu-arch.h
@@ -13,4 +13,9 @@
 #define ARM_CPU_ID 0, 0, 0
 #define ARM_CPU_ID_MPIDR   5
 
+static inline int kvm_cpu__configure_features(struct kvm_cpu *vcpu)
+{
+   return 0;
+}
+
 #endif /* KVM__KVM_CPU_ARCH_H */
diff --git a/arm/aarch64/include/kvm/kvm-config-arch.h 
b/arm/aarch64/include/kvm/kvm-config-arch.h
index 04be43d..50b7aae 100644
--- a/arm/aarch64/include/kvm/kvm-config-arch.h
+++ b/arm/aarch64/include/kvm/kvm-config-arch.h
@@ -8,7 +8,9 @@
"Create PMUv3 device"), \
OPT_U64('\0', "kaslr-seed", &(cfg)->kaslr_seed, \
"Specify random seed for Kernel Address Space " \
-   "Layout Randomization (KASLR)"),
+   "Layout Randomization (KASLR)"),\
+   OPT_BOOLEAN('\0', "sve", &(cfg)->has_sve,   \
+   "Enable SVE for the guest"),
 
 #include "arm-common/kvm-config-arch.h"
 
diff --git a/arm/aarch64/include/kvm/kvm-cpu-arch.h 
b/arm/aarch64/include/kvm/kvm-cpu-arch.h
index a9d8563..7f2bfb9 100644
--- a/arm/aarch64/include/kvm/kvm-cpu-arch.h
+++ b/arm/aarch64/include/kvm/kvm-cpu-arch.h
@@ -8,13 +8,18 @@
 #define ARM_VCPU_FEATURE_FLAGS(kvm, cpuid) {   \
 [0] = ((!!(cpuid) << KVM_ARM_VCPU_POWER_OFF) |  \
   (!!(kvm)->cfg.arch.aarch32_guest << KVM_ARM_VCPU_EL1_32BIT) | \
-  (!!(kvm)->cfg.arch.has_pmuv3 << KVM_ARM_VCPU_PMU_V3)) \
+  (!!(kvm)->cfg.arch.has_pmuv3 << KVM_ARM_VCPU_PMU_V3) |   \
+  (!!(kvm)->cfg.arch.has_sve << KVM_ARM_VCPU_SVE)) \
 }
 
+
+
 #define ARM_MPIDR_HWID_BITMASK 0xFF00FFUL
 #define ARM_CPU_ID 3, 0, 0, 0
 #define ARM_CPU_ID_MPIDR   5
 #define ARM_CPU_CTRL   3, 0, 1, 0
 #define ARM_CPU_CTRL_SCTLR_EL1 0
 
+int kvm_cpu__configure_features(struct kvm_cpu *vcpu);
+
 #endif /* KVM__KVM_CPU_ARCH_H */
diff --git a/arm/aarch64/kvm-cpu.c b/arm/aarch64/kvm-cpu.c
index 0aaefaf..43eb69e 100644
--- a/arm/aarch64/kvm-cpu.c
+++ b/arm/aarch64/kvm-cpu.c
@@ -128,6 +128,25 @@ static void reset_vcpu_aarch64(struct kvm_cpu *vcpu)
}
 }
 
+static int configure_sve(struct kvm_cpu *vcpu)
+{
+   int feature = KVM_ARM_VCPU_SVE;
+
+   if (ioctl(vcpu->vcpu_fd, KVM_ARM_VCPU_FINALIZE, &feature))
+   return -1;
+
+   return 0;
+}
+
+int kvm_cpu__configure_features(struct kvm_cpu *vcpu)
+{
+   if (vcpu->kvm->cfg.arch.has_sve)
+   if (configure_sve(vcpu))
+   return -1;
+
+   return 0;
+}
+
 void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu)
 {
if (vcpu->kvm->cfg.arch.aarch32_guest)
diff --git a/arm/include/arm-common/kvm-config-arch.h 
b/arm/include/arm-common/kvm-config-arch.h
index 5734c46..bdab680 100644
--- a/arm/include/arm-common/kvm-config-arch.h
+++ b/arm/include/arm-common/kvm-config-arch.h
@@ -12,6 +12,7 @@ struct kvm_config_arch {
u64 kaslr_seed;
enum irqchip_type irqchip;
u64 fw_addr;
+   boolhas_sve;
 };
 
 int irqchip_parser(const struct option *opt, const char *arg, int unset);
diff --git a/arm/kvm-cpu.c b/arm/kvm-cpu.c
index 7780251..91d5241 100644
--- a/arm/kvm-cpu.c
+++ b/arm/kvm-cpu.c
@@ -122,6 +122,9 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, 
unsigned long cpu_id)
vcpu->cpu_compatible= target->compatible;
vcpu->is_running= true;
 
+   if (kvm_cpu__configure_features(vcpu))
+   die("Unable to configure requested vcpu features");
+
return vcpu;
 }
 
-- 
2.1.4



[PATCH kvmtool v2 1/3] arm/arm64: [HACK] Sync kernel headers from sve-guest v6

2019-03-20 Thread Dave Martin
This patch imports the necessary UAPI header definitions for
building the SVE extensions for kvmtool.

This commit should be replaced by a header import from upstream
Linux once the SVE kvm patches have been merged.

Signed-off-by: Dave Martin 
---
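To illustrate how the imported definitions compose (example only, not
part of the header import; the helper name is made up): each SVE Z
register is exposed as 2048-bit slices, so reading the first slice of
the guest's Z0 with the new register IDs looks roughly like this:

	/* Read the first 2048-bit (256-byte) slice of Z0 via
	 * KVM_GET_ONE_REG, using KVM_REG_ARM64_SVE_ZREG() defined below.
	 */
	static int get_z0_slice0(int vcpu_fd, __u8 buf[256])
	{
		struct kvm_one_reg reg = {
			.id	= KVM_REG_ARM64_SVE_ZREG(0, 0),
			.addr	= (__u64)(unsigned long)buf,
		};

		return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
	}
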
 arm/aarch64/include/asm/kvm.h | 22 ++
 include/linux/kvm.h   |  5 +
 2 files changed, 27 insertions(+)

diff --git a/arm/aarch64/include/asm/kvm.h b/arm/aarch64/include/asm/kvm.h
index 97c3478..6963b7e 100644
--- a/arm/aarch64/include/asm/kvm.h
+++ b/arm/aarch64/include/asm/kvm.h
@@ -102,6 +102,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_EL1_32BIT 1 /* CPU running a 32bit VM */
 #define KVM_ARM_VCPU_PSCI_0_2  2 /* CPU uses PSCI v0.2 */
 #define KVM_ARM_VCPU_PMU_V33 /* Support guest PMUv3 */
+#define KVM_ARM_VCPU_SVE   4 /* enable SVE for this CPU */
 
 struct kvm_vcpu_init {
__u32 target;
@@ -226,6 +227,27 @@ struct kvm_vcpu_events {
 KVM_REG_ARM_FW | ((r) & 0x))
 #define KVM_REG_ARM_PSCI_VERSION   KVM_REG_ARM_FW_REG(0)
 
+/* SVE registers */
+#define KVM_REG_ARM64_SVE  (0x15 << KVM_REG_ARM_COPROC_SHIFT)
+
+/* Z- and P-regs occupy blocks at the following offsets within this range: */
+#define KVM_REG_ARM64_SVE_ZREG_BASE0
+#define KVM_REG_ARM64_SVE_PREG_BASE0x400
+
+#define KVM_REG_ARM64_SVE_ZREG(n, i)   (KVM_REG_ARM64 | KVM_REG_ARM64_SVE | \
+KVM_REG_ARM64_SVE_ZREG_BASE |  \
+KVM_REG_SIZE_U2048 |   \
+((n) << 5) | (i))
+#define KVM_REG_ARM64_SVE_PREG(n, i)   (KVM_REG_ARM64 | KVM_REG_ARM64_SVE | \
+KVM_REG_ARM64_SVE_PREG_BASE |  \
+KVM_REG_SIZE_U256 |\
+((n) << 5) | (i))
+#define KVM_REG_ARM64_SVE_FFR(i)   KVM_REG_ARM64_SVE_PREG(16, i)
+
+/* Vector lengths pseudo-register: */
+#define KVM_REG_ARM64_SVE_VLS  (KVM_REG_ARM64 | KVM_REG_ARM64_SVE | \
+KVM_REG_SIZE_U512 | 0x)
+
 /* Device Control API: ARM VGIC */
 #define KVM_DEV_ARM_VGIC_GRP_ADDR  0
 #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS 1
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 6d4ea4b..1d56444 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -988,6 +988,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_VM_IPA_SIZE 165
 #define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 166
 #define KVM_CAP_HYPERV_CPUID 167
+#define KVM_CAP_ARM_SVE 168
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1145,6 +1146,7 @@ struct kvm_dirty_tlb {
 #define KVM_REG_SIZE_U256  0x0050ULL
 #define KVM_REG_SIZE_U512  0x0060ULL
 #define KVM_REG_SIZE_U1024 0x0070ULL
+#define KVM_REG_SIZE_U2048 0x0080ULL
 
 struct kvm_reg_list {
__u64 n; /* number of regs */
@@ -1440,6 +1442,9 @@ struct kvm_enc_region {
 /* Available with KVM_CAP_HYPERV_CPUID */
 #define KVM_GET_SUPPORTED_HV_CPUID _IOWR(KVMIO, 0xc1, struct kvm_cpuid2)
 
+/* Available with KVM_CAP_ARM_SVE */
+#define KVM_ARM_VCPU_FINALIZE  _IOW(KVMIO,  0xc2, int)
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
/* Guest initialization commands */
-- 
2.1.4



[PATCH kvmtool v2 0/3] arm64: Basic SVE guest support

2019-03-20 Thread Dave Martin
This series, based on kvmtool master [1], contains development hacks
to accompany the recently posted v6 SVE guest support series for the
Linux kernel [2].

This series supersedes the previous posting [3], and is a clean(er)
rewrite of the minimal support required for SVE, without the debug/
testing cruft.

Due to an ABI change, this series does not work with the older (v5)
KVM series.

Testing and review welcome.

In particular, if people have a view on the --sve-vls parameter, I'd
be interested to discuss what that should look like.  See patch 3.

[1] 
git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git master
https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/log/

Specifically, commit c57e001a3efb ("arm: Auto-detect guest GIC type").

[2] [PATCH v5 00/26] KVM: arm64: SVE guest support
https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/thread.html

(Note, the subject line for this posting was incorrect.  It should have
read: [PATCH v6 00/27] KVM: arm64: SVE guest support)

git://linux-arm.org/linux-dm.git sve-kvm/v6/head
http://linux-arm.org/git?p=linux-dm.git;a=shortlog;h=refs/heads/sve-kvm/v6/head


Dave Martin (3):
  arm/arm64: [HACK] Sync kernel headers from sve-guest v6
  arm64: Add basic SVE support
  arm64: Add command-line option to set the guest's SVE vector lengths

 arm/aarch32/include/kvm/kvm-cpu-arch.h|  5 ++
 arm/aarch64/include/asm/kvm.h | 22 +++
 arm/aarch64/include/kvm/kvm-config-arch.h | 10 +++-
 arm/aarch64/include/kvm/kvm-cpu-arch.h|  7 ++-
 arm/aarch64/kvm-cpu.c | 97 +++
 arm/include/arm-common/kvm-config-arch.h  |  2 +
 arm/kvm-cpu.c |  3 +
 include/linux/kvm.h   |  5 ++
 8 files changed, 149 insertions(+), 2 deletions(-)

-- 
2.1.4



Re: [PATCH] kvm: arm: Fix handling of stage2 huge mappings

2019-03-20 Thread Suzuki K Poulose

Marc,

On 20/03/2019 10:35, Marc Zyngier wrote:

On Wed, 20 Mar 2019 10:23:39 +
Suzuki K Poulose  wrote:

Hi Suzuki,


Marc,

On 20/03/2019 10:11, Marc Zyngier wrote:

On Wed, 20 Mar 2019 09:44:38 +
Suzuki K Poulose  wrote:
   

Hi Marc,

On 20/03/2019 08:15, Marc Zyngier wrote:

Hi Suzuki,

On Tue, 19 Mar 2019 14:11:08 +,
Suzuki K Poulose  wrote:


...


+   if (!pmd_thp_or_huge(old_pmd)) {
+   unmap_stage2_range(kvm, addr & S2_PMD_MASK, 
S2_PMD_SIZE);
+   goto retry;
  



+   if (!stage2_pud_huge(kvm, old_pud)) {
+   unmap_stage2_range(kvm, addr & S2_PUD_MASK, 
S2_PUD_SIZE);
  



We should really get rid of the S2_P{U/M}D_* definitions, as they are
always the same as the host. The only thing that changes is the PGD size
which varies according to the IPA and the concatenation.
  


Also what do you think about using  P{M,U}D_* instead of S2_P{M,U}D_*
above ? I could make that change with the respin.


Given that this is a fix, I'd like it to be as small and obvious as
possible, making it easier to backport.

I'm happy to take another patch for 5.2 that will drop the whole S2_P*
if we still think that this should be the case (though what I'd really
like is to have architectural levels instead of these arbitrary
definitions).


I only meant the two new instances added above in the patch. Of course, I
could send something to fix the existing ones.

Cheers
Suzuki


Re: [PATCH] kvm: arm: Fix handling of stage2 huge mappings

2019-03-20 Thread Marc Zyngier
On Wed, 20 Mar 2019 10:23:39 +
Suzuki K Poulose  wrote:

Hi Suzuki,

> Marc,
> 
> On 20/03/2019 10:11, Marc Zyngier wrote:
> > On Wed, 20 Mar 2019 09:44:38 +
> > Suzuki K Poulose  wrote:
> >   
> >> Hi Marc,
> >>
> >> On 20/03/2019 08:15, Marc Zyngier wrote:  
> >>> Hi Suzuki,
> >>>
> >>> On Tue, 19 Mar 2019 14:11:08 +,
> >>> Suzuki K Poulose  wrote:  
> 
>  We rely on the mmu_notifier call backs to handle the split/merge
>  of huge pages and thus we are guaranteed that, while creating a
>  block mapping, either the entire block is unmapped at stage2 or it
>  is missing permission.
> 
>  However, we miss a case where the block mapping is split for dirty
>  logging case and then could later be made block mapping, if we cancel the
>  dirty logging. This not only creates inconsistent TLB entries for
>  the pages in the block, but also leaks the table pages for
>  PMD level.
> 
>  Handle this corner case for the huge mappings at stage2 by
>  unmapping the non-huge mapping for the block. This could potentially
>  release the upper level table. So we need to restart the table walk
>  once we unmap the range.
> 
>  Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages")
>  Reported-by: Zheng Xiang 
>  Cc: Zheng Xiang 
>  Cc: Zhengui Yu 
>  Cc: Marc Zyngier 
>  Cc: Christoffer Dall 
>  Signed-off-by: Suzuki K Poulose ...  
> 
> 
>  +if (!pmd_thp_or_huge(old_pmd)) {
>  +unmap_stage2_range(kvm, addr & S2_PMD_MASK, 
>  S2_PMD_SIZE);
>  +goto retry;  
> >>>  
> 
>  +if (!stage2_pud_huge(kvm, old_pud)) {
>  +unmap_stage2_range(kvm, addr & S2_PUD_MASK, 
>  S2_PUD_SIZE);  
> >>>  
> 
> >> We should really get rid of the S2_P{U/M}D_* definitions, as they are
> >> always the same as the host. The only thing that changes is the PGD size
> >> which varies according to the IPA and the concatenation.
> >>  
> 
> Also what do you think about using  P{M,U}D_* instead of S2_P{M,U}D_*
> above ? I could make that change with the respin.

Given that this is a fix, I'd like it to be as small and obvious as
possible, making it easier to backport.

I'm happy to take another patch for 5.2 that will drop the whole S2_P*
if we still think that this should be the case (though what I'd really
like is to have architectural levels instead of these arbitrary
definitions).

Thanks,

M.
-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH] kvm: arm: Fix handling of stage2 huge mappings

2019-03-20 Thread Suzuki K Poulose

Marc,

On 20/03/2019 10:11, Marc Zyngier wrote:

On Wed, 20 Mar 2019 09:44:38 +
Suzuki K Poulose  wrote:


Hi Marc,

On 20/03/2019 08:15, Marc Zyngier wrote:

Hi Suzuki,

On Tue, 19 Mar 2019 14:11:08 +,
Suzuki K Poulose  wrote:


We rely on the mmu_notifier call backs to handle the split/merge
of huge pages and thus we are guaranteed that, while creating a
block mapping, either the entire block is unmapped at stage2 or it
is missing permission.

However, we miss a case where the block mapping is split for dirty
logging case and then could later be made block mapping, if we cancel the
dirty logging. This not only creates inconsistent TLB entries for
the pages in the block, but also leaks the table pages for
PMD level.

Handle this corner case for the huge mappings at stage2 by
unmapping the non-huge mapping for the block. This could potentially
release the upper level table. So we need to restart the table walk
once we unmap the range.

Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages")
Reported-by: Zheng Xiang 
Cc: Zheng Xiang 
Cc: Zhengui Yu 
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose ...




+   if (!pmd_thp_or_huge(old_pmd)) {
+   unmap_stage2_range(kvm, addr & S2_PMD_MASK, 
S2_PMD_SIZE);
+   goto retry;





+   if (!stage2_pud_huge(kvm, old_pud)) {
+   unmap_stage2_range(kvm, addr & S2_PUD_MASK, 
S2_PUD_SIZE);





We should really get rid of the S2_P{U/M}D_* definitions, as they are
always the same as the host. The only thing that changes is the PGD size
which varies according to the IPA and the concatenation.



Also what do you think about using  P{M,U}D_* instead of S2_P{M,U}D_*
above ? I could make that change with the respin.



Sure, feel free to send a fixed version. I'll drop the currently queued
patch.




Thanks. Sorry for the trouble.

Cheers
Suzuki


Re: [PATCH] kvm: arm: Fix handling of stage2 huge mappings

2019-03-20 Thread Marc Zyngier
On Wed, 20 Mar 2019 09:44:38 +
Suzuki K Poulose  wrote:

> Hi Marc,
> 
> On 20/03/2019 08:15, Marc Zyngier wrote:
> > Hi Suzuki,
> > 
> > On Tue, 19 Mar 2019 14:11:08 +,
> > Suzuki K Poulose  wrote:  
> >>
> >> We rely on the mmu_notifier call backs to handle the split/merge
> >> of huge pages and thus we are guaranteed that, while creating a
> >> block mapping, either the entire block is unmapped at stage2 or it
> >> is missing permission.
> >>
> >> However, we miss a case where the block mapping is split for dirty
> >> logging case and then could later be made block mapping, if we cancel the
> >> dirty logging. This not only creates inconsistent TLB entries for
> >> the pages in the block, but also leaks the table pages for
> >> PMD level.
> >>
> >> Handle this corner case for the huge mappings at stage2 by
> >> unmapping the non-huge mapping for the block. This could potentially
> >> release the upper level table. So we need to restart the table walk
> >> once we unmap the range.
> >>
> >> Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages")
> >> Reported-by: Zheng Xiang 
> >> Cc: Zheng Xiang 
> >> Cc: Zhengui Yu 
> >> Cc: Marc Zyngier 
> >> Cc: Christoffer Dall 
> >> Signed-off-by: Suzuki K Poulose ...  
> 
> >> +retry:
> >>pmd = stage2_get_pmd(kvm, cache, addr);
> >>VM_BUG_ON(!pmd);
> >>   ...  
> 
> >>if (pmd_present(old_pmd)) {
> >>/*
> >> -   * Multiple vcpus faulting on the same PMD entry, can
> >> -   * lead to them sequentially updating the PMD with the
> >> -   * same value. Following the break-before-make
> >> -   * (pmd_clear() followed by tlb_flush()) process can
> >> -   * hinder forward progress due to refaults generated
> >> -   * on missing translations.
> >> +   * If we already have PTE level mapping for this block,
> >> +   * we must unmap it to avoid inconsistent TLB state and
> >> +   * leaking the table page. We could end up in this situation
> >> +   * if the memory slot was marked for dirty logging and was
> >> +   * reverted, leaving PTE level mappings for the pages accessed
> >> +   * during the period. So, unmap the PTE level mapping for this
> >> +   * block and retry, as we could have released the upper level
> >> +   * table in the process.
> >> *
> >> -   * Skip updating the page table if the entry is
> >> -   * unchanged.
> >> +   * Normal THP split/merge follows mmu_notifier callbacks and do
> >> +   * get handled accordingly.
> >> */
> >> -  if (pmd_val(old_pmd) == pmd_val(*new_pmd))
> >> -  return 0;
> >> -
> >> +  if (!pmd_thp_or_huge(old_pmd)) {
> >> +  unmap_stage2_range(kvm, addr & S2_PMD_MASK, 
> >> S2_PMD_SIZE);
> >> +  goto retry;  
> > 
> > This looks slightly dodgy. Doing this retry results in another call to
> > stage2_get_pmd(), which may or may not result in allocating a PUD. I
> > think this is safe as if we managed to get here, it means the whole
> > hierarchy was already present and nothing was allocated in the first
> > round.
> > 
> > Somehow, I would feel more comfortable with just not even trying.
> > Unmap, don't fix the fault, let the vcpu come again for additional
> > punishment. But this is probably more invasive, as none of the
> > stage2_set_p*() return value is ever evaluated. Oh well.
> >   
> 
> Yes. The other option was to unmap_stage2_ptes() and get the page refcount
> on the new pmd. But that kind of makes it a bit difficult to follow the
> code.
> 
> >>if (stage2_pud_present(kvm, old_pud)) {
> >> -  stage2_pud_clear(kvm, pudp);
> >> -  kvm_tlb_flush_vmid_ipa(kvm, addr);
> >> +  /*
> >> +   * If we already have table level mapping for this block, unmap
> >> +   * the range for this block and retry.
> >> +   */
> >> +  if (!stage2_pud_huge(kvm, old_pud)) {
> >> +  unmap_stage2_range(kvm, addr & S2_PUD_MASK, 
> >> S2_PUD_SIZE);  
> > 
> > This broke 32bit. I've added the following hunk to fix it:  
> 
> Grrr! Sorry about that.
> 
> > 
> > diff --git a/arch/arm/include/asm/stage2_pgtable.h 
> > b/arch/arm/include/asm/stage2_pgtable.h
> > index de2089501b8b..b8f21088a744 100644
> > --- a/arch/arm/include/asm/stage2_pgtable.h
> > +++ b/arch/arm/include/asm/stage2_pgtable.h
> > @@ -68,6 +68,9 @@ stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, 
> > phys_addr_t end)
> >   #define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp)
> >   #define stage2_pud_table_empty(kvm, pudp) false  
> >   > +#define S2_PUD_MASKPGDIR_MASK  
> > +#define S2_PUD_SIZEPGDIR_SIZE
> > +  
> 
> We should really get rid of the S2_P{U/M}D_* definitions, as they are
> always the same as the host. The only thing that changes is the PGD size
> wh

Re: [PATCH] kvm: arm: Fix handling of stage2 huge mappings

2019-03-20 Thread Suzuki K Poulose

Hi Marc,

On 20/03/2019 08:15, Marc Zyngier wrote:

Hi Suzuki,

On Tue, 19 Mar 2019 14:11:08 +,
Suzuki K Poulose  wrote:


We rely on the mmu_notifier call backs to handle the split/merge
of huge pages and thus we are guaranteed that, while creating a
block mapping, either the entire block is unmapped at stage2 or it
is missing permission.

However, we miss a case where the block mapping is split for dirty
logging case and then could later be made block mapping, if we cancel the
dirty logging. This not only creates inconsistent TLB entries for
the pages in the block, but also leaks the table pages for
PMD level.

Handle this corner case for the huge mappings at stage2 by
unmapping the non-huge mapping for the block. This could potentially
release the upper level table. So we need to restart the table walk
once we unmap the range.

Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages")
Reported-by: Zheng Xiang 
Cc: Zheng Xiang 
Cc: Zhengui Yu 
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 


...


+retry:
pmd = stage2_get_pmd(kvm, cache, addr);
VM_BUG_ON(!pmd);
  


...


if (pmd_present(old_pmd)) {
/*
-* Multiple vcpus faulting on the same PMD entry, can
-* lead to them sequentially updating the PMD with the
-* same value. Following the break-before-make
-* (pmd_clear() followed by tlb_flush()) process can
-* hinder forward progress due to refaults generated
-* on missing translations.
+* If we already have PTE level mapping for this block,
+* we must unmap it to avoid inconsistent TLB state and
+* leaking the table page. We could end up in this situation
+* if the memory slot was marked for dirty logging and was
+* reverted, leaving PTE level mappings for the pages accessed
+* during the period. So, unmap the PTE level mapping for this
+* block and retry, as we could have released the upper level
+* table in the process.
 *
-* Skip updating the page table if the entry is
-* unchanged.
+* Normal THP split/merge follows mmu_notifier callbacks and do
+* get handled accordingly.
 */
-   if (pmd_val(old_pmd) == pmd_val(*new_pmd))
-   return 0;
-
+   if (!pmd_thp_or_huge(old_pmd)) {
+   unmap_stage2_range(kvm, addr & S2_PMD_MASK, 
S2_PMD_SIZE);
+   goto retry;


This looks slightly dodgy. Doing this retry results in another call to
stage2_get_pmd(), which may or may not result in allocating a PUD. I
think this is safe as if we managed to get here, it means the whole
hierarchy was already present and nothing was allocated in the first
round.

Somehow, I would feel more comfortable with just not even trying.
Unmap, don't fix the fault, let the vcpu come again for additional
punishment. But this is probably more invasive, as none of the
stage2_set_p*() return value is ever evaluated. Oh well.



Yes. The other option was to unmap_stage2_ptes() and get the page refcount
on the new pmd. But that kind of makes it a bit difficult to follow the
code.


if (stage2_pud_present(kvm, old_pud)) {
-   stage2_pud_clear(kvm, pudp);
-   kvm_tlb_flush_vmid_ipa(kvm, addr);
+   /*
+* If we already have table level mapping for this block, unmap
+* the range for this block and retry.
+*/
+   if (!stage2_pud_huge(kvm, old_pud)) {
+   unmap_stage2_range(kvm, addr & S2_PUD_MASK, 
S2_PUD_SIZE);


This broke 32bit. I've added the following hunk to fix it:


Grrr! Sorry about that.



diff --git a/arch/arm/include/asm/stage2_pgtable.h 
b/arch/arm/include/asm/stage2_pgtable.h
index de2089501b8b..b8f21088a744 100644
--- a/arch/arm/include/asm/stage2_pgtable.h
+++ b/arch/arm/include/asm/stage2_pgtable.h
@@ -68,6 +68,9 @@ stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, 
phys_addr_t end)
  #define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp)
  #define stage2_pud_table_empty(kvm, pudp) false
  
+#define S2_PUD_MASKPGDIR_MASK

+#define S2_PUD_SIZEPGDIR_SIZE
+


We should really get rid of the S2_P{U/M}D_* definitions, as they are
always the same as the host. The only thing that changes is the PGD size
which varies according to the IPA and the concatenation.


  static inline bool kvm_stage2_has_pud(struct kvm *kvm)
  {
return false;


+   goto retry;
+   } else {
+   WARN_ON_ONCE(pud_pfn(old_pud) != pud_pfn(*new_pudp));
+   stage2_pud_clear(kvm, pudp);
+   kvm_tlb_flush_vmid_ipa(k

Re: [PATCH v7 3/10] KVM: arm64: Move hyp_symbol_addr to fix dependency

2019-03-20 Thread Julien Thierry
Hi Amit,

On 19/03/2019 08:30, Amit Daniel Kachhap wrote:
> Currently hyp_symbol_addr is placed in kvm_mmu.h which is mostly
> used by __hyp_this_cpu_ptr in kvm_asm.h but it cannot include
> kvm_mmu.h directly as kvm_mmu.h uses kvm_ksym_ref which is
> defined inside kvm_asm.h. Hence, hyp_symbol_addr is moved inside
> kvm_asm.h to fix this dependency on each other.
> 
> Also kvm_ksym_ref is corresponding counterpart of hyp_symbol_addr
> so should be ideally placed inside kvm_asm.h.
> 

This part is a bit confusing; it led me to think that kvm_ksym_ref was
in kvm_mmu.h and should be moved to kvm_asm.h as well. I'd suggest
rephrasing it with something along the lines of:

"Also, hyp_symbol_addr corresponding counterpart, kvm_ksym_ref, is
already in kvm_asm.h, making it more sensible to move kvm_symbol_addr to
the same file."

Otherwise:

Reviewed-by: Julien Thierry 

Cheers,

Julien

> Suggested by: James Morse 
> Signed-off-by: Amit Daniel Kachhap 
> Cc: Marc Zyngier 
> Cc: Christoffer Dall 
> Cc: kvmarm@lists.cs.columbia.edu
> ---
>  arch/arm64/include/asm/kvm_asm.h | 20 
>  arch/arm64/include/asm/kvm_mmu.h | 20 
>  2 files changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index f5b79e9..57a07e8 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -80,6 +80,26 @@ extern void __vgic_v3_init_lrs(void);
>  
>  extern u32 __kvm_get_mdcr_el2(void);
>  
> +/*
> + * Obtain the PC-relative address of a kernel symbol
> + * s: symbol
> + *
> + * The goal of this macro is to return a symbol's address based on a
> + * PC-relative computation, as opposed to a loading the VA from a
> + * constant pool or something similar. This works well for HYP, as an
> + * absolute VA is guaranteed to be wrong. Only use this if trying to
> + * obtain the address of a symbol (i.e. not something you obtained by
> + * following a pointer).
> + */
> +#define hyp_symbol_addr(s)   \
> + ({  \
> + typeof(s) *addr;\
> + asm("adrp   %0, %1\n"   \
> + "add%0, %0, :lo12:%1\n" \
> + : "=r" (addr) : "S" (&s));  \
> + addr;   \
> + })
> +
>  /* Home-grown __this_cpu_{ptr,read} variants that always work at HYP */
>  #define __hyp_this_cpu_ptr(sym)  
> \
>   ({  \
> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
> b/arch/arm64/include/asm/kvm_mmu.h
> index b0742a1..3dea6af 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -118,26 +118,6 @@ static inline unsigned long __kern_hyp_va(unsigned long 
> v)
>  #define kern_hyp_va(v)   ((typeof(v))(__kern_hyp_va((unsigned long)(v
>  
>  /*
> - * Obtain the PC-relative address of a kernel symbol
> - * s: symbol
> - *
> - * The goal of this macro is to return a symbol's address based on a
> - * PC-relative computation, as opposed to a loading the VA from a
> - * constant pool or something similar. This works well for HYP, as an
> - * absolute VA is guaranteed to be wrong. Only use this if trying to
> - * obtain the address of a symbol (i.e. not something you obtained by
> - * following a pointer).
> - */
> -#define hyp_symbol_addr(s)   \
> - ({  \
> - typeof(s) *addr;\
> - asm("adrp   %0, %1\n"   \
> - "add%0, %0, :lo12:%1\n" \
> - : "=r" (addr) : "S" (&s));  \
> - addr;   \
> - })
> -
> -/*
>   * We currently support using a VM-specified IPA size. For backward
>   * compatibility, the default IPA size is fixed to 40bits.
>   */
> 

-- 
Julien Thierry


Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2

2019-03-20 Thread Leo Yan
On Tue, Mar 19, 2019 at 09:33:58AM +0800, Leo Yan wrote:

[...]

> So far, it still cannot work well if I only set "sky2.disable_msi=1"
> on the host kernel command line; with this config it runs with the flow
> below, which cannot forward interrupts properly from host to guest:
> 
> host                        guest
> 
>   INTx mode   msi enable
>   msi disable
>   Switch back to INTx mode

Just a heads up: for enabling vfio-pci with a NIC device on the Juno
board, I sent out two patch sets for the related changes.  With these
changes, INTx mode works well, and the failure case described above is
also handled.

  kvmtool changes:
  https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035186.html

  Juno DT binding for PCIe SMMU:
  
https://archive.armlinux.org.uk/lurker/message/20190320.083105.f002c91c.en.html

@Robin, if you find an issue with the Juno DT binding in my patch and
want to send your own patch to fix it, please feel free to let me know.
I am happy to test it and drop my own.  Thanks!

Thanks,
Leo Yan


Re: [PATCH] kvm: arm: Fix handling of stage2 huge mappings

2019-03-20 Thread Marc Zyngier
Hi Suzuki,

On Tue, 19 Mar 2019 14:11:08 +,
Suzuki K Poulose  wrote:
> 
> We rely on the mmu_notifier call backs to handle the split/merge
> of huge pages and thus we are guaranteed that, while creating a
> block mapping, either the entire block is unmapped at stage2 or it
> is missing permission.
> 
> However, we miss a case where the block mapping is split for dirty
> logging case and then could later be made block mapping, if we cancel the
> dirty logging. This not only creates inconsistent TLB entries for
> the pages in the block, but also leaks the table pages for
> PMD level.
> 
> Handle this corner case for the huge mappings at stage2 by
> unmapping the non-huge mapping for the block. This could potentially
> release the upper level table. So we need to restart the table walk
> once we unmap the range.
> 
> Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages")
> Reported-by: Zheng Xiang 
> Cc: Zheng Xiang 
> Cc: Zhengui Yu 
> Cc: Marc Zyngier 
> Cc: Christoffer Dall 
> Signed-off-by: Suzuki K Poulose 
> ---
>  virt/kvm/arm/mmu.c | 63 
> ++
>  1 file changed, 45 insertions(+), 18 deletions(-)
> 
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index fce0983..6ad6f19d 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1060,25 +1060,43 @@ static int stage2_set_pmd_huge(struct kvm *kvm, 
> struct kvm_mmu_memory_cache
>  {
>   pmd_t *pmd, old_pmd;
>  
> +retry:
>   pmd = stage2_get_pmd(kvm, cache, addr);
>   VM_BUG_ON(!pmd);
>  
>   old_pmd = *pmd;
> + /*
> +  * Multiple vcpus faulting on the same PMD entry, can
> +  * lead to them sequentially updating the PMD with the
> +  * same value. Following the break-before-make
> +  * (pmd_clear() followed by tlb_flush()) process can
> +  * hinder forward progress due to refaults generated
> +  * on missing translations.
> +  *
> +  * Skip updating the page table if the entry is
> +  * unchanged.
> +  */
> + if (pmd_val(old_pmd) == pmd_val(*new_pmd))
> + return 0;
> +
>   if (pmd_present(old_pmd)) {
>   /*
> -  * Multiple vcpus faulting on the same PMD entry, can
> -  * lead to them sequentially updating the PMD with the
> -  * same value. Following the break-before-make
> -  * (pmd_clear() followed by tlb_flush()) process can
> -  * hinder forward progress due to refaults generated
> -  * on missing translations.
> +  * If we already have PTE level mapping for this block,
> +  * we must unmap it to avoid inconsistent TLB state and
> +  * leaking the table page. We could end up in this situation
> +  * if the memory slot was marked for dirty logging and was
> +  * reverted, leaving PTE level mappings for the pages accessed
> +  * during the period. So, unmap the PTE level mapping for this
> +  * block and retry, as we could have released the upper level
> +  * table in the process.
>*
> -  * Skip updating the page table if the entry is
> -  * unchanged.
> +  * Normal THP split/merge follows mmu_notifier callbacks and do
> +  * get handled accordingly.
>*/
> - if (pmd_val(old_pmd) == pmd_val(*new_pmd))
> - return 0;
> -
> + if (!pmd_thp_or_huge(old_pmd)) {
> + unmap_stage2_range(kvm, addr & S2_PMD_MASK, 
> S2_PMD_SIZE);
> + goto retry;

This looks slightly dodgy. Doing this retry results in another call to
stage2_get_pmd(), which may or may not result in allocating a PUD. I
think this is safe as if we managed to get here, it means the whole
hierarchy was already present and nothing was allocated in the first
round.

Somehow, I would feel more comfortable with just not even trying.
Unmap, don't fix the fault, let the vcpu come again for additional
punishment. But this is probably more invasive, as none of the
stage2_set_p*() return value is ever evaluated. Oh well.

> + }
>   /*
>* Mapping in huge pages should only happen through a
>* fault.  If a page is merged into a transparent huge
> @@ -1090,8 +1108,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct 
> kvm_mmu_memory_cache
>* should become splitting first, unmapped, merged,
>* and mapped back in on-demand.
>*/
> - VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
> -
> + WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
>   pmd_clear(pmd);
>   kvm_tlb_flush_vmid_ipa(kvm, addr);
>   } else {
> @@ -1107,6 +1124,7 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct 
> kvm_mmu_memory_cache *cac
>  {
>   pud_t *pudp, old