[PATCH V5 (RESEND) 0/4] arm64/cpufeature: Introduce ID_PFR2, ID_DFR1, ID_MMFR5 and other changes

2020-07-02 Thread Anshuman Khandual
These are the remaining patches from the V4 series, which had some pending
reviews from Suzuki (https://patchwork.kernel.org/cover/11557333/). Also dropped
[PATCH 15/17], as that will need some more investigation and rework.

This series applies on 5.8-rc3.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Mark Rutland  
Cc: Marc Zyngier 
Cc: James Morse 
Cc: Suzuki K Poulose 
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org

Changes in V5: 
(https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=293885)

- Dropped the TGRAN features along with their macros from ID_AA64MMFR0 per Suzuki
- Replaced with FTR_HIGHER_SAFE for SpecSEI feature in ID_AA64MMFR1 per Suzuki
- Dropped patch "arm64/cpufeature: Add remaining feature bits in ID_AA64DFR0 
register"

Changes in V4: 
(https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=290085)

- Updated ftr_id_dfr0[] with a comment documenting the currently missing [31:28]
TraceFilt field per Will
- Fixed erroneous bit width value from 28 to 4 for double lock feature per Will
- Replaced ID_SANITIZED() with ID_HIDDEN() for SYS_ID_DFR1_EL1 per Suzuki
- Fixed positions for register definitions as per new name based grouping per 
Will
- Replaced FTR_VISIBLE with FTR_HIDDEN for TLB feature in ID_AA64ISAR0 per 
Suzuki
- Replaced FTR_VISIBLE with FTR_HIDDEN for MPAM and SEL2 in ID_AA64PFR0 per 
Suzuki
- Replaced FTR_VISIBLE with FTR_HIDDEN for MPAMFRAC and RASFRAC in ID_AA64PFR1 
per Suzuki
- Dropped both MTE and BT features from ftr_id_aa64pfr1[] to be added later per 
Suzuki
- Added ID_MMFR4_EL1 into the cpuinfo_arm64 context per Will

Changes in V3: 
(https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=281211)

- Rebased on git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git 
(for-next/cpufeature)

Changes in V2: 
(https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=270605)

- Added Suggested-by tag from Mark Rutland for all changes he had proposed
- Added comment for SpecSEI feature on why it is HIGHER_SAFE per Suzuki
- Added a patch which makes ID_AA64DFR0_DOUBLELOCK a signed feature per Suzuki
- Added ID_DFR1 and ID_MMFR5 system register definitions per Will
- Added remaining features bits for relevant 64 bit system registers per Will
- Changed commit message on [PATCH 5/7] regarding TraceFilt feature per Suzuki
- Changed ID_PFR2.CSV3 (FTR_STRICT -> FTR_NONSTRICT) as 64 bit registers per 
Will
- Changed ID_PFR0.CSV2 (FTR_STRICT -> FTR_NONSTRICT) as 64 bit registers per 
Will 
- Changed some commit messages

Changes in V1: 
(https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=234093)


Anshuman Khandual (4):
  arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR0 register
  arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR1 register
  arm64/cpufeature: Add remaining feature bits in ID_AA64MMFR2 register
  arm64/cpufeature: Replace all open bits shift encodings with macros

 arch/arm64/include/asm/sysreg.h | 42 +
 arch/arm64/kernel/cpufeature.c  | 67 -
 2 files changed, 83 insertions(+), 26 deletions(-)

-- 
2.20.1



Re: [kvm-unit-tests PATCH v2 7/8] arm64: microbench: Add time limit for each individual test

2020-07-02 Thread Jingyi Wang

Hi Eric,

On 7/2/2020 9:23 PM, Auger Eric wrote:

Hi Jingyi,

On 7/2/20 5:01 AM, Jingyi Wang wrote:

Besides using a separate running-times parameter, we add a time limit to
loop_test to make sure each test finishes within a certain time (5 sec
here).

Signed-off-by: Jingyi Wang 
---
  arm/micro-bench.c | 17 +++--
  1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arm/micro-bench.c b/arm/micro-bench.c
index 506d2f9..4c962b7 100644
--- a/arm/micro-bench.c
+++ b/arm/micro-bench.c
@@ -23,6 +23,7 @@
  #include 
  
  #define NTIMES (1U << 16)

+#define MAX_NS (5 * 1000 * 1000 * 1000UL)
  
  static u32 cntfrq;
  
@@ -258,22 +259,26 @@ static void loop_test(struct exit_test *test)

uint64_t start, end, total_ticks, ntimes = 0;
struct ns_time total_ns, avg_ns;
  
+	total_ticks = 0;

if (test->prep) {
if(!test->prep()) {
printf("%s test skipped\n", test->name);
return;
}
}
-   isb();
-   start = read_sysreg(cntpct_el0);
-   while (ntimes < test->times) {
+
+   while (ntimes < test->times && total_ns.ns < MAX_NS) {
+   isb();
+   start = read_sysreg(cntpct_el0);
test->exec();
+   isb();
+   end = read_sysreg(cntpct_el0);
+
ntimes++;
+   total_ticks += (end - start);
+   ticks_to_ns_time(total_ticks, &total_ns);
}

you don't need the ticks_to_ns_time(total_ticks, &total_ns); after the loop


Okay, I forgot to delete it here. Thanks for reviewing.


-   isb();
-   end = read_sysreg(cntpct_el0);
  
-	total_ticks = end - start;

ticks_to_ns_time(total_ticks, &total_ns);
avg_ns.ns = total_ns.ns / ntimes;
avg_ns.ns_frac = total_ns.ns_frac / ntimes;



Besides
Reviewed-by: Eric Auger 

Thanks

Eric





Thanks,
Jingyi
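
For reference, here is a minimal sketch (not part of the posted patch) of how
the timing loop in loop_test() could read once the redundant post-loop
ticks_to_ns_time() call is dropped, as agreed above.  It assumes the isb(),
read_sysreg() and ticks_to_ns_time() helpers plus the struct ns_time and
struct exit_test types already used in micro-bench.c, and it zero-initializes
total_ns before the loop condition first reads it:

	/* sketch of the timing portion of loop_test(struct exit_test *test) */
	uint64_t start, end, total_ticks = 0, ntimes = 0;
	struct ns_time total_ns = {0}, avg_ns;

	while (ntimes < test->times && total_ns.ns < MAX_NS) {
		isb();
		start = read_sysreg(cntpct_el0);
		test->exec();
		isb();
		end = read_sysreg(cntpct_el0);

		ntimes++;
		total_ticks += end - start;
		ticks_to_ns_time(total_ticks, &total_ns);
	}

	avg_ns.ns = total_ns.ns / ntimes;
	avg_ns.ns_frac = total_ns.ns_frac / ntimes;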



Re: [kvm-unit-tests PATCH v2 3/8] arm64: microbench: gic: Add gicv4.1 support for ipi latency test.

2020-07-02 Thread Jingyi Wang



On 7/2/2020 9:42 PM, Auger Eric wrote:

Hi Marc,

On 7/2/20 3:08 PM, Marc Zyngier wrote:

Hi Eric,

On 2020-07-02 13:57, Auger Eric wrote:

Hi Jingyi,

On 7/2/20 5:01 AM, Jingyi Wang wrote:

If GICv4.1 (SGI hardware injection) is supported, we test IPI injection
via the hardware and software paths separately.

Signed-off-by: Jingyi Wang 
---
  arm/micro-bench.c    | 45 +++-
  lib/arm/asm/gic-v3.h |  3 +++
  lib/arm/asm/gic.h    |  1 +
  3 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/arm/micro-bench.c b/arm/micro-bench.c
index fc4d356..80d8db3 100644
--- a/arm/micro-bench.c
+++ b/arm/micro-bench.c
@@ -91,9 +91,40 @@ static void gic_prep_common(void)
  assert(irq_ready);
  }

-static void ipi_prep(void)
+static bool ipi_prep(void)

Any reason why the bool returned value is preferred over the standard
int?

  {
+    u32 val;
+
+    val = readl(vgic_dist_base + GICD_CTLR);
+    if (readl(vgic_dist_base + GICD_TYPER2) & GICD_TYPER2_nASSGIcap) {
+    val &= ~GICD_CTLR_ENABLE_G1A;
+    val &= ~GICD_CTLR_nASSGIreq;
+    writel(val, vgic_dist_base + GICD_CTLR);
+    val |= GICD_CTLR_ENABLE_G1A;
+    writel(val, vgic_dist_base + GICD_CTLR);

Why do we need this G1A dance?


Because it isn't legal to change the SGI behaviour when groups are enabled.
Yes, it is described in this bit of documentation nobody has access to.


OK thank you for the explanation. Jingyi, maybe add a comment to avoid
the question again ;-)

Thanks

Eric


Okay, I will add a comment here in the next version.

Thanks,
Jingyi



And this code needs to track RWP on disabling Group-1.

     M.
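
For illustration only (not the posted patch): tracking RWP in ipi_prep() could
look roughly like the sketch below.  It reuses readl()/writel(), vgic_dist_base
and the GICD_* macros from the hunk above, and assumes a GICD_CTLR_RWP
definition (bit 31, per the GICv3 architecture spec), which the hunk does not
currently carry:

#define GICD_CTLR_RWP		(1U << 31)	/* assumed definition, bit 31 per the GICv3 spec */

static void gicd_wait_for_rwp(void)
{
	/* Wait for the distributor to apply the previous GICD_CTLR write. */
	while (readl(vgic_dist_base + GICD_CTLR) & GICD_CTLR_RWP)
		;
}

	/* in ipi_prep(): disable Group-1 first and wait for the change to land */
	val &= ~GICD_CTLR_ENABLE_G1A;
	writel(val, vgic_dist_base + GICD_CTLR);
	gicd_wait_for_rwp();

	/* SGI behaviour may now be changed, then Group-1 re-enabled */
	val &= ~GICD_CTLR_nASSGIreq;	/* or |= GICD_CTLR_nASSGIreq to request vSGIs */
	val |= GICD_CTLR_ENABLE_G1A;
	writel(val, vgic_dist_base + GICD_CTLR);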








[PATCH v3 19/21] KVM: MIPS: Drop @max param from mmu_topup_memory_cache()

2020-07-02 Thread Sean Christopherson
Replace the @max param in mmu_topup_memory_cache() and instead use
ARRAY_SIZE() to terminate the loop to fill the cache.  This removes a
BUG_ON() and sets the stage for moving MIPS to the common memory cache
implementation.

No functional change intended.

Signed-off-by: Sean Christopherson 
---
 arch/mips/kvm/mmu.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 49bd160f4d85..d1f148db2449 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -25,15 +25,13 @@
 #define KVM_MMU_CACHE_MIN_PAGES 2
 #endif
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
- int min, int max)
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
 {
void *page;
 
-   BUG_ON(max > KVM_NR_MEM_OBJS);
if (cache->nobjs >= min)
return 0;
-   while (cache->nobjs < max) {
+   while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
page = (void *)__get_free_page(GFP_KERNEL);
if (!page)
return -ENOMEM;
@@ -711,8 +709,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, 
unsigned long gpa,
goto out;
 
/* We need a minimum of cached pages ready for page table creation */
-   err = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES,
-KVM_NR_MEM_OBJS);
+   err = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES);
if (err)
goto out;
 
@@ -796,8 +793,7 @@ static pte_t *kvm_trap_emul_pte_for_gva(struct kvm_vcpu 
*vcpu,
int ret;
 
/* We need a minimum of cached pages ready for page table creation */
-   ret = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES,
-KVM_NR_MEM_OBJS);
+   ret = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES);
if (ret)
return NULL;
 
-- 
2.26.0



[PATCH v3 12/21] KVM: x86/mmu: Skip filling the gfn cache for guaranteed direct MMU topups

2020-07-02 Thread Sean Christopherson
Don't bother filling the gfn array cache when the caller is a fully
direct MMU, i.e. won't need a gfn array for shadow pages.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 18 ++
 arch/x86/kvm/mmu/paging_tmpl.h |  4 ++--
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 60b0d460bbf5..586d63de0e78 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1101,7 +1101,7 @@ static void mmu_free_memory_cache(struct 
kvm_mmu_memory_cache *mc)
}
 }
 
-static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
+static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 {
int r;
 
@@ -1114,10 +1114,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu 
*vcpu)
   PT64_ROOT_MAX_LEVEL);
if (r)
return r;
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
-  PT64_ROOT_MAX_LEVEL);
-   if (r)
-   return r;
+   if (maybe_indirect) {
+   r = mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
+  PT64_ROOT_MAX_LEVEL);
+   if (r)
+   return r;
+   }
return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
  PT64_ROOT_MAX_LEVEL);
 }
@@ -4107,7 +4109,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t 
gpa, u32 error_code,
if (fast_page_fault(vcpu, gpa, error_code))
return RET_PF_RETRY;
 
-   r = mmu_topup_memory_caches(vcpu);
+   r = mmu_topup_memory_caches(vcpu, false);
if (r)
return r;
 
@@ -5142,7 +5144,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 {
int r;
 
-   r = mmu_topup_memory_caches(vcpu);
+   r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
if (r)
goto out;
r = mmu_alloc_roots(vcpu);
@@ -5336,7 +5338,7 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, 
gpa_t gpa,
 * or not since pte prefetch is skiped if it does not have
 * enough objects in the cache.
 */
-   mmu_topup_memory_caches(vcpu);
+   mmu_topup_memory_caches(vcpu, true);
 
spin_lock(&vcpu->kvm->mmu_lock);
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 451d7aa7d959..8d2159ae3bdf 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -815,7 +815,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t 
addr, u32 error_code,
return RET_PF_EMULATE;
}
 
-   r = mmu_topup_memory_caches(vcpu);
+   r = mmu_topup_memory_caches(vcpu, true);
if (r)
return r;
 
@@ -902,7 +902,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, 
hpa_t root_hpa)
 * No need to check return value here, rmap_can_add() can
 * help us to skip pte prefetch later.
 */
-   mmu_topup_memory_caches(vcpu);
+   mmu_topup_memory_caches(vcpu, true);
 
if (!VALID_PAGE(root_hpa)) {
WARN_ON(1);
-- 
2.26.0



[PATCH v3 07/21] KVM: x86/mmu: Topup memory caches after walking GVA->GPA

2020-07-02 Thread Sean Christopherson
Topup memory caches after walking the GVA->GPA translation during a
shadow page fault; there is no need to ensure the caches are full when
walking the GVA.  As of commit f5a1e9f89504f ("KVM: MMU: remove call
to kvm_mmu_pte_write from walk_addr"), the FNAME(walk_addr) flow no
longer adds rmaps via kvm_mmu_pte_write().

This avoids allocating memory in the case that the GVA is unmapped in
the guest, and also provides a paper trail of why/when the memory caches
need to be filled.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/paging_tmpl.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 58234bfaca07..451d7aa7d959 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -788,10 +788,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t 
addr, u32 error_code,
 
pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
 
-   r = mmu_topup_memory_caches(vcpu);
-   if (r)
-   return r;
-
/*
 * If PFEC.RSVD is set, this is a shadow page fault.
 * The bit needs to be cleared before walking guest page tables.
@@ -819,6 +815,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t 
addr, u32 error_code,
return RET_PF_EMULATE;
}
 
+   r = mmu_topup_memory_caches(vcpu);
+   if (r)
+   return r;
+
vcpu->arch.write_fault_to_shadow_pgtable = false;
 
is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-- 
2.26.0



[PATCH v3 05/21] KVM: x86/mmu: Try to avoid crashing KVM if a MMU memory cache is empty

2020-07-02 Thread Sean Christopherson
Attempt to allocate a new object instead of crashing KVM (and likely the
kernel) if a memory cache is unexpectedly empty.  Use GFP_ATOMIC for the
allocation as the caches are used while holding mmu_lock.  The immediate
BUG_ON() makes the code unnecessarily explosive and led to confusing
minimums being used in the past, e.g. allocating 4 objects where 1 would
suffice.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 94b27adab6ba..748235877def 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1060,6 +1060,15 @@ static void walk_shadow_page_lockless_end(struct 
kvm_vcpu *vcpu)
local_irq_enable();
 }
 
+static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
+  gfp_t gfp_flags)
+{
+   if (mc->kmem_cache)
+   return kmem_cache_zalloc(mc->kmem_cache, gfp_flags);
+   else
+   return (void *)__get_free_page(gfp_flags);
+}
+
 static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
 {
void *obj;
@@ -1067,10 +1076,7 @@ static int mmu_topup_memory_cache(struct 
kvm_mmu_memory_cache *mc, int min)
if (mc->nobjs >= min)
return 0;
while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
-   if (mc->kmem_cache)
-   obj = kmem_cache_zalloc(mc->kmem_cache, 
GFP_KERNEL_ACCOUNT);
-   else
-   obj = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+   obj = mmu_memory_cache_alloc_obj(mc, GFP_KERNEL_ACCOUNT);
if (!obj)
return mc->nobjs >= min ? 0 : -ENOMEM;
mc->objects[mc->nobjs++] = obj;
@@ -1118,8 +1124,11 @@ static void *mmu_memory_cache_alloc(struct 
kvm_mmu_memory_cache *mc)
 {
void *p;
 
-   BUG_ON(!mc->nobjs);
-   p = mc->objects[--mc->nobjs];
+   if (WARN_ON(!mc->nobjs))
+   p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
+   else
+   p = mc->objects[--mc->nobjs];
+   BUG_ON(!p);
return p;
 }
 
-- 
2.26.0



[PATCH v3 14/21] KVM: Move x86's version of struct kvm_mmu_memory_cache to common code

2020-07-02 Thread Sean Christopherson
Move x86's 'struct kvm_mmu_memory_cache' to common code in anticipation
of moving the entire x86 implementation code to common KVM and reusing
it for arm64 and MIPS.  Add a new architecture specific asm/kvm_types.h
to control the existence and parameters of the struct.  The new header
is needed to avoid a chicken-and-egg problem with asm/kvm_host.h as all
architectures define instances of the struct in their vCPU structs.

Add an asm-generic version of kvm_types.h to avoid having empty files on
PPC and s390 in the long term, and for arm64 and mips in the short term.

Suggested-by: Christoffer Dall 
Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/arm64/include/asm/Kbuild|  1 +
 arch/mips/include/asm/Kbuild |  1 +
 arch/powerpc/include/asm/Kbuild  |  1 +
 arch/s390/include/asm/Kbuild |  1 +
 arch/x86/include/asm/kvm_host.h  | 13 -
 arch/x86/include/asm/kvm_types.h |  7 +++
 include/asm-generic/kvm_types.h  |  5 +
 include/linux/kvm_types.h| 19 +++
 8 files changed, 35 insertions(+), 13 deletions(-)
 create mode 100644 arch/x86/include/asm/kvm_types.h
 create mode 100644 include/asm-generic/kvm_types.h

diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index ff9cbb631212..35a68155cd0e 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 generic-y += early_ioremap.h
+generic-y += kvm_types.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += qrwlock.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 8643d313890e..397e6d24d2ab 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -5,6 +5,7 @@ generated-y += syscall_table_64_n32.h
 generated-y += syscall_table_64_n64.h
 generated-y += syscall_table_64_o32.h
 generic-y += export.h
+generic-y += kvm_types.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index dadbcf3a0b1e..2d444d09b553 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -4,6 +4,7 @@ generated-y += syscall_table_64.h
 generated-y += syscall_table_c32.h
 generated-y += syscall_table_spu.h
 generic-y += export.h
+generic-y += kvm_types.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += vtime.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index 83f6e85de7bc..319efa0e6d02 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -6,5 +6,6 @@ generated-y += unistd_nr.h
 
 generic-y += asm-offsets.h
 generic-y += export.h
+generic-y += kvm_types.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 685fe5ac7124..c461710d621a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -193,8 +193,6 @@ struct x86_exception;
 enum x86_intercept;
 enum x86_intercept_stage;
 
-#define KVM_NR_MEM_OBJS 40
-
 #define KVM_NR_DB_REGS 4
 
 #define DR6_BD (1 << 13)
@@ -245,17 +243,6 @@ enum x86_intercept_stage;
 
 struct kvm_kernel_irq_routing_entry;
 
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
-   int nobjs;
-   gfp_t gfp_zero;
-   struct kmem_cache *kmem_cache;
-   void *objects[KVM_NR_MEM_OBJS];
-};
-
 /*
  * the pages used as guest page table on soft mmu are tracked by
  * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used
diff --git a/arch/x86/include/asm/kvm_types.h b/arch/x86/include/asm/kvm_types.h
new file mode 100644
index ..08f1b57d3b62
--- /dev/null
+++ b/arch/x86/include/asm/kvm_types.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_KVM_TYPES_H
+#define _ASM_X86_KVM_TYPES_H
+
+#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40
+
+#endif /* _ASM_X86_KVM_TYPES_H */
diff --git a/include/asm-generic/kvm_types.h b/include/asm-generic/kvm_types.h
new file mode 100644
index ..2a82daf110f1
--- /dev/null
+++ b/include/asm-generic/kvm_types.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_KVM_TYPES_H
+#define _ASM_GENERIC_KVM_TYPES_H
+
+#endif
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 68e84cf42a3f..a7580f69dda0 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -20,6 +20,8 @@ enum kvm_mr_change;
 
 #include 
 
+#include 
+
 /*
  * Address types:
  *
@@ -58,4 +60,21 @@ struct gfn_to_pfn_cache {
bool dirty;
 };
 
+#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
+/*
+ * Memory caches are used to preallocate memory ahead of various MMU flows,
+ * e.g. page fault handlers.  Gracefully handling allocation failures deep in
+ * MMU flows is problematic, as 
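
The hunk above is cut off in the archive.  Going by the x86 definition removed
earlier in this patch and the new KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE define, the
common structure presumably ends up looking roughly like this (an inferred
sketch, not the verbatim remainder of the hunk):

#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
struct kvm_mmu_memory_cache {
	int nobjs;
	gfp_t gfp_zero;
	struct kmem_cache *kmem_cache;
	void *objects[KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE];
};
#endif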

[PATCH v3 20/21] KVM: MIPS: Account pages used for GPA page tables

2020-07-02 Thread Sean Christopherson
Use GFP_KERNEL_ACCOUNT instead of GFP_KERNEL when allocating pages for
the GPA page tables.  The primary motivation for accounting the
allocations is to align with the common KVM memory cache helpers in
preparation for moving to the common implementation in a future patch.
The actual accounting is a bonus side effect.

Signed-off-by: Sean Christopherson 
---
 arch/mips/kvm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index d1f148db2449..9d3c8c025624 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -32,7 +32,7 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache 
*cache, int min)
if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   page = (void *)__get_free_page(GFP_KERNEL);
+   page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
if (!page)
return -ENOMEM;
cache->objects[cache->nobjs++] = page;
-- 
2.26.0



[PATCH v3 18/21] KVM: arm64: Use common KVM implementation of MMU memory caches

2020-07-02 Thread Sean Christopherson
Move to the common MMU memory cache implementation now that the common
code and arm64's existing code are semantically compatible.

No functional change intended.

Cc: Marc Zyngier 
Suggested-by: Christoffer Dall 
Signed-off-by: Sean Christopherson 
---
 arch/arm64/include/asm/Kbuild  |  1 -
 arch/arm64/include/asm/kvm_host.h  | 12 ---
 arch/arm64/include/asm/kvm_types.h |  8 +
 arch/arm64/kvm/mmu.c   | 53 +++---
 4 files changed, 19 insertions(+), 55 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_types.h

diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 35a68155cd0e..ff9cbb631212 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -1,6 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 generic-y += early_ioremap.h
-generic-y += kvm_types.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += qrwlock.h
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 335170b59899..23d1f41548f5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -97,18 +97,6 @@ struct kvm_arch {
bool return_nisv_io_abort_to_user;
 };
 
-#define KVM_NR_MEM_OBJS 40
-
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
-   int nobjs;
-   gfp_t gfp_zero;
-   void *objects[KVM_NR_MEM_OBJS];
-};
-
 struct kvm_vcpu_fault_info {
u32 esr_el2;/* Hyp Syndrom Register */
u64 far_el2;/* Hyp Fault Address Register */
diff --git a/arch/arm64/include/asm/kvm_types.h 
b/arch/arm64/include/asm/kvm_types.h
new file mode 100644
index ..9a126b9e2d7c
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_types.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM64_KVM_TYPES_H
+#define _ASM_ARM64_KVM_TYPES_H
+
+#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40
+
+#endif /* _ASM_ARM64_KVM_TYPES_H */
+
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 5220623a4efb..ba66e9a9bd3c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -124,37 +124,6 @@ static void stage2_dissolve_pud(struct kvm *kvm, 
phys_addr_t addr, pud_t *pudp)
put_page(virt_to_page(pudp));
 }
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
-{
-   void *page;
-
-   if (cache->nobjs >= min)
-   return 0;
-   while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT |
-  cache->gfp_zero);
-   if (!page)
-   return -ENOMEM;
-   cache->objects[cache->nobjs++] = page;
-   }
-   return 0;
-}
-
-static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
-{
-   while (mc->nobjs)
-   free_page((unsigned long)mc->objects[--mc->nobjs]);
-}
-
-static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
-{
-   void *p;
-
-   BUG_ON(!mc || !mc->nobjs);
-   p = mc->objects[--mc->nobjs];
-   return p;
-}
-
 static void clear_stage2_pgd_entry(struct kvm *kvm, pgd_t *pgd, phys_addr_t 
addr)
 {
p4d_t *p4d_table __maybe_unused = stage2_p4d_offset(kvm, pgd, 0UL);
@@ -1131,7 +1100,7 @@ static p4d_t *stage2_get_p4d(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache
if (stage2_pgd_none(kvm, *pgd)) {
if (!cache)
return NULL;
-   p4d = mmu_memory_cache_alloc(cache);
+   p4d = kvm_mmu_memory_cache_alloc(cache);
stage2_pgd_populate(kvm, pgd, p4d);
get_page(virt_to_page(pgd));
}
@@ -1149,7 +1118,7 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache
if (stage2_p4d_none(kvm, *p4d)) {
if (!cache)
return NULL;
-   pud = mmu_memory_cache_alloc(cache);
+   pud = kvm_mmu_memory_cache_alloc(cache);
stage2_p4d_populate(kvm, p4d, pud);
get_page(virt_to_page(p4d));
}
@@ -1170,7 +1139,7 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache
if (stage2_pud_none(kvm, *pud)) {
if (!cache)
return NULL;
-   pmd = mmu_memory_cache_alloc(cache);
+   pmd = kvm_mmu_memory_cache_alloc(cache);
stage2_pud_populate(kvm, pud, pmd);
get_page(virt_to_page(pud));
}
@@ -1376,7 +1345,7 @@ static int stage2_set_pte(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache,
if (stage2_pud_none(kvm, *pud)) {
if (!cache)
return 0; /* ignore calls from kvm_set_spte_hva */
-   pmd = mmu_memory_cache_alloc(cache);

[PATCH v3 01/21] KVM: x86/mmu: Track the associated kmem_cache in the MMU caches

2020-07-02 Thread Sean Christopherson
Track the kmem_cache used for non-page KVM MMU memory caches instead of
passing in the associated kmem_cache when filling the cache.  This will
allow consolidating code and other cleanups.

No functional change intended.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu/mmu.c  | 24 +++-
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f852ee350beb..71bc32e00d7e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -251,6 +251,7 @@ struct kvm_kernel_irq_routing_entry;
  */
 struct kvm_mmu_memory_cache {
int nobjs;
+   struct kmem_cache *kmem_cache;
void *objects[KVM_NR_MEM_OBJS];
 };
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3dd0af7e7515..2c62cc1d0353 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1060,15 +1060,14 @@ static void walk_shadow_page_lockless_end(struct 
kvm_vcpu *vcpu)
local_irq_enable();
 }
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
- struct kmem_cache *base_cache, int min)
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
 {
void *obj;
 
if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   obj = kmem_cache_zalloc(base_cache, GFP_KERNEL_ACCOUNT);
+   obj = kmem_cache_zalloc(cache->kmem_cache, GFP_KERNEL_ACCOUNT);
if (!obj)
return cache->nobjs >= min ? 0 : -ENOMEM;
cache->objects[cache->nobjs++] = obj;
@@ -1081,11 +1080,10 @@ static int mmu_memory_cache_free_objects(struct 
kvm_mmu_memory_cache *cache)
return cache->nobjs;
 }
 
-static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc,
- struct kmem_cache *cache)
+static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
 {
while (mc->nobjs)
-   kmem_cache_free(cache, mc->objects[--mc->nobjs]);
+   kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
 }
 
 static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
@@ -1115,25 +1113,22 @@ static int mmu_topup_memory_caches(struct kvm_vcpu 
*vcpu)
int r;
 
r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
-  pte_list_desc_cache, 8 + PTE_PREFETCH_NUM);
+  8 + PTE_PREFETCH_NUM);
if (r)
goto out;
r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8);
if (r)
goto out;
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
-  mmu_page_header_cache, 4);
+   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, 4);
 out:
return r;
 }
 
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
-   mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
-   pte_list_desc_cache);
+   mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache);
-   mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache,
-   mmu_page_header_cache);
+   mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
 }
 
 static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
@@ -5679,6 +5674,9 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
uint i;
int ret;
 
+   vcpu->arch.mmu_pte_list_desc_cache.kmem_cache = pte_list_desc_cache;
+   vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
+
vcpu->arch.mmu = &vcpu->arch.root_mmu;
vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
 
-- 
2.26.0



[PATCH v3 06/21] KVM: x86/mmu: Move fast_page_fault() call above mmu_topup_memory_caches()

2020-07-02 Thread Sean Christopherson
Avoid refilling the memory caches and potentially slow reclaim/swap when
handling a fast page fault, which does not need to allocate any new
objects.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 748235877def..3d0768e16463 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4095,6 +4095,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t 
gpa, u32 error_code,
if (page_fault_handle_page_track(vcpu, error_code, gfn))
return RET_PF_EMULATE;
 
+   if (fast_page_fault(vcpu, gpa, error_code))
+   return RET_PF_RETRY;
+
r = mmu_topup_memory_caches(vcpu);
if (r)
return r;
@@ -4102,9 +4105,6 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t 
gpa, u32 error_code,
if (lpage_disallowed)
max_level = PG_LEVEL_4K;
 
-   if (fast_page_fault(vcpu, gpa, error_code))
-   return RET_PF_RETRY;
-
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
 
-- 
2.26.0



[PATCH v3 04/21] KVM: x86/mmu: Remove superfluous gotos from mmu_topup_memory_caches()

2020-07-02 Thread Sean Christopherson
Return errors directly from mmu_topup_memory_caches() instead of
branching to a label that does the same.

No functional change intended.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b6df4525e86c..94b27adab6ba 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1100,13 +1100,11 @@ static int mmu_topup_memory_caches(struct kvm_vcpu 
*vcpu)
r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
   8 + PTE_PREFETCH_NUM);
if (r)
-   goto out;
+   return r;
r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache, 8);
if (r)
-   goto out;
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, 4);
-out:
-   return r;
+   return r;
+   return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, 4);
 }
 
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
-- 
2.26.0



[PATCH v3 13/21] KVM: x86/mmu: Prepend "kvm_" to memory cache helpers that will be global

2020-07-02 Thread Sean Christopherson
Rename the memory helpers that will soon be moved to common code and be
made globally available via linux/kvm_host.h.  "mmu" alone is not a
sufficient namespace for globally available KVM symbols.

Opportunistically add "nr_" in mmu_memory_cache_free_objects() to make
it clear the function returns the number of free objects, as opposed to
freeing existing objects.

Suggested-by: Christoffer Dall 
Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 42 +-
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 586d63de0e78..f4c8dae476bb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1071,7 +1071,7 @@ static inline void *mmu_memory_cache_alloc_obj(struct 
kvm_mmu_memory_cache *mc,
return (void *)__get_free_page(gfp_flags);
 }
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
+static int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
 {
void *obj;
 
@@ -1086,12 +1086,12 @@ static int mmu_topup_memory_cache(struct 
kvm_mmu_memory_cache *mc, int min)
return 0;
 }
 
-static int mmu_memory_cache_free_objects(struct kvm_mmu_memory_cache *mc)
+static int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache 
*mc)
 {
return mc->nobjs;
 }
 
-static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+static void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
 {
while (mc->nobjs) {
if (mc->kmem_cache)
@@ -1106,33 +1106,33 @@ static int mmu_topup_memory_caches(struct kvm_vcpu 
*vcpu, bool maybe_indirect)
int r;
 
/* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
-  1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
+   r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
+  1 + PT64_ROOT_MAX_LEVEL + 
PTE_PREFETCH_NUM);
if (r)
return r;
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
-  PT64_ROOT_MAX_LEVEL);
+   r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
+  PT64_ROOT_MAX_LEVEL);
if (r)
return r;
if (maybe_indirect) {
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
-  PT64_ROOT_MAX_LEVEL);
+   r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
+  PT64_ROOT_MAX_LEVEL);
if (r)
return r;
}
-   return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
- PT64_ROOT_MAX_LEVEL);
+   return kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
+ PT64_ROOT_MAX_LEVEL);
 }
 
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
-   mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
-   mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
-   mmu_free_memory_cache(&vcpu->arch.mmu_gfn_array_cache);
-   mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
+   kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
+   kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
+   kvm_mmu_free_memory_cache(&vcpu->arch.mmu_gfn_array_cache);
+   kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
 }
 
-static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
+static void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
 {
void *p;
 
@@ -1146,7 +1146,7 @@ static void *mmu_memory_cache_alloc(struct 
kvm_mmu_memory_cache *mc)
 
 static struct pte_list_desc *mmu_alloc_pte_list_desc(struct kvm_vcpu *vcpu)
 {
-   return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache);
+   return kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache);
 }
 
 static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
@@ -1417,7 +1417,7 @@ static bool rmap_can_add(struct kvm_vcpu *vcpu)
struct kvm_mmu_memory_cache *mc;
 
mc = &vcpu->arch.mmu_pte_list_desc_cache;
-   return mmu_memory_cache_free_objects(mc);
+   return kvm_mmu_memory_cache_nr_free_objects(mc);
 }
 
 static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
@@ -2104,10 +2104,10 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct 
kvm_vcpu *vcpu, int direct
 {
struct kvm_mmu_page *sp;
 
-   sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
-   sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
+   sp = kvm_mmu_memory_cache_alloc(&vcpu

[PATCH v3 08/21] KVM: x86/mmu: Clean up the gorilla math in mmu_topup_memory_caches()

2020-07-02 Thread Sean Christopherson
Clean up the minimums in mmu_topup_memory_caches() to document the
driving mechanisms behind the minimums.  Now that encountering an empty
cache is unlikely to trigger BUG_ON(), it is less dangerous to be more
precise when defining the minimums.

For rmaps, the logic is 1 parent PTE per level, plus a single rmap, and
prefetched rmaps.  The extra objects in the current '8 + PREFETCH'
minimum came about due to an abundance of paranoia in commit
c41ef344de212 ("KVM: MMU: increase per-vcpu rmap cache alloc size"),
i.e. it could have increased the minimum to 2 rmaps.  Furthermore, the
unexpected extra rmap case was killed off entirely by commits
f759e2b4c728c ("KVM: MMU: avoid pte_list_desc running out in
kvm_mmu_pte_write") and f5a1e9f89504f ("KVM: MMU: remove call to
kvm_mmu_pte_write from walk_addr").

For the so called page cache, replace '8' with 2*PT64_ROOT_MAX_LEVEL.
The 2x multiplier is needed because the cache is used for both shadow
pages and gfn arrays for indirect MMUs.

And finally, for page headers, replace '4' with PT64_ROOT_MAX_LEVEL.

Note, KVM now supports 5-level paging, i.e. the old minimums that used a
baseline derived from 4-level paging were technically wrong.  But, KVM
always allocates roots in a separate flow, e.g. it's impossible in the
current implementation to actually need 5 new shadow pages in a single
flow.  Use PT64_ROOT_MAX_LEVEL unmodified instead of subtracting 1, as
the direct usage is likely more intuitive to uninformed readers, and the
inflated minimum is unlikely to affect functionality in practice.
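
For concreteness (not part of the patch): assuming PT64_ROOT_MAX_LEVEL is 5 and
PTE_PREFETCH_NUM is 8, as in the x86 code this series is based on, the new
minimums work out to:

  pte_list_desc cache: 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM = 1 + 5 + 8 = 14
  page cache:          2 * PT64_ROOT_MAX_LEVEL                    = 10
  page header cache:   PT64_ROOT_MAX_LEVEL                        = 5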

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3d0768e16463..cf02ad93c249 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1103,14 +1103,17 @@ static int mmu_topup_memory_caches(struct kvm_vcpu 
*vcpu)
 {
int r;
 
+   /* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */
r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
-  8 + PTE_PREFETCH_NUM);
+  1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
if (r)
return r;
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache, 8);
+   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache,
+  2 * PT64_ROOT_MAX_LEVEL);
if (r)
return r;
-   return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, 4);
+   return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
+ PT64_ROOT_MAX_LEVEL);
 }
 
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
-- 
2.26.0



[PATCH v3 11/21] KVM: x86/mmu: Zero allocate shadow pages (outside of mmu_lock)

2020-07-02 Thread Sean Christopherson
Set __GFP_ZERO for the shadow page memory cache and drop the explicit
clear_page() from kvm_mmu_get_page().  This moves the cost of zeroing a
page to the allocation time of the physical page, i.e. when topping up
the memory caches, and thus avoids having to zero out an entire page
while holding mmu_lock.

Cc: Peter Feiner 
Cc: Peter Shier 
Cc: Junaid Shahid 
Cc: Jim Mattson 
Suggested-by: Ben Gardon 
Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ed36f5e63863..60b0d460bbf5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2545,7 +2545,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
if (level > PG_LEVEL_4K && need_sync)
flush |= kvm_sync_pages(vcpu, gfn, &invalid_list);
}
-   clear_page(sp->spt);
trace_kvm_mmu_get_page(sp, true);
 
kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
@@ -5682,6 +5681,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
 
+   vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;
+
vcpu->arch.mmu = &vcpu->arch.root_mmu;
vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
 
-- 
2.26.0



[PATCH v3 03/21] KVM: x86/mmu: Use consistent "mc" name for kvm_mmu_memory_cache locals

2020-07-02 Thread Sean Christopherson
Use "mc" for local variables to shorten line lengths and provide
consistent names, which will be especially helpful when some of the
helpers are moved to common KVM code in future patches.

No functional change intended.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d6612af6c056..b6df4525e86c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1060,27 +1060,27 @@ static void walk_shadow_page_lockless_end(struct 
kvm_vcpu *vcpu)
local_irq_enable();
 }
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
 {
void *obj;
 
-   if (cache->nobjs >= min)
+   if (mc->nobjs >= min)
return 0;
-   while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   if (cache->kmem_cache)
-   obj = kmem_cache_zalloc(cache->kmem_cache, 
GFP_KERNEL_ACCOUNT);
+   while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
+   if (mc->kmem_cache)
+   obj = kmem_cache_zalloc(mc->kmem_cache, 
GFP_KERNEL_ACCOUNT);
else
obj = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
if (!obj)
-   return cache->nobjs >= min ? 0 : -ENOMEM;
-   cache->objects[cache->nobjs++] = obj;
+   return mc->nobjs >= min ? 0 : -ENOMEM;
+   mc->objects[mc->nobjs++] = obj;
}
return 0;
 }
 
-static int mmu_memory_cache_free_objects(struct kvm_mmu_memory_cache *cache)
+static int mmu_memory_cache_free_objects(struct kvm_mmu_memory_cache *mc)
 {
-   return cache->nobjs;
+   return mc->nobjs;
 }
 
 static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
@@ -1395,10 +1395,10 @@ static struct kvm_rmap_head *gfn_to_rmap(struct kvm 
*kvm, gfn_t gfn,
 
 static bool rmap_can_add(struct kvm_vcpu *vcpu)
 {
-   struct kvm_mmu_memory_cache *cache;
+   struct kvm_mmu_memory_cache *mc;
 
-   cache = &vcpu->arch.mmu_pte_list_desc_cache;
-   return mmu_memory_cache_free_objects(cache);
+   mc = &vcpu->arch.mmu_pte_list_desc_cache;
+   return mmu_memory_cache_free_objects(mc);
 }
 
 static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
-- 
2.26.0



[PATCH v3 21/21] KVM: MIPS: Use common KVM implementation of MMU memory caches

2020-07-02 Thread Sean Christopherson
Move to the common MMU memory cache implementation now that the common
code and MIPS's existing code are semantically compatible.

No functional change intended.

Suggested-by: Christoffer Dall 
Signed-off-by: Sean Christopherson 
---
 arch/mips/include/asm/Kbuild  |  1 -
 arch/mips/include/asm/kvm_host.h  | 11 -
 arch/mips/include/asm/kvm_types.h |  7 ++
 arch/mips/kvm/mmu.c   | 40 ---
 4 files changed, 12 insertions(+), 47 deletions(-)
 create mode 100644 arch/mips/include/asm/kvm_types.h

diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 397e6d24d2ab..8643d313890e 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -5,7 +5,6 @@ generated-y += syscall_table_64_n32.h
 generated-y += syscall_table_64_n64.h
 generated-y += syscall_table_64_o32.h
 generic-y += export.h
-generic-y += kvm_types.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 363e7a89d173..f49617175f60 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -335,17 +335,6 @@ struct kvm_mips_tlb {
long tlb_lo[2];
 };
 
-#define KVM_NR_MEM_OBJS 4
-
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
-   int nobjs;
-   void *objects[KVM_NR_MEM_OBJS];
-};
-
 #define KVM_MIPS_AUX_FPU   0x1
 #define KVM_MIPS_AUX_MSA   0x2
 
diff --git a/arch/mips/include/asm/kvm_types.h 
b/arch/mips/include/asm/kvm_types.h
new file mode 100644
index ..213754d9ef6b
--- /dev/null
+++ b/arch/mips/include/asm/kvm_types.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_MIPS_KVM_TYPES_H
+#define _ASM_MIPS_KVM_TYPES_H
+
+#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 4
+
+#endif /* _ASM_MIPS_KVM_TYPES_H */
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 9d3c8c025624..87fa8d8a1031 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -25,39 +25,9 @@
 #define KVM_MMU_CACHE_MIN_PAGES 2
 #endif
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
-{
-   void *page;
-
-   if (cache->nobjs >= min)
-   return 0;
-   while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
-   if (!page)
-   return -ENOMEM;
-   cache->objects[cache->nobjs++] = page;
-   }
-   return 0;
-}
-
-static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
-{
-   while (mc->nobjs)
-   free_page((unsigned long)mc->objects[--mc->nobjs]);
-}
-
-static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
-{
-   void *p;
-
-   BUG_ON(!mc || !mc->nobjs);
-   p = mc->objects[--mc->nobjs];
-   return p;
-}
-
 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
-   mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+   kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
 }
 
 /**
@@ -151,7 +121,7 @@ static pte_t *kvm_mips_walk_pgd(pgd_t *pgd, struct 
kvm_mmu_memory_cache *cache,
 
if (!cache)
return NULL;
-   new_pmd = mmu_memory_cache_alloc(cache);
+   new_pmd = kvm_mmu_memory_cache_alloc(cache);
pmd_init((unsigned long)new_pmd,
 (unsigned long)invalid_pte_table);
pud_populate(NULL, pud, new_pmd);
@@ -162,7 +132,7 @@ static pte_t *kvm_mips_walk_pgd(pgd_t *pgd, struct 
kvm_mmu_memory_cache *cache,
 
if (!cache)
return NULL;
-   new_pte = mmu_memory_cache_alloc(cache);
+   new_pte = kvm_mmu_memory_cache_alloc(cache);
clear_page(new_pte);
pmd_populate_kernel(NULL, pmd, new_pte);
}
@@ -709,7 +679,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, 
unsigned long gpa,
goto out;
 
/* We need a minimum of cached pages ready for page table creation */
-   err = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES);
+   err = kvm_mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES);
if (err)
goto out;
 
@@ -793,7 +763,7 @@ static pte_t *kvm_trap_emul_pte_for_gva(struct kvm_vcpu 
*vcpu,
int ret;
 
/* We need a minimum of cached pages ready for page table creation */
-   ret = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES);
+   ret = kvm_mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES);
if (ret)
return NULL;
 
-- 
2.26.0



[PATCH v3 00/21] KVM: Cleanup and unify kvm_mmu_memory_cache usage

2020-07-02 Thread Sean Christopherson
The only interesting delta from v2 is that patch 18 is updated to handle
a conflict with arm64's p4d rework.  Resolution was straightforward
(famous last words).


This series resurrects Christoffer Dall's series[1] to provide a common
MMU memory cache implementation that can be shared by x86, arm64 and MIPS.

It also picks up a suggested change from Ben Gardon[2] to clear shadow
page tables during initial allocation so as to avoid clearing entire
pages while holding mmu_lock.

The front half of the patches do house cleaning on x86's memory cache
implementation in preparation for moving it to common code, along with a
fair bit of cleanup on the usage.  The middle chunk moves the patches to
common KVM, and the last two chunks convert arm64 and MIPS to the common
implementation.

Fully tested on x86 only.  Compile tested patches 14-21 on arm64, MIPS,
s390 and PowerPC.

v3:
  - Rebased to kvm/queue, commit a037ff353ba6 ("Merge ... into HEAD")
  - Collect more review tags. [Ben]

v2:
  - Rebase to kvm-5.8-2, commit 49b3deaad345 ("Merge tag ...").
  - Use an asm-generic kvm_types.h for s390 and PowerPC instead of an
empty arch-specific file. [Marc]
  - Explicit document "GFP_PGTABLE_USER == GFP_KERNEL_ACCOUNT | GFP_ZERO"
in the arm64 conversion patch. [Marc]
  - Collect review tags. [Ben]

Sean Christopherson (21):
  KVM: x86/mmu: Track the associated kmem_cache in the MMU caches
  KVM: x86/mmu: Consolidate "page" variant of memory cache helpers
  KVM: x86/mmu: Use consistent "mc" name for kvm_mmu_memory_cache locals
  KVM: x86/mmu: Remove superfluous gotos from mmu_topup_memory_caches()
  KVM: x86/mmu: Try to avoid crashing KVM if a MMU memory cache is empty
  KVM: x86/mmu: Move fast_page_fault() call above
mmu_topup_memory_caches()
  KVM: x86/mmu: Topup memory caches after walking GVA->GPA
  KVM: x86/mmu: Clean up the gorilla math in mmu_topup_memory_caches()
  KVM: x86/mmu: Separate the memory caches for shadow pages and gfn
arrays
  KVM: x86/mmu: Make __GFP_ZERO a property of the memory cache
  KVM: x86/mmu: Zero allocate shadow pages (outside of mmu_lock)
  KVM: x86/mmu: Skip filling the gfn cache for guaranteed direct MMU
topups
  KVM: x86/mmu: Prepend "kvm_" to memory cache helpers that will be
global
  KVM: Move x86's version of struct kvm_mmu_memory_cache to common code
  KVM: Move x86's MMU memory cache helpers to common KVM code
  KVM: arm64: Drop @max param from mmu_topup_memory_cache()
  KVM: arm64: Use common code's approach for __GFP_ZERO with memory
caches
  KVM: arm64: Use common KVM implementation of MMU memory caches
  KVM: MIPS: Drop @max param from mmu_topup_memory_cache()
  KVM: MIPS: Account pages used for GPA page tables
  KVM: MIPS: Use common KVM implementation of MMU memory caches

 arch/arm64/include/asm/kvm_host.h  |  11 ---
 arch/arm64/include/asm/kvm_types.h |   8 ++
 arch/arm64/kvm/arm.c   |   2 +
 arch/arm64/kvm/mmu.c   |  56 +++--
 arch/mips/include/asm/kvm_host.h   |  11 ---
 arch/mips/include/asm/kvm_types.h  |   7 ++
 arch/mips/kvm/mmu.c|  44 ++
 arch/powerpc/include/asm/Kbuild|   1 +
 arch/s390/include/asm/Kbuild   |   1 +
 arch/x86/include/asm/kvm_host.h|  14 +---
 arch/x86/include/asm/kvm_types.h   |   7 ++
 arch/x86/kvm/mmu/mmu.c | 129 +
 arch/x86/kvm/mmu/paging_tmpl.h |  10 +--
 include/asm-generic/kvm_types.h|   5 ++
 include/linux/kvm_host.h   |   7 ++
 include/linux/kvm_types.h  |  19 +
 virt/kvm/kvm_main.c|  55 
 17 files changed, 176 insertions(+), 211 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_types.h
 create mode 100644 arch/mips/include/asm/kvm_types.h
 create mode 100644 arch/x86/include/asm/kvm_types.h
 create mode 100644 include/asm-generic/kvm_types.h

-- 
2.26.0



[PATCH v3 17/21] KVM: arm64: Use common code's approach for __GFP_ZERO with memory caches

2020-07-02 Thread Sean Christopherson
Add a "gfp_zero" member to arm64's 'struct kvm_mmu_memory_cache' to make
the struct and its usage compatible with the common 'struct
kvm_mmu_memory_cache' in linux/kvm_host.h.  This will minimize code
churn when arm64 moves to the common implementation in a future patch, at
the cost of temporarily having somewhat silly code.

Note, GFP_PGTABLE_USER is equivalent to GFP_KERNEL_ACCOUNT | GFP_ZERO:

  #define GFP_PGTABLE_USER  (GFP_PGTABLE_KERNEL | __GFP_ACCOUNT)
  |
  -> #define GFP_PGTABLE_KERNEL	(GFP_KERNEL | __GFP_ZERO)

  == GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO

versus

  #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)

with __GFP_ZERO explicitly OR'd in

  == GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO

No functional change intended.

Tested-by: Marc Zyngier 
Signed-off-by: Sean Christopherson 
---
 arch/arm64/include/asm/kvm_host.h | 1 +
 arch/arm64/kvm/arm.c  | 2 ++
 arch/arm64/kvm/mmu.c  | 5 +++--
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index c3e6fcc664b1..335170b59899 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -105,6 +105,7 @@ struct kvm_arch {
  */
 struct kvm_mmu_memory_cache {
int nobjs;
+   gfp_t gfp_zero;
void *objects[KVM_NR_MEM_OBJS];
 };
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 90cb90561446..1016635b3782 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -270,6 +270,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
vcpu->arch.target = -1;
bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
 
+   vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
+
/* Set up the timer */
kvm_timer_vcpu_init(vcpu);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f78aa3e269e9..5220623a4efb 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -131,7 +131,8 @@ static int mmu_topup_memory_cache(struct 
kvm_mmu_memory_cache *cache, int min)
if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   page = (void *)__get_free_page(GFP_PGTABLE_USER);
+   page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT |
+  cache->gfp_zero);
if (!page)
return -ENOMEM;
cache->objects[cache->nobjs++] = page;
@@ -1467,7 +1468,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t 
guest_ipa,
phys_addr_t addr, end;
int ret = 0;
unsigned long pfn;
-   struct kvm_mmu_memory_cache cache = { 0, };
+   struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, };
 
end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
pfn = __phys_to_pfn(pa);
-- 
2.26.0



[PATCH v3 10/21] KVM: x86/mmu: Make __GFP_ZERO a property of the memory cache

2020-07-02 Thread Sean Christopherson
Add a gfp_zero flag to 'struct kvm_mmu_memory_cache' and use it to
control __GFP_ZERO instead of hardcoding a call to kmem_cache_zalloc().
A future patch needs such a flag for the __get_free_page() path, as
gfn arrays do not need/want the allocator to zero the memory.  Convert
the kmem_cache paths to __GFP_ZERO now so as to avoid a weird and
inconsistent API in the future.

No functional change intended.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/mmu/mmu.c  | 7 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b71a4e77f65a..685fe5ac7124 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -251,6 +251,7 @@ struct kvm_kernel_irq_routing_entry;
  */
 struct kvm_mmu_memory_cache {
int nobjs;
+   gfp_t gfp_zero;
struct kmem_cache *kmem_cache;
void *objects[KVM_NR_MEM_OBJS];
 };
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8e1b55d8a728..ed36f5e63863 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1063,8 +1063,10 @@ static void walk_shadow_page_lockless_end(struct 
kvm_vcpu *vcpu)
 static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
   gfp_t gfp_flags)
 {
+   gfp_flags |= mc->gfp_zero;
+
if (mc->kmem_cache)
-   return kmem_cache_zalloc(mc->kmem_cache, gfp_flags);
+   return kmem_cache_alloc(mc->kmem_cache, gfp_flags);
else
return (void *)__get_free_page(gfp_flags);
 }
@@ -5675,7 +5677,10 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
int ret;
 
vcpu->arch.mmu_pte_list_desc_cache.kmem_cache = pte_list_desc_cache;
+   vcpu->arch.mmu_pte_list_desc_cache.gfp_zero = __GFP_ZERO;
+
vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
+   vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
 
vcpu->arch.mmu = &vcpu->arch.root_mmu;
vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
-- 
2.26.0



[PATCH v3 09/21] KVM: x86/mmu: Separate the memory caches for shadow pages and gfn arrays

2020-07-02 Thread Sean Christopherson
Use separate caches for allocating shadow pages versus gfn arrays.  This
sets the stage for specifying __GFP_ZERO when allocating shadow pages
without incurring extra cost for gfn arrays.

No functional change intended.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/mmu/mmu.c  | 15 ++-
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 71bc32e00d7e..b71a4e77f65a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -636,7 +636,8 @@ struct kvm_vcpu_arch {
struct kvm_mmu *walk_mmu;
 
struct kvm_mmu_memory_cache mmu_pte_list_desc_cache;
-   struct kvm_mmu_memory_cache mmu_page_cache;
+   struct kvm_mmu_memory_cache mmu_shadow_page_cache;
+   struct kvm_mmu_memory_cache mmu_gfn_array_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;
 
/*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cf02ad93c249..8e1b55d8a728 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1108,8 +1108,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
   1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
if (r)
return r;
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache,
-  2 * PT64_ROOT_MAX_LEVEL);
+   r = mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
+  PT64_ROOT_MAX_LEVEL);
+   if (r)
+   return r;
+   r = mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
+  PT64_ROOT_MAX_LEVEL);
if (r)
return r;
return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
@@ -1119,7 +1123,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
-   mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+   mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
+   mmu_free_memory_cache(&vcpu->arch.mmu_gfn_array_cache);
mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
 }
 
@@ -2096,9 +2101,9 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct 
kvm_vcpu *vcpu, int direct
struct kvm_mmu_page *sp;
 
sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
-   sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache);
+   sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
if (!direct)
-   sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache);
+   sp->gfns = 
mmu_memory_cache_alloc(&vcpu->arch.mmu_gfn_array_cache);
set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
/*
-- 
2.26.0



[PATCH v3 02/21] KVM: x86/mmu: Consolidate "page" variant of memory cache helpers

2020-07-02 Thread Sean Christopherson
Drop the "page" variants of the topup/free memory cache helpers, using
the existence of an associated kmem_cache to select the correct alloc
or free routine.

No functional change intended.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 37 +++--
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2c62cc1d0353..d6612af6c056 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1067,7 +1067,10 @@ static int mmu_topup_memory_cache(struct 
kvm_mmu_memory_cache *cache, int min)
if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   obj = kmem_cache_zalloc(cache->kmem_cache, GFP_KERNEL_ACCOUNT);
+   if (cache->kmem_cache)
+   obj = kmem_cache_zalloc(cache->kmem_cache, 
GFP_KERNEL_ACCOUNT);
+   else
+   obj = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
if (!obj)
return cache->nobjs >= min ? 0 : -ENOMEM;
cache->objects[cache->nobjs++] = obj;
@@ -1082,30 +1085,12 @@ static int mmu_memory_cache_free_objects(struct 
kvm_mmu_memory_cache *cache)
 
 static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
 {
-   while (mc->nobjs)
-   kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
-}
-
-static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
-  int min)
-{
-   void *page;
-
-   if (cache->nobjs >= min)
-   return 0;
-   while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
-   if (!page)
-   return cache->nobjs >= min ? 0 : -ENOMEM;
-   cache->objects[cache->nobjs++] = page;
+   while (mc->nobjs) {
+   if (mc->kmem_cache)
+   kmem_cache_free(mc->kmem_cache, 
mc->objects[--mc->nobjs]);
+   else
+   free_page((unsigned long)mc->objects[--mc->nobjs]);
}
-   return 0;
-}
-
-static void mmu_free_memory_cache_page(struct kvm_mmu_memory_cache *mc)
-{
-   while (mc->nobjs)
-   free_page((unsigned long)mc->objects[--mc->nobjs]);
 }
 
 static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
@@ -1116,7 +1101,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
   8 + PTE_PREFETCH_NUM);
if (r)
goto out;
-   r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8);
+   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache, 8);
if (r)
goto out;
r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, 4);
@@ -1127,7 +1112,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
-   mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache);
+   mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
 }
 
-- 
2.26.0



[PATCH v3 15/21] KVM: Move x86's MMU memory cache helpers to common KVM code

2020-07-02 Thread Sean Christopherson
Move x86's memory cache helpers to common KVM code so that they can be
reused by arm64 and MIPS in future patches.

Suggested-by: Christoffer Dall 
Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c   | 53 --
 include/linux/kvm_host.h |  7 +
 virt/kvm/kvm_main.c  | 55 
 3 files changed, 62 insertions(+), 53 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f4c8dae476bb..5b7782d92faf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1060,47 +1060,6 @@ static void walk_shadow_page_lockless_end(struct 
kvm_vcpu *vcpu)
local_irq_enable();
 }
 
-static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
-  gfp_t gfp_flags)
-{
-   gfp_flags |= mc->gfp_zero;
-
-   if (mc->kmem_cache)
-   return kmem_cache_alloc(mc->kmem_cache, gfp_flags);
-   else
-   return (void *)__get_free_page(gfp_flags);
-}
-
-static int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
-{
-   void *obj;
-
-   if (mc->nobjs >= min)
-   return 0;
-   while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
-   obj = mmu_memory_cache_alloc_obj(mc, GFP_KERNEL_ACCOUNT);
-   if (!obj)
-   return mc->nobjs >= min ? 0 : -ENOMEM;
-   mc->objects[mc->nobjs++] = obj;
-   }
-   return 0;
-}
-
-static int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache 
*mc)
-{
-   return mc->nobjs;
-}
-
-static void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
-{
-   while (mc->nobjs) {
-   if (mc->kmem_cache)
-   kmem_cache_free(mc->kmem_cache, 
mc->objects[--mc->nobjs]);
-   else
-   free_page((unsigned long)mc->objects[--mc->nobjs]);
-   }
-}
-
 static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 {
int r;
@@ -1132,18 +1091,6 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
 }
 
-static void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
-{
-   void *p;
-
-   if (WARN_ON(!mc->nobjs))
-   p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
-   else
-   p = mc->objects[--mc->nobjs];
-   BUG_ON(!p);
-   return p;
-}
-
 static struct pte_list_desc *mmu_alloc_pte_list_desc(struct kvm_vcpu *vcpu)
 {
return kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9edc6fc71a89..283df0fed4a8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -816,6 +816,13 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool 
usermode_vcpu_not_eligible);
 void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
 
+#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
+int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
+int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc);
+void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
+void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
+#endif
+
 bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
 struct kvm_vcpu *except,
 unsigned long *vcpu_bitmap, cpumask_var_t tmp);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a852af5c3214..e70d75de185d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -340,6 +340,61 @@ void kvm_reload_remote_mmus(struct kvm *kvm)
kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
 }
 
+#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
+static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
+  gfp_t gfp_flags)
+{
+   gfp_flags |= mc->gfp_zero;
+
+   if (mc->kmem_cache)
+   return kmem_cache_alloc(mc->kmem_cache, gfp_flags);
+   else
+   return (void *)__get_free_page(gfp_flags);
+}
+
+int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
+{
+   void *obj;
+
+   if (mc->nobjs >= min)
+   return 0;
+   while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
+   obj = mmu_memory_cache_alloc_obj(mc, GFP_KERNEL_ACCOUNT);
+   if (!obj)
+   return mc->nobjs >= min ? 0 : -ENOMEM;
+   mc->objects[mc->nobjs++] = obj;
+   }
+   return 0;
+}
+
+int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
+{
+   return mc->nobjs;
+}
+
+void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+   while (mc->nobjs) {
+   if (mc->kmem_cache)

[PATCH v3 16/21] KVM: arm64: Drop @max param from mmu_topup_memory_cache()

2020-07-02 Thread Sean Christopherson
Replace the @max param in mmu_topup_memory_cache() and instead use
ARRAY_SIZE() to terminate the loop to fill the cache.  This removes a
BUG_ON() and sets the stage for moving arm64 to the common memory cache
implementation.

No functional change intended.

Tested-by: Marc Zyngier 
Signed-off-by: Sean Christopherson 
---
 arch/arm64/kvm/mmu.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8c0035cab6b6..f78aa3e269e9 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -124,15 +124,13 @@ static void stage2_dissolve_pud(struct kvm *kvm, 
phys_addr_t addr, pud_t *pudp)
put_page(virt_to_page(pudp));
 }
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
- int min, int max)
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
 {
void *page;
 
-   BUG_ON(max > KVM_NR_MEM_OBJS);
if (cache->nobjs >= min)
return 0;
-   while (cache->nobjs < max) {
+   while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
page = (void *)__get_free_page(GFP_PGTABLE_USER);
if (!page)
return -ENOMEM;
@@ -1481,8 +1479,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t 
guest_ipa,
pte = kvm_s2pte_mkwrite(pte);
 
ret = mmu_topup_memory_cache(&cache,
-kvm_mmu_cache_min_pages(kvm),
-KVM_NR_MEM_OBJS);
+kvm_mmu_cache_min_pages(kvm));
if (ret)
goto out;
spin_lock(&kvm->mmu_lock);
@@ -1882,8 +1879,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
mmap_read_unlock(current->mm);
 
/* We need minimum second+third level pages */
-   ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm),
-KVM_NR_MEM_OBJS);
+   ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm));
if (ret)
return ret;
 
-- 
2.26.0



Re: [kvm-unit-tests PATCH v2 3/8] arm64: microbench: gic: Add gicv4.1 support for ipi latency test.

2020-07-02 Thread Andrew Jones
On Thu, Jul 02, 2020 at 02:57:56PM +0200, Auger Eric wrote:
> Hi Jingyi,
> 
> On 7/2/20 5:01 AM, Jingyi Wang wrote:
> > If gicv4.1(sgi hardware injection) supported, we test ipi injection
> > via hw/sw way separately.
> > 
> > Signed-off-by: Jingyi Wang 
> > ---
> >  arm/micro-bench.c| 45 +++-
> >  lib/arm/asm/gic-v3.h |  3 +++
> >  lib/arm/asm/gic.h|  1 +
> >  3 files changed, 44 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arm/micro-bench.c b/arm/micro-bench.c
> > index fc4d356..80d8db3 100644
> > --- a/arm/micro-bench.c
> > +++ b/arm/micro-bench.c
> > @@ -91,9 +91,40 @@ static void gic_prep_common(void)
> > assert(irq_ready);
> >  }
> >  
> > -static void ipi_prep(void)
> > +static bool ipi_prep(void)
> Any reason why the bool returned value is preferred over the standard int?

Why would an int be preferred over bool if the return is boolean?

Thanks,
drew



Re: [PATCH V5 0/4] arm64/cpufeature: Introduce ID_PFR2, ID_DFR1, ID_MMFR5 and other changes

2020-07-02 Thread Catalin Marinas
On Wed, May 27, 2020 at 08:33:35AM +0530, Anshuman Khandual wrote:
> These are remaining patches from V4 series which had some pending reviews
> from Suzuki (https://patchwork.kernel.org/cover/11557333/). Also dropped
> [PATCH 15/17] as that will need some more investigation and rework.
> 
> This series applies on arm64/for-next/cpufeature.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Mark Rutland  
> Cc: Marc Zyngier 
> Cc: James Morse 
> Cc: Suzuki K Poulose 
> Cc: kvmarm@lists.cs.columbia.edu
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org

Thanks Suzuki for review.

Anshuman, could you please rebase this series on top of 5.8-rc3? It no
longer applies cleanly.

Thanks.

-- 
Catalin


Re: [PATCH] kvmtool: arm64: Report missing support for 32bit guests

2020-07-02 Thread Marc Zyngier

On 2020-07-02 16:37, Suzuki K Poulose wrote:

Hi Marc

On 07/01/2020 04:42 PM, Marc Zyngier wrote:

On 2020-07-01 15:20, Suzuki K Poulose wrote:

When the host doesn't support 32bit guests, the kvmtool fails
without a proper message on what is wrong. i.e,

 $ lkvm run -c 1 Image --aarch32
  # lkvm run -k Image -m 256 -c 1 --name guest-105618
  Fatal: Unable to initialise vcpu

Given that there is no other easy way to check if the host supports 
32bit
guests, it is always good to report this by checking the capability, 
rather

than leaving the users to hunt this down by looking at the code!

After this patch:

 $ lkvm run -c 1 Image --aarch32
  # lkvm run -k Image -m 256 -c 1 --name guest-105695
  Fatal: 32bit guests are not supported


Fancy!



Cc: Will Deacon 
Reported-by: Sami Mujawar 
Signed-off-by: Suzuki K Poulose 
---
 arm/kvm-cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arm/kvm-cpu.c b/arm/kvm-cpu.c
index 554414f..2acecae 100644
--- a/arm/kvm-cpu.c
+++ b/arm/kvm-cpu.c
@@ -46,6 +46,10 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm 
*kvm,

unsigned long cpu_id)
 .features = ARM_VCPU_FEATURE_FLAGS(kvm, cpu_id)
 };

+    if (kvm->cfg.arch.aarch32_guest &&
+    !kvm__supports_extension(kvm, KVM_CAP_ARM_EL1_32BIT))


Can you please check that this still compiles for 32bit host?


Yes, it does. I have built this on an arm32 rootfs with make ARCH=arm.
The kvm->cfg.arch is common across arm/arm64 and is defined here :

arm/include/arm-common/kvm-config-arch.h


I was worried about the availability of KVM_CAP_ARM_EL1_32BIT,
but being a capability, it is common to all arches. It is
KVM_ARM_VCPU_EL1_32BIT that is 32bit only, but that's not what
you are using. Too many flags! ;-)
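
For reference, a rough sketch of how the two are typically used on the
kvmtool side; the vcpu_init fragment is purely illustrative and not part
of the posted patch (kvm, vcpu_fd and the die helpers are assumed from
the kvmtool context):

	/*
	 * KVM_CAP_ARM_EL1_32BIT is a capability: query it to learn
	 * whether the host can run AArch32 guests at all.
	 */
	if (!kvm__supports_extension(kvm, KVM_CAP_ARM_EL1_32BIT))
		die("32bit guests are not supported\n");

	/*
	 * KVM_ARM_VCPU_EL1_32BIT is a per-vCPU feature bit: set it in
	 * kvm_vcpu_init.features[] before KVM_ARM_VCPU_INIT to actually
	 * request an AArch32 EL1 for that vCPU.
	 */
	struct kvm_vcpu_init init = { 0 };

	init.features[0] |= 1UL << KVM_ARM_VCPU_EL1_32BIT;
	if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init) < 0)
		die_perror("KVM_ARM_VCPU_INIT");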

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH] kvmtool: arm64: Report missing support for 32bit guests

2020-07-02 Thread Suzuki K Poulose

Hi Marc

On 07/01/2020 04:42 PM, Marc Zyngier wrote:

On 2020-07-01 15:20, Suzuki K Poulose wrote:

When the host doesn't support 32bit guests, the kvmtool fails
without a proper message on what is wrong. i.e,

 $ lkvm run -c 1 Image --aarch32
  # lkvm run -k Image -m 256 -c 1 --name guest-105618
  Fatal: Unable to initialise vcpu

Given that there is no other easy way to check if the host supports 32bit
guests, it is always good to report this by checking the capability, 
rather

than leaving the users to hunt this down by looking at the code!

After this patch:

 $ lkvm run -c 1 Image --aarch32
  # lkvm run -k Image -m 256 -c 1 --name guest-105695
  Fatal: 32bit guests are not supported


Fancy!



Cc: Will Deacon 
Reported-by: Sami Mujawar 
Signed-off-by: Suzuki K Poulose 
---
 arm/kvm-cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arm/kvm-cpu.c b/arm/kvm-cpu.c
index 554414f..2acecae 100644
--- a/arm/kvm-cpu.c
+++ b/arm/kvm-cpu.c
@@ -46,6 +46,10 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm,
unsigned long cpu_id)
 .features = ARM_VCPU_FEATURE_FLAGS(kvm, cpu_id)
 };

+    if (kvm->cfg.arch.aarch32_guest &&
+    !kvm__supports_extension(kvm, KVM_CAP_ARM_EL1_32BIT))


Can you please check that this still compiles for 32bit host?


Yes, it does. I have built this on an arm32 rootfs with make ARCH=arm.
The kvm->cfg.arch is common across arm/arm64 and is defined here :

arm/include/arm-common/kvm-config-arch.h

And the aarch32 command line option is only available on aarch64 host.
So this is safe on an arm32 host.




+    die("32bit guests are not supported\n");
+
 vcpu = calloc(1, sizeof(struct kvm_cpu));
 if (!vcpu)
 return NULL;


With the above detail checked,

Acked-by: Marc Zyngier 


Thanks
Suzuki


Re: [PATCH v2 3/3] arm64: Add workaround for Arm Cortex-A77 erratum 1508412

2020-07-02 Thread Will Deacon
On Thu, Jul 02, 2020 at 08:56:05AM -0600, Rob Herring wrote:
> On Thu, Jul 2, 2020 at 5:42 AM Will Deacon  wrote:
> >
> > On Wed, Jul 01, 2020 at 03:53:08PM -0600, Rob Herring wrote:
> > > On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load
> > > and a store exclusive or PAR_EL1 read can cause a deadlock.
> > >
> > > The workaround requires a DMB SY before and after a PAR_EL1 register read
> > > and the disabling of KVM. KVM must be disabled to prevent the problematic
> > > sequence in guests' EL1. This workaround also depends on a firmware
> > > counterpart to enable the h/w to insert DMB SY after load and store
> > > exclusive instructions. See the errata document SDEN-1152370 v10 [1] for
> > > more information.
> >
> > This ^^ is out of date not that we're not disabling KVM.
> 
> Indeed, I fixed the kconfig text, but missed this.
> 
> > > All the other PAR_EL1 reads besides the one in
> > > is_spurious_el1_translation_fault() are in KVM code, so the work-around is
> > > not needed for them.
> >
> > And I think this now needs some extra work.
> 
> Ugg, that was too easy...
> 
> The KVM code in __translate_far_to_hpfar() has:
> 
> read PAR
> read PAR
> write PAR
> 
> I'm wondering if we need 2 dmbs or 4 here. I'm checking on that.

Also worth checking what happens if the PAR access is executed
speculatively, as in that case we probably can't guarantee that the DMB
instructions are executed at all...

Will


Re: [PATCH v2 3/3] arm64: Add workaround for Arm Cortex-A77 erratum 1508412

2020-07-02 Thread Rob Herring
On Thu, Jul 2, 2020 at 5:42 AM Will Deacon  wrote:
>
> On Wed, Jul 01, 2020 at 03:53:08PM -0600, Rob Herring wrote:
> > On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load
> > and a store exclusive or PAR_EL1 read can cause a deadlock.
> >
> > The workaround requires a DMB SY before and after a PAR_EL1 register read
> > and the disabling of KVM. KVM must be disabled to prevent the problematic
> > sequence in guests' EL1. This workaround also depends on a firmware
> > counterpart to enable the h/w to insert DMB SY after load and store
> > exclusive instructions. See the errata document SDEN-1152370 v10 [1] for
> > more information.
>
> This ^^ is out of date not that we're not disabling KVM.

Indeed, I fixed the kconfig text, but missed this.

> > All the other PAR_EL1 reads besides the one in
> > is_spurious_el1_translation_fault() are in KVM code, so the work-around is
> > not needed for them.
>
> And I think this now needs some extra work.

Ugg, that was too easy...

The KVM code in __translate_far_to_hpfar() has:

read PAR
read PAR
write PAR

I'm wondering if we need 2 dmbs or 4 here. I'm checking on that.

Rob
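
For context, a purely illustrative sketch of the barrier placement the
erratum note describes for a single PAR_EL1 read; this is not the posted
patch, and whether __translate_far_to_hpfar() needs two or four barriers
is exactly the open question above:

	/* Illustrative only: erratum 1508412 wants a DMB SY on both sides. */
	u64 par;

	dmb(sy);			/* order against the earlier device load */
	par = read_sysreg(par_el1);
	dmb(sy);			/* and against whatever follows the read */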


Re: [PATCH v2 1/3] KVM: arm64: Print warning when cpu erratum can cause guests to deadlock

2020-07-02 Thread Will Deacon
On Thu, Jul 02, 2020 at 08:04:43AM -0600, Rob Herring wrote:
> On Thu, Jul 2, 2020 at 5:45 AM Will Deacon  wrote:
> >
> > On Wed, Jul 01, 2020 at 03:53:06PM -0600, Rob Herring wrote:
> > > If guests don't have certain CPU erratum work-arounds implemented, then
> > > there is a possibility a guest can deadlock the system. IOW, only trusted
> > > guests should be used on systems with the erratum.
> > >
> > > This is the case for Cortex-A57 erratum 832075.
> > >
> > > Cc: Marc Zyngier 
> > > Cc: James Morse 
> > > Cc: Julien Thierry 
> > > Cc: Suzuki K Poulose 
> > > Cc: Catalin Marinas 
> > > Cc: Will Deacon 
> > > Cc: kvmarm@lists.cs.columbia.edu
> > > Signed-off-by: Rob Herring 
> > > ---
> > >  arch/arm64/kvm/arm.c | 4 
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index 90cb90561446..e2f50fa4d825 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -1653,6 +1653,10 @@ int kvm_arch_init(void *opaque)
> > >   return -ENODEV;
> > >   }
> > >
> > > + if (cpus_have_const_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE))
> > > + kvm_info("Guests without required CPU erratum work-arounds 
> > > can deadlock system!\n" \
> >
> > work-arounds => workarounds
> >
> > (mainly for consistency, I have no clue if this is a real word or not!).
> >
> > I'd also probably do erratum => errata given that you're about to add a
> > second one.
> 
> Humm, the plural is on workarounds. If I use a more standard singular
> vs. plural form like "CPU feature workarounds" vs "CPU features
> workarounds", the former seems more correct to me. (working around
> features may be a bit nonsensical, but big.LITTLE ;) )

Ok, "erratum" it is then!

One other thing on the actual workaround... Case B in the document says:

  | In Program Order,
  | 1. The core executes any load with device memory attributes.
  | 2. The core executes a store-exclusive or register read of PAR_EL1.

and the patch register sequence says:

  | Use the following write sequence to several IMPLEMENTATION DEFINED
  | registers to have the hardware insert a DMB SY after all load-exclusive and
  | store-exclusive instructions.

but that would mean that the sequence is effectively:

LDR Xa, [device address]
STXRWb, Xc, [Xd]
DMB SY

Does that really prevent the deadlock?

Will


Re: [PATCH v2 1/3] KVM: arm64: Print warning when cpu erratum can cause guests to deadlock

2020-07-02 Thread Rob Herring
On Thu, Jul 2, 2020 at 5:45 AM Will Deacon  wrote:
>
> On Wed, Jul 01, 2020 at 03:53:06PM -0600, Rob Herring wrote:
> > If guests don't have certain CPU erratum work-arounds implemented, then
> > there is a possibility a guest can deadlock the system. IOW, only trusted
> > guests should be used on systems with the erratum.
> >
> > This is the case for Cortex-A57 erratum 832075.
> >
> > Cc: Marc Zyngier 
> > Cc: James Morse 
> > Cc: Julien Thierry 
> > Cc: Suzuki K Poulose 
> > Cc: Catalin Marinas 
> > Cc: Will Deacon 
> > Cc: kvmarm@lists.cs.columbia.edu
> > Signed-off-by: Rob Herring 
> > ---
> >  arch/arm64/kvm/arm.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 90cb90561446..e2f50fa4d825 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -1653,6 +1653,10 @@ int kvm_arch_init(void *opaque)
> >   return -ENODEV;
> >   }
> >
> > + if (cpus_have_const_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE))
> > + kvm_info("Guests without required CPU erratum work-arounds 
> > can deadlock system!\n" \
>
> work-arounds => workarounds
>
> (mainly for consistency, I have no clue if this is a real word or not!).
>
> I'd also probably do erratum => errata given that you're about to add a
> second one.

Humm, the plural is on workarounds. If I use a more standard singular
vs. plural form like "CPU feature workarounds" vs "CPU features
workarounds", the former seems more correct to me. (working around
features may be a bit nonsensical, but big.LITTLE ;) )

Rob


[PATCH v2 1/8] KVM: arm64: Set DBM bit for writable PTEs

2020-07-02 Thread Keqian Zhu
The DBM bit is used by the MMU to differentiate a genuinely non-writable
page from a page that is only temporarily non-writable for the purpose
of dirty tracking.

Signed-off-by: Keqian Zhu 
Signed-off-by: Peng Liang 
---
 arch/arm64/include/asm/kvm_mmu.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index b12bfc1f051a..2700442b0f75 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -169,6 +169,10 @@ void kvm_clear_hyp_idmap(void);
 static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
 {
pte_val(pte) |= PTE_S2_RDWR;
+
+   if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
+   pte_val(pte) |= PTE_DBM;
+
return pte;
 }
 
-- 
2.19.1



[PATCH v2 4/8] KVM: arm64: Save stage2 PTE dirty status if it is covered

2020-07-02 Thread Keqian Zhu
kvm_set_pte() is called to replace a target PTE with a desired one.
We always do this without changing the desired one, but if the dirty
status set by hardware would be overwritten, let the caller know.

There are two types of operations that will change a PTE and may
overwrite the dirty status set by hardware:

1. Stage2 PTE unmapping: page table merging (the revert of huge page
table dissolving), kvm_unmap_hva_range() and so on.

2. Stage2 PTE changing: including user_mem_abort(), kvm_mmu_notifier
_change_pte() and so on.

All of the operations above eventually invoke kvm_set_pte(). We should
save the dirty status into the memslot dirty bitmap.

Signed-off-by: Keqian Zhu 
Signed-off-by: Peng Liang 
---
 arch/arm64/include/asm/kvm_mmu.h |  5 
 arch/arm64/kvm/mmu.c | 42 
 2 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index a1b6131d980c..adb5c8edb29e 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -295,6 +295,11 @@ static inline bool kvm_mmu_hw_dbm_enabled(struct kvm *kvm)
return arm_mmu_hw_dbm_supported() && !!(kvm->arch.vtcr & VTCR_EL2_HD);
 }
 
+static inline bool kvm_s2pte_dbm(pte_t *ptep)
+{
+   return !!(READ_ONCE(pte_val(*ptep)) & PTE_DBM);
+}
+
 #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
 
 #ifdef __PAGETABLE_PMD_FOLDED
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ab8a6ceecbd8..d0c34549ef3b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -194,10 +194,26 @@ static void clear_stage2_pmd_entry(struct kvm *kvm, pmd_t 
*pmd, phys_addr_t addr
put_page(virt_to_page(pmd));
 }
 
-static inline void kvm_set_pte(pte_t *ptep, pte_t new_pte)
+/**
+ * @ret: true if dirty status set by hardware is covered.
+ */
+static inline bool kvm_set_pte(pte_t *ptep, pte_t new_pte)
 {
-   WRITE_ONCE(*ptep, new_pte);
-   dsb(ishst);
+   pteval_t old_pteval;
+   bool old_logging, new_no_write;
+
+   old_logging = IS_ENABLED(CONFIG_ARM64_HW_AFDBM) &&
+   arm_mmu_hw_dbm_supported() && kvm_s2pte_dbm(ptep);
+   new_no_write = pte_none(new_pte) || kvm_s2pte_readonly(&new_pte);
+
+   if (!old_logging || !new_no_write) {
+   WRITE_ONCE(*ptep, new_pte);
+   dsb(ishst);
+   return false;
+   }
+
+   old_pteval = xchg(&pte_val(*ptep), pte_val(new_pte));
+   return !kvm_s2pte_readonly(&__pte(old_pteval));
 }
 
 static inline void kvm_set_pmd(pmd_t *pmdp, pmd_t new_pmd)
@@ -260,15 +276,23 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
 {
phys_addr_t start_addr = addr;
pte_t *pte, *start_pte;
+   bool dirty_covered;
+   int idx;
 
start_pte = pte = pte_offset_kernel(pmd, addr);
do {
if (!pte_none(*pte)) {
pte_t old_pte = *pte;
 
-   kvm_set_pte(pte, __pte(0));
+   dirty_covered = kvm_set_pte(pte, __pte(0));
kvm_tlb_flush_vmid_ipa(kvm, addr);
 
+   if (dirty_covered) {
+   idx = srcu_read_lock(&kvm->srcu);
+   mark_page_dirty(kvm, addr >> PAGE_SHIFT);
+   srcu_read_unlock(&kvm->srcu, idx);
+   }
+
/* No need to invalidate the cache for device mappings 
*/
if (!kvm_is_device_pfn(pte_pfn(old_pte)))
kvm_flush_dcache_pte(old_pte);
@@ -1354,6 +1378,8 @@ static int stage2_set_pte(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache,
pte_t *pte, old_pte;
bool iomap = flags & KVM_S2PTE_FLAG_IS_IOMAP;
bool logging_active = flags & KVM_S2_FLAG_LOGGING_ACTIVE;
+   bool dirty_covered;
+   int idx;
 
VM_BUG_ON(logging_active && !cache);
 
@@ -1419,8 +1445,14 @@ static int stage2_set_pte(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache,
if (pte_val(old_pte) == pte_val(*new_pte))
return 0;
 
-   kvm_set_pte(pte, __pte(0));
+   dirty_covered = kvm_set_pte(pte, __pte(0));
kvm_tlb_flush_vmid_ipa(kvm, addr);
+
+   if (dirty_covered) {
+   idx = srcu_read_lock(&kvm->srcu);
+   mark_page_dirty(kvm, addr >> PAGE_SHIFT);
+   srcu_read_unlock(&kvm->srcu, idx);
+   }
} else {
get_page(virt_to_page(pte));
}
-- 
2.19.1



[PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM

2020-07-02 Thread Keqian Zhu
This patch series adds support for dirty log based on HW DBM.

It works well under some migration test cases, including VMs with 4K
pages or 2M THP. I checked the SHA256 hash digest of all memory and
it stays the same for the source VM and the destination VM, which means
no dirty pages are missed under hardware DBM.

Some key points:

1. Only support hardware updates of dirty status for PTEs. PMDs and PUDs
   are not involved for now.

2. About *performance*: In the RFC patch, I mentioned that for every 64GB
   of memory, KVM consumes about 40ms to scan all PTEs to collect the
   dirty log. This series addresses the problem in two ways: HW/SW dynamic
   switch and multi-core offload.

   HW/SW dynamic switch: Give userspace the right to enable/disable hw
   dirty log (see the sketch after this list). This adds a new KVM cap
   named KVM_CAP_ARM_HW_DIRTY_LOG. We achieve this by changing the
   kvm->arch.vtcr value and kicking vCPUs out so that the new value is
   reloaded into VTCR_EL2. Userspace can then enable hw dirty log at the
   beginning and disable it when few pages are dirty and the VM is about
   to be stopped, so VM downtime is not affected.

   Multi-core offload: Offloading the PT scanning workload to multiple
   cores can greatly reduce the scanning time. To make sure we can
   complete in time, I use smp_call_function() to implement this policy,
   which uses IPIs to dispatch the workload to other CPUs. On a 128U
   Kunpeng 920 platform, it takes only about 5ms to scan the PTs of
   256 RAM (using mempress, with almost all PTs established). And we
   dispatch the workload iteratively (each CPU scans the PTs of only 512M
   of RAM per iteration), so physical CPUs are not seriously affected.

3. About correctness: The DBM bit is only added when a PTE is already
   writable, so we still have read-only PTEs and the mechanisms which
   rely on read-only PTs are not broken.

4. About PT modification races: There are two kinds of PT modification.

   The first is adding or clearing a specific bit, such as AF or RW. All
   of these operations have been converted to be atomic, to avoid
   overwriting the dirty status set by hardware.

   The second is replacement, such as PTE unmapping or changing. All of
   these operations eventually invoke kvm_set_pte(). kvm_set_pte() has
   been converted to be atomic, and we save the dirty status to the
   underlying bitmap if it would otherwise be overwritten.
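
As a rough illustration, userspace could toggle the feature through
KVM_ENABLE_CAP along the lines below (vm_fd is assumed; the args[0] bit
layout follows patches 6/8 and 7/8):

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_ARM_HW_DIRTY_LOG,
		/* bit 0: enable HW dirty log; bit 1: parallel sync (patch 7/8) */
		.args[0] = 0x1,
	};

	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		perror("KVM_ENABLE_CAP(KVM_CAP_ARM_HW_DIRTY_LOG)");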

Change log:

v2:
 - Address Steven's comments.
 - Add support of parallel dirty log sync.
 - Simplify and merge patches of v1.

v1:
 - Address Catalin's comments.

Keqian Zhu (8):
  KVM: arm64: Set DBM bit for writable PTEs
  KVM: arm64: Scan PTEs to sync dirty log
  KVM: arm64: Modify stage2 young mechanism to support hw DBM
  KVM: arm64: Save stage2 PTE dirty status if it is covered
  KVM: arm64: Steply write protect page table by mask bit
  KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability
  KVM: arm64: Sync dirty log parallel
  KVM: Omit dirty log sync in log clear if initially all set

 arch/arm64/include/asm/kvm_host.h |   5 +
 arch/arm64/include/asm/kvm_mmu.h  |  43 -
 arch/arm64/kvm/arm.c  |  45 -
 arch/arm64/kvm/mmu.c  | 307 --
 arch/arm64/kvm/reset.c|   5 +
 include/uapi/linux/kvm.h  |   1 +
 tools/include/uapi/linux/kvm.h|   1 +
 virt/kvm/kvm_main.c   |   3 +-
 8 files changed, 389 insertions(+), 21 deletions(-)

-- 
2.19.1



[PATCH v2 3/8] KVM: arm64: Modify stage2 young mechanism to support hw DBM

2020-07-02 Thread Keqian Zhu
Marking PTs young (setting the AF bit) should be atomic, to avoid
overwriting the dirty status set by hardware.

Signed-off-by: Keqian Zhu 
Signed-off-by: Peng Liang 
---
 arch/arm64/include/asm/kvm_mmu.h | 31 ++-
 arch/arm64/kvm/mmu.c | 15 ---
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 4c12b7ad8ae8..a1b6131d980c 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -219,6 +219,18 @@ static inline void kvm_set_s2pte_readonly(pte_t *ptep)
} while (pteval != old_pteval);
 }
 
+static inline void kvm_set_s2pte_young(pte_t *ptep)
+{
+   pteval_t old_pteval, pteval;
+
+   pteval = READ_ONCE(pte_val(*ptep));
+   do {
+   old_pteval = pteval;
+   pteval |= PTE_AF;
+   pteval = cmpxchg_relaxed(&pte_val(*ptep), old_pteval, pteval);
+   } while (pteval != old_pteval);
+}
+
 static inline bool kvm_s2pte_readonly(pte_t *ptep)
 {
return (READ_ONCE(pte_val(*ptep)) & PTE_S2_RDWR) == PTE_S2_RDONLY;
@@ -234,6 +246,11 @@ static inline void kvm_set_s2pmd_readonly(pmd_t *pmdp)
kvm_set_s2pte_readonly((pte_t *)pmdp);
 }
 
+static inline void kvm_set_s2pmd_young(pmd_t *pmdp)
+{
+   kvm_set_s2pte_young((pte_t *)pmdp);
+}
+
 static inline bool kvm_s2pmd_readonly(pmd_t *pmdp)
 {
return kvm_s2pte_readonly((pte_t *)pmdp);
@@ -249,6 +266,11 @@ static inline void kvm_set_s2pud_readonly(pud_t *pudp)
kvm_set_s2pte_readonly((pte_t *)pudp);
 }
 
+static inline void kvm_set_s2pud_young(pud_t *pudp)
+{
+   kvm_set_s2pte_young((pte_t *)pudp);
+}
+
 static inline bool kvm_s2pud_readonly(pud_t *pudp)
 {
return kvm_s2pte_readonly((pte_t *)pudp);
@@ -259,15 +281,6 @@ static inline bool kvm_s2pud_exec(pud_t *pudp)
return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
 }
 
-static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
-{
-   return pud_mkyoung(pud);
-}
-
-static inline bool kvm_s2pud_young(pud_t pud)
-{
-   return pud_young(pud);
-}
 
 static inline bool arm_mmu_hw_dbm_supported(void)
 {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b3cb8b6da4c2..ab8a6ceecbd8 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2008,8 +2008,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
  * Resolve the access fault by making the page young again.
  * Note that because the faulting entry is guaranteed not to be
  * cached in the TLB, we don't need to invalidate anything.
- * Only the HW Access Flag updates are supported for Stage 2 (no DBM),
- * so there is no need for atomic (pte|pmd)_mkyoung operations.
+ *
+ * Note: Both DBM and HW AF updates are supported for Stage2, so
+ * young operations should be atomic.
  */
 static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 {
@@ -2027,15 +2028,15 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa)
goto out;
 
if (pud) {  /* HugeTLB */
-   *pud = kvm_s2pud_mkyoung(*pud);
+   kvm_set_s2pud_young(pud);
pfn = kvm_pud_pfn(*pud);
pfn_valid = true;
} else  if (pmd) {  /* THP, HugeTLB */
-   *pmd = pmd_mkyoung(*pmd);
+   kvm_set_s2pmd_young(pmd);
pfn = pmd_pfn(*pmd);
pfn_valid = true;
-   } else {
-   *pte = pte_mkyoung(*pte);   /* Just a page... */
+   } else {/* Just a page... */
+   kvm_set_s2pte_young(pte);
pfn = pte_pfn(*pte);
pfn_valid = true;
}
@@ -2280,7 +2281,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, 
gpa_t gpa, u64 size, void *
return 0;
 
if (pud)
-   return kvm_s2pud_young(*pud);
+   return pud_young(*pud);
else if (pmd)
return pmd_young(*pmd);
else
-- 
2.19.1



[PATCH v2 2/8] KVM: arm64: Scan PTEs to sync dirty log

2020-07-02 Thread Keqian Zhu
With hardware management of dirty state, the dirty state is stored in
the PTEs. We have to scan all PTEs to sync the dirty log into the
memslot dirty bitmap.

Signed-off-by: Keqian Zhu 
Signed-off-by: Peng Liang 
---
 arch/arm64/include/asm/kvm_host.h |   1 +
 arch/arm64/include/asm/kvm_mmu.h  |  13 +++
 arch/arm64/kvm/arm.c  |   3 +-
 arch/arm64/kvm/mmu.c  | 142 ++
 4 files changed, 158 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index c3e6fcc664b1..86b9c210ba43 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -480,6 +480,7 @@ u64 __kvm_call_hyp(void *hypfn, ...);
 
 void force_vm_exit(const cpumask_t *mask);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
+void kvm_mmu_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);
 
 int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
int exception_index);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 2700442b0f75..4c12b7ad8ae8 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -269,6 +269,19 @@ static inline bool kvm_s2pud_young(pud_t pud)
return pud_young(pud);
 }
 
+static inline bool arm_mmu_hw_dbm_supported(void)
+{
+   u8 hadbs = (read_sysreg(id_aa64mmfr1_el1) >>
+   ID_AA64MMFR1_HADBS_SHIFT) & 0xf;
+
+   return hadbs == 0x2;
+}
+
+static inline bool kvm_mmu_hw_dbm_enabled(struct kvm *kvm)
+{
+   return arm_mmu_hw_dbm_supported() && !!(kvm->arch.vtcr & VTCR_EL2_HD);
+}
+
 #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
 
 #ifdef __PAGETABLE_PMD_FOLDED
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 90cb90561446..fefa5406e037 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1199,7 +1199,8 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
-
+   if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && kvm_mmu_hw_dbm_enabled(kvm))
+   kvm_mmu_sync_dirty_log(kvm, memslot);
 }
 
 void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8c0035cab6b6..b3cb8b6da4c2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2411,6 +2411,148 @@ int kvm_mmu_init(void)
return err;
 }
 
+#ifdef CONFIG_ARM64_HW_AFDBM
+/**
+ * stage2_sync_dirty_log_ptes() - synchronize dirty log from PMD range
+ * @kvm:   The KVM pointer
+ * @pmd:   pointer to pmd entry
+ * @addr:  range start address
+ * @end:   range end address
+ */
+static void stage2_sync_dirty_log_ptes(struct kvm *kvm, pmd_t *pmd,
+  phys_addr_t addr, phys_addr_t end)
+{
+   pte_t *pte;
+
+   pte = pte_offset_kernel(pmd, addr);
+   do {
+   if (!pte_none(*pte) && !kvm_s2pte_readonly(pte))
+   mark_page_dirty(kvm, addr >> PAGE_SHIFT);
+   } while (pte++, addr += PAGE_SIZE, addr != end);
+}
+
+/**
+ * stage2_sync_dirty_log_pmds() - synchronize dirty log from PUD range
+ * @kvm:   The KVM pointer
+ * @pud:   pointer to pud entry
+ * @addr:  range start address
+ * @end:   range end address
+ */
+static void stage2_sync_dirty_log_pmds(struct kvm *kvm, pud_t *pud,
+  phys_addr_t addr, phys_addr_t end)
+{
+   pmd_t *pmd;
+   phys_addr_t next;
+
+   pmd = stage2_pmd_offset(kvm, pud, addr);
+   do {
+   next = stage2_pmd_addr_end(kvm, addr, end);
+   if (!pmd_none(*pmd) && !pmd_thp_or_huge(*pmd))
+   stage2_sync_dirty_log_ptes(kvm, pmd, addr, next);
+   } while (pmd++, addr = next, addr != end);
+}
+
+/**
+ * stage2_sync_dirty_log_puds() - synchronize dirty log from P4D range
+ * @kvm:   The KVM pointer
+ * @pgd:   pointer to pgd entry
+ * @addr:  range start address
+ * @end:   range end address
+ */
+static void stage2_sync_dirty_log_puds(struct kvm *kvm, p4d_t *p4d,
+  phys_addr_t addr, phys_addr_t end)
+{
+   pud_t *pud;
+   phys_addr_t next;
+
+   pud = stage2_pud_offset(kvm, p4d, addr);
+   do {
+   next = stage2_pud_addr_end(kvm, addr, end);
+   if (!stage2_pud_none(kvm, *pud) && !stage2_pud_huge(kvm, *pud))
+   stage2_sync_dirty_log_pmds(kvm, pud, addr, next);
+   } while (pud++, addr = next, addr != end);
+}
+
+/**
+ * stage2_sync_dirty_log_p4ds() - synchronize dirty log from PGD range
+ * @kvm:   The KVM pointer
+ * @pgd:   pointer to pgd entry
+ * @addr:  range start address
+ * @end:   range end address
+ */
+static void stage2_sync_dirty_log_p4ds(struct kvm *kvm, pgd_t *pgd,
+  phys_addr_t addr, phys_addr_t end)
+{
+   p4d_t *p4d;
+   phys

[PATCH v2 6/8] KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability

2020-07-02 Thread Keqian Zhu
Because using arm64 DBM to log dirty pages has the side effect of a
long dirty log sync time, we should give userspace the opportunity to
enable or disable this feature, so that it can implement its own policy.

This feature is disabled by default.

Signed-off-by: Keqian Zhu 
Signed-off-by: Peng Liang 
---
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/arm.c  | 35 +++
 arch/arm64/kvm/mmu.c  | 22 +++
 arch/arm64/kvm/reset.c|  5 +
 include/uapi/linux/kvm.h  |  1 +
 tools/include/uapi/linux/kvm.h|  1 +
 6 files changed, 65 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 86b9c210ba43..69a5317c7049 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -481,6 +481,7 @@ u64 __kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 void kvm_mmu_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);
+void kvm_mmu_sync_dirty_log_all(struct kvm *kvm);
 
 int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
int exception_index);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fefa5406e037..9e3f765d5467 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -78,6 +78,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
struct kvm_enable_cap *cap)
 {
int r;
+#ifdef CONFIG_ARM64_HW_AFDBM
+   int i;
+   struct kvm_vcpu *vcpu;
+   bool enable_hw_dirty_log;
+#endif
 
if (cap->flags)
return -EINVAL;
@@ -87,6 +92,36 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
r = 0;
kvm->arch.return_nisv_io_abort_to_user = true;
break;
+#ifdef CONFIG_ARM64_HW_AFDBM
+   case KVM_CAP_ARM_HW_DIRTY_LOG:
+   if (!arm_mmu_hw_dbm_supported() || (cap->args[0] & ~0x1))
+   r = -EINVAL;
+
+   enable_hw_dirty_log = !!(cap->args[0] & 0x1);
+   if (!!(kvm->arch.vtcr & VTCR_EL2_HD) != enable_hw_dirty_log) {
+   if (enable_hw_dirty_log)
+   kvm->arch.vtcr |= VTCR_EL2_HD;
+   else
+   kvm->arch.vtcr &= ~VTCR_EL2_HD;
+
+   /*
+* We should kick vcpus out of guest mode here to
+* load new vtcr value to vtcr_el2 register when
+* re-enter guest mode.
+*/
+   kvm_for_each_vcpu(i, vcpu, kvm)
+   kvm_vcpu_kick(vcpu);
+
+   if (!enable_hw_dirty_log) {
+   mutex_lock(&kvm->slots_lock);
+   kvm_mmu_sync_dirty_log_all(kvm);
+   mutex_unlock(&kvm->slots_lock);
+   }
+   }
+
+   r = 0;
+   break;
+#endif
default:
r = -EINVAL;
break;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index adfa62f1fced..1a48554accb0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2590,6 +2590,28 @@ void kvm_mmu_sync_dirty_log(struct kvm *kvm, struct 
kvm_memory_slot *memslot)
spin_unlock(&kvm->mmu_lock);
srcu_read_unlock(&kvm->srcu, srcu_idx);
 }
+
+/**
+ * kvm_mmu_sync_dirty_log_all() - synchronize dirty log from PTEs for whole VM
+ * @kvm:   The KVM pointer
+ *
+ * Called with kvm->slots_lock mutex acquired
+ */
+void kvm_mmu_sync_dirty_log_all(struct kvm *kvm)
+{
+   struct kvm_memslots *slots = kvm_memslots(kvm);
+   struct kvm_memory_slot *memslots = slots->memslots;
+   struct kvm_memory_slot *memslot;
+   int slot;
+
+   if (unlikely(!slots->used_slots))
+   return;
+
+   for (slot = 0; slot < slots->used_slots; slot++) {
+   memslot = &memslots[slot];
+   kvm_mmu_sync_dirty_log(kvm, memslot);
+   }
+}
 #endif /* CONFIG_ARM64_HW_AFDBM */
 
 void kvm_arch_commit_memory_region(struct kvm *kvm,
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index d3b209023727..a3be703dd54b 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -83,6 +83,11 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long 
ext)
r = has_vhe() && system_supports_address_auth() &&
 system_supports_generic_auth();
break;
+#ifdef CONFIG_ARM64_HW_AFDBM
+   case KVM_CAP_ARM_HW_DIRTY_LOG:
+   r = arm_mmu_hw_dbm_supported();
+   break;
+#endif /* CONFIG_ARM64_HW_AFDBM */
default:
r = 0;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4fdf30316582..e0b12c43397b 100644
--- a/include/uapi/linux/kvm.h

[PATCH v2 5/8] KVM: arm64: Steply write protect page table by mask bit

2020-07-02 Thread Keqian Zhu
During dirty log clear, page table entries are write protected
according to a mask. In the past we write protected all entries
covered by the mask, from ffs to fls. Even though there may be zero
bits within this range, we hold the kvm mmu lock, so we won't write
protect entries that we don't want to.

We are about to add support for hardware management of dirty state
on arm64, and holding the kvm mmu lock will no longer be enough. We
should write protect entries one by one, according to each set bit
in the mask.

Signed-off-by: Keqian Zhu 
Signed-off-by: Peng Liang 
---
 arch/arm64/kvm/mmu.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d0c34549ef3b..adfa62f1fced 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1703,10 +1703,16 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm 
*kvm,
gfn_t gfn_offset, unsigned long mask)
 {
phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
-   phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
-   phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
+   phys_addr_t start, end;
+   u32 i;
 
-   stage2_wp_range(kvm, start, end);
+   for (i = __ffs(mask); i <= __fls(mask); i++) {
+   if (test_bit_le(i, &mask)) {
+   start = (base_gfn + i) << PAGE_SHIFT;
+   end = (base_gfn + i + 1) << PAGE_SHIFT;
+   stage2_wp_range(kvm, start, end);
+   }
+   }
 }
 
 /*
-- 
2.19.1



[PATCH v2 8/8] KVM: Omit dirty log sync in log clear if initially all set

2020-07-02 Thread Keqian Zhu
Synchronizing the dirty log during log clear is useful only when the
userspace dirty bitmap contains dirty bits that the memslot dirty bitmap
does not, because then we can sync the new dirty bits into the memslot
dirty bitmap, clear them along the way, and avoid reporting them to
userspace later.

With the dirty bitmap "initially all set" feature, the above situation
will not occur if the userspace logic is sane, so we can omit the dirty
log sync in log clear. This is valuable when dirty log sync is a
high-cost operation, such as with arm64 DBM.
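
For reference, "initially all set" is the mode userspace opts into via
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2; a minimal sketch (vm_fd is assumed):

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
		.args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
			   KVM_DIRTY_LOG_INITIALLY_SET,
	};

	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		perror("KVM_ENABLE_CAP(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2)");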

Signed-off-by: Keqian Zhu 
---
 virt/kvm/kvm_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a852af5c3214..3f9e51d52b7a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1554,7 +1554,8 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
(log->num_pages < memslot->npages - log->first_page && 
(log->num_pages & 63)))
return -EINVAL;
 
-   kvm_arch_sync_dirty_log(kvm, memslot);
+   if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
+   kvm_arch_sync_dirty_log(kvm, memslot);
 
flush = false;
dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
-- 
2.19.1



[PATCH v2 7/8] KVM: arm64: Sync dirty log parallel

2020-07-02 Thread Keqian Zhu
Give userspace another option to address the high cost of dirty log
sync, called multi-core offload. Userspace can enable this policy
through KVM_CAP_ARM_HW_DIRTY_LOG.

Signed-off-by: Keqian Zhu 
Signed-off-by: Peng Liang 
---
 arch/arm64/include/asm/kvm_host.h |  3 ++
 arch/arm64/kvm/arm.c  |  9 +++-
 arch/arm64/kvm/mmu.c  | 82 +--
 arch/arm64/kvm/reset.c|  2 +-
 4 files changed, 90 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 69a5317c7049..05da819f9adc 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -95,6 +95,9 @@ struct kvm_arch {
 * supported.
 */
bool return_nisv_io_abort_to_user;
+
+   /* Sync dirty log parallel when hw dirty log enabled */
+   bool sync_dirty_log_parallel;
 };
 
 #define KVM_NR_MEM_OBJS 40
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 9e3f765d5467..89614984831d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -82,6 +82,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
int i;
struct kvm_vcpu *vcpu;
bool enable_hw_dirty_log;
+   bool enable_sync_parallel;
 #endif
 
if (cap->flags)
@@ -94,10 +95,14 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
break;
 #ifdef CONFIG_ARM64_HW_AFDBM
case KVM_CAP_ARM_HW_DIRTY_LOG:
-   if (!arm_mmu_hw_dbm_supported() || (cap->args[0] & ~0x1))
+   if (!arm_mmu_hw_dbm_supported() || (cap->args[0] & ~0x3))
r = -EINVAL;
 
enable_hw_dirty_log = !!(cap->args[0] & 0x1);
+   enable_sync_parallel = !!(cap->args[0] & 0x2);
+   if (!enable_hw_dirty_log && enable_sync_parallel)
+   r = -EINVAL;
+
if (!!(kvm->arch.vtcr & VTCR_EL2_HD) != enable_hw_dirty_log) {
if (enable_hw_dirty_log)
kvm->arch.vtcr |= VTCR_EL2_HD;
@@ -119,6 +124,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
}
 
+   kvm->arch.sync_dirty_log_parallel = enable_sync_parallel;
+
r = 0;
break;
 #endif
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1a48554accb0..be360e0fd20b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2548,15 +2548,55 @@ static void stage2_sync_dirty_log_range(struct kvm 
*kvm, phys_addr_t addr,
 
pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
do {
-   cond_resched_lock(&kvm->mmu_lock);
-   if (!READ_ONCE(kvm->arch.pgd))
-   break;
+   if (!kvm->arch.sync_dirty_log_parallel) {
+   cond_resched_lock(&kvm->mmu_lock);
+   if (!READ_ONCE(kvm->arch.pgd))
+   break;
+   }
next = stage2_pgd_addr_end(kvm, addr, end);
if (stage2_pgd_present(kvm, *pgd))
stage2_sync_dirty_log_p4ds(kvm, pgd, addr, next);
} while (pgd++, addr = next, addr != end);
 }
 
+static struct dirty_sync_task {
+   struct kvm *kvm;
+   struct kvm_memory_slot *memslot;
+   u32 cpu_cnt;
+   u16 cpu_idx_map[NR_CPUS];
+   u32 ite_npages;
+   u32 ite;
+   bool finished;
+} sync_task;
+
+static void stage2_sync_dirty_log_smp(void *task)
+{
+   struct dirty_sync_task *t = task;
+   struct kvm_memory_slot *memslot = t->memslot;
+   unsigned long ite_idx, base_page, end_page;
+   gfn_t base_gfn;
+
+   ite_idx = t->cpu_cnt * t->ite + t->cpu_idx_map[smp_processor_id()];
+
+   base_page = ite_idx * t->ite_npages;
+   if (base_page >= memslot->npages) {
+   t->finished = true;
+   trace_printk("stage2_sync_dirty_log_smp finished 1.\n");
+   return;
+   }
+
+   end_page = min(memslot->npages, base_page + t->ite_npages);
+   if (end_page == memslot->npages) {
+   t->finished = true;
+   trace_printk("stage2_sync_dirty_log_smp finished 2.\n");
+   }
+
+   base_gfn = memslot->base_gfn;
+   trace_printk("base_page 0x%lx, end_page 0x%lx.\n", base_page, end_page);
+   stage2_sync_dirty_log_range(t->kvm, (base_gfn + base_page) << 
PAGE_SHIFT,
+   (base_gfn + end_page) << PAGE_SHIFT);
+}
+
 /**
  * kvm_mmu_sync_dirty_log() - synchronize dirty log from stage2 PTEs for
  * memory slot
@@ -2577,18 +2617,52 @@ void kvm_mmu_sync_dirty_log(struct kvm *kvm, struct 
kvm_memory_slot *memslot)
 {
phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+   u32 ite_npages, cpu_cnt, this_cpu, cpu;
+   u16 cpu_idx;
int srcu_idx;
 
if (WARN_ON_ONCE(!memslot->dirty_bit

Re: [kvm-unit-tests PATCH v2 3/8] arm64: microbench: gic: Add gicv4.1 support for ipi latency test.

2020-07-02 Thread Auger Eric
Hi Marc,

On 7/2/20 3:08 PM, Marc Zyngier wrote:
> Hi Eric,
> 
> On 2020-07-02 13:57, Auger Eric wrote:
>> Hi Jingyi,
>>
>> On 7/2/20 5:01 AM, Jingyi Wang wrote:
>>> If gicv4.1(sgi hardware injection) supported, we test ipi injection
>>> via hw/sw way separately.
>>>
>>> Signed-off-by: Jingyi Wang 
>>> ---
>>>  arm/micro-bench.c    | 45 +++-
>>>  lib/arm/asm/gic-v3.h |  3 +++
>>>  lib/arm/asm/gic.h    |  1 +
>>>  3 files changed, 44 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arm/micro-bench.c b/arm/micro-bench.c
>>> index fc4d356..80d8db3 100644
>>> --- a/arm/micro-bench.c
>>> +++ b/arm/micro-bench.c
>>> @@ -91,9 +91,40 @@ static void gic_prep_common(void)
>>>  assert(irq_ready);
>>>  }
>>>
>>> -static void ipi_prep(void)
>>> +static bool ipi_prep(void)
>> Any reason why the bool returned value is preferred over the standard
>> int?
>>>  {
>>> +    u32 val;
>>> +
>>> +    val = readl(vgic_dist_base + GICD_CTLR);
>>> +    if (readl(vgic_dist_base + GICD_TYPER2) & GICD_TYPER2_nASSGIcap) {
>>> +    val &= ~GICD_CTLR_ENABLE_G1A;
>>> +    val &= ~GICD_CTLR_nASSGIreq;
>>> +    writel(val, vgic_dist_base + GICD_CTLR);
>>> +    val |= GICD_CTLR_ENABLE_G1A;
>>> +    writel(val, vgic_dist_base + GICD_CTLR);
>> Why do we need this G1A dance?
> 
> Because it isn't legal to change the SGI behaviour when groups are enabled.
> Yes, it is described in this bit of documentation nobody has access to.

OK thank you for the explanation. Jingyi, maybe add a comment to avoid
the question again ;-)

Thanks

Eric
> 
> And this code needs to track RWP on disabling Group-1.
> 
>     M.
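
On the RWP point, a hedged sketch of what waiting for the write to
settle could look like in the test (GICD_CTLR_RWP as bit 31 is taken
from the GICv3 spec; neither the define nor the helper is in the posted
patch):

	#define GICD_CTLR_RWP		(1U << 31)

	/* Hypothetical helper: poll RWP until the GICD_CTLR write has settled. */
	static void gicd_wait_for_rwp(void)
	{
		while (readl(vgic_dist_base + GICD_CTLR) & GICD_CTLR_RWP)
			cpu_relax();
	}

	/* ...called right after clearing GICD_CTLR_ENABLE_G1A in ipi_prep(). */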



Re: [kvm-unit-tests PATCH v2 8/8] arm64: microbench: Add vtimer latency test

2020-07-02 Thread Auger Eric
Hi Jingyi,

On 7/2/20 5:01 AM, Jingyi Wang wrote:
> Trigger PPIs by setting up a 10msec timer and test the latency.

So for each iteration the accumulated value is 10 ms + latency, right?
And what is printed at the end does include the accumulated timer periods.
Wouldn't it make sense to have a test->post() that subtracts this value?
You would need to store the actual number of iterations.
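
A very rough sketch of such a hook, assuming struct exit_test grows a
post() member and that the number of completed iterations is passed in
(none of this exists in the posted series):

	/* Hypothetical post() hook: strip the programmed 10ms periods so
	 * only the interrupt latency is left in the accumulated ticks. */
	static void timer_post(uint64_t ntimes, uint64_t *total_ticks)
	{
		*total_ticks -= ntimes * (cntfrq / 100);
	}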

Thanks

Eric
> 
> Signed-off-by: Jingyi Wang 
> ---
>  arm/micro-bench.c | 56 ++-
>  1 file changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/micro-bench.c b/arm/micro-bench.c
> index 4c962b7..6822084 100644
> --- a/arm/micro-bench.c
> +++ b/arm/micro-bench.c
> @@ -23,8 +23,13 @@
>  #include 
>  
>  #define NTIMES (1U << 16)
> +#define NTIMES_MINOR (1U << 8)
>  #define MAX_NS (5 * 1000 * 1000 * 1000UL)
>  
> +#define IRQ_VTIMER   27
> +#define ARCH_TIMER_CTL_ENABLE(1 << 0)
> +#define ARCH_TIMER_CTL_IMASK (1 << 1)
> +
>  static u32 cntfrq;
>  
>  static volatile bool irq_ready, irq_received;
> @@ -33,9 +38,16 @@ static void (*write_eoir)(u32 irqstat);
>  
>  static void gic_irq_handler(struct pt_regs *regs)
>  {
> + u32 irqstat = gic_read_iar();
>   irq_ready = false;
>   irq_received = true;
> - gic_write_eoir(gic_read_iar());
> + gic_write_eoir(irqstat);
> +
> + if (irqstat == IRQ_VTIMER) {
> + write_sysreg((ARCH_TIMER_CTL_IMASK | ARCH_TIMER_CTL_ENABLE),
> +  cntv_ctl_el0);
> + isb();
> + }
>   irq_ready = true;
>  }
>  
> @@ -189,6 +201,47 @@ static void lpi_exec(void)
>   assert_msg(irq_received, "failed to receive LPI in time, but received 
> %d successfully\n", received);
>  }
>  
> +static bool timer_prep(void)
> +{
> + static void *gic_isenabler;
> +
> + gic_enable_defaults();
> + install_irq_handler(EL1H_IRQ, gic_irq_handler);
> + local_irq_enable();
> +
> + gic_isenabler = gicv3_sgi_base() + GICR_ISENABLER0;
> + writel(1 << 27, gic_isenabler);
> + write_sysreg(ARCH_TIMER_CTL_ENABLE, cntv_ctl_el0);
> + isb();
> +
> + gic_prep_common();
> + return true;
> +}
> +
> +static void timer_exec(void)
> +{
> + u64 before_timer;
> + u64 timer_10ms;
> + unsigned tries = 1 << 28;
> + static int received = 0;
> +
> + irq_received = false;
> +
> + before_timer = read_sysreg(cntvct_el0);
> + timer_10ms = cntfrq / 100;
> + write_sysreg(before_timer + timer_10ms, cntv_cval_el0);
> + write_sysreg(ARCH_TIMER_CTL_ENABLE, cntv_ctl_el0);
> + isb();
> +
> + while (!irq_received && tries--)
> + cpu_relax();
> +
> + if (irq_received)
> + ++received;
> +
> + assert_msg(irq_received, "failed to receive PPI in time, but received 
> %d successfully\n", received);
> +}
> +
>  static void hvc_exec(void)
>  {
>   asm volatile("mov w0, #0x4b00; hvc #0" ::: "w0");
> @@ -236,6 +289,7 @@ static struct exit_test tests[] = {
>   {"ipi", ipi_prep,   ipi_exec,   NTIMES, 
> true},
>   {"ipi_hw",  ipi_hw_prep,ipi_exec,   NTIMES, 
> true},
>   {"lpi", lpi_prep,   lpi_exec,   NTIMES, 
> true},
> + {"timer_10ms",  timer_prep, timer_exec, 
> NTIMES_MINOR,   true},
>  };
>  
>  struct ns_time {
> 



Re: [kvm-unit-tests PATCH v2 7/8] arm64: microbench: Add time limit for each individual test

2020-07-02 Thread Auger Eric
Hi Jingyi,

On 7/2/20 5:01 AM, Jingyi Wang wrote:
> Besides using separate running times parameter, we add time limit
> for loop_test to make sure each test should be done in a certain
> time(5 sec here).
> 
> Signed-off-by: Jingyi Wang 
> ---
>  arm/micro-bench.c | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/arm/micro-bench.c b/arm/micro-bench.c
> index 506d2f9..4c962b7 100644
> --- a/arm/micro-bench.c
> +++ b/arm/micro-bench.c
> @@ -23,6 +23,7 @@
>  #include 
>  
>  #define NTIMES (1U << 16)
> +#define MAX_NS (5 * 1000 * 1000 * 1000UL)
>  
>  static u32 cntfrq;
>  
> @@ -258,22 +259,26 @@ static void loop_test(struct exit_test *test)
>   uint64_t start, end, total_ticks, ntimes = 0;
>   struct ns_time total_ns, avg_ns;
>  
> + total_ticks = 0;
>   if (test->prep) {
>   if(!test->prep()) {
>   printf("%s test skipped\n", test->name);
>   return;
>   }
>   }
> - isb();
> - start = read_sysreg(cntpct_el0);
> - while (ntimes < test->times) {
> +
> + while (ntimes < test->times && total_ns.ns < MAX_NS) {
> + isb();
> + start = read_sysreg(cntpct_el0);
>   test->exec();
> + isb();
> + end = read_sysreg(cntpct_el0);
> +
>   ntimes++;
> + total_ticks += (end - start);
> + ticks_to_ns_time(total_ticks, &total_ns);
>   }
you don't need the
ticks_to_ns_time(total_ticks, &total_ns);

after the loop
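
With that call dropped, the end of loop_test() reads roughly as below
(assuming total_ticks and total_ns are zero-initialised at declaration so
the first loop-condition check is well defined — an assumption of this
sketch, not something in the patch):

	while (ntimes < test->times && total_ns.ns < MAX_NS) {
		isb();
		start = read_sysreg(cntpct_el0);
		test->exec();
		isb();
		end = read_sysreg(cntpct_el0);

		ntimes++;
		total_ticks += (end - start);
		ticks_to_ns_time(total_ticks, &total_ns);	/* already up to date here */
	}

	avg_ns.ns = total_ns.ns / ntimes;
	avg_ns.ns_frac = total_ns.ns_frac / ntimes;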
> - isb();
> - end = read_sysreg(cntpct_el0);
>  
> - total_ticks = end - start;
>   ticks_to_ns_time(total_ticks, &total_ns);
>   avg_ns.ns = total_ns.ns / ntimes;
>   avg_ns.ns_frac = total_ns.ns_frac / ntimes;
> 

Besides
Reviewed-by: Eric Auger 

Thanks

Eric



Re: [kvm-unit-tests PATCH v2 6/8] arm64: microbench: Allow each test to specify its running times

2020-07-02 Thread Auger Eric
Hi Jingyi,

On 7/2/20 7:29 AM, Andrew Jones wrote:
> On Thu, Jul 02, 2020 at 11:01:30AM +0800, Jingyi Wang wrote:
>> For some test in micro-bench can be time consuming, we add a
>> micro-bench test parameter to allow each individual test to specify
>> its running times.
>>
>> Signed-off-by: Jingyi Wang 
Reviewed-by: Eric Auger 

Eric
>> ---
>>  arm/micro-bench.c | 25 ++---
>>  1 file changed, 14 insertions(+), 11 deletions(-)
>>
>> diff --git a/arm/micro-bench.c b/arm/micro-bench.c
>> index aeb60a7..506d2f9 100644
>> --- a/arm/micro-bench.c
>> +++ b/arm/micro-bench.c
>> @@ -223,17 +223,18 @@ struct exit_test {
>>  const char *name;
>>  bool (*prep)(void);
>>  void (*exec)(void);
>> +u32 times;
>>  bool run;
>>  };
>>  
>>  static struct exit_test tests[] = {
>> -{"hvc", NULL,   hvc_exec,   true},
>> -{"mmio_read_user",  NULL,   mmio_read_user_exec,true},
>> -{"mmio_read_vgic",  NULL,   mmio_read_vgic_exec,true},
>> -{"eoi", NULL,   eoi_exec,   true},
>> -{"ipi", ipi_prep,   ipi_exec,   true},
>> -{"ipi_hw",  ipi_hw_prep,ipi_exec,   true},
>> -{"lpi", lpi_prep,   lpi_exec,   true},
>> +{"hvc", NULL,   hvc_exec,   NTIMES, 
>> true},
>> +{"mmio_read_user",  NULL,   mmio_read_user_exec,NTIMES, 
>> true},
>> +{"mmio_read_vgic",  NULL,   mmio_read_vgic_exec,NTIMES, 
>> true},
>> +{"eoi", NULL,   eoi_exec,   NTIMES, 
>> true},
>> +{"ipi", ipi_prep,   ipi_exec,   NTIMES, 
>> true},
>> +{"ipi_hw",  ipi_hw_prep,ipi_exec,   NTIMES, 
>> true},
>> +{"lpi", lpi_prep,   lpi_exec,   NTIMES, 
>> true},
> 
> Now that we no longer use 'NTIMES' in functions we don't really need the
> define at all. We can just put 65536 directly into the table here for
> each test that needs 65536 times.
> 
> Thanks,
> drew
> 
>>  };
>>  
>>  struct ns_time {
>> @@ -254,7 +255,7 @@ static void ticks_to_ns_time(uint64_t ticks, struct 
>> ns_time *ns_time)
>>  
>>  static void loop_test(struct exit_test *test)
>>  {
>> -uint64_t start, end, total_ticks, ntimes = NTIMES;
>> +uint64_t start, end, total_ticks, ntimes = 0;
>>  struct ns_time total_ns, avg_ns;
>>  
>>  if (test->prep) {
>> @@ -265,15 +266,17 @@ static void loop_test(struct exit_test *test)
>>  }
>>  isb();
>>  start = read_sysreg(cntpct_el0);
>> -while (ntimes--)
>> +while (ntimes < test->times) {
>>  test->exec();
>> +ntimes++;
>> +}
>>  isb();
>>  end = read_sysreg(cntpct_el0);
>>  
>>  total_ticks = end - start;
>>  ticks_to_ns_time(total_ticks, &total_ns);
>> -avg_ns.ns = total_ns.ns / NTIMES;
>> -avg_ns.ns_frac = total_ns.ns_frac / NTIMES;
>> +avg_ns.ns = total_ns.ns / ntimes;
>> +avg_ns.ns_frac = total_ns.ns_frac / ntimes;
>>  
>>  printf("%-30s%15" PRId64 ".%-15" PRId64 "%15" PRId64 ".%-15" PRId64 
>> "\n",
>>  test->name, total_ns.ns, total_ns.ns_frac, avg_ns.ns, 
>> avg_ns.ns_frac);
>> -- 
>> 2.19.1
>>
>>



Re: [kvm-unit-tests PATCH v2 5/8] arm64: microbench: its: Add LPI latency test

2020-07-02 Thread Auger Eric
Hi Jingyi,
On 7/2/20 5:01 AM, Jingyi Wang wrote:
> Triggers LPIs through the INT command and test the latency.
> Mostly inherited form commit 0ef02cd6cbaa(arm/arm64: ITS: INT
> functional tests).
> 
> Signed-off-by: Jingyi Wang 
> ---
>  arm/micro-bench.c | 44 
>  1 file changed, 44 insertions(+)
> 
> diff --git a/arm/micro-bench.c b/arm/micro-bench.c
> index 80d8db3..aeb60a7 100644
> --- a/arm/micro-bench.c
> +++ b/arm/micro-bench.c
> @@ -20,6 +20,7 @@
>   */
>  #include 
>  #include 
> +#include 
>  
>  #define NTIMES (1U << 16)
>  
> @@ -145,6 +146,48 @@ static void ipi_exec(void)
>   assert_msg(irq_received, "failed to receive IPI in time, but received 
> %d successfully\n", received);
>  }
>  
> +static bool lpi_prep(void)
> +{
> + struct its_collection *col1;
> + struct its_device *dev2;
> +
> + if (!gicv3_its_base())
> + return false;
> +
> + its_enable_defaults();
> + dev2 = its_create_device(2 /* dev id */, 8 /* nb_ites */);
> + col1 = its_create_collection(1 /* col id */, 1 /* target PE */);
> + gicv3_lpi_set_config(8199, LPI_PROP_DEFAULT);
> +
> + its_send_mapd_nv(dev2, true);
> + its_send_mapc_nv(col1, true);
> + its_send_invall_nv(col1);
> + its_send_mapti_nv(dev2, 8199 /* lpi id */, 20 /* event id */, col1);
> +
> + gic_prep_common();
> + return true;
> +}
> +
> +static void lpi_exec(void)
> +{
> + struct its_device *dev2;
> + unsigned tries = 1 << 28;
> + static int received = 0;
> +
> + irq_received = false;
> +
> + dev2 = its_get_device(2);
> + its_send_int_nv(dev2, 20);
> +
> + while (!irq_received && tries--)
> + cpu_relax();
> +
> + if (irq_received)
> + ++received;
> +
> + assert_msg(irq_received, "failed to receive LPI in time, but received 
> %d successfully\n", received);
> +}
> +
>  static void hvc_exec(void)
>  {
>   asm volatile("mov w0, #0x4b00; hvc #0" ::: "w0");
> @@ -190,6 +233,7 @@ static struct exit_test tests[] = {
>   {"eoi", NULL,   eoi_exec,   true},
>   {"ipi", ipi_prep,   ipi_exec,   true},
>   {"ipi_hw",  ipi_hw_prep,ipi_exec,   true},
> + {"lpi", lpi_prep,   lpi_exec,   true},
>  };
>  
>  struct ns_time {
> 
Looks good to me (w/wo the lpi_prep returned value change)

Reviewed-by: Eric Auger 

Thanks

Eric




Re: [kvm-unit-tests PATCH v2 3/8] arm64: microbench: gic: Add gicv4.1 support for ipi latency test.

2020-07-02 Thread Marc Zyngier

Hi Eric,

On 2020-07-02 13:57, Auger Eric wrote:

Hi Jingyi,

On 7/2/20 5:01 AM, Jingyi Wang wrote:

If gicv4.1(sgi hardware injection) supported, we test ipi injection
via hw/sw way separately.

Signed-off-by: Jingyi Wang 
---
 arm/micro-bench.c| 45 
+++-

 lib/arm/asm/gic-v3.h |  3 +++
 lib/arm/asm/gic.h|  1 +
 3 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/arm/micro-bench.c b/arm/micro-bench.c
index fc4d356..80d8db3 100644
--- a/arm/micro-bench.c
+++ b/arm/micro-bench.c
@@ -91,9 +91,40 @@ static void gic_prep_common(void)
assert(irq_ready);
 }

-static void ipi_prep(void)
+static bool ipi_prep(void)
Any reason why the bool returned value is preferred over the standard 
int?

 {
+   u32 val;
+
+   val = readl(vgic_dist_base + GICD_CTLR);
+   if (readl(vgic_dist_base + GICD_TYPER2) & GICD_TYPER2_nASSGIcap) {
+   val &= ~GICD_CTLR_ENABLE_G1A;
+   val &= ~GICD_CTLR_nASSGIreq;
+   writel(val, vgic_dist_base + GICD_CTLR);
+   val |= GICD_CTLR_ENABLE_G1A;
+   writel(val, vgic_dist_base + GICD_CTLR);

Why do we need this G1A dance?


Because it isn't legal to change the SGI behaviour when groups are 
enabled.

Yes, it is described in this bit of documentation nobody has access to.

And this code needs to track RWP on disabling Group-1.

M.
--
Jazz is not dead. It just smells funny...
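
For readers following along, the RWP tracking Marc refers to would look
roughly like the sketch below; gicd_wait_for_rwp() is an assumed helper
name (nothing in this series provides it), polling GICD_CTLR.RWP after
Group-1 is disabled and before nASSGIreq is changed:

static void gicd_wait_for_rwp(void)
{
	unsigned tries = 1 << 20;

	/* RWP stays set while the GICD_CTLR group-enable change is in flight */
	while (readl(vgic_dist_base + GICD_CTLR) & GICD_CTLR_RWP) {
		assert(tries--);
		cpu_relax();
	}
}

/* in ipi_prep()/ipi_hw_prep(), between the GICD_CTLR writes: */
	val &= ~GICD_CTLR_ENABLE_G1A;
	writel(val, vgic_dist_base + GICD_CTLR);
	gicd_wait_for_rwp();		/* Group-1 is really disabled now */
	val |= GICD_CTLR_nASSGIreq;	/* or &= ~..., for the SW-SGI case */
	writel(val, vgic_dist_base + GICD_CTLR);
	val |= GICD_CTLR_ENABLE_G1A;
	writel(val, vgic_dist_base + GICD_CTLR);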


Re: [kvm-unit-tests PATCH v2 3/8] arm64: microbench: gic: Add gicv4.1 support for ipi latency test.

2020-07-02 Thread Marc Zyngier

Hi Eric,

On 2020-07-02 13:36, Auger Eric wrote:

Hi Marc,

On 7/2/20 10:22 AM, Marc Zyngier wrote:

On 2020-07-02 04:01, Jingyi Wang wrote:

If gicv4.1(sgi hardware injection) supported, we test ipi injection
via hw/sw way separately.


nit: active-less SGIs are not strictly a feature of GICv4.1 (you could
imagine a GIC emulation offering the same thing). Furthermore, GICv4.1
isn't as such visible to the guest itself (it only sees a GICv3).


By the way, I have just downloaded the latest GIC spec from the ARM
portal and I still do not find the GICD_CTLR_ENABLE_G1A,
GICD_CTLR_nASSGIreq and GICD_TYPER2_nASSGIcap. Do I miss something?


The latest spec still is the old one. There is a *confidential* erratum
to the spec that adds the missing bits, but nothing public.

You unfortunately will have to take my word for it.

M.
--
Jazz is not dead. It just smells funny...


Re: [kvm-unit-tests PATCH v2 4/8] arm64: its: Handle its command queue wrapping

2020-07-02 Thread Auger Eric
Hi Jingyi,

On 7/2/20 5:01 AM, Jingyi Wang wrote:
> Because micro-bench may send a large number of ITS commands, we
> should handle ITS command queue wrapping as kernel instead of just
> failing the test.
> 
> Signed-off-by: Jingyi Wang 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  lib/arm64/gic-v3-its-cmd.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/arm64/gic-v3-its-cmd.c b/lib/arm64/gic-v3-its-cmd.c
> index 2c208d1..34574f7 100644
> --- a/lib/arm64/gic-v3-its-cmd.c
> +++ b/lib/arm64/gic-v3-its-cmd.c
> @@ -164,8 +164,9 @@ static struct its_cmd_block *its_allocate_entry(void)
>  {
>   struct its_cmd_block *cmd;
>  
> - assert((u64)its_data.cmd_write < (u64)its_data.cmd_base + SZ_64K);
>   cmd = its_data.cmd_write++;
> + if ((u64)its_data.cmd_write  == (u64)its_data.cmd_base + SZ_64K)
> + its_data.cmd_write = its_data.cmd_base;
>   return cmd;
>  }
>  
> 



Re: [kvm-unit-tests PATCH v2 3/8] arm64: microbench: gic: Add gicv4.1 support for ipi latency test.

2020-07-02 Thread Auger Eric
Hi Jingyi,

On 7/2/20 5:01 AM, Jingyi Wang wrote:
> If gicv4.1(sgi hardware injection) supported, we test ipi injection
> via hw/sw way separately.
> 
> Signed-off-by: Jingyi Wang 
> ---
>  arm/micro-bench.c| 45 +++-
>  lib/arm/asm/gic-v3.h |  3 +++
>  lib/arm/asm/gic.h|  1 +
>  3 files changed, 44 insertions(+), 5 deletions(-)
> 
> diff --git a/arm/micro-bench.c b/arm/micro-bench.c
> index fc4d356..80d8db3 100644
> --- a/arm/micro-bench.c
> +++ b/arm/micro-bench.c
> @@ -91,9 +91,40 @@ static void gic_prep_common(void)
>   assert(irq_ready);
>  }
>  
> -static void ipi_prep(void)
> +static bool ipi_prep(void)
Any reason why the bool returned value is preferred over the standard int?
>  {
> + u32 val;
> +
> + val = readl(vgic_dist_base + GICD_CTLR);
> + if (readl(vgic_dist_base + GICD_TYPER2) & GICD_TYPER2_nASSGIcap) {
> + val &= ~GICD_CTLR_ENABLE_G1A;
> + val &= ~GICD_CTLR_nASSGIreq;
> + writel(val, vgic_dist_base + GICD_CTLR);
> + val |= GICD_CTLR_ENABLE_G1A;
> + writel(val, vgic_dist_base + GICD_CTLR);
Why do we need this G1A dance?
> + }
> +
>   gic_prep_common();
> + return true;
> +}
> +
> +static bool ipi_hw_prep(void)
> +{
> + u32 val;
> +
> + val = readl(vgic_dist_base + GICD_CTLR);
> + if (readl(vgic_dist_base + GICD_TYPER2) & GICD_TYPER2_nASSGIcap) {
> + val &= ~GICD_CTLR_ENABLE_G1A;
> + val |= GICD_CTLR_nASSGIreq;
> + writel(val, vgic_dist_base + GICD_CTLR);
> + val |= GICD_CTLR_ENABLE_G1A;
> + writel(val, vgic_dist_base + GICD_CTLR);
> + } else {
> + return false;
> + }
> +
> + gic_prep_common();
> + return true;
>  }
>  
>  static void ipi_exec(void)
> @@ -147,7 +178,7 @@ static void eoi_exec(void)
>  
>  struct exit_test {
>   const char *name;
> - void (*prep)(void);
> + bool (*prep)(void);
>   void (*exec)(void);
>   bool run;
>  };
> @@ -158,6 +189,7 @@ static struct exit_test tests[] = {
>   {"mmio_read_vgic",  NULL,   mmio_read_vgic_exec,true},
>   {"eoi", NULL,   eoi_exec,   true},
>   {"ipi", ipi_prep,   ipi_exec,   true},
> + {"ipi_hw",  ipi_hw_prep,ipi_exec,   true},
>  };
>  
>  struct ns_time {
> @@ -181,9 +213,12 @@ static void loop_test(struct exit_test *test)
>   uint64_t start, end, total_ticks, ntimes = NTIMES;
>   struct ns_time total_ns, avg_ns;
>  
> - if (test->prep)
> - test->prep();
> -
> + if (test->prep) {
> + if(!test->prep()) {
> + printf("%s test skipped\n", test->name);
> + return;
> + }
> + }
>   isb();
>   start = read_sysreg(cntpct_el0);
>   while (ntimes--)
> diff --git a/lib/arm/asm/gic-v3.h b/lib/arm/asm/gic-v3.h
> index cb72922..b4ce130 100644
> --- a/lib/arm/asm/gic-v3.h
> +++ b/lib/arm/asm/gic-v3.h
> @@ -20,10 +20,13 @@
>   */
>  #define GICD_CTLR0x
>  #define GICD_CTLR_RWP(1U << 31)
> +#define GICD_CTLR_nASSGIreq  (1U << 8)
>  #define GICD_CTLR_ARE_NS (1U << 4)
>  #define GICD_CTLR_ENABLE_G1A (1U << 1)
>  #define GICD_CTLR_ENABLE_G1  (1U << 0)
>  
> +#define GICD_TYPER2_nASSGIcap(1U << 8)
> +
>  /* Re-Distributor registers, offsets from RD_base */
>  #define GICR_TYPER   0x0008
>  
> diff --git a/lib/arm/asm/gic.h b/lib/arm/asm/gic.h
> index 38e79b2..1898400 100644
> --- a/lib/arm/asm/gic.h
> +++ b/lib/arm/asm/gic.h
> @@ -13,6 +13,7 @@
>  #define GICD_CTLR0x
>  #define GICD_TYPER   0x0004
>  #define GICD_IIDR0x0008
> +#define GICD_TYPER2  0x000C
>  #define GICD_IGROUPR 0x0080
>  #define GICD_ISENABLER   0x0100
>  #define GICD_ICENABLER   0x0180
> 

Thanks

Eric



Re: [kvm-unit-tests PATCH v2 3/8] arm64: microbench: gic: Add gicv4.1 support for ipi latency test.

2020-07-02 Thread Auger Eric
Hi Marc,

On 7/2/20 10:22 AM, Marc Zyngier wrote:
> On 2020-07-02 04:01, Jingyi Wang wrote:
>> If gicv4.1(sgi hardware injection) supported, we test ipi injection
>> via hw/sw way separately.
> 
> nit: active-less SGIs are not strictly a feature of GICv4.1 (you could
> imagine a GIC emulation offering the same thing). Furthermore, GICv4.1
> isn't as such visible to the guest itself (it only sees a GICv3).

By the way, I have just downloaded the latest GIC spec from the ARM
portal and I still do not find the GICD_CTLR_ENABLE_G1A,
GICD_CTLR_nASSGIreq and GICD_TYPER2_nASSGIcap. Do I miss something?

Thanks

Eric


> 
> Thanks,
> 
>     M.



Re: [kvm-unit-tests PATCH v2 1/8] arm64: microbench: get correct ipi received num

2020-07-02 Thread Auger Eric
Hi Jingyi,

On 7/2/20 5:01 AM, Jingyi Wang wrote:
> If ipi_exec() fails because of timeout, we shouldn't increase
> the number of ipi received.
> 
> Signed-off-by: Jingyi Wang 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  arm/micro-bench.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/micro-bench.c b/arm/micro-bench.c
> index 4612f41..794dfac 100644
> --- a/arm/micro-bench.c
> +++ b/arm/micro-bench.c
> @@ -103,7 +103,9 @@ static void ipi_exec(void)
>   while (!ipi_received && tries--)
>   cpu_relax();
>  
> - ++received;
> + if (ipi_received)
> + ++received;
> +
>   assert_msg(ipi_received, "failed to receive IPI in time, but received 
> %d successfully\n", received);
>  }
>  
> 



Re: [kvm-unit-tests PATCH v2 2/8] arm64: microbench: Use the funcions for ipi test as the general functions for gic(ipi/lpi/timer) test

2020-07-02 Thread Auger Eric
Hi,

On 7/2/20 5:01 AM, Jingyi Wang wrote:
> The following patches will use that.
> 
> Signed-off-by: Jingyi Wang 
> ---
>  arm/micro-bench.c | 39 ++-
>  1 file changed, 22 insertions(+), 17 deletions(-)
> 
With commit message suggested by Drew,

Reviewed-by: Eric Auger 

Eric
> diff --git a/arm/micro-bench.c b/arm/micro-bench.c
> index 794dfac..fc4d356 100644
> --- a/arm/micro-bench.c
> +++ b/arm/micro-bench.c
> @@ -25,24 +25,24 @@
>  
>  static u32 cntfrq;
>  
> -static volatile bool ipi_ready, ipi_received;
> +static volatile bool irq_ready, irq_received;
>  static void *vgic_dist_base;
>  static void (*write_eoir)(u32 irqstat);
>  
> -static void ipi_irq_handler(struct pt_regs *regs)
> +static void gic_irq_handler(struct pt_regs *regs)
>  {
> - ipi_ready = false;
> - ipi_received = true;
> + irq_ready = false;
> + irq_received = true;
>   gic_write_eoir(gic_read_iar());
> - ipi_ready = true;
> + irq_ready = true;
>  }
>  
> -static void ipi_secondary_entry(void *data)
> +static void gic_secondary_entry(void *data)
>  {
> - install_irq_handler(EL1H_IRQ, ipi_irq_handler);
> + install_irq_handler(EL1H_IRQ, gic_irq_handler);
>   gic_enable_defaults();
>   local_irq_enable();
> - ipi_ready = true;
> + irq_ready = true;
>   while (true)
>   cpu_relax();
>  }
> @@ -72,9 +72,9 @@ static bool test_init(void)
>   break;
>   }
>  
> - ipi_ready = false;
> + irq_ready = false;
>   gic_enable_defaults();
> - on_cpu_async(1, ipi_secondary_entry, NULL);
> + on_cpu_async(1, gic_secondary_entry, NULL);
>  
>   cntfrq = get_cntfrq();
>   printf("Timer Frequency %d Hz (Output in microseconds)\n", cntfrq);
> @@ -82,13 +82,18 @@ static bool test_init(void)
>   return true;
>  }
>  
> -static void ipi_prep(void)
> +static void gic_prep_common(void)
>  {
>   unsigned tries = 1 << 28;
>  
> - while (!ipi_ready && tries--)
> + while (!irq_ready && tries--)
>   cpu_relax();
> - assert(ipi_ready);
> + assert(irq_ready);
> +}
> +
> +static void ipi_prep(void)
> +{
> + gic_prep_common();
>  }
>  
>  static void ipi_exec(void)
> @@ -96,17 +101,17 @@ static void ipi_exec(void)
>   unsigned tries = 1 << 28;
>   static int received = 0;
>  
> - ipi_received = false;
> + irq_received = false;
>  
>   gic_ipi_send_single(1, 1);
>  
> - while (!ipi_received && tries--)
> + while (!irq_received && tries--)
>   cpu_relax();
>  
> - if (ipi_received)
> + if (irq_received)
>   ++received;
>  
> - assert_msg(ipi_received, "failed to receive IPI in time, but received 
> %d successfully\n", received);
> + assert_msg(irq_received, "failed to receive IPI in time, but received 
> %d successfully\n", received);
>  }
>  
>  static void hvc_exec(void)
> 



Re: [PATCH v2 1/3] KVM: arm64: Print warning when cpu erratum can cause guests to deadlock

2020-07-02 Thread Will Deacon
On Wed, Jul 01, 2020 at 03:53:06PM -0600, Rob Herring wrote:
> If guests don't have certain CPU erratum work-arounds implemented, then
> there is a possibility a guest can deadlock the system. IOW, only trusted
> guests should be used on systems with the erratum.
> 
> This is the case for Cortex-A57 erratum 832075.
> 
> Cc: Marc Zyngier 
> Cc: James Morse 
> Cc: Julien Thierry 
> Cc: Suzuki K Poulose 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: kvmarm@lists.cs.columbia.edu
> Signed-off-by: Rob Herring 
> ---
>  arch/arm64/kvm/arm.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 90cb90561446..e2f50fa4d825 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1653,6 +1653,10 @@ int kvm_arch_init(void *opaque)
>   return -ENODEV;
>   }
>  
> + if (cpus_have_const_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE))
> + kvm_info("Guests without required CPU erratum work-arounds can 
> deadlock system!\n" \

work-arounds => workarounds

(mainly for consistency, I have no clue if this is a real word or not!).

I'd also probably do erratum => errata given that you're about to add a
second one.

Will


Re: [PATCH v2 3/3] arm64: Add workaround for Arm Cortex-A77 erratum 1508412

2020-07-02 Thread Will Deacon
On Wed, Jul 01, 2020 at 03:53:08PM -0600, Rob Herring wrote:
> On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load
> and a store exclusive or PAR_EL1 read can cause a deadlock.
> 
> The workaround requires a DMB SY before and after a PAR_EL1 register read
> and the disabling of KVM. KVM must be disabled to prevent the problematic
> sequence in guests' EL1. This workaround also depends on a firmware
> counterpart to enable the h/w to insert DMB SY after load and store
> exclusive instructions. See the errata document SDEN-1152370 v10 [1] for
> more information.

This ^^ is out of date now that we're not disabling KVM.

> All the other PAR_EL1 reads besides the one in
> is_spurious_el1_translation_fault() are in KVM code, so the work-around is
> not needed for them.

And I think this now needs some extra work.
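
For readers, the fault.c side of the workaround described in the commit
message is roughly the following — a sketch using the capability name from
the quoted cpucaps.h hunk; the exact hunk in the patch may differ:

	/* erratum 1508412: bracket the PAR_EL1 read with DMB SY */
	u64 par;

	if (cpus_have_const_cap(ARM64_WORKAROUND_1508412))
		dmb(sy);
	par = read_sysreg(par_el1);
	if (cpus_have_const_cap(ARM64_WORKAROUND_1508412))
		dmb(sy);

This is the shape needed in is_spurious_el1_translation_fault(), the one
PAR_EL1 read outside KVM mentioned above.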

> [1] 
> https://static.docs.arm.com/101992/0010/Arm_Cortex_A77_MP074_Software_Developer_Errata_Notice_v10.pdf
> 
> Cc: Catalin Marinas 
> Cc: James Morse 
> Cc: Suzuki K Poulose 
> Cc: Will Deacon 
> Cc: Marc Zyngier 
> Cc: Julien Thierry 
> Cc: kvmarm@lists.cs.columbia.edu
> Signed-off-by: Rob Herring 
> ---
> v2:
> - Don't disable KVM, just print warning
> ---
>  Documentation/arm64/silicon-errata.rst |  2 ++
>  arch/arm64/Kconfig | 19 +++
>  arch/arm64/include/asm/cpucaps.h   |  3 ++-
>  arch/arm64/kernel/cpu_errata.c | 10 ++
>  arch/arm64/kvm/arm.c   |  3 ++-
>  arch/arm64/mm/fault.c  | 10 ++
>  6 files changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.rst 
> b/Documentation/arm64/silicon-errata.rst
> index 936cf2a59ca4..716b279e3b33 100644
> --- a/Documentation/arm64/silicon-errata.rst
> +++ b/Documentation/arm64/silicon-errata.rst
> @@ -90,6 +90,8 @@ stable kernels.
>  
> ++-+-+-+
>  | ARM| Cortex-A76  | #1463225| ARM64_ERRATUM_1463225 
>   |
>  
> ++-+-+-+
> +| ARM| Cortex-A77  | #1508412| ARM64_ERRATUM_1508412 
>   |
> +++-+-+-+
>  | ARM| Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 
>   |
>  
> ++-+-+-+
>  | ARM| Neoverse-N1 | #1349291| N/A   
>   |
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a4a094bedcb2..28993ad4c649 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -626,6 +626,25 @@ config ARM64_ERRATUM_1542419
>  
> If unsure, say Y.
>  
> +config ARM64_ERRATUM_1508412
> + bool "Cortex-A77: 1508412: workaround deadlock on sequence of NC/Device 
> load and store exclusive or PAR read"
> + default y
> + help
> +   This option adds a workaround for Arm Cortex-A77 erratum 1508412.
> +
> +   Affected Cortex-A77 cores (r0p0, r1p0) could deadlock on a sequence
> +   of a store-exclusive or read of PAR_EL1 and a load with device or
> +   non-cacheable memory attributes. The workaround depends on a firmware
> +   counterpart.
> +
> +   KVM guests must also have the work-around implemented or they can

work-around => workaround

> +   deadlock the system.
> +
> +   Workaround the issue by inserting DMB SY barriers around PAR_EL1

Workaround => work around

> +   register reads and warning KVM users.
> +
> +   If unsure, say Y.
> +
>  config CAVIUM_ERRATUM_22375
>   bool "Cavium erratum 22375, 24313"
>   default y
> diff --git a/arch/arm64/include/asm/cpucaps.h 
> b/arch/arm64/include/asm/cpucaps.h
> index d7b3bb0cb180..2a2cdb4ced8b 100644
> --- a/arch/arm64/include/asm/cpucaps.h
> +++ b/arch/arm64/include/asm/cpucaps.h
> @@ -62,7 +62,8 @@
>  #define ARM64_HAS_GENERIC_AUTH   52
>  #define ARM64_HAS_32BIT_EL1  53
>  #define ARM64_BTI54
> +#define ARM64_WORKAROUND_1508412 55
>  
> -#define ARM64_NCAPS  55
> +#define ARM64_NCAPS  56
>  
>  #endif /* __ASM_CPUCAPS_H */
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index ad06d6802d2e..5eee8a75540c 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -938,6 +938,16 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
>   .matches = has_neoverse_n1_erratum_1542419,
>   .cpu_enable = cpu_enable_trap_ctr_access,
>   },
> +#endif
> +#ifdef CONFIG_ARM64_ERRATUM_1508412
> + {
> + /* we depend on the firmware portion for correctness */
> + .desc = "ARM erratum 1508412 (kernel portion)",
> + .capability = ARM64_W

Re: [PATCH 03/12] KVM: arm64: Report hardware dirty status of stage2 PTE if coverred

2020-07-02 Thread zhukeqian
Hi Steven,

On 2020/7/1 19:28, Steven Price wrote:
> Hi,
> 
> On 16/06/2020 10:35, Keqian Zhu wrote:
>> kvm_set_pte is called to replace a target PTE with a desired one.
>> We always do this without changing the desired one, but if dirty
>> status set by hardware is coverred, let caller know it.
>>
>> Signed-off-by: Keqian Zhu 
>> ---
>>   arch/arm64/kvm/mmu.c | 36 +++-
>>   1 file changed, 35 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 5ad87bce23c0..27407153121b 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -194,11 +194,45 @@ static void clear_stage2_pmd_entry(struct kvm *kvm, 
>> pmd_t *pmd, phys_addr_t addr
>>   put_page(virt_to_page(pmd));
>>   }
>>   -static inline void kvm_set_pte(pte_t *ptep, pte_t new_pte)
>> +#ifdef CONFIG_ARM64_HW_AFDBM
>> +/**
>> + * @ret: true if dirty status set by hardware is coverred.
> 
> NIT: s/coverred/covered/, this is in several places.
> 
OK.
>> + */
>> +static bool kvm_set_pte(pte_t *ptep, pte_t new_pte)
>> +{
>> +pteval_t old_pteval, new_pteval, pteval;
>> +bool old_logging, new_no_write;
>> +
>> +old_logging = kvm_hw_dbm_enabled() && !pte_none(*ptep) &&
>> +  kvm_s2pte_dbm(ptep);
>> +new_no_write = pte_none(new_pte) || kvm_s2pte_readonly(&new_pte);
>> +
>> +if (!old_logging || !new_no_write) {
>> +WRITE_ONCE(*ptep, new_pte);
>> +dsb(ishst);
>> +return false;
>> +}
>> +
>> +new_pteval = pte_val(new_pte);
>> +pteval = READ_ONCE(pte_val(*ptep));
> 
> This usage of *ptep looks wrong - it's read twice using READ_ONCE (once in 
> kvm_s2pte_dbm()) and once without any decoration (in the pte_none() call). 
> Which looks a bit dodgy and at the very least needs some justification. 
> AFAICT you would be better taking a local copy and using that rather than 
> reading from memory repeatedly.
> 
oh yes. Here we can use a local copy to get higher performance.

I am not sure whether there is a correctness problem here. I *think* we should
always acquire the corresponding lock before manipulating the PTs, so no other
agent will update the PTs during our several PT accesses (except the MMU, which
only modifies AF and [S2]AP).
But am I missing something?

>> +do {
>> +old_pteval = pteval;
>> +pteval = cmpxchg_relaxed(&pte_val(*ptep), old_pteval, new_pteval);
>> +} while (pteval != old_pteval);
> This look appears to be reinventing xchg_relaxed(). Any reason we can't just 
> use xchg_relaxed()? Also we had a dsb() after the WRITE_ONCE but you are 
> using the _relaxed variant here. What is the justification for not having a 
> barrier?
> 
Aha, I have changed the code several times and it is now equivalent to
xchg_relaxed(), thanks.

I used _relaxed here because I am not clear on why we use _relaxed in the
kvm_set_s2XXX_XXX helpers but dsb() in kvm_set_pte.
I will correct this in patch v2.
>> +
>> +return !kvm_s2pte_readonly(&__pte(pteval));
>> +}
>> +#else
>> +/**
>> + * @ret: true if dirty status set by hardware is coverred.
>> + */
>> +static inline bool kvm_set_pte(pte_t *ptep, pte_t new_pte)
>>   {
>>   WRITE_ONCE(*ptep, new_pte);
>>   dsb(ishst);
>> +return false;
>>   }
>> +#endif /* CONFIG_ARM64_HW_AFDBM */
> 
> You might be able to avoid this #ifdef by redefining old_logging as:
> 
>   old_logging = IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && ...
> 
> I *think* the compiler should be able to kill the dead code and leave you 
> with just the above when the config symbol is off.
> 
After my test, you are right ;-) .


Thanks,
Keqian
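
Putting Steven's two suggestions together, the whole function could
collapse to something like the sketch below (a sketch assuming the existing
helpers keep their semantics; whether a barrier is still needed after the
exchange is the open question above, so _relaxed is kept here):

/* sketch only, not the v2 patch */
static bool kvm_set_pte(pte_t *ptep, pte_t new_pte)
{
	bool old_logging, new_no_write;
	pteval_t old_pteval;

	old_logging = IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && kvm_hw_dbm_enabled() &&
		      !pte_none(*ptep) && kvm_s2pte_dbm(ptep);
	new_no_write = pte_none(new_pte) || kvm_s2pte_readonly(&new_pte);

	if (!old_logging || !new_no_write) {
		WRITE_ONCE(*ptep, new_pte);
		dsb(ishst);
		return false;
	}

	/* atomically install new_pte and inspect what the old PTE allowed */
	old_pteval = xchg_relaxed(&pte_val(*ptep), pte_val(new_pte));
	return !kvm_s2pte_readonly(&__pte(old_pteval));
}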
> Steve
> 
>> static inline void kvm_set_pmd(pmd_t *pmdp, pmd_t new_pmd)
>>   {
>>
> 
> .
> 


Re: [kvm-unit-tests PATCH v2 3/8] arm64: microbench: gic: Add gicv4.1 support for ipi latency test.

2020-07-02 Thread Jingyi Wang

On 7/2/2020 5:17 PM, Marc Zyngier wrote:

On 2020-07-02 10:02, Jingyi Wang wrote:

Hi Marc,

On 7/2/2020 4:22 PM, Marc Zyngier wrote:

On 2020-07-02 04:01, Jingyi Wang wrote:

If gicv4.1(sgi hardware injection) supported, we test ipi injection
via hw/sw way separately.


nit: active-less SGIs are not strictly a feature of GICv4.1 (you could
imagine a GIC emulation offering the same thing). Furthermore, GICv4.1
isn't as such visible to the guest itself (it only sees a GICv3).

Thanks,

 M.


Yes, but to measure the performance difference of hw/sw SGI injection,
do you think it is acceptable to make it visible to guest and let it
switch SGI injection mode?


It is of course acceptable. I simply object to the "GICv4.1" description.

     M.


Okay, maybe a description like "GICv4.1 supported kvm" is better.

Thanks,
Jingyi



Re: [kvm-unit-tests PATCH v2 3/8] arm64: microbench: gic: Add gicv4.1 support for ipi latency test.

2020-07-02 Thread Marc Zyngier

On 2020-07-02 10:02, Jingyi Wang wrote:

Hi Marc,

On 7/2/2020 4:22 PM, Marc Zyngier wrote:

On 2020-07-02 04:01, Jingyi Wang wrote:

If gicv4.1(sgi hardware injection) supported, we test ipi injection
via hw/sw way separately.


nit: active-less SGIs are not strictly a feature of GICv4.1 (you could
imagine a GIC emulation offering the same thing). Furthermore, GICv4.1
isn't as such visible to the guest itself (it only sees a GICv3).

Thanks,

     M.


Yes, but to measure the performance difference of hw/sw SGI injection,
do you think it is acceptable to make it visible to guest and let it
switch SGI injection mode?


It is of course acceptable. I simply object to the "GICv4.1" 
description.


M.
--
Jazz is not dead. It just smells funny...


Re: [PATCH kvm-unit-tests] arm/arm64: timer: Extract irqs at setup time

2020-07-02 Thread Auger Eric
Hi Drew,

On 7/2/20 10:12 AM, Andrew Jones wrote:
> The timer can be useful for other tests besides the timer test.
> Extract the DT parsing of the irqs out of the timer test into
> setup and provide them along with some defines in a new timer.h
> file.
> 
> Signed-off-by: Andrew Jones 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  arm/timer.c   | 26 --
>  lib/arm/asm/timer.h   | 31 +++
>  lib/arm/setup.c   | 42 ++
>  lib/arm64/asm/timer.h |  1 +
>  4 files changed, 78 insertions(+), 22 deletions(-)
>  create mode 100644 lib/arm/asm/timer.h
>  create mode 100644 lib/arm64/asm/timer.h
> 
> diff --git a/arm/timer.c b/arm/timer.c
> index 44621b4f2967..09e3f8f6bd7d 100644
> --- a/arm/timer.c
> +++ b/arm/timer.c
> @@ -8,15 +8,12 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  
> -#define ARCH_TIMER_CTL_ENABLE  (1 << 0)
> -#define ARCH_TIMER_CTL_IMASK   (1 << 1)
> -#define ARCH_TIMER_CTL_ISTATUS (1 << 2)
> -
>  static void *gic_isenabler;
>  static void *gic_icenabler;
>  
> @@ -108,7 +105,6 @@ static void write_ptimer_ctl(u64 val)
>  
>  struct timer_info {
>   u32 irq;
> - u32 irq_flags;
>   volatile bool irq_received;
>   u64 (*read_counter)(void);
>   u64 (*read_cval)(void);
> @@ -304,23 +300,9 @@ static void test_ptimer(void)
>  
>  static void test_init(void)
>  {
> - const struct fdt_property *prop;
> - const void *fdt = dt_fdt();
> - int node, len;
> - u32 *data;
> -
> - node = fdt_node_offset_by_compatible(fdt, -1, "arm,armv8-timer");
> - assert(node >= 0);
> - prop = fdt_get_property(fdt, node, "interrupts", &len);
> - assert(prop && len == (4 * 3 * sizeof(u32)));
> -
> - data = (u32 *)prop->data;
> - assert(fdt32_to_cpu(data[3]) == 1);
> - ptimer_info.irq = fdt32_to_cpu(data[4]);
> - ptimer_info.irq_flags = fdt32_to_cpu(data[5]);
> - assert(fdt32_to_cpu(data[6]) == 1);
> - vtimer_info.irq = fdt32_to_cpu(data[7]);
> - vtimer_info.irq_flags = fdt32_to_cpu(data[8]);
> + assert(TIMER_PTIMER_IRQ != -1 && TIMER_VTIMER_IRQ != -1);
> + ptimer_info.irq = TIMER_PTIMER_IRQ;
> + vtimer_info.irq = TIMER_VTIMER_IRQ;
>  
>   install_exception_handler(EL1H_SYNC, ESR_EL1_EC_UNKNOWN, 
> ptimer_unsupported_handler);
>   ptimer_info.read_ctl();
> diff --git a/lib/arm/asm/timer.h b/lib/arm/asm/timer.h
> new file mode 100644
> index ..f75cc67f3ac4
> --- /dev/null
> +++ b/lib/arm/asm/timer.h
> @@ -0,0 +1,31 @@
> +/*
> + * Copyright (C) 2020, Red Hat Inc, Andrew Jones 
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2.
> + */
> +#ifndef _ASMARM_TIMER_H_
> +#define _ASMARM_TIMER_H_
> +
> +#define ARCH_TIMER_CTL_ENABLE  (1 << 0)
> +#define ARCH_TIMER_CTL_IMASK   (1 << 1)
> +#define ARCH_TIMER_CTL_ISTATUS (1 << 2)
> +
> +#ifndef __ASSEMBLY__
> +
> +struct timer_state {
> + struct {
> + u32 irq;
> + u32 irq_flags;
> + } ptimer;
> + struct {
> + u32 irq;
> + u32 irq_flags;
> + } vtimer;
> +};
> +extern struct timer_state __timer_state;
> +
> +#define TIMER_PTIMER_IRQ (__timer_state.ptimer.irq)
> +#define TIMER_VTIMER_IRQ (__timer_state.vtimer.irq)
> +
> +#endif /* !__ASSEMBLY__ */
> +#endif /* _ASMARM_TIMER_H_ */
> diff --git a/lib/arm/setup.c b/lib/arm/setup.c
> index 418b4e58a5f8..78562e47c01c 100644
> --- a/lib/arm/setup.c
> +++ b/lib/arm/setup.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "io.h"
>  
> @@ -29,6 +30,8 @@
>  
>  extern unsigned long stacktop;
>  
> +struct timer_state __timer_state;
> +
>  char *initrd;
>  u32 initrd_size;
>  
> @@ -156,6 +159,43 @@ static void mem_init(phys_addr_t freemem_start)
>   page_alloc_ops_enable();
>  }
>  
> +static void timer_save_state(void)
> +{
> + const struct fdt_property *prop;
> + const void *fdt = dt_fdt();
> + int node, len;
> + u32 *data;
> +
> + node = fdt_node_offset_by_compatible(fdt, -1, "arm,armv8-timer");
> + assert(node >= 0 || node == -FDT_ERR_NOTFOUND);
> +
> + if (node == -FDT_ERR_NOTFOUND) {
> + __timer_state.ptimer.irq = -1;
> + __timer_state.vtimer.irq = -1;
> + return;
> + }
> +
> + /*
> +  * From Linux devicetree timer binding documentation
> +  *
> +  * interrupts :
> +  *  secure timer irq
> +  *  non-secure timer irq(ptimer)
> +  *  virtual timer irq   (vtimer)
> +  *  hypervisor timer irq
> +  */
> + prop = fdt_get_property(fdt, node, "interrupts", &len);
> + assert(prop && len == (4 * 3 * sizeof(u32)));
> +
> + data = (u32 *)prop->data;
> + assert(fdt32_to_cpu(data[3]) == 1 /* PPI */);
> + __timer_state.ptimer.irq = fdt32_to_cpu(data[4]);
> + __timer_state.ptimer.irq_flags = fdt

Re: [kvm-unit-tests PATCH v2 3/8] arm64: microbench: gic: Add gicv4.1 support for ipi latency test.

2020-07-02 Thread Jingyi Wang

Hi Marc,

On 7/2/2020 4:22 PM, Marc Zyngier wrote:

On 2020-07-02 04:01, Jingyi Wang wrote:

If gicv4.1(sgi hardware injection) supported, we test ipi injection
via hw/sw way separately.


nit: active-less SGIs are not strictly a feature of GICv4.1 (you could
imagine a GIC emulation offering the same thing). Furthermore, GICv4.1
isn't as such visible to the guest itself (it only sees a GICv3).

Thanks,

     M.


Yes, but to measure the performance difference of hw/sw SGI injection,
do you think it is acceptable to make it visible to guest and let it
switch SGI injection mode?

Thanks,
Jingyi



Re: [kvm-unit-tests PATCH v2 8/8] arm64: microbench: Add vtimer latency test

2020-07-02 Thread Jingyi Wang

Hi Drew,

On 7/2/2020 1:44 PM, Andrew Jones wrote:

On Thu, Jul 02, 2020 at 11:01:32AM +0800, Jingyi Wang wrote:

Trigger PPIs by setting up a 10msec timer and test the latency.

Signed-off-by: Jingyi Wang 
---
  arm/micro-bench.c | 56 ++-
  1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/arm/micro-bench.c b/arm/micro-bench.c
index 4c962b7..6822084 100644
--- a/arm/micro-bench.c
+++ b/arm/micro-bench.c
@@ -23,8 +23,13 @@
  #include 
  
  #define NTIMES (1U << 16)

+#define NTIMES_MINOR (1U << 8)


As mentioned in the previous patch, no need for this define, just put the
number in the table.


  #define MAX_NS (5 * 1000 * 1000 * 1000UL)
  
+#define IRQ_VTIMER		27


As you can see in the timer test (arm/timer.c) we've been doing our best
not to hard code stuff like this. I'd prefer we don't start now. Actually,
since the timer irqs may come in handy for other tests I'll extract the
DT stuff from arm/timer.c and save those irqs at setup time. I'll send
a patch for that now, then this patch can use the new saved state.


+#define ARCH_TIMER_CTL_ENABLE  (1 << 0)
+#define ARCH_TIMER_CTL_IMASK   (1 << 1)


I'll put these defines somewhere common as well.



The common implementation for the timer irqs will be very helpful; I will
rebase this patch on that.
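
The rebase being discussed would make timer_prep() in patch 8/8 look
roughly like the sketch below, assuming Drew's timer.h patch (elsewhere in
this thread) goes in first so TIMER_VTIMER_IRQ and ARCH_TIMER_CTL_ENABLE
are available:

#include <asm/timer.h>

static bool timer_prep(void)
{
	void *gic_isenabler = gicv3_sgi_base() + GICR_ISENABLER0;

	gic_enable_defaults();
	install_irq_handler(EL1H_IRQ, gic_irq_handler);
	local_irq_enable();

	/* enable the vtimer PPI without hard-coding 27 */
	writel(1 << TIMER_VTIMER_IRQ, gic_isenabler);
	write_sysreg(ARCH_TIMER_CTL_ENABLE, cntv_ctl_el0);
	isb();

	gic_prep_common();
	return true;
}

/* gic_irq_handler() would likewise compare irqstat against TIMER_VTIMER_IRQ
 * instead of a local IRQ_VTIMER define. */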


+
  static u32 cntfrq;
  
  static volatile bool irq_ready, irq_received;

@@ -33,9 +38,16 @@ static void (*write_eoir)(u32 irqstat);
  
  static void gic_irq_handler(struct pt_regs *regs)

  {
+   u32 irqstat = gic_read_iar();
irq_ready = false;
irq_received = true;
-   gic_write_eoir(gic_read_iar());
+   gic_write_eoir(irqstat);
+
+   if (irqstat == IRQ_VTIMER) {
+   write_sysreg((ARCH_TIMER_CTL_IMASK | ARCH_TIMER_CTL_ENABLE),
+cntv_ctl_el0);
+   isb();
+   }
irq_ready = true;
  }
  
@@ -189,6 +201,47 @@ static void lpi_exec(void)

assert_msg(irq_received, "failed to receive LPI in time, but received %d 
successfully\n", received);
  }
  
+static bool timer_prep(void)

+{
+   static void *gic_isenabler;
+
+   gic_enable_defaults();
+   install_irq_handler(EL1H_IRQ, gic_irq_handler);
+   local_irq_enable();
+
+   gic_isenabler = gicv3_sgi_base() + GICR_ISENABLER0;
+   writel(1 << 27, gic_isenabler);


You should have used your define here.



Done.


+   write_sysreg(ARCH_TIMER_CTL_ENABLE, cntv_ctl_el0);
+   isb();
+
+   gic_prep_common();
+   return true;
+}
+
+static void timer_exec(void)
+{
+   u64 before_timer;
+   u64 timer_10ms;
+   unsigned tries = 1 << 28;
+   static int received = 0;
+
+   irq_received = false;
+
+   before_timer = read_sysreg(cntvct_el0);
+   timer_10ms = cntfrq / 100;
+   write_sysreg(before_timer + timer_10ms, cntv_cval_el0);
+   write_sysreg(ARCH_TIMER_CTL_ENABLE, cntv_ctl_el0);
+   isb();
+
+   while (!irq_received && tries--)
+   cpu_relax();
+
+   if (irq_received)
+   ++received;
+
+   assert_msg(irq_received, "failed to receive PPI in time, but received %d 
successfully\n", received);
+}
+
  static void hvc_exec(void)
  {
asm volatile("mov w0, #0x4b00; hvc #0" ::: "w0");
@@ -236,6 +289,7 @@ static struct exit_test tests[] = {
{"ipi",   ipi_prep,   ipi_exec,   
NTIMES, true},
{"ipi_hw",ipi_hw_prep,ipi_exec,   NTIMES,   
  true},
{"lpi",   lpi_prep,   lpi_exec,   
NTIMES, true},
+   {"timer_10ms",timer_prep, timer_exec, 
NTIMES_MINOR,   true},
  };
  
  struct ns_time {

--
2.19.1




Thanks,
drew


.



Thanks,
Jingyi



Re: [kvm-unit-tests PATCH v2 7/8] arm64: microbench: Add time limit for each individual test

2020-07-02 Thread Jingyi Wang




On 7/2/2020 1:48 PM, Andrew Jones wrote:

On Thu, Jul 02, 2020 at 11:01:31AM +0800, Jingyi Wang wrote:

Besides using separate running times parameter, we add time limit
for loop_test to make sure each test should be done in a certain
time(5 sec here).

Signed-off-by: Jingyi Wang 
---
  arm/micro-bench.c | 17 +++--
  1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arm/micro-bench.c b/arm/micro-bench.c
index 506d2f9..4c962b7 100644
--- a/arm/micro-bench.c
+++ b/arm/micro-bench.c
@@ -23,6 +23,7 @@
  #include 
  
  #define NTIMES (1U << 16)

+#define MAX_NS (5 * 1000 * 1000 * 1000UL)


How about naming this something like "NS_5_SECONDS"?



Done, thanks for reviewing.

  
  static u32 cntfrq;
  
@@ -258,22 +259,26 @@ static void loop_test(struct exit_test *test)

uint64_t start, end, total_ticks, ntimes = 0;
struct ns_time total_ns, avg_ns;
  
+	total_ticks = 0;

if (test->prep) {
if(!test->prep()) {
printf("%s test skipped\n", test->name);
return;
}
}
-   isb();
-   start = read_sysreg(cntpct_el0);
-   while (ntimes < test->times) {
+
+   while (ntimes < test->times && total_ns.ns < MAX_NS) {
+   isb();
+   start = read_sysreg(cntpct_el0);
test->exec();
+   isb();
+   end = read_sysreg(cntpct_el0);
+
ntimes++;
+   total_ticks += (end - start);
+   ticks_to_ns_time(total_ticks, &total_ns);
}
-   isb();
-   end = read_sysreg(cntpct_el0);
  
-	total_ticks = end - start;

ticks_to_ns_time(total_ticks, &total_ns);
avg_ns.ns = total_ns.ns / ntimes;
avg_ns.ns_frac = total_ns.ns_frac / ntimes;
--
2.19.1




Thanks,
drew


.





Re: [kvm-unit-tests PATCH v2 6/8] arm64: microbench: Allow each test to specify its running times

2020-07-02 Thread Jingyi Wang




On 7/2/2020 1:29 PM, Andrew Jones wrote:

On Thu, Jul 02, 2020 at 11:01:30AM +0800, Jingyi Wang wrote:

For some test in micro-bench can be time consuming, we add a
micro-bench test parameter to allow each individual test to specify
its running times.

Signed-off-by: Jingyi Wang 
---
  arm/micro-bench.c | 25 ++---
  1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/arm/micro-bench.c b/arm/micro-bench.c
index aeb60a7..506d2f9 100644
--- a/arm/micro-bench.c
+++ b/arm/micro-bench.c
@@ -223,17 +223,18 @@ struct exit_test {
const char *name;
bool (*prep)(void);
void (*exec)(void);
+   u32 times;
bool run;
  };
  
  static struct exit_test tests[] = {

-   {"hvc",   NULL,   hvc_exec,   
true},
-   {"mmio_read_user",NULL,   mmio_read_user_exec,true},
-   {"mmio_read_vgic",NULL,   mmio_read_vgic_exec,true},
-   {"eoi",   NULL,   eoi_exec,   
true},
-   {"ipi",   ipi_prep,   ipi_exec,   
true},
-   {"ipi_hw",ipi_hw_prep,ipi_exec,   true},
-   {"lpi",   lpi_prep,   lpi_exec,   
true},
+   {"hvc",   NULL,   hvc_exec,   
NTIMES, true},
+   {"mmio_read_user",NULL,   mmio_read_user_exec,NTIMES,   
  true},
+   {"mmio_read_vgic",NULL,   mmio_read_vgic_exec,NTIMES,   
  true},
+   {"eoi",   NULL,   eoi_exec,   
NTIMES, true},
+   {"ipi",   ipi_prep,   ipi_exec,   
NTIMES, true},
+   {"ipi_hw",ipi_hw_prep,ipi_exec,   NTIMES,   
  true},
+   {"lpi",   lpi_prep,   lpi_exec,   
NTIMES, true},


Now that we no longer use 'NTIMES' in functions we don't really need the
define at all. We can just put 65536 directly into the table here for
each test that needs 65536 times.

Thanks,
drew



Done, thanks for reviewing.


  };
  
  struct ns_time {

@@ -254,7 +255,7 @@ static void ticks_to_ns_time(uint64_t ticks, struct ns_time 
*ns_time)
  
  static void loop_test(struct exit_test *test)

  {
-   uint64_t start, end, total_ticks, ntimes = NTIMES;
+   uint64_t start, end, total_ticks, ntimes = 0;
struct ns_time total_ns, avg_ns;
  
  	if (test->prep) {

@@ -265,15 +266,17 @@ static void loop_test(struct exit_test *test)
}
isb();
start = read_sysreg(cntpct_el0);
-   while (ntimes--)
+   while (ntimes < test->times) {
test->exec();
+   ntimes++;
+   }
isb();
end = read_sysreg(cntpct_el0);
  
  	total_ticks = end - start;

ticks_to_ns_time(total_ticks, &total_ns);
-   avg_ns.ns = total_ns.ns / NTIMES;
-   avg_ns.ns_frac = total_ns.ns_frac / NTIMES;
+   avg_ns.ns = total_ns.ns / ntimes;
+   avg_ns.ns_frac = total_ns.ns_frac / ntimes;
  
  	printf("%-30s%15" PRId64 ".%-15" PRId64 "%15" PRId64 ".%-15" PRId64 "\n",

test->name, total_ns.ns, total_ns.ns_frac, avg_ns.ns, 
avg_ns.ns_frac);
--
2.19.1





.





Re: [kvm-unit-tests PATCH v2 3/8] arm64: microbench: gic: Add gicv4.1 support for ipi latency test.

2020-07-02 Thread Marc Zyngier

On 2020-07-02 04:01, Jingyi Wang wrote:

If gicv4.1(sgi hardware injection) supported, we test ipi injection
via hw/sw way separately.


nit: active-less SGIs are not strictly a feature of GICv4.1 (you could
imagine a GIC emulation offering the same thing). Furthermore, GICv4.1
isn't as such visible to the guest itself (it only sees a GICv3).

Thanks,

M.
--
Jazz is not dead. It just smells funny...


Re: [kvm-unit-tests PATCH v2 2/8] arm64: microbench: Use the funcions for ipi test as the general functions for gic(ipi/lpi/timer) test

2020-07-02 Thread Jingyi Wang

Hi Drew,

On 7/2/2020 1:25 PM, Andrew Jones wrote:


Hi Jingyi,

This patch has quite a long summary. How about instead of

  arm64: microbench: Use the funcions for ipi test as the general functions for 
gic(ipi/lpi/timer) test

we use

  arm64: microbench: Generalize ipi test names

and then in the commit message, instead of

  The following patches will use that.

we use

  Later patches will use these functions for gic(ipi/lpi/timer) tests.

Thanks,
drew



This looks more concise, thanks for reviewing.

Thanks,
Jingyi


On Thu, Jul 02, 2020 at 11:01:26AM +0800, Jingyi Wang wrote:

The following patches will use that.

Signed-off-by: Jingyi Wang 
---
  arm/micro-bench.c | 39 ++-
  1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/arm/micro-bench.c b/arm/micro-bench.c
index 794dfac..fc4d356 100644
--- a/arm/micro-bench.c
+++ b/arm/micro-bench.c
@@ -25,24 +25,24 @@
  
  static u32 cntfrq;
  
-static volatile bool ipi_ready, ipi_received;

+static volatile bool irq_ready, irq_received;
  static void *vgic_dist_base;
  static void (*write_eoir)(u32 irqstat);
  
-static void ipi_irq_handler(struct pt_regs *regs)

+static void gic_irq_handler(struct pt_regs *regs)
  {
-   ipi_ready = false;
-   ipi_received = true;
+   irq_ready = false;
+   irq_received = true;
gic_write_eoir(gic_read_iar());
-   ipi_ready = true;
+   irq_ready = true;
  }
  
-static void ipi_secondary_entry(void *data)

+static void gic_secondary_entry(void *data)
  {
-   install_irq_handler(EL1H_IRQ, ipi_irq_handler);
+   install_irq_handler(EL1H_IRQ, gic_irq_handler);
gic_enable_defaults();
local_irq_enable();
-   ipi_ready = true;
+   irq_ready = true;
while (true)
cpu_relax();
  }
@@ -72,9 +72,9 @@ static bool test_init(void)
break;
}
  
-	ipi_ready = false;

+   irq_ready = false;
gic_enable_defaults();
-   on_cpu_async(1, ipi_secondary_entry, NULL);
+   on_cpu_async(1, gic_secondary_entry, NULL);
  
  	cntfrq = get_cntfrq();

printf("Timer Frequency %d Hz (Output in microseconds)\n", cntfrq);
@@ -82,13 +82,18 @@ static bool test_init(void)
return true;
  }
  
-static void ipi_prep(void)

+static void gic_prep_common(void)
  {
unsigned tries = 1 << 28;
  
-	while (!ipi_ready && tries--)

+   while (!irq_ready && tries--)
cpu_relax();
-   assert(ipi_ready);
+   assert(irq_ready);
+}
+
+static void ipi_prep(void)
+{
+   gic_prep_common();
  }
  
  static void ipi_exec(void)

@@ -96,17 +101,17 @@ static void ipi_exec(void)
unsigned tries = 1 << 28;
static int received = 0;
  
-	ipi_received = false;

+   irq_received = false;
  
  	gic_ipi_send_single(1, 1);
  
-	while (!ipi_received && tries--)

+   while (!irq_received && tries--)
cpu_relax();
  
-	if (ipi_received)

+   if (irq_received)
++received;
  
-	assert_msg(ipi_received, "failed to receive IPI in time, but received %d successfully\n", received);

+   assert_msg(irq_received, "failed to receive IPI in time, but received %d 
successfully\n", received);
  }
  
  static void hvc_exec(void)

--
2.19.1





.





[PATCH kvm-unit-tests] arm/arm64: timer: Extract irqs at setup time

2020-07-02 Thread Andrew Jones
The timer can be useful for other tests besides the timer test.
Extract the DT parsing of the irqs out of the timer test into
setup and provide them along with some defines in a new timer.h
file.

Signed-off-by: Andrew Jones 
---
 arm/timer.c   | 26 --
 lib/arm/asm/timer.h   | 31 +++
 lib/arm/setup.c   | 42 ++
 lib/arm64/asm/timer.h |  1 +
 4 files changed, 78 insertions(+), 22 deletions(-)
 create mode 100644 lib/arm/asm/timer.h
 create mode 100644 lib/arm64/asm/timer.h

diff --git a/arm/timer.c b/arm/timer.c
index 44621b4f2967..09e3f8f6bd7d 100644
--- a/arm/timer.c
+++ b/arm/timer.c
@@ -8,15 +8,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 
-#define ARCH_TIMER_CTL_ENABLE  (1 << 0)
-#define ARCH_TIMER_CTL_IMASK   (1 << 1)
-#define ARCH_TIMER_CTL_ISTATUS (1 << 2)
-
 static void *gic_isenabler;
 static void *gic_icenabler;
 
@@ -108,7 +105,6 @@ static void write_ptimer_ctl(u64 val)
 
 struct timer_info {
u32 irq;
-   u32 irq_flags;
volatile bool irq_received;
u64 (*read_counter)(void);
u64 (*read_cval)(void);
@@ -304,23 +300,9 @@ static void test_ptimer(void)
 
 static void test_init(void)
 {
-   const struct fdt_property *prop;
-   const void *fdt = dt_fdt();
-   int node, len;
-   u32 *data;
-
-   node = fdt_node_offset_by_compatible(fdt, -1, "arm,armv8-timer");
-   assert(node >= 0);
-   prop = fdt_get_property(fdt, node, "interrupts", &len);
-   assert(prop && len == (4 * 3 * sizeof(u32)));
-
-   data = (u32 *)prop->data;
-   assert(fdt32_to_cpu(data[3]) == 1);
-   ptimer_info.irq = fdt32_to_cpu(data[4]);
-   ptimer_info.irq_flags = fdt32_to_cpu(data[5]);
-   assert(fdt32_to_cpu(data[6]) == 1);
-   vtimer_info.irq = fdt32_to_cpu(data[7]);
-   vtimer_info.irq_flags = fdt32_to_cpu(data[8]);
+   assert(TIMER_PTIMER_IRQ != -1 && TIMER_VTIMER_IRQ != -1);
+   ptimer_info.irq = TIMER_PTIMER_IRQ;
+   vtimer_info.irq = TIMER_VTIMER_IRQ;
 
install_exception_handler(EL1H_SYNC, ESR_EL1_EC_UNKNOWN, 
ptimer_unsupported_handler);
ptimer_info.read_ctl();
diff --git a/lib/arm/asm/timer.h b/lib/arm/asm/timer.h
new file mode 100644
index ..f75cc67f3ac4
--- /dev/null
+++ b/lib/arm/asm/timer.h
@@ -0,0 +1,31 @@
+/*
+ * Copyright (C) 2020, Red Hat Inc, Andrew Jones 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+#ifndef _ASMARM_TIMER_H_
+#define _ASMARM_TIMER_H_
+
+#define ARCH_TIMER_CTL_ENABLE  (1 << 0)
+#define ARCH_TIMER_CTL_IMASK   (1 << 1)
+#define ARCH_TIMER_CTL_ISTATUS (1 << 2)
+
+#ifndef __ASSEMBLY__
+
+struct timer_state {
+   struct {
+   u32 irq;
+   u32 irq_flags;
+   } ptimer;
+   struct {
+   u32 irq;
+   u32 irq_flags;
+   } vtimer;
+};
+extern struct timer_state __timer_state;
+
+#define TIMER_PTIMER_IRQ (__timer_state.ptimer.irq)
+#define TIMER_VTIMER_IRQ (__timer_state.vtimer.irq)
+
+#endif /* !__ASSEMBLY__ */
+#endif /* _ASMARM_TIMER_H_ */
diff --git a/lib/arm/setup.c b/lib/arm/setup.c
index 418b4e58a5f8..78562e47c01c 100644
--- a/lib/arm/setup.c
+++ b/lib/arm/setup.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "io.h"
 
@@ -29,6 +30,8 @@
 
 extern unsigned long stacktop;
 
+struct timer_state __timer_state;
+
 char *initrd;
 u32 initrd_size;
 
@@ -156,6 +159,43 @@ static void mem_init(phys_addr_t freemem_start)
page_alloc_ops_enable();
 }
 
+static void timer_save_state(void)
+{
+   const struct fdt_property *prop;
+   const void *fdt = dt_fdt();
+   int node, len;
+   u32 *data;
+
+   node = fdt_node_offset_by_compatible(fdt, -1, "arm,armv8-timer");
+   assert(node >= 0 || node == -FDT_ERR_NOTFOUND);
+
+   if (node == -FDT_ERR_NOTFOUND) {
+   __timer_state.ptimer.irq = -1;
+   __timer_state.vtimer.irq = -1;
+   return;
+   }
+
+   /*
+* From Linux devicetree timer binding documentation
+*
+* interrupts :
+*  secure timer irq
+*  non-secure timer irq(ptimer)
+*  virtual timer irq   (vtimer)
+*  hypervisor timer irq
+*/
+   prop = fdt_get_property(fdt, node, "interrupts", &len);
+   assert(prop && len == (4 * 3 * sizeof(u32)));
+
+   data = (u32 *)prop->data;
+   assert(fdt32_to_cpu(data[3]) == 1 /* PPI */);
+   __timer_state.ptimer.irq = fdt32_to_cpu(data[4]);
+   __timer_state.ptimer.irq_flags = fdt32_to_cpu(data[5]);
+   assert(fdt32_to_cpu(data[6]) == 1 /* PPI */);
+   __timer_state.vtimer.irq = fdt32_to_cpu(data[7]);
+   __timer_state.vtimer.irq_flags = fdt32_to_cpu(data[8]);
+}
+
 void setup(const void *fdt)
 {
void *freemem = &stacktop;
@@ -211,6 +251,8 @@ void setup(co