Re: [PATCH v1 10/12] KVM: arm64: Rework logic to en/decode VTCR_EL2.{SL0, SL2} fields

2022-12-20 Thread Ryan Roberts
On 20/12/2022 00:06, Oliver Upton wrote:
> Hi Ryan,
> 
> On Tue, Dec 06, 2022 at 01:59:28PM +0000, Ryan Roberts wrote:
>> In order to support 5 level translation, FEAT_LPA2 introduces the 1-bit
>> SL2 field within VTCR_EL2 to extend the existing 2-bit SL0 field. The
>> SL2[0]:SL0[1:0] encodings have no simple algorithmic relationship to the
>> start levels they represent (that I can find, at least), so replace the
>> existing macros with functions that do lookups to encode and decode the
>> values. These new functions no longer make hardcoded assumptions about
>> the maximum level and instead rely on KVM_PGTABLE_FIRST_LEVEL and
>> KVM_PGTABLE_LAST_LEVEL.
>>
>> This is preparatory work for enabling 52-bit IPA for 4KB and 16KB pages
>> with FEAT_LPA2.
>>
>> No functional change intended.
>>
>> Signed-off-by: Ryan Roberts 
> 
> Why do we need to support 5-level paging at stage-2?
> 
> A configuration of start_level = 0, T0SZ = 12 with 4K paging would
> result in 16 concatenated tables at level 0, avoiding the level -1
> lookup altogether.

Yes, agreed. And that's exactly what the code does. So we could remove this
patch from the series and everything would continue to function correctly. But I
was trying to make things more consistent and maintainable (this now works in
terms of KVM_PGTABLE_FIRST_LEVEL and KVM_PGTABLE_LAST_LEVEL for example).

That said, I haven't exactly been consistent in my refactoring; patch 11 just
adds a comment to kvm_vcpu_trap_get_fault_level() explaining that the new -1
level encodings will never be seen due to stage2 never using 5 levels of
translation.

So happy to remove this and replace with a comment describing the limitations if
that's your preference?

> 
> --
> Thanks,
> Oliver



Re: [PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2

2022-12-15 Thread Ryan Roberts
On 15/12/2022 00:52, Oliver Upton wrote:
> On Tue, Dec 06, 2022 at 01:59:18PM +0000, Ryan Roberts wrote:
>> (apologies, I'm resending this series as I managed to send the cover letter
>> to all but the following patches only to myself on first attempt).
>>
>> This is my first upstream feature submission so please go easy ;-)
> 
> Welcome :)
> 
>> Support 52-bit Output Addresses: FEAT_LPA2 changes the format of the PTEs.
>> The HW advertises support for LPA2 independently for stage 1 and stage 2,
>> and therefore it's possible to have it for one and not the other. I've
>> assumed there is a valid case for this: if stage 1 is not supported but
>> stage 2 is, KVM could still use LPA2 at stage 2 to create a 52 bit IPA
>> space (which could then be consumed by a 64KB page guest kernel with the
>> help of FEAT_LPA). Because of this independence, and the fact that the kvm
>> pgtable library is used for both stage 1 and stage 2 tables, the library
>> now has to remember the in-use format on a per-page-table basis. To do
>> this, I had to rework some functions to take a `struct kvm_pgtable *`
>> parameter, and as a result, there is a noisy patch to add this parameter.
> 
> Mismatch between the translation stages is an interesting problem...
> 
> Given that userspace is responsible for setting up the IPA space, I
> can't really think of a strong use case for 52 bit IPAs with a 48 bit
> VA. Sure, the VMM could construct a sparse IPA space or remap the same
> HVA at multiple IPAs to artificially saturate the address space, but
> neither seems terribly compelling.
> 
> Nonetheless, AFAICT we already allow this sort of mismatch on LPA &&
> !LVA systems. A 48 bit userspace could construct a 52 bit IPA space for
> its guest.

I guess a simpler approach would be to only use LPA2 if it's supported by both
stage1 and stage2. Then the code could just use a static key in the few required
places. However, there is also a place where kvm_pgtable walks the user space s1
page table that is constructed by the kernel. For this to keep working, the
kernel would need to decide whether to use LPA2 based on the same criteria. But
it feels odd to have the kernel depend on LPA2 support at stage2. I'll wait for
your fuller review.

> 
> Marc, is there any real reason for this or is it just a byproduct of how
> LPA support was added to KVM?
> 
>> Support 52-bit Input Addresses: The main difficulty here is that at stage 1
>> for 4KB pages, 52-bit IA requires an extra level of lookup, and that level
>> is called '-1'. (Although stage 2 can use concatenated page tables at the
>> first level, and therefore still only uses 4 levels, the kvm pgtable
>> library deals with both stage 1 and stage 2 tables). So there is another
>> noisy patch to convert all level variables to signed.
>>
>> This is all tested on the FVP, using a test harness I put together, which
>> does a host + guest boot test for 180 configurations, built from all the
>> (valid) combinations of various FVP, host kernel and guest kernel
>> parameters:
>>
>>  - hw_pa:[48, lpa, lpa2]
>>  - hw_va:[48, 52]
>>  - kvm_mode: [vhe, nvhe, protected]
>>  - host_page_size:   [4KB, 16KB, 64KB]
>>  - host_pa:  [48, 52]
>>  - host_va:  [48, 52]
>>  - host_load_addr:   [low, high]
>>  - guest_page_size:  [64KB]
>>  - guest_pa: [52]
>>  - guest_va: [52]
>>  - guest_load_addr:  [low, high]
> 
> Wow, what a matrix!
> 
> In a later revision of this series it might be good to add support for
> LPA2 guests in KVM selftests. We currently constrain the IPA size to
> 48bits on !64K kernels.

Ahh - I did have a quick look at kselftests and kvm-unit-tests, but they looked
hard-coded for 48-bit IPA and it seemed like quite an effort to rework. I guess
if they already support 52 bit IPA for 64K kernels then I missed something. I'll
take another look and aim to get some tests implemented for a future revision.

> 
> I'll have a deeper look at this series in the coming days.

Thanks!

> 
> --
> Thanks,
> Oliver



[PATCH v1 12/12] KVM: arm64: Allow guests with >48-bit IPA size on FEAT_LPA2 systems

2022-12-06 Thread Ryan Roberts
With all the page-table infrastructure in place, we can finally increase
the maximum permissible IPA size to 52 bits on 4KB and 16KB page systems
that have FEAT_LPA2.

Signed-off-by: Ryan Roberts 
---
 arch/arm64/kvm/reset.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 5ae18472205a..548756c3f43c 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -118,7 +118,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
kfree(buf);
return ret;
}
-   
+
vcpu->arch.sve_state = buf;
vcpu_set_flag(vcpu, VCPU_SVE_FINALIZED);
return 0;
@@ -361,12 +361,11 @@ int kvm_set_ipa_limit(void)
parange = cpuid_feature_extract_unsigned_field(mmfr0,
ID_AA64MMFR0_EL1_PARANGE_SHIFT);
/*
-* IPA size beyond 48 bits could not be supported
-* on either 4K or 16K page size. Hence let's cap
-* it to 48 bits, in case it's reported as larger
-* on the system.
+* IPA size beyond 48 bits for 4K and 16K page size is only supported
+* when LPA2 is available. So if we have LPA2, enable it, else cap to 48
+* bits, in case it's reported as larger on the system.
 */
-   if (PAGE_SIZE != SZ_64K)
+   if (!kvm_supports_stage2_lpa2(mmfr0) && PAGE_SIZE != SZ_64K)
parange = min(parange, (unsigned int)ID_AA64MMFR0_EL1_PARANGE_48);
 
/*
-- 
2.25.1



[PATCH v1 09/12] KVM: arm64: Convert translation level parameter to s8

2022-12-06 Thread Ryan Roberts
With the introduction of FEAT_LPA2, the Arm ARM adds a new level of
translation, level -1, so levels can now be in the range [-1;3]. 3 is
always the last level and the first level is determined based on the
number of VA bits in use.

Convert level variables to use a signed type in preparation for
supporting this new level -1.

Since the last level is always anchored at 3, and the first level varies
to suit the number of VA/IPA bits, take the opportunity to replace
KVM_PGTABLE_MAX_LEVELS with the 2 macros KVM_PGTABLE_FIRST_LEVEL and
KVM_PGTABLE_LAST_LEVEL. This removes the assumption from the code that
levels run from 0 to KVM_PGTABLE_MAX_LEVELS - 1, which will soon no
longer be true.

No behavioral changes intended.
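
As a standalone illustration (my own sketch, not part of the patch) of why levels
now need a signed type once they are anchored at KVM_PGTABLE_LAST_LEVEL = 3: the
start level falls straight out of the VA bits and granule, and becomes -1 for 4KB
pages with 52 VA bits.

/* Standalone illustration; not kernel code. Assumes LAST_LEVEL = 3. */
#include <stdio.h>

#define KVM_PGTABLE_LAST_LEVEL 3

static int start_level(int page_shift, int va_bits)
{
	int bits_per_level = page_shift - 3;	/* PTEs are 8 bytes */
	int levels = (va_bits - page_shift + bits_per_level - 1) / bits_per_level;

	return KVM_PGTABLE_LAST_LEVEL + 1 - levels;	/* may be -1 with LPA2 */
}

int main(void)
{
	printf("4K/48-bit:  %d\n", start_level(12, 48));	/* 0 */
	printf("4K/52-bit:  %d\n", start_level(12, 52));	/* -1 */
	printf("16K/52-bit: %d\n", start_level(14, 52));	/* 0 */
	printf("64K/52-bit: %d\n", start_level(16, 52));	/* 1 */
	return 0;
}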

Signed-off-by: Ryan Roberts 
---
 arch/arm64/include/asm/kvm_emulate.h  |  2 +-
 arch/arm64/include/asm/kvm_pgtable.h  | 21 +++---
 arch/arm64/include/asm/kvm_pkvm.h |  5 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  6 +-
 arch/arm64/kvm/hyp/nvhe/setup.c   |  4 +-
 arch/arm64/kvm/hyp/pgtable.c  | 94 ++-
 arch/arm64/kvm/mmu.c  | 11 ++--
 7 files changed, 75 insertions(+), 68 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 9bdba47f7e14..270f49e7f29a 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -341,7 +341,7 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc
return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
 }
 
-static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
+static __always_inline s8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
 {
return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
 }
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index d6f4dcdd00fd..a282a3d5ddbc 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -11,7 +11,8 @@
 #include 
 #include 
 
-#define KVM_PGTABLE_MAX_LEVELS 4U
+#define KVM_PGTABLE_FIRST_LEVEL    0
+#define KVM_PGTABLE_LAST_LEVEL 3
 
 /*
  * The largest supported block sizes for KVM (no 52-bit PA support):
@@ -20,9 +21,9 @@
  *  - 64K (level 2):   512MB
  */
 #ifdef CONFIG_ARM64_4K_PAGES
-#define KVM_PGTABLE_MIN_BLOCK_LEVEL    1U
+#define KVM_PGTABLE_MIN_BLOCK_LEVEL    1
#else
-#define KVM_PGTABLE_MIN_BLOCK_LEVEL    2U
+#define KVM_PGTABLE_MIN_BLOCK_LEVEL    2
 #endif
 
 static inline bool kvm_supports_hyp_lpa2(void)
@@ -84,18 +85,18 @@ static inline bool kvm_pte_valid(kvm_pte_t pte)
return pte & KVM_PTE_VALID;
 }
 
-static inline u64 kvm_granule_shift(u32 level)
+static inline u64 kvm_granule_shift(s8 level)
 {
-   /* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
+   /* Assumes KVM_PGTABLE_LAST_LEVEL is 3 */
return ARM64_HW_PGTABLE_LEVEL_SHIFT(level);
 }
 
-static inline u64 kvm_granule_size(u32 level)
+static inline u64 kvm_granule_size(s8 level)
 {
return BIT(kvm_granule_shift(level));
 }
 
-static inline bool kvm_level_supports_block_mapping(u32 level)
+static inline bool kvm_level_supports_block_mapping(s8 level)
 {
return level >= KVM_PGTABLE_MIN_BLOCK_LEVEL;
 }
@@ -202,7 +203,7 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
  */
 struct kvm_pgtable {
u32 ia_bits;
-   u32 start_level;
+   s8  start_level;
kvm_pte_t   *pgd;
struct kvm_pgtable_mm_ops   *mm_ops;
bool    lpa2_ena;
@@ -245,7 +246,7 @@ enum kvm_pgtable_walk_flags {
 };
 
 typedef int (*kvm_pgtable_visitor_fn_t)(struct kvm_pgtable *pgt,
-   u64 addr, u64 end, u32 level,
+   u64 addr, u64 end, s8 level,
kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag,
void * const arg);
@@ -581,7 +582,7 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
  * Return: 0 on success, negative error code on failure.
  */
 int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
-kvm_pte_t *ptep, u32 *level);
+kvm_pte_t *ptep, s8 *level);
 
 /**
  * kvm_pgtable_stage2_pte_prot() - Retrieve the protection attributes of a
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 9f4ad2a8df59..addcf63cf8d5 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -16,10 +16,11 @@ extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);
 
 static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
 {
-   unsigned long total = 0, i;
+

[PATCH v1 10/12] KVM: arm64: Rework logic to en/decode VTCR_EL2.{SL0, SL2} fields

2022-12-06 Thread Ryan Roberts
In order to support 5 level translation, FEAT_LPA2 introduces the 1-bit
SL2 field within VTCR_EL2 to extend the existing 2-bit SL0 field. The
SL2[0]:SL0[1:0] encodings have no simple algorithmic relationship to the
start levels they represent (that I can find, at least), so replace the
existing macros with functions that do lookups to encode and decode the
values. These new functions no longer make hardcoded assumptions about
the maximum level and instead rely on KVM_PGTABLE_FIRST_LEVEL and
KVM_PGTABLE_LAST_LEVEL.

This is preparatory work for enabling 52-bit IPA for 4KB and 16KB pages
with FEAT_LPA2.

No functional change intended.
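
As a rough sketch of the lookup-based approach (illustrative only; the names,
table layout and helper structure below are mine and not necessarily what the
patch implements), the encode/decode helpers can be built around a small array
indexed by start level, holding the SL2[0]:SL0[1:0] values from the table in
the patch. Shown here for the 4K granule:

/* Illustrative sketch of lookup-based SLx encode/decode; not the patch code. */
#include <stdio.h>

#define SLX_ENC_INVAL	255

/* SL2[0]:SL0[1:0] for start levels -1..3 with the 4K granule (per the table) */
static const unsigned char slx_4k[5] = { 0x4, 0x2, 0x1, 0x0, 0x3 };

static unsigned int slx_encode(int start_level)
{
	if (start_level < -1 || start_level > 3)
		return SLX_ENC_INVAL;
	return slx_4k[start_level + 1];
}

static int slx_decode(unsigned int slx)
{
	int level;

	for (level = -1; level <= 3; level++) {
		if (slx_4k[level + 1] == slx)
			return level;
	}
	return 100;	/* no start level maps to this encoding */
}

int main(void)
{
	int level;

	for (level = -1; level <= 3; level++)
		printf("level %2d -> SLx 0x%x -> level %2d\n", level,
		       slx_encode(level), slx_decode(slx_encode(level)));
	return 0;
}
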

Signed-off-by: Ryan Roberts 
---
 arch/arm64/include/asm/kvm_arm.h| 75 ++---
 arch/arm64/include/asm/kvm_pgtable.h| 33 +++
 arch/arm64/include/asm/stage2_pgtable.h | 13 -
 arch/arm64/kvm/hyp/pgtable.c| 67 +-
 4 files changed, 150 insertions(+), 38 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index f9619a10d5d9..94bbb05e348f 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -150,58 +150,65 @@
 VTCR_EL2_IRGN0_WBWA | VTCR_EL2_RES1)
 
 /*
- * VTCR_EL2:SL0 indicates the entry level for Stage2 translation.
- * Interestingly, it depends on the page size.
- * See D.10.2.121, VTCR_EL2, in ARM DDI 0487C.a
+ * VTCR_EL2.{SL0, SL2} indicates the entry level for Stage2 translation.
+ * Interestingly, it depends on the page size. See D17.2.157, VTCR_EL2, in ARM
+ * DDI 0487I.a
  *
- * ----------------------------------
- * | Entry level   |  4K  | 16K/64K |
- * ----------------------------------
- * | Level: 0      |  2   |    -    |
- * ----------------------------------
- * | Level: 1      |  1   |    2    |
- * ----------------------------------
- * | Level: 2      |  0   |    1    |
- * ----------------------------------
- * | Level: 3      |  -   |    0    |
- * ----------------------------------
+ *  -------------------------------------------------
+ *  | Entry level   |    4K    |   16K    |   64K    |
+ *  |               |  SL2:SL0 |  SL2:SL0 |  SL2:SL0 |
+ *  -------------------------------------------------
+ *  | Level: -1     |  0b100   |    -     |    -     |
+ *  -------------------------------------------------
+ *  | Level: 0      |  0b010   |  0b011   |    -     |
+ *  -------------------------------------------------
+ *  | Level: 1      |  0b001   |  0b010   |  0b010   |
+ *  -------------------------------------------------
+ *  | Level: 2      |  0b000   |  0b001   |  0b001   |
+ *  -------------------------------------------------
+ *  | Level: 3      |  0b011   |  0b000   |  0b000   |
+ *  -------------------------------------------------
  *
- * The table roughly translates to :
- *
- * SL0(PAGE_SIZE, Entry_level) = TGRAN_SL0_BASE - Entry_Level
- *
- * Where TGRAN_SL0_BASE is a magic number depending on the page size:
- * TGRAN_SL0_BASE(4K) = 2
- * TGRAN_SL0_BASE(16K) = 3
- * TGRAN_SL0_BASE(64K) = 3
- * provided we take care of ruling out the unsupported cases and
- * Entry_Level = 4 - Number_of_levels.
+ * There is no concise algorithm to convert between the SLx encodings and the
+ * level numbers, so we implement 2 helpers, kvm_vtcr_el2_sl_encode() and
+ * kvm_vtcr_el2_sl_decode(), which convert between the representations. These
+ * helpers use a concatenated form of SLx: SL2[0]:SL0[1:0] as the 3 LSBs in a u8.
+ * If an invalid input value is provided, VTCR_EL2_SLx_ENC_INVAL is returned. We
+ * declare the appropriate encoded values here for the compiled-in page size.
  *
+ * See kvm_pgtable.h for documentation on the helpers.
  */
+#define VTCR_EL2_SLx_ENC_INVAL 255
+
 #ifdef CONFIG_ARM64_64K_PAGES
 
 #define VTCR_EL2_TGRAN VTCR_EL2_TG0_64K
-#define VTCR_EL2_TGRAN_SL0_BASE3UL
+#define VTCR_EL2_SLx_ENC_Lm1   VTCR_EL2_SLx_ENC_INVAL
+#define VTCR_EL2_SLx_ENC_L0    VTCR_EL2_SLx_ENC_INVAL
+#define VTCR_EL2_SLx_ENC_Lp1   2
+#define VTCR_EL2_SLx_ENC_Lp2   1
+#define VTCR_EL2_SLx_ENC_Lp3   0
 
 #elif defined(CONFIG_ARM64_16K_PAGES)
 
 #define VTCR_EL2_TGRAN VTCR_EL2_TG0_16K
-#define VTCR_EL2_TGRAN_SL0_BASE3UL
+#define VTCR_EL2_SLx_ENC_Lm1   VTCR_EL2_SLx_ENC_INVAL
+#define VTCR_EL2_SLx_ENC_L0    3
+#define VTCR_EL2_SLx_ENC_Lp1   2
+#define VTCR_EL2_SLx_ENC_Lp2   1
+#define VTCR_EL2_SLx_ENC_Lp3   0
 
 #else  /* 4K */
 
 #define VTCR_EL2_TGRAN VTCR_EL2_TG0_4K
-#define VTCR_EL2_TGRAN_SL0_BASE

[PATCH v1 11/12] KVM: arm64: Support upto 5 levels of translation in kvm_pgtable

2022-12-06 Thread Ryan Roberts
FEAT_LPA2 increases the maximum levels of translation from 4 to 5 for
the 4KB page case, when IA is >48 bits. While we can still use 4 levels
for stage2 translation in this case (due to stage2 allowing concatenated
page tables for first level lookup), the same kvm_pgtable library is
used for the hyp stage1 page tables and stage1 does not support
concatenation.

Therefore, modify the library to support upto 5 levels. Previous patches
already laid the groundwork for this by refactoring code to work in
terms of KVM_PGTABLE_FIRST_LEVEL and KVM_PGTABLE_LAST_LEVEL. So we just
need to change these macros.

The hardware sometimes encodes the new level differently from the
others: One such place is when reading the level from the FSC field in
the ESR_EL2 register. We never expect to see the lowest level (-1) here
since the stage 2 page tables always use concatenated tables for first
level lookup and therefore only use 4 levels of lookup. So we get away
with just adding a comment to explain why we are not being careful about
decoding level -1.

Signed-off-by: Ryan Roberts 
---
 arch/arm64/include/asm/kvm_emulate.h | 10 ++
 arch/arm64/include/asm/kvm_pgtable.h |  2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 270f49e7f29a..6f68febfb214 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -343,6 +343,16 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vc
 
static __always_inline s8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
 {
+   /*
+* Note: With the introduction of FEAT_LPA2 an extra level of
+* translation (level -1) is added. This level (obviously) doesn't
+* follow the previous convention of encoding the 4 levels in the 2 LSBs
+* of the FSC so this function breaks if the fault is for level -1.
+*
+* However, stage2 tables always use concatenated tables for first level
+* lookup and therefore it is guaranteed that the level will be between
+* 0 and 3, and this function continues to work.
+*/
return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
 }
 
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 3e0b64052c51..3655279e6a7d 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -11,7 +11,7 @@
 #include 
 #include 
 
-#define KVM_PGTABLE_FIRST_LEVEL    0
+#define KVM_PGTABLE_FIRST_LEVEL    -1
 #define KVM_PGTABLE_LAST_LEVEL 3
 
 /*
-- 
2.25.1



[PATCH v1 08/12] KVM: arm64: Insert PS field at TCR_EL2 assembly time

2022-12-06 Thread Ryan Roberts
With the addition of LPA2 support in the hypervisor, the PA size
supported by the HW must be capped with a runtime decision, rather than
simply using a compile-time decision based on PA_BITS. For example, on a
system that advertises 52 bit PA but does not support FEAT_LPA2, a 4KB
or 16KB kernel compiled with LPA2 support must still limit the PA size
to 48 bits.

Therefore, move the insertion of the PS field into TCR_EL2 out of
__kvm_hyp_init assembly code and instead do it in cpu_prepare_hyp_mode()
where the rest of TCR_EL2 is assembled. This allows us to figure out PS
with kvm_get_parange(), which has the appropriate logic to ensure the
above requirement. (The PS field of VTCR_EL2 is already populated this way.)

Signed-off-by: Ryan Roberts 
---
 arch/arm64/kvm/arm.c   | 5 -
 arch/arm64/kvm/hyp/nvhe/hyp-init.S | 4 
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a234c6252c3c..ac30d849a308 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1522,6 +1522,8 @@ static void cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
 {
struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
unsigned long tcr;
+   bool lpa2_ena = kvm_supports_hyp_lpa2();
+   u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 
/*
 * Calculate the raw per-cpu offset without a translation from the
@@ -1537,7 +1539,8 @@ static void cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
tcr = (read_sysreg(tcr_el1) & TCR_EL2_MASK) | TCR_EL2_RES1;
tcr &= ~TCR_T0SZ_MASK;
tcr |= TCR_T0SZ(hyp_va_bits);
-   if (kvm_supports_hyp_lpa2())
+   tcr |= kvm_get_parange(mmfr0, lpa2_ena) << TCR_EL2_PS_SHIFT;
+   if (lpa2_ena)
tcr |= TCR_EL2_DS;
params->tcr_el2 = tcr;
 
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index c953fb4b9a13..3cc6dd2ff253 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -108,11 +108,7 @@ alternative_if ARM64_HAS_CNP
 alternative_else_nop_endif
msr ttbr0_el2, x2
 
-   /*
-* Set the PS bits in TCR_EL2.
-*/
ldr x0, [x0, #NVHE_INIT_TCR_EL2]
-   tcr_compute_pa_size x0, #TCR_EL2_PS_SHIFT, x1, x2
msr tcr_el2, x0
 
isb
-- 
2.25.1



[PATCH v1 04/12] KVM: arm64: Plumbing to enable multiple pgtable formats

2022-12-06 Thread Ryan Roberts
FEAT_LPA2 brings support for 52-bit input and output addresses for both
stage1 and stage2 translation when using 4KB and 16KB page sizes. The
architecture allows for the HW to support FEAT_LPA2 in one or both
stages of translation. When FEAT_LPA2 is enabled for a given stage, it
effectively changes the page table format; PTE bits change meaning and
blocks can be mapped at levels that were previously not possible.

All of this means that KVM has to support 2 page table formats and
decide which to use at runtime, after querying the HW. If FEAT_LPA2 is
advertised for stage1, KVM must choose, according to some policy, either the
classic or the lpa2 format for its hyp stage1; otherwise it must use the
classic format. Independently, if FEAT_LPA2 is advertised for stage2, KVM must
decide which format to use for the vm stage2 tables according to a policy.

As a first step towards enabling FEAT_LPA2, make struct kvm_pgtable
accessible to functions that will need to take different actions
depending on the page-table format. These functions are:

  - kvm_pte_to_phys()
  - kvm_phys_to_pte()
  - kvm_level_supports_block_mapping()
  - hyp_set_prot_attr()
  - stage2_set_prot_attr()

Do this by consistently passing the struct kvm_pgtable around as
the first parameter of each kvm_pgtable function call. As a result of
always passing it to walker callbacks, we can remove some ad-hoc members
from walker-specific data structures because those members are
accessible through the struct kvm_pgtable (notably mmu and mm_ops).

No functional changes are intended.
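
To make the plumbing concrete, here is a minimal, self-contained mock (my own
sketch with made-up names; not the kernel types or the patch code) of the
pattern: the walker callback takes the struct kvm_pgtable as its first
parameter, so per-table state such as mm_ops no longer has to be stashed in a
walker-specific arg structure.

/* Minimal mock (not kernel code) of the "pass pgt to visitors" plumbing. */
#include <stdio.h>
#include <stdint.h>

typedef uint64_t kvm_pte_t;

struct kvm_pgtable_mm_ops { const char *name; };

struct kvm_pgtable {
	uint32_t ia_bits;
	struct kvm_pgtable_mm_ops *mm_ops;	/* visitors reach this via pgt */
};

typedef int (*visitor_fn_t)(struct kvm_pgtable *pgt, uint64_t addr,
			    uint32_t level, kvm_pte_t *ptep);

static int print_visitor(struct kvm_pgtable *pgt, uint64_t addr,
			 uint32_t level, kvm_pte_t *ptep)
{
	/* mm_ops no longer needs to be smuggled in via a walker arg */
	printf("level %u addr 0x%llx via %s\n", level,
	       (unsigned long long)addr, pgt->mm_ops->name);
	return 0;
}

int main(void)
{
	struct kvm_pgtable_mm_ops ops = { .name = "mock_mm_ops" };
	struct kvm_pgtable pgt = { .ia_bits = 48, .mm_ops = &ops };
	visitor_fn_t fn = print_visitor;
	kvm_pte_t pte = 0;

	return fn(&pgt, 0x1000, 3, &pte);
}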

Signed-off-by: Ryan Roberts 
---
 arch/arm64/include/asm/kvm_pgtable.h  |  23 ++--
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |   5 +-
 arch/arm64/kvm/hyp/nvhe/setup.c   |   8 +-
 arch/arm64/kvm/hyp/pgtable.c  | 181 +-
 4 files changed, 109 insertions(+), 108 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 3252eb50ecfe..2247ed74871a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -47,16 +47,6 @@ static inline bool kvm_pte_valid(kvm_pte_t pte)
return pte & KVM_PTE_VALID;
 }
 
-static inline u64 kvm_pte_to_phys(kvm_pte_t pte)
-{
-   u64 pa = pte & KVM_PTE_ADDR_MASK;
-
-   if (PAGE_SHIFT == 16)
-   pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
-
-   return pa;
-}
-
 static inline u64 kvm_granule_shift(u32 level)
 {
/* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
@@ -184,6 +174,16 @@ struct kvm_pgtable {
kvm_pgtable_force_pte_cb_t  force_pte_cb;
 };
 
+static inline u64 kvm_pte_to_phys(struct kvm_pgtable *pgt, kvm_pte_t pte)
+{
+   u64 pa = pte & KVM_PTE_ADDR_MASK;
+
+   if (PAGE_SHIFT == 16)
+   pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
+
+   return pa;
+}
+
 /**
  * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
  * @KVM_PGTABLE_WALK_LEAF: Visit leaf entries, including invalid
@@ -199,7 +199,8 @@ enum kvm_pgtable_walk_flags {
KVM_PGTABLE_WALK_TABLE_POST = BIT(2),
 };
 
-typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
+typedef int (*kvm_pgtable_visitor_fn_t)(struct kvm_pgtable *pgt,
+   u64 addr, u64 end, u32 level,
kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag,
void * const arg);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 07f9dc9848ef..6bf54c8daffa 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -417,7 +417,8 @@ struct check_walk_data {
enum pkvm_page_state(*get_page_state)(kvm_pte_t pte);
 };
 
-static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
+static int __check_page_state_visitor(struct kvm_pgtable *pgt,
+ u64 addr, u64 end, u32 level,
  kvm_pte_t *ptep,
  enum kvm_pgtable_walk_flags flag,
  void * const arg)
@@ -425,7 +426,7 @@ static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
struct check_walk_data *d = arg;
kvm_pte_t pte = *ptep;
 
-   if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
+   if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pgt, pte)))
return -EINVAL;
 
return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index e8d4ea2fcfa0..60a6821ae98a 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -186,12 +186,13 @@ static void hpool_put_page(void *addr)

[PATCH v1 05/12] KVM: arm64: Maintain page-table format info in struct kvm_pgtable

2022-12-06 Thread Ryan Roberts
As the next step on the journey to supporting FEAT_LPA2 in KVM, add a
flag to struct kvm_pgtable, which functions can then use to select the
appropriate behavior for either the `classic` or `lpa2` page-table
formats. For now, all page-tables remain in the `classic` format.

No functional changes are intended.

Signed-off-by: Ryan Roberts 
---
 arch/arm64/include/asm/kvm_pgtable.h | 2 ++
 arch/arm64/kvm/hyp/pgtable.c | 2 ++
 arch/arm64/kvm/mmu.c | 1 +
 3 files changed, 5 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 2247ed74871a..744e224d964b 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -157,6 +157,7 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
  * @start_level:   Level at which the page-table walk starts.
  * @pgd:   Pointer to the first top-level entry of the page-table.
  * @mm_ops:Memory management callbacks.
+ * @lpa2_ena:  Format used for page-table; false->classic, true->lpa2.
  * @mmu:   Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
  * @flags: Stage-2 page-table flags.
  * @force_pte_cb:  Function that returns true if page level mappings must
@@ -167,6 +168,7 @@ struct kvm_pgtable {
u32 start_level;
kvm_pte_t   *pgd;
struct kvm_pgtable_mm_ops   *mm_ops;
+   bool    lpa2_ena;
 
/* Stage-2 only */
struct kvm_s2_mmu   *mmu;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 221e0dafb149..c7799cd50af8 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -530,6 +530,7 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
pgt->ia_bits= va_bits;
pgt->start_level= KVM_PGTABLE_MAX_LEVELS - levels;
pgt->mm_ops = mm_ops;
+   pgt->lpa2_ena   = false;
pgt->mmu= NULL;
pgt->force_pte_cb   = NULL;
 
@@ -1190,6 +1191,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
pgt->ia_bits= ia_bits;
pgt->start_level= start_level;
pgt->mm_ops = mm_ops;
+   pgt->lpa2_ena   = false;
pgt->mmu= mmu;
pgt->flags  = flags;
pgt->force_pte_cb   = force_pte_cb;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1ef0704420d9..e3fe3e194fd1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -645,6 +645,7 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
.start_level= (KVM_PGTABLE_MAX_LEVELS -
   CONFIG_PGTABLE_LEVELS),
.mm_ops = &kvm_user_mm_ops,
+   .lpa2_ena   = lpa2_is_enabled(),
};
kvm_pte_t pte = 0;  /* Keep GCC quiet... */
u32 level = ~0;
-- 
2.25.1



[PATCH v1 06/12] KVM: arm64: Use LPA2 page-tables for stage2 if HW supports it

2022-12-06 Thread Ryan Roberts
Implement a simple policy whereby if the HW supports FEAT_LPA2 for the
page size we are using, always use LPA2-style page-tables for stage 2,
regardless of the VMM-requested IPA size or HW-implemented PA size. When
in use we can now support up to 52-bit IPA and PA sizes.

We use the preparatory work that tracks the page-table format in struct
kvm_pgtable and passes the pgt pointer to all kvm_pgtable functions that
need to modify their behavior based on the format.

Note that FEAT_LPA2 brings support for bigger block mappings (512GB with
4KB, 64GB with 16KB). We explicitly don't enable these in the library
because stage2_apply_range() works on batch sizes of the largest used
block mapping, and increasing the size of the batch would lead to soft
lockups. See commit 5994bc9e05c2 ("KVM: arm64: Limit
stage2_apply_range() batch size to largest block").
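
For reference, here is a standalone sketch (my own illustration, not the patch
code) of the lpa2 output-address packing the new kvm_pte_to_phys() handles:
bits [49:PAGE_SHIFT] of the PA sit in place in the PTE, and bits [51:50] move
to PTE bits [9:8]. The round trip below assumes 4KB pages.

/* Standalone demo of the lpa2 OA packing described above; not kernel code. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12	/* assume 4KB pages */
#define GENMASK64(h, l)	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

static uint64_t phys_to_pte_lpa2(uint64_t pa)
{
	uint64_t pte = pa & GENMASK64(49, PAGE_SHIFT);	/* bits 49:12 in place */

	pte |= ((pa >> 50) & 0x3) << 8;			/* bits 51:50 -> pte[9:8] */
	return pte;
}

static uint64_t pte_to_phys_lpa2(uint64_t pte)
{
	uint64_t pa = pte & GENMASK64(49, PAGE_SHIFT);

	pa |= ((pte >> 8) & 0x3) << 50;
	return pa;
}

int main(void)
{
	/* a page-aligned 52-bit PA that exercises bits 51:50 */
	uint64_t pa = (3ULL << 50) | (0x123456ULL << PAGE_SHIFT);

	assert(pte_to_phys_lpa2(phys_to_pte_lpa2(pa)) == pa);
	printf("pa 0x%llx -> pte 0x%llx\n", (unsigned long long)pa,
	       (unsigned long long)phys_to_pte_lpa2(pa));
	return 0;
}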

Signed-off-by: Ryan Roberts 
---
 arch/arm64/include/asm/kvm_pgtable.h  | 42 -
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 12 +++
 arch/arm64/kvm/hyp/pgtable.c  | 45 ++-
 3 files changed, 78 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 744e224d964b..a7fd547dcc71 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -25,12 +25,32 @@
#define KVM_PGTABLE_MIN_BLOCK_LEVEL    2U
 #endif
 
-static inline u64 kvm_get_parange(u64 mmfr0)
+static inline bool kvm_supports_stage2_lpa2(u64 mmfr0)
 {
+   unsigned int tgran;
+
+   tgran = cpuid_feature_extract_unsigned_field(mmfr0,
+   ID_AA64MMFR0_EL1_TGRAN_2_SHIFT);
+   return (tgran == ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2 &&
+   PAGE_SIZE != SZ_64K);
+}
+
+static inline u64 kvm_get_parange_max(bool lpa2_ena)
+{
+   if (lpa2_ena ||
+  (IS_ENABLED(CONFIG_ARM64_PA_BITS_52) && PAGE_SIZE == SZ_64K))
+   return ID_AA64MMFR0_EL1_PARANGE_52;
+   else
+   return ID_AA64MMFR0_EL1_PARANGE_48;
+}
+
+static inline u64 kvm_get_parange(u64 mmfr0, bool lpa2_ena)
+{
+   u64 parange_max = kvm_get_parange_max(lpa2_ena);
u64 parange = cpuid_feature_extract_unsigned_field(mmfr0,
ID_AA64MMFR0_EL1_PARANGE_SHIFT);
-   if (parange > ID_AA64MMFR0_EL1_PARANGE_MAX)
-   parange = ID_AA64MMFR0_EL1_PARANGE_MAX;
+   if (parange > parange_max)
+   parange = parange_max;
 
return parange;
 }
@@ -41,6 +61,8 @@ typedef u64 kvm_pte_t;
 
 #define KVM_PTE_ADDR_MASK  GENMASK(47, PAGE_SHIFT)
 #define KVM_PTE_ADDR_51_48 GENMASK(15, 12)
+#define KVM_PTE_ADDR_MASK_LPA2 GENMASK(49, PAGE_SHIFT)
+#define KVM_PTE_ADDR_51_50_LPA2    GENMASK(9, 8)
 
 static inline bool kvm_pte_valid(kvm_pte_t pte)
 {
@@ -178,10 +200,16 @@ struct kvm_pgtable {
 
 static inline u64 kvm_pte_to_phys(struct kvm_pgtable *pgt, kvm_pte_t pte)
 {
-   u64 pa = pte & KVM_PTE_ADDR_MASK;
+   u64 pa;
 
-   if (PAGE_SHIFT == 16)
-   pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
+   if (pgt->lpa2_ena) {
+   pa = pte & KVM_PTE_ADDR_MASK_LPA2;
+   pa |= FIELD_GET(KVM_PTE_ADDR_51_50_LPA2, pte) << 50;
+   } else {
+   pa = pte & KVM_PTE_ADDR_MASK;
+   if (PAGE_SHIFT == 16)
+   pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
+   }
 
return pa;
 }
@@ -287,7 +315,7 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
  * kvm_get_vtcr() - Helper to construct VTCR_EL2
  * @mmfr0: Sanitized value of SYS_ID_AA64MMFR0_EL1 register.
  * @mmfr1: Sanitized value of SYS_ID_AA64MMFR1_EL1 register.
- * @phys_shfit:Value to set in VTCR_EL2.T0SZ.
+ * @phys_shift:Value to set in VTCR_EL2.T0SZ, or 0 to infer from parange.
  *
  * The VTCR value is common across all the physical CPUs on the system.
  * We use system wide sanitised values to fill in different fields,
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 6bf54c8daffa..43e729694deb 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -105,14 +105,12 @@ static int prepare_s2_pool(void *pgt_pool_base)
 
 static void prepare_host_vtcr(void)
 {
-   u32 parange, phys_shift;
-
-   /* The host stage 2 is id-mapped, so use parange for T0SZ */
-   parange = kvm_get_parange(id_aa64mmfr0_el1_sys_val);
-   phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
-
+   /*
+* The host stage 2 is id-mapped; passing phys_shift=0 forces parange to
+* be used for T0SZ.
+*/
host_kvm.arch.vtcr = kvm_get_vtcr(id_aa64mmfr0_el1_sys_val,
-

[PATCH v1 07/12] KVM: arm64: Use LPA2 page-tables for hyp stage1 if HW supports it

2022-12-06 Thread Ryan Roberts
Implement a simple policy whereby if the HW supports FEAT_LPA2 for the
page size we are using, always use LPA2-style page-tables for hyp stage
1, regardless of the IPA or PA size requirements. When in use we can now
support up to 52-bit IPA and PA sizes.

For the protected kvm case, the host creates the initial page-tables
using either the lpa2 or `classic` format as determined by what's
reported in mmfr0, and also sets the TCR_EL2.DS bit in the params
structure. The hypervisor then looks at this DS bit to determine the
format that it should use to re-create the page-tables.
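
In other words (hedged sketch of the idea, mine rather than a quote of the
patch, whose relevant hunk is truncated below), the hyp side only needs to look
at the DS bit the host placed in params->tcr_el2, along these lines:

/* Minimal illustration of deriving the hyp pgtable format from TCR_EL2.DS. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TCR_EL2_DS	(1ULL << 32)

struct kvm_nvhe_init_params { uint64_t tcr_el2; };

static bool hyp_uses_lpa2(const struct kvm_nvhe_init_params *params)
{
	/* The host sets DS iff it built the hyp stage 1 in the lpa2 format */
	return params->tcr_el2 & TCR_EL2_DS;
}

int main(void)
{
	struct kvm_nvhe_init_params params = { .tcr_el2 = TCR_EL2_DS };

	printf("lpa2: %d\n", hyp_uses_lpa2(&params));
	return 0;
}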

Signed-off-by: Ryan Roberts 
---
 arch/arm64/include/asm/kvm_pgtable.h | 18 +-
 arch/arm64/kvm/arm.c |  2 ++
 arch/arm64/kvm/hyp/nvhe/setup.c  | 18 +-
 arch/arm64/kvm/hyp/pgtable.c |  7 ---
 arch/arm64/kvm/mmu.c |  3 ++-
 5 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index a7fd547dcc71..d6f4dcdd00fd 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -25,6 +25,21 @@
#define KVM_PGTABLE_MIN_BLOCK_LEVEL    2U
 #endif
 
+static inline bool kvm_supports_hyp_lpa2(void)
+{
+#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
+   u64 mmfr0;
+   unsigned int tgran;
+
+   mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+   tgran = cpuid_feature_extract_unsigned_field(mmfr0,
+   ID_AA64MMFR0_EL1_TGRAN_SHIFT);
+   return (tgran == ID_AA64MMFR0_EL1_TGRAN_LPA2);
+#else
+   return false;
+#endif
+}
+
 static inline bool kvm_supports_stage2_lpa2(u64 mmfr0)
 {
unsigned int tgran;
@@ -253,11 +268,12 @@ struct kvm_pgtable_walker {
  * @pgt:   Uninitialised page-table structure to initialise.
  * @va_bits:   Maximum virtual address bits.
  * @mm_ops:Memory management callbacks.
+ * @lpa2_ena:  Whether to use the lpa2 page-table format.
  *
  * Return: 0 on success, negative error code on failure.
  */
 int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
-struct kvm_pgtable_mm_ops *mm_ops);
+struct kvm_pgtable_mm_ops *mm_ops, bool lpa2_ena);
 
 /**
  * kvm_pgtable_hyp_destroy() - Destroy an unused hypervisor stage-1 page-table.
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 803055da3ee3..a234c6252c3c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1537,6 +1537,8 @@ static void cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
tcr = (read_sysreg(tcr_el1) & TCR_EL2_MASK) | TCR_EL2_RES1;
tcr &= ~TCR_T0SZ_MASK;
tcr |= TCR_T0SZ(hyp_va_bits);
+   if (kvm_supports_hyp_lpa2())
+   tcr |= TCR_EL2_DS;
params->tcr_el2 = tcr;
 
params->pgd_pa = kvm_mmu_get_httbr();
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 60a6821ae98a..b44e87b9d168 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -56,7 +56,7 @@ static int divide_memory_pool(void *virt, unsigned long size)
 
 static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 unsigned long *per_cpu_base,
-u32 hyp_va_bits)
+u32 hyp_va_bits, bool lpa2_ena)
 {
void *start, *end, *virt = hyp_phys_to_virt(phys);
unsigned long pgt_size = hyp_s1_pgtable_pages() << PAGE_SHIFT;
@@ -66,7 +66,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
/* Recreate the hyp page-table using the early page allocator */
hyp_early_alloc_init(hyp_pgt_base, pgt_size);
ret = kvm_pgtable_hyp_init(&pkvm_pgtable, hyp_va_bits,
-  &hyp_early_alloc_mm_ops);
+  &hyp_early_alloc_mm_ops, lpa2_ena);
if (ret)
return ret;
 
@@ -304,10 +304,11 @@ void __noreturn __pkvm_init_finalise(void)
 int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus,
unsigned long *per_cpu_base, u32 hyp_va_bits)
 {
-   struct kvm_nvhe_init_params *params;
+   struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
void *virt = hyp_phys_to_virt(phys);
void (*fn)(phys_addr_t params_pa, void *finalize_fn_va);
int ret;
+   bool lpa2_ena;
 
BUG_ON(kvm_check_pvm_sysreg_table());
 
@@ -321,14 +322,21 @@ int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus,
if (ret)
return ret;
 
-   ret = recreate_hyp_mappings(phys, size, per_cpu_base, hyp_va_bits);
+   /*
+* The host has already done the hard work to figure out if LPA2 is
+* supported at stage 1 and passed the info in the DS bit of the
+* TCR. Extract and pass

[PATCH v1 01/12] arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]

2022-12-06 Thread Ryan Roberts
From: Anshuman Khandual 

PAGE_SIZE support is tested against possible minimum and maximum values for
its respective ID_AA64MMFR0.TGRAN field, depending on whether it is signed
or unsigned. But then FEAT_LPA2 implementation needs to be validated for 4K
and 16K page sizes via feature specific ID_AA64MMFR0.TGRAN values. Hence it
adds FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2] values per ARM ARM (0487G.A).

Acked-by: Catalin Marinas 
Signed-off-by: Anshuman Khandual 
Signed-off-by: Ryan Roberts 
---
 arch/arm64/include/asm/sysreg.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 7d301700d1a9..9ad8172eea58 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -673,10 +673,12 @@
 
 /* id_aa64mmfr0 */
 #define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN  0x0
+#define ID_AA64MMFR0_EL1_TGRAN4_LPA2   ID_AA64MMFR0_EL1_TGRAN4_52_BIT
 #define ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX  0x7
 #define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MIN 0x0
 #define ID_AA64MMFR0_EL1_TGRAN64_SUPPORTED_MAX 0x7
 #define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN 0x1
+#define ID_AA64MMFR0_EL1_TGRAN16_LPA2  ID_AA64MMFR0_EL1_TGRAN16_52_BIT
 #define ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX 0xf
 
 #define ARM64_MIN_PARANGE_BITS 32
@@ -684,6 +686,7 @@
 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_DEFAULT 0x0
 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_NONE0x1
 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MIN 0x2
+#define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_LPA2    0x3
 #define ID_AA64MMFR0_EL1_TGRAN_2_SUPPORTED_MAX 0x7
 
 #ifdef CONFIG_ARM64_PA_BITS_52
@@ -800,11 +803,13 @@
 
 #if defined(CONFIG_ARM64_4K_PAGES)
 #define ID_AA64MMFR0_EL1_TGRAN_SHIFT   ID_AA64MMFR0_EL1_TGRAN4_SHIFT
+#define ID_AA64MMFR0_EL1_TGRAN_LPA2    ID_AA64MMFR0_EL1_TGRAN4_52_BIT
#define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN   ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MIN
#define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX   ID_AA64MMFR0_EL1_TGRAN4_SUPPORTED_MAX
 #define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT ID_AA64MMFR0_EL1_TGRAN4_2_SHIFT
 #elif defined(CONFIG_ARM64_16K_PAGES)
 #define ID_AA64MMFR0_EL1_TGRAN_SHIFT   ID_AA64MMFR0_EL1_TGRAN16_SHIFT
+#define ID_AA64MMFR0_EL1_TGRAN_LPA2    ID_AA64MMFR0_EL1_TGRAN16_52_BIT
#define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MIN   ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MIN
#define ID_AA64MMFR0_EL1_TGRAN_SUPPORTED_MAX   ID_AA64MMFR0_EL1_TGRAN16_SUPPORTED_MAX
 #define ID_AA64MMFR0_EL1_TGRAN_2_SHIFT ID_AA64MMFR0_EL1_TGRAN16_2_SHIFT
-- 
2.25.1



[PATCH v1 03/12] KVM: arm64: Add new (V)TCR_EL2 field definitions for FEAT_LPA2

2022-12-06 Thread Ryan Roberts
As per Arm ARM (0487I.a), (V)TCR_EL2.DS fields control whether 52 bit
input and output addresses are supported on 4K and 16K page size
configurations when FEAT_LPA2 is known to have been implemented.
Additionally, the VTCR_EL2.SL2 field is added to enable encoding of a 5th
starting level of translation, which is required with 4KB pages and an IPA
size of 49-52 bits if concatenated first level page tables are not used.

This adds these field definitions which will be used by KVM when
FEAT_LPA2 is enabled.

Signed-off-by: Ryan Roberts 
---
 arch/arm64/include/asm/kvm_arm.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index a82f2493a72b..f9619a10d5d9 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -92,6 +92,7 @@
 #define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
 
 /* TCR_EL2 Registers bits */
+#define TCR_EL2_DS (1UL << 32)
 #define TCR_EL2_RES1   ((1U << 31) | (1 << 23))
#define TCR_EL2_TBI    (1 << 20)
 #define TCR_EL2_PS_SHIFT   16
@@ -106,6 +107,9 @@
 TCR_EL2_ORGN0_MASK | TCR_EL2_IRGN0_MASK | 
TCR_EL2_T0SZ_MASK)
 
 /* VTCR_EL2 Registers bits */
+#define VTCR_EL2_SL2_SHIFT 33
+#define VTCR_EL2_SL2_MASK  (1UL << VTCR_EL2_SL2_SHIFT)
+#define VTCR_EL2_DS    TCR_EL2_DS
#define VTCR_EL2_RES1  (1U << 31)
#define VTCR_EL2_HD    (1 << 22)
#define VTCR_EL2_HA    (1 << 21)
-- 
2.25.1



[PATCH v1 02/12] arm64/mm: Update tlb invalidation routines for FEAT_LPA2

2022-12-06 Thread Ryan Roberts
FEAT_LPA2 impacts tlb invalidation in 2 ways; Firstly, the TTL field in
the non-range tlbi instructions can now validly take a 0 value for the
4KB granule (this is due to the extra level of translation). Secondly,
the BADDR field in the range tlbi instructions must be aligned to 64KB
when LPA2 is in use (TCR.DS=1). Changes are required for tlbi to
continue to operate correctly when LPA2 is in use.

We solve the first by always adding the level hint if the level is
between [0, 3] (previously anything other than 0 was hinted, which
breaks in the new level -1 case from kvm). When running on non-LPA2 HW,
0 is still safe to hint as the HW will fall back to non-hinted. We also
update kernel code to take advantage of the new hint for p4d flushing.
While we are at it, we replace the notion of 0 being the non-hinted
sentinel with a macro, TLBI_TTL_UNKNOWN. This means callers won't need
updating if/when translation depth increases in future.

The second problem is trickier. When LPA2 is in use, we need to use the
non-range tlbi instructions to forward align to a 64KB boundary first,
then we can use range-based tlbi from there on, until we have either
invalidated all pages or we have a single page remaining. If the latter,
that is done with non-range tlbi. (Previously we invalidated a single
odd page first, but we can no longer do this because it could wreck our
64KB alignment). When LPA2 is not in use, we don't need the initial
alignment step. However, the bigger impact is that we can no longer use
the previous method of iterating from smallest to largest 'scale', since
this would likely unalign the boundary again for the LPA2 case. So
instead we iterate from highest to lowest scale, which guarantees that
we remain 64KB aligned until the last op (at scale=0).

The original commit (d1d3aa9 "arm64: tlb: Use the TLBI RANGE feature in
arm64") stated this as the reason for incrementing scale:

  However, in most scenarios, the pages = 1 when flush_tlb_range() is
  called. Start from scale = 3 or other proper value (such as scale
  =ilog2(pages)), will incur extra overhead. So increase 'scale' from 0
  to maximum, the flush order is exactly opposite to the example.

But pages=1 is already special cased by the non-range invalidation path,
which will take care of it the first time through the loop (both in the
original commit and in my change), so I don't think switching to
decrement scale should have any extra performance impact after all.

Signed-off-by: Ryan Roberts 
---
 arch/arm64/include/asm/pgtable-prot.h |  6 ++
 arch/arm64/include/asm/tlb.h  | 15 +++--
 arch/arm64/include/asm/tlbflush.h | 83 +--
 3 files changed, 69 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 9b165117a454..308cc02fcdf3 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -40,6 +40,12 @@ extern bool arm64_use_ng_mappings;
 #define PTE_MAYBE_NG   (arm64_use_ng_mappings ? PTE_NG : 0)
 #define PMD_MAYBE_NG   (arm64_use_ng_mappings ? PMD_SECT_NG : 0)
 
+/*
+ * For now the kernel never uses lpa2 for its stage1 tables. But kvm does and
+ * this hook allows us to update the common tlbi code to handle lpa2.
+ */
+#define lpa2_is_enabled()  false
+
 /*
  * If we have userspace only BTI we don't want to mark kernel pages
  * guarded even if the system does support BTI.
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index c995d1f4594f..3a189c435973 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -22,15 +22,15 @@ static void tlb_flush(struct mmu_gather *tlb);
 #include 
 
 /*
- * get the tlbi levels in arm64.  Default value is 0 if more than one
- * of cleared_* is set or neither is set.
- * Arm64 doesn't support p4ds now.
+ * get the tlbi levels in arm64.  Default value is TLBI_TTL_UNKNOWN if more than
+ * one of cleared_* is set or neither is set - this elides the level hinting to
+ * the hardware.
  */
 static inline int tlb_get_level(struct mmu_gather *tlb)
 {
/* The TTL field is only valid for the leaf entry. */
if (tlb->freed_tables)
-   return 0;
+   return TLBI_TTL_UNKNOWN;
 
if (tlb->cleared_ptes && !(tlb->cleared_pmds ||
   tlb->cleared_puds ||
@@ -47,7 +47,12 @@ static inline int tlb_get_level(struct mmu_gather *tlb)
   tlb->cleared_p4ds))
return 1;
 
-   return 0;
+   if (tlb->cleared_p4ds && !(tlb->cleared_ptes ||
+  tlb->cleared_pmds ||
+  tlb->cleared_puds))
+   return 0;
+
+   return TLBI_TTL_UNKNOWN;
 }
 
 static inline void tlb_flush(struct mmu_gather *tlb)
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h

[PATCH v1 00/12] KVM: arm64: Support FEAT_LPA2 at hyp s1 and vm s2

2022-12-06 Thread Ryan Roberts
| lpa2  |  52   |   nvhe    |  16k   |   48    |   48    |  low   |   low   | True  |
| lpa2  |  52   |   nvhe    |  16k   |   48    |   48    |  low   |  high   | True  |
| lpa2  |  52   |   nvhe    |  16k   |   52    |   52    |  low   |   low   | True  |
| lpa2  |  52   |   nvhe    |  16k   |   52    |   52    |  low   |  high   | True  |
| lpa2  |  52   |   nvhe    |  16k   |   52    |   52    |  high  |   low   | False |
| lpa2  |  52   |   nvhe    |  16k   |   52    |   52    |  high  |  high   | False |
| lpa2  |  52   |   nvhe    |  64k   |   48    |   48    |  low   |   low   | True  |
| lpa2  |  52   |   nvhe    |  64k   |   48    |   48    |  low   |  high   | True  |
| lpa2  |  52   |   nvhe    |  64k   |   48    |   52    |  low   |   low   | True  |
| lpa2  |  52   |   nvhe    |  64k   |   48    |   52    |  low   |  high   | True  |
| lpa2  |  52   |   nvhe    |  64k   |   52    |   48    |  low   |   low   | True  |
| lpa2  |  52   |   nvhe    |  64k   |   52    |   48    |  low   |  high   | True  |
| lpa2  |  52   |   nvhe    |  64k   |   52    |   52    |  low   |   low   | True  |
| lpa2  |  52   |   nvhe    |  64k   |   52    |   52    |  low   |  high   | True  |
| lpa2  |  52   |   nvhe    |  64k   |   52    |   52    |  high  |   low   | True  |
| lpa2  |  52   |   nvhe    |  64k   |   52    |   52    |  high  |  high   | True  |
| lpa2  |  52   | protected |   4k   |   48    |   48    |  low   |   low   | True  |
| lpa2  |  52   | protected |   4k   |   48    |   48    |  low   |  high   | True  |
| lpa2  |  52   | protected |   4k   |   52    |   52    |  low   |   low   | True  |
| lpa2  |  52   | protected |   4k   |   52    |   52    |  low   |  high   | True  |
| lpa2  |  52   | protected |   4k   |   52    |   52    |  high  |   low   | False |
| lpa2  |  52   | protected |   4k   |   52    |   52    |  high  |  high   | False |
| lpa2  |  52   | protected |  16k   |   48    |   48    |  low   |   low   | True  |
| lpa2  |  52   | protected |  16k   |   48    |   48    |  low   |  high   | True  |
| lpa2  |  52   | protected |  16k   |   52    |   52    |  low   |   low   | True  |
| lpa2  |  52   | protected |  16k   |   52    |   52    |  low   |  high   | True  |
| lpa2  |  52   | protected |  16k   |   52    |   52    |  high  |   low   | False |
| lpa2  |  52   | protected |  16k   |   52    |   52    |  high  |  high   | False |
| lpa2  |  52   | protected |  64k   |   48    |   48    |  low   |   low   | True  |
| lpa2  |  52   | protected |  64k   |   48    |   48    |  low   |  high   | True  |
| lpa2  |  52   | protected |  64k   |   48    |   52    |  low   |   low   | True  |
| lpa2  |  52   | protected |  64k   |   48    |   52    |  low   |  high   | True  |
| lpa2  |  52   | protected |  64k   |   52    |   48    |  low   |   low   | True  |
| lpa2  |  52   | protected |  64k   |   52    |   48    |  low   |  high   | True  |
| lpa2  |  52   | protected |  64k   |   52    |   52    |  low   |   low   | True  |
| lpa2  |  52   | protected |  64k   |   52    |   52    |  low   |  high   | True  |
| lpa2  |  52   | protected |  64k   |   52    |   52    |  high  |   low   | True  |
| lpa2  |  52   | protected |  64k   |   52    |   52    |  high  |  high   | True  |
+-------+-------+-----------+--------+---------+---------+--------+---------+-------+

[1] https://lore.kernel.org/linux-arm-kernel/20221124123932.2648991-1-a...@kernel.org
[2] https://lore.kernel.org/kvmarm/20221027120945.29679-1-ryan.robe...@arm.com
[3] https://lore.kernel.org/kvmarm/20221103150507.32948-1-ryan.robe...@arm.com
[4] https://lore.kernel.org/kvmarm/20221205114031.3972780-1-ryan.robe...@arm.com
[5] https://gitlab.arm.com/linux-arm/linux-rr/-/tree/features/lpa2/kvm_lkml-v1
[6] https://gitlab.arm.com/linux-arm/linux-rr/-/tree/features/lpa2/ardb_arm64-4k-lpa2_plus_kvm_2022-12-01

Thanks,
Ryan


Anshuman Khandual (1):
  arm64/mm: Add FEAT_LPA2 specific ID_AA64MMFR0.TGRAN[2]

Ryan Roberts (11):
  arm64/mm: Update tlb invalidation routines for FEAT_LPA2
  KVM: arm64

Re: [PATCH v1] KVM: arm64: Fix benign bug with incorrect use of VA_BITS.

2022-12-05 Thread Ryan Roberts
On 05/12/2022 13:49, Marc Zyngier wrote:
> Hi Ryan,
> 
> Thanks for that.
> 
> On Mon, 05 Dec 2022 11:40:31 +,
> Ryan Roberts  wrote:
>>
>> get_user_mapping_size() uses kvm's pgtable library to walk a user space
>> page table created by the kernel, and in doing so, fakes up the metadata
>> that the library needs, including ia_bits, which defines the size of the
>> input address.
> 
> It isn't supposed to "fake" anything. It simply provides the
> information that the walker needs to correctly parse the page tables.

Apologies - poor choice of words.

> 
>>
>> For the case where the kernel is compiled for 52 VA bits but runs on HW
>> that does not support LVA, it will fall back to 48 VA bits at runtime.
>> Therefore we must use vabits_actual rather than VA_BITS to get the true
>> address size.
>>
>> This is benign in the current code base because the pgtable library only
>> uses it for error checking.
>>
>> Fixes: 6011cf68c885 ("KVM: arm64: Walk userspace page tables to compute
>> the THP mapping size")
> 
> nit: this should appear on a single line, without a line-break in the
> middle [1]...>
>>
> 
> ... without a blank line between Fixes: and the rest of the tags.

Ahh, thanks for the pointer. I'll admit that checkpatch did raise this but I
assumed it was a false positive, because I assumed the 75 chars per line rule
would override it.

> 
> And while I'm on the "trivial remarks" train, drop the full stop at
> the end of the subject line.

Yep, will do.

> 
>> Signed-off-by: Ryan Roberts 
>> ---
>>  arch/arm64/kvm/mmu.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 4efb983cff43..1ef0704420d9 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -641,7 +641,7 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
>>  {
>>  struct kvm_pgtable pgt = {
>>  .pgd= (kvm_pte_t *)kvm->mm->pgd,
>> -.ia_bits= VA_BITS,
>> +.ia_bits= vabits_actual,
>>  .start_level= (KVM_PGTABLE_MAX_LEVELS -
>> CONFIG_PGTABLE_LEVELS),
>> .mm_ops = &kvm_user_mm_ops,
>> --
>> 2.25.1
>>
>>
> 
> Other than the above nits, this is well spotted. I need to regenerate
> the kvmarm/next branch after the sysreg attack from James, so I'll try
> and fold that in.

Sounds like you are happy to tend to the nits and don't need me to repost?

> 
> Thanks,
> 
>   M.
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst#n139
> 



[PATCH v1] KVM: arm64: Fix benign bug with incorrect use of VA_BITS.

2022-12-05 Thread Ryan Roberts
get_user_mapping_size() uses kvm's pgtable library to walk a user space
page table created by the kernel, and in doing so, fakes up the metadata
that the library needs, including ia_bits, which defines the size of the
input address.

For the case where the kernel is compiled for 52 VA bits but runs on HW
that does not support LVA, it will fall back to 48 VA bits at runtime.
Therefore we must use vabits_actual rather than VA_BITS to get the true
address size.
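
(Aside, not part of the patch: a minimal sketch of why the two values can
differ; the FEAT_LVA condition is inferred from the commit message.)

	/* Sketch: with CONFIG_ARM64_VA_BITS=52 on HW lacking FEAT_LVA,
	 * VA_BITS stays 52 at build time but vabits_actual reads 48 at
	 * runtime, so the walker's input-address limit must come from
	 * the latter:
	 */
	u64 ia_limit = BIT(vabits_actual);   /* 1 << 48 here, not 1 << 52 */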

This is benign in the current code base because the pgtable library only
uses it for error checking.

Fixes: 6011cf68c885 ("KVM: arm64: Walk userspace page tables to compute
the THP mapping size")

Signed-off-by: Ryan Roberts 
---
 arch/arm64/kvm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4efb983cff43..1ef0704420d9 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -641,7 +641,7 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 {
struct kvm_pgtable pgt = {
.pgd= (kvm_pte_t *)kvm->mm->pgd,
-   .ia_bits= VA_BITS,
+   .ia_bits= vabits_actual,
.start_level= (KVM_PGTABLE_MAX_LEVELS -
   CONFIG_PGTABLE_LEVELS),
.mm_ops = &kvm_user_mm_ops,
--
2.25.1



[PATCH v1 1/2] KVM: arm64: Fix kvm init failure when mode!=vhe and VA_BITS=52.

2022-11-03 Thread Ryan Roberts
For nvhe and protected modes, the hyp stage 1 page-table was previously
configured to have the same number of VA bits as the kernel's idmap.
However, for kernel configs with VA_BITS=52 and where the kernel is
loaded in physical memory below 48 bits, the idmap VA bits is actually
smaller than the kernel's normal stage 1 VA bits. This can lead to
kernel addresses that can't be mapped into the hypervisor, leading to
kvm initialization failure during boot:

  kvm [1]: IPA Size Limit: 48 bits
  kvm [1]: Cannot map world-switch code
  kvm [1]: error initializing Hyp mode: -34

Fix this by ensuring that the hyp stage 1 VA size is the maximum of
what's used for the idmap and the regular kernel stage 1. At the same
time, refactor the code so that the hyp VA bits is only calculated in
one place.

Fixes: 7ba8f2b2d652 ("arm64: mm: use a 48-bit ID map when possible on
52-bit VA builds")

Prior to the above mentioned change, the idmap was always 52 bits for a
52 VA bits kernel and therefore the hyp stage1 was also always 52 bits.

Signed-off-by: Ryan Roberts 
---
 arch/arm64/kvm/arm.c | 20 +++-
 arch/arm64/kvm/mmu.c | 28 +++-
 2 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 94d33e296e10..803055da3ee3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1518,7 +1518,7 @@ static int kvm_init_vector_slots(void)
return 0;
 }
 
-static void cpu_prepare_hyp_mode(int cpu)
+static void cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
 {
struct kvm_nvhe_init_params *params = 
per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
unsigned long tcr;
@@ -1534,23 +1534,9 @@ static void cpu_prepare_hyp_mode(int cpu)
 
params->mair_el2 = read_sysreg(mair_el1);
 
-   /*
-* The ID map may be configured to use an extended virtual address
-* range. This is only the case if system RAM is out of range for the
-* currently configured page size and VA_BITS, in which case we will
-* also need the extended virtual range for the HYP ID map, or we won't
-* be able to enable the EL2 MMU.
-*
-* However, at EL2, there is only one TTBR register, and we can't switch
-* between translation tables *and* update TCR_EL2.T0SZ at the same
-* time. Bottom line: we need to use the extended range with *both* our
-* translation tables.
-*
-* So use the same T0SZ value we use for the ID map.
-*/
tcr = (read_sysreg(tcr_el1) & TCR_EL2_MASK) | TCR_EL2_RES1;
tcr &= ~TCR_T0SZ_MASK;
-   tcr |= (idmap_t0sz & GENMASK(TCR_TxSZ_WIDTH - 1, 0)) << TCR_T0SZ_OFFSET;
+   tcr |= TCR_T0SZ(hyp_va_bits);
params->tcr_el2 = tcr;
 
params->pgd_pa = kvm_mmu_get_httbr();
@@ -2054,7 +2040,7 @@ static int init_hyp_mode(void)
}
 
/* Prepare the CPU initialization parameters */
-   cpu_prepare_hyp_mode(cpu);
+   cpu_prepare_hyp_mode(cpu, hyp_va_bits);
}
 
if (is_protected_kvm_enabled()) {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 60ee3d9f01f8..4efb983cff43 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1618,6 +1618,8 @@ static struct kvm_pgtable_mm_ops kvm_hyp_mm_ops = {
 int kvm_mmu_init(u32 *hyp_va_bits)
 {
int err;
+   u32 idmap_bits;
+   u32 kernel_bits;
 
hyp_idmap_start = __pa_symbol(__hyp_idmap_text_start);
hyp_idmap_start = ALIGN_DOWN(hyp_idmap_start, PAGE_SIZE);
@@ -1631,7 +1633,31 @@ int kvm_mmu_init(u32 *hyp_va_bits)
 */
BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
 
-   *hyp_va_bits = 64 - ((idmap_t0sz & TCR_T0SZ_MASK) >> TCR_T0SZ_OFFSET);
+   /*
+* The ID map may be configured to use an extended virtual address
+* range. This is only the case if system RAM is out of range for the
+* currently configured page size and VA_BITS_MIN, in which case we will
+* also need the extended virtual range for the HYP ID map, or we won't
+* be able to enable the EL2 MMU.
+*
+* However, in some cases the ID map may be configured for fewer than
+* the number of VA bits used by the regular kernel stage 1. This
+* happens when VA_BITS=52 and the kernel image is placed in PA space
+* below 48 bits.
+*
+* At EL2, there is only one TTBR register, and we can't switch between
+* translation tables *and* update TCR_EL2.T0SZ at the same time. Bottom
+* line: we need to use the extended range with *both* our translation
+* tables.
+*
+* So use the maximum of the idmap VA bits and the regular kernel stage
+* 1 VA bits to assure that the hypervisor can both ID map its code page
+* and map any kernel memory.
+*/
+   idmap_bits = 64
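
(The archived hunk is truncated above; a minimal sketch of the computation the
commit message describes, assuming the max() form and the use of vabits_actual
for the kernel's regular stage 1 size, would be:)

	idmap_bits = 64 - ((idmap_t0sz & TCR_T0SZ_MASK) >> TCR_T0SZ_OFFSET);
	kernel_bits = vabits_actual;
	*hyp_va_bits = max(idmap_bits, kernel_bits);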

[PATCH v1 2/2] KVM: arm64: Fix PAR_TO_HPFAR() to work independently of PA_BITS.

2022-11-03 Thread Ryan Roberts
Kernel configs with PAGE_SIZE=64KB and PA_BITS=48 still advertise 52 bit
IPA space on HW that implements LPA. This is by design. (Admittedly this
is a very unlikely configuration in the real world).

However on such a config, attempting to create a vm with the guest
kernel placed above 48 bits in IPA space results in misbehaviour due to
the hypervisor incorrectly interpreting a faulting IPA.

Fix up PAR_TO_HPFAR() to always take 52 bits out of the PAR rather than
masking to CONFIG_ARM64_PA_BITS. If the system has a smaller implemented
PARange this should be safe because the bits are res0.
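
(Worked example, illustrative only: GENMASK_ULL(52 - 1, 12) keeps PAR[51:12]
and the >> 8 moves FIPA[51:12] down into HPFAR[43:4].)

	u64 par   = 0x0000000ffffff000ULL;             /* FIPA bits [35:12] set    */
	u64 hpfar = (par & GENMASK_ULL(51, 12)) >> 8;  /* == 0xffffff0, bits [27:4] */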

A more robust approach would be to discover the IPA size in use by the
page-table and mask based on that, to avoid relying on res0 reading back
as zero. But this information is difficult to access safely from the
code's location, so take the easy way out.

Fixes: bc1d7de8c550 ("kvm: arm64: Add 52bit support for PAR to HPFAR
conversoin")

Signed-off-by: Ryan Roberts 
---
 arch/arm64/include/asm/kvm_arm.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 8aa8492dafc0..a82f2493a72b 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -340,9 +340,13 @@
  * We have
  * PAR [PA_Shift - 1   : 12] = PA  [PA_Shift - 1 : 12]
  * HPFAR   [PA_Shift - 9   : 4]  = FIPA[PA_Shift - 1 : 12]
+ *
+ * Always assume 52 bit PA since at this point, we don't know how many PA bits
+ * the page table has been set up for. This should be safe since unused address
+ * bits in PAR are res0.
  */
 #define PAR_TO_HPFAR(par)  \
-   (((par) & GENMASK_ULL(PHYS_MASK_SHIFT - 1, 12)) >> 8)
+   (((par) & GENMASK_ULL(52 - 1, 12)) >> 8)
 
 #define ECN(x) { ESR_ELx_EC_##x, #x }
 
-- 
2.17.1



[PATCH v1 0/2] KVM fixes for exotic configurations

2022-11-03 Thread Ryan Roberts
I've been adding support for FEAT_LPA2 to KVM and as part of that work have been
testing various (84) configurations of HW, host and guest kernels on FVP. This
has thrown up a couple of pre-existing bugs, for which the fixes are provided.

Thanks,
Ryan

Ryan Roberts (2):
  KVM: arm64: Fix kvm init failure when mode!=vhe and VA_BITS=52.
  KVM: arm64: Fix PAR_TO_HPFAR() to work independently of PA_BITS.

 arch/arm64/include/asm/kvm_arm.h |  6 +-
 arch/arm64/kvm/arm.c | 20 +++-
 arch/arm64/kvm/mmu.c | 28 +++-
 3 files changed, 35 insertions(+), 19 deletions(-)

--
2.17.1



[PATCH v1] KVM: arm64: Fix bad dereference on MTE-enabled systems.

2022-10-27 Thread Ryan Roberts
enter_exception64() performs an MTE check, which involves dereferencing
vcpu->kvm. While vcpu has already been fixed up for hyp va, kvm is still
a kern va pointer.

Fix this by first converting kvm to a hyp va pointer. If the system does
not support MTE, the dereference is avoided in the first place.
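
(Illustrative pattern only, mirroring the fix below: pointers reached through
hyp-adjusted structures can still hold kernel VAs and must be converted before
being dereferenced at EL2.)

	struct kvm *kvm = kern_hyp_va(vcpu->kvm);  /* kernel VA -> hyp VA */

	if (kvm_has_mte(kvm))
		new |= PSR_TCO_BIT;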

Fixes: ea7fc1bb1cd1 ("KVM: arm64: Introduce MTE VM feature")

Signed-off-by: Ryan Roberts 
---
 arch/arm64/kvm/hyp/exception.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index b7557b25ed56..791d3de76771 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -13,6 +13,7 @@
 #include <hyp/adjust_pc.h>
 #include <linux/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>

 #if !defined (__KVM_NVHE_HYPERVISOR__) && !defined (__KVM_VHE_HYPERVISOR__)
 #error Hypervisor code only!
@@ -115,7 +116,7 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
new |= (old & PSR_C_BIT);
new |= (old & PSR_V_BIT);

-   if (kvm_has_mte(vcpu->kvm))
+   if (kvm_has_mte(kern_hyp_va(vcpu->kvm)))
new |= PSR_TCO_BIT;

new |= (old & PSR_DIT_BIT);
--
2.17.1



Re: [PATCH v7 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

2022-08-24 Thread Ryan Roberts

On 24/08/2022 18:25, Yosry Ahmed wrote:

On Wed, Aug 24, 2022 at 6:42 AM Ryan Roberts  wrote:



diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index e7aafc82be99..898c99eae8e4 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -982,6 +982,7 @@ Example output. You may not have all of these fields.
   SUnreclaim:   142336 kB
   KernelStack:   11168 kB
   PageTables:20540 kB
+SecPageTables: 0 kB
   NFS_Unstable:  0 kB
   Bounce:0 kB
   WritebackTmp:  0 kB
@@ -1090,6 +1091,9 @@ KernelStack
 Memory consumed by the kernel stacks of all tasks
   PageTables
 Memory consumed by userspace page tables
+SecPageTables
+  Memory consumed by secondary page tables, this currently
+  currently includes KVM mmu allocations on x86 and arm64.


nit: I think you have a typo here: "currently currently".


Sorry I missed this, thanks for catching it. The below diff fixes it
(let me know if I need to send v8 for this, hopefully not).

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 898c99eae8e4..0b3778ec12e1 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1093,7 +1093,7 @@ PageTables
Memory consumed by userspace page tables
  SecPageTables
Memory consumed by secondary page tables, this currently
-  currently includes KVM mmu allocations on x86 and arm64.
+  includes KVM mmu allocations on x86 and arm64.
  NFS_Unstable
Always zero. Previous counted pages which had been written to
the server, but has not been committed to stable storage.



Looks good to me!




Re: [PATCH v7 4/4] KVM: arm64/mmu: count KVM s2 mmu usage in secondary pagetable stats

2022-08-24 Thread Ryan Roberts

On 24/08/2022 15:24, Marc Zyngier wrote:

On Wed, 24 Aug 2022 14:43:43 +0100,
Ryan Roberts  wrote:



Count the pages used by KVM in arm64 for stage2 mmu in memory stats
under secondary pagetable stats (e.g. "SecPageTables" in /proc/meminfo)
to give better visibility into the memory consumption of KVM mmu in a
similar way to how normal user page tables are accounted.

Signed-off-by: Yosry Ahmed 
Reviewed-by: Oliver Upton 
Reviewed-by: Marc Zyngier 
---


I see that you are not including the memory reserved for the host
stage2 table when using protected KVM. Is this something worth adding?
(See arch/arm64/kvm/pkvm.c:kvm_hyp_reserve()).

This reservation is done pretty early on in bootmem_init() so not sure
if this could cause some init ordering issues that might be tricky to
solve though.


I also don't see what this buys us. This memory can't be reclaimed,
and is not part of KVM's job for the purpose of running guests, which
is what this series is about.

If anything, it should be accounted separately.


OK fair enough. It just struck me from the patch description that the 
host stage2 might qualify as "pages used by KVM in arm64 for stage2 
mmu". But I don't have any understanding of the use case this is for.


Sorry for the noise!

Thanks,
Ryan


Re: [PATCH v7 4/4] KVM: arm64/mmu: count KVM s2 mmu usage in secondary pagetable stats

2022-08-24 Thread Ryan Roberts

Count the pages used by KVM in arm64 for stage2 mmu in memory stats
under secondary pagetable stats (e.g. "SecPageTables" in /proc/meminfo)
to give better visibility into the memory consumption of KVM mmu in a
similar way to how normal user page tables are accounted.

Signed-off-by: Yosry Ahmed 
Reviewed-by: Oliver Upton 
Reviewed-by: Marc Zyngier 
---


I see that you are not including the memory reserved for the host stage2 
table when using protected KVM. Is this something worth adding? (See 
arch/arm64/kvm/pkvm.c:kvm_hyp_reserve()).


This reservation is done pretty early on in bootmem_init() so not sure 
if this could cause some init ordering issues that might be tricky to 
solve though.
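
(For orientation only: the accounting in the quoted patch boils down to bumping
NR_SECONDARY_PAGETABLE as stage 2 table pages come and go; the helper name below
is made up for illustration, only NR_SECONDARY_PAGETABLE and
mod_lruvec_page_state() are real.)

	static void account_stage2_table_page(struct page *page, int val)
	{
		mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, val);
	}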


Thanks,
Ryan




Re: [PATCH v7 1/4] mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses.

2022-08-24 Thread Ryan Roberts

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index e7aafc82be99..898c99eae8e4 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -982,6 +982,7 @@ Example output. You may not have all of these fields.
  SUnreclaim:   142336 kB
  KernelStack:   11168 kB
  PageTables:20540 kB
+SecPageTables: 0 kB
  NFS_Unstable:  0 kB
  Bounce:0 kB
  WritebackTmp:  0 kB
@@ -1090,6 +1091,9 @@ KernelStack
Memory consumed by the kernel stacks of all tasks
  PageTables
Memory consumed by userspace page tables
+SecPageTables
+  Memory consumed by secondary page tables, this currently
+  currently includes KVM mmu allocations on x86 and arm64.


nit: I think you have a typo here: "currently currently".

Thanks,
Ryan