Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-04 Thread Ard Biesheuvel
On 4 December 2015 at 02:58, Ben Hutchings  wrote:
> On Wed, 2015-12-02 at 18:41 +0100, Ard Biesheuvel wrote:
>> Hi Pavel,
>>
>> Thanks for getting to the bottom of this.
>>
>> On 1 December 2015 at 14:03, Pavel Fedin  wrote:
>> > This function takes stage-II physical addresses (a.k.a. IPA) on input, not
>> > real physical addresses. This causes kvm_is_device_pfn() to return wrong
>> > values, depending on how closely the guest and host memory maps match. This
>> > results in completely broken KVM on some boards. The problem has been
>> > caught on Samsung proprietary hardware.
>> >
>> > Cc: sta...@vger.kernel.org
>> > Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
>> >
>>
>> That commit is not in a release yet, so no need for cc stable
> [...]
>
> But it is cc'd to stable, so unless it is going to be nacked at review
> stage, any subsequent fixes should also be cc'd.
>

Ah yes, thanks for pointing that out.

But please, don't cc your proposed patches straight to
sta...@vger.kernel.org. I usually leave it up to the maintainer who
merges the patch to add the Cc: line to the commit log.


Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-03 Thread Ard Biesheuvel
On 3 December 2015 at 08:14, Pavel Fedin  wrote:
>  Hello!
>
>> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> > index 7dace90..51ad98f 100644
>> > --- a/arch/arm/kvm/mmu.c
>> > +++ b/arch/arm/kvm/mmu.c
>> > @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>> >
>> > pte = pte_offset_kernel(pmd, addr);
>> > do {
>> > -   if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
>> > +   if (!pte_none(*pte) &&
>> > +   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>>
>> I think your analysis is correct, but does that not apply to both instances?
>
>  No, the other one is correct, since it operates on a real PFN (at least it
> looks that way). I have verified my fix against the original problem (a crash
> on Exynos5410 without a generic timer), and it still works fine there.
>

I don't think so. Regardless of whether you are manipulating HYP
mappings or stage-2 mappings, the physical address is always the
output, not the input of the translation, so addr is always either a
virtual address or an intermediate physical address, whereas
pfn_valid() operates on host physical addresses.
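
To restate that as a rough sketch of the address flows involved:

    host VA   --[host stage 1]-->  host PA   (pfn_valid() operates here)
    guest IPA --[stage 2]------->  host PA
    HYP VA    --[HYP stage 1]--->  host PA

The addr being walked by these functions is always on the left-hand
side; the host PA is only available from the PTE contents.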

>> And instead of reverting, could we fix this properly instead?
>
>  Of course, I'm not against alternate approaches, feel free to propose one.
> I've just suggested what I could, to fix things quickly. I'm indeed no expert
> in KVM memory management yet. After all, this is what mailing lists are for.
>

OK. I will follow up with a patch, as Christoffer requested. I'd
appreciate it if you could test to see if it also fixes the current
issue, and the original arch timer issue.

Thanks,
Ard.


Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-02 Thread Ard Biesheuvel
On 2 December 2015 at 19:50, Christoffer Dall
 wrote:
> On Tue, Dec 01, 2015 at 04:03:52PM +0300, Pavel Fedin wrote:
>> This function takes stage-II physical addresses (a.k.a. IPA) on input, not
>> real physical addresses. This causes kvm_is_device_pfn() to return wrong
>> values, depending on how closely the guest and host memory maps match. This
>> results in completely broken KVM on some boards. The problem has been
>> caught on Samsung proprietary hardware.
>>
>> Cc: sta...@vger.kernel.org
>
> cc'ing stable doesn't make sense here as the bug was introduced in
> v4.4-rc3 and we haven't released v4.4 yet...
>
>> Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
>>
>> Signed-off-by: Pavel Fedin 
>> ---
>>  arch/arm/kvm/mmu.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 7dace90..51ad98f 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>>
>>   pte = pte_offset_kernel(pmd, addr);
>>   do {
>> - if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
>> + if (!pte_none(*pte) &&
>> + (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>>   kvm_flush_dcache_pte(*pte);
>>   } while (pte++, addr += PAGE_SIZE, addr != end);
>>  }
>
> You are right that there was a bug in the fix, but your fix is not the
> right one.
>
> Either we have to apply an actual mask and then compare against the value
> (yes, I know, because of the UXN bit we get lucky so far, but that's too
> brittle), or we should do a translation of the gfn to a pfn.  Is there
> anything preventing us to do the following?
>
> if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(*pte)))
>

Yes, that looks better. I got confused by addr being a 'phys_addr_t',
but obviously, the address inside the PTE is the one we need to test
for device-ness, so I think we should replace both instances with this.
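
A minimal sketch of what that would look like at both call sites
(assuming the kvm_is_device_pfn() helper from the patch under
discussion):

        /* unmap_ptes(): test the output address recorded in the PTE */
        if (!kvm_is_device_pfn(pte_pfn(old_pte)))
                kvm_flush_dcache_pte(old_pte);

        /* stage2_flush_ptes(): same test against the PTE's own pfn */
        if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(*pte)))
                kvm_flush_dcache_pte(*pte);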

-- 
Ard.


Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-02 Thread Ard Biesheuvel
Hi Pavel,

Thanks for getting to the bottom of this.

On 1 December 2015 at 14:03, Pavel Fedin  wrote:
> This function takes stage-II physical addresses (a.k.a. IPA) on input, not
> real physical addresses. This causes kvm_is_device_pfn() to return wrong
> values, depending on how closely the guest and host memory maps match. This
> results in completely broken KVM on some boards. The problem has been
> caught on Samsung proprietary hardware.
>
> Cc: sta...@vger.kernel.org
> Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
>

That commit is not in a release yet, so no need for cc stable

> Signed-off-by: Pavel Fedin 
> ---
>  arch/arm/kvm/mmu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 7dace90..51ad98f 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>
> pte = pte_offset_kernel(pmd, addr);
> do {
> -   if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
> +   if (!pte_none(*pte) &&
> +   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)

I think your analysis is correct, but does that not apply to both instances?
And instead of reverting, could we fix this properly instead?

> kvm_flush_dcache_pte(*pte);
> } while (pte++, addr += PAGE_SIZE, addr != end);
>  }
> --
> 2.4.4
>


Re: [PATCH 15/21] arm64: KVM: Add panic handling

2015-11-16 Thread Ard Biesheuvel
On 16 November 2015 at 14:11, Marc Zyngier  wrote:
> Add the panic handler, together with the small bits of assembly
> code to call the kernel's panic implementation.
>
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/hyp-entry.S | 11 ++-
>  arch/arm64/kvm/hyp/hyp.h   |  1 +
>  arch/arm64/kvm/hyp/switch.c| 35 +++
>  3 files changed, 46 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
> index e11a129..7218eed 100644
> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> @@ -141,7 +141,16 @@ el1_irq:
> mov x1, #ARM_EXCEPTION_IRQ
> b   __guest_exit
>
> -.macro invalid_vector  label, target = __kvm_hyp_panic
> +ENTRY(__hyp_do_panic)
> +   mov lr, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
> + PSR_MODE_EL1h)
> +   msr spsr_el2, lr
> +   ldr lr, =panic
> +   msr elr_el2, lr
> +   eret
> +ENDPROC(__hyp_do_panic)
> +
> +.macro invalid_vector  label, target = __hyp_panic
> .align  2
>  \label:
> b \target
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index 240fb79..d5d500d 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -74,6 +74,7 @@ void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
>  void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
>
>  u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
> +void __noreturn __hyp_do_panic(unsigned long, ...);
>
>  #endif /* __ARM64_KVM_HYP_H__ */
>
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 06d3e20..cdc2a96 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -140,3 +140,38 @@ int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
>
> return exit_code;
>  }
> +
> +static const char *__hyp_panic_string = "HYP panic:\nPS:%08x PC:%p ESR:%p\nFAR:%p HPFAR:%p PAR:%p\nVCPU:%p\n";
> +

Re separating the HYP text from the kernel proper: this is exactly the
thing that is likely to cause trouble when you execute the kernel text
from HYP.

__hyp_panic_string is a non-const pointer whose initializer holds the
absolute address of the string, as seen from the high kernel virtual
mapping.
Better to use 'static const char __hyp_panic_string[]' instead.

(If it currently works fine, it is only because the compiler optimizes
the entire variable away, and performs a relative access in the place
where the variable is referenced.)
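
To illustrate the difference (a sketch, not the actual patch):

    /* pointer: the variable holds an absolute kernel VA, typically
     * loaded via a literal pool entry -- dereferenced from HYP, that
     * address is meaningless */
    static const char *__hyp_panic_string = "HYP panic: ...";

    /* array: the symbol *is* the string, so the compiler can reference
     * it PC-relatively, which works from either mapping */
    static const char __hyp_panic_string[] = "HYP panic: ...";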


> +void __hyp_text __noreturn __hyp_panic(void)
> +{
> +   u64 spsr = read_sysreg(spsr_el2);
> +   u64 elr = read_sysreg(elr_el2);
> +   u64 par = read_sysreg(par_el1);
> +
> +   if (read_sysreg(vttbr_el2)) {
> +   struct kvm_vcpu *vcpu;
> +   struct kvm_cpu_context *host_ctxt;
> +
> +   vcpu = (struct kvm_vcpu *)read_sysreg(tpidr_el2);
> +   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +   __deactivate_traps(vcpu);
> +   __deactivate_vm(vcpu);
> +   __sysreg_restore_state(host_ctxt);
> +
> +   write_sysreg(host_ctxt->gp_regs.sp_el1, sp_el1);
> +   }
> +
> +   /* Call panic for real */
> +   while (1) {
> +   unsigned long str_va = (unsigned long)__hyp_panic_string;
> +
> +   str_va -= HYP_PAGE_OFFSET;
> +   str_va += PAGE_OFFSET;
> +   __hyp_do_panic(str_va,
> +  spsr,  elr,
> +  read_sysreg(esr_el2),   read_sysreg(far_el2),
> +  read_sysreg(hpfar_el2), par,
> +  read_sysreg(tpidr_el2));
> +   }
> +}
> --
> 2.1.4
>
>


Re: [PATCH 2/2] arm: kvm: Fix STRICT_MM_TYPECHECK errors

2015-11-13 Thread Ard Biesheuvel
On 11 November 2015 at 03:03, Laura Abbott  wrote:
>
> PAGE_S2_DEVICE is a pgprot val and needs to be accessed using the proper
> accessors. Switch to these accessors to avoid errors with
> STRICT_MM_TYPECHECK.
>
> Signed-off-by: Laura Abbott 
> ---
> Found in the course of other work

Already fixed here:
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/142953

Looks like we may need a mutex :-)

> ---
>  arch/arm/kvm/mmu.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 6984342..43f8162 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -213,7 +213,8 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
> kvm_tlb_flush_vmid_ipa(kvm, addr);
>
> /* No need to invalidate the cache for device mappings */
> -   if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> +   if ((pte_val(old_pte) & pgprot_val(PAGE_S2_DEVICE)) !=
> +pgprot_val(PAGE_S2_DEVICE))
> kvm_flush_dcache_pte(old_pte);
>
> put_page(virt_to_page(pte));
> @@ -306,7 +307,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
> pte = pte_offset_kernel(pmd, addr);
> do {
> if (!pte_none(*pte) &&
> -   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> +   (pte_val(*pte) & pgprot_val(PAGE_S2_DEVICE)) !=
> +pgprot_val(PAGE_S2_DEVICE))
> kvm_flush_dcache_pte(*pte);
> } while (pte++, addr += PAGE_SIZE, addr != end);
>  }
> --
> 2.5.0
>
>


[PATCH v2 resend] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-10 Thread Ard Biesheuvel
The open coded tests for checking whether a PTE maps a page as
uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern,
which is not guaranteed to work since the type of a mapping is
not a set of mutually exclusive bits.

For HYP mappings, the type is an index into the MAIR table (i.e., the
index itself does not contain any information whatsoever about the
type of the mapping), and for stage-2 mappings it is a bit field where
normal memory and device types are defined as follows:

#define MT_S2_NORMAL        0xf
#define MT_S2_DEVICE_nGnRE  0x1

I.e., masking *and* comparing with the latter matches on the former,
and we have been getting lucky merely because the S2 device mappings
also have the PTE_UXN bit set, or we would misidentify memory mappings
as device mappings.
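
To spell the flaw out with the values above:

    MT_S2_NORMAL & MT_S2_DEVICE_nGnRE  ==  0xf & 0x1  ==  0x1

i.e., the mask-and-compare test for a device mapping also matches a
normal-memory mapping.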

Since the unmap_range() code path (which contains one instance of the
flawed test) is used both for HYP mappings and stage-2 mappings, and
considering the difference between the two, it is non-trivial to fix
this by rewriting the tests in place, as it would involve passing
down the type of mapping through all the functions.

However, since HYP mappings and stage-2 mappings both deal with host
physical addresses, we can simply check whether the mapping is backed
by memory that is managed by the host kernel, and only perform the
D-cache maintenance if this is the case.

Signed-off-by: Ard Biesheuvel 
Tested-by: Pavel Fedin 
Reviewed-by: Christoffer Dall 
---

Resending since I failed to cc the lists the first time around; the only
difference is the added tags.

 arch/arm/kvm/mmu.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 6984342da13d..7dace909d5cf 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
__kvm_flush_dcache_pud(pud);
 }
 
+static bool kvm_is_device_pfn(unsigned long pfn)
+{
+   return !pfn_valid(pfn);
+}
+
 /**
  * stage2_dissolve_pmd() - clear and flush huge PMD entry
  * @kvm:   pointer to kvm structure.
@@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
kvm_tlb_flush_vmid_ipa(kvm, addr);
 
/* No need to invalidate the cache for device mappings */
-   if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
+   if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
kvm_flush_dcache_pte(old_pte);
 
put_page(virt_to_page(pte));
@@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
 
pte = pte_offset_kernel(pmd, addr);
do {
-   if (!pte_none(*pte) &&
-   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
+   if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
kvm_flush_dcache_pte(*pte);
} while (pte++, addr += PAGE_SIZE, addr != end);
 }
@@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
return kvm_vcpu_dabt_iswrite(vcpu);
 }
 
-static bool kvm_is_device_pfn(unsigned long pfn)
-{
-   return !pfn_valid(pfn);
-}
-
 /**
  * stage2_wp_ptes - write protect PMD range
  * @pmd:   pointer to pmd entry
-- 
1.9.1



Re: [PATCH v2] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-10 Thread Ard Biesheuvel
On 10 November 2015 at 14:40, Christoffer Dall
 wrote:
> On Tue, Nov 10, 2015 at 02:15:45PM +0100, Ard Biesheuvel wrote:
>> On 10 November 2015 at 13:22, Christoffer Dall
>>  wrote:
>> > On Tue, Nov 10, 2015 at 10:45:37AM +0100, Ard Biesheuvel wrote:
>> >> Hi all,
>> >>
>> >> I wonder if this is a better way to address the problem. It looks at
>> >> the nature of the memory rather than the nature of the mapping, which
>> >> is probably a more reliable indicator of whether cache maintenance is
>> >> required when performing the unmap.
>> >>
>> >>
>> >> ---8<
>> >> The open coded tests for checking whether a PTE maps a page as
>> >> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
>> >> which is not guaranteed to work since the type of a mapping is
>> >> not a set of mutually exclusive bits
>> >>
>> >> For HYP mappings, the type is an index into the MAIR table (i.e, the
>> >> index itself does not contain any information whatsoever about the
>> >> type of the mapping), and for stage-2 mappings it is a bit field where
>> >> normal memory and device types are defined as follows:
>> >>
>> >> #define MT_S2_NORMAL        0xf
>> >> #define MT_S2_DEVICE_nGnRE  0x1
>> >>
>> >> I.e., masking *and* comparing with the latter matches on the former,
>> >> and we have been getting lucky merely because the S2 device mappings
>> >> also have the PTE_UXN bit set, or we would misidentify memory mappings
>> >> as device mappings.
>> >>
>> >> Since the unmap_range() code path (which contains one instance of the
>> >> flawed test) is used both for HYP mappings and stage-2 mappings, and
>> >> considering the difference between the two, it is non-trivial to fix
>> >> this by rewriting the tests in place, as it would involve passing
>> >> down the type of mapping through all the functions.
>> >>
>> >> However, since HYP mappings and stage-2 mappings both deal with host
>> >> physical addresses, we can simply check whether the mapping is backed
>> >> by memory that is managed by the host kernel, and only perform the
>> >> D-cache maintenance if this is the case.
>> >>
>> >> Signed-off-by: Ard Biesheuvel 
>> >> ---
>> >>  arch/arm/kvm/mmu.c | 15 +++
>> >>  1 file changed, 7 insertions(+), 8 deletions(-)
>> >>
>> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> >> index 6984342da13d..7dace909d5cf 100644
>> >> --- a/arch/arm/kvm/mmu.c
>> >> +++ b/arch/arm/kvm/mmu.c
>> >> @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
>> >>   __kvm_flush_dcache_pud(pud);
>> >>  }
>> >>
>> >> +static bool kvm_is_device_pfn(unsigned long pfn)
>> >> +{
>> >> + return !pfn_valid(pfn);
>> >> +}
>> >> +
>> >>  /**
>> >>   * stage2_dissolve_pmd() - clear and flush huge PMD entry
>> >>   * @kvm: pointer to kvm structure.
>> >> @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
>> >>   kvm_tlb_flush_vmid_ipa(kvm, addr);
>> >>
>> >>   /* No need to invalidate the cache for device mappings */
>> >> - if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>> >> + if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
>> >>   kvm_flush_dcache_pte(old_pte);
>> >>
>> >>   put_page(virt_to_page(pte));
>> >> @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>> >>
>> >>   pte = pte_offset_kernel(pmd, addr);
>> >>   do {
>> >> - if (!pte_none(*pte) &&
>> >> - (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>> >> + if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
>> >>   kvm_flush_dcache_pte(*pte);
>> >>   } while (pte++, addr += PAGE_SIZE, addr != end);
>> >>  }
>> >> @@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu 
>&

Re: [PATCH v2] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-10 Thread Ard Biesheuvel
On 10 November 2015 at 13:22, Christoffer Dall
 wrote:
> On Tue, Nov 10, 2015 at 10:45:37AM +0100, Ard Biesheuvel wrote:
>> Hi all,
>>
>> I wonder if this is a better way to address the problem. It looks at
>> the nature of the memory rather than the nature of the mapping, which
>> is probably a more reliable indicator of whether cache maintenance is
>> required when performing the unmap.
>>
>>
>> ---8<
>> The open coded tests for checking whether a PTE maps a page as
>> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
>> which is not guaranteed to work since the type of a mapping is
>> not a set of mutually exclusive bits
>>
>> For HYP mappings, the type is an index into the MAIR table (i.e, the
>> index itself does not contain any information whatsoever about the
>> type of the mapping), and for stage-2 mappings it is a bit field where
>> normal memory and device types are defined as follows:
>>
>> #define MT_S2_NORMAL        0xf
>> #define MT_S2_DEVICE_nGnRE  0x1
>>
>> I.e., masking *and* comparing with the latter matches on the former,
>> and we have been getting lucky merely because the S2 device mappings
>> also have the PTE_UXN bit set, or we would misidentify memory mappings
>> as device mappings.
>>
>> Since the unmap_range() code path (which contains one instance of the
>> flawed test) is used both for HYP mappings and stage-2 mappings, and
>> considering the difference between the two, it is non-trivial to fix
>> this by rewriting the tests in place, as it would involve passing
>> down the type of mapping through all the functions.
>>
>> However, since HYP mappings and stage-2 mappings both deal with host
>> physical addresses, we can simply check whether the mapping is backed
>> by memory that is managed by the host kernel, and only perform the
>> D-cache maintenance if this is the case.
>>
>> Signed-off-by: Ard Biesheuvel 
>> ---
>>  arch/arm/kvm/mmu.c | 15 +++
>>  1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 6984342da13d..7dace909d5cf 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
>>   __kvm_flush_dcache_pud(pud);
>>  }
>>
>> +static bool kvm_is_device_pfn(unsigned long pfn)
>> +{
>> + return !pfn_valid(pfn);
>> +}
>> +
>>  /**
>>   * stage2_dissolve_pmd() - clear and flush huge PMD entry
>>   * @kvm: pointer to kvm structure.
>> @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
>>   kvm_tlb_flush_vmid_ipa(kvm, addr);
>>
>> /* No need to invalidate the cache for device mappings */
>> - if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>> + if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
>>   kvm_flush_dcache_pte(old_pte);
>>
>>   put_page(virt_to_page(pte));
>> @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>>
>>   pte = pte_offset_kernel(pmd, addr);
>>   do {
>> - if (!pte_none(*pte) &&
>> - (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>> + if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
>>   kvm_flush_dcache_pte(*pte);
>>   } while (pte++, addr += PAGE_SIZE, addr != end);
>>  }
>> @@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>>   return kvm_vcpu_dabt_iswrite(vcpu);
>>  }
>>
>> -static bool kvm_is_device_pfn(unsigned long pfn)
>> -{
>> - return !pfn_valid(pfn);
>> -}
>> -
>>  /**
>>   * stage2_wp_ptes - write protect PMD range
>>   * @pmd: pointer to pmd entry
>> --
>> 1.9.1
>>
>
> So PAGE_HYP_DEVICE is used only to map the vgic-v2 regions and
> PAGE_S2_DEVICE is used to map the vgic VCPU interface and for all memory
> regions where the vma has (vm_flags & VM_PFNMAP).
>
> Will these, and only these, cases be covered by the pfn_valid check?
>

The pfn_valid() check will ensure that cache maintenance is only
performed on regions that are known to the host as memory, are managed
by the host (i.e., there is a struct page associated with them) and
will be accessed by the host via cacheable mappings (the linear mapping).

Re: [PATCH v2] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-10 Thread Ard Biesheuvel
(adding lists)

On 10 November 2015 at 10:45, Ard Biesheuvel  wrote:
> Hi all,
>
> I wonder if this is a better way to address the problem. It looks at
> the nature of the memory rather than the nature of the mapping, which
> is probably a more reliable indicator of whether cache maintenance is
> required when performing the unmap.
>
>
> ---8<
> The open coded tests for checking whether a PTE maps a page as
> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
> which is not guaranteed to work since the type of a mapping is
> not a set of mutually exclusive bits
>
> For HYP mappings, the type is an index into the MAIR table (i.e, the
> index itself does not contain any information whatsoever about the
> type of the mapping), and for stage-2 mappings it is a bit field where
> normal memory and device types are defined as follows:
>
> #define MT_S2_NORMAL        0xf
> #define MT_S2_DEVICE_nGnRE  0x1
>
> I.e., masking *and* comparing with the latter matches on the former,
> and we have been getting lucky merely because the S2 device mappings
> also have the PTE_UXN bit set, or we would misidentify memory mappings
> as device mappings.
>
> Since the unmap_range() code path (which contains one instance of the
> flawed test) is used both for HYP mappings and stage-2 mappings, and
> considering the difference between the two, it is non-trivial to fix
> this by rewriting the tests in place, as it would involve passing
> down the type of mapping through all the functions.
>
> However, since HYP mappings and stage-2 mappings both deal with host
> physical addresses, we can simply check whether the mapping is backed
> by memory that is managed by the host kernel, and only perform the
> D-cache maintenance if this is the case.
>
> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/arm/kvm/mmu.c | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 6984342da13d..7dace909d5cf 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
> __kvm_flush_dcache_pud(pud);
>  }
>
> +static bool kvm_is_device_pfn(unsigned long pfn)
> +{
> +   return !pfn_valid(pfn);
> +}
> +
>  /**
>   * stage2_dissolve_pmd() - clear and flush huge PMD entry
>   * @kvm:   pointer to kvm structure.
> @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
> kvm_tlb_flush_vmid_ipa(kvm, addr);
>
> /* No need to invalidate the cache for device mappings */
> -   if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> +   if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
> kvm_flush_dcache_pte(old_pte);
>
> put_page(virt_to_page(pte));
> @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>
> pte = pte_offset_kernel(pmd, addr);
> do {
> -   if (!pte_none(*pte) &&
> -   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> +   if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
> kvm_flush_dcache_pte(*pte);
> } while (pte++, addr += PAGE_SIZE, addr != end);
>  }
> @@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> return kvm_vcpu_dabt_iswrite(vcpu);
>  }
>
> -static bool kvm_is_device_pfn(unsigned long pfn)
> -{
> -   return !pfn_valid(pfn);
> -}
> -
>  /**
>   * stage2_wp_ptes - write protect PMD range
>   * @pmd:   pointer to pmd entry
> --
> 1.9.1
>


Re: [PATCH] KVM/arm: kernel low level debug support for ARM32 virtual platforms

2015-11-04 Thread Ard Biesheuvel
On 4 November 2015 at 19:49, Christopher Covington  wrote:
> On 11/04/2015 08:31 AM, Christoffer Dall wrote:
>> On Tue, Nov 03, 2015 at 01:39:44PM -0600, Rob Herring wrote:
>>> On Tue, Nov 3, 2015 at 1:17 PM, Mario Smarduch  
>>> wrote:
 On 11/3/2015 9:55 AM, Will Deacon wrote:
> On Tue, Nov 03, 2015 at 09:44:52AM -0800, Mario Smarduch wrote:
>> On 11/3/2015 8:33 AM, Christopher Covington wrote:
>>> On 11/02/2015 06:51 PM, Mario Smarduch wrote:
this is a re-post from a couple of weeks ago; please take time to review
 this simple patch, which simplifies DEBUG_LL and prevents a kernel crash
 on virtual platforms.

 Before this patch DEBUG_LL for 'dummy virtual machine':

 ( ) Kernel low-level debugging via EmbeddedICE DCC channel
 ( ) Kernel low-level debug output via semihosting I/O
 ( ) Kernel low-level debugging via 8250 UART
 ( ) Kernel low-level debugging via ARM Ltd PL01x Primecell

 In summary, if the debug UART is not emulated, the kernel crashes.
 And once you pass that hurdle, the UART's physical/virtual addresses are
 unknown. DEBUG_LL comes in handy on many occasions and should be somewhat
 intuitive to use, like it is for physical platforms. For virtual platforms
 the user may start doubting the host and get into a bigger mess.

 After this patch is applied, the user gets:

 (X) Kernel low-level debugging on QEMU Virtual Platform
 ( ) Kernel low-level debugging on Kvmtool Virtual Platform
. above repeated 

 The virtual addresses selected follow the ARM reference models: high in the
 vmalloc section, with highmem enabled and the guest running with >= 1GB of
 memory. The offset is left over from the ARM reference models.
>>>
>>> Which model? It doesn't appear to match the vexpress 
>>> AEM/RTSM/FVP/whatever
>>> which used 0x1c09 for UART0.
>>
>> I recall the QEMU virt model had its own physical address map; I'm sure I
>> saw the virtio-mmio regions assigned in some ARM document. Peter, would
>> you know?
>>
>> As far as kvmtool goes, I'm not sure; currently the PC1 COM1 port is used?
>> Andre, will that stay fixed?
>
> We make absolutely no guarantees about the memory map provided by kvmtool.

 If that's also the case for qemu, then I guess the best you can do is find
 a way to dump the device tree, find the uart's physical address, and try to
 figure out the virtual address.

 Pretty involved; I hoped for something more automated since that's a handy
 feature.
>>>
>>> You really only need DEBUG_LL now if you are debugging very early code
>>> before memory is set up, and/or bad memory. Use earlycon instead, which
>>> should already be supported via both the pl011 and semihosting. I used
>>> it with QEMU semihosting support.
>>>
>> Then we should really document how to use that with qemu's virt platform
>> and kvmtool's platform on both 32-bit and 64-bit so that users can
>> easily figure out what they're doing wrong when they get no output.
>>
>> In practice, the address for the pl011 is quite unlikely to change, I
>> dare speculate, so that documentation shouldn't need frequent updating.
>
> Is it not on by default since the following change?
>
> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=f022b8e95379b0433d13509706b66f38fc15dde8
>

Yes, but it still requires the plain 'earlycon' argument (i.e., without
'=pl011,...') to be passed on the kernel command line if you want
early output.
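
For example (a hypothetical invocation; the exact flags depend on your
setup), something like this gets you early output on the 'virt' machine:

    qemu-system-arm -M virt -cpu cortex-a15 -nographic \
        -kernel zImage -append "earlycon"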

-- 
Ard.


Re: [PATCH 09/15] arm64: Add page size to the kernel image header

2015-10-05 Thread Ard Biesheuvel
On 5 October 2015 at 14:02, Suzuki K. Poulose  wrote:
> On 02/10/15 16:49, Catalin Marinas wrote:
>>
>> On Tue, Sep 15, 2015 at 04:41:18PM +0100, Suzuki K. Poulose wrote:
>>>
>>> From: Ard Biesheuvel 
>>>
>>> This patch adds the page size to the arm64 kernel image header
>>> so that one can infer the PAGESIZE used by the kernel. This will
>>> be helpful to diagnose failures to boot the kernel with page size
>>> not supported by the CPU.
>>>
>>> Signed-off-by: Ard Biesheuvel 
>>
>>
>> This patch needs your Signed-off-by as well since you are posting it. And
>> IIRC I acked it as well, I'll check.
>>
>
> Yes, you did mention that you were OK with the patch. But I thought there
> was no 'Acked-by' tag added, hence I didn't pick it up.
>

In my version of this patch (which I sent separately before noticing
that Suzuki had already folded it into his series), I took Catalin's
comment on the email in which I suggested this change as an implicit
Acked-by.

-- 
Ard.


Re: [PATCH 12/14] arm64: Check for selected granule support

2015-09-02 Thread Ard Biesheuvel
On 2 September 2015 at 11:48, Ard Biesheuvel  wrote:
> On 13 August 2015 at 19:29, Catalin Marinas  wrote:
>> On Thu, Aug 13, 2015 at 03:45:07PM +0100, Suzuki K. Poulose wrote:
>>> On 13/08/15 13:28, Steve Capper wrote:
>>> >On 13 August 2015 at 12:34, Suzuki K. Poulose  
>>> >wrote:
>>> >>  __enable_mmu:
>>> >>+   mrs x1, ID_AA64MMFR0_EL1
>>> >>+   ubfx    x2, x1, #ID_AA64MMFR0_TGran_SHIFT, 4
>>> >>+   cmp     x2, #ID_AA64MMFR0_TGran_ENABLED
>>> >>+   b.ne    __no_granule_support
>>> >> ldr x5, =vectors
>>> >> msr vbar_el1, x5
>>> >> msr ttbr0_el1, x25  // load TTBR0
>>> >>@@ -626,3 +643,8 @@ __enable_mmu:
>>> >> isb
>>> >> br  x27
>>> >>  ENDPROC(__enable_mmu)
>>> >>+
>>> >>+__no_granule_support:
>>> >>+   wfe
>>> >>+   b __no_granule_support
>>> >>+ENDPROC(__no_granule_support)
>>> >>--
>>> >>1.7.9.5
>>> >>
>>> >
>>> >Is it possible to tell the user that the kernel has failed to boot due
>>> >to the kernel granule being unsupported?
>>>
>>> We don't have anything up at this time. The "looping address" is actually
>>> a clue to the (expert) user. Not sure we can do anything until we get
>>> something like DEBUG_LL(?)
>>
>> No.
>>
>>> Or should we let it continue and end in a panic(?). The current situation
>>> can boot a multi-cluster system with the boot cluster having Tgran support
>>> (which doesn't make a strong use case, though). I will try out some
>>> options and get back to you.
>>
>> If the boot CPU does not support 16KB pages, in general there isn't much
>> we can do since the console printing is done after we enabled the MMU.
>> Even mapping the UART address requires fixmap support and the PAGE_SIZE
>> is hard-coded in the kernel image. The DT is also mapped at run-time.
>>
>> While in theory it's possible to fall back to a 4KB page size just
>> enough to load the DT and figure out the early console, I suggest we
>> just live with the "looping address" clue.
>>
>
> Couldn't we allocate some flag bits in the Image header to communicate
> the page size to the bootloader?

Something like this perhaps?

8<---
diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index 7d9d3c2286b2..13a8aaa9a6e9 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -104,7 +104,8 @@ Header notes:
 - The flags field (introduced in v3.17) is a little-endian 64-bit field
   composed as follows:
   Bit 0:   Kernel endianness.  1 if BE, 0 if LE.
-  Bits 1-63:   Reserved.
+  Bits 1-2:    Kernel page size.   0=unspecified, 1=4K, 2=16K, 3=64K
+  Bits 3-63:   Reserved.

 - When image_size is zero, a bootloader should attempt to keep as much
   memory as possible free for use by the kernel immediately after the
diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
index 8fae0756e175..5def289bda84 100644
--- a/arch/arm64/kernel/image.h
+++ b/arch/arm64/kernel/image.h
@@ -47,7 +47,9 @@
 #define __HEAD_FLAG_BE 0
 #endif

-#define __HEAD_FLAGS   (__HEAD_FLAG_BE << 0)
+#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
+
+#define __HEAD_FLAGS   (__HEAD_FLAG_BE << 0) | (__HEAD_FLAG_PAGE_SIZE << 1)

 /*
  * These will output as part of the Image header, which should be little-endian
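
As a quick sanity check of the encoding, worked out from the PAGE_SHIFT
value of each granule:

    4K  (PAGE_SHIFT = 12): (12 - 10) / 2 = 1
    16K (PAGE_SHIFT = 14): (14 - 10) / 2 = 2
    64K (PAGE_SHIFT = 16): (16 - 10) / 2 = 3

which matches the 1=4K, 2=16K, 3=64K values documented in booting.txt above.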


Re: [PATCH 02/14] arm64: Handle section maps for swapper/idmap

2015-09-02 Thread Ard Biesheuvel
On 2 September 2015 at 11:42, Suzuki K. Poulose  wrote:
> On 02/09/15 10:38, Ard Biesheuvel wrote:
>>
>> On 13 August 2015 at 13:33, Suzuki K. Poulose 
>> wrote:
>>>
>>> From: "Suzuki K. Poulose" 
>>>
>>> We use section maps with 4K page size to create the
>>> swapper/idmaps. So far we have used !64K or 4K checks
>>> to handle the case where we use the section maps. This
>>> patch adds a symbol to make it clear those cases.
>>>
>>
>> That sentence does not make sense.
>
>
> I agree. How about :
>
> "This patch adds a new symbol, 'ARM64_SWAPPER_USES_SECTION_MAPS', to
> handle cases where we use section maps, instead of using the page size
> symbols."
>

Yep, much better


Re: [PATCH 00/14] arm64: 16K translation granule support

2015-09-02 Thread Ard Biesheuvel
On 13 August 2015 at 13:33, Suzuki K. Poulose  wrote:
> From: "Suzuki K. Poulose" 
>
> This series enables the 16K page size support on Linux for arm64.
> This series adds support for 48-bit VA (4 levels), 47-bit VA (3 levels) and
> 36-bit VA (2 levels) with 16K. 16K was a late addition to the architecture
> and is not implemented by all CPUs. Added a check to ensure the
> selected granule size is supported by the CPU, failing which the CPU
> won't proceed with booting.
>
> KVM bits have been tested on a fast model with GICv3 using Andre's kvmtool
> with gicv3 support[1].
>
> Patches 1-7 clean up the kernel page size handling code.
> Patches 8-11 fix some issues with the KVM bits, mainly the fake PGD
>  handling code.
> Patch 12 adds a check to ensure the CPU supports the selected granule size.
> Patches 13-14 add the 16K page size support bits.
>
> This series applies on top of for-next/core branch of the aarch64 tree and is
> also available here:
>
> git://linux-arm.org/linux-skp.git  16k/v1
>


Hi Suzuki,

I have given this a spin on the FVP Base model to check UEFI booting,
and everything seems to work fine. (I tested 2-level and 3-level)
I didn't test the KVM changes, so for all patches except those:

Reviewed-by: Ard Biesheuvel 
Tested-by: Ard Biesheuvel 

Regards,
Ard.


Re: [PATCH 12/14] arm64: Check for selected granule support

2015-09-02 Thread Ard Biesheuvel
On 13 August 2015 at 19:29, Catalin Marinas  wrote:
> On Thu, Aug 13, 2015 at 03:45:07PM +0100, Suzuki K. Poulose wrote:
>> On 13/08/15 13:28, Steve Capper wrote:
>> >On 13 August 2015 at 12:34, Suzuki K. Poulose  
>> >wrote:
>> >>  __enable_mmu:
>> >>+   mrs x1, ID_AA64MMFR0_EL1
>> >>+   ubfx    x2, x1, #ID_AA64MMFR0_TGran_SHIFT, 4
>> >>+   cmp     x2, #ID_AA64MMFR0_TGran_ENABLED
>> >>+   b.ne    __no_granule_support
>> >> ldr x5, =vectors
>> >> msr vbar_el1, x5
>> >> msr ttbr0_el1, x25  // load TTBR0
>> >>@@ -626,3 +643,8 @@ __enable_mmu:
>> >> isb
>> >> br  x27
>> >>  ENDPROC(__enable_mmu)
>> >>+
>> >>+__no_granule_support:
>> >>+   wfe
>> >>+   b __no_granule_support
>> >>+ENDPROC(__no_granule_support)
>> >>--
>> >>1.7.9.5
>> >>
>> >
>> >Is it possible to tell the user that the kernel has failed to boot due
>> >to the kernel granule being unsupported?
>>
>> We don't have anything up at this time. The "looping address" is actually
>> a clue to the (expert) user. Not sure we can do anything until we get
>> something like DEBUG_LL(?)
>
> No.
>
>> Or should we let it continue and end in a panic(?). The current situation
>> can boot a multi-cluster system with the boot cluster having Tgran support
>> (which doesn't make a strong use case, though). I will try out some
>> options and get back to you.
>
> If the boot CPU does not support 16KB pages, in general there isn't much
> we can do since the console printing is done after we enabled the MMU.
> Even mapping the UART address requires fixmap support and the PAGE_SIZE
> is hard-coded in the kernel image. The DT is also mapped at run-time.
>
> While in theory it's possible to fall back to a 4KB page size just
> enough to load the DT and figure out the early console, I suggest we
> just live with the "looping address" clue.
>

Couldn't we allocate some flag bits in the Image header to communicate
the page size to the bootloader?


Re: [PATCH 02/14] arm64: Handle section maps for swapper/idmap

2015-09-02 Thread Ard Biesheuvel
On 13 August 2015 at 13:33, Suzuki K. Poulose  wrote:
> From: "Suzuki K. Poulose" 
>
> We use section maps with 4K page size to create the
> swapper/idmaps. So far we have used !64K or 4K checks
> to handle the case where we use the section maps. This
> patch adds a symbol to make it clear those cases.
>

That sentence does not make sense.

> Cc: Ard Biesheuvel 
> Cc: Mark Rutland 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Signed-off-by: Suzuki K. Poulose 
> ---
>  arch/arm64/include/asm/kernel-pgtable.h |   31 +-
>  arch/arm64/mm/mmu.c |   70 
> ++-
>  2 files changed, 51 insertions(+), 50 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
> index 622929d..5876a36 100644
> --- a/arch/arm64/include/asm/kernel-pgtable.h
> +++ b/arch/arm64/include/asm/kernel-pgtable.h
> @@ -19,6 +19,13 @@
>  #ifndef __ASM_KERNEL_PGTABLE_H
>  #define __ASM_KERNEL_PGTABLE_H
>
> +/* With 4K pages, we use section maps. */
> +#ifdef CONFIG_ARM64_4K_PAGES
> +#define ARM64_SWAPPER_USES_SECTION_MAPS 1
> +#else
> +#define ARM64_SWAPPER_USES_SECTION_MAPS 0
> +#endif
> +
>  /*
>   * The idmap and swapper page tables need some space reserved in the kernel
>   * image. Both require pgd, pud (4 levels only) and pmd tables to (section)
> @@ -28,26 +35,28 @@
>   * could be increased on the fly if system RAM is out of reach for the default
>   * VA range, so 3 pages are reserved in all cases.
>   */
> -#ifdef CONFIG_ARM64_64K_PAGES
> -#define SWAPPER_PGTABLE_LEVELS (CONFIG_PGTABLE_LEVELS)
> -#else
> +#if ARM64_SWAPPER_USES_SECTION_MAPS
>  #define SWAPPER_PGTABLE_LEVELS (CONFIG_PGTABLE_LEVELS - 1)
> +#else
> +#define SWAPPER_PGTABLE_LEVELS (CONFIG_PGTABLE_LEVELS)
>  #endif
>
>  #define SWAPPER_DIR_SIZE   (SWAPPER_PGTABLE_LEVELS * PAGE_SIZE)
>  #define IDMAP_DIR_SIZE (3 * PAGE_SIZE)
>
>  /* Initial memory map size */
> -#ifdef CONFIG_ARM64_64K_PAGES
> -#define SWAPPER_BLOCK_SHIFT    PAGE_SHIFT
> -#define SWAPPER_BLOCK_SIZE     PAGE_SIZE
> -#define SWAPPER_TABLE_SHIFT    PMD_SHIFT
> -#else
> +#if ARM64_SWAPPER_USES_SECTION_MAPS
>  #define SWAPPER_BLOCK_SHIFT    SECTION_SHIFT
>  #define SWAPPER_BLOCK_SIZE     SECTION_SIZE
>  #define SWAPPER_TABLE_SHIFT    PUD_SHIFT
> +#else
> +#define SWAPPER_BLOCK_SHIFT    PAGE_SHIFT
> +#define SWAPPER_BLOCK_SIZE     PAGE_SIZE
> +#define SWAPPER_TABLE_SHIFT    PMD_SHIFT
>  #endif
>
> +/* The size of the initial kernel direct mapping */
> +#define SWAPPER_INIT_MAP_SIZE  (_AC(1, UL) << SWAPPER_TABLE_SHIFT)
>
>  /*
>   * Initial memory map attributes.
> @@ -55,10 +64,10 @@
>  #define SWAPPER_PTE_FLAGS  PTE_TYPE_PAGE | PTE_AF | PTE_SHARED
>  #define SWAPPER_PMD_FLAGS  PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S
>
> -#ifdef CONFIG_ARM64_64K_PAGES
> -#define SWAPPER_MM_MMUFLAGS    PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS
> -#else
> +#if ARM64_SWAPPER_USES_SECTION_MAPS
>  #define SWAPPER_MM_MMUFLAGS    PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS
> +#else
> +#define SWAPPER_MM_MMUFLAGS    PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS
>  #endif
>
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 9211b85..71230488 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -32,6 +32,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -353,14 +354,11 @@ static void __init map_mem(void)
>  * memory addressable from the initial direct kernel mapping.
>  *
>  * The initial direct kernel mapping, located at swapper_pg_dir, gives
> -* us PUD_SIZE (4K pages) or PMD_SIZE (64K pages) memory starting from
> -* PHYS_OFFSET (which must be aligned to 2MB as per
> -* Documentation/arm64/booting.txt).
> +* us PUD_SIZE (with SECTION maps, i.e, 4K) or PMD_SIZE (without
> +* SECTION maps, i.e, 64K pages) memory starting from PHYS_OFFSET
> +* (which must be aligned to 2MB as per Documentation/arm64/booting.txt).
>  */
> -   if (IS_ENABLED(CONFIG_ARM64_64K_PAGES))
> -   limit = PHYS_OFFSET + PMD_SIZE;
> -   else
> -   limit = PHYS_OFFSET + PUD_SIZE;
> +   limit = PHYS_OFFSET + SWAPPER_INIT_MAP_SIZE;
> memblock_set_current_limit(limit);
>
> /* map all the memory banks */
> @@ -371,21 +369,24 @@ static void __init map_mem(void)
> if (start >= end)
> break;
>
> -#ifndef CONFIG_ARM64_64K_PAGES
> -   /*
> -* For the

Re: [PATCH 9/9] arm/arm64: KVM: arch timer: Reset CNTV_CTL to 0

2015-08-31 Thread Ard Biesheuvel
On 31 August 2015 at 10:57, Christoffer Dall
 wrote:
> On Mon, Aug 31, 2015 at 10:46:59AM +0200, Ard Biesheuvel wrote:
>> On 30 August 2015 at 15:54, Christoffer Dall
>>  wrote:
>> > Provide a better quality of implementation and be architecture compliant
>> > on ARMv7 for the architected timer by resetting the CNTV_CTL to 0 on
>> > reset of the timer, and call kvm_timer_update_state(vcpu) at the same
>> > time, ensuring the timer output is not asserted after, for example, a
>> > PSCI system reset.
>> >
>> > This change alone fixes the UEFI reset issue reported by Laszlo back in
>> > February.
>> >
>>
>> Do you have a link to that report? I can't quite remember the details ...
>>
> There was no public mailing list cc'ed on the report, and the bugzilla
> bug was created in a non-public Red Hat instance.
>
> But you were cc'ed on the original mail from Laszlo.  Look for an e-mail
> from February 5th with the subject "virtual timer interrupt stuck across
> guest firmware reboot on KVM".
>

Found it, thanks.


Re: [PATCH 9/9] arm/arm64: KVM: arch timer: Reset CNTV_CTL to 0

2015-08-31 Thread Ard Biesheuvel
On 30 August 2015 at 15:54, Christoffer Dall
 wrote:
> Provide a better quality of implementation and be architecture compliant
> on ARMv7 for the architected timer by resetting the CNTV_CTL to 0 on
> reset of the timer, and call kvm_timer_update_state(vcpu) at the same
> time, ensuring the timer output is not asserted after, for example, a
> PSCI system reset.
>
> This change alone fixes the UEFI reset issue reported by Laszlo back in
> February.
>

Do you have a link to that report? I can't quite remember the details ...

> Cc: Laszlo Ersek 
> Cc: Ard Biesheuvel 
> Cc: Drew Jones 
> Cc: Wei Huang 
> Cc: Peter Maydell 
> Signed-off-by: Christoffer Dall 
> ---
>  virt/kvm/arm/arch_timer.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 747302f..8a0fdfc 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -255,6 +255,15 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> timer->irq.irq = irq->irq;
>
> /*
> +* The bits in CNTV_CTL are architecturally reset to UNKNOWN for ARMv8
> +* and to 0 for ARMv7.  We provide an implementation that always
> +* resets the timer to be disabled and unmasked and is compliant with
> +* the ARMv7 architecture.
> +*/
> +   timer->cntv_ctl = 0;
> +   kvm_timer_update_state(vcpu);
> +
> +   /*
>  * Tell the VGIC that the virtual interrupt is tied to a
>  * physical interrupt. We do that once per VCPU.
>  */
> --
> 2.1.2.330.g565301e.dirty
>


Re: ARM: KVM/XEN: how should we support virt-what?

2015-03-26 Thread Ard Biesheuvel
On 26 March 2015 at 19:49, Ard Biesheuvel  wrote:
> On 26 March 2015 at 19:45, Stefano Stabellini
>  wrote:
>> On Thu, 26 Mar 2015, Andrew Jones wrote:
>>> On Wed, Mar 25, 2015 at 10:44:42AM +0100, Andrew Jones wrote:
>>> > Hello ARM virt maintainers,
>>> >
>>> > I'd like to start a discussion about supporting virt-what[1]. virt-what
>>> > allows userspace to determine if the system it's running on is running
>>> > in a guest, and of what type (KVM, Xen, etc.). Despite it being a best
>>> > effort tool, see the Caveat emptor in [1], it has become quite a useful
>>> > tool, and is showing up in different places, such as OpenStack. If you
>>> > look at the code[2], specifically [3], then you'll see how it works on
>>> > x86, which is to use the dedicated hypervisor cpuid leaves. I'm
>>> > wondering what equivalent we have, or can develop, for arm.
>>> > Here are some thoughts;
>>> > 0) there's already something we can use, and I just need to be told
>>> >about it.
>>> > 1) be as similar as possible to x86 by dedicating some currently
>>> >undefined sysreg bits. This would take buy-in from lots of parties,
>>> >so is not likely the way to go.
>>> > 2) create a specific DT node that will get exposed through sysfs, or
>>> >somewhere.
>>> > 3) same as (2), but just use the nodes currently in mach-virt's DT
>>> >as the indication we're a guest. This would just be a heuristic,
>>> >i.e. "have virtio mmio" && psci.method == hvc, or something,
>>> >and we'd still need a way to know if we're kvm vs. xen vs. ??.
>>> >
>>> > Thanks,
>>> > drew
>>> >
>>> > [1] http://people.redhat.com/~rjones/virt-what/
>>> > [2] http://git.annexia.org/?p=virt-what.git;a=summary
>>> > [3] 
>>> > http://git.annexia.org/?p=virt-what.git;a=blob_plain;f=virt-what-cpuid-helper.c;hb=HEAD
>>>
>>> Thanks everyone for their responses. So, the current summary seems to
>>> be;
>>> 1) Xen has both a DT node and an ACPI table, virt-what can learn how
>>>to probe those.
>>> 2) We don't have anything yet for KVM, and we're reluctant to create a
>>>specific DT node. Anyway, we'd still need to address ACPI booted
>>>guests some other way.
>>>
>>> For a short-term, DT-only, approach we could go with a heuristic, one
>>> that includes Marc's "if hypervisor node exists, then xen, else kvm"
>>> condition.
>>>
>>> How about SMBIOS for a long-term solution that works for both DT and
>>> ACPI? We're not populating SMBIOS for arm guests yet in qemu, but now
>>> that AAVMF has fw_cfg, we should be able to. On x86 we already have
>>> smbios populated from qemu, although not in a way that allows us to
>>> determine kvm vs. xen vs. tcg.
>>
>> I don't think that SMBIOS works with DT.
>>
>
> SMBIOS works fine with DT

... but it needs UEFI ...


Re: ARM: KVM/XEN: how should we support virt-what?

2015-03-26 Thread Ard Biesheuvel
On 26 March 2015 at 19:45, Stefano Stabellini
 wrote:
> On Thu, 26 Mar 2015, Andrew Jones wrote:
>> On Wed, Mar 25, 2015 at 10:44:42AM +0100, Andrew Jones wrote:
>> > Hello ARM virt maintainers,
>> >
>> > I'd like to start a discussion about supporting virt-what[1]. virt-what
>> > allows userspace to determine if the system it's running on is running
>> > in a guest, and of what type (KVM, Xen, etc.). Despite it being a best
>> > effort tool, see the Caveat emptor in [1], it has become quite a useful
>> > tool, and is showing up in different places, such as OpenStack. If you
>> > look at the code[2], specifically [3], then you'll see how it works on
>> > x86, which is to use the dedicated hypervisor cpuid leaves. I'm
>> > wondering what equivalent we have, or can develop, for arm.
>> > Here are some thoughts;
>> > 0) there's already something we can use, and I just need to be told
>> >about it.
>> > 1) be as similar as possible to x86 by dedicating some currently
>> >undefined sysreg bits. This would take buy-in from lots of parties,
>> >so is not likely the way to go.
>> > 2) create a specific DT node that will get exposed through sysfs, or
>> >somewhere.
>> > 3) same as (2), but just use the nodes currently in mach-virt's DT
>> >as the indication we're a guest. This would just be a heuristic,
>> >i.e. "have virtio mmio" && psci.method == hvc, or something,
>> >and we'd still need a way to know if we're kvm vs. xen vs. ??.
>> >
>> > Thanks,
>> > drew
>> >
>> > [1] http://people.redhat.com/~rjones/virt-what/
>> > [2] http://git.annexia.org/?p=virt-what.git;a=summary
>> > [3] 
>> > http://git.annexia.org/?p=virt-what.git;a=blob_plain;f=virt-what-cpuid-helper.c;hb=HEAD
>>
>> Thanks everyone for their responses. So, the current summary seems to
>> be;
>> 1) Xen has both a DT node and an ACPI table, virt-what can learn how
>>to probe those.
>> 2) We don't have anything yet for KVM, and we're reluctant to create a
>>specific DT node. Anyway, we'd still need to address ACPI booted
>>guests some other way.
>>
>> For a short-term, DT-only, approach we could go with a heuristic, one
>> that includes Marc's "if hypervisor node exists, then xen, else kvm"
>> condition.
>>
>> How about SMBIOS for a long-term solution that works for both DT and
>> ACPI? We're not populating SMBIOS for arm guests yet in qemu, but now
>> that AAVMF has fw_cfg, we should be able to. On x86 we already have
>> smbios populated from qemu, although not in a way that allows us to
>> determine kvm vs. xen vs. tcg.
>
> I don't think that SMBIOS works with DT.
>

SMBIOS works fine with DT


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-03-05 Thread Ard Biesheuvel
On 5 March 2015 at 15:58, Catalin Marinas  wrote:
> On Thu, Mar 05, 2015 at 01:26:39PM +0100, Paolo Bonzini wrote:
>> On 05/03/2015 13:03, Catalin Marinas wrote:
>> >> > I'd hate to have to do that. PCI should be entirely probeable
>> >> > given that we tell the guest where the host bridge is, that's
>> >> > one of its advantages.
>> > I didn't say a DT node per device, the DT doesn't know what PCI devices
>> > are available (otherwise it defeats the idea of probing). But we need to
>> > tell the OS where the host bridge is via DT.
>> >
>> > So the guest would be told about two host bridges: one for real devices
>> > and another for virtual devices. These can have different coherency
>> > properties.
>>
>> Yeah, and it would suck that the user needs to know the difference
>> between the coherency properties of the host bridges.
>
> The host needs to know about this, unless we assume full coherency on
> all the platforms. Arguably, Qemu needs to know as well if it is the one
> generating the DT for guest (or at least passing some snippets from the
> host DT).
>
>> It would especially suck if the user has a cluster with different
>> machines, some of them coherent and others non-coherent, and then has to
>> debug why the same configuration works on some machines and not on others.
>
> That's a problem indeed, especially with guest migration. But I don't
> think we have any sane solution here for the bus master DMA.
>
>> To avoid replying in two different places, which of the solutions look
>> to me like something that half-works?  Pretty much all of them, because
>> in the end it is just a processor misfeature.  For example, Intel
>> virtualization extensions let the hypervisor override stage1 translation
>> _if necessary_.  AMD doesn't, but has some other quirky things that let
>> you achieve the same effect.
>
> ARM can override them as well, but only by making them stricter. Otherwise,
> on a weakly ordered architecture, it's not always safe (let's say the
> guest thinks it accesses Strongly Ordered memory and avoids barriers for
> flag updates but the host "upgrades" it to Cacheable which breaks the
> memory order).
>
> If we want the host to enforce guest memory mapping attributes via stage
> 2, we could do it the other way around: get the guests to always assume
> full cache coherency, generating Normal Cacheable mappings, but use the
> stage 2 attributes restriction in the host to make such mappings
> non-cacheable when needed (it works this way on ARM but not in the other
> direction to relax the attributes).
>

This was precisely the idea of the MAIR mangling patch: promote all
stage1 mappings to cacheable, and put the host in control by allowing
it to supersede them with device mappings in stage 2.
But it appears there are too many corner cases where this doesn't
quite work out.
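
Concretely, the promotion boils down to the bit trick in access_mair()
(quoted in full further down this page). A stand-alone C sketch of it,
where 0xbb is the inner/outer write-through read+write-allocate
attribute byte and 0xff the write-back variant that ended up being
needed:

#include <inttypes.h>
#include <stdio.h>

/*
 * Sketch of the MAIR_EL1 mangling from patch #2: every attribute byte
 * with none of bits 7, 5 or 4 set, i.e. Device or outer non-cacheable,
 * is replaced wholesale with the cacheable attribute 'repl'.
 */
static uint64_t mangle_mair(uint64_t val, uint64_t repl)
{
	/* one bit per byte: set iff bits 7, 5 and 4 of that byte are clear */
	uint64_t mask = ~(val >> 7 | val >> 5 | val >> 4) & 0x0101010101010101UL;

	/* blank out the matching bytes, then fill them with 'repl' */
	return (val & ~(mask * 0xff)) | (mask * repl);
}

int main(void)
{
	/* Attr1 = Normal write-back (0xff), Attr0 = Device-nGnRnE (0x00) */
	uint64_t mair = 0xff00;

	/* prints 000000000000ffbb: only the Device byte was rewritten */
	printf("%016" PRIx64 "\n", mangle_mair(mair, 0xbb));
	return 0;
}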


>> In particular, I am not even sure that this is about bus coherency,
>> because this problem does not happen when the device is doing bus master
>> DMA.  Working around coherency for bus master DMA would be easy.
>
> My previous emails on the "dma-coherent" property were only about bus
> master DMA (which would cause the correct selection of the DMA API ops
> in the guest).
>
> But even for bus master DMA, guest OS still needs to be aware of the
> (virtual) device DMA capabilities (cache coherent or not). You may be
> able to work around it in the host (stage 2, explicit cache flushing or
> SMMU attributes) if the guests assumes non-coherency but it's not really
> efficient (nor nice to implement).
>
>> The problem arises with MMIO areas that the guest can reasonably expect
>> to be uncacheable, but that are optimized by the host so that they end
>> up backed by cacheable RAM.  It's perfectly reasonable that the same
>> device needs cacheable mapping with one userspace, and works with
>> uncacheable mapping with another userspace that doesn't optimize the
>> MMIO area to RAM.
>
> Unless the guest allocates the framebuffer itself (e.g.
> dma_alloc_coherent), we can't control the cacheability via
> "dma-coherent" properties as it refers to bus master DMA.
>
> So for MMIO with the buffer allocated by the host (Qemu), the only
> solution I see on ARM is for the host to ensure coherency, either via
> explicit cache maintenance (new KVM API) or by changing the memory
> attributes used by Qemu to access such virtual MMIO.
>
> Basically Qemu is acting as a bus master when reading the framebuffer it
> allocated but the guest considers it a slave access and we don't have a
> way to tell the guest that such accesses should be cacheable, nor can we
> upgrade them via architecture features.
>
>> Currently the VGA framebuffer is the main case where this happen, and I
>> don't expect many more.  Because this is not bus master DMA, it's hard
>> to find a QEMU API that can be hooked to invalidate the cache.  QEMU is
>> just reading from an array of chars.
>
> I now understand the problem better. I was under the impression that the
> guest a

Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-03-04 Thread Ard Biesheuvel
On 4 March 2015 at 13:29, Catalin Marinas  wrote:
> On Wed, Mar 04, 2015 at 12:50:57PM +0100, Ard Biesheuvel wrote:
>> On 4 March 2015 at 12:35, Catalin Marinas  wrote:
>> > On Mon, Mar 02, 2015 at 06:20:19PM -0800, Mario Smarduch wrote:
>> >> On 03/02/2015 08:31 AM, Christoffer Dall wrote:
>> >> > However, my concerns with these patches are on two points:
>> >> >
>> >> > 1. It's not a fix-all.  We still have the case where the guest expects
>> >> > the behavior of device memory (for strong ordering for example) on a RAM
>> >> > region, which we now break.  Similarly this doesn't support the
>> >> > non-coherent DMA to RAM region case.
>> >> >
>> >> > 2. While the code is probably as nice as this kind of stuff gets, it
>> >> > is non-trivial and extremely difficult to debug.  The counter-point here
>> >> > is that we may end up handling other stuff at EL2 for performance reasons
>> >> > in the future.
>> >> >
>> >> > Mainly because of point 1 above, I am leaning to thinking userspace
>> >> > should do the invalidation when it knows it needs to, either through KVM
>> >> > via a memslot flag or through some other syscall mechanism.
>> >
>> > I expressed my concerns as well, I'm definitely against merging this
>> > series.
>>
>> Don't worry, that was never the intention, at least not as-is :-)
>
> I wasn't worried, just wanted to make my position clearer ;).
>
>> I think we have established that the performance hit is not the
>> problem but the correctness is.
>
> I haven't looked at the performance figures but has anyone assessed the
> hit caused by doing cache maintenance in Qemu vs cacheable guest
> accesses (and no maintenance)?
>

No, I don't think so. The performance hit I am referring to is the
performance hit caused by leaving the trapping of VM system register
writes enabled all the time, so that writes to MAIR_EL1 are always
caught. This is why patch #1 implements some of the sysreg write
handling in EL2.

>> I do have a remaining question, though: my original [non-working]
>> approach was to replace uncached mappings with write-through
>> read-allocate write-allocate,
>
> Does it make sense to have write-through and write-allocate at the same
> time? The write-allocate hint would probably be ignored as write-through
> writes do not generate linefills.
>

OK, that answers my question then. The availability of a
write-allocate setting on write-through attributes suggested to me
that writes would go to both the cache and main memory, so that the
write-back cached attribute the host is using for the same memory
would not result in it reading stale data.

>> which I expected would keep the caches
>> in sync with main memory, but apparently I am misunderstanding
>> something here. (This is the reason for s/0xbb/0xff/ in patch #2 to
>> get it to work: it replaces WT/RA/WA with WB/RA/WA)
>>
>> Is there no way to use write-through caching here?
>
> Write-through is considered non-cacheable from a write perspective when
> it does not hit in the cache. AFAIK, it should still be able to hit
> existing cache lines and evict. The ARM ARM states that cache cleaning
> to _PoU_ is not required for coherency when the writes are to
> write-through memory but I have to dig further into the PoC because
> that's what we care about here.
>
> What platform did you test it on? I can't tell what the behaviour of
> system caches is. I know they intercept explicit cache maintenance by VA
> but not sure what happens to write-through writes when they hit in the
> system cache (are they evicted to RAM or not?). If such write-through
> writes are only evicted to the point-of-unification, they won't work
> since non-cacheable accesses go all the way to PoC.
>

This was tested on APM by Drew and Laszlo (thanks, guys).

I have recently received a Seattle myself, but I haven't had time yet
to test these patches on it.

> I need to do more reading through the ARM ARM,

If you say so :-)

> it should be hidden
> somewhere ;).
>

-- 
Ard.


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-03-04 Thread Ard Biesheuvel
On 4 March 2015 at 12:35, Catalin Marinas  wrote:
> (please try to avoid top-posting)
>
> On Mon, Mar 02, 2015 at 06:20:19PM -0800, Mario Smarduch wrote:
>> On 03/02/2015 08:31 AM, Christoffer Dall wrote:
>> > However, my concerns with these patches are on two points:
>> >
>> > 1. It's not a fix-all.  We still have the case where the guest expects
>> > the behavior of device memory (for strong ordering for example) on a RAM
>> > region, which we now break.  Similarly this doesn't support the
>> > non-coherent DMA to RAM region case.
>> >
>> > 2. While the code is probably as nice as this kind of stuff gets, it
>> > is non-trivial and extremely difficult to debug.  The counter-point here
>> > is that we may end up handling other stuff at EL2 for performance reasons
>> > in the future.
>> >
>> > Mainly because of point 1 above, I am leaning to thinking userspace
>> > should do the invalidation when it knows it needs to, either through KVM
>> > via a memslot flag or through some other syscall mechanism.
>
> I expressed my concerns as well, I'm definitely against merging this
> series.
>

Don't worry, that was never the intention, at least not as-is :-)

I think we have established that the performance hit is not the
problem but the correctness is.

I do have a remaining question, though: my original [non-working]
approach was to replace uncached mappings with write-through
read-allocate write-allocate, which I expected would keep the caches
in sync with main memory, but apparently I am misunderstanding
something here. (This is the reason for s/0xbb/0xff/ in patch #2 to
get it to work: it replaces WT/RA/WA with WB/RA/WA)

Is there no way to use write-through caching here?

>> I don't understand how the CPU can handle different cache attributes
>> used by QEMU and the guest; won't you run into the B2.9 checklist? Wouldn't
>> cache evictions or cleans wipe out guest updates to the same cache
>> line(s)?
>
> "Clean+invalidate" is a safe operation even if the guest accesses the
> memory in a cacheable way. But if the guest can update the cache lines,
> Qemu should avoid cache maintenance from a performance perspective.
>
> The guest is either told that the DMA is coherent (via DT properties) or
> Qemu deals with (non-)coherency itself. The latter is fully in line with
> the B2.9 chapter in the ARM ARM, more precisely point 5:
>
>   If the mismatched attributes for a memory location all assign the same
>   shareability attribute to the location, any loss of uniprocessor
>   semantics or coherency within a shareability domain can be avoided by
>   use of software cache management.
>
> ... it continues with what kind of cache maintenance is required,
> together with:
>
>   A clean and invalidate instruction can be used instead of a clean
>   instruction, or instead of an invalidate instruction.
>
> --
> Catalin


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-02-24 Thread Ard Biesheuvel
On 24 February 2015 at 14:55, Andrew Jones  wrote:
> On Fri, Feb 20, 2015 at 04:36:26PM +0100, Andrew Jones wrote:
>> On Fri, Feb 20, 2015 at 02:37:25PM +0000, Ard Biesheuvel wrote:
>> > On 20 February 2015 at 14:29, Andrew Jones  wrote:
>> > > So looks like the 3 orders of magnitude greater number of traps
>> > > (only to el2) don't impact kernel compiles.
>> > >
>> >
>> > OK, good! That was what I was hoping for, obviously.
>> >
>> > > Then I thought I'd be able to quick measure the number of cycles
>> > > a trap to el2 takes with this kvm-unit-tests test
>> > >
>> > > int main(void)
>> > > {
>> > > unsigned long start, end;
>> > > unsigned int sctlr;
>> > >
>> > > asm volatile(
>> > > "   mrs %0, sctlr_el1\n"
>> > > "   msr pmcr_el0, %1\n"
>> > > : "=&r" (sctlr) : "r" (5));
>> > >
>> > > asm volatile(
>> > > "   mrs %0, pmccntr_el0\n"
>> > > "   msr sctlr_el1, %2\n"
>> > > "   mrs %1, pmccntr_el0\n"
>> > > : "=&r" (start), "=&r" (end) : "r" (sctlr));
>> > >
>> > > printf("%llx\n", end - start);
>> > > return 0;
>> > > }
>> > >
>> > > after applying this patch to kvm
>> > >
>> > > diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>> > > index bb91b6fc63861..5de39d740aa58 100644
>> > > --- a/arch/arm64/kvm/hyp.S
>> > > +++ b/arch/arm64/kvm/hyp.S
>> > > @@ -770,7 +770,7 @@
>> > >
>> > > mrs x2, mdcr_el2
>> > > and x2, x2, #MDCR_EL2_HPMN_MASK
>> > > -   orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
>> > > +// orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
>> > > orr x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
>> > >
>> > > // Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
>> > >
>> > > But I get zero for the cycle count. Not sure what I'm missing.
>> > >
>> >
>> > No clue tbh. Does the counter work as expected in the host?
>> >
>>
>> Guess not. I dropped the test into a module_init and inserted
>> it on the host. Always get zero for pmccntr_el0 reads. Or, if
>> I set it to something non-zero with a write, then I always get
>> that back - no increments. pmcr_el0 looks OK... I had forgotten
>> to set bit 31 of pmcntenset_el0, but doing that still doesn't
>> help. Anyway, I assume the problem is me. I'll keep looking to
>> see what I'm missing.
>>
>
> I returned to this and see that the problem was indeed me. I needed yet
> another enable bit set (the filter register needed to be instructed to
> count cycles while in el2). I've attached the code for the curious.
> The numbers are mean=6999, std_dev=242. Run on the host, or in a guest
> running on a host without this patch series (after TVM traps have been
> disabled), I get a pretty consistent 40.
>
> I checked how many vm-sysreg traps we do during the kernel compile
> benchmark. It's 124924. So it's a bit strange that we don't see the
> benchmark taking 10 to 20 seconds longer on average. I should probably
> double check my runs. In any case, while I like the approach of this
> series, the overhead is looking non-negligible.
>

Thanks a lot for producing these numbers. 125k x 7k == <1 billion
cycles == <1 second on a >1 GHz machine, I think?
Or am I missing something? How long does the actual compile take?
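
(Working Drew's numbers through: 124,924 traps at a mean of ~6,999
cycles each comes to roughly 8.7 * 10^8 cycles, i.e. about 0.87 s at
1 GHz. Against the ~3:06 (~186 s) wall-clock compiles quoted elsewhere
in the thread, that is under 0.5%, consistent with the benchmark not
moving.)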

-- 
Ard.


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-02-20 Thread Ard Biesheuvel
On 20 February 2015 at 14:29, Andrew Jones  wrote:
> On Thu, Feb 19, 2015 at 06:57:24PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 19/02/2015 18:55, Andrew Jones wrote:
>> >> > > (I don't have an exact number for how many times it went to EL1 
>> >> > > because
>> >> > >  access_mair() doesn't have a trace point.)
>> >> > > (I got the 62873 number by testing a 3rd kernel build that only had 
>> >> > > patch
>> >> > >  3/3 applied to the base, and counting kvm_toggle_cache events.)
>> >> > > (The number 50 is the number of kvm_toggle_cache events *without* 3/3
>> >> > >  applied.)
>> >> > >
>> >> > > I consider this bad news because, even considering it only goes to 
>> >> > > EL2,
>> >> > > it goes a ton more than it used to. I realize patch 3/3 isn't the 
>> >> > > final
>> >> > > plan for enabling traps though.
>>
>> If a full guest boots, can you try timing a kernel compile?
>>
>
> Guests boot. I used an 8 vcpu, 14G memory guest; compiled the kernel 4
> times inside the guest for each host kernel; base and mair. I dropped
> the time from the first run of each set, and captured the other 3.
> Command line used below. Time is from the
>   Elapsed (wall clock) time (h:mm:ss or m:ss):
> output of /usr/bin/time - the host's wall clock.
>
>   /usr/bin/time --verbose ssh $VM 'cd kernel && make -s clean && make -s -j8'
>
> Results:
> base: 3:06.11 3:07.00 3:10.93
> mair: 3:08.47 3:06.75 3:04.76
>
> So looks like the 3 orders of magnitude greater number of traps
> (only to el2) don't impact kernel compiles.
>

OK, good! That was what I was hoping for, obviously.

> Then I thought I'd be able to quick measure the number of cycles
> a trap to el2 takes with this kvm-unit-tests test
>
> int main(void)
> {
> unsigned long start, end;
> unsigned int sctlr;
>
> asm volatile(
> "   mrs %0, sctlr_el1\n"
> "   msr pmcr_el0, %1\n"
> : "=&r" (sctlr) : "r" (5));
>
> asm volatile(
> "   mrs %0, pmccntr_el0\n"
> "   msr sctlr_el1, %2\n"
> "   mrs %1, pmccntr_el0\n"
> : "=&r" (start), "=&r" (end) : "r" (sctlr));
>
> printf("%llx\n", end - start);
> return 0;
> }
>
> after applying this patch to kvm
>
> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> index bb91b6fc63861..5de39d740aa58 100644
> --- a/arch/arm64/kvm/hyp.S
> +++ b/arch/arm64/kvm/hyp.S
> @@ -770,7 +770,7 @@
>
> mrs x2, mdcr_el2
> and x2, x2, #MDCR_EL2_HPMN_MASK
> -   orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
> +// orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
> orr x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
>
> // Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
>
> But I get zero for the cycle count. Not sure what I'm missing.
>

No clue tbh. Does the counter work as expected in the host?
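
The resolution, per Drew's follow-up higher up this page, was that the
counter itself was fine; the cycle counter was simply never told to
count at EL2. A sketch of a working variant of the test, assuming the
ARMv8 PMU bit layout (PMCR_EL0.E+C to enable and reset,
PMCNTENSET_EL0 bit 31 for the cycle counter, PMCCFILTR_EL0 bit 27
(NSH) to count while in EL2; older assemblers may need the
s3_3_c14_c15_7 spelling) and kvm-unit-tests' libcflat for printf():

#include <libcflat.h>

int main(void)
{
	unsigned long start, end, sctlr;

	/* enable and reset the cycle counter, and let it count in EL2 too */
	asm volatile(
	"	msr	pmccfiltr_el0, %0\n"	/* bit 27 (NSH): count in EL2 */
	"	msr	pmcntenset_el0, %1\n"	/* bit 31: cycle counter on */
	"	msr	pmcr_el0, %2\n"		/* E + C: enable, reset cycles */
	"	isb\n"
	: : "r" (1ul << 27), "r" (1ul << 31), "r" (5ul));

	/* time a single trapping sysreg write (host keeps HCR_EL2.TVM set) */
	asm volatile(
	"	mrs	%2, sctlr_el1\n"
	"	mrs	%0, pmccntr_el0\n"
	"	msr	sctlr_el1, %2\n"
	"	mrs	%1, pmccntr_el0\n"
	: "=&r" (start), "=&r" (end), "=&r" (sctlr));

	printf("%lu\n", end - start);
	return 0;
}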

-- 
Ard.


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-02-19 Thread Ard Biesheuvel

> On 19 Feb 2015, at 17:55, Andrew Jones  wrote:
> 
>> On Thu, Feb 19, 2015 at 05:19:35PM +, Ard Biesheuvel wrote:
>>> On 19 February 2015 at 16:57, Andrew Jones  wrote:
>>>> On Thu, Feb 19, 2015 at 10:54:43AM +, Ard Biesheuvel wrote:
>>>> This is a 0th order approximation of how we could potentially force the 
>>>> guest
>>>> to avoid uncached mappings, at least from the moment the MMU is on. (Before
>>>> that, all of memory is implicitly classified as Device-nGnRnE)
>>>> 
>>>> The idea (patch #2) is to trap writes to MAIR_EL1, and replace uncached 
>>>> mappings
>>>> with cached ones. This way, there is no need to mangle any guest page 
>>>> tables.
>>>> 
>>>> The downside is that, to do this correctly, we need to always trap writes 
>>>> to
>>>> the VM sysreg group, which includes registers that the guest may write to 
>>>> very
>>>> often. To reduce the associated performance hit, patch #1 introduces a 
>>>> fast path
>>>> for EL2 to perform trivial sysreg writes on behalf of the guest, without 
>>>> the
>>>> need for a full world switch to the host and back.
>>>> 
>>>> The main purpose of these patches is to quantify the performance hit, and
>>>> verify whether the MAIR_EL1 handling works correctly.
>>>> 
>>>> Ard Biesheuvel (3):
>>>>  arm64: KVM: handle some sysreg writes in EL2
>>>>  arm64: KVM: mangle MAIR register to prevent uncached guest mappings
>>>>  arm64: KVM: keep trapping of VM sysreg writes enabled
>>> 
>>> Hi Ard,
>>> 
>>> I took this series for test drive. Unfortunately I have bad news and worse
>>> news. First, a description of the test; simply boot a guest, once at login,
>>> login, and then shutdown with 'poweroff'. The guest boots through AAVMF 
>>> using
>>> a build from Laszlo that enables PCI, but does *not* have the 'map pci mmio
>>> as cached' kludge. This test allows us to check for corrupt vram on the
>>> graphical console, plus it completes a boot/shutdown cycle allowing us to
>>> count sysreg traps of the boot/shutdown cycle.
>> 
>> Thanks a lot for giving this a spin right away!
>> 
>>> So, the bad news
>>> 
>>> Before this series we trapped 50 times on sysreg writes with the test
>>> described above. With this series we trap 62873 times. But, less than
>>> 20 required going to EL1.
>> 
>> OK, this is very useful information. We still don't know what the
>> penalty is of all those traps, but that's quite a big number indeed.
>> 
>>> (I don't have an exact number for how many times it went to EL1 because
>>> access_mair() doesn't have a trace point.)
>>> (I got the 62873 number by testing a 3rd kernel build that only had patch
>>> 3/3 applied to the base, and counting kvm_toggle_cache events.)
>>> (The number 50 is the number of kvm_toggle_cache events *without* 3/3
>>> applied.)
>>> 
>>> I consider this bad news because, even considering it only goes to EL2,
>>> it goes a ton more than it used to. I realize patch 3/3 isn't the final
>>> plan for enabling traps though.
>>> 
>>> And, now the worse news
>>> 
>>> The vram corruption persists with this patch series.
>> 
>> OK, so the primary difference is that I am not substituting for write
>> back mappings, as Laszlo is doing in his patch.
>> If you have energy left, would you mind having another go but use 0xff
>> (not 0xbb) for the MAIR values in patch #2?
> 
> Yup, a bit energy left, and, yup, 0xff fixes it

OK, so that means we'd need to map as writeback cacheable by default,
and restrict it as necessary at stage 2.

thanks


> Thanks,
> drew
> 
>> 
>>>> 
>>>> arch/arm/kvm/mmu.c   |   2 +-
>>>> arch/arm64/include/asm/kvm_arm.h |   2 +-
>>>> arch/arm64/kvm/hyp.S | 101 +++
>>>> arch/arm64/kvm/sys_regs.c    |  63 
>>>> 4 files changed, 156 insertions(+), 12 deletions(-)
>>>> 
>>>> --
>>>> 1.8.3.2
>>>> 
>>>> ___
>>>> kvmarm mailing list
>>>> kvm...@lists.cs.columbia.edu
>>>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-02-19 Thread Ard Biesheuvel
On 19 February 2015 at 16:57, Andrew Jones  wrote:
> On Thu, Feb 19, 2015 at 10:54:43AM +0000, Ard Biesheuvel wrote:
>> This is a 0th order approximation of how we could potentially force the guest
>> to avoid uncached mappings, at least from the moment the MMU is on. (Before
>> that, all of memory is implicitly classified as Device-nGnRnE)
>>
>> The idea (patch #2) is to trap writes to MAIR_EL1, and replace uncached 
>> mappings
>> with cached ones. This way, there is no need to mangle any guest page tables.
>>
>> The downside is that, to do this correctly, we need to always trap writes to
>> the VM sysreg group, which includes registers that the guest may write to 
>> very
>> often. To reduce the associated performance hit, patch #1 introduces a fast 
>> path
>> for EL2 to perform trivial sysreg writes on behalf of the guest, without the
>> need for a full world switch to the host and back.
>>
>> The main purpose of these patches is to quantify the performance hit, and
>> verify whether the MAIR_EL1 handling works correctly.
>>
>> Ard Biesheuvel (3):
>>   arm64: KVM: handle some sysreg writes in EL2
>>   arm64: KVM: mangle MAIR register to prevent uncached guest mappings
>>   arm64: KVM: keep trapping of VM sysreg writes enabled
>
> Hi Ard,
>
> I took this series for test drive. Unfortunately I have bad news and worse
> news. First, a description of the test; simply boot a guest, once at login,
> login, and then shutdown with 'poweroff'. The guest boots through AAVMF using
> a build from Laszlo that enables PCI, but does *not* have the 'map pci mmio
> as cached' kludge. This test allows us to check for corrupt vram on the
> graphical console, plus it completes a boot/shutdown cycle allowing us to
> count sysreg traps of the boot/shutdown cycle.
>

Thanks a lot for giving this a spin right away!

> So, the bad news
>
> Before this series we trapped 50 times on sysreg writes with the test
> described above. With this series we trap 62873 times. But, less than
> 20 required going to EL1.
>

OK, this is very useful information. We still don't know what the
penalty is of all those traps, but that's quite a big number indeed.

> (I don't have an exact number for how many times it went to EL1 because
>  access_mair() doesn't have a trace point.)
> (I got the 62873 number by testing a 3rd kernel build that only had patch
>  3/3 applied to the base, and counting kvm_toggle_cache events.)
> (The number 50 is the number of kvm_toggle_cache events *without* 3/3
>  applied.)
>
> I consider this bad news because, even considering it only goes to EL2,
> it goes a ton more than it used to. I realize patch 3/3 isn't the final
> plan for enabling traps though.
>
> And, now the worse news
>
> The vram corruption persists with this patch series.
>

OK, so the primary difference is that I am not substituting for write
back mappings, as Laszlo is doing in his patch.
If you have energy left, would you mind having another go but use 0xff
(not 0xbb) for the MAIR values in patch #2?

>>
>>  arch/arm/kvm/mmu.c   |   2 +-
>>  arch/arm64/include/asm/kvm_arm.h |   2 +-
>>  arch/arm64/kvm/hyp.S | 101 +++
>>  arch/arm64/kvm/sys_regs.c    |  63 
>>  4 files changed, 156 insertions(+), 12 deletions(-)
>>
>> --
>> 1.8.3.2
>>
>> ___
>> kvmarm mailing list
>> kvm...@lists.cs.columbia.edu
>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-02-19 Thread Ard Biesheuvel
On 19 February 2015 at 15:27, Alexander Graf  wrote:
>
>
> On 19.02.15 15:56, Ard Biesheuvel wrote:
>> On 19 February 2015 at 14:50, Alexander Graf  wrote:
>>>
>>>
>>> On 19.02.15 11:54, Ard Biesheuvel wrote:
>>>> This is a 0th order approximation of how we could potentially force the 
>>>> guest
>>>> to avoid uncached mappings, at least from the moment the MMU is on. (Before
>>>> that, all of memory is implicitly classified as Device-nGnRnE)
>>>>
>>>> The idea (patch #2) is to trap writes to MAIR_EL1, and replace uncached 
>>>> mappings
>>>> with cached ones. This way, there is no need to mangle any guest page 
>>>> tables.
>>>
>>> Would you mind giving a brief explanation of what this does? What
>>> happens to actually assigned devices that need to be mapped as uncached?
>>> What happens to DMA from such devices when the guest assumes that it's
>>> accessing RAM uncached and then triggers DMA?
>>>
>>
>> On ARM, stage 2 mappings that are more strict will supersede stage 1
>> mappings, so the idea is to use cached mappings exclusively for stage
>> 1 so that the host is fully in control of the actual memory attributes
>> by setting the attributes at stage 2. This also makes sense because
>> the host will ultimately know better whether some range that the guest
>> thinks is a device is actually a device or just emulated (no stage 2
>> mapping), backed by host memory (such as the NOR flash read case) or
>> backed by a passthrough device.
>
> Ok, so that means if the guest maps RAM as uncached, it will actually
> end up as cached memory. Now if the guest triggers a DMA request to a
> passed through device to that RAM, it will conflict with the cache.
>
> I don't know whether it's a big deal, but it's the scenario that came up
> with the approach above before when I talked to people about it.
>

Well, I am using write-through read+write allocate, which hopefully
means that the actual RAM is kept in sync with the cache, but I must
confess I am a bit out of my depth here with the fine print in the ARM
ARM.


Re: [RFC/RFT PATCH 3/3] arm64: KVM: keep trapping of VM sysreg writes enabled

2015-02-19 Thread Ard Biesheuvel
On 19 February 2015 at 15:19, Marc Zyngier  wrote:
> On 19/02/15 13:44, Ard Biesheuvel wrote:
>> On 19 February 2015 at 13:40, Marc Zyngier  wrote:
>>> On 19/02/15 10:54, Ard Biesheuvel wrote:
>>>> ---
>>>>  arch/arm/kvm/mmu.c   | 2 +-
>>>>  arch/arm64/include/asm/kvm_arm.h | 2 +-
>>>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index 136662547ca6..fa8ec55220ea 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> @@ -1530,7 +1530,7 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool 
>>>> was_enabled)
>>>>   stage2_flush_vm(vcpu->kvm);
>>>>
>>>>   /* Caches are now on, stop trapping VM ops (until a S/W op) */
>>>> - if (now_enabled)
>>>> + if (0)//now_enabled)
>>>>   vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) & ~HCR_TVM);
>>>>
>>>>   trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
>>>> diff --git a/arch/arm64/include/asm/kvm_arm.h 
>>>> b/arch/arm64/include/asm/kvm_arm.h
>>>> index 8afb863f5a9e..437e1ec17539 100644
>>>> --- a/arch/arm64/include/asm/kvm_arm.h
>>>> +++ b/arch/arm64/include/asm/kvm_arm.h
>>>> @@ -75,7 +75,7 @@
>>>>   * FMO:  Override CPSR.F and enable signaling with VF
>>>>   * SWIO: Turn set/way invalidates into set/way clean+invalidate
>>>>   */
>>>> -#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
>>>> +#define HCR_GUEST_FLAGS (HCR_TSC | /* HCR_TSW | */ HCR_TWE | HCR_TWI | HCR_VM | \
>>>
>>> Why do we stop to trap S/W ops here? We can't let the guest issue those
>>> without doing anything, as this will break anything that expects the
>>> data to make it to memory. Think of the 32bit kernel decompressor, for
>>> example.
>>>
>>
>> TBH patch #3 is just a q'n'd hack to ensure that the TVM bit remains
>> set in HCR. I was assuming that cleaning the entire cache on mmu
>> enable/disable would be sufficient to quantify the performance impact
>> and check whether patch #2 works as advertised.
>
> OK.
>
>> I was wondering: isn't calling stage2_flush_vm() for each set of each
>> way very costly?
>
> It's only called once, when TVM is not set. We then set TVM to make sure
> that this doesn't happen anymore, until we stop trapping.
>

Ah, right, of course.

> Of course, with your new approach, this doesn't work anymore and we'd
> need to find another approach.
>

Well, *if* this approach is feasible in the first place, I'm sure we
can find another bit to set to keep track of whether to perform the
s/w operations. For now, I just ripped it out because I was afraid it
might skew performance measurements done with the TVM kept set.

-- 
Ard.


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-02-19 Thread Ard Biesheuvel
On 19 February 2015 at 14:50, Alexander Graf  wrote:
>
>
> On 19.02.15 11:54, Ard Biesheuvel wrote:
>> This is a 0th order approximation of how we could potentially force the guest
>> to avoid uncached mappings, at least from the moment the MMU is on. (Before
>> that, all of memory is implicitly classified as Device-nGnRnE)
>>
>> The idea (patch #2) is to trap writes to MAIR_EL1, and replace uncached 
>> mappings
>> with cached ones. This way, there is no need to mangle any guest page tables.
>
> Would you mind giving a brief explanation of what this does? What
> happens to actually assigned devices that need to be mapped as uncached?
> What happens to DMA from such devices when the guest assumes that it's
> accessing RAM uncached and then triggers DMA?
>

On ARM, stage 2 mappings that are more strict will supersede stage 1
mappings, so the idea is to use cached mappings exclusively for stage
1 so that the host is fully in control of the actual memory attributes
by setting the attributes at stage 2. This also makes sense because
the host will ultimately know better whether some range that the guest
thinks is a device is actually a device or just emulated (no stage 2
mapping), backed by host memory (such as the NOR flash read case) or
backed by a passthrough device.
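
A toy model of that combining rule; the strict ordering below is a
simplification of the ARM ARM's stage-1/stage-2 attribute resolution,
but it is enough to show why stage 2 can only ever tighten things:

#include <stdio.h>

/*
 * Device beats Normal Non-cacheable beats Write-Through beats
 * Write-Back: the host can demote a guest's cacheable stage 1 mapping
 * at stage 2, but can never relax a Device mapping back to cacheable.
 */
enum mem_attr { DEVICE = 0, NORMAL_NC, NORMAL_WT, NORMAL_WB };

static enum mem_attr combine_s1_s2(enum mem_attr s1, enum mem_attr s2)
{
	return s1 < s2 ? s1 : s2;	/* the stricter attribute wins */
}

int main(void)
{
	/* guest maps Device, host stage 2 says write-back: Device wins */
	printf("%d\n", combine_s1_s2(DEVICE, NORMAL_WB));	/* 0 */
	/* guest promoted to write-back, host restricts to NC at stage 2 */
	printf("%d\n", combine_s1_s2(NORMAL_WB, NORMAL_NC));	/* 1 */
	return 0;
}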

-- 
Ard.


>>
>> The downside is that, to do this correctly, we need to always trap writes to
>> the VM sysreg group, which includes registers that the guest may write to 
>> very
>> often. To reduce the associated performance hit, patch #1 introduces a fast 
>> path
>> for EL2 to perform trivial sysreg writes on behalf of the guest, without the
>> need for a full world switch to the host and back.
>>
>> The main purpose of these patches is to quantify the performance hit, and
>> verify whether the MAIR_EL1 handling works correctly.
>>
>> Ard Biesheuvel (3):
>>   arm64: KVM: handle some sysreg writes in EL2
>>   arm64: KVM: mangle MAIR register to prevent uncached guest mappings
>>   arm64: KVM: keep trapping of VM sysreg writes enabled
>>
>>  arch/arm/kvm/mmu.c   |   2 +-
>>  arch/arm64/include/asm/kvm_arm.h |   2 +-
>>  arch/arm64/kvm/hyp.S | 101 +++
>>  arch/arm64/kvm/sys_regs.c    |  63 
>>  4 files changed, 156 insertions(+), 12 deletions(-)
>>


Re: [RFC/RFT PATCH 3/3] arm64: KVM: keep trapping of VM sysreg writes enabled

2015-02-19 Thread Ard Biesheuvel
On 19 February 2015 at 13:40, Marc Zyngier  wrote:
> On 19/02/15 10:54, Ard Biesheuvel wrote:
>> ---
>>  arch/arm/kvm/mmu.c   | 2 +-
>>  arch/arm64/include/asm/kvm_arm.h | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 136662547ca6..fa8ec55220ea 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -1530,7 +1530,7 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool 
>> was_enabled)
>>   stage2_flush_vm(vcpu->kvm);
>>
>>   /* Caches are now on, stop trapping VM ops (until a S/W op) */
>> - if (now_enabled)
>> + if (0)//now_enabled)
>>   vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) & ~HCR_TVM);
>>
>>   trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
>> diff --git a/arch/arm64/include/asm/kvm_arm.h 
>> b/arch/arm64/include/asm/kvm_arm.h
>> index 8afb863f5a9e..437e1ec17539 100644
>> --- a/arch/arm64/include/asm/kvm_arm.h
>> +++ b/arch/arm64/include/asm/kvm_arm.h
>> @@ -75,7 +75,7 @@
>>   * FMO:  Override CPSR.F and enable signaling with VF
>>   * SWIO: Turn set/way invalidates into set/way clean+invalidate
>>   */
>> -#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
>> +#define HCR_GUEST_FLAGS (HCR_TSC | /* HCR_TSW | */ HCR_TWE | HCR_TWI | HCR_VM | \
>
> Why do we stop to trap S/W ops here? We can't let the guest issue those
> without doing anything, as this will break anything that expects the
> data to make it to memory. Think of the 32bit kernel decompressor, for
> example.
>

TBH patch #3 is just a q'n'd hack to ensure that the TVM bit remains
set in HCR. I was assuming that cleaning the entire cache on mmu
enable/disable would be sufficient to quantify the performance impact
and check whether patch #2 works as advertised.

I was wondering: isn't calling stage2_flush_vm() for each set of each
way very costly?


[RFC/RFT PATCH 2/3] arm64: KVM: mangle MAIR register to prevent uncached guest mappings

2015-02-19 Thread Ard Biesheuvel
Mangle the memory attribute register values at each write to MAIR_EL1
so that regions that the guest intends to map as device or uncached are
in fact mapped as cached instead. This avoids incoherency issues when
the guest bypasses the caches to access memory that the host has mapped
as cached.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/kvm/sys_regs.c | 35 ++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1e170eab6603..bde2b49a7cd8 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -110,6 +110,39 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
return true;
 }
 
+static bool access_mair(struct kvm_vcpu *vcpu,
+   const struct sys_reg_params *p,
+   const struct sys_reg_desc *r)
+{
+   unsigned long val, mask;
+
+   BUG_ON(!p->is_write);
+
+   val = *vcpu_reg(vcpu, p->Rt);
+
+   if (!p->is_aarch32) {
+   /*
+* Mangle val so that all device and uncached attributes are
+* replaced with cached attributes.
+* For each attribute, check whether any of bit 7, bit 5 or bit
+* 4 are set. If not, it is a device or outer non-cacheable
+* mapping and we override it with inner, outer write-through,
+* read+write-allocate (0xbb).
+* TODO: handle outer cacheable inner non-cacheable
+*/
+   mask = ~(val >> 7 | val >> 5 | val >> 4) & 0x0101010101010101UL;
+   val = (val & ~(mask * 0xff)) | (mask * 0xbb);
+
+   vcpu_sys_reg(vcpu, r->reg) = val;
+   } else {
+   if (!p->is_32bit)
+   vcpu_cp15_64_high(vcpu, r->reg) = val >> 32;
+   vcpu_cp15_64_low(vcpu, r->reg) = val & 0xffffffffUL;
+   }
+
+   return true;
+}
+
 static bool trap_raz_wi(struct kvm_vcpu *vcpu,
const struct sys_reg_params *p,
const struct sys_reg_desc *r)
@@ -371,7 +404,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
/* MAIR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0010), Op2(0b000),
- access_vm_reg, reset_unknown, MAIR_EL1 },
+ access_mair, reset_unknown, MAIR_EL1 },
/* AMAIR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0011), Op2(0b000),
  access_handled_at_el2, reset_amair_el1, AMAIR_EL1 },
-- 
1.8.3.2



[RFC/RFT PATCH 1/3] arm64: KVM: handle some sysreg writes in EL2

2015-02-19 Thread Ard Biesheuvel
This adds handling to el1_trap() to perform some sysreg writes directly
in EL2, without performing the full world switch to the host and back
again. This is mainly for doing writes that don't need special handling,
but where the register is part of the group that we need to trap for
other reasons.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/kvm/hyp.S  | 101 ++
 arch/arm64/kvm/sys_regs.c |  28 -
 2 files changed, 120 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index c3ca89c27c6b..e3af6840cb3f 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CPU_GP_REG_OFFSET(x)   (CPU_GP_REGS + x)
 #define CPU_XREG_OFFSET(x) CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
@@ -887,6 +888,34 @@
 1:
 .endm
 
+/*
+ * Macro to conditionally perform a parametrised system register write. Note
+ * that we currently only support writing x3 to a system register in class
+ * Op0 == 3 and Op1 == 0, which is all we need at the moment.
+ */
+.macro cond_sysreg_write,op0,op1,crn,crm,op2,sreg,opreg,outlbl
+   .ifnc   \op0,3; .err ; .endif
+   .ifnc   \op1,0; .err ; .endif
+   .ifnc   \opreg,x3 ; .err ; .endif
+   cmp \sreg, #((\crm) | ((\crn) << 4) | ((\op2) << 8))
+   bne f
+   // doesn't work: msr_s sys_reg(\op0,\op1,\crn,\crm,\op2), \opreg
+   .inst   0xd5180003|((\crn) << 12)|((\crm) << 8)|((\op2 << 5))
+   b   \outlbl
+:
+.endm
+
+/*
+ * Pack CRn, CRm and Op2 into 11 adjacent low bits so we can use a single
+ * cmp instruction to compare it with a 12-bit immediate.
+ */
+.macro pack_sysreg_idx, outreg, inreg
+   ubfm\outreg, \inreg, #(17 - 8), #(17 + 2)   // Op2 -> bits 8 - 10
+   bfm \outreg, \inreg, #(10 - 4), #(10 + 3)   // CRn -> bits 4 - 7
+   bfm \outreg, \inreg, #(1 - 0), #(1 + 3) // CRm -> bits 0 - 3
+.endm
+
+
 __save_sysregs:
save_sysregs
ret
@@ -1178,6 +1207,15 @@ el1_trap:
 * x1: ESR
 * x2: ESR_EC
 */
+
+   /*
+* Find out if the exception we are about to pass to the host is a
+* write to a system register, which we may prefer to handle in EL2.
+*/
+   tst x1, #1  // direction == write (0) ?
+   ccmp    x2, #ESR_EL2_EC_SYS64, #0, eq   // is a sysreg access?
+   b.eq    4f
+
cmp x2, #ESR_EL2_EC_DABT
mov x0, #ESR_EL2_EC_IABT
ccmp    x2, x0, #4, ne
@@ -1239,6 +1277,69 @@ el1_trap:
 
eret
 
+4: and x2, x1, #(3 << 20)  // check for Op0 == 0b11
+   cmp x2, #(3 << 20)
+   b.ne    1b
+   ands    x2, x1, #(7 << 14)  // check for Op1 == 0b000
+   b.ne    1b
+
+   /*
+* If we end up here, we are about to perform a system register write
+* with Op0 == 0b11 and Op1 == 0b000. Move the operand to x3 first, we
+* will check later if we are actually going to handle this write in EL2
+*/
+   adr x0, 5f
+   ubfx    x2, x1, #5, #5  // operand reg# in bits 9 .. 5
+   add x0, x0, x2, lsl #3
+   br  x0
+5: ldr x3, [sp, #16]   // x0 from the stack
+   b   6f
+   ldr x3, [sp, #24]   // x1 from the stack
+   b   6f
+   ldr x3, [sp]// x2 from the stack
+   b   6f
+   ldr x3, [sp, #8]// x3 from the stack
+   b   6f
+   .irp    reg,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
+   mov x3, x\reg
+   b   6f
+   .endr
+   mov x3, xzr // x31
+
+   /*
+* Ok, so now we have the desired value in x3, let's write it into the
+* sysreg if it's a register write we want to handle in EL2. Since these
+* are tried in order, it makes sense to put the ones used most often at
+* the top.
+*/
+6: pack_sysreg_idx x2, x1
+   cond_sysreg_write   3,0, 2,0,0,x2,x3,7f // TTBR0_EL1
+   cond_sysreg_write   3,0, 2,0,1,x2,x3,7f // TTBR1_EL1
+   cond_sysreg_write   3,0, 2,0,2,x2,x3,7f // TCR_EL1
+   cond_sysreg_write   3,0, 5,2,0,x2,x3,7f // ESR_EL1
+   cond_sysreg_write   3,0, 6,0,0,x2,x3,7f // FAR_EL1
+   cond_sysreg_write   3,0, 5,1,0,x2,x3,7f // AFSR0_EL1
+   cond_sysreg_write   3,0, 5,1,1,x2,x3,7f // AFSR1_EL1
+   cond_sysreg_write   3,0,10,3,0,x2,x3,7f // AMAIR_EL1
+   cond_sysreg_write   3,0,13,0,1,x2,x3,7f // CONTEXTIDR_EL1
+
+   /*
+* If we end up here, the write is to a register that we don't handle
+* in EL2. Let the host handle it instead ...
+*/
+   b   1b
+
+

[RFC/RFT PATCH 3/3] arm64: KVM: keep trapping of VM sysreg writes enabled

2015-02-19 Thread Ard Biesheuvel
---
 arch/arm/kvm/mmu.c   | 2 +-
 arch/arm64/include/asm/kvm_arm.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 136662547ca6..fa8ec55220ea 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1530,7 +1530,7 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool 
was_enabled)
stage2_flush_vm(vcpu->kvm);
 
/* Caches are now on, stop trapping VM ops (until a S/W op) */
-   if (now_enabled)
+   if (0)//now_enabled)
vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) & ~HCR_TVM);
 
trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 8afb863f5a9e..437e1ec17539 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -75,7 +75,7 @@
  * FMO:Override CPSR.F and enable signaling with VF
  * SWIO:   Turn set/way invalidates into set/way clean+invalidate
  */
-#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
+#define HCR_GUEST_FLAGS (HCR_TSC | /* HCR_TSW | */ HCR_TWE | HCR_TWI | HCR_VM | \
 HCR_TVM | HCR_BSU_IS | HCR_FB | HCR_TAC | \
 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW)
 #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-02-19 Thread Ard Biesheuvel
This is a 0th order approximation of how we could potentially force the guest
to avoid uncached mappings, at least from the moment the MMU is on. (Before
that, all of memory is implicitly classified as Device-nGnRnE)

The idea (patch #2) is to trap writes to MAIR_EL1, and replace uncached mappings
with cached ones. This way, there is no need to mangle any guest page tables.

The downside is that, to do this correctly, we need to always trap writes to
the VM sysreg group, which includes registers that the guest may write to very
often. To reduce the associated performance hit, patch #1 introduces a fast path
for EL2 to perform trivial sysreg writes on behalf of the guest, without the
need for a full world switch to the host and back.

The main purpose of these patches is to quantify the performance hit, and
verify whether the MAIR_EL1 handling works correctly. 

Ard Biesheuvel (3):
  arm64: KVM: handle some sysreg writes in EL2
  arm64: KVM: mangle MAIR register to prevent uncached guest mappings
  arm64: KVM: keep trapping of VM sysreg writes enabled

 arch/arm/kvm/mmu.c   |   2 +-
 arch/arm64/include/asm/kvm_arm.h |   2 +-
 arch/arm64/kvm/hyp.S | 101 +++
 arch/arm64/kvm/sys_regs.c    |  63 
 4 files changed, 156 insertions(+), 12 deletions(-)

-- 
1.8.3.2



Re: [GIT PULL 2/9] arm/arm64: kvm: drop inappropriate use of kvm_is_mmio_pfn()

2014-12-09 Thread Ard Biesheuvel
On 9 December 2014 at 13:38, Christoffer Dall
 wrote:
> On Tue, Dec 09, 2014 at 01:29:52PM +0100, Ard Biesheuvel wrote:
>> On 9 December 2014 at 12:26, Christoffer Dall
>>  wrote:
>> > From: Ard Biesheuvel 
>> >
>> > Instead of using kvm_is_mmio_pfn() to decide whether a host region
>> > should be stage 2 mapped with device attributes, add a new static
>> > function kvm_is_device_pfn() that disregards RAM pages with the
>> > reserved bit set, as those should usually not be mapped as device
>> > memory.
>> >
>> > Signed-off-by: Ard Biesheuvel 
>> > Signed-off-by: Marc Zyngier 
>>
>> As I mentioned last week, this patch (and the next one) are already in
>> 3.18 so unless there is some policy I am unaware of, these do not have
>> to be submitted again.
>>
> They're in kvmarm/next, which is a stable branch (doesn't rebase) so our
> only choice would be to revert this commit specifically in kvmarm/next
> before sending the pull request.  Since that would be more confusing
> than help anything, and Paolo said to just include the duplicate commit
> in the pull request, here it is.
>
> As we can see in linux-next, it's not really a problem.
>

OK, thanks for the explanation.

-- 
Ard.


Re: [GIT PULL 2/9] arm/arm64: kvm: drop inappropriate use of kvm_is_mmio_pfn()

2014-12-09 Thread Ard Biesheuvel
On 9 December 2014 at 12:26, Christoffer Dall
 wrote:
> From: Ard Biesheuvel 
>
> Instead of using kvm_is_mmio_pfn() to decide whether a host region
> should be stage 2 mapped with device attributes, add a new static
> function kvm_is_device_pfn() that disregards RAM pages with the
> reserved bit set, as those should usually not be mapped as device
> memory.
>
> Signed-off-by: Ard Biesheuvel 
> Signed-off-by: Marc Zyngier 

As I mentioned last week, this patch (and the next one) are already in
3.18 so unless there is some policy I am unaware of, these do not have
to be submitted again.

-- 
Ard.




> ---
>  arch/arm/kvm/mmu.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 57a403a..b007438 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -834,6 +834,11 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> return kvm_vcpu_dabt_iswrite(vcpu);
>  }
>
> +static bool kvm_is_device_pfn(unsigned long pfn)
> +{
> +   return !pfn_valid(pfn);
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   struct kvm_memory_slot *memslot, unsigned long hva,
>   unsigned long fault_status)
> @@ -904,7 +909,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
> if (is_error_pfn(pfn))
> return -EFAULT;
>
> -   if (kvm_is_mmio_pfn(pfn))
> +   if (kvm_is_device_pfn(pfn))
> mem_type = PAGE_S2_DEVICE;
>
> spin_lock(&kvm->mmu_lock);
> --
> 2.1.2.330.g565301e.dirty
>


Re: [PATCH 1/2] arm/arm64: kvm: drop inappropriate use of kvm_is_mmio_pfn()

2014-12-01 Thread Ard Biesheuvel
On 21 November 2014 at 12:24, Christoffer Dall
 wrote:
> On Mon, Nov 10, 2014 at 09:33:55AM +0100, Ard Biesheuvel wrote:
>> Instead of using kvm_is_mmio_pfn() to decide whether a host region
>> should be stage 2 mapped with device attributes, add a new static
>> function kvm_is_device_pfn() that disregards RAM pages with the
>> reserved bit set, as those should usually not be mapped as device
>> memory.
>>
>> Signed-off-by: Ard Biesheuvel 
>> ---
>>  arch/arm/kvm/mmu.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 57a403a5c22b..b007438242e2 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -834,6 +834,11 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>>   return kvm_vcpu_dabt_iswrite(vcpu);
>>  }
>>
>> +static bool kvm_is_device_pfn(unsigned long pfn)
>> +{
>> + return !pfn_valid(pfn);
>> +}
>> +
>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> struct kvm_memory_slot *memslot, unsigned long hva,
>> unsigned long fault_status)
>> @@ -904,7 +909,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>> phys_addr_t fault_ipa,
>>   if (is_error_pfn(pfn))
>>   return -EFAULT;
>>
>> - if (kvm_is_mmio_pfn(pfn))
>> + if (kvm_is_device_pfn(pfn))
>>   mem_type = PAGE_S2_DEVICE;
>>
>>   spin_lock(&kvm->mmu_lock);
>> --
>> 1.8.3.2
>>
> Acked-by: Christoffer Dall 

These 2 patches are now in 3.18-rc7, so they can be dropped from the
kvmarm queue/next/etc. branches.

Thanks,
Ard.


Re: [PATCH 2/2] kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()

2014-11-21 Thread Ard Biesheuvel
On 10 November 2014 09:33, Ard Biesheuvel  wrote:
> This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in
> kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn.
>
> The problem being addressed by the patch above was that some ARM code
> based the memory mapping attributes of a pfn on the return value of
> kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should
> be mapped as device memory.
>
> However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin,
> and the existing non-ARM users were already using it in a way which
> suggests that its name should probably have been 'kvm_is_reserved_pfn'
> from the beginning, e.g., whether or not to call get_page/put_page on
> it etc. This means that returning false for the zero page is a mistake
> and the patch above should be reverted.
>
> Signed-off-by: Ard Biesheuvel 


Ping?
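
For context, the get_page/put_page pattern the commit message refers to
has this shape, paraphrased from the 3.18-era kvm_main.c and therefore
illustrative rather than exact:

void kvm_release_pfn_clean(pfn_t pfn)
{
	/*
	 * Reserved pfns, which after this patch once again include the
	 * zero page, carry no reference KVM owns, so they must not be put.
	 */
	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
		put_page(pfn_to_page(pfn));
}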


> ---
>  arch/ia64/kvm/kvm-ia64.c |  2 +-
>  arch/x86/kvm/mmu.c   |  6 +++---
>  include/linux/kvm_host.h |  2 +-
>  virt/kvm/kvm_main.c  | 16 
>  4 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> index ec6b9acb6bea..dbe46f43884d 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -1563,7 +1563,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>
> for (i = 0; i < npages; i++) {
> pfn = gfn_to_pfn(kvm, base_gfn + i);
> -   if (!kvm_is_mmio_pfn(pfn)) {
> +   if (!kvm_is_reserved_pfn(pfn)) {
> kvm_set_pmt_entry(kvm, base_gfn + i,
> pfn << PAGE_SHIFT,
> _PAGE_AR_RWX | _PAGE_MA_WB);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index ac1c4de3a484..978f402006ee 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -630,7 +630,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
>  * kvm mmu, before reclaiming the page, we should
>  * unmap it from mmu first.
>  */
> -   WARN_ON(!kvm_is_mmio_pfn(pfn) && !page_count(pfn_to_page(pfn)));
> +   WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn)));
>
> if (!shadow_accessed_mask || old_spte & shadow_accessed_mask)
> kvm_set_pfn_accessed(pfn);
> @@ -2461,7 +2461,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> spte |= PT_PAGE_SIZE_MASK;
> if (tdp_enabled)
> spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
> -   kvm_is_mmio_pfn(pfn));
> +   kvm_is_reserved_pfn(pfn));
>
> if (host_writable)
> spte |= SPTE_HOST_WRITEABLE;
> @@ -2737,7 +2737,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu 
> *vcpu,
>  * PT_PAGE_TABLE_LEVEL and there would be no adjustment done
>  * here.
>  */
> -   if (!is_error_noslot_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
> +   if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
> level == PT_PAGE_TABLE_LEVEL &&
> PageTransCompound(pfn_to_page(pfn)) &&
> !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ea53b04993f2..a6059bdf7b03 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -703,7 +703,7 @@ void kvm_arch_sync_events(struct kvm *kvm);
>  int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
>  void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
>
> -bool kvm_is_mmio_pfn(pfn_t pfn);
> +bool kvm_is_reserved_pfn(pfn_t pfn);
>
>  struct kvm_irq_ack_notifier {
> struct hlist_node link;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 25ffac9e947d..3cee7b167052 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -107,10 +107,10 @@ EXPORT_SYMBOL_GPL(kvm_rebooting);
>
>  static bool largepages_enabled = true;
>
> -bool kvm_is_mmio_pfn(pfn_t pfn)
> +bool kvm_is_reserved_pfn(pfn_t pfn)
>  {
> if (pfn_valid(pfn))
> -   return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn));
> +   return PageReserved(pfn_to_page(pfn));
>
> return true;
>  }
> @@ -1321,7 +1321,7 @@ static pfn_t hva_to_pfn(unsigned long addr, bool 
> atomic, bool *async,
> else if ((vma->vm_flags & VM_PFNMAP)) {
> pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) +
> vma->vm_pgoff;
> -  

Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-19 Thread Ard Biesheuvel
On 17 November 2014 15:58, Ard Biesheuvel  wrote:
> Readonly memslots are often used to implement emulation of ROMs and
> NOR flashes, in which case the guest may legally map these regions as
> uncached.
> To deal with the incoherency associated with uncached guest mappings,
> treat all readonly memslots as incoherent, and ensure that pages that
> belong to regions tagged as such are flushed to DRAM before being passed
> to the guest.
>
> Signed-off-by: Ard Biesheuvel 
> ---

Hello all,

I have another bug report (from Canonical this time) of essentially
the same issue, and it is also fixed by these patches.
Are you happy with these patches? Should I respin to add Laszlo's tested-by?

Cheers,
Ard.


>  arch/arm/kvm/mmu.c | 20 +++-
>  1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index cb924c6d56a6..f2a9874ff5cb 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -919,7 +919,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
> if (!hugetlb && !force_pte)
> hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>
> -   fault_ipa_uncached = false;
> +   fault_ipa_uncached = memslot->flags & KVM_MEMSLOT_INCOHERENT;
>
> if (hugetlb) {
> pmd_t new_pmd = pfn_pmd(pfn, mem_type);
> @@ -1298,11 +1298,12 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> hva = vm_end;
> } while (hva < reg_end);
>
> -   if (ret) {
> -   spin_lock(&kvm->mmu_lock);
> +   spin_lock(&kvm->mmu_lock);
> +   if (ret)
> unmap_stage2_range(kvm, mem->guest_phys_addr, 
> mem->memory_size);
> -   spin_unlock(&kvm->mmu_lock);
> -   }
> +   else
> +   stage2_flush_memslot(kvm, memslot);
> +   spin_unlock(&kvm->mmu_lock);
> return ret;
>  }
>
> @@ -1314,6 +1315,15 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct 
> kvm_memory_slot *free,
>  int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
> unsigned long npages)
>  {
> +   /*
> +* Readonly memslots are not incoherent with the caches by definition,
> +* but in practice, they are used mostly to emulate ROMs or NOR 
> flashes
> +* that the guest may consider devices and hence map as uncached.
> +* To prevent incoherency issues in these cases, tag all readonly
> +* regions as incoherent.
> +*/
> +   if (slot->flags & KVM_MEM_READONLY)
> +   slot->flags |= KVM_MEMSLOT_INCOHERENT;
> return 0;
>  }
>
> --
> 1.8.3.2
>
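
On the userspace side, the ROM/NOR regions in question are those a VMM
registers with the KVM_MEM_READONLY flag; a minimal sketch, with the VM
fd, backing buffer and slot number assumed to be set up elsewhere:

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Register a guest "NOR flash" window as a readonly memslot; the patch
 * above will then tag the slot KVM_MEMSLOT_INCOHERENT. */
static int map_flash_readonly(int vm_fd, void *flash_hva,
			      __u64 gpa, __u64 size)
{
	struct kvm_userspace_memory_region region;

	memset(&region, 0, sizeof(region));
	region.slot = 1;			/* any unused slot number */
	region.flags = KVM_MEM_READONLY;
	region.guest_phys_addr = gpa;
	region.memory_size = size;
	region.userspace_addr = (__u64)(unsigned long)flash_hva;

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}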


[PATCH 2/3] arm, arm64: KVM: allow forced dcache flush on page faults

2014-11-17 Thread Ard Biesheuvel
From: Laszlo Ersek 

To allow handling of incoherent memslots in a subsequent patch, this
patch adds a paramater 'ipa_uncached' to cache_coherent_guest_page()
so that we can instruct it to flush the page's contents to DRAM even
if the guest has caching globally enabled.

Signed-off-by: Laszlo Ersek 
Signed-off-by: Ard Biesheuvel 
---
 arch/arm/include/asm/kvm_mmu.h   | 5 +++--
 arch/arm/kvm/mmu.c   | 9 +++--
 arch/arm64/include/asm/kvm_mmu.h | 5 +++--
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index acb0d5712716..f867060035ec 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -161,9 +161,10 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 }
 
 static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
-unsigned long size)
+unsigned long size,
+bool ipa_uncached)
 {
-   if (!vcpu_has_cache_enabled(vcpu))
+   if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
kvm_flush_dcache_to_poc((void *)hva, size);

/*
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index b007438242e2..cb924c6d56a6 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -852,6 +852,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct vm_area_struct *vma;
pfn_t pfn;
pgprot_t mem_type = PAGE_S2;
+   bool fault_ipa_uncached;
 
write_fault = kvm_is_write_fault(vcpu);
if (fault_status == FSC_PERM && !write_fault) {
@@ -918,6 +919,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (!hugetlb && !force_pte)
hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
 
+   fault_ipa_uncached = false;
+
if (hugetlb) {
pmd_t new_pmd = pfn_pmd(pfn, mem_type);
new_pmd = pmd_mkhuge(new_pmd);
@@ -925,7 +928,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_set_s2pmd_writable(&new_pmd);
kvm_set_pfn_dirty(pfn);
}
-   coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE);
+   coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE,
+ fault_ipa_uncached);
ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
} else {
pte_t new_pte = pfn_pte(pfn, mem_type);
@@ -933,7 +937,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_set_s2pte_writable(&new_pte);
kvm_set_pfn_dirty(pfn);
}
-   coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
+   coherent_cache_guest_page(vcpu, hva, PAGE_SIZE,
+ fault_ipa_uncached);
ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte,
pgprot_val(mem_type) == pgprot_val(PAGE_S2_DEVICE));
}
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 0caf7a59f6a1..123b521a9908 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -243,9 +243,10 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 }
 
 static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
-unsigned long size)
+unsigned long size,
+bool ipa_uncached)
 {
-   if (!vcpu_has_cache_enabled(vcpu))
+   if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
kvm_flush_dcache_to_poc((void *)hva, size);
 
if (!icache_is_aliasing()) {/* PIPT */
-- 
1.8.3.2



[PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-17 Thread Ard Biesheuvel
Readonly memslots are often used to implement emulation of ROMs and
NOR flashes, in which case the guest may legally map these regions as
uncached.
To deal with the incoherency associated with uncached guest mappings,
treat all readonly memslots as incoherent, and ensure that pages that
belong to regions tagged as such are flushed to DRAM before being passed
to the guest.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm/kvm/mmu.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)
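
For context, a hedged sketch of the userspace side this caters for: a
QEMU-style flash region registered read-only. The slot number, base
address and size below are hypothetical; KVM_MEM_READONLY, the region
struct and the ioctl are the real uapi.

	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	/* vm_fd: an open VM fd; flash: 64 MiB host buffer holding the NOR image */
	static int register_flash_slot(int vm_fd, void *flash)
	{
		struct kvm_userspace_memory_region region = {
			.slot            = 1,
			.flags           = KVM_MEM_READONLY,
			.guest_phys_addr = 0x04000000,
			.memory_size     = 64 << 20,
			.userspace_addr  = (__u64)(unsigned long)flash,
		};

		/* with this patch, the kernel additionally tags such a slot
		 * KVM_MEMSLOT_INCOHERENT in kvm_arch_create_memslot() */
		return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
	}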

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index cb924c6d56a6..f2a9874ff5cb 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -919,7 +919,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (!hugetlb && !force_pte)
hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
 
-   fault_ipa_uncached = false;
+   fault_ipa_uncached = memslot->flags & KVM_MEMSLOT_INCOHERENT;
 
if (hugetlb) {
pmd_t new_pmd = pfn_pmd(pfn, mem_type);
@@ -1298,11 +1298,12 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
hva = vm_end;
} while (hva < reg_end);
 
-   if (ret) {
-   spin_lock(&kvm->mmu_lock);
+   spin_lock(&kvm->mmu_lock);
+   if (ret)
unmap_stage2_range(kvm, mem->guest_phys_addr, mem->memory_size);
-   spin_unlock(&kvm->mmu_lock);
-   }
+   else
+   stage2_flush_memslot(kvm, memslot);
+   spin_unlock(&kvm->mmu_lock);
return ret;
 }
 
@@ -1314,6 +1315,15 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
unsigned long npages)
 {
+   /*
+* Readonly memslots are not incoherent with the caches by definition,
+* but in practice, they are used mostly to emulate ROMs or NOR flashes
+* that the guest may consider devices and hence map as uncached.
+* To prevent incoherency issues in these cases, tag all readonly
+* regions as incoherent.
+*/
+   if (slot->flags & KVM_MEM_READONLY)
+   slot->flags |= KVM_MEMSLOT_INCOHERENT;
return 0;
 }
 
-- 
1.8.3.2



[PATCH 1/3] kvm: add a memslot flag for incoherent memory regions

2014-11-17 Thread Ard Biesheuvel
Memory regions may be incoherent with the caches, typically when the
guest has mapped a host system RAM backed memory region as uncached.
Add a flag KVM_MEMSLOT_INCOHERENT so that we can tag these memslots
and handle them appropriately when mapping them.

Signed-off-by: Ard Biesheuvel 
---
 include/linux/kvm_host.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a6059bdf7b03..e4d8f705fecd 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -43,6 +43,7 @@
  * include/linux/kvm_h.
  */
 #define KVM_MEMSLOT_INVALID(1UL << 16)
+#define KVM_MEMSLOT_INCOHERENT (1UL << 17)
 
 /* Two fragments for cross MMIO pages. */
 #define KVM_MAX_MMIO_FRAGMENTS 2
-- 
1.8.3.2



Re: [PATCH 1/2] arm/arm64: kvm: drop inappropriate use of kvm_is_mmio_pfn()

2014-11-10 Thread Ard Biesheuvel
On 10 November 2014 11:57, Christoffer Dall  wrote:
> On Mon, Nov 10, 2014 at 09:33:55AM +0100, Ard Biesheuvel wrote:
>> Instead of using kvm_is_mmio_pfn() to decide whether a host region
>> should be stage 2 mapped with device attributes, add a new static
>> function kvm_is_device_pfn() that disregards RAM pages with the
>> reserved bit set, as those should usually not be mapped as device
>> memory.
>>
>> Signed-off-by: Ard Biesheuvel 
>> ---
>>  arch/arm/kvm/mmu.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 57a403a5c22b..b007438242e2 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -834,6 +834,11 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>>   return kvm_vcpu_dabt_iswrite(vcpu);
>>  }
>>
>> +static bool kvm_is_device_pfn(unsigned long pfn)
>> +{
>> + return !pfn_valid(pfn);
>> +}
>
> So this works for Magnus' use case, because a device tree memreserve
> results in reserved, but valid, existing pages being backed by a struct
> page?
>

That is the idea, yes, but it would be good if he could confirm that
it works as expected.

Also, there may be some corner cases where pfn_valid returns false for
regions that are in fact mapped as MT_NORMAL by the host kernel (UEFI
configuration tables, for instance), so this test may require further
refinement. But it at least eliminates the false positives for plain
memreserve regions.
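
To make the distinction explicit, the three cases look roughly like this
(illustrative sketch only, not part of the patch):

	if (!pfn_valid(pfn)) {
		/* no struct page at all: treat as a device/MMIO pfn */
	} else if (PageReserved(pfn_to_page(pfn))) {
		/* RAM with PG_reserved set, e.g. a DT memreserve region:
		 * kvm_is_device_pfn() now maps this with normal attributes */
	} else {
		/* ordinary RAM */
	}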


>> +
>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> struct kvm_memory_slot *memslot, unsigned long hva,
>> unsigned long fault_status)
>> @@ -904,7 +909,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>> phys_addr_t fault_ipa,
>>   if (is_error_pfn(pfn))
>>   return -EFAULT;
>>
>> - if (kvm_is_mmio_pfn(pfn))
>> + if (kvm_is_device_pfn(pfn))
>>   mem_type = PAGE_S2_DEVICE;
>>
>>   spin_lock(&kvm->mmu_lock);
>> --
>> 1.8.3.2
>>
>
> If my understanding above is correct, then:
>
> Acked-by: Christoffer Dall 


Re: [PATCH 2/2] kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()

2014-11-10 Thread Ard Biesheuvel
On 10 November 2014 11:53, Christoffer Dall  wrote:
> Hi Ard,
>
> On Mon, Nov 10, 2014 at 09:33:56AM +0100, Ard Biesheuvel wrote:
>> This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in
>> kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn.
>>
>> The problem being addressed by the patch above was that some ARM code
>> based the memory mapping attributes of a pfn on the return value of
>> kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should
>> be mapped as device memory.
>>
>> However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin,
>> and the existing non-ARM users were already using it in a way which
>> suggests that its name should probably have been 'kvm_is_reserved_pfn'
>> from the beginning, e.g., whether or not to call get_page/put_page on
>> it etc. This means that returning false for the zero page is a mistake
>> and the patch above should be reverted.
>>
>> Signed-off-by: Ard Biesheuvel 
>> ---
>>  arch/ia64/kvm/kvm-ia64.c |  2 +-
>>  arch/x86/kvm/mmu.c   |  6 +++---
>>  include/linux/kvm_host.h |  2 +-
>>  virt/kvm/kvm_main.c  | 16 
>>  4 files changed, 13 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
>> index ec6b9acb6bea..dbe46f43884d 100644
>> --- a/arch/ia64/kvm/kvm-ia64.c
>> +++ b/arch/ia64/kvm/kvm-ia64.c
>> @@ -1563,7 +1563,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>>
>>   for (i = 0; i < npages; i++) {
>>   pfn = gfn_to_pfn(kvm, base_gfn + i);
>> - if (!kvm_is_mmio_pfn(pfn)) {
>> + if (!kvm_is_reserved_pfn(pfn)) {
>>   kvm_set_pmt_entry(kvm, base_gfn + i,
>>   pfn << PAGE_SHIFT,
>>   _PAGE_AR_RWX | _PAGE_MA_WB);
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index ac1c4de3a484..978f402006ee 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -630,7 +630,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
>>* kvm mmu, before reclaiming the page, we should
>>* unmap it from mmu first.
>>*/
>> - WARN_ON(!kvm_is_mmio_pfn(pfn) && !page_count(pfn_to_page(pfn)));
>> + WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn)));
>>
>>   if (!shadow_accessed_mask || old_spte & shadow_accessed_mask)
>>   kvm_set_pfn_accessed(pfn);
>> @@ -2461,7 +2461,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>>   spte |= PT_PAGE_SIZE_MASK;
>>   if (tdp_enabled)
>>   spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
>> - kvm_is_mmio_pfn(pfn));
>> + kvm_is_reserved_pfn(pfn));
>>
>>   if (host_writable)
>>   spte |= SPTE_HOST_WRITEABLE;
>> @@ -2737,7 +2737,7 @@ static void transparent_hugepage_adjust(struct 
>> kvm_vcpu *vcpu,
>>* PT_PAGE_TABLE_LEVEL and there would be no adjustment done
>>* here.
>>*/
>> - if (!is_error_noslot_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
>> + if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
>>   level == PT_PAGE_TABLE_LEVEL &&
>>   PageTransCompound(pfn_to_page(pfn)) &&
>>   !has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index ea53b04993f2..a6059bdf7b03 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -703,7 +703,7 @@ void kvm_arch_sync_events(struct kvm *kvm);
>>  int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
>>  void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
>>
>> -bool kvm_is_mmio_pfn(pfn_t pfn);
>> +bool kvm_is_reserved_pfn(pfn_t pfn);
>>
>>  struct kvm_irq_ack_notifier {
>>   struct hlist_node link;
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 25ffac9e947d..3cee7b167052 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -107,10 +107,10 @@ EXPORT_SYMBOL_GPL(kvm_rebooting);
>>
>>  static bool largepages_enabled = true;
>>
>> -bool kvm_is_mmio_pfn(pfn_t pfn)
>> +bool kvm_is_reserved_pfn(pfn_t pfn)
>>  {
>>   if (pfn_valid(pfn))
>> - return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn));
>> + 

[PATCH 2/2] kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()

2014-11-10 Thread Ard Biesheuvel
This reverts commit 85c8555ff0 ("KVM: check for !is_zero_pfn() in
kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn.

The problem being addressed by the patch above was that some ARM code
based the memory mapping attributes of a pfn on the return value of
kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should
be mapped as device memory.

However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin,
and the existing non-ARM users were already using it in a way which
suggests that its name should probably have been 'kvm_is_reserved_pfn'
from the beginning, e.g., whether or not to call get_page/put_page on
it etc. This means that returning false for the zero page is a mistake
and the patch above should be reverted.

Signed-off-by: Ard Biesheuvel 
---
 arch/ia64/kvm/kvm-ia64.c |  2 +-
 arch/x86/kvm/mmu.c   |  6 +++---
 include/linux/kvm_host.h |  2 +-
 virt/kvm/kvm_main.c  | 16 
 4 files changed, 13 insertions(+), 13 deletions(-)
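
To illustrate the get_page/put_page point above, the typical existing
caller pattern looks roughly like this (based on kvm_release_pfn_clean()
in virt/kvm/kvm_main.c at the time, shown post-rename):

	void kvm_release_pfn_clean(pfn_t pfn)
	{
		/* reserved pfns are not refcounted, so skip put_page() */
		if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
			put_page(pfn_to_page(pfn));
	}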

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index ec6b9acb6bea..dbe46f43884d 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1563,7 +1563,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 
for (i = 0; i < npages; i++) {
pfn = gfn_to_pfn(kvm, base_gfn + i);
-   if (!kvm_is_mmio_pfn(pfn)) {
+   if (!kvm_is_reserved_pfn(pfn)) {
kvm_set_pmt_entry(kvm, base_gfn + i,
pfn << PAGE_SHIFT,
_PAGE_AR_RWX | _PAGE_MA_WB);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ac1c4de3a484..978f402006ee 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -630,7 +630,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
 * kvm mmu, before reclaiming the page, we should
 * unmap it from mmu first.
 */
-   WARN_ON(!kvm_is_mmio_pfn(pfn) && !page_count(pfn_to_page(pfn)));
+   WARN_ON(!kvm_is_reserved_pfn(pfn) && !page_count(pfn_to_page(pfn)));
 
if (!shadow_accessed_mask || old_spte & shadow_accessed_mask)
kvm_set_pfn_accessed(pfn);
@@ -2461,7 +2461,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
spte |= PT_PAGE_SIZE_MASK;
if (tdp_enabled)
spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
-   kvm_is_mmio_pfn(pfn));
+   kvm_is_reserved_pfn(pfn));
 
if (host_writable)
spte |= SPTE_HOST_WRITEABLE;
@@ -2737,7 +2737,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
 * PT_PAGE_TABLE_LEVEL and there would be no adjustment done
 * here.
 */
-   if (!is_error_noslot_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
+   if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
level == PT_PAGE_TABLE_LEVEL &&
PageTransCompound(pfn_to_page(pfn)) &&
!has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ea53b04993f2..a6059bdf7b03 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -703,7 +703,7 @@ void kvm_arch_sync_events(struct kvm *kvm);
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 
-bool kvm_is_mmio_pfn(pfn_t pfn);
+bool kvm_is_reserved_pfn(pfn_t pfn);
 
 struct kvm_irq_ack_notifier {
struct hlist_node link;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 25ffac9e947d..3cee7b167052 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -107,10 +107,10 @@ EXPORT_SYMBOL_GPL(kvm_rebooting);
 
 static bool largepages_enabled = true;
 
-bool kvm_is_mmio_pfn(pfn_t pfn)
+bool kvm_is_reserved_pfn(pfn_t pfn)
 {
if (pfn_valid(pfn))
-   return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn));
+   return PageReserved(pfn_to_page(pfn));
 
return true;
 }
@@ -1321,7 +1321,7 @@ static pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
else if ((vma->vm_flags & VM_PFNMAP)) {
pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) +
vma->vm_pgoff;
-   BUG_ON(!kvm_is_mmio_pfn(pfn));
+   BUG_ON(!kvm_is_reserved_pfn(pfn));
} else {
if (async && vma_is_valid(vma, write_fault))
*async = true;
@@ -1427,7 +1427,7 @@ static struct page *kvm_pfn_to_page(pfn_t pfn)
if (is_error_noslot_pfn(pfn))
return KVM_ERR_PTR_BAD_PAGE;
 
-   if (kvm_is_mmio_pfn(pfn)) {
+   if (kvm_is_reserved_pfn(pfn)) {
WARN_ON(1);
return KVM_ERR_PTR_BAD_PAGE;
}
@@ -1456,7 +1456,7 @@ EXPORT_SYMBOL_GPL(

[PATCH 1/2] arm/arm64: kvm: drop inappropriate use of kvm_is_mmio_pfn()

2014-11-10 Thread Ard Biesheuvel
Instead of using kvm_is_mmio_pfn() to decide whether a host region
should be stage 2 mapped with device attributes, add a new static
function kvm_is_device_pfn() that disregards RAM pages with the
reserved bit set, as those should usually not be mapped as device
memory.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm/kvm/mmu.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 57a403a5c22b..b007438242e2 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -834,6 +834,11 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
return kvm_vcpu_dabt_iswrite(vcpu);
 }
 
+static bool kvm_is_device_pfn(unsigned long pfn)
+{
+   return !pfn_valid(pfn);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  struct kvm_memory_slot *memslot, unsigned long hva,
  unsigned long fault_status)
@@ -904,7 +909,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (is_error_pfn(pfn))
return -EFAULT;
 
-   if (kvm_is_mmio_pfn(pfn))
+   if (kvm_is_device_pfn(pfn))
mem_type = PAGE_S2_DEVICE;
 
spin_lock(&kvm->mmu_lock);
-- 
1.8.3.2



Re: [edk2] [Qemu-devel] [INVITE] OVMF BoF session at the KVM Forum 2014

2014-09-19 Thread Ard Biesheuvel
On 19 September 2014 10:03, Paolo Bonzini  wrote:
> Il 19/09/2014 16:17, Ard Biesheuvel ha scritto:
>>
>>> > (**) Ard's patches for the upstream host kernel (== KVM) have been...
>>> > ugh, not sure... applied to a maintainer tree? Ard? :)
>>> >
>> Some are in kvm/master, which I think means they should go into the next
>> 3.17-rc, although I haven't seen much movement there.
>
> They'll miss this week's rc, but I'll send them out early next week.
>

OK thanks for the update




Re: [Qemu-devel] [INVITE] OVMF BoF session at the KVM Forum 2014

2014-09-19 Thread Ard Biesheuvel
On 18 September 2014 05:18, Laszlo Ersek  wrote:
> On 09/18/14 13:44, Andreas Färber wrote:
>> Hello Laszlo,
>>
>> Am 18.09.2014 um 10:23 schrieb Laszlo Ersek:
>>> I've been made an offer that I couldn't refuse :) to "organize" a Birds
>>> of a Feather session concerning OVMF at the KVM Forum 2014.
>>>
>>> Interested people, please sign up:
>>>
>>>   http://www.linux-kvm.org/page/KVM_Forum_2014_BOF#OVMF
>>
>> Nice idea. Your summary mentions only ia32 and x86_64 - I would be
>> interested in an update on OVMF for AArch64 - there seemed to already be
>> support for ARM's Foundation Model but not yet for QEMU.
>
> We've successfully UEFI-booted
> - GNU/Linux guest(s) on
> - upstream edk2 (*) and
> - upstream qemu-system-aarch64 with
>   - TCG on x86_64 host,
>   - KVM on aarch64 host (**)
>
> (*) Ard's patches for upstream edk2 are in the process of being tested /
> merged.
>

They have been merged as of yesterday.

> (**) Ard's patches for the upstream host kernel (== KVM) have been...
> ugh, not sure... applied to a maintainer tree? Ard? :)
>

Some are in kvm/master, which I think means they should go into the next
3.17-rc, although I haven't seen much movement there.
Some other bits are being taken through the kvmarm/queue branch, so
they should arrive by 3.18-rc1

> So, it works (as far as I tested it myself on TCG, and heard reports
> about it on KVM), but right now you need to apply a handful of pending
> patches manually.
>
> We can certainly talk about Aarch64 at the BoF, but then Ard should
> co-organize. No good deed goes unpunished, as ever! :)
>

I do appreciate the suggestion, but I don't think it will be feasible
for me to come to Duesseldorf.


Re: [PATCH] mm: export symbol dependencies of is_zero_pfn()

2014-09-12 Thread Ard Biesheuvel
On 12 September 2014 23:14, Andrew Morton  wrote:
> On Fri, 12 Sep 2014 22:17:23 +0200 Ard Biesheuvel  
> wrote:
>
>> In order to make the static inline function is_zero_pfn() callable by
>> modules, export its symbol dependencies 'zero_pfn' and (for s390 and
>> mips) 'zero_page_mask'.
>
> So hexagon and score get the export if/when needed.
>

Exactly.

>> We need this for KVM, as CONFIG_KVM is a tristate for all supported
>> architectures except ARM and arm64, and testing whether a pfn refers
>> to the zero page is required to correctly distinguish the zero page
>> from other special RAM ranges that may also have the PG_reserved bit
>> set, but need to be treated as MMIO memory.
>>
>> Signed-off-by: Ard Biesheuvel 
>> ---
>>  arch/mips/mm/init.c | 1 +
>>  arch/s390/mm/init.c | 1 +
>>  mm/memory.c | 2 ++
>
> Looks OK to me.  Please include the patch in whichever tree is is that
> needs it, and merge it up via that tree.
>

Thanks.

@Paolo: could you please take this (with Andrew's ack), and put it
before the patch you took earlier today?

Thanks,
Ard.


[PATCH] mm: export symbol dependencies of is_zero_pfn()

2014-09-12 Thread Ard Biesheuvel
In order to make the static inline function is_zero_pfn() callable by
modules, export its symbol dependencies 'zero_pfn' and (for s390 and
mips) 'zero_page_mask'.

We need this for KVM, as CONFIG_KVM is a tristate for all supported
architectures except ARM and arm64, and testing whether a pfn refers
to the zero page is required to correctly distinguish the zero page
from other special RAM ranges that may also have the PG_reserved bit
set, but need to be treated as MMIO memory.

Signed-off-by: Ard Biesheuvel 
---
 arch/mips/mm/init.c | 1 +
 arch/s390/mm/init.c | 1 +
 mm/memory.c | 2 ++
 3 files changed, 4 insertions(+)
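
A minimal sketch of the failure mode this addresses (hypothetical module,
for illustration only): any modular code calling the inline is_zero_pfn()
picks up a direct reference to zero_pfn, which fails to link without the
export.

	#include <linux/module.h>
	#include <linux/mm.h>

	static int __init zp_init(void)
	{
		/* expands to 'pfn == zero_pfn', i.e. a raw symbol reference */
		pr_info("pfn 0 is the zero page: %d\n", is_zero_pfn(0));
		return 0;
	}
	module_init(zp_init);
	MODULE_LICENSE("GPL");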

diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 571aab064936..f42e35e42790 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -53,6 +53,7 @@
  */
 unsigned long empty_zero_page, zero_page_mask;
 EXPORT_SYMBOL_GPL(empty_zero_page);
+EXPORT_SYMBOL(zero_page_mask);
 
 /*
  * Not static inline because used by IP27 special magic initialization code
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 0c1073ed1e84..c7235e01fd67 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -43,6 +43,7 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __attribute__((__aligned__(PAGE_SIZE)));
 
 unsigned long empty_zero_page, zero_page_mask;
 EXPORT_SYMBOL(empty_zero_page);
+EXPORT_SYMBOL(zero_page_mask);
 
 static void __init setup_zero_pages(void)
 {
diff --git a/mm/memory.c b/mm/memory.c
index adeac306610f..d17f1bcd2a91 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -118,6 +118,8 @@ __setup("norandmaps", disable_randmaps);
 unsigned long zero_pfn __read_mostly;
 unsigned long highest_memmap_pfn __read_mostly;
 
+EXPORT_SYMBOL(zero_pfn);
+
 /*
  * CONFIG_MMU architectures set up ZERO_PAGE in their paging_init()
  */
-- 
1.8.3.2



Re: [kvm:master 11/11] ERROR: "zero_pfn" [arch/powerpc/kvm/kvm.ko] undefined!

2014-09-12 Thread Ard Biesheuvel
On 12 September 2014 19:25, kbuild test robot  wrote:
> tree:   git://git.kernel.org/pub/scm/virt/kvm/kvm.git master
> head:   e20e1bde3bb158cd3d08b9d94a90d3cabf1ba7cb
> commit: e20e1bde3bb158cd3d08b9d94a90d3cabf1ba7cb [11/11] KVM: check for 
> !is_zero_pfn() in kvm_is_mmio_pfn()
> config: powerpc-defconfig
> reproduce:
>   wget 
> https://github.com/fengguang/reproduce-kernel-bug/raw/master/cross-build/make.cross
>  -O ~/bin/make.cross
>   chmod +x ~/bin/make.cross
>   git checkout e20e1bde3bb158cd3d08b9d94a90d3cabf1ba7cb
>   make.cross ARCH=powerpc  defconfig
>   make.cross ARCH=powerpc
>
> All error/warnings:
>
>>> ERROR: "zero_pfn" [arch/powerpc/kvm/kvm.ko] undefined!
>

OK, so apparently zero_pfn, which is used by the inline is_zero_pfn(), is
not exported, which is unfortunate.

I will go ahead and propose a patch to add the missing EXPORT_SYMBOL(), but
unfortunately, the s390 and mips definitions of is_zero_pfn() require
zero_page_mask to be exported as well.
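
For reference, the generic definition (include/linux/mm.h at the time,
approximately) shows why the export is needed; mips and s390 carry their
own variants that additionally use zero_page_mask:

	static inline int is_zero_pfn(unsigned long pfn)
	{
		extern unsigned long zero_pfn;
		return pfn == zero_pfn;
	}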


[PATCH] KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()

2014-09-12 Thread Ard Biesheuvel
Read-only memory ranges may be backed by the zero page, so avoid
misidentifying it as an MMIO pfn.

Signed-off-by: Ard Biesheuvel 
Fixes: b88657674d39 ("ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping")
---

This fixes another issue I identified when testing UEFI under QEMU+KVM, where
a read from an uninitialized emulated NOR flash brought in the zero page,
which was then mapped as a read-write device region, because kvm_is_mmio_pfn()
misidentifies it as an MMIO pfn due to its PG_reserved bit being set.

 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
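
The failure mode on the ARM side, condensed from user_mem_abort() in
arch/arm/kvm/mmu.c (sketch of the existing code path, not a new change):

	/* before this fix, kvm_is_mmio_pfn() was also true for the
	 * PG_reserved zero page, so a RAM-backed read got a device mapping */
	if (kvm_is_mmio_pfn(pfn))
		mem_type = PAGE_S2_DEVICE;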

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 36b887dd0c84..f8adaabeac13 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -110,7 +110,7 @@ static bool largepages_enabled = true;
 bool kvm_is_mmio_pfn(pfn_t pfn)
 {
if (pfn_valid(pfn))
-   return PageReserved(pfn_to_page(pfn));
+   return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn));
 
return true;
 }
-- 
1.8.3.2



[PATCH v2] ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()

2014-09-09 Thread Ard Biesheuvel
The ISS encoding for an exception from a Data Abort has a WnR
bit[6] that indicates whether the Data Abort was caused by a
read or a write instruction. While there are several fields
in the encoding that are only valid if the ISV bit[24] is set,
WnR is not one of them, so we can read it unconditionally.

Instead of fixing both implementations of kvm_is_write_fault()
in place, reimplement it just once using kvm_vcpu_dabt_iswrite(),
which already does the right thing with respect to the WnR bit.
Also fix up the callers to pass 'vcpu' instead of the raw HSR/ESR value.

Acked-by: Laszlo Ersek 
Signed-off-by: Ard Biesheuvel 
---
v2: reimplement locally in mmu.c and drop the 2 arch specific version
add ack

 arch/arm/include/asm/kvm_mmu.h   | 11 ---
 arch/arm/kvm/mmu.c   | 12 ++--
 arch/arm64/include/asm/kvm_mmu.h | 13 -
 3 files changed, 10 insertions(+), 26 deletions(-)
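
For context, the accessor the new helper delegates to (from
arch/arm/include/asm/kvm_emulate.h at the time, approximately); it reads
WnR unconditionally, which is correct because WnR, unlike the
ISV-qualified fields, is always valid for a Data Abort:

	static inline bool kvm_vcpu_dabt_iswrite(struct kvm_vcpu *vcpu)
	{
		return !!(kvm_vcpu_get_hsr(vcpu) & HSR_WNR);
	}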

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 5cc0b0f5f72f..3f688b458143 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -78,17 +78,6 @@ static inline void kvm_set_pte(pte_t *pte, pte_t new_pte)
flush_pmd_entry(pte);
 }
 
-static inline bool kvm_is_write_fault(unsigned long hsr)
-{
-   unsigned long hsr_ec = hsr >> HSR_EC_SHIFT;
-   if (hsr_ec == HSR_EC_IABT)
-   return false;
-   else if ((hsr & HSR_ISV) && !(hsr & HSR_WNR))
-   return false;
-   else
-   return true;
-}
-
 static inline void kvm_clean_pgd(pgd_t *pgd)
 {
clean_dcache_area(pgd, PTRS_PER_S2_PGD * sizeof(pgd_t));
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 747ef4c97d98..c68ec28f17c3 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -746,6 +746,14 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
return false;
 }
 
+static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
+{
+   if (kvm_vcpu_trap_is_iabt(vcpu))
+   return false;
+
+   return kvm_vcpu_dabt_iswrite(vcpu);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  struct kvm_memory_slot *memslot, unsigned long hva,
  unsigned long fault_status)
@@ -760,7 +768,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
pfn_t pfn;
pgprot_t mem_type = PAGE_S2;
 
-   write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
+   write_fault = kvm_is_write_fault(vcpu);
if (fault_status == FSC_PERM && !write_fault) {
kvm_err("Unexpected L2 read permission error\n");
return -EFAULT;
@@ -886,7 +894,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
gfn = fault_ipa >> PAGE_SHIFT;
memslot = gfn_to_memslot(vcpu->kvm, gfn);
hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
-   write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
+   write_fault = kvm_is_write_fault(vcpu);
if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
if (is_iabt) {
/* Prefetch Abort on I/O address */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 8e138c7c53ac..737da742b293 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -93,19 +93,6 @@ void kvm_clear_hyp_idmap(void);
 #definekvm_set_pte(ptep, pte)  set_pte(ptep, pte)
 #definekvm_set_pmd(pmdp, pmd)  set_pmd(pmdp, pmd)
 
-static inline bool kvm_is_write_fault(unsigned long esr)
-{
-   unsigned long esr_ec = esr >> ESR_EL2_EC_SHIFT;
-
-   if (esr_ec == ESR_EL2_EC_IABT)
-   return false;
-
-   if ((esr & ESR_EL2_ISV) && !(esr & ESR_EL2_WNR))
-   return false;
-
-   return true;
-}
-
 static inline void kvm_clean_pgd(pgd_t *pgd) {}
 static inline void kvm_clean_pmd_entry(pmd_t *pmd) {}
 static inline void kvm_clean_pte(pte_t *pte) {}
-- 
1.8.3.2



Re: [PATCH] ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()

2014-09-09 Thread Ard Biesheuvel
On 9 September 2014 11:35, Marc Zyngier  wrote:
> Hi Ard,
>
>
> On 2014-09-08 21:29, Ard Biesheuvel wrote:
>>
>> The ISS encoding for an exception from a Data Abort has a WnR
>> bit[6] that indicates whether the Data Abort was caused by a
>> read or a write instruction. While there are several fields
>> in the encoding that are only valid if the ISV bit[24] is set,
>> WnR is not one of them, so we can read it unconditionally.
>>
>> Signed-off-by: Ard Biesheuvel 
>> ---
>>
>> This fixes an issue I observed with UEFI running under QEMU/KVM using
>> NOR flash emulation and the upcoming KVM_CAP_READONLY_MEM support, where
>> NOR flash reads were mistaken for NOR flash writes, resulting in all read
>> accesses going through the MMIO emulation layer.
>>
>>  arch/arm/include/asm/kvm_mmu.h   | 5 +
>>  arch/arm64/include/asm/kvm_mmu.h | 5 +
>>  2 files changed, 2 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_mmu.h
>> b/arch/arm/include/asm/kvm_mmu.h
>> index 5cc0b0f5f72f..fad5648980ad 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -83,10 +83,7 @@ static inline bool kvm_is_write_fault(unsigned long
>> hsr)
>> unsigned long hsr_ec = hsr >> HSR_EC_SHIFT;
>> if (hsr_ec == HSR_EC_IABT)
>> return false;
>> -   else if ((hsr & HSR_ISV) && !(hsr & HSR_WNR))
>> -   return false;
>> -   else
>> -   return true;
>> +   return hsr & HSR_WNR;
>>  }
>>
>>  static inline void kvm_clean_pgd(pgd_t *pgd)
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h
>> b/arch/arm64/include/asm/kvm_mmu.h
>> index 8e138c7c53ac..09fd9e4c13d8 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -100,10 +100,7 @@ static inline bool kvm_is_write_fault(unsigned long
>> esr)
>> if (esr_ec == ESR_EL2_EC_IABT)
>> return false;
>>
>> -   if ((esr & ESR_EL2_ISV) && !(esr & ESR_EL2_WNR))
>> -   return false;
>> -
>> -   return true;
>> +   return esr & ESR_EL2_WNR;
>>  }
>>
>>  static inline void kvm_clean_pgd(pgd_t *pgd) {}
>
>
> Nice catch. One thing though.
>
> This is a case where code duplication has led to this glaring bug:
> On both arm and arm64, kvm_emulate.h has code that implements this
> correctly, just that we failed to use it. Blame me.
>
> I think this should be rewritten entierely in mmu.c, with something like
> this (fully untested, of course):
>
> static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> {
> if (kvm_vcpu_trap_is_iabt(vcpu))
> return false;
>
> return kvm_vcpu_dabt_iswrite(vcpu);
> }
>
> Care to respin it?
>

Will do.


[PATCH] ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()

2014-09-08 Thread Ard Biesheuvel
The ISS encoding for an exception from a Data Abort has a WnR
bit[6] that indicates whether the Data Abort was caused by a
read or a write instruction. While there are several fields
in the encoding that are only valid if the ISV bit[24] is set,
WnR is not one of them, so we can read it unconditionally.

Signed-off-by: Ard Biesheuvel 
---

This fixes an issue I observed with UEFI running under QEMU/KVM using
NOR flash emulation and the upcoming KVM_CAP_READONLY_MEM support, where
NOR flash reads were mistaken for NOR flash writes, resulting in all read
accesses going through the MMIO emulation layer.

 arch/arm/include/asm/kvm_mmu.h   | 5 +
 arch/arm64/include/asm/kvm_mmu.h | 5 +
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 5cc0b0f5f72f..fad5648980ad 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -83,10 +83,7 @@ static inline bool kvm_is_write_fault(unsigned long hsr)
unsigned long hsr_ec = hsr >> HSR_EC_SHIFT;
if (hsr_ec == HSR_EC_IABT)
return false;
-   else if ((hsr & HSR_ISV) && !(hsr & HSR_WNR))
-   return false;
-   else
-   return true;
+   return hsr & HSR_WNR;
 }
 
 static inline void kvm_clean_pgd(pgd_t *pgd)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 8e138c7c53ac..09fd9e4c13d8 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -100,10 +100,7 @@ static inline bool kvm_is_write_fault(unsigned long esr)
if (esr_ec == ESR_EL2_EC_IABT)
return false;
 
-   if ((esr & ESR_EL2_ISV) && !(esr & ESR_EL2_WNR))
-   return false;
-
-   return true;
+   return esr & ESR_EL2_WNR;
 }
 
 static inline void kvm_clean_pgd(pgd_t *pgd) {}
-- 
1.8.3.2



Re: [PATCH v3 2/2] arm/arm64: KVM: Support KVM_CAP_READONLY_MEM

2014-09-07 Thread Ard Biesheuvel
On 26 August 2014 13:43, Christoffer Dall  wrote:
> When userspace loads code and data in a read-only memory regions, KVM
> needs to be able to handle this on arm and arm64.  Specifically this is
> used when running code directly from a read-only flash device; the
> common scenario is a UEFI blob loaded with the -bios option in QEMU.
>
> Note that the MMIO exit on writes to a read-only memory is ABI and can
> be used to emulate block-erase style flash devices.
>
> Cc: Ard Biesheuvel 
> Signed-off-by: Christoffer Dall 

Tested-by: Ard Biesheuvel 

Are these on track for 3.18?


> ---
> Changelog[v3]:
>  - Remove the check for fault_status != FSC_FAULT in the I/O memory
>abort section, since we now do support permission faults on I/O
>regions.  Reported-by: Ard Biesheuvel 
> Changelog[v2]:
>  - None
>
>  arch/arm/include/uapi/asm/kvm.h   |  1 +
>  arch/arm/kvm/arm.c|  1 +
>  arch/arm/kvm/mmu.c| 22 --
>  arch/arm64/include/uapi/asm/kvm.h |  1 +
>  4 files changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index e6ebdd3..51257fd 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -25,6 +25,7 @@
>
>  #define __KVM_HAVE_GUEST_DEBUG
>  #define __KVM_HAVE_IRQ_LINE
> +#define __KVM_HAVE_READONLY_MEM
>
>  #define KVM_REG_SIZE(id)   \
> (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 9f788eb..ac306b4 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -188,6 +188,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
> case KVM_CAP_ONE_REG:
> case KVM_CAP_ARM_PSCI:
> case KVM_CAP_ARM_PSCI_0_2:
> +   case KVM_CAP_READONLY_MEM:
> r = 1;
> break;
> case KVM_CAP_COALESCED_MMIO:
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 16e7994..62f5642 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -747,14 +747,13 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, 
> phys_addr_t *ipap)
>  }
>
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> - struct kvm_memory_slot *memslot,
> + struct kvm_memory_slot *memslot, unsigned long hva,
>   unsigned long fault_status)
>  {
> int ret;
> bool write_fault, writable, hugetlb = false, force_pte = false;
> unsigned long mmu_seq;
> gfn_t gfn = fault_ipa >> PAGE_SHIFT;
> -   unsigned long hva = gfn_to_hva(vcpu->kvm, gfn);
> struct kvm *kvm = vcpu->kvm;
> struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> struct vm_area_struct *vma;
> @@ -863,7 +862,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
> unsigned long fault_status;
> phys_addr_t fault_ipa;
> struct kvm_memory_slot *memslot;
> -   bool is_iabt;
> +   unsigned long hva;
> +   bool is_iabt, write_fault, writable;
> gfn_t gfn;
> int ret, idx;
>
> @@ -884,7 +884,10 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
> idx = srcu_read_lock(&vcpu->kvm->srcu);
>
> gfn = fault_ipa >> PAGE_SHIFT;
> -   if (!kvm_is_visible_gfn(vcpu->kvm, gfn)) {
> +   memslot = gfn_to_memslot(vcpu->kvm, gfn);
> +   hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
> +   write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
> +   if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
> if (is_iabt) {
> /* Prefetch Abort on I/O address */
> kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> @@ -892,13 +895,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
> goto out_unlock;
> }
>
> -   if (fault_status != FSC_FAULT) {
> -   kvm_err("Unsupported fault status on io memory: 
> %#lx\n",
> -   fault_status);
> -   ret = -EFAULT;
> -   goto out_unlock;
> -   }
> -
> /*
>  * The IPA is reported as [MAX:12], so we need to
>  * complement it with the bottom 12 bits from the
> @@ -910,9 +906,7 @@ int kvm_handle_guest_abo