On 10 November 2015 at 14:40, Christoffer Dall
<christoffer.d...@linaro.org> wrote:
> On Tue, Nov 10, 2015 at 02:15:45PM +0100, Ard Biesheuvel wrote:
>> On 10 November 2015 at 13:22, Christoffer Dall
>> <christoffer.d...@linaro.org> wrote:
>> > On Tue, Nov 10, 2015 at 10:45:37AM +0100, Ard Biesheuvel wrote:
>> >> Hi all,
>> >>
>> >> I wonder if this is a better way to address the problem. It looks at
>> >> the nature of the memory rather than the nature of the mapping, which
>> >> is probably a more reliable indicator of whether cache maintenance is
>> >> required when performing the unmap.
>> >>
>> >>
>> >> -----------8<----------------
>> >> The open-coded tests for checking whether a PTE maps a page as
>> >> uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern,
>> >> which is not guaranteed to work since the type of a mapping is
>> >> not a set of mutually exclusive bits.
>> >>
>> >> For HYP mappings, the type is an index into the MAIR table (i.e., the
>> >> index itself does not contain any information whatsoever about the
>> >> type of the mapping), and for stage-2 mappings it is a bit field where
>> >> normal memory and device types are defined as follows:
>> >>
>> >>     #define MT_S2_NORMAL            0xf
>> >>     #define MT_S2_DEVICE_nGnRE      0x1
>> >>
>> >> I.e., masking *and* comparing with the latter matches on the former,
>> >> and we have been getting lucky merely because the S2 device mappings
>> >> also have the PTE_UXN bit set, or we would misidentify memory mappings
>> >> as device mappings.
>> >>
>> >> Since the unmap_range() code path (which contains one instance of the
>> >> flawed test) is used both for HYP mappings and stage-2 mappings, and
>> >> considering the difference between the two, it is non-trivial to fix
>> >> this by rewriting the tests in place, as it would involve passing
>> >> down the type of mapping through all the functions.
>> >>
>> >> However, since HYP mappings and stage-2 mappings both deal with host
>> >> physical addresses, we can simply check whether the mapping is backed
>> >> by memory that is managed by the host kernel, and only perform the
>> >> D-cache maintenance if this is the case.
>> >>
>> >> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> >> ---
>> >>  arch/arm/kvm/mmu.c | 15 +++++++--------
>> >>  1 file changed, 7 insertions(+), 8 deletions(-)
>> >>
>> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> >> index 6984342da13d..7dace909d5cf 100644
>> >> --- a/arch/arm/kvm/mmu.c
>> >> +++ b/arch/arm/kvm/mmu.c
>> >> @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
>> >>       __kvm_flush_dcache_pud(pud);
>> >>  }
>> >>
>> >> +static bool kvm_is_device_pfn(unsigned long pfn)
>> >> +{
>> >> +     return !pfn_valid(pfn);
>> >> +}
>> >> +
>> >>  /**
>> >>   * stage2_dissolve_pmd() - clear and flush huge PMD entry
>> >>   * @kvm:     pointer to kvm structure.
>> >> @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
>> >>                       kvm_tlb_flush_vmid_ipa(kvm, addr);
>> >>
>> >>                       /* No need to invalidate the cache for device mappings */
>> >> -                     if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>> >> +                     if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
>> >>                               kvm_flush_dcache_pte(old_pte);
>> >>
>> >>                       put_page(virt_to_page(pte));
>> >> @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>> >>
>> >>       pte = pte_offset_kernel(pmd, addr);
>> >>       do {
>> >> -             if (!pte_none(*pte) &&
>> >> -                 (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>> >> +             if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
>> >>                       kvm_flush_dcache_pte(*pte);
>> >>       } while (pte++, addr += PAGE_SIZE, addr != end);
>> >>  }
>> >> @@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>> >>       return kvm_vcpu_dabt_iswrite(vcpu);
>> >>  }
>> >>
>> >> -static bool kvm_is_device_pfn(unsigned long pfn)
>> >> -{
>> >> -     return !pfn_valid(pfn);
>> >> -}
>> >> -
>> >>  /**
>> >>   * stage2_wp_ptes - write protect PMD range
>> >>   * @pmd:     pointer to pmd entry
>> >> --
>> >> 1.9.1
>> >>
>> >
>> > So PAGE_HYP_DEVICE is used only to map the vgic-v2 regions and
>> > PAGE_S2_DEVICE is used to map the vgic VCPU interface and for all memory
>> > regions where the vma has (vm_flags & VM_PFNMAP).
>> >
>> > Will these, and only these, cases be covered by the pfn_valid check?
>> >
>>
>> The pfn_valid() check will ensure that cache maintenance is only
>> performed on regions that are known to the host as memory, are managed
>> by the host (i.e., there is a struct page associated with them) and
>> will be accessed by the host via cacheable mappings (they are covered
>> by the linear mapping, or [on ARM] will be kmap'ed cacheable if they
>> are highmem). If you look at the commit that introduced these tests
>> (363ef89f8e9b arm/arm64: KVM: Invalidate data cache on unmap), the
>> concern it addresses is that the guest may perform uncached accesses
>> to regions that the host has mapped cacheable, meaning guest writes
>> may be shadowed by clean cachelines, making them invisible to cache
>> coherent I/O. So afaict, pfn_valid() is an appropriate check here.
>
> right, this I agree with.
>
>>
>> pfn_valid() will not misidentify device regions as memory (unless the
>> host is really broken) so this should fix Pavel's case. The converse
>> case (a memory region misidentified as a device) could only happen for
>> a carve-out (i.e., via the /reserved-memory node) that is mapped
>> inside the guest via a pass-through (VM_PFNMAP) mapping. That case is
>> already dodgy, since the guest accesses would be forced
>> uncached/ordered due to the fact that those mappings are forced
>> PAGE_S2_DEVICE at stage 2 (as you mention), and would also be
>> misidentified by the current code (due to the PAGE_S2_DEVICE
>> attributes).
>>
>
> ok, but such pages should never be swapped out by the host, so I think
> we're still ok here.
>

Yes, I think so, and the patch does not change how that case is handled anyway.

> Will you send an updated proper patch to the list?  You can add my
> RB if you want.
>

Thanks. I will resend with your R-b and Pavel's Tested-by added.

-- 
Ard.