Marcelo Tosatti wrote:
> On Wed, Jan 30, 2008 at 03:00:49PM -0800, Jeremy Fitzhardinge wrote:
>
>> Marcelo Tosatti wrote:
>>
>>> Forgot to copy you... Ideally all pte updates should be done via the
>>> paravirt interface.
>>>
>>>
>> Hm, are you sure?
>>
>>
>>> +static inline void pte_clear_bit(unsigned int bit, pte_t *ptep)
>>> +{
>>> +	pte_t pte = *ptep;
>>> +	clear_bit(bit, (unsigned long *)&pte.pte);
>>> +	set_pte(ptep, pte);
>>> +}
>>>
>>>
>> This is not a safe transformation. This will lose hardware A/D bit
>> changes if the pte is part of an active pagetable. Aside from that,
>> clear_bit is atomic and so is the wrong thing to use on a local
>> variable; a plain bitmask would do the job.
>>
>
> True this is racy on baremetal.
>
And under Xen (since the pagetable is the real hardware pagetable).
>> clear_bit on a PTE is fine, since everyone has to support some level of
>> trap'n'emulate on pte-level entries (ptep_get_and_clear is very hard to
>> implement otherwise). Shadow pagetable implementations will do this
>> naturally, and Xen must do it to allow atomic updates on ptes (also
>> because it makes late-pinning/early-unpinning a performance win).
>>
>
> Sure, clear_bit on a PTE is fine, but we want to virtualize it so that
> pte updates can be batched via hypercall.
>
Yes, mprotect needs batching, but that probably needs special handling.
>
>>> +
>>> static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>>> pte_t *ptep, pte_t pte)
>>> {
>>> Index: linux-2.6-x86-kvm/include/asm-x86/pgtable.h
>>> ===================================================================
>>> --- linux-2.6-x86-kvm.orig/include/asm-x86/pgtable.h
>>> +++ linux-2.6-x86-kvm/include/asm-x86/pgtable.h
>>> @@ -227,6 +227,8 @@ void native_pagetable_setup_done(pgd_t *
>>> #define pte_update(mm, addr, ptep) do { } while (0)
>>> #define pte_update_defer(mm, addr, ptep) do { } while (0)
>>>
>>> +#define pte_clear_bit(bit, ptep) clear_bit(bit, (unsigned long *)&ptep->pte)
>>> +
>>> static inline void paravirt_pagetable_setup_start(pgd_t *base)
>>> {
>>> native_pagetable_setup_start(base);
>>> @@ -302,7 +304,7 @@ static inline void native_set_pte_at(str
>>> ({ \
>>> int __changed = !pte_same(*(ptep), entry); \
>>> if (__changed && dirty) { \
>>> - *ptep = entry; \
>>> + set_pte(ptep, entry); \
>>>
>>>
>> This change is OK, but doesn't really do anything. All hypervisors
>> allow direct access to ptes.
>>
>
> We want the pte update to go through a hypercall instead of trap'n'emulate.
>
ptep_set_access_flags is only used in fault handling, so there's no
scope for batching.
>
>>> pte_update_defer((vma)->vm_mm, (address), (ptep)); \
>>> flush_tlb_page(vma, address); \
>>> } \
>>> @@ -357,7 +359,7 @@ static inline pte_t ptep_get_and_clear_f
>>> #define __HAVE_ARCH_PTEP_SET_WRPROTECT
>>> static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>>> {
>>> - clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
>>> + pte_clear_bit(_PAGE_BIT_RW, ptep);
>>>
>>>
>> This is only valid if ptep_set_wrprotect is never used on active ptes,
>> which I don't think is the case.
>>
>
> OK, pte_clear_bit() as above is broken. But the point is that we want
> to batch the write-protection of the source pte together with the
> creation of the write-protected destination pte on COW, using
> paravirt_enter_lazy/paravirt_leave_lazy.
>
Which "we" are we talking about here? In Xen, the destination pte is
unpinned during a fork(), so a plain memory write will work without any
hypervisor interaction. That's what I mean about the "late pin/early
unpin" optimisation. Almost all fork/exit pagetable manipulation
happens on unpinned pagetables which are plain RW pages. It's only
updates on pinned pagetables which need hypervisor interaction. The
only significant batchable update is mprotect.
I don't know what the issues for KVM/VMI are.
J
_______________________________________________
kvm-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/kvm-devel