Re: unit tests and get_user_pages_ptes_fast()
On 10/05/2010 01:59 AM, Marcelo Tosatti wrote: Yep, the drawback is the unnecessary write fault. What i have here is: --- kvm.orig/virt/kvm/kvm_main.c +++ kvm/virt/kvm/kvm_main.c @@ -827,7 +827,7 @@ unsigned long gfn_to_hva(struct kvm *kvm } EXPORT_SYMBOL_GPL(gfn_to_hva); -pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) +pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn, int *writable) { struct page *page[1]; unsigned long addr; @@ -842,8 +842,16 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t return page_to_pfn(bad_page); } + *writable = 1; npages = get_user_pages_fast(addr, 1, 1, page); + /* attempt to map read-only */ + if (unlikely(npages != 1)) { + npages = get_user_pages_fast(addr, 1, 0, page); + if (npages == 1) + *writable = 0; + } + if (unlikely(npages != 1)) { struct vm_area_struct *vma; Can rebase and resend, if you'd like. That will work for me but not for ksm. I guess it's good to get things going, so please to post it. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: unit tests and get_user_pages_ptes_fast()
On Tue, Oct 05, 2010 at 09:36:59AM +0200, Avi Kivity wrote: On 10/05/2010 01:59 AM, Marcelo Tosatti wrote: Yep, the drawback is the unnecessary write fault. What i have here is: --- kvm.orig/virt/kvm/kvm_main.c +++ kvm/virt/kvm/kvm_main.c @@ -827,7 +827,7 @@ unsigned long gfn_to_hva(struct kvm *kvm } EXPORT_SYMBOL_GPL(gfn_to_hva); -pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) +pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn, int *writable) { struct page *page[1]; unsigned long addr; @@ -842,8 +842,16 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t return page_to_pfn(bad_page); } + *writable = 1; npages = get_user_pages_fast(addr, 1, 1, page); + /* attempt to map read-only */ + if (unlikely(npages != 1)) { + npages = get_user_pages_fast(addr, 1, 0, page); + if (npages == 1) + *writable = 0; + } + if (unlikely(npages != 1)) { struct vm_area_struct *vma; Can rebase and resend, if you'd like. That will work for me but not for ksm. I guess it's good to get things going, so please to post it. It'll not be so advantageous for ksm because there should be read-faults very rarely on that case. Will post. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: unit tests and get_user_pages_ptes_fast()
On Tue, Oct 05, 2010 at 06:22:17AM -0300, Marcelo Tosatti wrote: It'll not be so advantageous for ksm because there should be read-faults very rarely on that case. It'll also make all clean swapcache dirty for no good. Will post. If we've to walk pagetables twice, why don't you do this: writable=1 get_user_pages_fast(write=write_fault) if (!write_fault) writable = __get_user_pages_fast(write=1) That will solve the debugging knob and it'll solve ksm and it'll be optimal for read swapins on exclusive clean swapcache too. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: unit tests and get_user_pages_ptes_fast()
On 10/05/2010 04:15 PM, Andrea Arcangeli wrote: On Tue, Oct 05, 2010 at 06:22:17AM -0300, Marcelo Tosatti wrote: It'll not be so advantageous for ksm because there should be read-faults very rarely on that case. It'll also make all clean swapcache dirty for no good. Will post. If we've to walk pagetables twice, why don't you do this: writable=1 get_user_pages_fast(write=write_fault) if (!write_fault) writable = __get_user_pages_fast(write=1) That will solve the debugging knob and it'll solve ksm and it'll be optimal for read swapins on exclusive clean swapcache too. But it means an extra vmexit in the following case: - read fault - page is present and writeable in the Linux page table which is very common. For this you need get_user_pages_ptes_fast(). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: unit tests and get_user_pages_ptes_fast()
On Tue, Oct 05, 2010 at 04:25:05PM +0200, Avi Kivity wrote: On 10/05/2010 04:15 PM, Andrea Arcangeli wrote: On Tue, Oct 05, 2010 at 06:22:17AM -0300, Marcelo Tosatti wrote: It'll not be so advantageous for ksm because there should be read-faults very rarely on that case. It'll also make all clean swapcache dirty for no good. Will post. If we've to walk pagetables twice, why don't you do this: writable=1 get_user_pages_fast(write=write_fault) if (!write_fault) writable = __get_user_pages_fast(write=1) That will solve the debugging knob and it'll solve ksm and it'll be optimal for read swapins on exclusive clean swapcache too. But it means an extra vmexit in the following case: - read fault - page is present and writeable in the Linux page table which is very common. For this you need get_user_pages_ptes_fast(). With a read fault, the VM already sets the pte as writable if the VM permissions allows that and the page isn't shared (i.e. if it's an exclusive swap page). We've just to check if it did that or not. So when it's a read fault we've to run __get_user_pages_fast(write=1) before we can assume the page is mapped writable in the pte. So I don't see the problem... in terms of page faults is optimal. Only downside is having to walk the pagetables twice, the second time to verify if the first gup_fast has marked the host pte writable or not. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: unit tests and get_user_pages_ptes_fast()
On 10/05/2010 04:32 PM, Andrea Arcangeli wrote: On Tue, Oct 05, 2010 at 04:25:05PM +0200, Avi Kivity wrote: On 10/05/2010 04:15 PM, Andrea Arcangeli wrote: On Tue, Oct 05, 2010 at 06:22:17AM -0300, Marcelo Tosatti wrote: It'll not be so advantageous for ksm because there should be read-faults very rarely on that case. It'll also make all clean swapcache dirty for no good. Will post. If we've to walk pagetables twice, why don't you do this: writable=1 get_user_pages_fast(write=write_fault) if (!write_fault) writable = __get_user_pages_fast(write=1) That will solve the debugging knob and it'll solve ksm and it'll be optimal for read swapins on exclusive clean swapcache too. But it means an extra vmexit in the following case: - read fault - page is present and writeable in the Linux page table which is very common. For this you need get_user_pages_ptes_fast(). With a read fault, the VM already sets the pte as writable if the VM permissions allows that and the page isn't shared (i.e. if it's an exclusive swap page). We've just to check if it did that or not. So when it's a read fault we've to run __get_user_pages_fast(write=1) before we can assume the page is mapped writable in the pte. So I don't see the problem... in terms of page faults is optimal. Only downside is having to walk the pagetables twice, the second time to verify if the first gup_fast has marked the host pte writable or not. You're right, I misread your pseudocode. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
unit tests and get_user_pages_ptes_fast()
During the last kvm forum, I described a unit test framework that can help test the kvm APIs. Briefly, it starts a process in host userspace, which sets up a memory slot mapping gpa 0:3G to hva 0:3G. It then sets up guest registers for unpaged protected mode (or paged protected mode with 1:1 mapping). The effect is that we have a 1:1 gva-hva translation, and can use KVM_RUN to run host code in guest mode. There is a snag however. kvm calls get_user_pages_fast(.write = 1), and the host process maps its code pages read-only. The way I'd like to work around this is to map read-only accesses to read-only pages as read-only. This also prevents ksm cow pages from being broken by read accesses. It can also be used to get the page size for transparent huge pages (and later hugetlbfs too). So, for a read fault we do: pte_t pte; get_user_pages_ptes_fast(..., page, pte, 1, .write = 0) ... if (pte_transhuge(pte)) // or however it's called ... ... if (pte_write(pte)) map writeable else map readonly Any snags? or alternatives? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: unit tests and get_user_pages_ptes_fast()
Hi Avi, On Mon, Oct 04, 2010 at 11:35:28AM +0200, Avi Kivity wrote: During the last kvm forum, I described a unit test framework that can help test the kvm APIs. Briefly, it starts a process in host userspace, which sets up a memory slot mapping gpa 0:3G to hva 0:3G. It then sets up guest registers for unpaged protected mode (or paged protected mode with 1:1 mapping). The effect is that we have a 1:1 gva-hva translation, and can use KVM_RUN to run host code in guest mode. There is a snag however. kvm calls get_user_pages_fast(.write = 1), and the host process maps its code pages read-only. The way I'd like to work around this is to map read-only accesses to read-only pages as read-only. This also prevents ksm cow pages from being broken by read accesses. It can also be used to get the page size for transparent huge pages (and later hugetlbfs too). So, for a read fault we do: pte_t pte; get_user_pages_ptes_fast(..., page, pte, 1, .write = 0) ... if (pte_transhuge(pte)) // or however it's called ... ... if (pte_write(pte)) map writeable else map readonly Any snags? or alternatives? I think we tried to do this write=0 before, it provides benefits for swapins from swapcache too (after the page is unmapped but the swapcache is present and uptodate and clean). If I recall correctly the only obstacle was the fact the out of sync code wants to map the spte writable if it can, but if we use write=0 in gup, it won't know if it can. If it'd wait a real write access from guest, we'd risk creating a spte wrprotect fault for nothing (it should only write a write access from the guest if the linux VM solves the page fault with a readonly pte, even the minor-read-fault from exclusive swapcache will not set the dirty bit but it will still set the pte_write on the pte so the out of sync code should take advantage of that information in the host pte). In the past you suggested some TRY_WRITE operation instead of only read/write. We just need to pass the write bit of the pte somehow to the gup caller. Full agreement that once we use write=0 for read faults your problem with the debugging userland running in guest mode will have better chance to work too :). Andrea -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: unit tests and get_user_pages_ptes_fast()
On Mon, Oct 04, 2010 at 03:40:52PM +0200, Andrea Arcangeli wrote: Hi Avi, On Mon, Oct 04, 2010 at 11:35:28AM +0200, Avi Kivity wrote: During the last kvm forum, I described a unit test framework that can help test the kvm APIs. Briefly, it starts a process in host userspace, which sets up a memory slot mapping gpa 0:3G to hva 0:3G. It then sets up guest registers for unpaged protected mode (or paged protected mode with 1:1 mapping). The effect is that we have a 1:1 gva-hva translation, and can use KVM_RUN to run host code in guest mode. There is a snag however. kvm calls get_user_pages_fast(.write = 1), and the host process maps its code pages read-only. The way I'd like to work around this is to map read-only accesses to read-only pages as read-only. This also prevents ksm cow pages from being broken by read accesses. It can also be used to get the page size for transparent huge pages (and later hugetlbfs too). So, for a read fault we do: pte_t pte; get_user_pages_ptes_fast(..., page, pte, 1, .write = 0) ... if (pte_transhuge(pte)) // or however it's called ... ... if (pte_write(pte)) map writeable else map readonly Any snags? or alternatives? I think we tried to do this write=0 before, it provides benefits for swapins from swapcache too (after the page is unmapped but the swapcache is present and uptodate and clean). If I recall correctly the only obstacle was the fact the out of sync code wants to map the spte writable if it can, but if we use write=0 in gup, it won't know if it can. If it'd wait a real write access from guest, we'd risk creating a spte wrprotect fault for nothing (it should only write a write access from the guest if the linux VM solves the page fault with a readonly pte, even the minor-read-fault from exclusive swapcache will not set the dirty bit but it will still set the pte_write on the pte so the out of sync code should take advantage of that information in the host pte). In the past you suggested some TRY_WRITE operation instead of only read/write. We just need to pass the write bit of the pte somehow to the gup caller. Yep, the drawback is the unnecessary write fault. What i have here is: --- kvm.orig/virt/kvm/kvm_main.c +++ kvm/virt/kvm/kvm_main.c @@ -827,7 +827,7 @@ unsigned long gfn_to_hva(struct kvm *kvm } EXPORT_SYMBOL_GPL(gfn_to_hva); -pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) +pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn, int *writable) { struct page *page[1]; unsigned long addr; @@ -842,8 +842,16 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t return page_to_pfn(bad_page); } + *writable = 1; npages = get_user_pages_fast(addr, 1, 1, page); + /* attempt to map read-only */ + if (unlikely(npages != 1)) { + npages = get_user_pages_fast(addr, 1, 0, page); + if (npages == 1) + *writable = 0; + } + if (unlikely(npages != 1)) { struct vm_area_struct *vma; Can rebase and resend, if you'd like. Full agreement that once we use write=0 for read faults your problem with the debugging userland running in guest mode will have better chance to work too :). Andrea -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html