Re: [PATCH v5 1/6] mm/gup: remove unused vmas parameter from get_user_pages()
On 5/16/23 07:35, David Hildenbrand wrote: ... >>> When passing NULL as "pages" to get_user_pages(), __get_user_pages_locked() >>> won't set FOLL_GET. As FOLL_PIN is also not set, we won't be messing with >>> the mapcount of the page. > > For completeness: s/mapcount/refcount/ :) whew, you had me going there! Now it all adds up. :) thanks, -- John Hubbard NVIDIA
Re: [PATCH v5 1/6] mm/gup: remove unused vmas parameter from get_user_pages()
On 16.05.23 16:30, Sean Christopherson wrote: On Tue, May 16, 2023, David Hildenbrand wrote: On 15.05.23 21:07, Sean Christopherson wrote: On Sun, May 14, 2023, Lorenzo Stoakes wrote: diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index cb5c13eee193..eaa5bb8dbadc 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr) { int rc, flags = FOLL_HWPOISON | FOLL_WRITE; - rc = get_user_pages(addr, 1, flags, NULL, NULL); + rc = get_user_pages(addr, 1, flags, NULL); return rc == -EHWPOISON; Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns a valid page, KVM will leak the refcount and unintentionally pin the page. That's When passing NULL as "pages" to get_user_pages(), __get_user_pages_locked() won't set FOLL_GET. As FOLL_PIN is also not set, we won't be messing with the mapcount of the page. For completeness: s/mapcount/refcount/ :) Ah, that's what I'm missing. -- Thanks, David / dhildenb
Re: [PATCH v5 1/6] mm/gup: remove unused vmas parameter from get_user_pages()
On Tue, May 16, 2023, David Hildenbrand wrote: > On 15.05.23 21:07, Sean Christopherson wrote: > > On Sun, May 14, 2023, Lorenzo Stoakes wrote: > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > > index cb5c13eee193..eaa5bb8dbadc 100644 > > > --- a/virt/kvm/kvm_main.c > > > +++ b/virt/kvm/kvm_main.c > > > @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned > > > long addr) > > > { > > > int rc, flags = FOLL_HWPOISON | FOLL_WRITE; > > > - rc = get_user_pages(addr, 1, flags, NULL, NULL); > > > + rc = get_user_pages(addr, 1, flags, NULL); > > > return rc == -EHWPOISON; > > > > Unrelated to this patch, I think there's a pre-existing bug here. If gup() > > returns > > a valid page, KVM will leak the refcount and unintentionally pin the page. > > That's > > When passing NULL as "pages" to get_user_pages(), __get_user_pages_locked() > won't set FOLL_GET. As FOLL_PIN is also not set, we won't be messing with > the mapcount of the page. Ah, that's what I'm missing. > So even if get_user_pages() returns "1", we should be fine. > > > Or am I misunderstanding your concern? Nope, you covered everything. I do think we can drop the extra gup() though, AFAICT it's 100% redundant. But it's not a bug. Thanks!
Re: [PATCH v5 1/6] mm/gup: remove unused vmas parameter from get_user_pages()
On 15.05.23 21:07, Sean Christopherson wrote: On Sun, May 14, 2023, Lorenzo Stoakes wrote: No invocation of get_user_pages() use the vmas parameter, so remove it. The GUP API is confusing and caveated. Recent changes have done much to improve that, however there is more we can do. Exporting vmas is a prime target as the caller has to be extremely careful to preclude their use after the mmap_lock has expired or otherwise be left with dangling pointers. Removing the vmas parameter focuses the GUP functions upon their primary purpose - pinning (and outputting) pages as well as performing the actions implied by the input flags. This is part of a patch series aiming to remove the vmas parameter altogether. Suggested-by: Matthew Wilcox (Oracle) Acked-by: Greg Kroah-Hartman Acked-by: David Hildenbrand Reviewed-by: Jason Gunthorpe Acked-by: Christian K�nig (for radeon parts) Acked-by: Jarkko Sakkinen Signed-off-by: Lorenzo Stoakes --- arch/x86/kernel/cpu/sgx/ioctl.c | 2 +- drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- drivers/misc/sgi-gru/grufault.c | 2 +- include/linux/mm.h | 3 +-- mm/gup.c| 9 +++-- mm/gup_test.c | 5 ++--- virt/kvm/kvm_main.c | 2 +- 7 files changed, 10 insertions(+), 15 deletions(-) Acked-by: Sean Christopherson (KVM) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index cb5c13eee193..eaa5bb8dbadc 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr) { int rc, flags = FOLL_HWPOISON | FOLL_WRITE; - rc = get_user_pages(addr, 1, flags, NULL, NULL); + rc = get_user_pages(addr, 1, flags, NULL); return rc == -EHWPOISON; Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns a valid page, KVM will leak the refcount and unintentionally pin the page. That's When passing NULL as "pages" to get_user_pages(), __get_user_pages_locked() won't set FOLL_GET. As FOLL_PIN is also not set, we won't be messing with the mapcount of the page. So even if get_user_pages() returns "1", we should be fine. Or am I misunderstanding your concern? At least hva_to_pfn_slow() most certainly didn't return "1" if we end up calling check_user_page_hwpoison(), so nothing would have been pinned there as well. -- Thanks, David / dhildenb
Re: [PATCH v5 1/6] mm/gup: remove unused vmas parameter from get_user_pages()
On Sun, May 14, 2023, Lorenzo Stoakes wrote: > No invocation of get_user_pages() use the vmas parameter, so remove it. > > The GUP API is confusing and caveated. Recent changes have done much to > improve that, however there is more we can do. Exporting vmas is a prime > target as the caller has to be extremely careful to preclude their use > after the mmap_lock has expired or otherwise be left with dangling > pointers. > > Removing the vmas parameter focuses the GUP functions upon their primary > purpose - pinning (and outputting) pages as well as performing the actions > implied by the input flags. > > This is part of a patch series aiming to remove the vmas parameter > altogether. > > Suggested-by: Matthew Wilcox (Oracle) > Acked-by: Greg Kroah-Hartman > Acked-by: David Hildenbrand > Reviewed-by: Jason Gunthorpe > Acked-by: Christian K�nig (for radeon parts) > Acked-by: Jarkko Sakkinen > Signed-off-by: Lorenzo Stoakes > --- > arch/x86/kernel/cpu/sgx/ioctl.c | 2 +- > drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- > drivers/misc/sgi-gru/grufault.c | 2 +- > include/linux/mm.h | 3 +-- > mm/gup.c| 9 +++-- > mm/gup_test.c | 5 ++--- > virt/kvm/kvm_main.c | 2 +- > 7 files changed, 10 insertions(+), 15 deletions(-) Acked-by: Sean Christopherson (KVM) > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index cb5c13eee193..eaa5bb8dbadc 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned > long addr) > { > int rc, flags = FOLL_HWPOISON | FOLL_WRITE; > > - rc = get_user_pages(addr, 1, flags, NULL, NULL); > + rc = get_user_pages(addr, 1, flags, NULL); > return rc == -EHWPOISON; Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns a valid page, KVM will leak the refcount and unintentionally pin the page. That's highly unlikely as check_user_page_hwpoison() is called iff get_user_pages_unlocked() fails (called by hva_to_pfn_slow()), but it's theoretically possible that userspace could change the VMAs between hva_to_pfn_slow() and check_user_page_hwpoison() since KVM doesn't hold any relevant locks at this point. E.g. if there's no VMA during hva_to_pfn_{fast,slow}(), npages==-EFAULT and KVM will invoke check_user_page_hwpoison(). If userspace installs a valid mapping after hva_to_pfn_slow() but before KVM acquires mmap_lock, then gup() will find a valid page. I _think_ the fix is to simply delete this code. The bug was introduced by commit fafc3dbaac64 ("KVM: Replace is_hwpoison_address with __get_user_pages"). At that time, KVM didn't check for "npages == -EHWPOISON" from the first call to get_user_pages_unlocked(). Later on, commit 0857b9e95c1a ("KVM: Enable async page fault processing") reworked the caller to be: mmap_read_lock(current->mm); if (npages == -EHWPOISON || (!async && check_user_page_hwpoison(addr))) { pfn = KVM_PFN_ERR_HWPOISON; goto exit; } where async really means NOWAIT, so that the hwpoison use of gup() didn't sleep. KVM: Enable async page fault processing If asynchronous hva_to_pfn() is requested call GUP with FOLL_NOWAIT to avoid sleeping on IO. Check for hwpoison is done at the same time, otherwise check_user_page_hwpoison() will call GUP again and will put vcpu to sleep. There are other potential problems too, e.g. the hwpoison call doesn't honor the recently introduced @interruptible flag. I don't see any reason to keep check_user_page_hwpoison(), KVM can simply rely on the "npages == -EHWPOISON" check. get_user_pages_unlocked() is guaranteed to be called with roughly equivalent flags, and the flags that aren't equivalent are arguably bugs in check_user_page_hwpoison(), e.g. assuming FOLL_WRITE is wrong. TL;DR: Go ahead with this change, I'll submit a separate patch to delete the buggy KVM code.