Re: [PATCH V3] vfio dma_map/unmap: optimized for hugetlbfs pages

2020-09-01 Thread Hugh Dickins
On Tue, 1 Sep 2020, Maoming (maoming, Cloud Infrastructure Service Product 
Dept.) wrote:
> > 
> > > In the original process of dma_map/unmap pages for VFIO devices, to
> > > make sure the pages are contiguous, we have to check them one by one.
> > > As a result, dma_map/unmap can take a long time.
> > > Using hugetlb pages, we can avoid this problem: all pages within a
> > > hugetlb page are contiguous, and a hugetlb page should not be split,
> > > so we can delete the for loops.
> > 
> > I know nothing about VFIO, but I'm surprised that you're paying such 
> > attention
> > to PageHuge hugetlbfs pages, rather than to PageCompound
> > pages: which would also include Transparent Huge Pages, of the traditional
> > anonymous kind, or the huge tmpfs kind, or the more general (not necessarily
> > pmd-sized) kind that Matthew Wilcox is currently working on.
> > 
> > It's true that hugetlbfs is peculiar enough that whatever you write for it 
> > may
> > need some tweaks to cover the THP case too, or vice versa; but wouldn't your
> > patch be a lot better for covering all cases?
> > 
> > You mention above that "the hugetlb page should not be split":
> > perhaps you have been worried that a THP could be split beneath you?
> > That used to be a possibility some years ago, but nowadays a THP cannot be
> > split while anyone is pinning it with an extra reference.
> > 
> > Hugh
> > 
> 
> 
> Thanks for your suggestions.
> You mention that a THP cannot be split while anyone is pinning it.
> Do you mean the check of can_split_huge_page() (in split_huge_page_to_list())?

Partly, yes: but that's just a racy check to avoid doing wasted work in
common cases; the more important check comes later in page_ref_freeze(),
which either fails, or prevents anyone else taking a new reference to
the THP (head or tails) while all the work on splitting is being done.
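
[Editorial illustration, not part of the thread: a minimal sketch of the
refcount-freeze idea described above, assuming a roughly v5.8-era API.
try_to_split_pinned_thp() and expected_refs are hypothetical names; the
real logic lives in split_huge_page_to_list() in mm/huge_memory.c.]

#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/page_ref.h>

/*
 * Sketch only: a split succeeds only if the splitter can atomically
 * "freeze" the head page's refcount at exactly the value it expects.
 * Any extra reference -- such as one taken by pin_user_pages_remote()
 * on behalf of VFIO -- makes the freeze fail, so a pinned THP cannot
 * be split underneath the pinner.
 */
static int try_to_split_pinned_thp(struct page *head, int expected_refs)
{
	/* Racy, best-effort check: skip the expensive work if it looks hopeless. */
	if (page_count(head) > expected_refs)
		return -EBUSY;

	/* Authoritative check: fails if anyone holds an unexpected reference. */
	if (!page_ref_freeze(head, expected_refs))
		return -EBUSY;

	/* ... tear the compound page apart here, then unfreeze ... */
	page_ref_unfreeze(head, expected_refs);
	return 0;
}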

> When we want to pin pages, vfio_pin_pages_remote() always gets a normal page 
> first.
> In this case, a THP cannot be split because of the increased reference.
> Is this right?

Yes.

> 
> Maybe I can optimize for hugetlb pages as a first step,
> and then cover all compound pages.

Okay.  That's definitely a good way to start out - but you may find
that the second step leads you to rewrite every line you wrote before!

Hugh


Re: [PATCH V3] vfio dma_map/unmap: optimized for hugetlbfs pages

2020-09-01 Thread Maoming (maoming, Cloud Infrastructure Service Product Dept.)

> 
> > In the original process of dma_map/unmap pages for VFIO devices, to
> > make sure the pages are contiguous, we have to check them one by one.
> > As a result, dma_map/unmap can take a long time.
> > Using hugetlb pages, we can avoid this problem: all pages within a
> > hugetlb page are contiguous, and a hugetlb page should not be split,
> > so we can delete the for loops.
> 
> I know nothing about VFIO, but I'm surprised that you're paying such attention
> to PageHuge hugetlbfs pages, rather than to PageCompound
> pages: which would also include Transparent Huge Pages, of the traditional
> anonymous kind, or the huge tmpfs kind, or the more general (not necessarily
> pmd-sized) kind that Matthew Wilcox is currently working on.
> 
> It's true that hugetlbfs is peculiar enough that whatever you write for it may
> need some tweaks to cover the THP case too, or vice versa; but wouldn't your
> patch be a lot better for covering all cases?
> 
> You mention above that "the hugetlb page should not be split":
> perhaps you have been worried that a THP could be split beneath you?
> That used to be a possibility some years ago, but nowadays a THP cannot be
> split while anyone is pinning it with an extra reference.
> 
> Hugh
> 


Thanks for your suggestions.
You mention that a THP cannot be split while anyone is pinning it.
Do you mean the check of can_split_huge_page() (in split_huge_page_to_list())?
When we want to pin pages, vfio_pin_pages_remote() always gets a normal page 
first.
In this case, a THP cannot be split because of the increased reference.
Is this right?

Maybe I can optimize for hugetlb pages as a first step, and then cover all 
compound pages.
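
[Editorial illustration, not part of the thread: a minimal sketch of what
that second step might look like, assuming the check is generalized from
PageHuge() to any compound page. contig_pages_from() is a hypothetical
helper, not something in the posted patch.]

#include <linux/mm.h>

/*
 * Sketch only: treat any compound page (hugetlbfs, anonymous THP,
 * huge tmpfs, ...) as a run of physically contiguous PAGE_SIZE pages,
 * instead of special-casing PageHuge().
 */
static unsigned long contig_pages_from(struct page *page)
{
	struct page *head;

	if (!PageCompound(page))
		return 1;	/* plain base page: no guaranteed neighbours */

	head = compound_head(page);
	/* pages remaining from @page to the end of the compound page */
	return (1UL << compound_order(head)) - (page - head);
}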







Re: [PATCH V3] vfio dma_map/unmap: optimized for hugetlbfs pages

2020-08-30 Thread Hugh Dickins
On Fri, 28 Aug 2020, Ming Mao wrote:

> In the original process of dma_map/unmap pages for VFIO devices,
> to make sure the pages are contiguous, we have to check them one by one.
> As a result, dma_map/unmap can take a long time.
> Using hugetlb pages, we can avoid this problem: all pages within a
> hugetlb page are contiguous, and a hugetlb page should not be split,
> so we can delete the for loops.

I know nothing about VFIO, but I'm surprised that you're paying such
attention to PageHuge hugetlbfs pages, rather than to PageCompound
pages: which would also include Transparent Huge Pages, of the
traditional anonymous kind, or the huge tmpfs kind, or the more
general (not necessarily pmd-sized) kind that Matthew Wilcox is
currently working on.

It's true that hugetlbfs is peculiar enough that whatever you write
for it may need some tweaks to cover the THP case too, or vice versa;
but wouldn't your patch be a lot better for covering all cases?

You mention above that "the hugetlb page should not be split":
perhaps you have been worried that a THP could be split beneath you?
That used to be a possibility some years ago, but nowadays a THP
cannot be split while anyone is pinning it with an extra reference.

Hugh

> 
> Signed-off-by: Ming Mao 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 393 +++-
>  1 file changed, 382 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 5e556ac91..a689b9698 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -479,6 +479,303 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>   return ret;
>  }
>  
> +static bool is_hugetlb_page(unsigned long pfn)
> +{
> + struct page *page;
> +
> + if (!pfn_valid(pfn))
> + return false;
> +
> + page = pfn_to_page(pfn);
> + /* only check for hugetlb pages */
> + return page && PageHuge(page);
> +}
> +
> +static bool vaddr_is_hugetlb_page(unsigned long vaddr, int prot)
> +{
> + unsigned long pfn;
> + int ret;
> + bool result;
> +
> + if (!current->mm)
> + return false;
> +
> + ret = vaddr_get_pfn(current->mm, vaddr, prot, &pfn);
> + if (ret)
> + return false;
> +
> + result = is_hugetlb_page(pfn);
> +
> + put_pfn(pfn, prot);
> +
> + return result;
> +}
> +
> +/*
> + * get the number of residual PAGE_SIZE-pages in a hugetlb page
> + * (including the page pointed to by this address)
> + * @address: we count residual pages from this address to the end of
> + * a hugetlb page
> + * @order: the order of the same hugetlb page
> + */
> +static long
> +hugetlb_get_residual_pages(unsigned long address, unsigned int order)
> +{
> + unsigned long hugetlb_npage;
> + unsigned long hugetlb_mask;
> +
> + if (!order)
> + return -EINVAL;
> +
> + hugetlb_npage = 1UL << order;
> + hugetlb_mask = hugetlb_npage - 1;
> + address = address >> PAGE_SHIFT;
> +
> + /*
> +  * Since we count the page pointed to by this address, the number of
> +  * residual PAGE_SIZE-pages is greater than or equal to 1.
> +  */
> + return hugetlb_npage - (address & hugetlb_mask);
> +}
> +
> +static unsigned int
> +hugetlb_page_get_externally_pinned_num(struct vfio_dma *dma,
> + unsigned long start,
> + unsigned long npage)
> +{
> + struct vfio_pfn *vpfn;
> + struct rb_node *node;
> + unsigned long end;
> + unsigned int num = 0;
> +
> + if (!dma || !npage)
> + return 0;
> +
> + end = start + npage - 1;
> + /* If we find a page in dma->pfn_list, this page has been pinned externally */
> + for (node = rb_first(&dma->pfn_list); node; node = rb_next(node)) {
> + vpfn = rb_entry(node, struct vfio_pfn, node);
> + if ((vpfn->pfn >= start) && (vpfn->pfn <= end))
> + num++;
> + }
> +
> + return num;
> +}
> +
> +static long hugetlb_page_vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> + int prot, long npage, unsigned long pfn)
> +{
> + long hugetlb_residual_npage;
> + struct page *head;
> + int ret = 0;
> + unsigned int contiguous_npage;
> + struct page **pages = NULL;
> + unsigned int flags = 0;
> +
> + if ((npage < 0) || !pfn_valid(pfn))
> + return -EINVAL;
> +
> + /* all pages are done? */
> + if (!npage)
> + goto out;
> + /*
> +  * Since pfn is valid,
> +  * hugetlb_residual_npage is greater than or equal to 1.
> +  */
> + head = compound_head(pfn_to_page(pfn));
> + hugetlb_residual_npage = hugetlb_get_residual_pages(vaddr,
> + compound_order(head));
> + /* The page at vaddr has already been obtained by vaddr_get_pfn() */
> + contiguous_npage = min_t(long, (hugetlb_residual_npage - 1), npage);

[PATCH V3] vfio dma_map/unmap: optimized for hugetlbfs pages

2020-08-28 Thread Ming Mao
In the original process of dma_map/unmap pages for VFIO devices,
to make sure the pages are contiguous, we have to check them one by one.
As a result, dma_map/unmap can take a long time.
Using hugetlb pages, we can avoid this problem: all pages within a
hugetlb page are contiguous, and a hugetlb page should not be split,
so we can delete the for loops.
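
[Editorial illustration, not part of the patch: the arithmetic the shortcut
relies on, assuming 4 KiB base pages and a 2 MiB hugetlb page (order 9,
i.e. 512 base pages). residual_pages() is a hypothetical stand-alone
version of the calculation done by hugetlb_get_residual_pages() below.]

#include <stdio.h>

#define PAGE_SHIFT	12	/* 4 KiB base pages (assumption) */
#define HUGE_ORDER	9	/* 2 MiB hugetlb page = 512 base pages */

/*
 * Number of base pages from vaddr (inclusive) to the end of the
 * hugetlb page that contains it.  All of them are physically
 * contiguous, so no per-page contiguity check is needed.
 */
static unsigned long residual_pages(unsigned long vaddr)
{
	unsigned long npages = 1UL << HUGE_ORDER;		/* 512 */
	unsigned long idx = (vaddr >> PAGE_SHIFT) & (npages - 1);

	return npages - idx;
}

int main(void)
{
	/* 3rd base page of a 2 MiB-aligned mapping: 512 - 2 = 510 pages left */
	printf("%lu\n", residual_pages(0x200000UL + 2 * 4096));
	return 0;
}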

Signed-off-by: Ming Mao 
---
 drivers/vfio/vfio_iommu_type1.c | 393 +++-
 1 file changed, 382 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5e556ac91..a689b9698 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -479,6 +479,303 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
return ret;
 }
 
+static bool is_hugetlb_page(unsigned long pfn)
+{
+   struct page *page;
+
+   if (!pfn_valid(pfn))
+   return false;
+
+   page = pfn_to_page(pfn);
+   /* only check for hugetlb pages */
+   return page && PageHuge(page);
+}
+
+static bool vaddr_is_hugetlb_page(unsigned long vaddr, int prot)
+{
+   unsigned long pfn;
+   int ret;
+   bool result;
+
+   if (!current->mm)
+   return false;
+
+   ret = vaddr_get_pfn(current->mm, vaddr, prot, &pfn);
+   if (ret)
+   return false;
+
+   result = is_hugetlb_page(pfn);
+
+   put_pfn(pfn, prot);
+
+   return result;
+}
+
+/*
+ * get the number of residual PAGE_SIZE-pages in a hugetlb page
+ * (including the page pointed to by this address)
+ * @address: we count residual pages from this address to the end of
+ * a hugetlb page
+ * @order: the order of the same hugetlb page
+ */
+static long
+hugetlb_get_residual_pages(unsigned long address, unsigned int order)
+{
+   unsigned long hugetlb_npage;
+   unsigned long hugetlb_mask;
+
+   if (!order)
+   return -EINVAL;
+
+   hugetlb_npage = 1UL << order;
+   hugetlb_mask = hugetlb_npage - 1;
+   address = address >> PAGE_SHIFT;
+
+   /*
+* Since we count the page pointed to by this address, the number of
+* residual PAGE_SIZE-pages is greater than or equal to 1.
+*/
+   return hugetlb_npage - (address & hugetlb_mask);
+}
+
+static unsigned int
+hugetlb_page_get_externally_pinned_num(struct vfio_dma *dma,
+   unsigned long start,
+   unsigned long npage)
+{
+   struct vfio_pfn *vpfn;
+   struct rb_node *node;
+   unsigned long end;
+   unsigned int num = 0;
+
+   if (!dma || !npage)
+   return 0;
+
+   end = start + npage - 1;
+   /* If we find a page in dma->pfn_list, this page has been pinned externally */
+   for (node = rb_first(&dma->pfn_list); node; node = rb_next(node)) {
+   vpfn = rb_entry(node, struct vfio_pfn, node);
+   if ((vpfn->pfn >= start) && (vpfn->pfn <= end))
+   num++;
+   }
+
+   return num;
+}
+
+static long hugetlb_page_vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
+   int prot, long npage, unsigned long pfn)
+{
+   long hugetlb_residual_npage;
+   struct page *head;
+   int ret = 0;
+   unsigned int contiguous_npage;
+   struct page **pages = NULL;
+   unsigned int flags = 0;
+
+   if ((npage < 0) || !pfn_valid(pfn))
+   return -EINVAL;
+
+   /* all pages are done? */
+   if (!npage)
+   goto out;
+   /*
+* Since pfn is valid,
+* hugetlb_residual_npage is greater than or equal to 1.
+*/
+   head = compound_head(pfn_to_page(pfn));
+   hugetlb_residual_npage = hugetlb_get_residual_pages(vaddr,
+   compound_order(head));
+   /* The page at vaddr has already been obtained by vaddr_get_pfn() */
+   contiguous_npage = min_t(long, (hugetlb_residual_npage - 1), npage);
+   /* There is no page left in this hugetlb page. */
+   if (!contiguous_npage)
+   goto out;
+
+   pages = kvmalloc_array(contiguous_npage, sizeof(struct page *), GFP_KERNEL);
+   if (!pages)
+   return -ENOMEM;
+
+   if (prot & IOMMU_WRITE)
+   flags |= FOLL_WRITE;
+
+   mmap_read_lock(mm);
+   /* The number of pages pinned may be less than contiguous_npage */
+   ret = pin_user_pages_remote(NULL, mm, vaddr + PAGE_SIZE, contiguous_npage,
+   flags | FOLL_LONGTERM, pages, NULL, NULL);
+   mmap_read_unlock(mm);
+out:
+   if (pages)
+   kvfree(pages);
+   return ret;
+}
+
+/*
+ * put pfns for a hugetlb page
+ * @start: the PAGE_SIZE-page we start to put; can be any page in this hugetlb page
+ * @npage: the number of PAGE_SIZE-pages that need to be put
+ * @prot: IOMMU_READ/WRITE
+ */
+static int hugetlb_put_pfn(unsigned long start, unsigne