On Wed, Jan 19, 2022 at 02:06:20PM -0500, Zi Yan wrote:
> From: Zi Yan <z...@nvidia.com>
> 
> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
> pageblocks with different migratetypes. It might unnecessarily convert
> extra pageblocks at the beginning and at the end of the range. Change
> alloc_contig_range() to work at pageblock granularity.
> 
> It is done by restoring pageblock types and split >pageblock_order free
> pages after isolating at MAX_ORDER-1 granularity and migrating pages
> away at pageblock granularity. The reason for this process is that
> during isolation, some pages, either free or in-use, might have >pageblock
> sizes and isolating part of them can cause free accounting issues.
> Restoring the migratetypes of the pageblocks not in the interesting
> range later is much easier.

Hi Zi Yan,

Due to time constraints I only glanced over, so some comments below
about stuff that caught my eye:

> +static inline void split_free_page_into_pageblocks(struct page *free_page,
> +                             int order, struct zone *zone)
> +{
> +     unsigned long pfn;
> +
> +     spin_lock(&zone->lock);
> +     del_page_from_free_list(free_page, zone, order);
> +     for (pfn = page_to_pfn(free_page);
> +          pfn < page_to_pfn(free_page) + (1UL << order);

It migt make sense to have a end_pfn variable so that does not have to
be constantly evaluated. Or maybe the compiler is clever enough to only
evualuate it once.

> +          pfn += pageblock_nr_pages) {
> +             int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
> +
> +             __free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
> +                             mt, FPI_NONE);
> +     }
> +     spin_unlock(&zone->lock);

It is possible that free_page's order is already pageblock_order, so I
would add a one-liner upfront to catch that case and return, otherwise
we do the delete_from_freelist-and-free_it_again dance.

> +     /* Save the migratepages of the pageblocks before start and after end */
> +     num_pageblock_to_save = (alloc_start - isolate_start) / 
> pageblock_nr_pages
> +                             + (isolate_end - alloc_end) / 
> pageblock_nr_pages;
> +     saved_mt =
> +             kmalloc_array(num_pageblock_to_save,
> +                           sizeof(unsigned char), GFP_KERNEL);
> +     if (!saved_mt)
> +             return -ENOMEM;
> +
> +     num = save_migratetypes(saved_mt, isolate_start, alloc_start);
> +
> +     num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);

I really hope we can put all this magic within start_isolate_page_range,
and the counterparts in undo_isolate_page_range.

Also, I kinda dislike the &saved_mt thing. I thought about some other
approaches but nothing that wasn't too specific for this case, and I
guess we want that function to be as generic as possible.

> +     /*
> +      * Split free page spanning [alloc_end, isolate_end) and put the
> +      * pageblocks in the right migratetype list
> +      */
> +     for (outer_end = alloc_end; outer_end < isolate_end;) {
> +             unsigned long begin_pfn = outer_end;
> +
> +             order = 0;
> +             while (!PageBuddy(pfn_to_page(outer_end))) {
> +                     if (++order >= MAX_ORDER) {
> +                             outer_end = begin_pfn;
> +                             break;
> +                     }
> +                     outer_end &= ~0UL << order;
> +             }
> +
> +             if (outer_end != begin_pfn) {
> +                     order = buddy_order(pfn_to_page(outer_end));
> +
> +                     /*
> +                      * split the free page has start page and put the 
> pageblocks
> +                      * in the right migratetype list
> +                      */
> +                     VM_BUG_ON(outer_end + (1UL << order) <= begin_pfn);

How could this possibily happen?

> +                     {
> +                             struct page *free_page = pfn_to_page(outer_end);
> +
> +                             split_free_page_into_pageblocks(free_page, 
> order, cc.zone);
> +                     }
> +                     outer_end += 1UL << order;
> +             } else
> +                     outer_end = begin_pfn + 1;
>       }

I think there are cases could optimize for. If the page has already been
split in pageblock by the outer_start loop, we could skip this outer_end
logic altogether.

E.g: An order-10 page is split in two pageblocks. There's nothing else
to be done, right? We could skip this. 


-- 
Oscar Salvador
SUSE Labs
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Reply via email to