On 2/4/21 3:50 AM, Muchun Song wrote:
> Hi all,
> 

[...]

> When a HugeTLB is freed to the buddy system, we should allocate 6 pages for
> vmemmap pages and restore the previous mapping relationship.
> 
> Apart from 2MB HugeTLB page, we also have 1GB HugeTLB page. It is similar
> to the 2MB HugeTLB page. We also can use this approach to free the vmemmap
> pages.
> 
> In this case, for the 1GB HugeTLB page, we can save 4094 pages. This is a
> very substantial gain. On our server, run some SPDK/QEMU applications which
> will use 1024GB hugetlbpage. With this feature enabled, we can save ~16GB
> (1G hugepage)/~12GB (2MB hugepage) memory.
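
For anyone wanting to sanity-check those numbers, the arithmetic works out
as in the quick userspace sketch below (my own back-of-the-envelope check,
not code from the series, assuming 4K base pages and a 64-byte struct page
as on x86-64):

#include <stdio.h>

int main(void)
{
	const long long base_page  = 4096;          /* 4K base pages */
	const long long pool_bytes = 1024LL << 30;  /* 1024GB hugetlb pool */

	/* 2MB hugepage: 8 vmemmap pages (512 * 64B struct page), 6 freed */
	long long saved_2m = (pool_bytes / (2LL << 20)) * 6 * base_page;
	/* 1GB hugepage: 4096 vmemmap pages, 4094 freed */
	long long saved_1g = (pool_bytes / (1LL << 30)) * 4094 * base_page;

	printf("2MB hugepages: ~%.1f GB of vmemmap saved\n",
	       saved_2m / (double)(1LL << 30));
	printf("1GB hugepages: ~%.1f GB of vmemmap saved\n",
	       saved_1g / (double)(1LL << 30));
	return 0;
}

which prints ~12.0 GB for the 2MB case and ~16.0 GB for the 1GB case,
matching the figures above.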
> 
> Because there are vmemmap page tables reconstruction on the freeing/allocating
> path, it increases some overhead. Here are some overhead analysis.

[...]

> Although the overhead has increased, the overhead is not significant. Like Mike
> said, "However, remember that the majority of use cases create hugetlb pages at
> or shortly after boot time and add them to the pool. So, additional overhead is
> at pool creation time. There is no change to 'normal run time' operations of
> getting a page from or returning a page to the pool (think page fault/unmap)".
> 

Despite the overhead, and in addition to the memory gains from this series,
there's an additional benefit that isn't mentioned here with your vmemmap page
reuse trick: page (un)pinners will see an improvement, presumably because there
are fewer memmap pages and thus the tail/head pages stay in cache more often.

Out of the box I saw (when comparing linux-next against linux-next + this
series) with gup_test and pinning a 16G hugetlb file (with 1G pages):

        get_user_pages(): ~32k -> ~9k
        unpin_user_pages(): ~75k -> ~70k

Usually any tight loop fetching compound_head(), or reading tail page data
(e.g. compound_head), benefits a lot. There are some unpinning inefficiencies
I am fixing[0], but with that added in, it shows even more:

        unpin_user_pages(): ~27k -> ~3.8k
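
To make it concrete, the sort of pattern I mean is roughly the loop below
(just a minimal sketch of an unpin path, not the actual gup_test or
unpin_user_pages() code): every iteration reads the tail struct page to
resolve the head, so it is very sensitive to how many distinct vmemmap
pages back a given hugetlb page.

#include <linux/mm.h>

/*
 * Toy version of an unpin loop, for illustration only: resolve the head
 * page for each pinned (tail) page and drop the FOLL_PIN reference there.
 * With the vmemmap reuse in this series, all the tail struct pages of a
 * 1G hugetlb page are backed by the same few physical vmemmap pages, so
 * the compound_head() reads mostly stay in cache.
 */
static void toy_unpin_pages(struct page **pages, unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		/* reads page->compound_head from the tail struct page */
		struct page *head = compound_head(pages[i]);

		/* drop the FOLL_PIN reference, tracked on the head page */
		unpin_user_page(head);
	}
}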

FWIW, I was also seeing this with devdax and the equivalent ZONE_DEVICE
vmemmap page reuse series[1], but there it was mixed in with other numbers.

Anyways, JFYI :)

        Joao

[0] 
https://lore.kernel.org/linux-mm/20210204202500.26474-1-joao.m.mart...@oracle.com/
[1] 
https://lore.kernel.org/linux-mm/20201208172901.17384-1-joao.m.mart...@oracle.com/
