Re: [RFC PATCH 00/31] Generating physically contiguous memory after page allocation
On 2/19/19 9:19 PM, Zi Yan wrote:
> On 19 Feb 2019, at 19:18, Mike Kravetz wrote:
>> Another high level question. One of the benefits of this approach is that exchanging pages does not require N free pages as you describe above. This assumes that the vma which we are trying to make contiguous is already populated. If it is not populated, then you also need to have N free pages. Correct? If this is true, then is the expected use case to first populate a vma, and then try to make contiguous? I would assume that if it is not populated and a request to make contiguous is given, we should try to allocate/populate the vma with contiguous pages at that time?
>
> Yes, I assume the pages within the VMA are already populated but not contiguous yet.
>
> My approach considers memory contiguity as an on-demand resource. In some phases of an application, accelerators or RDMA controllers would process/transfer data in one or more VMAs, at which time contiguous memory can help reduce address translation overheads or lift certain constraints. And different VMAs could be processed at different program phases, thus it might be hard to get contiguous memory for all these VMAs at the allocation time using alloc_contig_pages(). My approach can help get contiguous memory later, when the demand comes.
>
> For some cases, you definitely can use alloc_contig_pages() to give users a contiguous area at page allocation time, if you know the user is going to use this area for accelerator data processing or as an RDMA buffer and the area size is fixed.
>
> In addition, we can also use the khugepaged approach, having a daemon periodically scan VMAs and use alloc_contig_pages() to convert non-contiguous pages in a VMA to contiguous pages, but it would require N free pages during the conversion.
>
> In sum, my approach complements alloc_contig_pages() and provides more flexibility. It is not trying to replace alloc_contig_pages().

Thank you for the explanation. That makes sense. I have mostly been thinking about contiguous memory from an allocation perspective and did not really consider other use cases.

--
Mike Kravetz
Re: [RFC PATCH 00/31] Generating physically contiguous memory after page allocation
On 19 Feb 2019, at 19:18, Mike Kravetz wrote:

> On 2/19/19 6:33 PM, Zi Yan wrote:
>> On 19 Feb 2019, at 17:42, Mike Kravetz wrote:
>>> On 2/15/19 2:08 PM, Zi Yan wrote:
>>>
>>> Thanks for working on this issue!
>>>
>>> I have not yet had a chance to take a look at the code. However, I do have some general questions/comments on the approach.
>>
>> Thanks for replying. The code is very intrusive and has a lot of hacks, so it is OK for us to discuss the general idea first. :)
>>
>>>> Patch structure
>>>>
>>>> The patchset I developed to generate physically contiguous memory/arbitrary sized pages merely moves pages around. There are three components in this patchset:
>>>>
>>>> 1) a new page migration mechanism, called exchange pages, that exchanges the content of two in-use pages instead of performing two back-to-back page migrations. It saves on overheads and avoids page reclaim and memory compaction in the page allocation path, although it is not strictly required if enough free memory is available in the system.
>>>>
>>>> 2) a new mechanism that utilizes both page migration and exchange pages to produce physically contiguous memory/arbitrary sized pages without allocating any new pages, unlike what khugepaged does. It works on a per-VMA basis, creating physically contiguous memory out of each VMA, which is virtually contiguous. A simple range tree is used to ensure no two VMAs are overlapping with each other in the physical address space.
>>>
>>> This appears to be a new approach to generating contiguous areas. Previous attempts had relied on finding a contiguous area that can then be used for various purposes including user mappings. Here, you take an existing mapping and make it contiguous. [RFC PATCH 04/31] mm: add mem_defrag functionality talks about creating a (VPN, PFN) anchor pair for each vma and then using this pair as the base for creating a contiguous area.
>>>
>>> I'm curious, how 'fixed' is the anchor? As you know, there could be a non-movable page in the PFN range. As a result, you will not be able to create a contiguous area starting at that PFN. In such a case, do we try another PFN? I know this could result in much page shuffling. I'm just trying to figure out how we satisfy a user who really wants a contiguous area. Is there some method to keep trying?
>>
>> Good question. The anchor is determined on a per-VMA basis, which can be changed easily, but in this patchset, I used a very simple strategy: making all VMAs not overlapping in the physical address space to get maximum overall contiguity and not changing anchors even if non-moveable pages are encountered when generating physically contiguous pages.
>>
>> Basically, the first VMA, VMA1, in the virtual address space has its anchor as (VMA1_start_VPN, ZONE_start_PFN), the second, VMA2, has its anchor as (VMA2_start_VPN, ZONE_start_PFN + VMA1_size), and so on. This makes all VMAs not overlapping in the physical address space during contiguous memory generation. When there is a non-moveable page, the anchor will not be changed, because no matter whether we assign a new anchor or not, the contiguous pages stop at the non-moveable page. If we are trying to get a new anchor, more effort is needed to avoid overlapping the new anchor with existing contiguous pages. Any overlapping will nullify the existing contiguous pages.
>>
>> To satisfy a user who wants a contiguous area with N pages, the minimal distance between any two non-moveable pages should be bigger than N pages in the system memory. Otherwise, nothing would work. If there is such an area (PFN1, PFN1+N) in the physical address space, you can set the anchor to (VPN_USER, PFN1) and use exchange_pages() to generate a contiguous area with N pages. Alternatively, alloc_contig_pages(PFN1, PFN1+N, …) could also work, but only at page allocation time. It also requires the system to have N free pages when alloc_contig_pages() is migrating the pages in (PFN1, PFN1+N) away, or you need to swap pages to make the space.
>>
>> Let me know if this makes sense to you.
>
> Yes, that is how I expected the implementation would work. Thank you.
>
> Another high level question. One of the benefits of this approach is that exchanging pages does not require N free pages as you describe above. This assumes that the vma which we are trying to make contiguous is already populated. If it is not populated, then you also need to have N free pages. Correct? If this is true, then is the expected use case to first populate a vma, and then try to make contiguous? I would assume that if it is not populated and a request to make contiguous is given, we should try to allocate/populate the vma with contiguous pages at that time?

Yes, I assume the pages within the VMA are already populated but not contiguous yet.

My approach considers memory contiguity as an on-demand resource. In some phases of an application, accelerators or RDMA controllers would process/transfer data in one or more VMAs, at which time contiguous memory can help reduce address
Re: [RFC PATCH 00/31] Generating physically contiguous memory after page allocation
On 2/19/19 6:33 PM, Zi Yan wrote:
> On 19 Feb 2019, at 17:42, Mike Kravetz wrote:
>> On 2/15/19 2:08 PM, Zi Yan wrote:
>>
>> Thanks for working on this issue!
>>
>> I have not yet had a chance to take a look at the code. However, I do have some general questions/comments on the approach.
>
> Thanks for replying. The code is very intrusive and has a lot of hacks, so it is OK for us to discuss the general idea first. :)
>
>>> Patch structure
>>>
>>> The patchset I developed to generate physically contiguous memory/arbitrary sized pages merely moves pages around. There are three components in this patchset:
>>>
>>> 1) a new page migration mechanism, called exchange pages, that exchanges the content of two in-use pages instead of performing two back-to-back page migrations. It saves on overheads and avoids page reclaim and memory compaction in the page allocation path, although it is not strictly required if enough free memory is available in the system.
>>>
>>> 2) a new mechanism that utilizes both page migration and exchange pages to produce physically contiguous memory/arbitrary sized pages without allocating any new pages, unlike what khugepaged does. It works on a per-VMA basis, creating physically contiguous memory out of each VMA, which is virtually contiguous. A simple range tree is used to ensure no two VMAs are overlapping with each other in the physical address space.
>>
>> This appears to be a new approach to generating contiguous areas. Previous attempts had relied on finding a contiguous area that can then be used for various purposes including user mappings. Here, you take an existing mapping and make it contiguous. [RFC PATCH 04/31] mm: add mem_defrag functionality talks about creating a (VPN, PFN) anchor pair for each vma and then using this pair as the base for creating a contiguous area.
>>
>> I'm curious, how 'fixed' is the anchor? As you know, there could be a non-movable page in the PFN range. As a result, you will not be able to create a contiguous area starting at that PFN. In such a case, do we try another PFN? I know this could result in much page shuffling. I'm just trying to figure out how we satisfy a user who really wants a contiguous area. Is there some method to keep trying?
>
> Good question. The anchor is determined on a per-VMA basis, which can be changed easily, but in this patchset, I used a very simple strategy: making all VMAs not overlapping in the physical address space to get maximum overall contiguity and not changing anchors even if non-moveable pages are encountered when generating physically contiguous pages.
>
> Basically, the first VMA, VMA1, in the virtual address space has its anchor as (VMA1_start_VPN, ZONE_start_PFN), the second, VMA2, has its anchor as (VMA2_start_VPN, ZONE_start_PFN + VMA1_size), and so on. This makes all VMAs not overlapping in the physical address space during contiguous memory generation. When there is a non-moveable page, the anchor will not be changed, because no matter whether we assign a new anchor or not, the contiguous pages stop at the non-moveable page. If we are trying to get a new anchor, more effort is needed to avoid overlapping the new anchor with existing contiguous pages. Any overlapping will nullify the existing contiguous pages.
>
> To satisfy a user who wants a contiguous area with N pages, the minimal distance between any two non-moveable pages should be bigger than N pages in the system memory. Otherwise, nothing would work. If there is such an area (PFN1, PFN1+N) in the physical address space, you can set the anchor to (VPN_USER, PFN1) and use exchange_pages() to generate a contiguous area with N pages. Alternatively, alloc_contig_pages(PFN1, PFN1+N, …) could also work, but only at page allocation time. It also requires the system to have N free pages when alloc_contig_pages() is migrating the pages in (PFN1, PFN1+N) away, or you need to swap pages to make the space.
>
> Let me know if this makes sense to you.

Yes, that is how I expected the implementation would work. Thank you.

Another high level question. One of the benefits of this approach is that exchanging pages does not require N free pages as you describe above. This assumes that the vma which we are trying to make contiguous is already populated. If it is not populated, then you also need to have N free pages. Correct? If this is true, then is the expected use case to first populate a vma, and then try to make contiguous? I would assume that if it is not populated and a request to make contiguous is given, we should try to allocate/populate the vma with contiguous pages at that time?

--
Mike Kravetz
Re: [RFC PATCH 00/31] Generating physically contiguous memory after page allocation
On 19 Feb 2019, at 17:42, Mike Kravetz wrote:

> On 2/15/19 2:08 PM, Zi Yan wrote:
>
> Thanks for working on this issue!
>
> I have not yet had a chance to take a look at the code. However, I do have some general questions/comments on the approach.

Thanks for replying. The code is very intrusive and has a lot of hacks, so it is OK for us to discuss the general idea first. :)

>> Patch structure
>>
>> The patchset I developed to generate physically contiguous memory/arbitrary sized pages merely moves pages around. There are three components in this patchset:
>>
>> 1) a new page migration mechanism, called exchange pages, that exchanges the content of two in-use pages instead of performing two back-to-back page migrations. It saves on overheads and avoids page reclaim and memory compaction in the page allocation path, although it is not strictly required if enough free memory is available in the system.
>>
>> 2) a new mechanism that utilizes both page migration and exchange pages to produce physically contiguous memory/arbitrary sized pages without allocating any new pages, unlike what khugepaged does. It works on a per-VMA basis, creating physically contiguous memory out of each VMA, which is virtually contiguous. A simple range tree is used to ensure no two VMAs are overlapping with each other in the physical address space.
>
> This appears to be a new approach to generating contiguous areas. Previous attempts had relied on finding a contiguous area that can then be used for various purposes including user mappings. Here, you take an existing mapping and make it contiguous. [RFC PATCH 04/31] mm: add mem_defrag functionality talks about creating a (VPN, PFN) anchor pair for each vma and then using this pair as the base for creating a contiguous area.
>
> I'm curious, how 'fixed' is the anchor? As you know, there could be a non-movable page in the PFN range. As a result, you will not be able to create a contiguous area starting at that PFN. In such a case, do we try another PFN? I know this could result in much page shuffling. I'm just trying to figure out how we satisfy a user who really wants a contiguous area. Is there some method to keep trying?

Good question. The anchor is determined on a per-VMA basis, which can be changed easily, but in this patchset, I used a very simple strategy: making all VMAs not overlapping in the physical address space to get maximum overall contiguity and not changing anchors even if non-moveable pages are encountered when generating physically contiguous pages.

Basically, the first VMA, VMA1, in the virtual address space has its anchor as (VMA1_start_VPN, ZONE_start_PFN), the second, VMA2, has its anchor as (VMA2_start_VPN, ZONE_start_PFN + VMA1_size), and so on. This makes all VMAs not overlapping in the physical address space during contiguous memory generation. When there is a non-moveable page, the anchor will not be changed, because no matter whether we assign a new anchor or not, the contiguous pages stop at the non-moveable page. If we are trying to get a new anchor, more effort is needed to avoid overlapping the new anchor with existing contiguous pages. Any overlapping will nullify the existing contiguous pages.

To satisfy a user who wants a contiguous area with N pages, the minimal distance between any two non-moveable pages should be bigger than N pages in the system memory. Otherwise, nothing would work. If there is such an area (PFN1, PFN1+N) in the physical address space, you can set the anchor to (VPN_USER, PFN1) and use exchange_pages() to generate a contiguous area with N pages. Alternatively, alloc_contig_pages(PFN1, PFN1+N, …) could also work, but only at page allocation time. It also requires the system to have N free pages when alloc_contig_pages() is migrating the pages in (PFN1, PFN1+N) away, or you need to swap pages to make the space.

Let me know if this makes sense to you.

--
Best Regards,
Yan Zi
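For readers following the anchor discussion in the reply above, here is a rough user-space model of the layout it describes. It is not code from the patchset. The function names (assign_anchors, target_pfn, find_movable_window) and the half-open reading of the range (PFN1, PFN1+N) are assumptions made only for illustration.

```python
from typing import List, Optional, Set, Tuple

Anchor = Tuple[int, int]  # (start VPN of the VMA, PFN it is anchored to)


def assign_anchors(vmas: List[Tuple[int, int]], zone_start_pfn: int) -> List[Anchor]:
    """Give each VMA a non-overlapping physical range, in virtual-address order.

    vmas is a list of (start_vpn, size_in_pages), sorted by start_vpn.
    VMA1 is anchored at ZONE_start_PFN, VMA2 at ZONE_start_PFN + VMA1_size, etc.
    """
    anchors = []
    next_pfn = zone_start_pfn
    for start_vpn, size in vmas:
        anchors.append((start_vpn, next_pfn))
        next_pfn += size
    return anchors


def target_pfn(vpn: int, anchor: Anchor) -> int:
    """PFN that a given VPN should end up at once its VMA is contiguous."""
    anchor_vpn, anchor_pfn = anchor
    return anchor_pfn + (vpn - anchor_vpn)


def find_movable_window(zone_start_pfn: int, zone_end_pfn: int,
                        unmovable: Set[int], n: int) -> Optional[int]:
    """Return the first PFN1 such that [PFN1, PFN1 + n) holds no unmovable page.

    Returns None when every gap between unmovable pages is shorter than n,
    which is the "nothing would work" case from the reply.
    """
    run_start = zone_start_pfn
    for pfn in range(zone_start_pfn, zone_end_pfn):
        if pfn in unmovable:
            run_start = pfn + 1          # the run is broken, restart after it
        elif pfn - run_start + 1 >= n:
            return run_start
    return None
```

For example, three VMAs of 4, 8 and 2 pages in a zone starting at PFN 1000 would be anchored at PFNs 1000, 1004 and 1012 in this model, and a window found by find_movable_window could serve as PFN1 in an anchor (VPN_USER, PFN1).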
Re: [RFC PATCH 00/31] Generating physically contiguous memory after page allocation
On 2/15/19 2:08 PM, Zi Yan wrote:

Thanks for working on this issue!

I have not yet had a chance to take a look at the code. However, I do have some general questions/comments on the approach.

> Patch structure
>
> The patchset I developed to generate physically contiguous memory/arbitrary sized pages merely moves pages around. There are three components in this patchset:
>
> 1) a new page migration mechanism, called exchange pages, that exchanges the content of two in-use pages instead of performing two back-to-back page migrations. It saves on overheads and avoids page reclaim and memory compaction in the page allocation path, although it is not strictly required if enough free memory is available in the system.
>
> 2) a new mechanism that utilizes both page migration and exchange pages to produce physically contiguous memory/arbitrary sized pages without allocating any new pages, unlike what khugepaged does. It works on a per-VMA basis, creating physically contiguous memory out of each VMA, which is virtually contiguous. A simple range tree is used to ensure no two VMAs are overlapping with each other in the physical address space.

This appears to be a new approach to generating contiguous areas. Previous attempts had relied on finding a contiguous area that can then be used for various purposes including user mappings. Here, you take an existing mapping and make it contiguous. [RFC PATCH 04/31] mm: add mem_defrag functionality talks about creating a (VPN, PFN) anchor pair for each vma and then using this pair as the base for creating a contiguous area.

I'm curious, how 'fixed' is the anchor? As you know, there could be a non-movable page in the PFN range. As a result, you will not be able to create a contiguous area starting at that PFN. In such a case, do we try another PFN? I know this could result in much page shuffling. I'm just trying to figure out how we satisfy a user who really wants a contiguous area. Is there some method to keep trying?

My apologies if this is addressed in the code. This was just one of the first thoughts that came to mind when giving the series a quick look.

--
Mike Kravetz
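To make the quoted patch description more concrete, the following toy simulation shows how exchanging page contents can line a populated VMA up behind its (VPN, PFN) anchor without consuming any free frames. It is only a sketch and not the kernel implementation. The frame list, the ("vma", vpn) tags, the function name make_contiguous, and the choice to stop at the first non-movable page are all assumptions for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Frame = Optional[Tuple[str, int]]  # ("vma", vpn), ("other", id), or None if free


def make_contiguous(frames: List[Frame], page_table: Dict[int, int],
                    unmovable: Set[int], anchor: Tuple[int, int]) -> int:
    """Move each page of a populated VMA to anchor_pfn + (vpn - anchor_vpn).

    frames[pfn] models the content of a physical frame, page_table maps the
    VMA's VPNs to PFNs. Returns how many exchanges/moves were performed.
    """
    anchor_vpn, anchor_pfn = anchor
    moves = 0
    for vpn in sorted(page_table):
        src = page_table[vpn]
        dst = anchor_pfn + (vpn - anchor_vpn)
        if src == dst:
            continue                      # already in place
        if src in unmovable or dst in unmovable:
            break                         # assumed: contiguity stops here
        # Exchange the contents of the two frames. If dst was free this
        # degenerates into a plain migration; no extra frame is needed.
        frames[src], frames[dst] = frames[dst], frames[src]
        page_table[vpn] = dst
        displaced = frames[src]
        if displaced is not None and displaced[0] == "vma":
            page_table[displaced[1]] = src  # same-VMA page got displaced
        moves += 1
    return moves
```

In this model the number of free frames is the same before and after the call, which mirrors the point that exchanging pages avoids having to allocate new pages first.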