Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-18 Thread Nicholas Piggin
Christophe Leroy's on June 11, 2019 3:39 pm:
> 
> 
> On 10/06/2019 at 06:38, Nicholas Piggin wrote:
>> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
>> allocate huge pages and map them
> 
> Will this be compatible with Russell's series 
> https://patchwork.ozlabs.org/patch/1099857/ for the implementation of 
> STRICT_MODULE_RWX ?
> I see that apply_to_page_range() has things like BUG_ON(pud_huge(*pud));
> 
> Might also be an issue for arm64 as I think Russell's implementation 
> comes from there.

Yeah, you're right (and correct about the arm64 problem). I'll fix that up.

>> +static int vmap_hpages_range(unsigned long start, unsigned long end,
>> +   pgprot_t prot, struct page **pages,
>> +   unsigned int page_shift)
>> +{
>> +BUG_ON(page_shift != PAGE_SIZE);
> 
> Do we really need a BUG_ON() there ? What happens if this condition is 
> true ?

If it's true then vmap_pages_range would die horribly reading off the
end of the pages array thinking they are struct page pointers.

I guess it could return failure.
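
Something like this for the !HUGE_VMAP stub, perhaps (a sketch only, and it
assumes page_shift is an order relative to PAGE_SHIFT -- 0 for small pages --
rather than a byte size, which is what the existing BUG_ON conflates):

static int vmap_hpages_range(unsigned long start, unsigned long end,
			     pgprot_t prot, struct page **pages,
			     unsigned int page_shift)
{
	/* Only order-0 (small page) mappings are possible here. */
	if (WARN_ON_ONCE(page_shift))
		return -EINVAL;
	return vmap_pages_range(start, end, prot, pages);
}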

>> +return vmap_pages_range(start, end, prot, pages);
>> +}
>> +#endif
>> +
>> +
>>   int is_vmalloc_or_module_addr(const void *x)
>>   {
>>  /*
>> @@ -462,7 +498,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>>   {
>>  unsigned long addr = (unsigned long) vmalloc_addr;
>>  struct page *page = NULL;
>> -pgd_t *pgd = pgd_offset_k(addr);
>> +pgd_t *pgd;
>>  p4d_t *p4d;
>>  pud_t *pud;
>>  pmd_t *pmd;
>> @@ -474,27 +510,38 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>>   */
>>  VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
>>   
>> +pgd = pgd_offset_k(addr);
>>  if (pgd_none(*pgd))
>>  return NULL;
>> +
>>  p4d = p4d_offset(pgd, addr);
>>  if (p4d_none(*p4d))
>>  return NULL;
>> -pud = pud_offset(p4d, addr);
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> 
> Do we really need that ifdef ? Won't p4d_large() always return 0 when it is
> not set ?
> Otherwise, could we use IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP) instead ?
> 
> Same several places below.

Possibly some of them are not defined without HAVE_ARCH_HUGE_VMAP, I think.
I'll try to apply this pattern as much as possible.
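
For reference, the IS_ENABLED() form in vmalloc_to_page() would look roughly
like this (just a sketch; it only works if p4d_large()/p4d_page() are defined
on every config, which is exactly the doubt above):

	p4d = p4d_offset(pgd, addr);
	if (p4d_none(*p4d))
		return NULL;
	if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP) && p4d_large(*p4d))
		return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
	if (WARN_ON_ONCE(p4d_bad(*p4d)))
		return NULL;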

>> @@ -2541,14 +2590,17 @@ static void *__vmalloc_area_node(struct vm_struct 
>> *area, gfp_t gfp_mask,
>>   pgprot_t prot, int node)
>>   {
>>  struct page **pages;
>> +unsigned long addr = (unsigned long)area->addr;
>> +unsigned long size = get_vm_area_size(area);
>> +unsigned int page_shift = area->page_shift;
>> +unsigned int shift = page_shift + PAGE_SHIFT;
>>  unsigned int nr_pages, array_size, i;
>>  const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>>  const gfp_t alloc_mask = gfp_mask | __GFP_NOWARN;
>>  const gfp_t highmem_mask = (gfp_mask & (GFP_DMA | GFP_DMA32)) ?
>> -0 :
>> -__GFP_HIGHMEM;
>> +0 : __GFP_HIGHMEM;
> 
> This patch is already quite big, shouldn't this kind of unrelated 
> cleanup be in another patch ?

Okay, 2 against 1. I'll minimise changes like this.

Thanks,
Nick



Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-18 Thread Nicholas Piggin
Anshuman Khandual's on June 11, 2019 4:17 pm:
> 
> 
> On 06/10/2019 08:14 PM, Nicholas Piggin wrote:
>> Mark Rutland's on June 11, 2019 12:10 am:
>>> Hi,
>>>
>>> On Mon, Jun 10, 2019 at 02:38:38PM +1000, Nicholas Piggin wrote:
 For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
 allocate huge pages and map them

 This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
 (performance is in the noise, under 1% difference, page tables are likely
 to be well cached for this workload). Similar numbers are seen on POWER9.
>>>
>>> Do you happen to know which vmalloc mappings these get used for in the
>>> above case? Where do we see vmalloc mappings that large?
>> 
>> Large module vmalloc could be subject to huge mappings.
>> 
>>> I'm worried as to how this would interact with the set_memory_*()
>>> functions, as on arm64 those can only operate on page-granular mappings.
>>> Those may need fixing up to handle huge mappings; certainly if the above
>>> is all for modules.
>> 
>> Good point, that looks like it would break on arm64 at least. I'll
>> work on it. We may have to make this opt in beyond HUGE_VMAP.
> 
> This is another reason we might need to have an arch opt-in like the one
> I mentioned before.
> 

Let's try to get the precursor stuff like page table functions and
vmalloc_to_page in this merge window, and then concentrate on the
huge vmalloc support issues after that.

Christophe points out that powerpc is likely to have a similar problem,
which I didn't realise, so I'll rethink it.

Thanks,
Nick


Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-18 Thread Nicholas Piggin
Anshuman Khandual's on June 11, 2019 4:59 pm:
> On 06/11/2019 05:46 AM, Nicholas Piggin wrote:
>> Anshuman Khandual's on June 10, 2019 6:53 pm:
>>> On 06/10/2019 10:08 AM, Nicholas Piggin wrote:
 For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
 allocate huge pages and map them.
>>>
>>> IIUC that extends HAVE_ARCH_HUGE_VMAP from ioremap to vmalloc.
>>>

 This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
 (performance is in the noise, under 1% difference, page tables are likely
 to be well cached for this workload). Similar numbers are seen on POWER9.
>>>
>>> Sure will try this on arm64.
>>>

 Signed-off-by: Nicholas Piggin 
 ---
  include/asm-generic/4level-fixup.h |   1 +
  include/asm-generic/5level-fixup.h |   1 +
  include/linux/vmalloc.h|   1 +
  mm/vmalloc.c   | 132 +++--
  4 files changed, 107 insertions(+), 28 deletions(-)

 diff --git a/include/asm-generic/4level-fixup.h 
 b/include/asm-generic/4level-fixup.h
 index e3667c9a33a5..3cc65a4dd093 100644
 --- a/include/asm-generic/4level-fixup.h
 +++ b/include/asm-generic/4level-fixup.h
 @@ -20,6 +20,7 @@
  #define pud_none(pud) 0
  #define pud_bad(pud)  0
  #define pud_present(pud)  1
 +#define pud_large(pud)0
  #define pud_ERROR(pud)do { } while (0)
  #define pud_clear(pud)pgd_clear(pud)
  #define pud_val(pud)  pgd_val(pud)
 diff --git a/include/asm-generic/5level-fixup.h 
 b/include/asm-generic/5level-fixup.h
 index bb6cb347018c..c4377db09a4f 100644
 --- a/include/asm-generic/5level-fixup.h
 +++ b/include/asm-generic/5level-fixup.h
 @@ -22,6 +22,7 @@
  #define p4d_none(p4d) 0
  #define p4d_bad(p4d)  0
  #define p4d_present(p4d)  1
 +#define p4d_large(p4d)0
  #define p4d_ERROR(p4d)do { } while (0)
  #define p4d_clear(p4d)pgd_clear(p4d)
  #define p4d_val(p4d)  pgd_val(p4d)
>>>
>>> Both of these are required by vmalloc_to_page(), which as per a later
>>> comment should be part of a prerequisite patch before this series.
>> 
>> I'm not sure what you mean. This patch is where they get used.
> 
> In case you move out vmalloc_to_page() changes to a separate patch.

Sorry for the delay in reply.

I'll split this and see if we might be able to get it into the next
merge window. I can have another try at the huge vmalloc patch
after that.

> 
>> 
>> Possibly I could split this and the vmalloc_to_page change out. I'll
>> consider it.
>> 
 diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
 index 812bea5866d6..4c92dc608928 100644
 --- a/include/linux/vmalloc.h
 +++ b/include/linux/vmalloc.h
 @@ -42,6 +42,7 @@ struct vm_struct {
unsigned long   size;
unsigned long   flags;
struct page **pages;
 +  unsigned intpage_shift;
>>>
>>> So the entire vm_struct will be mapped with a single page_shift. It cannot
>>> have mix-and-match mappings with PAGE_SIZE, PMD_SIZE, PUD_SIZE etc. in case
>>> the allocation fails for larger ones, falling back, or for whatever other
>>> reasons.
>> 
>> For now, yes. I have a bit of follow up work to improve that and make
>> it able to fall back, but it's a bit more churn and not a significant
>> benefit just yet because there are not a lot of very large vmallocs
>> (except the early hashes which can be satisfied with large allocs).
> 
> Right, but it will make this new feature complete like ioremap, which logically
> supports up to P4D (though AFAICT not used). If there are no actual vmalloc
> requests that large it is fine. Allocation attempts will start from the page
> table level depending on the requested size. It is better to have PUD/P4D
> considerations now rather than trying to retrofit them later.

I've considered them, which is why, e.g., a shift gets passed around
rather than a bool for small/large.

I won't over-complicate this page array data structure for something
that may never be supported, though. I think we may actually be better
off moving away from it in the vmalloc code and just referencing pages
from the page tables, so it's just something we can cross when we get
to it.

>>> Also, should we not check the alignment of the range [start...end] with
>>> respect to (1UL << [PAGE_SHIFT + page_shift])?
>> 
>> The caller should, if it specifies a large page. Could check and return
>> -EINVAL for incorrect alignment.
> 
> That might be a good check here.

Will add.
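
Probably something along these lines near the top of vmap_hpages_range()
(a sketch, assuming page_shift stays an order relative to PAGE_SHIFT):

	if (WARN_ON_ONCE(!IS_ALIGNED(start, PAGE_SIZE << page_shift) ||
			 !IS_ALIGNED(end, PAGE_SIZE << page_shift)))
		return -EINVAL;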

 @@ -474,27 +510,38 @@ struct page *vmalloc_to_page(const void 
 *vmalloc_addr)

Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-11 Thread Anshuman Khandual
On 06/11/2019 05:46 AM, Nicholas Piggin wrote:
> Anshuman Khandual's on June 10, 2019 6:53 pm:
>> On 06/10/2019 10:08 AM, Nicholas Piggin wrote:
>>> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
>>> allocate huge pages and map them.
>>
>> IIUC that extends HAVE_ARCH_HUGE_VMAP from ioremap to vmalloc.
>>
>>>
>>> This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
>>> 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
>>> (performance is in the noise, under 1% difference, page tables are likely
>>> to be well cached for this workload). Similar numbers are seen on POWER9.
>>
>> Sure will try this on arm64.
>>
>>>
>>> Signed-off-by: Nicholas Piggin 
>>> ---
>>>  include/asm-generic/4level-fixup.h |   1 +
>>>  include/asm-generic/5level-fixup.h |   1 +
>>>  include/linux/vmalloc.h|   1 +
>>>  mm/vmalloc.c   | 132 +++--
>>>  4 files changed, 107 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/include/asm-generic/4level-fixup.h 
>>> b/include/asm-generic/4level-fixup.h
>>> index e3667c9a33a5..3cc65a4dd093 100644
>>> --- a/include/asm-generic/4level-fixup.h
>>> +++ b/include/asm-generic/4level-fixup.h
>>> @@ -20,6 +20,7 @@
>>>  #define pud_none(pud)  0
>>>  #define pud_bad(pud)   0
>>>  #define pud_present(pud)   1
>>> +#define pud_large(pud) 0
>>>  #define pud_ERROR(pud) do { } while (0)
>>>  #define pud_clear(pud) pgd_clear(pud)
>>>  #define pud_val(pud)   pgd_val(pud)
>>> diff --git a/include/asm-generic/5level-fixup.h 
>>> b/include/asm-generic/5level-fixup.h
>>> index bb6cb347018c..c4377db09a4f 100644
>>> --- a/include/asm-generic/5level-fixup.h
>>> +++ b/include/asm-generic/5level-fixup.h
>>> @@ -22,6 +22,7 @@
>>>  #define p4d_none(p4d)  0
>>>  #define p4d_bad(p4d)   0
>>>  #define p4d_present(p4d)   1
>>> +#define p4d_large(p4d) 0
>>>  #define p4d_ERROR(p4d) do { } while (0)
>>>  #define p4d_clear(p4d) pgd_clear(p4d)
>>>  #define p4d_val(p4d)   pgd_val(p4d)
>>
>> Both of these are required by vmalloc_to_page(), which as per a later
>> comment should be part of a prerequisite patch before this series.
> 
> I'm not sure what you mean. This patch is where they get used.

In case you move out vmalloc_to_page() changes to a separate patch.

> 
> Possibly I could split this and the vmalloc_to_page change out. I'll
> consider it.
> 
>>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>>> index 812bea5866d6..4c92dc608928 100644
>>> --- a/include/linux/vmalloc.h
>>> +++ b/include/linux/vmalloc.h
>>> @@ -42,6 +42,7 @@ struct vm_struct {
>>> unsigned long   size;
>>> unsigned long   flags;
>>> struct page **pages;
>>> +   unsigned intpage_shift;
>>
>> So the entire vm_struct will be mapped with a single page_shift. It cannot
>> have mix-and-match mappings with PAGE_SIZE, PMD_SIZE, PUD_SIZE etc. in case
>> the allocation fails for larger ones, falling back, or for whatever other
>> reasons.
> 
> For now, yes. I have a bit of follow up work to improve that and make
> it able to fall back, but it's a bit more churn and not a significant
> benefit just yet because there are not a lot of very large vmallocs
> (except the early hashes which can be satisfied with large allocs).

Right, but it will make this new feature complete like ioremap, which logically
supports up to P4D (though AFAICT not used). If there are no actual vmalloc
requests that large it is fine. Allocation attempts will start from the page
table level depending on the requested size. It is better to have PUD/P4D
considerations now rather than trying to retrofit them later.

> 
>>
>>> unsigned intnr_pages;
>>> phys_addr_t phys_addr;
>>> const void  *caller;
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index dd27cfb29b10..0cf8e861caeb 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -36,6 +36,7 @@
>>>  #include 
>>>  
>>>  #include 
>>> +#include 
>>>  #include 
>>>  #include 
>>>  
>>> @@ -440,6 +441,41 @@ static int vmap_pages_range(unsigned long start, 
>>> unsigned long end,
>>> return ret;
>>>  }
>>>  
>>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>>> +static int vmap_hpages_range(unsigned long start, unsigned long end,
>>
>> A small nit (if you agree) s/hpages/huge_pages/
> 
> Hmm. It's not actually a good function name because it can do small
> pages as well. vmap_pages_size_range or something may be better.

Right.

> 
>>
>>> +  pgprot_t prot, struct page **pages,
>>
>> Re-order (prot <---> pages) just to follow the standard like before.
> 
> Will do.
> 
>>> +  unsigned int page_shift)
>>> +{

Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-11 Thread Anshuman Khandual



On 06/10/2019 08:14 PM, Nicholas Piggin wrote:
> Mark Rutland's on June 11, 2019 12:10 am:
>> Hi,
>>
>> On Mon, Jun 10, 2019 at 02:38:38PM +1000, Nicholas Piggin wrote:
>>> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
>>> allocate huge pages and map them
>>>
>>> This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
>>> 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
>>> (performance is in the noise, under 1% difference, page tables are likely
>>> to be well cached for this workload). Similar numbers are seen on POWER9.
>>
>> Do you happen to know which vmalloc mappings these get used for in the
>> above case? Where do we see vmalloc mappings that large?
> 
> Large module vmalloc could be subject to huge mappings.
> 
>> I'm worried as to how this would interact with the set_memory_*()
>> functions, as on arm64 those can only operate on page-granular mappings.
>> Those may need fixing up to handle huge mappings; certainly if the above
>> is all for modules.
> 
> Good point, that looks like it would break on arm64 at least. I'll
> work on it. We may have to make this opt in beyond HUGE_VMAP.

This is another reason we might need to have an arch opt-in like the one
I mentioned before.


Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-10 Thread Christophe Leroy




On 10/06/2019 at 06:38, Nicholas Piggin wrote:

For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
allocate huge pages and map them


Will this be compatible with Russell's series 
https://patchwork.ozlabs.org/patch/1099857/ for the implementation of 
STRICT_MODULE_RWX ?

I see that apply_to_page_range() has things like BUG_ON(pud_huge(*pud));

Might also be an issue for arm64 as I think Russell's implementation 
comes from there.




This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
(performance is in the noise, under 1% difference, page tables are likely
to be well cached for this workload). Similar numbers are seen on POWER9.

Signed-off-by: Nicholas Piggin 
---
  include/asm-generic/4level-fixup.h |   1 +
  include/asm-generic/5level-fixup.h |   1 +
  include/linux/vmalloc.h|   1 +
  mm/vmalloc.c   | 132 +++--
  4 files changed, 107 insertions(+), 28 deletions(-)

diff --git a/include/asm-generic/4level-fixup.h 
b/include/asm-generic/4level-fixup.h
index e3667c9a33a5..3cc65a4dd093 100644
--- a/include/asm-generic/4level-fixup.h
+++ b/include/asm-generic/4level-fixup.h
@@ -20,6 +20,7 @@
  #define pud_none(pud) 0
  #define pud_bad(pud)  0
  #define pud_present(pud)  1
+#define pud_large(pud) 0
  #define pud_ERROR(pud)do { } while (0)
  #define pud_clear(pud)pgd_clear(pud)
  #define pud_val(pud)  pgd_val(pud)
diff --git a/include/asm-generic/5level-fixup.h 
b/include/asm-generic/5level-fixup.h
index bb6cb347018c..c4377db09a4f 100644
--- a/include/asm-generic/5level-fixup.h
+++ b/include/asm-generic/5level-fixup.h
@@ -22,6 +22,7 @@
  #define p4d_none(p4d) 0
  #define p4d_bad(p4d)  0
  #define p4d_present(p4d)  1
+#define p4d_large(p4d) 0
  #define p4d_ERROR(p4d)do { } while (0)
  #define p4d_clear(p4d)pgd_clear(p4d)
  #define p4d_val(p4d)  pgd_val(p4d)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 812bea5866d6..4c92dc608928 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -42,6 +42,7 @@ struct vm_struct {
unsigned long   size;
unsigned long   flags;
struct page **pages;
+   unsigned intpage_shift;
unsigned intnr_pages;
phys_addr_t phys_addr;
const void  *caller;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dd27cfb29b10..0cf8e861caeb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -36,6 +36,7 @@
  #include 
  
  #include 

+#include 
  #include 
  #include 
  
@@ -440,6 +441,41 @@ static int vmap_pages_range(unsigned long start, unsigned long end,

return ret;
  }
  
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP

+static int vmap_hpages_range(unsigned long start, unsigned long end,
+  pgprot_t prot, struct page **pages,
+  unsigned int page_shift)
+{
+   unsigned long addr = start;
+   unsigned int i, nr = (end - start) >> (PAGE_SHIFT + page_shift);
+
+   for (i = 0; i < nr; i++) {
+   int err;
+
+   err = vmap_range_noflush(addr,
+   addr + (PAGE_SIZE << page_shift),
+   __pa(page_address(pages[i])), prot,
+   PAGE_SHIFT + page_shift);
+   if (err)
+   return err;
+
+   addr += PAGE_SIZE << page_shift;
+   }
+   flush_cache_vmap(start, end);
+
+   return nr;
+}
+#else
+static int vmap_hpages_range(unsigned long start, unsigned long end,
+  pgprot_t prot, struct page **pages,
+  unsigned int page_shift)
+{
+   BUG_ON(page_shift != PAGE_SIZE);


Do we really need a BUG_ON() there ? What happens if this condition is 
true ?



+   return vmap_pages_range(start, end, prot, pages);
+}
+#endif
+
+
  int is_vmalloc_or_module_addr(const void *x)
  {
/*
@@ -462,7 +498,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
  {
unsigned long addr = (unsigned long) vmalloc_addr;
struct page *page = NULL;
-   pgd_t *pgd = pgd_offset_k(addr);
+   pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
@@ -474,27 +510,38 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
 */
VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
  
+	pgd = pgd_offset_k(addr);

if (pgd_none(*pgd))
return NULL;
+
p4d = p4d_offset(pgd, addr);
if (p4d_none(*p4d))
return NULL;
-   pud = pud_offset(p4d, 

Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-10 Thread Nicholas Piggin
Anshuman Khandual's on June 10, 2019 6:53 pm:
> On 06/10/2019 10:08 AM, Nicholas Piggin wrote:
>> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
>> allocate huge pages and map them.
> 
> IIUC that extends HAVE_ARCH_HUGE_VMAP from ioremap to vmalloc.
> 
>> 
>> This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
>> 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
>> (performance is in the noise, under 1% difference, page tables are likely
>> to be well cached for this workload). Similar numbers are seen on POWER9.
> 
> Sure will try this on arm64.
> 
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>  include/asm-generic/4level-fixup.h |   1 +
>>  include/asm-generic/5level-fixup.h |   1 +
>>  include/linux/vmalloc.h|   1 +
>>  mm/vmalloc.c   | 132 +++--
>>  4 files changed, 107 insertions(+), 28 deletions(-)
>> 
>> diff --git a/include/asm-generic/4level-fixup.h 
>> b/include/asm-generic/4level-fixup.h
>> index e3667c9a33a5..3cc65a4dd093 100644
>> --- a/include/asm-generic/4level-fixup.h
>> +++ b/include/asm-generic/4level-fixup.h
>> @@ -20,6 +20,7 @@
>>  #define pud_none(pud)   0
>>  #define pud_bad(pud)0
>>  #define pud_present(pud)1
>> +#define pud_large(pud)  0
>>  #define pud_ERROR(pud)  do { } while (0)
>>  #define pud_clear(pud)  pgd_clear(pud)
>>  #define pud_val(pud)pgd_val(pud)
>> diff --git a/include/asm-generic/5level-fixup.h 
>> b/include/asm-generic/5level-fixup.h
>> index bb6cb347018c..c4377db09a4f 100644
>> --- a/include/asm-generic/5level-fixup.h
>> +++ b/include/asm-generic/5level-fixup.h
>> @@ -22,6 +22,7 @@
>>  #define p4d_none(p4d)   0
>>  #define p4d_bad(p4d)0
>>  #define p4d_present(p4d)1
>> +#define p4d_large(p4d)  0
>>  #define p4d_ERROR(p4d)  do { } while (0)
>>  #define p4d_clear(p4d)  pgd_clear(p4d)
>>  #define p4d_val(p4d)pgd_val(p4d)
> 
> Both of these are required by vmalloc_to_page(), which as per a later comment
> should be part of a prerequisite patch before this series.

I'm not sure what you mean. This patch is where they get used.

Possibly I could split this and the vmalloc_to_page change out. I'll
consider it.

>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>> index 812bea5866d6..4c92dc608928 100644
>> --- a/include/linux/vmalloc.h
>> +++ b/include/linux/vmalloc.h
>> @@ -42,6 +42,7 @@ struct vm_struct {
>>  unsigned long   size;
>>  unsigned long   flags;
>>  struct page **pages;
>> +unsigned intpage_shift;
> 
> So the entire vm_struct will be mapped with a single page_shift. It cannot
> have mix-and-match mappings with PAGE_SIZE, PMD_SIZE, PUD_SIZE etc. in case
> the allocation fails for larger ones, falling back, or for whatever other
> reasons.

For now, yes. I have a bit of follow up work to improve that and make
it able to fall back, but it's a bit more churn and not a significant
benefit just yet because there are not a lot of very large vmallocs
(except the early hashes which can be satisfied with large allocs).

> 
>>  unsigned intnr_pages;
>>  phys_addr_t phys_addr;
>>  const void  *caller;
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index dd27cfb29b10..0cf8e861caeb 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -36,6 +36,7 @@
>>  #include 
>>  
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  
>> @@ -440,6 +441,41 @@ static int vmap_pages_range(unsigned long start, 
>> unsigned long end,
>>  return ret;
>>  }
>>  
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>> +static int vmap_hpages_range(unsigned long start, unsigned long end,
> 
> A small nit (if you agree) s/hpages/huge_pages/

Hmm. It's not actually a good function name because it can do small
pages as well. vmap_pages_size_range or something may be better.

> 
>> +   pgprot_t prot, struct page **pages,
> 
> Re-order (prot <---> pages) just to follow the standard like before.

Will do.

>> +   unsigned int page_shift)
>> +{
>> +unsigned long addr = start;
>> +unsigned int i, nr = (end - start) >> (PAGE_SHIFT + page_shift);
> 
> s/nr/nr_huge_pages ?

Sure.

> Also, should we not check the alignment of the range [start...end] with
> respect to (1UL << [PAGE_SHIFT + page_shift])?

The caller should, if it specifies a large page. Could check and return
-EINVAL for incorrect alignment.

>> +
>> +for (i = 0; i < nr; i++) {
>> +int err;
>> +
>> +err = vmap_range_noflush(addr,
>> +addr + (PAGE_SIZE << page_shift),
>> +

Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-10 Thread Nicholas Piggin
Mark Rutland's on June 11, 2019 12:10 am:
> Hi,
> 
> On Mon, Jun 10, 2019 at 02:38:38PM +1000, Nicholas Piggin wrote:
>> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
>> allocate huge pages and map them
>> 
>> This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
>> 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
>> (performance is in the noise, under 1% difference, page tables are likely
>> to be well cached for this workload). Similar numbers are seen on POWER9.
> 
> Do you happen to know which vmalloc mappings these get used for in the
> above case? Where do we see vmalloc mappings that large?

Large module vmalloc could be subject to huge mappings.

> I'm worried as to how this would interact with the set_memory_*()
> functions, as on arm64 those can only operate on page-granular mappings.
> Those may need fixing up to handle huge mappings; certainly if the above
> is all for modules.

Good point, that looks like it would break on arm64 at least. I'll
work on it. We may have to make this opt in beyond HUGE_VMAP.
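
One possible shape for such an opt-in, purely to illustrate (both the config
symbol and the helper below are made up for this sketch; an arch would select
the symbol only once set_memory_*() and friends cope with huge mappings):

static unsigned int vmalloc_page_shift(unsigned long size)
{
	/* Use PMD-sized mappings only where the arch has opted in. */
	if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) && size >= PMD_SIZE)
		return PMD_SHIFT - PAGE_SHIFT;
	return 0;	/* order-0, i.e. small pages */
}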

Thanks,
Nick


Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-10 Thread Mark Rutland
Hi,

On Mon, Jun 10, 2019 at 02:38:38PM +1000, Nicholas Piggin wrote:
> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
> allocate huge pages and map them
> 
> This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
> 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
> (performance is in the noise, under 1% difference, page tables are likely
> to be well cached for this workload). Similar numbers are seen on POWER9.

Do you happen to know which vmalloc mappings these get used for in the
above case? Where do we see vmalloc mappings that large?

I'm worried as to how this would interact with the set_memory_*()
functions, as on arm64 those can only operate on page-granular mappings.
Those may need fixing up to handle huge mappings; certainly if the above
is all for modules.

Thanks,
Mark.

> 
> Signed-off-by: Nicholas Piggin 
> ---
>  include/asm-generic/4level-fixup.h |   1 +
>  include/asm-generic/5level-fixup.h |   1 +
>  include/linux/vmalloc.h|   1 +
>  mm/vmalloc.c   | 132 +++--
>  4 files changed, 107 insertions(+), 28 deletions(-)
> 
> diff --git a/include/asm-generic/4level-fixup.h 
> b/include/asm-generic/4level-fixup.h
> index e3667c9a33a5..3cc65a4dd093 100644
> --- a/include/asm-generic/4level-fixup.h
> +++ b/include/asm-generic/4level-fixup.h
> @@ -20,6 +20,7 @@
>  #define pud_none(pud)0
>  #define pud_bad(pud) 0
>  #define pud_present(pud) 1
> +#define pud_large(pud)   0
>  #define pud_ERROR(pud)   do { } while (0)
>  #define pud_clear(pud)   pgd_clear(pud)
>  #define pud_val(pud) pgd_val(pud)
> diff --git a/include/asm-generic/5level-fixup.h 
> b/include/asm-generic/5level-fixup.h
> index bb6cb347018c..c4377db09a4f 100644
> --- a/include/asm-generic/5level-fixup.h
> +++ b/include/asm-generic/5level-fixup.h
> @@ -22,6 +22,7 @@
>  #define p4d_none(p4d)0
>  #define p4d_bad(p4d) 0
>  #define p4d_present(p4d) 1
> +#define p4d_large(p4d)   0
>  #define p4d_ERROR(p4d)   do { } while (0)
>  #define p4d_clear(p4d)   pgd_clear(p4d)
>  #define p4d_val(p4d) pgd_val(p4d)
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 812bea5866d6..4c92dc608928 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -42,6 +42,7 @@ struct vm_struct {
>   unsigned long   size;
>   unsigned long   flags;
>   struct page **pages;
> + unsigned intpage_shift;
>   unsigned intnr_pages;
>   phys_addr_t phys_addr;
>   const void  *caller;
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index dd27cfb29b10..0cf8e861caeb 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -36,6 +36,7 @@
>  #include 
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -440,6 +441,41 @@ static int vmap_pages_range(unsigned long start, 
> unsigned long end,
>   return ret;
>  }
>  
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +static int vmap_hpages_range(unsigned long start, unsigned long end,
> +pgprot_t prot, struct page **pages,
> +unsigned int page_shift)
> +{
> + unsigned long addr = start;
> + unsigned int i, nr = (end - start) >> (PAGE_SHIFT + page_shift);
> +
> + for (i = 0; i < nr; i++) {
> + int err;
> +
> + err = vmap_range_noflush(addr,
> + addr + (PAGE_SIZE << page_shift),
> + __pa(page_address(pages[i])), prot,
> + PAGE_SHIFT + page_shift);
> + if (err)
> + return err;
> +
> + addr += PAGE_SIZE << page_shift;
> + }
> + flush_cache_vmap(start, end);
> +
> + return nr;
> +}
> +#else
> +static int vmap_hpages_range(unsigned long start, unsigned long end,
> +pgprot_t prot, struct page **pages,
> +unsigned int page_shift)
> +{
> + BUG_ON(page_shift != PAGE_SIZE);
> + return vmap_pages_range(start, end, prot, pages);
> +}
> +#endif
> +
> +
>  int is_vmalloc_or_module_addr(const void *x)
>  {
>   /*
> @@ -462,7 +498,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>  {
>   unsigned long addr = (unsigned long) vmalloc_addr;
>   struct page *page = NULL;
> - pgd_t *pgd = pgd_offset_k(addr);
> + pgd_t *pgd;
>   p4d_t *p4d;
>   pud_t *pud;
>   pmd_t *pmd;
> @@ -474,27 +510,38 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>*/
>   VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
>  
> + pgd = pgd_offset_k(addr);
>   if 

Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-10 Thread Anshuman Khandual
On 06/10/2019 10:08 AM, Nicholas Piggin wrote:
> For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
> allocate huge pages and map them.

IIUC that extends HAVE_ARCH_HUGE_VMAP from ioremap to vmalloc.

> 
> This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
> 8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
> (performance is in the noise, under 1% difference, page tables are likely
> to be well cached for this workload). Similar numbers are seen on POWER9.

Sure will try this on arm64.

> 
> Signed-off-by: Nicholas Piggin 
> ---
>  include/asm-generic/4level-fixup.h |   1 +
>  include/asm-generic/5level-fixup.h |   1 +
>  include/linux/vmalloc.h|   1 +
>  mm/vmalloc.c   | 132 +++--
>  4 files changed, 107 insertions(+), 28 deletions(-)
> 
> diff --git a/include/asm-generic/4level-fixup.h 
> b/include/asm-generic/4level-fixup.h
> index e3667c9a33a5..3cc65a4dd093 100644
> --- a/include/asm-generic/4level-fixup.h
> +++ b/include/asm-generic/4level-fixup.h
> @@ -20,6 +20,7 @@
>  #define pud_none(pud)0
>  #define pud_bad(pud) 0
>  #define pud_present(pud) 1
> +#define pud_large(pud)   0
>  #define pud_ERROR(pud)   do { } while (0)
>  #define pud_clear(pud)   pgd_clear(pud)
>  #define pud_val(pud) pgd_val(pud)
> diff --git a/include/asm-generic/5level-fixup.h 
> b/include/asm-generic/5level-fixup.h
> index bb6cb347018c..c4377db09a4f 100644
> --- a/include/asm-generic/5level-fixup.h
> +++ b/include/asm-generic/5level-fixup.h
> @@ -22,6 +22,7 @@
>  #define p4d_none(p4d)0
>  #define p4d_bad(p4d) 0
>  #define p4d_present(p4d) 1
> +#define p4d_large(p4d)   0
>  #define p4d_ERROR(p4d)   do { } while (0)
>  #define p4d_clear(p4d)   pgd_clear(p4d)
>  #define p4d_val(p4d) pgd_val(p4d)

Both of these are required by vmalloc_to_page(), which as per a later comment
should be part of a prerequisite patch before this series.

> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 812bea5866d6..4c92dc608928 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -42,6 +42,7 @@ struct vm_struct {
>   unsigned long   size;
>   unsigned long   flags;
>   struct page **pages;
> + unsigned intpage_shift;

So the entire vm_struct will be mapped with a single page_shift. It cannot have
mix-and-match mappings with PAGE_SIZE, PMD_SIZE, PUD_SIZE etc. in case the
allocation fails for larger ones, falling back, or for whatever other reasons.

>   unsigned intnr_pages;
>   phys_addr_t phys_addr;
>   const void  *caller;
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index dd27cfb29b10..0cf8e861caeb 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -36,6 +36,7 @@
>  #include 
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  
> @@ -440,6 +441,41 @@ static int vmap_pages_range(unsigned long start, 
> unsigned long end,
>   return ret;
>  }
>  
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +static int vmap_hpages_range(unsigned long start, unsigned long end,

A small nit (if you agree) s/hpages/huge_pages/

> +pgprot_t prot, struct page **pages,

Re-order (prot <---> pages) just to follow the standard like before.

> +unsigned int page_shift)
> +{
> + unsigned long addr = start;
> + unsigned int i, nr = (end - start) >> (PAGE_SHIFT + page_shift);

s/nr/nr_huge_pages ?

Also, should we not check the alignment of the range [start...end] with
respect to (1UL << [PAGE_SHIFT + page_shift])?


> +
> + for (i = 0; i < nr; i++) {
> + int err;
> +
> + err = vmap_range_noflush(addr,
> + addr + (PAGE_SIZE << page_shift),
> + __pa(page_address(pages[i])), prot,
> + PAGE_SHIFT + page_shift);
> + if (err)
> + return err;
> +
> + addr += PAGE_SIZE << page_shift;
> + }
> + flush_cache_vmap(start, end);
> +
> + return nr;
> +}
> +#else
> +static int vmap_hpages_range(unsigned long start, unsigned long end,
> +pgprot_t prot, struct page **pages,
> +unsigned int page_shift)
> +{
> + BUG_ON(page_shift != PAGE_SIZE);
> + return vmap_pages_range(start, end, prot, pages);
> +}
> +#endif
> +
> +
>  int is_vmalloc_or_module_addr(const void *x)
>  {
>   /*
> @@ -462,7 +498,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>  {
>   unsigned long addr = (unsigned long) vmalloc_addr;
>   struct page *page = NULL;
> - 

Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-10 Thread Satheesh Rajendran
On Mon, Jun 10, 2019 at 03:49:48PM +1000, Nicholas Piggin wrote:
> Nicholas Piggin's on June 10, 2019 2:38 pm:
> > +static int vmap_hpages_range(unsigned long start, unsigned long end,
> > +  pgprot_t prot, struct page **pages,
> > +  unsigned int page_shift)
> > +{
> > +   BUG_ON(page_shift != PAGE_SIZE);
> > +   return vmap_pages_range(start, end, prot, pages);
> > +}
> 
> That's a false positive BUG_ON for !HUGE_VMAP configs. I'll fix that
> and repost after a round of feedback.

Sure, crash log for that false positive BUG_ON on a Power8 host:

[0.001718] pid_max: default: 163840 minimum: 1280
[0.010437] [ cut here ]
[0.010461] kernel BUG at mm/vmalloc.c:473!
[0.010471] Oops: Exception in kernel mode, sig: 5 [#1]
[0.010481] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[0.010491] Modules linked in:
[0.010503] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0-rc3-ga7ee9421d #1
[0.010515] NIP:  c034dbd8 LR: c034dc80 CTR: 
[0.010527] REGS: c15bf9a0 TRAP: 0700   Not tainted  
(5.2.0-rc3-ga7ee9421d)
[0.010537] MSR:  92029033   CR: 
22022422  XER: 2000
[0.010559] CFAR: c034dc88 IRQMASK: 0
[0.010559] GPR00: c034dc80 c15bfc30 c15c2f00 
c00c01fd0e00
[0.010559] GPR04:  2322  
0040
[0.010559] GPR08: c00ff908 0400 0400 
0100
[0.010559] GPR12: 42022422 c17a 0001035ae7d8 
0400
[0.010559] GPR16: 0400 818e c0ee08c8 

[0.010559] GPR20: 0001 2b22 0b20 
0022
[0.010559] GPR24: c007f92c7880 0b22 0001 
c00a
[0.010559] GPR28: c008 0400  
0b20
[0.010664] NIP [c034dbd8] __vmalloc_node_range+0x1f8/0x410
[0.010677] LR [c034dc80] __vmalloc_node_range+0x2a0/0x410
[0.010686] Call Trace:
[0.010695] [c15bfc30] [c034dc80] 
__vmalloc_node_range+0x2a0/0x410 (unreliable)
[0.010711] [c15bfd30] [c034de40] __vmalloc+0x50/0x60
[0.010724] [c15bfda0] [c101e54c] 
alloc_large_system_hash+0x200/0x304
[0.010738] [c15bfe60] [c10235bc] vfs_caches_init+0xd8/0x138
[0.010752] [c15bfee0] [c0fe428c] start_kernel+0x5c4/0x668
[0.010767] [c15bff90] [c000b774] 
start_here_common+0x1c/0x528
[0.010777] Instruction dump:
[0.010785] 6000 7c691b79 418200dc e9180020 79ea1f24 7d28512a 40920170 
8138002c
[0.010803] 394f0001 794f0020 7c095040 4181ffbc <0fe0> 6000 3f41 
4bfffedc
[0.010826] ---[ end trace dd0217488686d653 ]---
[0.010834]
[1.010946] Kernel panic - not syncing: Attempted to kill the idle task!
[1.011061] Rebooting in 10 seconds..

Regards,
-Satheesh.
> 
> Thanks,
> Nick
> 
> 



Re: [PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-09 Thread Nicholas Piggin
Nicholas Piggin's on June 10, 2019 2:38 pm:
> +static int vmap_hpages_range(unsigned long start, unsigned long end,
> +pgprot_t prot, struct page **pages,
> +unsigned int page_shift)
> +{
> + BUG_ON(page_shift != PAGE_SIZE);
> + return vmap_pages_range(start, end, prot, pages);
> +}

That's a false positive BUG_ON for !HUGE_VMAP configs. I'll fix that
and repost after a round of feedback.

Thanks,
Nick



[PATCH 4/4] mm/vmalloc: Hugepage vmalloc mappings

2019-06-09 Thread Nicholas Piggin
For platforms that define HAVE_ARCH_HUGE_VMAP, have vmap allow vmalloc to
allocate huge pages and map them

This brings dTLB misses for linux kernel tree `git diff` from 45,000 to
8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
(performance is in the noise, under 1% difference, page tables are likely
to be well cached for this workload). Similar numbers are seen on POWER9.

Signed-off-by: Nicholas Piggin 
---
 include/asm-generic/4level-fixup.h |   1 +
 include/asm-generic/5level-fixup.h |   1 +
 include/linux/vmalloc.h|   1 +
 mm/vmalloc.c   | 132 +++--
 4 files changed, 107 insertions(+), 28 deletions(-)

diff --git a/include/asm-generic/4level-fixup.h 
b/include/asm-generic/4level-fixup.h
index e3667c9a33a5..3cc65a4dd093 100644
--- a/include/asm-generic/4level-fixup.h
+++ b/include/asm-generic/4level-fixup.h
@@ -20,6 +20,7 @@
 #define pud_none(pud)  0
 #define pud_bad(pud)   0
 #define pud_present(pud)   1
+#define pud_large(pud) 0
 #define pud_ERROR(pud) do { } while (0)
 #define pud_clear(pud) pgd_clear(pud)
 #define pud_val(pud)   pgd_val(pud)
diff --git a/include/asm-generic/5level-fixup.h 
b/include/asm-generic/5level-fixup.h
index bb6cb347018c..c4377db09a4f 100644
--- a/include/asm-generic/5level-fixup.h
+++ b/include/asm-generic/5level-fixup.h
@@ -22,6 +22,7 @@
 #define p4d_none(p4d)  0
 #define p4d_bad(p4d)   0
 #define p4d_present(p4d)   1
+#define p4d_large(p4d) 0
 #define p4d_ERROR(p4d) do { } while (0)
 #define p4d_clear(p4d) pgd_clear(p4d)
 #define p4d_val(p4d)   pgd_val(p4d)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 812bea5866d6..4c92dc608928 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -42,6 +42,7 @@ struct vm_struct {
unsigned long   size;
unsigned long   flags;
struct page **pages;
+   unsigned intpage_shift;
unsigned intnr_pages;
phys_addr_t phys_addr;
const void  *caller;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dd27cfb29b10..0cf8e861caeb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -36,6 +36,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 
@@ -440,6 +441,41 @@ static int vmap_pages_range(unsigned long start, unsigned 
long end,
return ret;
 }
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+static int vmap_hpages_range(unsigned long start, unsigned long end,
+  pgprot_t prot, struct page **pages,
+  unsigned int page_shift)
+{
+   unsigned long addr = start;
+   unsigned int i, nr = (end - start) >> (PAGE_SHIFT + page_shift);
+
+   for (i = 0; i < nr; i++) {
+   int err;
+
+   err = vmap_range_noflush(addr,
+   addr + (PAGE_SIZE << page_shift),
+   __pa(page_address(pages[i])), prot,
+   PAGE_SHIFT + page_shift);
+   if (err)
+   return err;
+
+   addr += PAGE_SIZE << page_shift;
+   }
+   flush_cache_vmap(start, end);
+
+   return nr;
+}
+#else
+static int vmap_hpages_range(unsigned long start, unsigned long end,
+  pgprot_t prot, struct page **pages,
+  unsigned int page_shift)
+{
+   BUG_ON(page_shift != PAGE_SIZE);
+   return vmap_pages_range(start, end, prot, pages);
+}
+#endif
+
+
 int is_vmalloc_or_module_addr(const void *x)
 {
/*
@@ -462,7 +498,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
 {
unsigned long addr = (unsigned long) vmalloc_addr;
struct page *page = NULL;
-   pgd_t *pgd = pgd_offset_k(addr);
+   pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
@@ -474,27 +510,38 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
 */
VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
 
+   pgd = pgd_offset_k(addr);
if (pgd_none(*pgd))
return NULL;
+
p4d = p4d_offset(pgd, addr);
if (p4d_none(*p4d))
return NULL;
-   pud = pud_offset(p4d, addr);
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+   if (p4d_large(*p4d))
+   return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
+#endif
+   if (WARN_ON_ONCE(p4d_bad(*p4d)))
+   return NULL;
 
-   /*
-* Don't dereference bad PUD or PMD (below) entries. This will also
-* identify huge mappings, which we may encounter on architectures
-* that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
-* identified as