Re: [PATCH v3 0/4] Split page_type out from mapcount
On Thu, 1 Mar 2018 06:50:58 -0800 Matthew Wilcox wrote:

> On Thu, Mar 01, 2018 at 03:44:12PM +0300, Kirill A. Shutemov wrote:
> > On Thu, Mar 01, 2018 at 08:17:50AM +0100, Martin Schwidefsky wrote:
> > > Yeah, that is a nasty bit of code. On s390 we have 2K page tables (pte)
> > > but 4K pages. If we use full pages for the pte tables we waste 2K of
> > > memory for each of the tables. So we allocate 4K and split it into two
> > > 2K pieces. Now we have to keep track of the pieces to be able to free
> > > them again.
> >
> > Have you considered to use slab for page table allocation instead?
> > IIRC some architectures practice this already.
>
> You're not allowed to do that any more. Look at pgtable_page_ctor(),
> or rather ptlock_init().

Oh yes, I forgot about the ptl. This takes up some fields in struct page
which the slab/slub cache want to use as well.

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
Re: [PATCH v3 0/4] Split page_type out from mapcount
On Thu, Mar 01, 2018 at 03:44:12PM +0300, Kirill A. Shutemov wrote:
> On Thu, Mar 01, 2018 at 08:17:50AM +0100, Martin Schwidefsky wrote:
> > Yeah, that is a nasty bit of code. On s390 we have 2K page tables (pte)
> > but 4K pages. If we use full pages for the pte tables we waste 2K of
> > memory for each of the tables. So we allocate 4K and split it into two
> > 2K pieces. Now we have to keep track of the pieces to be able to free
> > them again.
>
> Have you considered to use slab for page table allocation instead?
> IIRC some architectures practice this already.

You're not allowed to do that any more. Look at pgtable_page_ctor(),
or rather ptlock_init().
Re: [PATCH v3 0/4] Split page_type out from mapcount
On Thu, 1 Mar 2018 15:44:12 +0300 "Kirill A. Shutemov" wrote:

> On Thu, Mar 01, 2018 at 08:17:50AM +0100, Martin Schwidefsky wrote:
> > On Wed, 28 Feb 2018 14:31:53 -0800
> > Matthew Wilcox wrote:
> >
> > > From: Matthew Wilcox
> > >
> > > I want to use the _mapcount field to record what a page is in use as.
> > > This can help with debugging and we can also expose that information to
> > > userspace through /proc/kpageflags to help diagnose memory usage (not
> > > included as part of this patch set).
> > >
> > > First, we need s390 to stop using _mapcount for its own purposes;
> > > Martin, I hope you have time to look at this patch. I must confess I
> > > don't quite understand what the different bits are used for in the upper
> > > nybble of the _mapcount, but I tried to replicate what you were doing
> > > faithfully.
> >
> > Yeah, that is a nasty bit of code. On s390 we have 2K page tables (pte)
> > but 4K pages. If we use full pages for the pte tables we waste 2K of
> > memory for each of the tables. So we allocate 4K and split it into two
> > 2K pieces. Now we have to keep track of the pieces to be able to free
> > them again.
>
> Have you considered to use slab for page table allocation instead?
> IIRC some architectures practice this already.

Well there is a complication with KVM and the page table management for
gmaps. If mm_alloc_pgste(mm) == true then a 4K page page table has to be
allocated. For the gmap I need a place to store an 8 byte value, currently
we use page->index. But the slab/slub code uses page->index for its own
purpose.

This creates a conflict, but maybe doing a get_free_page for
mm_alloc_pgste(mm) == true and using a slab cache for normal page tables
might work.

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
Re: [PATCH v3 0/4] Split page_type out from mapcount
On Thu, Mar 01, 2018 at 08:17:50AM +0100, Martin Schwidefsky wrote:
> On Wed, 28 Feb 2018 14:31:53 -0800
> Matthew Wilcox wrote:
>
> > From: Matthew Wilcox
> >
> > I want to use the _mapcount field to record what a page is in use as.
> > This can help with debugging and we can also expose that information to
> > userspace through /proc/kpageflags to help diagnose memory usage (not
> > included as part of this patch set).
> >
> > First, we need s390 to stop using _mapcount for its own purposes;
> > Martin, I hope you have time to look at this patch. I must confess I
> > don't quite understand what the different bits are used for in the upper
> > nybble of the _mapcount, but I tried to replicate what you were doing
> > faithfully.
>
> Yeah, that is a nasty bit of code. On s390 we have 2K page tables (pte)
> but 4K pages. If we use full pages for the pte tables we waste 2K of
> memory for each of the tables. So we allocate 4K and split it into two
> 2K pieces. Now we have to keep track of the pieces to be able to free
> them again.

Have you considered to use slab for page table allocation instead?
IIRC some architectures practice this already.

--
 Kirill A. Shutemov
Re: [PATCH v3 0/4] Split page_type out from mapcount
On Thu, 1 Mar 2018 08:17:50 +0100 Martin Schwidefsky wrote:

> On Wed, 28 Feb 2018 14:31:53 -0800
> Matthew Wilcox wrote:
>
> > From: Matthew Wilcox
> >
> > I want to use the _mapcount field to record what a page is in use as.
> > This can help with debugging and we can also expose that information to
> > userspace through /proc/kpageflags to help diagnose memory usage (not
> > included as part of this patch set).
> >
> > First, we need s390 to stop using _mapcount for its own purposes;
> > Martin, I hope you have time to look at this patch. I must confess I
> > don't quite understand what the different bits are used for in the upper
> > nybble of the _mapcount, but I tried to replicate what you were doing
> > faithfully.
>
> Yeah, that is a nasty bit of code. On s390 we have 2K page tables (pte)
> but 4K pages. If we use full pages for the pte tables we waste 2K of
> memory for each of the tables. So we allocate 4K and split it into two
> 2K pieces. Now we have to keep track of the pieces to be able to free
> them again.
>
> I try to give your patch a spin today. It should be stand-alone, no ?

Ok, that seems to work just fine. System boots and survived some stress
without losing memory.

Acked-by: Martin Schwidefsky

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
Re: [PATCH v3 0/4] Split page_type out from mapcount
On Wed, 28 Feb 2018 14:31:53 -0800 Matthew Wilcox wrote:

> From: Matthew Wilcox
>
> I want to use the _mapcount field to record what a page is in use as.
> This can help with debugging and we can also expose that information to
> userspace through /proc/kpageflags to help diagnose memory usage (not
> included as part of this patch set).
>
> First, we need s390 to stop using _mapcount for its own purposes;
> Martin, I hope you have time to look at this patch. I must confess I
> don't quite understand what the different bits are used for in the upper
> nybble of the _mapcount, but I tried to replicate what you were doing
> faithfully.

Yeah, that is a nasty bit of code. On s390 we have 2K page tables (pte)
but 4K pages. If we use full pages for the pte tables we waste 2K of
memory for each of the tables. So we allocate 4K and split it into two
2K pieces. Now we have to keep track of the pieces to be able to free
them again.

I try to give your patch a spin today. It should be stand-alone, no ?

--
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
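For readers who want to see the bookkeeping problem in miniature, here is
a rough userspace sketch of tracking two 2K halves of one 4K allocation.
It is not the code in arch/s390/mm/pgalloc.c; struct split_page, the
function names and the in_use bitmask are all made up for illustration,
and the real code keeps its state in struct page instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_BYTES    4096u
#define PGTABLE_BYTES 2048u

/* Hypothetical stand-in for the per-page state; not struct page. */
struct split_page {
	unsigned char *mem;   /* the 4K backing allocation */
	unsigned int in_use;  /* bit 0: lower 2K half, bit 1: upper 2K half */
};

/* Allocate the 4K backing page with both halves free. */
static bool split_page_init(struct split_page *sp)
{
	sp->mem = aligned_alloc(PAGE_BYTES, PAGE_BYTES);
	sp->in_use = 0;
	return sp->mem != NULL;
}

/* Hand out a free 2K half, or NULL if both halves are taken. */
static void *pgtable_half_alloc(struct split_page *sp)
{
	for (unsigned int half = 0; half < 2; half++) {
		unsigned int bit = 1u << half;

		if (!(sp->in_use & bit)) {
			sp->in_use |= bit;
			return sp->mem + half * PGTABLE_BYTES;
		}
	}
	return NULL;
}

/*
 * Release one half. Returns true if that was the last half in use,
 * in which case the 4K backing page has been freed as well.
 */
static bool pgtable_half_free(struct split_page *sp, void *table)
{
	size_t off = (size_t)((unsigned char *)table - sp->mem);
	unsigned int bit;

	assert(off == 0 || off == PGTABLE_BYTES);
	bit = 1u << (off / PGTABLE_BYTES);
	assert(sp->in_use & bit);

	sp->in_use &= ~bit;
	if (sp->in_use == 0) {
		free(sp->mem);
		sp->mem = NULL;
		return true;
	}
	return false;
}
```

The point of the sketch is only that the 4K page can go back to the
allocator once both bits are clear, which is why some per-page state has
to live somewhere.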
Re: [PATCH v3 0/4] Split page_type out from mapcount
On Wed, Feb 28, 2018 at 03:22:49PM -0800, Randy Dunlap wrote:
> On 02/28/2018 02:31 PM, Matthew Wilcox wrote:
> > From: Matthew Wilcox
> >
> > I want to use the _mapcount field to record what a page is in use as.
> > This can help with debugging and we can also expose that information to
> > userspace through /proc/kpageflags to help diagnose memory usage (not
> > included as part of this patch set).
>
> Hey,
>
> Will there be updates to tools/vm/ also, or are these a different set of
> (many) flags?

Those KPF flags are the ones I was talking about. I haven't looked into
what it takes to assign those flags yet.
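As background on the userspace side: /proc/kpageflags holds one 64-bit
flags word per page frame, indexed by PFN. A minimal sketch of reading
one entry might look like the following. The function name
read_kpageflags() and the path parameter are assumptions made for this
example (the path is a parameter only so the helper can be tried on an
ordinary file); this is not code from tools/vm/.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read the 64-bit flags word for one page frame number.
 * path would normally be "/proc/kpageflags", which usually requires
 * root to read. Returns 0 on success, -1 on any error or short read.
 */
int read_kpageflags(const char *path, uint64_t pfn, uint64_t *flags)
{
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Entry i lives at byte offset i * 8. */
	n = pread(fd, flags, sizeof(*flags), (off_t)(pfn * sizeof(*flags)));
	close(fd);

	return n == (ssize_t)sizeof(*flags) ? 0 : -1;
}
```

Which bit would be assigned to any new page type is exactly the part
that has not been decided in this thread, so the sketch returns the raw
word and does not decode it.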
Re: [PATCH v3 0/4] Split page_type out from mapcount
On 02/28/2018 02:31 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox
>
> I want to use the _mapcount field to record what a page is in use as.
> This can help with debugging and we can also expose that information to
> userspace through /proc/kpageflags to help diagnose memory usage (not
> included as part of this patch set).

Hey,

Will there be updates to tools/vm/ also, or are these a different set of
(many) flags?

thanks,
--
~Randy
[PATCH v3 0/4] Split page_type out from mapcount
From: Matthew Wilcox

I want to use the _mapcount field to record what a page is in use as.
This can help with debugging and we can also expose that information to
userspace through /proc/kpageflags to help diagnose memory usage (not
included as part of this patch set).

First, we need s390 to stop using _mapcount for its own purposes;
Martin, I hope you have time to look at this patch. I must confess I
don't quite understand what the different bits are used for in the upper
nybble of the _mapcount, but I tried to replicate what you were doing
faithfully.

Matthew Wilcox (4):
  s390: Use _refcount for pgtables
  mm: Split page_type out from _map_count
  mm: Mark pages allocated through vmalloc
  mm: Mark pages in use for page tables

 arch/s390/mm/pgalloc.c     | 21 +
 fs/proc/page.c             |  2 +-
 include/linux/mm.h         |  2 ++
 include/linux/mm_types.h   | 13 +++
 include/linux/page-flags.h | 57 ++
 kernel/crash_core.c        |  1 +
 mm/page_alloc.c            | 13 ---
 mm/vmalloc.c               |  2 ++
 scripts/tags.sh            |  6 ++---
 9 files changed, 72 insertions(+), 45 deletions(-)

--
2.16.1
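To illustrate the general idea of letting a page-type word share storage
with a map count, here is a small userspace sketch. It is an assumption
about how such an encoding could work, not the contents of this patch
set: struct fake_page, the TYPE_* constants and the helper names are
invented for the example. The premise is that a page which is not mapped
has a map count of -1 (all bits set), so a type can be recorded by
clearing a bit and the word still reads as a negative count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values chosen for illustration only. */
#define TYPE_BASE    0xf0000000u  /* top bits must stay set for a typed page */
#define TYPE_VMALLOC 0x00000100u
#define TYPE_TABLE   0x00000200u

/* Hypothetical stand-in for the relevant part of struct page. */
struct fake_page {
	union {
		int mapcount;            /* -1 means "not mapped" */
		unsigned int page_type;  /* same bits viewed as a type word */
	};
};

static void fake_page_init(struct fake_page *p)
{
	p->mapcount = -1;
}

/* A type is set when its bit is clear while the base bits are all set. */
static bool page_has_type(const struct fake_page *p, unsigned int flag)
{
	return (p->page_type & (TYPE_BASE | flag)) == TYPE_BASE;
}

static void page_set_type(struct fake_page *p, unsigned int flag)
{
	assert(!page_has_type(p, flag));
	p->page_type &= ~flag;
}

static void page_clear_type(struct fake_page *p, unsigned int flag)
{
	assert(page_has_type(p, flag));
	p->page_type |= flag;
}
```

With this layout a typed page still looks unmapped to anything that only
checks for a negative count, and clearing the type restores the plain -1.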