Re: [PATCH V2] fork: Improve error message for corrupted page tables
On Tue, 2019-08-06 at 10:36 +0200, Michal Hocko wrote: > On Mon 05-08-19 20:05:27, Sai Praneeth Prakhya wrote: > > When a user process exits, the kernel cleans up the mm_struct of the user > > process and during cleanup, check_mm() checks the page tables of the user > > process for corruption (E.g: unexpected page flags set/cleared). For > > corrupted page tables, the error message printed by check_mm() isn't very > > clear as it prints the loop index instead of page table type (E.g: > > Resident > > file mapping pages vs Resident shared memory pages). The loop index in > > check_mm() is used to index rss_stat[] which represents individual memory > > type stats. Hence, instead of printing index, print memory type, thereby > > improving error message. > > > > Without patch: > > -- > > [ 204.836425] mm/pgtable-generic.c:29: bad p4d > > 89eb4e92(80025f941467) > > [ 204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2 > > [ 204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5 > > [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480 > > > > With patch: > > --- > > [ 69.815453] mm/pgtable-generic.c:29: bad p4d > > 84653642(80025ca37467) > > [ 69.815872] BUG: Bad rss-counter state mm:014a6c03 > > type:MM_FILEPAGES val:2 > > [ 69.815962] BUG: Bad rss-counter state mm:014a6c03 > > type:MM_ANONPAGES val:5 > > [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480 > > I like this. On any occasion I am investigating an issue with an rss > inbalance I have to go back to kernel sources to see which pte type that > is. > Hopefully, this patch will be useful to you the next time you run into any rss imbalance issues. > > Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so > > that it matches the other print statement. > > good change as well. Maybe we should also lower the loglevel (in a > separate patch) as well. While this is not nice because we are > apparently leaking memory behind it shouldn't be really critical enough > to jump on normal consoles. Ya.. I think, probably could be lowered to pr_err() or pr_warn(). Regards, Sai
Re: [PATCH V2] fork: Improve error message for corrupted page tables
On Tue, 2019-08-06 at 09:30 -0700, Dave Hansen wrote: > On 8/5/19 8:05 PM, Sai Praneeth Prakhya wrote: > > +static const char * const resident_page_types[NR_MM_COUNTERS] = { > > + [MM_FILEPAGES] = "MM_FILEPAGES", > > + [MM_ANONPAGES] = "MM_ANONPAGES", > > + [MM_SWAPENTS] = "MM_SWAPENTS", > > + [MM_SHMEMPAGES] = "MM_SHMEMPAGES", > > +}; > > One trick to ensure that this gets updated if the names are ever > updated. You can do: > > #define NAMED_ARRAY_INDEX(x) [x] = __stringify(x), > > and > > static const char * const resident_page_types[NR_MM_COUNTERS] = { > NAMED_ARRAY_INDEX(MM_FILE_PAGES), > NAMED_ARRAY_INDEX(MM_SHMEMPAGES), > ... > }; Thanks for the suggestion Dave. I will add this in V3. Even with this, (if ever) anyone who changes the name of page types or adds an new entry would still need to update struct resident_page_types[]. So, I will add the comment as suggested by Vlastimil. > > That makes sure that any name changes make it into the strings. Then > stick a: > > BUILD_BUG_ON(NR_MM_COUNTERS != ARRAY_SIZE(resident_page_types)); > > somewhere. That makes sure that any new array indexes get a string > added in the array. Otherwise you get nice, early, compile-time errors. Sure! this sounds good and a small nit-bit :) For the BUILD_BUG_ON() to work, the definition of struct should be changed as below static const char * const resident_page_types[] = { ... } i.e. we should not specify the size of array. Regards, Sai
Re: [PATCH V2] fork: Improve error message for corrupted page tables
On 8/5/19 8:05 PM, Sai Praneeth Prakhya wrote: > +static const char * const resident_page_types[NR_MM_COUNTERS] = { > + [MM_FILEPAGES] = "MM_FILEPAGES", > + [MM_ANONPAGES] = "MM_ANONPAGES", > + [MM_SWAPENTS] = "MM_SWAPENTS", > + [MM_SHMEMPAGES] = "MM_SHMEMPAGES", > +}; One trick to ensure that this gets updated if the names are ever updated. You can do: #define NAMED_ARRAY_INDEX(x)[x] = __stringify(x), and static const char * const resident_page_types[NR_MM_COUNTERS] = { NAMED_ARRAY_INDEX(MM_FILE_PAGES), NAMED_ARRAY_INDEX(MM_SHMEMPAGES), ... }; That makes sure that any name changes make it into the strings. Then stick a: BUILD_BUG_ON(NR_MM_COUNTERS != ARRAY_SIZE(resident_page_types)); somewhere. That makes sure that any new array indexes get a string added in the array. Otherwise you get nice, early, compile-time errors.
RE: [PATCH V2] fork: Improve error message for corrupted page tables
> >> Without patch: > >> -- > >> [ 204.836425] mm/pgtable-generic.c:29: bad p4d > >> 89eb4e92(80025f941467) > >> [ 204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 > >> val:2 [ 204.836615] BUG: Bad rss-counter state mm:f75895ea > >> idx:1 val:5 [ 204.836685] BUG: non-zero pgtables_bytes on freeing > >> mm: 20480 > >> > >> With patch: > >> --- > >> [ 69.815453] mm/pgtable-generic.c:29: bad p4d > 84653642(80025ca37467) > >> [ 69.815872] BUG: Bad rss-counter state mm:014a6c03 > type:MM_FILEPAGES val:2 > >> [ 69.815962] BUG: Bad rss-counter state mm:014a6c03 > type:MM_ANONPAGES val:5 > >> [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480 > >> > >> Also, change print function (from printk(KERN_ALERT, ..) to > >> pr_alert()) so that it matches the other print statement. > >> > >> Cc: Ingo Molnar > >> Cc: Vlastimil Babka > >> Cc: Peter Zijlstra > >> Cc: Andrew Morton > >> Cc: Anshuman Khandual > >> Acked-by: Dave Hansen > >> Suggested-by: Dave Hansen > >> Signed-off-by: Sai Praneeth Prakhya > > > > Acked-by: Vlastimil Babka > > > > I would also add something like this to reduce risk of breaking it in > > the > > future: > > > > 8< > > diff --git a/include/linux/mm_types_task.h > > b/include/linux/mm_types_task.h index d7016dcb245e..a6f83cbe4603 > > 100644 > > --- a/include/linux/mm_types_task.h > > +++ b/include/linux/mm_types_task.h > > @@ -36,6 +36,9 @@ struct vmacache { > > struct vm_area_struct *vmas[VMACACHE_SIZE]; }; > > > > +/* > > + * When touching this, update also resident_page_types in > > +kernel/fork.c */ > > enum { > > MM_FILEPAGES, /* Resident file mapping pages */ > > MM_ANONPAGES, /* Resident anonymous pages */ > > > > Agreed and with that > > Reviewed-by: Anshuman Khandual Thanks for the review and helping me in improving the patch :) Regards, Sai
RE: [PATCH V2] fork: Improve error message for corrupted page tables
> > With patch: > > --- > > [ 69.815453] mm/pgtable-generic.c:29: bad p4d > 84653642(80025ca37467) > > [ 69.815872] BUG: Bad rss-counter state mm:014a6c03 > type:MM_FILEPAGES val:2 > > [ 69.815962] BUG: Bad rss-counter state mm:014a6c03 > type:MM_ANONPAGES val:5 > > [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480 > > > > Also, change print function (from printk(KERN_ALERT, ..) to > > pr_alert()) so that it matches the other print statement. > > > > Cc: Ingo Molnar > > Cc: Vlastimil Babka > > Cc: Peter Zijlstra > > Cc: Andrew Morton > > Cc: Anshuman Khandual > > Acked-by: Dave Hansen > > Suggested-by: Dave Hansen > > Signed-off-by: Sai Praneeth Prakhya > > Acked-by: Vlastimil Babka > > I would also add something like this to reduce risk of breaking it in the > future: Sure! Sounds good to me. Will add it to V3. Regards, Sai
Re: [PATCH V2] fork: Improve error message for corrupted page tables
On Mon 05-08-19 20:05:27, Sai Praneeth Prakhya wrote: > When a user process exits, the kernel cleans up the mm_struct of the user > process and during cleanup, check_mm() checks the page tables of the user > process for corruption (E.g: unexpected page flags set/cleared). For > corrupted page tables, the error message printed by check_mm() isn't very > clear as it prints the loop index instead of page table type (E.g: Resident > file mapping pages vs Resident shared memory pages). The loop index in > check_mm() is used to index rss_stat[] which represents individual memory > type stats. Hence, instead of printing index, print memory type, thereby > improving error message. > > Without patch: > -- > [ 204.836425] mm/pgtable-generic.c:29: bad p4d > 89eb4e92(80025f941467) > [ 204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2 > [ 204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5 > [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480 > > With patch: > --- > [ 69.815453] mm/pgtable-generic.c:29: bad p4d > 84653642(80025ca37467) > [ 69.815872] BUG: Bad rss-counter state mm:014a6c03 > type:MM_FILEPAGES val:2 > [ 69.815962] BUG: Bad rss-counter state mm:014a6c03 > type:MM_ANONPAGES val:5 > [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480 I like this. On any occasion I am investigating an issue with an rss inbalance I have to go back to kernel sources to see which pte type that is. > Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so > that it matches the other print statement. good change as well. Maybe we should also lower the loglevel (in a separate patch) as well. While this is not nice because we are apparently leaking memory behind it shouldn't be really critical enough to jump on normal consoles. > Cc: Ingo Molnar > Cc: Vlastimil Babka > Cc: Peter Zijlstra > Cc: Andrew Morton > Cc: Anshuman Khandual > Acked-by: Dave Hansen > Suggested-by: Dave Hansen > Signed-off-by: Sai Praneeth Prakhya Acked-by: Michal Hocko > --- > > Changes from V1 to V2: > -- > 1. Move struct definition from header file to fork.c file, so that it won't be >included in every compilation unit. As this struct is used *only* in > fork.c, >include the definition in fork.c itself. > 2. Index the struct to match respective macros. > 3. Mention about print function change in commit message. > > kernel/fork.c | 11 +-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/kernel/fork.c b/kernel/fork.c > index d8ae0f1b4148..f34f441c50c0 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -125,6 +125,13 @@ int nr_threads; /* The idle threads do > not count.. */ > > static int max_threads; /* tunable limit on nr_threads */ > > +static const char * const resident_page_types[NR_MM_COUNTERS] = { > + [MM_FILEPAGES] = "MM_FILEPAGES", > + [MM_ANONPAGES] = "MM_ANONPAGES", > + [MM_SWAPENTS] = "MM_SWAPENTS", > + [MM_SHMEMPAGES] = "MM_SHMEMPAGES", > +}; > + > DEFINE_PER_CPU(unsigned long, process_counts) = 0; > > __cacheline_aligned DEFINE_RWLOCK(tasklist_lock); /* outer */ > @@ -649,8 +656,8 @@ static void check_mm(struct mm_struct *mm) > long x = atomic_long_read(>rss_stat.count[i]); > > if (unlikely(x)) > - printk(KERN_ALERT "BUG: Bad rss-counter state " > - "mm:%p idx:%d val:%ld\n", mm, i, x); > + pr_alert("BUG: Bad rss-counter state mm:%p type:%s > val:%ld\n", > + mm, resident_page_types[i], x); > } > > if (mm_pgtables_bytes(mm)) > -- > 2.7.4 -- Michal Hocko SUSE Labs
Re: [PATCH V2] fork: Improve error message for corrupted page tables
On 08/06/2019 01:23 PM, Vlastimil Babka wrote: > > On 8/6/19 5:05 AM, Sai Praneeth Prakhya wrote: >> When a user process exits, the kernel cleans up the mm_struct of the user >> process and during cleanup, check_mm() checks the page tables of the user >> process for corruption (E.g: unexpected page flags set/cleared). For >> corrupted page tables, the error message printed by check_mm() isn't very >> clear as it prints the loop index instead of page table type (E.g: Resident >> file mapping pages vs Resident shared memory pages). The loop index in >> check_mm() is used to index rss_stat[] which represents individual memory >> type stats. Hence, instead of printing index, print memory type, thereby >> improving error message. >> >> Without patch: >> -- >> [ 204.836425] mm/pgtable-generic.c:29: bad p4d >> 89eb4e92(80025f941467) >> [ 204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2 >> [ 204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5 >> [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480 >> >> With patch: >> --- >> [ 69.815453] mm/pgtable-generic.c:29: bad p4d >> 84653642(80025ca37467) >> [ 69.815872] BUG: Bad rss-counter state mm:014a6c03 >> type:MM_FILEPAGES val:2 >> [ 69.815962] BUG: Bad rss-counter state mm:014a6c03 >> type:MM_ANONPAGES val:5 >> [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480 >> >> Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so >> that it matches the other print statement. >> >> Cc: Ingo Molnar >> Cc: Vlastimil Babka >> Cc: Peter Zijlstra >> Cc: Andrew Morton >> Cc: Anshuman Khandual >> Acked-by: Dave Hansen >> Suggested-by: Dave Hansen >> Signed-off-by: Sai Praneeth Prakhya > > Acked-by: Vlastimil Babka > > I would also add something like this to reduce risk of breaking it in the > future: > > 8< > diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h > index d7016dcb245e..a6f83cbe4603 100644 > --- a/include/linux/mm_types_task.h > +++ b/include/linux/mm_types_task.h > @@ -36,6 +36,9 @@ struct vmacache { > struct vm_area_struct *vmas[VMACACHE_SIZE]; > }; > > +/* > + * When touching this, update also resident_page_types in kernel/fork.c > + */ > enum { > MM_FILEPAGES, /* Resident file mapping pages */ > MM_ANONPAGES, /* Resident anonymous pages */ > Agreed and with that Reviewed-by: Anshuman Khandual
Re: [PATCH V2] fork: Improve error message for corrupted page tables
On 8/6/19 5:05 AM, Sai Praneeth Prakhya wrote: > When a user process exits, the kernel cleans up the mm_struct of the user > process and during cleanup, check_mm() checks the page tables of the user > process for corruption (E.g: unexpected page flags set/cleared). For > corrupted page tables, the error message printed by check_mm() isn't very > clear as it prints the loop index instead of page table type (E.g: Resident > file mapping pages vs Resident shared memory pages). The loop index in > check_mm() is used to index rss_stat[] which represents individual memory > type stats. Hence, instead of printing index, print memory type, thereby > improving error message. > > Without patch: > -- > [ 204.836425] mm/pgtable-generic.c:29: bad p4d > 89eb4e92(80025f941467) > [ 204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2 > [ 204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5 > [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480 > > With patch: > --- > [ 69.815453] mm/pgtable-generic.c:29: bad p4d > 84653642(80025ca37467) > [ 69.815872] BUG: Bad rss-counter state mm:014a6c03 > type:MM_FILEPAGES val:2 > [ 69.815962] BUG: Bad rss-counter state mm:014a6c03 > type:MM_ANONPAGES val:5 > [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480 > > Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so > that it matches the other print statement. > > Cc: Ingo Molnar > Cc: Vlastimil Babka > Cc: Peter Zijlstra > Cc: Andrew Morton > Cc: Anshuman Khandual > Acked-by: Dave Hansen > Suggested-by: Dave Hansen > Signed-off-by: Sai Praneeth Prakhya Acked-by: Vlastimil Babka I would also add something like this to reduce risk of breaking it in the future: 8< diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h index d7016dcb245e..a6f83cbe4603 100644 --- a/include/linux/mm_types_task.h +++ b/include/linux/mm_types_task.h @@ -36,6 +36,9 @@ struct vmacache { struct vm_area_struct *vmas[VMACACHE_SIZE]; }; +/* + * When touching this, update also resident_page_types in kernel/fork.c + */ enum { MM_FILEPAGES, /* Resident file mapping pages */ MM_ANONPAGES, /* Resident anonymous pages */
[PATCH V2] fork: Improve error message for corrupted page tables
When a user process exits, the kernel cleans up the mm_struct of the user process and during cleanup, check_mm() checks the page tables of the user process for corruption (E.g: unexpected page flags set/cleared). For corrupted page tables, the error message printed by check_mm() isn't very clear as it prints the loop index instead of page table type (E.g: Resident file mapping pages vs Resident shared memory pages). The loop index in check_mm() is used to index rss_stat[] which represents individual memory type stats. Hence, instead of printing index, print memory type, thereby improving error message. Without patch: -- [ 204.836425] mm/pgtable-generic.c:29: bad p4d 89eb4e92(80025f941467) [ 204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2 [ 204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5 [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480 With patch: --- [ 69.815453] mm/pgtable-generic.c:29: bad p4d 84653642(80025ca37467) [ 69.815872] BUG: Bad rss-counter state mm:014a6c03 type:MM_FILEPAGES val:2 [ 69.815962] BUG: Bad rss-counter state mm:014a6c03 type:MM_ANONPAGES val:5 [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480 Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so that it matches the other print statement. Cc: Ingo Molnar Cc: Vlastimil Babka Cc: Peter Zijlstra Cc: Andrew Morton Cc: Anshuman Khandual Acked-by: Dave Hansen Suggested-by: Dave Hansen Signed-off-by: Sai Praneeth Prakhya --- Changes from V1 to V2: -- 1. Move struct definition from header file to fork.c file, so that it won't be included in every compilation unit. As this struct is used *only* in fork.c, include the definition in fork.c itself. 2. Index the struct to match respective macros. 3. Mention about print function change in commit message. kernel/fork.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index d8ae0f1b4148..f34f441c50c0 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -125,6 +125,13 @@ int nr_threads;/* The idle threads do not count.. */ static int max_threads;/* tunable limit on nr_threads */ +static const char * const resident_page_types[NR_MM_COUNTERS] = { + [MM_FILEPAGES] = "MM_FILEPAGES", + [MM_ANONPAGES] = "MM_ANONPAGES", + [MM_SWAPENTS] = "MM_SWAPENTS", + [MM_SHMEMPAGES] = "MM_SHMEMPAGES", +}; + DEFINE_PER_CPU(unsigned long, process_counts) = 0; __cacheline_aligned DEFINE_RWLOCK(tasklist_lock); /* outer */ @@ -649,8 +656,8 @@ static void check_mm(struct mm_struct *mm) long x = atomic_long_read(>rss_stat.count[i]); if (unlikely(x)) - printk(KERN_ALERT "BUG: Bad rss-counter state " - "mm:%p idx:%d val:%ld\n", mm, i, x); + pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n", +mm, resident_page_types[i], x); } if (mm_pgtables_bytes(mm)) -- 2.7.4