Re: [PATCH V2] fork: Improve error message for corrupted page tables

2019-08-06 Thread Sai Praneeth Prakhya
On Tue, 2019-08-06 at 10:36 +0200, Michal Hocko wrote:
> On Mon 05-08-19 20:05:27, Sai Praneeth Prakhya wrote:
> > When a user process exits, the kernel cleans up the mm_struct of the user
> > process and during cleanup, check_mm() checks the page tables of the user
> > process for corruption (E.g: unexpected page flags set/cleared). For
> > corrupted page tables, the error message printed by check_mm() isn't very
> > clear as it prints the loop index instead of page table type (E.g:
> > Resident
> > file mapping pages vs Resident shared memory pages). The loop index in
> > check_mm() is used to index rss_stat[] which represents individual memory
> > type stats. Hence, instead of printing index, print memory type, thereby
> > improving error message.
> > 
> > Without patch:
> > --
> > [  204.836425] mm/pgtable-generic.c:29: bad p4d
> > 89eb4e92(80025f941467)
> > [  204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2
> > [  204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5
> > [  204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
> > 
> > With patch:
> > ---
> > [   69.815453] mm/pgtable-generic.c:29: bad p4d
> > 84653642(80025ca37467)
> > [   69.815872] BUG: Bad rss-counter state mm:014a6c03
> > type:MM_FILEPAGES val:2
> > [   69.815962] BUG: Bad rss-counter state mm:014a6c03
> > type:MM_ANONPAGES val:5
> > [   69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
> 
> I like this. On any occasion I am investigating an issue with an rss
> inbalance I have to go back to kernel sources to see which pte type that
> is.
> 

Hopefully, this patch will be useful to you the next time you run into any rss
imbalance issues.

> > Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
> > that it matches the other print statement.
> 
> good change as well. Maybe we should also lower the loglevel (in a
> separate patch) as well. While this is not nice because we are
> apparently leaking memory behind it shouldn't be really critical enough
> to jump on normal consoles.

Ya.. I think, probably could be lowered to pr_err() or pr_warn().

Regards,
Sai



Re: [PATCH V2] fork: Improve error message for corrupted page tables

2019-08-06 Thread Sai Praneeth Prakhya
On Tue, 2019-08-06 at 09:30 -0700, Dave Hansen wrote:
> On 8/5/19 8:05 PM, Sai Praneeth Prakhya wrote:
> > +static const char * const resident_page_types[NR_MM_COUNTERS] = {
> > +   [MM_FILEPAGES]  = "MM_FILEPAGES",
> > +   [MM_ANONPAGES]  = "MM_ANONPAGES",
> > +   [MM_SWAPENTS]   = "MM_SWAPENTS",
> > +   [MM_SHMEMPAGES] = "MM_SHMEMPAGES",
> > +};
> 
> One trick to ensure that this gets updated if the names are ever
> updated.  You can do:
> 
> #define NAMED_ARRAY_INDEX(x)  [x] = __stringify(x),
> 
> and
> 
> static const char * const resident_page_types[NR_MM_COUNTERS] = {
>   NAMED_ARRAY_INDEX(MM_FILE_PAGES),
>   NAMED_ARRAY_INDEX(MM_SHMEMPAGES),
>   ...
> };

Thanks for the suggestion Dave. I will add this in V3.
Even with this, (if ever) anyone who changes the name of page types or adds an
new entry would still need to update struct resident_page_types[]. So, I will
add the comment as suggested by Vlastimil.

> 
> That makes sure that any name changes make it into the strings.  Then
> stick a:
> 
>   BUILD_BUG_ON(NR_MM_COUNTERS != ARRAY_SIZE(resident_page_types));
> 
> somewhere.  That makes sure that any new array indexes get a string
> added in the array.  Otherwise you get nice, early, compile-time errors.

Sure! this sounds good and a small nit-bit :)
For the BUILD_BUG_ON() to work, the definition of struct should be changed as
below

static const char * const resident_page_types[] = {
...
}

i.e. we should not specify the size of array.

Regards,
Sai



Re: [PATCH V2] fork: Improve error message for corrupted page tables

2019-08-06 Thread Dave Hansen
On 8/5/19 8:05 PM, Sai Praneeth Prakhya wrote:
> +static const char * const resident_page_types[NR_MM_COUNTERS] = {
> + [MM_FILEPAGES]  = "MM_FILEPAGES",
> + [MM_ANONPAGES]  = "MM_ANONPAGES",
> + [MM_SWAPENTS]   = "MM_SWAPENTS",
> + [MM_SHMEMPAGES] = "MM_SHMEMPAGES",
> +};

One trick to ensure that this gets updated if the names are ever
updated.  You can do:

#define NAMED_ARRAY_INDEX(x)[x] = __stringify(x),

and

static const char * const resident_page_types[NR_MM_COUNTERS] = {
NAMED_ARRAY_INDEX(MM_FILE_PAGES),
NAMED_ARRAY_INDEX(MM_SHMEMPAGES),
...
};

That makes sure that any name changes make it into the strings.  Then
stick a:

BUILD_BUG_ON(NR_MM_COUNTERS != ARRAY_SIZE(resident_page_types));

somewhere.  That makes sure that any new array indexes get a string
added in the array.  Otherwise you get nice, early, compile-time errors.


RE: [PATCH V2] fork: Improve error message for corrupted page tables

2019-08-06 Thread Prakhya, Sai Praneeth
> >> Without patch:
> >> --
> >> [  204.836425] mm/pgtable-generic.c:29: bad p4d
> >> 89eb4e92(80025f941467)
> >> [  204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0
> >> val:2 [  204.836615] BUG: Bad rss-counter state mm:f75895ea
> >> idx:1 val:5 [  204.836685] BUG: non-zero pgtables_bytes on freeing
> >> mm: 20480
> >>
> >> With patch:
> >> ---
> >> [   69.815453] mm/pgtable-generic.c:29: bad p4d
> 84653642(80025ca37467)
> >> [   69.815872] BUG: Bad rss-counter state mm:014a6c03
> type:MM_FILEPAGES val:2
> >> [   69.815962] BUG: Bad rss-counter state mm:014a6c03
> type:MM_ANONPAGES val:5
> >> [   69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
> >>
> >> Also, change print function (from printk(KERN_ALERT, ..) to
> >> pr_alert()) so that it matches the other print statement.
> >>
> >> Cc: Ingo Molnar 
> >> Cc: Vlastimil Babka 
> >> Cc: Peter Zijlstra 
> >> Cc: Andrew Morton 
> >> Cc: Anshuman Khandual 
> >> Acked-by: Dave Hansen 
> >> Suggested-by: Dave Hansen 
> >> Signed-off-by: Sai Praneeth Prakhya 
> >
> > Acked-by: Vlastimil Babka 
> >
> > I would also add something like this to reduce risk of breaking it in
> > the
> > future:
> >
> > 8<
> > diff --git a/include/linux/mm_types_task.h
> > b/include/linux/mm_types_task.h index d7016dcb245e..a6f83cbe4603
> > 100644
> > --- a/include/linux/mm_types_task.h
> > +++ b/include/linux/mm_types_task.h
> > @@ -36,6 +36,9 @@ struct vmacache {
> > struct vm_area_struct *vmas[VMACACHE_SIZE];  };
> >
> > +/*
> > + * When touching this, update also resident_page_types in
> > +kernel/fork.c  */
> >  enum {
> > MM_FILEPAGES,   /* Resident file mapping pages */
> > MM_ANONPAGES,   /* Resident anonymous pages */
> >
> 
> Agreed and with that
> 
> Reviewed-by: Anshuman Khandual 

Thanks for the review and helping me in improving the patch :)

Regards,
Sai


RE: [PATCH V2] fork: Improve error message for corrupted page tables

2019-08-06 Thread Prakhya, Sai Praneeth
> > With patch:
> > ---
> > [   69.815453] mm/pgtable-generic.c:29: bad p4d
> 84653642(80025ca37467)
> > [   69.815872] BUG: Bad rss-counter state mm:014a6c03
> type:MM_FILEPAGES val:2
> > [   69.815962] BUG: Bad rss-counter state mm:014a6c03
> type:MM_ANONPAGES val:5
> > [   69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
> >
> > Also, change print function (from printk(KERN_ALERT, ..) to
> > pr_alert()) so that it matches the other print statement.
> >
> > Cc: Ingo Molnar 
> > Cc: Vlastimil Babka 
> > Cc: Peter Zijlstra 
> > Cc: Andrew Morton 
> > Cc: Anshuman Khandual 
> > Acked-by: Dave Hansen 
> > Suggested-by: Dave Hansen 
> > Signed-off-by: Sai Praneeth Prakhya 
> 
> Acked-by: Vlastimil Babka 
> 
> I would also add something like this to reduce risk of breaking it in the
> future:

Sure! Sounds good to me. Will add it to V3.

Regards,
Sai


Re: [PATCH V2] fork: Improve error message for corrupted page tables

2019-08-06 Thread Michal Hocko
On Mon 05-08-19 20:05:27, Sai Praneeth Prakhya wrote:
> When a user process exits, the kernel cleans up the mm_struct of the user
> process and during cleanup, check_mm() checks the page tables of the user
> process for corruption (E.g: unexpected page flags set/cleared). For
> corrupted page tables, the error message printed by check_mm() isn't very
> clear as it prints the loop index instead of page table type (E.g: Resident
> file mapping pages vs Resident shared memory pages). The loop index in
> check_mm() is used to index rss_stat[] which represents individual memory
> type stats. Hence, instead of printing index, print memory type, thereby
> improving error message.
> 
> Without patch:
> --
> [  204.836425] mm/pgtable-generic.c:29: bad p4d 
> 89eb4e92(80025f941467)
> [  204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2
> [  204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5
> [  204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
> 
> With patch:
> ---
> [   69.815453] mm/pgtable-generic.c:29: bad p4d 
> 84653642(80025ca37467)
> [   69.815872] BUG: Bad rss-counter state mm:014a6c03 
> type:MM_FILEPAGES val:2
> [   69.815962] BUG: Bad rss-counter state mm:014a6c03 
> type:MM_ANONPAGES val:5
> [   69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480

I like this. On any occasion I am investigating an issue with an rss
inbalance I have to go back to kernel sources to see which pte type that
is.

> Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
> that it matches the other print statement.

good change as well. Maybe we should also lower the loglevel (in a
separate patch) as well. While this is not nice because we are
apparently leaking memory behind it shouldn't be really critical enough
to jump on normal consoles.

> Cc: Ingo Molnar 
> Cc: Vlastimil Babka 
> Cc: Peter Zijlstra 
> Cc: Andrew Morton 
> Cc: Anshuman Khandual 
> Acked-by: Dave Hansen 
> Suggested-by: Dave Hansen 
> Signed-off-by: Sai Praneeth Prakhya 

Acked-by: Michal Hocko 

> ---
> 
> Changes from V1 to V2:
> --
> 1. Move struct definition from header file to fork.c file, so that it won't be
>included in every compilation unit. As this struct is used *only* in 
> fork.c,
>include the definition in fork.c itself.
> 2. Index the struct to match respective macros.
> 3. Mention about print function change in commit message.
> 
>  kernel/fork.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index d8ae0f1b4148..f34f441c50c0 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -125,6 +125,13 @@ int nr_threads;  /* The idle threads do 
> not count.. */
>  
>  static int max_threads;  /* tunable limit on nr_threads */
>  
> +static const char * const resident_page_types[NR_MM_COUNTERS] = {
> + [MM_FILEPAGES]  = "MM_FILEPAGES",
> + [MM_ANONPAGES]  = "MM_ANONPAGES",
> + [MM_SWAPENTS]   = "MM_SWAPENTS",
> + [MM_SHMEMPAGES] = "MM_SHMEMPAGES",
> +};
> +
>  DEFINE_PER_CPU(unsigned long, process_counts) = 0;
>  
>  __cacheline_aligned DEFINE_RWLOCK(tasklist_lock);  /* outer */
> @@ -649,8 +656,8 @@ static void check_mm(struct mm_struct *mm)
>   long x = atomic_long_read(>rss_stat.count[i]);
>  
>   if (unlikely(x))
> - printk(KERN_ALERT "BUG: Bad rss-counter state "
> -   "mm:%p idx:%d val:%ld\n", mm, i, x);
> + pr_alert("BUG: Bad rss-counter state mm:%p type:%s 
> val:%ld\n",
> +  mm, resident_page_types[i], x);
>   }
>  
>   if (mm_pgtables_bytes(mm))
> -- 
> 2.7.4

-- 
Michal Hocko
SUSE Labs


Re: [PATCH V2] fork: Improve error message for corrupted page tables

2019-08-06 Thread Anshuman Khandual



On 08/06/2019 01:23 PM, Vlastimil Babka wrote:
> 
> On 8/6/19 5:05 AM, Sai Praneeth Prakhya wrote:
>> When a user process exits, the kernel cleans up the mm_struct of the user
>> process and during cleanup, check_mm() checks the page tables of the user
>> process for corruption (E.g: unexpected page flags set/cleared). For
>> corrupted page tables, the error message printed by check_mm() isn't very
>> clear as it prints the loop index instead of page table type (E.g: Resident
>> file mapping pages vs Resident shared memory pages). The loop index in
>> check_mm() is used to index rss_stat[] which represents individual memory
>> type stats. Hence, instead of printing index, print memory type, thereby
>> improving error message.
>>
>> Without patch:
>> --
>> [  204.836425] mm/pgtable-generic.c:29: bad p4d 
>> 89eb4e92(80025f941467)
>> [  204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2
>> [  204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5
>> [  204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
>>
>> With patch:
>> ---
>> [   69.815453] mm/pgtable-generic.c:29: bad p4d 
>> 84653642(80025ca37467)
>> [   69.815872] BUG: Bad rss-counter state mm:014a6c03 
>> type:MM_FILEPAGES val:2
>> [   69.815962] BUG: Bad rss-counter state mm:014a6c03 
>> type:MM_ANONPAGES val:5
>> [   69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
>>
>> Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
>> that it matches the other print statement.
>>
>> Cc: Ingo Molnar 
>> Cc: Vlastimil Babka 
>> Cc: Peter Zijlstra 
>> Cc: Andrew Morton 
>> Cc: Anshuman Khandual 
>> Acked-by: Dave Hansen 
>> Suggested-by: Dave Hansen 
>> Signed-off-by: Sai Praneeth Prakhya 
> 
> Acked-by: Vlastimil Babka 
> 
> I would also add something like this to reduce risk of breaking it in the
> future:
> 
> 8<
> diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
> index d7016dcb245e..a6f83cbe4603 100644
> --- a/include/linux/mm_types_task.h
> +++ b/include/linux/mm_types_task.h
> @@ -36,6 +36,9 @@ struct vmacache {
>   struct vm_area_struct *vmas[VMACACHE_SIZE];
>  };
>  
> +/*
> + * When touching this, update also resident_page_types in kernel/fork.c
> + */
>  enum {
>   MM_FILEPAGES,   /* Resident file mapping pages */
>   MM_ANONPAGES,   /* Resident anonymous pages */
> 

Agreed and with that

Reviewed-by: Anshuman Khandual 


Re: [PATCH V2] fork: Improve error message for corrupted page tables

2019-08-06 Thread Vlastimil Babka


On 8/6/19 5:05 AM, Sai Praneeth Prakhya wrote:
> When a user process exits, the kernel cleans up the mm_struct of the user
> process and during cleanup, check_mm() checks the page tables of the user
> process for corruption (E.g: unexpected page flags set/cleared). For
> corrupted page tables, the error message printed by check_mm() isn't very
> clear as it prints the loop index instead of page table type (E.g: Resident
> file mapping pages vs Resident shared memory pages). The loop index in
> check_mm() is used to index rss_stat[] which represents individual memory
> type stats. Hence, instead of printing index, print memory type, thereby
> improving error message.
> 
> Without patch:
> --
> [  204.836425] mm/pgtable-generic.c:29: bad p4d 
> 89eb4e92(80025f941467)
> [  204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2
> [  204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5
> [  204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
> 
> With patch:
> ---
> [   69.815453] mm/pgtable-generic.c:29: bad p4d 
> 84653642(80025ca37467)
> [   69.815872] BUG: Bad rss-counter state mm:014a6c03 
> type:MM_FILEPAGES val:2
> [   69.815962] BUG: Bad rss-counter state mm:014a6c03 
> type:MM_ANONPAGES val:5
> [   69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
> 
> Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
> that it matches the other print statement.
> 
> Cc: Ingo Molnar 
> Cc: Vlastimil Babka 
> Cc: Peter Zijlstra 
> Cc: Andrew Morton 
> Cc: Anshuman Khandual 
> Acked-by: Dave Hansen 
> Suggested-by: Dave Hansen 
> Signed-off-by: Sai Praneeth Prakhya 

Acked-by: Vlastimil Babka 

I would also add something like this to reduce risk of breaking it in the
future:

8<
diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
index d7016dcb245e..a6f83cbe4603 100644
--- a/include/linux/mm_types_task.h
+++ b/include/linux/mm_types_task.h
@@ -36,6 +36,9 @@ struct vmacache {
struct vm_area_struct *vmas[VMACACHE_SIZE];
 };
 
+/*
+ * When touching this, update also resident_page_types in kernel/fork.c
+ */
 enum {
MM_FILEPAGES,   /* Resident file mapping pages */
MM_ANONPAGES,   /* Resident anonymous pages */


[PATCH V2] fork: Improve error message for corrupted page tables

2019-08-05 Thread Sai Praneeth Prakhya
When a user process exits, the kernel cleans up the mm_struct of the user
process and during cleanup, check_mm() checks the page tables of the user
process for corruption (E.g: unexpected page flags set/cleared). For
corrupted page tables, the error message printed by check_mm() isn't very
clear as it prints the loop index instead of page table type (E.g: Resident
file mapping pages vs Resident shared memory pages). The loop index in
check_mm() is used to index rss_stat[] which represents individual memory
type stats. Hence, instead of printing index, print memory type, thereby
improving error message.

Without patch:
--
[  204.836425] mm/pgtable-generic.c:29: bad p4d 
89eb4e92(80025f941467)
[  204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2
[  204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5
[  204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480

With patch:
---
[   69.815453] mm/pgtable-generic.c:29: bad p4d 
84653642(80025ca37467)
[   69.815872] BUG: Bad rss-counter state mm:014a6c03 type:MM_FILEPAGES 
val:2
[   69.815962] BUG: Bad rss-counter state mm:014a6c03 type:MM_ANONPAGES 
val:5
[   69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480

Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
that it matches the other print statement.

Cc: Ingo Molnar 
Cc: Vlastimil Babka 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Anshuman Khandual 
Acked-by: Dave Hansen 
Suggested-by: Dave Hansen 
Signed-off-by: Sai Praneeth Prakhya 
---

Changes from V1 to V2:
--
1. Move struct definition from header file to fork.c file, so that it won't be
   included in every compilation unit. As this struct is used *only* in fork.c,
   include the definition in fork.c itself.
2. Index the struct to match respective macros.
3. Mention about print function change in commit message.

 kernel/fork.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index d8ae0f1b4148..f34f441c50c0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -125,6 +125,13 @@ int nr_threads;/* The idle threads do 
not count.. */
 
 static int max_threads;/* tunable limit on nr_threads */
 
+static const char * const resident_page_types[NR_MM_COUNTERS] = {
+   [MM_FILEPAGES]  = "MM_FILEPAGES",
+   [MM_ANONPAGES]  = "MM_ANONPAGES",
+   [MM_SWAPENTS]   = "MM_SWAPENTS",
+   [MM_SHMEMPAGES] = "MM_SHMEMPAGES",
+};
+
 DEFINE_PER_CPU(unsigned long, process_counts) = 0;
 
 __cacheline_aligned DEFINE_RWLOCK(tasklist_lock);  /* outer */
@@ -649,8 +656,8 @@ static void check_mm(struct mm_struct *mm)
long x = atomic_long_read(>rss_stat.count[i]);
 
if (unlikely(x))
-   printk(KERN_ALERT "BUG: Bad rss-counter state "
- "mm:%p idx:%d val:%ld\n", mm, i, x);
+   pr_alert("BUG: Bad rss-counter state mm:%p type:%s 
val:%ld\n",
+mm, resident_page_types[i], x);
}
 
if (mm_pgtables_bytes(mm))
-- 
2.7.4