2015-05-08 16:01 GMT+08:00 Yogesh Narayan Gaur <yn.g...@samsung.com>:
>
> ------- Original Message -------
> Sender : yalin wang <yalin.wang2...@gmail.com>
> Date : May 08, 2015 13:17 (GMT+05:30)
> Title : Re: [EDT] oom_killer: find bulkiest task based on pss value
>
> 2015-05-08 13:29 GMT+08:00 Yogesh Narayan Gaur:
>>>
>>> Hi Andrew,
>>>
>>> Presently, oom_kill.c calculates the badness score of a victim task from
>>> the task's current RSS counter value. The RSS counter value of a task is
>>> essentially '[Private (Dirty/Clean)] + [Shared (Dirty/Clean)]'.
>>> We have encountered a situation where the Private values are small but
>>> the Shared values are large, making the total RSS counter value large.
>>> On OOM the task with the highest RSS value gets killed, but because its
>>> Private values are not large, the memory gained by killing that process
>>> falls short of expectations.
>>>
>>> For example, take the use-case scenario below, in which 3 processes are
>>> running in the system. Each process mmaps a file in the current directory
>>> and then copies data from that mapping into locally allocated buffers in
>>> a while(1) loop with some sleep. Two of the 3 processes map the file with
>>> MAP_SHARED and one maps it with MAP_PRIVATE.
>>> I run all 3 processes in the background and check the RSS/PSS values with
>>> a user-space utility (built on top of cat /proc/pid/smaps).
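>>>
>>> A minimal sketch of such a reproducer (the file handling, write pattern
>>> and sleep interval here are illustrative assumptions, not the exact test
>>> program we ran):
>>>
>>> #include <fcntl.h>
>>> #include <stdlib.h>
>>> #include <string.h>
>>> #include <sys/mman.h>
>>> #include <sys/stat.h>
>>> #include <unistd.h>
>>>
>>> /* Usage: ./prg <file> shared|private  (run as 1prg/2prg/3prg) */
>>> int main(int argc, char **argv)
>>> {
>>>         struct stat st;
>>>         char *map;
>>>         int fd, flags;
>>>
>>>         if (argc != 3)
>>>                 return 1;
>>>         flags = strcmp(argv[2], "shared") ? MAP_PRIVATE : MAP_SHARED;
>>>
>>>         fd = open(argv[1], O_RDWR);
>>>         if (fd < 0 || fstat(fd, &st) < 0)
>>>                 return 1;
>>>
>>>         map = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, flags, fd, 0);
>>>         if (map == MAP_FAILED)
>>>                 return 1;
>>>
>>>         /* For the MAP_PRIVATE case, write to the mapping once so that COW
>>>          * turns the file pages into Private pages, as seen for 3prg. */
>>>         if (flags == MAP_PRIVATE)
>>>                 memset(map, 0xaa, st.st_size);
>>>
>>>         while (1) {
>>>                 /* Copy the mapping into a buffer that is deliberately
>>>                  * never freed, so memory use keeps growing toward OOM. */
>>>                 char *buf = malloc(st.st_size);
>>>                 if (!buf)
>>>                         break;
>>>                 memcpy(buf, map, st.st_size);
>>>                 sleep(1);
>>>         }
>>>         return 0;
>>> }
>>>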
>>> Before OOM, below is the consumed memory status for these 3 processes
>>> (all processes run with oom_score_adj = 0):
>>> ====================================================
>>> Comm : 1prg,  Pid : 213 (values in kB)
>>>                  Rss     Shared    Private        Pss
>>>   Process :   375764     194596     181168     278460
>>> ====================================================
>>> Comm : 3prg,  Pid : 217 (values in kB)
>>>                  Rss     Shared    Private        Pss
>>>   Process :   305760         32     305728     305738
>>> ====================================================
>>> Comm : 2prg,  Pid : 218 (values in kB)
>>>                  Rss     Shared    Private        Pss
>>>   Process :   389980     194596     195384     292676
>>> ====================================================
>>>
>>> Thus, with the present design, process [2prg : 218] is selected first as
>>> the bulkiest process, since its RSS value is the highest. But killing it
>>> frees only ~195MB (its Private pages), compared to the ~389MB its RSS
>>> value suggests. Identifying the victim by RSS value is therefore not an
>>> accurate design: killing the identified process does not release the
>>> expected amount of memory back to the system.
>>>
>>> We should instead select the victim task based on PSS, which is
>>> calculated as
>>> PSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean) / no. of
>>> tasks sharing the pages]
>>> For the above use-case scenario it can be verified that process
>>> [3prg : 217] has the largest PSS value, and killing it gains the maximum
>>> memory (~305MB) compared with killing the process identified by its RSS
>>> value.
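>>>
>>> As a quick check against the numbers above (assuming the ~194MB shared
>>> region is mapped by exactly the two MAP_SHARED tasks, so each is charged
>>> half of it):
>>>
>>>   2prg: PSS = 195384 + 194596/2 = 292682 kB  (292676 reported)
>>>   1prg: PSS = 181168 + 194596/2 = 278466 kB  (278460 reported)
>>>   3prg: PSS = 305728 + its small shared part = 305738 kB (as reported)
>>>
>>> The few-kB differences come from per-page rounding in the kernel's PSS
>>> accounting. 3prg clearly has the largest PSS, while 2prg has the largest
>>> RSS.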
>>>
>>> --
>>> Regards,
>>> Yogesh Gaur.
>
>>
>> Great,
>>
>> In fact, I have also encountered this scenario. I used USS (pages whose
>> map count == 1) to decide which process should be killed, and it seems to
>> give the same result as your use of PSS. But PSS is better: it also
>> accounts for shared pages, covering the case where a process has a large
>> shared mapping but only a small amount of Private pages.
>>
>> BRs,
>> Yalin
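>>
>> A rough sketch of deriving USS and PSS for a pid from /proc/<pid>/smaps
>> (USS approximated as Private_Clean + Private_Dirty; this is illustrative,
>> not the exact utility mentioned above):
>>
>> #include <stdio.h>
>>
>> int main(int argc, char **argv)
>> {
>>         char path[64], line[256];
>>         long val, pss = 0, uss = 0;
>>         FILE *f;
>>
>>         if (argc != 2)
>>                 return 1;
>>         snprintf(path, sizeof(path), "/proc/%s/smaps", argv[1]);
>>         f = fopen(path, "r");
>>         if (!f)
>>                 return 1;
>>
>>         /* Sum the per-VMA fields over the whole address space. */
>>         while (fgets(line, sizeof(line), f)) {
>>                 if (sscanf(line, "Pss: %ld kB", &val) == 1)
>>                         pss += val;
>>                 else if (sscanf(line, "Private_Clean: %ld kB", &val) == 1 ||
>>                          sscanf(line, "Private_Dirty: %ld kB", &val) == 1)
>>                         uss += val;
>>         }
>>         fclose(f);
>>         printf("Pss: %ld kB  Uss: %ld kB\n", pss, uss);
>>         return 0;
>> }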
>
> I have made a patch that identifies the bulkiest task on the basis of its
> PSS value; please check the patch below.
> It corrects the way the victim task is identified in the OOM condition.
>
> ==================
>
> From 1c3d7f552f696bdbc0126c8e23beabedbd80e423 Mon Sep 17 00:00:00 2001
> From: Yogesh Gaur <yn.g...@samsung.com>
> Date: Thu, 7 May 2015 01:52:13 +0530
> Subject: [PATCH] oom: find victim task based on pss
>
> This patch identifies the bulkiest task for the OOM killer to kill on the
> basis of PSS values instead of the present RSS values.
> There can be a scenario where the task with the highest RSS counter is
> consuming a lot of shared memory, and killing that task does not release
> the expected amount of memory to the system.
> PSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean) / no. of
> sharing tasks]
> RSS value = [Private (Dirty/Clean)] + [Shared (Dirty/Clean)]
> Thus, use the PSS value instead of the RSS value, as PSS closely matches
> the task's actual memory usage.
> This patch uses the smaps_pte_range() interface, which is defined under
> CONFIG_PROC_PAGE_MONITOR. When CONFIG_PROC_PAGE_MONITOR is disabled, the
> calculation simply falls back to the RSS counter value.
>
> Signed-off-by: Yogesh Gaur <yn.g...@samsung.com>
> Signed-off-by: Amit Arora <amit.ar...@samsung.com>
> Reviewed-by: Ajeet Yadav <ajee...@samsung.com>
> ---
>  fs/proc/task_mmu.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/mm.h |    9 +++++++++
>  mm/oom_kill.c      |    9 +++++++--
>  3 files changed, 63 insertions(+), 2 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 956b75d..dd962ff 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -964,6 +964,53 @@ struct pagemapread {
>         bool v2;
>  };
>
> +/**
> + * get_mm_pss - determine the PSS count of the pages used by a task
> + * @p: the task to calculate the PSS value for
> + * @mm: mm struct of the task
> + *
> + * PSS = [Private (Dirty/Clean)] + [Shared (Dirty/Clean) / no. of sharing tasks]
> + * This function needs to be called under task_lock for task @p.
> + */
> +long get_mm_pss(struct task_struct *p, struct mm_struct *mm)
> +{
> +       long pss = 0;
> +       struct vm_area_struct *vma = NULL;
> +       struct mem_size_stats mss;
> +       struct mm_walk smaps_walk = {
> +               .pmd_entry = smaps_pte_range,
> +               .private = &mss,
> +       };
> +
> +       if (!mm)
> +               return 0;
> +
> +       /* task_lock held in oom_badness */
> +       smaps_walk.mm = mm;
> +
> +       if (!down_read_trylock(&mm->mmap_sem)) {
> +               pr_warn("Skipping task:%s\n", p->comm);
> +               return 0;
> +       }
> +
> +       vma = mm->mmap;
> +       if (!vma) {
> +               up_read(&mm->mmap_sem);
> +               return 0;
> +       }
> +
> +       while (vma) {
> +               memset(&mss, 0, sizeof(struct mem_size_stats));
> +               walk_page_vma(vma, &smaps_walk);
> +               pss += (unsigned long)(mss.pss >> (PAGE_SHIFT + PSS_SHIFT)); /* PSS in pages */
> +
> +               /* Check next vma in list */
> +               vma = vma->vm_next;
> +       }
> +       up_read(&mm->mmap_sem);
> +       return pss;
> +}
> +
>  #define PAGEMAP_WALK_SIZE      (PMD_SIZE)
>  #define PAGEMAP_WALK_MASK      (PMD_MASK)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 47a9392..b6bb521 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1423,6 +1423,15 @@ static inline void setmax_mm_hiwater_rss(unsigned long *maxrss,
>                 *maxrss = hiwater_rss;
>  }
>
> +#ifdef CONFIG_PROC_PAGE_MONITOR
> +long get_mm_pss(struct task_struct *p, struct mm_struct *mm);
> +#else
> +static inline long get_mm_pss(struct task_struct *p, struct mm_struct *mm)
> +{
> +       return 0;
> +}
> +#endif
> +
>  #if defined(SPLIT_RSS_COUNTING)
>  void sync_mm_rss(struct mm_struct *mm);
>  #else
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 642f38c..537eb4c 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -151,6 +151,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  {
>         long points;
>         long adj;
> +       long pss = 0;
>
>         if (oom_unkillable_task(p, memcg, nodemask))
>                 return 0;
> @@ -167,9 +168,13 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>
>         /*
>          * The baseline for the badness score is the proportion of RAM that each
> -        * task's rss, pagetable and swap space use.
> +        * task's pss, pagetable and swap space use.
>          */
> -       points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
> +       pss = get_mm_pss(p, p->mm);
> +       if (pss == 0) /* fall back to rss (pseudo-pss) */
> +               pss = get_mm_rss(p->mm);
> +
> +       points = pss + get_mm_counter(p->mm, MM_SWAPENTS) +
>                 atomic_long_read(&p->mm->nr_ptes) + mm_nr_pmds(p->mm);
>         task_unlock(p);
>
> --
> 1.7.1
>
>
> --
> BRs
> Yogesh Gaur.
The logic seems OK,
but I feel the code footprint is much larger than the original method:
get_mm_rss() is just a couple of per-mm counter reads, while get_mm_pss()
walks every page-table entry of every VMA for each candidate task, so it
may have performance problems.
I have added the linux-mm mailing list so the mm experts can review.
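
For reference, this is roughly what the current RSS path costs (a sketch of
the get_mm_rss() helper in include/linux/mm.h around this kernel version;
the exact set of counters may differ by version):

/* Reading RSS is O(1): it is maintained as per-mm counters. */
static inline unsigned long get_mm_rss(struct mm_struct *mm)
{
        return get_mm_counter(mm, MM_FILEPAGES) +
               get_mm_counter(mm, MM_ANONPAGES);
}

By contrast, get_mm_pss() is O(mapped pages) per call, and oom_badness()
runs it for every candidate task on each OOM invocation.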

BRs
Yalin.