On Tue, Apr 28, 2015 at 03:24:42PM +0300, Vladimir Davydov wrote:
> Knowing the portion of memory that is not used by a certain application
> or memory cgroup (idle memory) can be useful for partitioning the system
> efficiently, e.g. by setting memory cgroup limits appropriately.
> Currently, the only means to estimate the amount of idle memory provided
> by the kernel is /proc/PID/{clear_refs,smaps}: the user can clear the
> access bit for all pages mapped to a particular process by writing 1 to
> clear_refs, wait for some time, and then count smaps:Referenced.
> However, this method has two serious shortcomings:
> 
>  - it does not count unmapped file pages
>  - it affects the reclaimer logic
> 
> To overcome these drawbacks, this patch introduces two new page flags,
> Idle and Young, and a new proc file, /proc/kpageidle. A page's Idle flag
> can only be set from userspace by writing 1 to /proc/kpageidle at the
> offset corresponding to the page, and it is cleared whenever the page is
> accessed either through page tables (it is cleared in page_referenced()
> in this case) or using the read(2) system call (mark_page_accessed()).
> Thus by setting the Idle flag for pages of a particular workload, which
> can be found e.g. by reading /proc/PID/pagemap, waiting for some time to
> let the workload access its working set, and then reading the kpageidle
> file, one can estimate the amount of pages that are not used by the
> workload.
> 
> The Young page flag is used to avoid interference with the memory
> reclaimer. A page's Young flag is set whenever the Access bit of a page
> table entry pointing to the page is cleared by writing to kpageidle. If
> page_referenced() is called on a Young page, it will add 1 to its return
> value, therefore concealing the fact that the Access bit was cleared.
> 
> Note, since there is no room for extra page flags on 32 bit, this
> feature uses extended page flags when compiled on 32 bit.
> 
> Signed-off-by: Vladimir Davydov <vdavy...@parallels.com>
> ---
>  Documentation/vm/pagemap.txt |   10 ++-
>  fs/proc/page.c               |  154 
> ++++++++++++++++++++++++++++++++++++++++++
>  fs/proc/task_mmu.c           |    4 +-
>  include/linux/mm.h           |   88 ++++++++++++++++++++++++
>  include/linux/page-flags.h   |    9 +++
>  include/linux/page_ext.h     |    4 ++
>  mm/Kconfig                   |   12 ++++
>  mm/debug.c                   |    4 ++
>  mm/page_ext.c                |    3 +
>  mm/rmap.c                    |    7 ++
>  mm/swap.c                    |    2 +
>  11 files changed, 295 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
> index a9b7afc8fbc6..ac6fd32a9296 100644
> --- a/Documentation/vm/pagemap.txt
> +++ b/Documentation/vm/pagemap.txt
> @@ -5,7 +5,7 @@ pagemap is a new (as of 2.6.25) set of interfaces in the 
> kernel that allow
>  userspace programs to examine the page tables and related information by
>  reading files in /proc.
>  
> -There are four components to pagemap:
> +There are five components to pagemap:
>  
>   * /proc/pid/pagemap.  This file lets a userspace process find out which
>     physical frame each virtual page is mapped to.  It contains one 64-bit
> @@ -69,6 +69,14 @@ There are four components to pagemap:
>     memory cgroup each page is charged to, indexed by PFN. Only available when
>     CONFIG_MEMCG is set.
>  
> + * /proc/kpageidle.  For each page this file contains a 64-bit number, which
> +   equals 1 if the page is idle or 0 otherwise, indexed by PFN. A page is
> +   considered idle if it has not been accessed since it was marked idle. To
> +   mark a page idle one should write 1 to this file at the offset 
> corresponding
> +   to the page. Only user memory pages can be marked idle, for other page 
> types
> +   input is silently ignored. Writing to this file beyond max PFN results in
> +   the ENXIO error. Only available when CONFIG_IDLE_PAGE_TRACKING is set.
> +

How about using kpageflags for reading part?

I mean PG_idle is one of the page flags and we already have a feature to
parse of each PFN flag so we could reuse existing feature for reading
idleness.

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to