Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-02-16 Thread Minchan Kim
On Mon, Feb 14, 2011 at 2:33 AM, Balbir Singh  wrote:
> * MinChan Kim  [2011-02-10 14:41:44]:
>
>> I don't know why the part of message is deleted only when I send you.
>> Maybe it's gmail bug.
>>
>> I hope mail sending is successful in this turn. :)
>>
>> On Thu, Feb 10, 2011 at 2:33 PM, Minchan Kim  wrote:
>> > Sorry for late response.
>> >
>> > On Fri, Jan 28, 2011 at 8:18 PM, Balbir Singh  
>> > wrote:
>> >> * MinChan Kim  [2011-01-28 16:24:19]:
>> >>
>> >>> >
>> >>> > But the assumption for LRU order to change happens only if the page
>> >>> > cannot be successfully freed, which means it is in some way active..
>> >>> > and needs to be moved no?
>> >>>
>> >>> 1. holded page by someone
>> >>> 2. mapped pages
>> >>> 3. active pages
>> >>>
>> >>> 1 is rare so it isn't the problem.
>> >>> Of course, in case of 3, we have to activate it so no problem.
>> >>> The problem is 2.
>> >>>
>> >>
>> >> 2 is a problem, but due to the size aspects not a big one. Like you
>> >> said even lumpy reclaim affects it. May be the reclaim code could
>> >> honour may_unmap much earlier.
>> >
>> > Even if it is, it's a trade-off to get a big contiguous memory. I
>> > don't want to add new mess. (In addition, lumpy is weak by compaction
>> > as time goes by)
>> > What I have in mind for preventing LRU ignore is that put the page
>> > into original position instead of head of lru. Maybe it can help the
>> > situation both lumpy and your case. But it's another story.
>> >
>> > How about the idea?
>> >
>> > I borrow the idea from CFLRU[1]
>> > - PCFLRU(Page-Cache First LRU)
>> >
>> > When we allocates new page for page cache, we adds the page into LRU's 
>> > tail.
>> > When we map the page cache into page table, we rotate the page into LRU's 
>> > head.
>> >
>> > So, inactive list's result is following as.
>> >
>> > M.P : mapped page
>> > N.P : none-mapped page
>> >
>> > HEAD-M.P-M.P-M.P-M.P-N.P-N.P-N.P-N.P-N.P-TAIL
>> >
>> > Admin can set threshold window size which determines stop reclaiming
>> > none-mapped page contiguously.
>> >
>> > I think it needs some tweak of page cache/page mapping functions but
>> > we can use kswapd/direct reclaim without change.
>> >
>> > Also, it can change page reclaim policy totally but it's just what you
>> > want, I think.
>> >
>
> I am not sure how this would work, moreover the idea behind
> min_unmapped_pages is to keep sufficient unmapped pages around for the
> FS metadata and has been working with the existing code for zone
> reclaim. What you propose is more drastic re-org of the LRU and I am
> not sure I have the appetite for it.

Yes. My suggestion could change the LRU order completely, but it can't
meet your goal, so it was a bad idea. Sorry for bothering you.

I can add my Reviewed-by to [1/3] and [2/3], but I still have doubts
about [3/3].
The LRU-ordering problem I mentioned is not specific to your patches;
it's a general problem these days (e.g., more aggressive compaction and
lumpy reclaim), so if it is a real problem we may need a general
solution.
Okay. I won't oppose your approach from now on, at least until I can
show how much harm LRU reordering does. (I still raise an eyebrow at
the implementation of [3/3], but I won't oppose it unless I can suggest
a better approach.)

Thanks.
-- 
Kind regards,
Minchan Kim


Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-02-14 Thread Balbir Singh
* MinChan Kim  [2011-02-10 14:41:44]:

> I don't know why the part of message is deleted only when I send you.
> Maybe it's gmail bug.
> 
> I hope mail sending is successful in this turn. :)
> 
> On Thu, Feb 10, 2011 at 2:33 PM, Minchan Kim  wrote:
> > Sorry for late response.
> >
> > On Fri, Jan 28, 2011 at 8:18 PM, Balbir Singh  
> > wrote:
> >> * MinChan Kim  [2011-01-28 16:24:19]:
> >>
> >>> >
> >>> > But the assumption for LRU order to change happens only if the page
> >>> > cannot be successfully freed, which means it is in some way active..
> >>> > and needs to be moved no?
> >>>
> >>> 1. holded page by someone
> >>> 2. mapped pages
> >>> 3. active pages
> >>>
> >>> 1 is rare so it isn't the problem.
> >>> Of course, in case of 3, we have to activate it so no problem.
> >>> The problem is 2.
> >>>
> >>
> >> 2 is a problem, but due to the size aspects not a big one. Like you
> >> said even lumpy reclaim affects it. May be the reclaim code could
> >> honour may_unmap much earlier.
> >
> > Even if it is, it's a trade-off to get a big contiguous memory. I
> > don't want to add new mess. (In addition, lumpy is weak by compaction
> > as time goes by)
> > What I have in mind for preventing LRU ignore is that put the page
> > into original position instead of head of lru. Maybe it can help the
> > situation both lumpy and your case. But it's another story.
> >
> > How about the idea?
> >
> > I borrow the idea from CFLRU[1]
> > - PCFLRU(Page-Cache First LRU)
> >
> > When we allocates new page for page cache, we adds the page into LRU's tail.
> > When we map the page cache into page table, we rotate the page into LRU's 
> > head.
> >
> > So, inactive list's result is following as.
> >
> > M.P : mapped page
> > N.P : none-mapped page
> >
> > HEAD-M.P-M.P-M.P-M.P-N.P-N.P-N.P-N.P-N.P-TAIL
> >
> > Admin can set threshold window size which determines stop reclaiming
> > none-mapped page contiguously.
> >
> > I think it needs some tweak of page cache/page mapping functions but
> > we can use kswapd/direct reclaim without change.
> >
> > Also, it can change page reclaim policy totally but it's just what you
> > want, I think.
> >

I am not sure how this would work. Moreover, the idea behind
min_unmapped_pages is to keep sufficient unmapped pages around for FS
metadata, and it has been working with the existing zone reclaim code.
What you propose is a more drastic reorganization of the LRU, and I am
not sure I have the appetite for it.

-- 
Three Cheers,
Balbir


Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-02-09 Thread Minchan Kim
I don't know why part of the message gets deleted only when I send it
to you. Maybe it's a Gmail bug.

I hope the mail goes through this time. :)

On Thu, Feb 10, 2011 at 2:33 PM, Minchan Kim  wrote:
> Sorry for late response.
>
> On Fri, Jan 28, 2011 at 8:18 PM, Balbir Singh  
> wrote:
>> * MinChan Kim  [2011-01-28 16:24:19]:
>>
>>> >
>>> > But the assumption for LRU order to change happens only if the page
>>> > cannot be successfully freed, which means it is in some way active..
>>> > and needs to be moved no?
>>>
>>> 1. holded page by someone
>>> 2. mapped pages
>>> 3. active pages
>>>
>>> 1 is rare so it isn't the problem.
>>> Of course, in case of 3, we have to activate it so no problem.
>>> The problem is 2.
>>>
>>
>> 2 is a problem, but due to the size aspects not a big one. Like you
>> said even lumpy reclaim affects it. May be the reclaim code could
>> honour may_unmap much earlier.
>
> Even if it is, it's a trade-off to get a big contiguous memory. I
> don't want to add new mess. (In addition, lumpy is weak by compaction
> as time goes by)
> What I have in mind for preventing LRU ignore is that put the page
> into original position instead of head of lru. Maybe it can help the
> situation both lumpy and your case. But it's another story.
>
> How about the idea?
>
> I borrow the idea from CFLRU[1]
> - PCFLRU(Page-Cache First LRU)
>
> When we allocates new page for page cache, we adds the page into LRU's tail.
> When we map the page cache into page table, we rotate the page into LRU's 
> head.
>
> So, inactive list's result is following as.
>
> M.P : mapped page
> N.P : none-mapped page
>
> HEAD-M.P-M.P-M.P-M.P-N.P-N.P-N.P-N.P-N.P-TAIL
>
> Admin can set threshold window size which determines stop reclaiming
> none-mapped page contiguously.
>
> I think it needs some tweak of page cache/page mapping functions but
> we can use kswapd/direct reclaim without change.
>
> Also, it can change page reclaim policy totally but it's just what you
> want, I think.
>
> [1] 
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.6188&rep=rep1&type=pdf
>
>>
>> --
>>        Three Cheers,
>>        Balbir
>>
>
>
>
> --
> Kind regards,
> Minchan Kim
>



-- 
Kind regards,
Minchan Kim


Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-02-09 Thread Minchan Kim
Sorry for the late response.

On Fri, Jan 28, 2011 at 8:18 PM, Balbir Singh  wrote:
> * MinChan Kim  [2011-01-28 16:24:19]:
>
>> >
>> > But the assumption for LRU order to change happens only if the page
>> > cannot be successfully freed, which means it is in some way active..
>> > and needs to be moved no?
>>
>> 1. holded page by someone
>> 2. mapped pages
>> 3. active pages
>>
>> 1 is rare so it isn't the problem.
>> Of course, in case of 3, we have to activate it so no problem.
>> The problem is 2.
>>
>
> 2 is a problem, but due to the size aspects not a big one. Like you
> said even lumpy reclaim affects it. May be the reclaim code could
> honour may_unmap much earlier.

Even if it is, it's a trade-off for getting large contiguous memory,
and I don't want to add a new mess. (Besides, lumpy reclaim is being
weakened by compaction as time goes by.)
What I have in mind for preventing the LRU-reordering problem is to put
the page back at its original position instead of at the head of the
LRU. Maybe that can help both the lumpy case and your case. But that's
another story.

How about this idea?

I borrow the idea from CFLRU [1]:
- PCFLRU (Page-Cache First LRU)

When we allocate a new page for the page cache, we add the page at the
LRU's tail.
When we map the page cache page into a page table, we rotate the page
to the LRU's head.

So the inactive list ends up looking like this:

M.P : mapped page
N.P : non-mapped page

HEAD-M.P-M.P-M.P-M.P-N.P-N.P-N.P-N.P-N.P-TAIL

The admin can set a threshold window size that determines when to stop
reclaiming non-mapped pages contiguously.

I think it needs some tweaking of the page cache/page mapping
functions, but we could use kswapd/direct reclaim without change.

Also, it could change page reclaim policy completely, but that is just
what you want, I think.

[1] 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.6188&rep=rep1&type=pdf
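
Very roughly, the two hooks I have in mind would look something like
the sketch below (illustration only; lru_cache_add_file_tail() is a
hypothetical helper, not an existing function):

/* Sketch: a new page cache page goes to the LRU tail, so it is the
 * first candidate for reclaim. */
static void pcflru_add_page_cache(struct page *page)
{
        lru_cache_add_file_tail(page); /* hypothetical list_add_tail() variant */
}

/* Sketch: once the page cache page is mapped into a page table,
 * rotate it to the head of the inactive file list, so mapped pages
 * are reclaimed last. */
static void pcflru_page_mapped(struct page *page)
{
        struct zone *zone = page_zone(page);

        spin_lock_irq(&zone->lru_lock);
        if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page))
                list_move(&page->lru, &zone->lru[LRU_INACTIVE_FILE].list);
        spin_unlock_irq(&zone->lru_lock);
}

kswapd/direct reclaim would then scan from the tail as usual and walk
through the non-mapped window first, up to the admin-set threshold.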

>
> --
>        Three Cheers,
>        Balbir
>



-- 
Kind regards,
Minchan Kim


Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-31 Thread Balbir Singh
* KAMEZAWA Hiroyuki  [2011-01-31 08:58:53]:

> On Fri, 28 Jan 2011 09:20:02 -0600 (CST)
> Christoph Lameter  wrote:
> 
> > On Fri, 28 Jan 2011, KAMEZAWA Hiroyuki wrote:
> > 
> > > > > I see it as a tradeoff of when to check? add_to_page_cache or when we
> > > > > are want more free memory (due to allocation). It is OK to wakeup
> > > > > kswapd while allocating memory, somehow for this purpose (global page
> > > > > cache), add_to_page_cache or add_to_page_cache_locked does not seem
> > > > > the right place to hook into. I'd be open to comments/suggestions
> > > > > though from others as well.
> > >
> > > I don't like add hook here.
> > > AND I don't want to run kswapd because 'kswapd' has been a sign as
> > > there are memory shortage. (reusing code is ok.)
> > >
> > > How about adding new daemon ? Recently, khugepaged, ksmd works for
> > > managing memory. Adding one more daemon for special purpose is not
> > > very bad, I think. Then, you can do
> > >  - wake up without hook
> > >  - throttle its work.
> > >  - balance the whole system rather than zone.
> > >I think per-node balance is enough...
> > 
> > 
> > I think we already have enough kernel daemons floating around. They are
> > multiplying in an amazing way. What would be useful is to map all
> > the memory management background stuff into a process. May call this memd
> > instead? Perhaps we can fold khugepaged into kswapd as well etc.
> > 
> 
> Making kswapd slow for this "additional", "requested by user, not by system"
> work is good thing ? I think workqueue works enough well, it's scale based on
> workloads, if using thread is bad.
>

"Making it slow" is a generic statement. kswapd is supposed to do
background reclaim; in this case it is a special request for unmapped
pages, specifically and deliberately requested by the admin via a boot
option.
 
-- 
Three Cheers,
Balbir


Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-30 Thread KAMEZAWA Hiroyuki
On Fri, 28 Jan 2011 09:20:02 -0600 (CST)
Christoph Lameter  wrote:

> On Fri, 28 Jan 2011, KAMEZAWA Hiroyuki wrote:
> 
> > > > I see it as a tradeoff of when to check? add_to_page_cache or when we
> > > > are want more free memory (due to allocation). It is OK to wakeup
> > > > kswapd while allocating memory, somehow for this purpose (global page
> > > > cache), add_to_page_cache or add_to_page_cache_locked does not seem
> > > > the right place to hook into. I'd be open to comments/suggestions
> > > > though from others as well.
> >
> > I don't like add hook here.
> > AND I don't want to run kswapd because 'kswapd' has been a sign as
> > there are memory shortage. (reusing code is ok.)
> >
> > How about adding new daemon ? Recently, khugepaged, ksmd works for
> > managing memory. Adding one more daemon for special purpose is not
> > very bad, I think. Then, you can do
> >  - wake up without hook
> >  - throttle its work.
> >  - balance the whole system rather than zone.
> >I think per-node balance is enough...
> 
> 
> I think we already have enough kernel daemons floating around. They are
> multiplying in an amazing way. What would be useful is to map all
> the memory management background stuff into a process. May call this memd
> instead? Perhaps we can fold khugepaged into kswapd as well etc.
> 

Is making kswapd slower for this "additional" work, requested by the
user rather than by the system, a good thing? I think a workqueue works
well enough; it scales with the workload, if using a thread is bad.
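
(For illustration only, the workqueue variant could look roughly like
the sketch below; reclaim_unmapped_pages() is the function from the
patch, and how it gets its scan_control is left out:)

static void unmapped_reclaim_fn(struct work_struct *work);
static DECLARE_DELAYED_WORK(unmapped_reclaim_work, unmapped_reclaim_fn);

static void unmapped_reclaim_fn(struct work_struct *work)
{
        struct zone *zone;

        for_each_zone(zone)
                if (should_reclaim_unmapped_pages(zone))
                        ;       /* call into reclaim_unmapped_pages() here */

        /* re-arm; the delay throttles the work */
        schedule_delayed_work(&unmapped_reclaim_work, HZ);
}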

Thanks,
-Kame






Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-28 Thread Christoph Lameter
On Fri, 28 Jan 2011, KAMEZAWA Hiroyuki wrote:

> > > I see it as a tradeoff of when to check? add_to_page_cache or when we
> > > are want more free memory (due to allocation). It is OK to wakeup
> > > kswapd while allocating memory, somehow for this purpose (global page
> > > cache), add_to_page_cache or add_to_page_cache_locked does not seem
> > > the right place to hook into. I'd be open to comments/suggestions
> > > though from others as well.
>
> I don't like add hook here.
> AND I don't want to run kswapd because 'kswapd' has been a sign as
> there are memory shortage. (reusing code is ok.)
>
> How about adding new daemon ? Recently, khugepaged, ksmd works for
> managing memory. Adding one more daemon for special purpose is not
> very bad, I think. Then, you can do
>  - wake up without hook
>  - throttle its work.
>  - balance the whole system rather than zone.
>I think per-node balance is enough...


I think we already have enough kernel daemons floating around; they are
multiplying at an amazing rate. What would be useful is to fold all of
the memory-management background work into one process. Maybe call it
memd instead? Perhaps we can fold khugepaged into kswapd as well, etc.


Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-28 Thread Balbir Singh
* KAMEZAWA Hiroyuki  [2011-01-28 17:17:44]:

> On Fri, 28 Jan 2011 13:49:28 +0530
> Balbir Singh  wrote:
> 
> > * KAMEZAWA Hiroyuki  [2011-01-28 16:56:05]:
> 
> > > BTW, it seems this doesn't work when some apps use huge shmem.
> > > How to handle the issue ?
> > >
> > 
> > Could you elaborate further? 
> > 
> ==
> static inline unsigned long zone_unmapped_file_pages(struct zone *zone)
> {
>         unsigned long file_mapped = zone_page_state(zone, NR_FILE_MAPPED);
>         unsigned long file_lru = zone_page_state(zone, NR_INACTIVE_FILE) +
>                 zone_page_state(zone, NR_ACTIVE_FILE);
> 
>         /*
>          * It's possible for there to be more file mapped pages than
>          * accounted for by the pages on the file LRU lists because
>          * tmpfs pages accounted for as ANON can also be FILE_MAPPED
>          */
>         return (file_lru > file_mapped) ? (file_lru - file_mapped) : 0;
> }

Yes, I did :) The word "huge" confused me. I am not sure there is an
easy accounting fix for this one, though given the approximate nature
of the control, I am not sure it would matter very much. But you do
have a very good point.
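
To illustrate the skew with made-up numbers: suppose a zone has 100,000
pages on the file LRU lists, 60,000 of them mapped, plus 80,000 mapped
shmem/tmpfs pages that live on the anon LRU but are still counted in
NR_FILE_MAPPED. Then file_lru = 100,000 while file_mapped = 140,000, so
zone_unmapped_file_pages() returns 0 even though 40,000 pages of
unmapped page cache actually exist, and the control never triggers for
that zone.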

-- 
Three Cheers,
Balbir


Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-28 Thread Balbir Singh
* MinChan Kim  [2011-01-28 16:24:19]:

> >
> > But the assumption for LRU order to change happens only if the page
> > cannot be successfully freed, which means it is in some way active..
> > and needs to be moved no?
> 
> 1. holded page by someone
> 2. mapped pages
> 3. active pages
> 
> 1 is rare so it isn't the problem.
> Of course, in case of 3, we have to activate it so no problem.
> The problem is 2.
>

2 is a problem, but given the sizes involved, not a big one. Like you
said, even lumpy reclaim affects it. Maybe the reclaim code could
honour may_unmap much earlier.

-- 
Three Cheers,
Balbir


Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-28 Thread KAMEZAWA Hiroyuki
On Fri, 28 Jan 2011 13:49:28 +0530
Balbir Singh  wrote:

> * KAMEZAWA Hiroyuki  [2011-01-28 16:56:05]:
 
> > BTW, it seems this doesn't work when some apps use huge shmem.
> > How to handle the issue ?
> >
> 
> Could you elaborate further? 
> 
==
static inline unsigned long zone_unmapped_file_pages(struct zone *zone)
{
        unsigned long file_mapped = zone_page_state(zone, NR_FILE_MAPPED);
        unsigned long file_lru = zone_page_state(zone, NR_INACTIVE_FILE) +
                zone_page_state(zone, NR_ACTIVE_FILE);

        /*
         * It's possible for there to be more file mapped pages than
         * accounted for by the pages on the file LRU lists because
         * tmpfs pages accounted for as ANON can also be FILE_MAPPED
         */
        return (file_lru > file_mapped) ? (file_lru - file_mapped) : 0;
}
==

Did you read it?

Thanks,
-Kame



Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-28 Thread Balbir Singh
* KAMEZAWA Hiroyuki  [2011-01-28 16:56:05]:

> On Fri, 28 Jan 2011 16:24:19 +0900
> Minchan Kim  wrote:
> 
> > On Fri, Jan 28, 2011 at 3:48 PM, Balbir Singh  
> > wrote:
> > > * MinChan Kim  [2011-01-28 14:44:50]:
> > >
> > >> On Fri, Jan 28, 2011 at 11:56 AM, Balbir Singh
> > >>  wrote:
> > >> > On Thu, Jan 27, 2011 at 4:42 AM, Minchan Kim  
> > >> > wrote:
> > >> > [snip]
> > >> >
> > >> >>> index 7b56473..2ac8549 100644
> > >> >>> --- a/mm/page_alloc.c
> > >> >>> +++ b/mm/page_alloc.c
> > >> >>> @@ -1660,6 +1660,9 @@ zonelist_scan:
> > >> >>>                        unsigned long mark;
> > >> >>>                        int ret;
> > >> >>>
> > >> >>> +                       if (should_reclaim_unmapped_pages(zone))
> > >> >>> +                               wakeup_kswapd(zone, order, 
> > >> >>> classzone_idx);
> > >> >>> +
> > >> >>
> > >> >> Do we really need the check in fastpath?
> > >> >> There are lost of caller of alloc_pages.
> > >> >> Many of them are not related to mapped pages.
> > >> >> Could we move the check into add_to_page_cache_locked?
> > >> >
> > >> > The check is a simple check to see if the unmapped pages need
> > >> > balancing, the reason I placed this check here is to allow other
> > >> > allocations to benefit as well, if there are some unmapped pages to be
> > >> > freed. add_to_page_cache_locked (check under a critical section) is
> > >> > even worse, IMHO.
> > >>
> > >> It just moves the overhead from general into specific case(ie,
> > >> allocates page for just page cache).
> > >> Another cases(ie, allocates pages for other purpose except page cache,
> > >> ex device drivers or fs allocation for internal using) aren't
> > >> affected.
> > >> So, It would be better.
> > >>
> > >> The goal in this patch is to remove only page cache page, isn't it?
> > >> So I think we could the balance check in add_to_page_cache and trigger 
> > >> reclaim.
> > >> If we do so, what's the problem?
> > >>
> > >
> > > I see it as a tradeoff of when to check? add_to_page_cache or when we
> > > are want more free memory (due to allocation). It is OK to wakeup
> > > kswapd while allocating memory, somehow for this purpose (global page
> > > cache), add_to_page_cache or add_to_page_cache_locked does not seem
> > > the right place to hook into. I'd be open to comments/suggestions
> > > though from others as well.
> 
> I don't like add hook here.
> AND I don't want to run kswapd because 'kswapd' has been a sign as
> there are memory shortage. (reusing code is ok.)
> 
> How about adding new daemon ? Recently, khugepaged, ksmd works for
> managing memory. Adding one more daemon for special purpose is not
> very bad, I think. Then, you can do
>  - wake up without hook
>  - throttle its work.
>  - balance the whole system rather than zone.
>I think per-node balance is enough...
> 
> 
> 
> 
>

Honestly, I did look at that option, but balancing via kswapd seemed
like the best choice. Creating a new thread/daemon did not make sense
because

1. The control is very loose.
2. kswapd can deal with it while balancing other things; in fact,
imagine kswapd waking up to free memory when other free memory is
easily available. Adding parallel reclaim and zone lock contention
does not help, IMHO.
3. kswapd does not indicate memory shortage per se; please see
min_free_kbytes_sysctl_handler. kswapd exists to balance the
nodes/zones. If you tune min_free_kbytes and kswapd runs, it does not
mean the system is short of memory.

> 
> 
> 
> > >
> > >> >
> > >> >
> > >> >>
> > >> >>>                        mark = zone->watermark[alloc_flags & 
> > >> >>> ALLOC_WMARK_MASK];
> > >> >>>                        if (zone_watermark_ok(zone, order, mark,
> > >> >>>                                    classzone_idx, alloc_flags))
> > >> >>> @@ -4167,8 +4170,12 @@ static void __paginginit 
> > >> >>> free_area_init_core(struct pglist_data *pgdat,
> > >> >>>
> > >> >>>                zone->spanned_pages = size;
> > >> >>>                zone->present_pages = realsize;
> > >> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
> > >> >>>                zone->min_unmapped_pages = 
> > >> >>> (realsize*sysctl_min_unmapped_ratio)
> > >> >>>                                                / 100;
> > >> >>> +               zone->max_unmapped_pages = 
> > >> >>> (realsize*sysctl_max_unmapped_ratio)
> > >> >>> +                                               / 100;
> > >> >>> +#endif
> > >> >>>  #ifdef CONFIG_NUMA
> > >> >>>                zone->node = nid;
> > >> >>>                zone->min_slab_pages = (realsize * 
> > >> >>> sysctl_min_slab_ratio) / 100;
> > >> >>> @@ -5084,6 +5091,7 @@ int min_free_kbytes_sysctl_handler(ctl_table 
> > >> >>> *table, int write,
> > >> >>>        return 0;
> > >> >>>  }
> > >> >>>
> > >> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
> > >> >>>  int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int 
> > >> >>> write,
> > >> >>>        void __user *buffer, si

Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-28 Thread KAMEZAWA Hiroyuki
On Fri, 28 Jan 2011 16:24:19 +0900
Minchan Kim  wrote:

> On Fri, Jan 28, 2011 at 3:48 PM, Balbir Singh  
> wrote:
> > * MinChan Kim  [2011-01-28 14:44:50]:
> >
> >> On Fri, Jan 28, 2011 at 11:56 AM, Balbir Singh
> >>  wrote:
> >> > On Thu, Jan 27, 2011 at 4:42 AM, Minchan Kim  
> >> > wrote:
> >> > [snip]
> >> >
> >> >>> index 7b56473..2ac8549 100644
> >> >>> --- a/mm/page_alloc.c
> >> >>> +++ b/mm/page_alloc.c
> >> >>> @@ -1660,6 +1660,9 @@ zonelist_scan:
> >> >>>                        unsigned long mark;
> >> >>>                        int ret;
> >> >>>
> >> >>> +                       if (should_reclaim_unmapped_pages(zone))
> >> >>> +                               wakeup_kswapd(zone, order, 
> >> >>> classzone_idx);
> >> >>> +
> >> >>
> >> >> Do we really need the check in fastpath?
> >> >> There are lost of caller of alloc_pages.
> >> >> Many of them are not related to mapped pages.
> >> >> Could we move the check into add_to_page_cache_locked?
> >> >
> >> > The check is a simple check to see if the unmapped pages need
> >> > balancing, the reason I placed this check here is to allow other
> >> > allocations to benefit as well, if there are some unmapped pages to be
> >> > freed. add_to_page_cache_locked (check under a critical section) is
> >> > even worse, IMHO.
> >>
> >> It just moves the overhead from general into specific case(ie,
> >> allocates page for just page cache).
> >> Another cases(ie, allocates pages for other purpose except page cache,
> >> ex device drivers or fs allocation for internal using) aren't
> >> affected.
> >> So, It would be better.
> >>
> >> The goal in this patch is to remove only page cache page, isn't it?
> >> So I think we could the balance check in add_to_page_cache and trigger 
> >> reclaim.
> >> If we do so, what's the problem?
> >>
> >
> > I see it as a tradeoff of when to check? add_to_page_cache or when we
> > are want more free memory (due to allocation). It is OK to wakeup
> > kswapd while allocating memory, somehow for this purpose (global page
> > cache), add_to_page_cache or add_to_page_cache_locked does not seem
> > the right place to hook into. I'd be open to comments/suggestions
> > though from others as well.

I don't like adding a hook here.
And I don't want to run kswapd, because kswapd running has been a sign
that there is a memory shortage. (Reusing the code is okay.)

How about adding a new daemon? Recently khugepaged and ksmd were added
for managing memory. Adding one more daemon for a special purpose is
not very bad, I think. Then you can
 - wake it up without a hook
 - throttle its work
 - balance the whole system rather than a zone.
   I think per-node balance is enough...








> >
> >> >
> >> >
> >> >>
> >> >>>                        mark = zone->watermark[alloc_flags & 
> >> >>> ALLOC_WMARK_MASK];
> >> >>>                        if (zone_watermark_ok(zone, order, mark,
> >> >>>                                    classzone_idx, alloc_flags))
> >> >>> @@ -4167,8 +4170,12 @@ static void __paginginit 
> >> >>> free_area_init_core(struct pglist_data *pgdat,
> >> >>>
> >> >>>                zone->spanned_pages = size;
> >> >>>                zone->present_pages = realsize;
> >> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
> >> >>>                zone->min_unmapped_pages = 
> >> >>> (realsize*sysctl_min_unmapped_ratio)
> >> >>>                                                / 100;
> >> >>> +               zone->max_unmapped_pages = 
> >> >>> (realsize*sysctl_max_unmapped_ratio)
> >> >>> +                                               / 100;
> >> >>> +#endif
> >> >>>  #ifdef CONFIG_NUMA
> >> >>>                zone->node = nid;
> >> >>>                zone->min_slab_pages = (realsize * 
> >> >>> sysctl_min_slab_ratio) / 100;
> >> >>> @@ -5084,6 +5091,7 @@ int min_free_kbytes_sysctl_handler(ctl_table 
> >> >>> *table, int write,
> >> >>>        return 0;
> >> >>>  }
> >> >>>
> >> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
> >> >>>  int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int 
> >> >>> write,
> >> >>>        void __user *buffer, size_t *length, loff_t *ppos)
> >> >>>  {
> >> >>> @@ -5100,6 +5108,23 @@ int 
> >> >>> sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
> >> >>>        return 0;
> >> >>>  }
> >> >>>
> >> >>> +int sysctl_max_unmapped_ratio_sysctl_handler(ctl_table *table, int 
> >> >>> write,
> >> >>> +       void __user *buffer, size_t *length, loff_t *ppos)
> >> >>> +{
> >> >>> +       struct zone *zone;
> >> >>> +       int rc;
> >> >>> +
> >> >>> +       rc = proc_dointvec_minmax(table, write, buffer, length, ppos);
> >> >>> +       if (rc)
> >> >>> +               return rc;
> >> >>> +
> >> >>> +       for_each_zone(zone)
> >> >>> +               zone->max_unmapped_pages = (zone->present_pages *
> >> >>> +                               sysctl_max_unmapped_ratio) / 100;
> >> >>> +       return 0;
> >> >>> +}
> >> >>> +#endif
> >> >>> +
> >> >>>  #ifdef CONFIG_NUMA
> >> >>

Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-27 Thread Minchan Kim
On Fri, Jan 28, 2011 at 3:48 PM, Balbir Singh  wrote:
> * MinChan Kim  [2011-01-28 14:44:50]:
>
>> On Fri, Jan 28, 2011 at 11:56 AM, Balbir Singh
>>  wrote:
>> > On Thu, Jan 27, 2011 at 4:42 AM, Minchan Kim  wrote:
>> > [snip]
>> >
>> >>> index 7b56473..2ac8549 100644
>> >>> --- a/mm/page_alloc.c
>> >>> +++ b/mm/page_alloc.c
>> >>> @@ -1660,6 +1660,9 @@ zonelist_scan:
>> >>>                        unsigned long mark;
>> >>>                        int ret;
>> >>>
>> >>> +                       if (should_reclaim_unmapped_pages(zone))
>> >>> +                               wakeup_kswapd(zone, order, 
>> >>> classzone_idx);
>> >>> +
>> >>
>> >> Do we really need the check in fastpath?
>> >> There are lost of caller of alloc_pages.
>> >> Many of them are not related to mapped pages.
>> >> Could we move the check into add_to_page_cache_locked?
>> >
>> > The check is a simple check to see if the unmapped pages need
>> > balancing, the reason I placed this check here is to allow other
>> > allocations to benefit as well, if there are some unmapped pages to be
>> > freed. add_to_page_cache_locked (check under a critical section) is
>> > even worse, IMHO.
>>
>> It just moves the overhead from general into specific case(ie,
>> allocates page for just page cache).
>> Another cases(ie, allocates pages for other purpose except page cache,
>> ex device drivers or fs allocation for internal using) aren't
>> affected.
>> So, It would be better.
>>
>> The goal in this patch is to remove only page cache page, isn't it?
>> So I think we could the balance check in add_to_page_cache and trigger 
>> reclaim.
>> If we do so, what's the problem?
>>
>
> I see it as a tradeoff of when to check? add_to_page_cache or when we
> are want more free memory (due to allocation). It is OK to wakeup
> kswapd while allocating memory, somehow for this purpose (global page
> cache), add_to_page_cache or add_to_page_cache_locked does not seem
> the right place to hook into. I'd be open to comments/suggestions
> though from others as well.
>
>> >
>> >
>> >>
>> >>>                        mark = zone->watermark[alloc_flags & 
>> >>> ALLOC_WMARK_MASK];
>> >>>                        if (zone_watermark_ok(zone, order, mark,
>> >>>                                    classzone_idx, alloc_flags))
>> >>> @@ -4167,8 +4170,12 @@ static void __paginginit 
>> >>> free_area_init_core(struct pglist_data *pgdat,
>> >>>
>> >>>                zone->spanned_pages = size;
>> >>>                zone->present_pages = realsize;
>> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
>> >>>                zone->min_unmapped_pages = 
>> >>> (realsize*sysctl_min_unmapped_ratio)
>> >>>                                                / 100;
>> >>> +               zone->max_unmapped_pages = 
>> >>> (realsize*sysctl_max_unmapped_ratio)
>> >>> +                                               / 100;
>> >>> +#endif
>> >>>  #ifdef CONFIG_NUMA
>> >>>                zone->node = nid;
>> >>>                zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) 
>> >>> / 100;
>> >>> @@ -5084,6 +5091,7 @@ int min_free_kbytes_sysctl_handler(ctl_table 
>> >>> *table, int write,
>> >>>        return 0;
>> >>>  }
>> >>>
>> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
>> >>>  int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int 
>> >>> write,
>> >>>        void __user *buffer, size_t *length, loff_t *ppos)
>> >>>  {
>> >>> @@ -5100,6 +5108,23 @@ int 
>> >>> sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
>> >>>        return 0;
>> >>>  }
>> >>>
>> >>> +int sysctl_max_unmapped_ratio_sysctl_handler(ctl_table *table, int 
>> >>> write,
>> >>> +       void __user *buffer, size_t *length, loff_t *ppos)
>> >>> +{
>> >>> +       struct zone *zone;
>> >>> +       int rc;
>> >>> +
>> >>> +       rc = proc_dointvec_minmax(table, write, buffer, length, ppos);
>> >>> +       if (rc)
>> >>> +               return rc;
>> >>> +
>> >>> +       for_each_zone(zone)
>> >>> +               zone->max_unmapped_pages = (zone->present_pages *
>> >>> +                               sysctl_max_unmapped_ratio) / 100;
>> >>> +       return 0;
>> >>> +}
>> >>> +#endif
>> >>> +
>> >>>  #ifdef CONFIG_NUMA
>> >>>  int sysctl_min_slab_ratio_sysctl_handler(ctl_table *table, int write,
>> >>>        void __user *buffer, size_t *length, loff_t *ppos)
>> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> >>> index 02cc82e..6377411 100644
>> >>> --- a/mm/vmscan.c
>> >>> +++ b/mm/vmscan.c
>> >>> @@ -159,6 +159,29 @@ static DECLARE_RWSEM(shrinker_rwsem);
>> >>>  #define scanning_global_lru(sc)        (1)
>> >>>  #endif
>> >>>
>> >>> +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
>> >>> +static unsigned long reclaim_unmapped_pages(int priority, struct zone 
>> >>> *zone,
>> >>> +                                               struct scan_control *sc);
>> >>> +static int unmapped_page_control __read_mostly;
>> >>> +
>> >>> +static int __init unmapped_page_

Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-27 Thread Balbir Singh
* MinChan Kim  [2011-01-28 14:44:50]:

> On Fri, Jan 28, 2011 at 11:56 AM, Balbir Singh
>  wrote:
> > On Thu, Jan 27, 2011 at 4:42 AM, Minchan Kim  wrote:
> > [snip]
> >
> >>> index 7b56473..2ac8549 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -1660,6 +1660,9 @@ zonelist_scan:
> >>>                        unsigned long mark;
> >>>                        int ret;
> >>>
> >>> +                       if (should_reclaim_unmapped_pages(zone))
> >>> +                               wakeup_kswapd(zone, order, classzone_idx);
> >>> +
> >>
> >> Do we really need the check in fastpath?
> >> There are lost of caller of alloc_pages.
> >> Many of them are not related to mapped pages.
> >> Could we move the check into add_to_page_cache_locked?
> >
> > The check is a simple check to see if the unmapped pages need
> > balancing, the reason I placed this check here is to allow other
> > allocations to benefit as well, if there are some unmapped pages to be
> > freed. add_to_page_cache_locked (check under a critical section) is
> > even worse, IMHO.
> 
> It just moves the overhead from general into specific case(ie,
> allocates page for just page cache).
> Another cases(ie, allocates pages for other purpose except page cache,
> ex device drivers or fs allocation for internal using) aren't
> affected.
> So, It would be better.
> 
> The goal in this patch is to remove only page cache page, isn't it?
> So I think we could the balance check in add_to_page_cache and trigger 
> reclaim.
> If we do so, what's the problem?
>

I see it as a tradeoff of when to check: in add_to_page_cache, or when
we want more free memory (at allocation time). It is OK to wake up
kswapd while allocating memory; somehow, for this purpose (global page
cache), add_to_page_cache or add_to_page_cache_locked does not seem
the right place to hook into. I'd be open to comments/suggestions from
others as well.
 
> >
> >
> >>
> >>>                        mark = zone->watermark[alloc_flags & 
> >>> ALLOC_WMARK_MASK];
> >>>                        if (zone_watermark_ok(zone, order, mark,
> >>>                                    classzone_idx, alloc_flags))
> >>> @@ -4167,8 +4170,12 @@ static void __paginginit 
> >>> free_area_init_core(struct pglist_data *pgdat,
> >>>
> >>>                zone->spanned_pages = size;
> >>>                zone->present_pages = realsize;
> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
> >>>                zone->min_unmapped_pages = 
> >>> (realsize*sysctl_min_unmapped_ratio)
> >>>                                                / 100;
> >>> +               zone->max_unmapped_pages = 
> >>> (realsize*sysctl_max_unmapped_ratio)
> >>> +                                               / 100;
> >>> +#endif
> >>>  #ifdef CONFIG_NUMA
> >>>                zone->node = nid;
> >>>                zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) 
> >>> / 100;
> >>> @@ -5084,6 +5091,7 @@ int min_free_kbytes_sysctl_handler(ctl_table 
> >>> *table, int write,
> >>>        return 0;
> >>>  }
> >>>
> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
> >>>  int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
> >>>        void __user *buffer, size_t *length, loff_t *ppos)
> >>>  {
> >>> @@ -5100,6 +5108,23 @@ int 
> >>> sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
> >>>        return 0;
> >>>  }
> >>>
> >>> +int sysctl_max_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
> >>> +       void __user *buffer, size_t *length, loff_t *ppos)
> >>> +{
> >>> +       struct zone *zone;
> >>> +       int rc;
> >>> +
> >>> +       rc = proc_dointvec_minmax(table, write, buffer, length, ppos);
> >>> +       if (rc)
> >>> +               return rc;
> >>> +
> >>> +       for_each_zone(zone)
> >>> +               zone->max_unmapped_pages = (zone->present_pages *
> >>> +                               sysctl_max_unmapped_ratio) / 100;
> >>> +       return 0;
> >>> +}
> >>> +#endif
> >>> +
> >>>  #ifdef CONFIG_NUMA
> >>>  int sysctl_min_slab_ratio_sysctl_handler(ctl_table *table, int write,
> >>>        void __user *buffer, size_t *length, loff_t *ppos)
> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>> index 02cc82e..6377411 100644
> >>> --- a/mm/vmscan.c
> >>> +++ b/mm/vmscan.c
> >>> @@ -159,6 +159,29 @@ static DECLARE_RWSEM(shrinker_rwsem);
> >>>  #define scanning_global_lru(sc)        (1)
> >>>  #endif
> >>>
> >>> +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
> >>> +static unsigned long reclaim_unmapped_pages(int priority, struct zone 
> >>> *zone,
> >>> +                                               struct scan_control *sc);
> >>> +static int unmapped_page_control __read_mostly;
> >>> +
> >>> +static int __init unmapped_page_control_parm(char *str)
> >>> +{
> >>> +       unmapped_page_control = 1;
> >>> +       /*
> >>> +        * XXX: Should we tweak swappiness here?
> >>> +        */
> >>> +       return 1;
> >>> +}
> >>> +

Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-27 Thread Balbir Singh
* Christoph Lameter  [2011-01-26 10:57:37]:

> 
> Reviewed-by: Christoph Lameter 
>

Thanks for the review! 

-- 
Three Cheers,
Balbir


Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-27 Thread Minchan Kim
On Fri, Jan 28, 2011 at 11:56 AM, Balbir Singh
 wrote:
> On Thu, Jan 27, 2011 at 4:42 AM, Minchan Kim  wrote:
> [snip]
>
>>> index 7b56473..2ac8549 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1660,6 +1660,9 @@ zonelist_scan:
>>>                        unsigned long mark;
>>>                        int ret;
>>>
>>> +                       if (should_reclaim_unmapped_pages(zone))
>>> +                               wakeup_kswapd(zone, order, classzone_idx);
>>> +
>>
>> Do we really need the check in fastpath?
>> There are lost of caller of alloc_pages.
>> Many of them are not related to mapped pages.
>> Could we move the check into add_to_page_cache_locked?
>
> The check is a simple check to see if the unmapped pages need
> balancing, the reason I placed this check here is to allow other
> allocations to benefit as well, if there are some unmapped pages to be
> freed. add_to_page_cache_locked (check under a critical section) is
> even worse, IMHO.

It just moves the overhead from the general case into a specific one
(i.e., allocating a page for the page cache).
Other cases (i.e., pages allocated for other purposes, e.g., device
drivers or filesystem-internal allocations) aren't affected.
So it would be better.

The goal of this patch is to reclaim only page cache pages, isn't it?
So I think we could do the balance check in add_to_page_cache and
trigger reclaim there.
If we do so, what's the problem?
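
Something like the sketch below is what I mean (illustration only; the
wrapper name is hypothetical, and the real check would probably sit
inside add_to_page_cache_lru itself):

/* Sketch: only page cache insertions pay for the unmapped-balance
 * check; other allocations (drivers, fs-internal, ...) are untouched. */
static int add_to_page_cache_lru_checked(struct page *page,
                struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask)
{
        struct zone *zone = page_zone(page);
        int ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask);

        if (!ret && should_reclaim_unmapped_pages(zone))
                wakeup_kswapd(zone, 0, zone_idx(zone));

        return ret;
}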

>
>
>>
>>>                        mark = zone->watermark[alloc_flags & 
>>> ALLOC_WMARK_MASK];
>>>                        if (zone_watermark_ok(zone, order, mark,
>>>                                    classzone_idx, alloc_flags))
>>> @@ -4167,8 +4170,12 @@ static void __paginginit free_area_init_core(struct 
>>> pglist_data *pgdat,
>>>
>>>                zone->spanned_pages = size;
>>>                zone->present_pages = realsize;
>>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
>>>                zone->min_unmapped_pages = 
>>> (realsize*sysctl_min_unmapped_ratio)
>>>                                                / 100;
>>> +               zone->max_unmapped_pages = 
>>> (realsize*sysctl_max_unmapped_ratio)
>>> +                                               / 100;
>>> +#endif
>>>  #ifdef CONFIG_NUMA
>>>                zone->node = nid;
>>>                zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 
>>> 100;
>>> @@ -5084,6 +5091,7 @@ int min_free_kbytes_sysctl_handler(ctl_table *table, 
>>> int write,
>>>        return 0;
>>>  }
>>>
>>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
>>>  int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
>>>        void __user *buffer, size_t *length, loff_t *ppos)
>>>  {
>>> @@ -5100,6 +5108,23 @@ int 
>>> sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
>>>        return 0;
>>>  }
>>>
>>> +int sysctl_max_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
>>> +       void __user *buffer, size_t *length, loff_t *ppos)
>>> +{
>>> +       struct zone *zone;
>>> +       int rc;
>>> +
>>> +       rc = proc_dointvec_minmax(table, write, buffer, length, ppos);
>>> +       if (rc)
>>> +               return rc;
>>> +
>>> +       for_each_zone(zone)
>>> +               zone->max_unmapped_pages = (zone->present_pages *
>>> +                               sysctl_max_unmapped_ratio) / 100;
>>> +       return 0;
>>> +}
>>> +#endif
>>> +
>>>  #ifdef CONFIG_NUMA
>>>  int sysctl_min_slab_ratio_sysctl_handler(ctl_table *table, int write,
>>>        void __user *buffer, size_t *length, loff_t *ppos)
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 02cc82e..6377411 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -159,6 +159,29 @@ static DECLARE_RWSEM(shrinker_rwsem);
>>>  #define scanning_global_lru(sc)        (1)
>>>  #endif
>>>
>>> +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
>>> +static unsigned long reclaim_unmapped_pages(int priority, struct zone 
>>> *zone,
>>> +                                               struct scan_control *sc);
>>> +static int unmapped_page_control __read_mostly;
>>> +
>>> +static int __init unmapped_page_control_parm(char *str)
>>> +{
>>> +       unmapped_page_control = 1;
>>> +       /*
>>> +        * XXX: Should we tweak swappiness here?
>>> +        */
>>> +       return 1;
>>> +}
>>> +__setup("unmapped_page_control", unmapped_page_control_parm);
>>> +
>>> +#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */
>>> +static inline unsigned long reclaim_unmapped_pages(int priority,
>>> +                               struct zone *zone, struct scan_control *sc)
>>> +{
>>> +       return 0;
>>> +}
>>> +#endif
>>> +
>>>  static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
>>>                                                  struct scan_control *sc)
>>>  {
>>> @@ -2359,6 +2382,12 @@ loop_again:
>>>                                shrink_active_list(SWAP_CLUSTER_MAX, zone,
>>>                                               

Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-27 Thread Balbir Singh
On Thu, Jan 27, 2011 at 4:42 AM, Minchan Kim  wrote:
[snip]

>> index 7b56473..2ac8549 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1660,6 +1660,9 @@ zonelist_scan:
>>                        unsigned long mark;
>>                        int ret;
>>
>> +                       if (should_reclaim_unmapped_pages(zone))
>> +                               wakeup_kswapd(zone, order, classzone_idx);
>> +
>
> Do we really need the check in the fast path?
> There are lots of callers of alloc_pages.
> Many of them are not related to mapped pages.
> Could we move the check into add_to_page_cache_locked?

The check is a simple one to see whether the unmapped pages need
balancing. The reason I placed it here is to let other allocations
benefit as well when there are unmapped pages to be freed.
add_to_page_cache_locked (a check under a critical section) is even
worse, IMHO.
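
For reference, the check can be read as roughly the sketch below; the
exact body is not visible in this excerpt, but it would compare the
zone's unmapped file pages against the max_unmapped_pages limit added
by this series (zone_unmapped_file_pages() is the helper quoted
elsewhere in the thread):

/* Sketch only, not the literal patch code. */
bool should_reclaim_unmapped_pages(struct zone *zone)
{
        if (unlikely(unmapped_page_control) &&
            zone_unmapped_file_pages(zone) > zone->max_unmapped_pages)
                return true;
        return false;
}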


>
>>                        mark = zone->watermark[alloc_flags & 
>> ALLOC_WMARK_MASK];
>>                        if (zone_watermark_ok(zone, order, mark,
>>                                    classzone_idx, alloc_flags))
>> @@ -4167,8 +4170,12 @@ static void __paginginit free_area_init_core(struct 
>> pglist_data *pgdat,
>>
>>                zone->spanned_pages = size;
>>                zone->present_pages = realsize;
>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
>>                zone->min_unmapped_pages = 
>> (realsize*sysctl_min_unmapped_ratio)
>>                                                / 100;
>> +               zone->max_unmapped_pages = 
>> (realsize*sysctl_max_unmapped_ratio)
>> +                                               / 100;
>> +#endif
>>  #ifdef CONFIG_NUMA
>>                zone->node = nid;
>>                zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 
>> 100;
>> @@ -5084,6 +5091,7 @@ int min_free_kbytes_sysctl_handler(ctl_table *table, 
>> int write,
>>        return 0;
>>  }
>>
>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
>>  int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
>>        void __user *buffer, size_t *length, loff_t *ppos)
>>  {
>> @@ -5100,6 +5108,23 @@ int 
>> sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
>>        return 0;
>>  }
>>
>> +int sysctl_max_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
>> +       void __user *buffer, size_t *length, loff_t *ppos)
>> +{
>> +       struct zone *zone;
>> +       int rc;
>> +
>> +       rc = proc_dointvec_minmax(table, write, buffer, length, ppos);
>> +       if (rc)
>> +               return rc;
>> +
>> +       for_each_zone(zone)
>> +               zone->max_unmapped_pages = (zone->present_pages *
>> +                               sysctl_max_unmapped_ratio) / 100;
>> +       return 0;
>> +}
>> +#endif
>> +
>>  #ifdef CONFIG_NUMA
>>  int sysctl_min_slab_ratio_sysctl_handler(ctl_table *table, int write,
>>        void __user *buffer, size_t *length, loff_t *ppos)
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 02cc82e..6377411 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -159,6 +159,29 @@ static DECLARE_RWSEM(shrinker_rwsem);
>>  #define scanning_global_lru(sc)        (1)
>>  #endif
>>
>> +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
>> +static unsigned long reclaim_unmapped_pages(int priority, struct zone *zone,
>> +                                               struct scan_control *sc);
>> +static int unmapped_page_control __read_mostly;
>> +
>> +static int __init unmapped_page_control_parm(char *str)
>> +{
>> +       unmapped_page_control = 1;
>> +       /*
>> +        * XXX: Should we tweak swappiness here?
>> +        */
>> +       return 1;
>> +}
>> +__setup("unmapped_page_control", unmapped_page_control_parm);
>> +
>> +#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */
>> +static inline unsigned long reclaim_unmapped_pages(int priority,
>> +                               struct zone *zone, struct scan_control *sc)
>> +{
>> +       return 0;
>> +}
>> +#endif
>> +
>>  static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
>>                                                  struct scan_control *sc)
>>  {
>> @@ -2359,6 +2382,12 @@ loop_again:
>>                                shrink_active_list(SWAP_CLUSTER_MAX, zone,
>>                                                        &sc, priority, 0);
>>
>> +                       /*
>> +                        * We do unmapped page reclaim once here and once
>> +                        * below, so that we don't lose out
>> +                        */
>> +                       reclaim_unmapped_pages(priority, zone, &sc);
>> +
>>                        if (!zone_watermark_ok_safe(zone, order,
>>                                        high_wmark_pages(zone), 0, 0)) {
>>                                end_zone = i;
>> @@ -2396,6 +2425,11 @@ loop_again:
>>                                continue;
>>
>>                        sc.nr_scanned = 0;
>> +    

Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-26 Thread Minchan Kim
Hi Balbir,

On Tue, Jan 25, 2011 at 2:10 PM, Balbir Singh  wrote:
> Changelog v4
> 1. Add max_unmapped_ratio and use that as the upper limit
> to check when to shrink the unmapped page cache (Christoph
> Lameter)
>
> Changelog v2
> 1. Use a config option to enable the code (Andrew Morton)
> 2. Explain the magic tunables in the code or at-least attempt
>   to explain them (General comment)
> 3. Hint uses of the boot parameter with unlikely (Andrew Morton)
> 4. Use better names (balanced is not a good naming convention)
>
> Provide control using zone_reclaim() and a boot parameter. The
> code reuses functionality from zone_reclaim() to isolate unmapped
> pages and reclaim them as a priority, ahead of other mapped pages.
> A new sysctl for max_unmapped_ratio is provided and set to 16,
> indicating 16% of the total zone pages are unmapped, we start
> shrinking unmapped page cache.
>
> Signed-off-by: Balbir Singh 
> ---
>  Documentation/kernel-parameters.txt |    8 +++
>  include/linux/mmzone.h              |    5 ++
>  include/linux/swap.h                |   23 -
>  init/Kconfig                        |   12 +
>  kernel/sysctl.c                     |   11 
>  mm/page_alloc.c                     |   25 ++
>  mm/vmscan.c                         |   87 
> +++
>  7 files changed, 166 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt 
> b/Documentation/kernel-parameters.txt
> index fee5f57..65a4ee6 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2500,6 +2500,14 @@ and is between 256 and 4096 characters. It is defined 
> in the file
>                        [X86]
>                        Set unknown_nmi_panic=1 early on boot.
>
> +       unmapped_page_control
> +                       [KNL] Available if CONFIG_UNMAPPED_PAGECACHE_CONTROL
> +                       is enabled. It controls the amount of unmapped memory
> +                       that is present in the system. This boot option plus
> +                       vm.min_unmapped_ratio (sysctl) provide granular 
> control
> +                       over how much unmapped page cache can exist in the 
> system
> +                       before kswapd starts reclaiming unmapped page cache 
> pages.
> +
>        usbcore.autosuspend=
>                        [USB] The autosuspend time delay (in seconds) used
>                        for newly-detected USB devices (default 2).  This
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 2485acc..18f0f09 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -306,7 +306,10 @@ struct zone {
>        /*
>         * zone reclaim becomes active if more unmapped pages exist.
>         */
> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
>        unsigned long           min_unmapped_pages;
> +       unsigned long           max_unmapped_pages;
> +#endif
>  #ifdef CONFIG_NUMA
>        int node;
>        unsigned long           min_slab_pages;
> @@ -773,6 +776,8 @@ int percpu_pagelist_fraction_sysctl_handler(struct 
> ctl_table *, int,
>                                        void __user *, size_t *, loff_t *);
>  int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *, int,
>                        void __user *, size_t *, loff_t *);
> +int sysctl_max_unmapped_ratio_sysctl_handler(struct ctl_table *, int,
> +                       void __user *, size_t *, loff_t *);
>  int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *, int,
>                        void __user *, size_t *, loff_t *);
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 7b75626..ae62a03 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -255,19 +255,34 @@ extern int vm_swappiness;
>  extern int remove_mapping(struct address_space *mapping, struct page *page);
>  extern long vm_total_pages;
>
> +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
>  extern int sysctl_min_unmapped_ratio;
> +extern int sysctl_max_unmapped_ratio;
> +
>  extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> -#ifdef CONFIG_NUMA
> -extern int zone_reclaim_mode;
> -extern int sysctl_min_slab_ratio;
>  #else
> -#define zone_reclaim_mode 0
>  static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int 
> order)
>  {
>        return 0;
>  }
>  #endif
>
> +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
> +extern bool should_reclaim_unmapped_pages(struct zone *zone);
> +#else
> +static inline bool should_reclaim_unmapped_pages(struct zone *zone)
> +{
> +       return false;
> +}
> +#endif
> +
> +#ifdef CONFIG_NUMA
> +extern int zone_reclaim_mode;
> +extern int sysctl_min_slab_ratio;
> +#else
> +#define zone_reclaim_mode 0
> +#endif
> +
>  extern int page_evictable(struct page *page, struct vm_area_struct *vma);
>  extern void scan_mapping_unevictable_pages(struct address_space *);
>
> diff --git a/init/Kconfi

Re: [PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-26 Thread Christoph Lameter

Reviewed-by: Christoph Lameter 




[PATCH 3/3] Provide control over unmapped pages (v4)

2011-01-24 Thread Balbir Singh
Changelog v4
1. Add max_unmapped_ratio and use that as the upper limit
to check when to shrink the unmapped page cache (Christoph
Lameter)

Changelog v2
1. Use a config option to enable the code (Andrew Morton)
2. Explain the magic tunables in the code or at least attempt
   to explain them (General comment)
3. Hint uses of the boot parameter with unlikely (Andrew Morton)
4. Use better names (balanced is not a good naming convention)

Provide control using zone_reclaim() and a boot parameter. The
code reuses functionality from zone_reclaim() to isolate unmapped
pages and reclaim them as a priority, ahead of mapped pages.
A new sysctl, max_unmapped_ratio, is provided and defaults to 16,
meaning that once unmapped page cache exceeds 16% of a zone's pages,
we start shrinking it.
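
As a rough illustration with a hypothetical zone size: with 4 KB pages,
a 262,144-page (1 GB) zone gets max_unmapped_pages = 262144 * 16 / 100,
i.e. about 41,900 pages (~164 MB) of unmapped page cache, beyond which
kswapd starts reclaiming it.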

Signed-off-by: Balbir Singh 
---
 Documentation/kernel-parameters.txt |    8 +++
 include/linux/mmzone.h              |    5 ++
 include/linux/swap.h                |   23 -
 init/Kconfig                        |   12 +
 kernel/sysctl.c                     |   11 
 mm/page_alloc.c                     |   25 ++
 mm/vmscan.c                         |   87 +++
 7 files changed, 166 insertions(+), 5 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index fee5f57..65a4ee6 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2500,6 +2500,14 @@ and is between 256 and 4096 characters. It is defined in 
the file
[X86]
Set unknown_nmi_panic=1 early on boot.
 
+   unmapped_page_control
+   [KNL] Available if CONFIG_UNMAPPED_PAGECACHE_CONTROL
+   is enabled. It controls the amount of unmapped memory
+   that is present in the system. This boot option plus
+   vm.min_unmapped_ratio (sysctl) provide granular control
+   over how much unmapped page cache can exist in the 
system
+   before kswapd starts reclaiming unmapped page cache 
pages.
+
usbcore.autosuspend=
[USB] The autosuspend time delay (in seconds) used
for newly-detected USB devices (default 2).  This
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2485acc..18f0f09 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -306,7 +306,10 @@ struct zone {
/*
 * zone reclaim becomes active if more unmapped pages exist.
 */
+#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA)
unsigned long   min_unmapped_pages;
+   unsigned long   max_unmapped_pages;
+#endif
 #ifdef CONFIG_NUMA
int node;
unsigned long   min_slab_pages;
@@ -773,6 +776,8 @@ int percpu_pagelist_fraction_sysctl_handler(struct 
ctl_table *, int,
void __user *, size_t *, loff_t *);
 int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
+int sysctl_max_unmapped_ratio_sysctl_handler(struct ctl_table *, int,
+   void __user *, size_t *, loff_t *);
 int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
 
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 7b75626..ae62a03 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -255,19 +255,34 @@ extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern long vm_total_pages;
 
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA)
 extern int sysctl_min_unmapped_ratio;
+extern int sysctl_max_unmapped_ratio;
+
 extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
-#ifdef CONFIG_NUMA
-extern int zone_reclaim_mode;
-extern int sysctl_min_slab_ratio;
 #else
-#define zone_reclaim_mode 0
 static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
 {
return 0;
 }
 #endif
 
+#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL)
+extern bool should_reclaim_unmapped_pages(struct zone *zone);
+#else
+static inline bool should_reclaim_unmapped_pages(struct zone *zone)
+{
+   return false;
+}
+#endif
+
+#ifdef CONFIG_NUMA
+extern int zone_reclaim_mode;
+extern int sysctl_min_slab_ratio;
+#else
+#define zone_reclaim_mode 0
+#endif
+
 extern int page_evictable(struct page *page, struct vm_area_struct *vma);
 extern void scan_mapping_unevictable_pages(struct address_space *);
 
diff --git a/init/Kconfig b/init/Kconfig
index 4f6cdbf..2dfbc09 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -828,6 +828,18 @@ config SCHED_AUTOGROUP
 config MM_OWNER
bool
 
+config UNMAPPED_PAGECACHE_CONTROL
+   bool "Provide control over unmapped page cache"
+   default n
+   help
+