Re: [PATCH 3/3] Provide control over unmapped pages (v4)
On Mon, Feb 14, 2011 at 2:33 AM, Balbir Singh wrote: > * MinChan Kim [2011-02-10 14:41:44]: > >> I don't know why the part of message is deleted only when I send you. >> Maybe it's gmail bug. >> >> I hope mail sending is successful in this turn. :) >> >> On Thu, Feb 10, 2011 at 2:33 PM, Minchan Kim wrote: >> > Sorry for late response. >> > >> > On Fri, Jan 28, 2011 at 8:18 PM, Balbir Singh >> > wrote: >> >> * MinChan Kim [2011-01-28 16:24:19]: >> >> >> >>> > >> >>> > But the assumption for LRU order to change happens only if the page >> >>> > cannot be successfully freed, which means it is in some way active.. >> >>> > and needs to be moved no? >> >>> >> >>> 1. holded page by someone >> >>> 2. mapped pages >> >>> 3. active pages >> >>> >> >>> 1 is rare so it isn't the problem. >> >>> Of course, in case of 3, we have to activate it so no problem. >> >>> The problem is 2. >> >>> >> >> >> >> 2 is a problem, but due to the size aspects not a big one. Like you >> >> said even lumpy reclaim affects it. May be the reclaim code could >> >> honour may_unmap much earlier. >> > >> > Even if it is, it's a trade-off to get a big contiguous memory. I >> > don't want to add new mess. (In addition, lumpy is weak by compaction >> > as time goes by) >> > What I have in mind for preventing LRU ignore is that put the page >> > into original position instead of head of lru. Maybe it can help the >> > situation both lumpy and your case. But it's another story. >> > >> > How about the idea? >> > >> > I borrow the idea from CFLRU[1] >> > - PCFLRU(Page-Cache First LRU) >> > >> > When we allocates new page for page cache, we adds the page into LRU's >> > tail. >> > When we map the page cache into page table, we rotate the page into LRU's >> > head. >> > >> > So, inactive list's result is following as. 
>> > M.P : mapped page
>> > N.P : non-mapped page
>> >
>> > HEAD-M.P-M.P-M.P-M.P-N.P-N.P-N.P-N.P-N.P-TAIL
>> >
>> > The admin can set a threshold window size which determines when to
>> > stop reclaiming non-mapped pages contiguously.
>> >
>> > I think it needs some tweaking of the page cache/page mapping
>> > functions, but we can use kswapd/direct reclaim without change.
>> >
>> > Also, it can change page reclaim policy totally, but that's just
>> > what you want, I think.
>
> I am not sure how this would work; moreover, the idea behind
> min_unmapped_pages is to keep sufficient unmapped pages around for the
> FS metadata, and it has been working with the existing code for zone
> reclaim. What you propose is a more drastic re-org of the LRU and I am
> not sure I have the appetite for it.

Yes. My suggestion could change the LRU order totally, but it can't meet
your goal, so it was a bad idea. Sorry for bothering you.

I can add my Reviewed-by to [1/3] and [2/3], but I still have doubts
about [3/3]. The LRU-ordering problem I mentioned is not only your
problem; it's a general problem these days (e.g., more aggressive
compaction/lumpy reclaim). So we may need a general solution if it is a
real problem.

Okay. I don't oppose your approach from now on, until I can prove how
bad an effect LRU reordering has. (I still raise an eyebrow at the
implementation in [3/3], but I won't oppose it until I can suggest a
better approach.)

Thanks.

--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
* MinChan Kim [2011-02-10 14:41:44]: > I don't know why the part of message is deleted only when I send you. > Maybe it's gmail bug. > > I hope mail sending is successful in this turn. :) > > On Thu, Feb 10, 2011 at 2:33 PM, Minchan Kim wrote: > > Sorry for late response. > > > > On Fri, Jan 28, 2011 at 8:18 PM, Balbir Singh > > wrote: > >> * MinChan Kim [2011-01-28 16:24:19]: > >> > >>> > > >>> > But the assumption for LRU order to change happens only if the page > >>> > cannot be successfully freed, which means it is in some way active.. > >>> > and needs to be moved no? > >>> > >>> 1. holded page by someone > >>> 2. mapped pages > >>> 3. active pages > >>> > >>> 1 is rare so it isn't the problem. > >>> Of course, in case of 3, we have to activate it so no problem. > >>> The problem is 2. > >>> > >> > >> 2 is a problem, but due to the size aspects not a big one. Like you > >> said even lumpy reclaim affects it. May be the reclaim code could > >> honour may_unmap much earlier. > > > > Even if it is, it's a trade-off to get a big contiguous memory. I > > don't want to add new mess. (In addition, lumpy is weak by compaction > > as time goes by) > > What I have in mind for preventing LRU ignore is that put the page > > into original position instead of head of lru. Maybe it can help the > > situation both lumpy and your case. But it's another story. > > > > How about the idea? > > > > I borrow the idea from CFLRU[1] > > - PCFLRU(Page-Cache First LRU) > > > > When we allocates new page for page cache, we adds the page into LRU's tail. > > When we map the page cache into page table, we rotate the page into LRU's > > head. > > > > So, inactive list's result is following as. > > > > M.P : mapped page > > N.P : none-mapped page > > > > HEAD-M.P-M.P-M.P-M.P-N.P-N.P-N.P-N.P-N.P-TAIL > > > > Admin can set threshold window size which determines stop reclaiming > > none-mapped page contiguously. 
> > > > I think it needs some tweak of page cache/page mapping functions but > > we can use kswapd/direct reclaim without change. > > > > Also, it can change page reclaim policy totally but it's just what you > > want, I think. > > I am not sure how this would work, moreover the idea behind min_unmapped_pages is to keep sufficient unmapped pages around for the FS metadata and has been working with the existing code for zone reclaim. What you propose is more drastic re-org of the LRU and I am not sure I have the apetite for it. -- Three Cheers, Balbir -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
I don't know why the part of message is deleted only when I send you. Maybe it's gmail bug. I hope mail sending is successful in this turn. :) On Thu, Feb 10, 2011 at 2:33 PM, Minchan Kim wrote: > Sorry for late response. > > On Fri, Jan 28, 2011 at 8:18 PM, Balbir Singh > wrote: >> * MinChan Kim [2011-01-28 16:24:19]: >> >>> > >>> > But the assumption for LRU order to change happens only if the page >>> > cannot be successfully freed, which means it is in some way active.. >>> > and needs to be moved no? >>> >>> 1. holded page by someone >>> 2. mapped pages >>> 3. active pages >>> >>> 1 is rare so it isn't the problem. >>> Of course, in case of 3, we have to activate it so no problem. >>> The problem is 2. >>> >> >> 2 is a problem, but due to the size aspects not a big one. Like you >> said even lumpy reclaim affects it. May be the reclaim code could >> honour may_unmap much earlier. > > Even if it is, it's a trade-off to get a big contiguous memory. I > don't want to add new mess. (In addition, lumpy is weak by compaction > as time goes by) > What I have in mind for preventing LRU ignore is that put the page > into original position instead of head of lru. Maybe it can help the > situation both lumpy and your case. But it's another story. > > How about the idea? > > I borrow the idea from CFLRU[1] > - PCFLRU(Page-Cache First LRU) > > When we allocates new page for page cache, we adds the page into LRU's tail. > When we map the page cache into page table, we rotate the page into LRU's > head. > > So, inactive list's result is following as. > > M.P : mapped page > N.P : none-mapped page > > HEAD-M.P-M.P-M.P-M.P-N.P-N.P-N.P-N.P-N.P-TAIL > > Admin can set threshold window size which determines stop reclaiming > none-mapped page contiguously. > > I think it needs some tweak of page cache/page mapping functions but > we can use kswapd/direct reclaim without change. > > Also, it can change page reclaim policy totally but it's just what you > want, I think. 
> > [1]
> > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.6188&rep=rep1&type=pdf
>
> >> --
> >> Three Cheers,
> >> Balbir
>
> --
> Kind regards,
> Minchan Kim

--
Kind regards,
Minchan Kim
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
Sorry for the late response.

On Fri, Jan 28, 2011 at 8:18 PM, Balbir Singh wrote:
> * MinChan Kim [2011-01-28 16:24:19]:
>
>> >
>> > But the assumption for LRU order to change happens only if the page
>> > cannot be successfully freed, which means it is in some way active..
>> > and needs to be moved no?
>>
>> 1. pages held by someone
>> 2. mapped pages
>> 3. active pages
>>
>> 1 is rare, so it isn't the problem.
>> Of course, in case of 3, we have to activate it, so no problem.
>> The problem is 2.
>>
>
> 2 is a problem, but due to the size aspects not a big one. Like you
> said, even lumpy reclaim affects it. Maybe the reclaim code could
> honour may_unmap much earlier.

Even if it did, it's a trade-off to get big contiguous memory. I don't
want to add a new mess. (In addition, lumpy reclaim is being weakened by
compaction as time goes by.)

What I have in mind for preventing the LRU order from being disrupted is
to put the page back into its original position instead of at the head
of the LRU. Maybe that can help both the lumpy case and your case. But
that's another story.

How about this idea, borrowed from CFLRU[1]?
- PCFLRU (Page-Cache First LRU)

When we allocate a new page for the page cache, we add the page at the
LRU's tail. When we map the page cache into a page table, we rotate the
page to the LRU's head.

So the inactive list ends up as follows:

M.P : mapped page
N.P : non-mapped page

HEAD-M.P-M.P-M.P-M.P-N.P-N.P-N.P-N.P-N.P-TAIL

The admin can set a threshold window size which determines when to stop
reclaiming non-mapped pages contiguously.

I think it needs some tweaking of the page cache/page mapping functions,
but we can use kswapd/direct reclaim without change.

Also, it can change page reclaim policy totally, but that's just what
you want, I think.
[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.6188&rep=rep1&type=pdf

>
> --
> Three Cheers,
> Balbir
>

--
Kind regards,
Minchan Kim
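Minchan's PCFLRU proposal above can be modeled in a few lines. This is a
user-space Python sketch, purely illustrative (the class name, `window`
parameter, and method names are invented for the example; a kernel
implementation would look nothing like this): page-cache pages enter at
the LRU tail, pages rotate to the head when mapped, and reclaim scans
from the tail, stopping once the window of non-mapped pages is used up
or a mapped page is hit.

```python
from collections import deque


class PCFLRU:
    """Toy model of the proposed Page-Cache First LRU ordering."""

    def __init__(self, window):
        self.lru = deque()        # index 0 = HEAD, index -1 = TAIL
        self.mapped = set()
        self.window = window      # admin-tunable reclaim threshold

    def add_to_page_cache(self, page):
        # New page-cache pages go to the LRU tail.
        self.lru.append(page)

    def map_page(self, page):
        # Mapping into a page table rotates the page to the head.
        self.lru.remove(page)
        self.lru.appendleft(page)
        self.mapped.add(page)

    def reclaim(self):
        # Free up to `window` consecutive non-mapped pages from the tail;
        # stop as soon as a mapped page is reached.
        freed = []
        while self.lru and len(freed) < self.window:
            if self.lru[-1] in self.mapped:
                break
            freed.append(self.lru.pop())
        return freed
```

With pages 1-5 added and pages 1 and 2 then mapped, the list becomes
HEAD-2-1-3-4-5-TAIL: mapped pages cluster at the head and non-mapped
pages at the tail, matching the M.P/N.P diagram above.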
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
* KAMEZAWA Hiroyuki [2011-01-31 08:58:53]:

> On Fri, 28 Jan 2011 09:20:02 -0600 (CST)
> Christoph Lameter wrote:
>
> > On Fri, 28 Jan 2011, KAMEZAWA Hiroyuki wrote:
> >
> > > > > I see it as a tradeoff of when to check? add_to_page_cache or
> > > > > when we want more free memory (due to allocation). It is OK to
> > > > > wake up kswapd while allocating memory; somehow, for this
> > > > > purpose (global page cache), add_to_page_cache or
> > > > > add_to_page_cache_locked does not seem the right place to hook
> > > > > into. I'd be open to comments/suggestions though from others as
> > > > > well.
> > >
> > > I don't like adding a hook here.
> > > AND I don't want to run kswapd, because 'kswapd' has been a sign
> > > that there is memory shortage. (Reusing the code is OK.)
> > >
> > > How about adding a new daemon? Recently, khugepaged and ksmd work
> > > for managing memory. Adding one more daemon for a special purpose
> > > is not very bad, I think. Then you can
> > > - wake up without a hook
> > > - throttle its work
> > > - balance the whole system rather than a zone.
> > >   I think per-node balance is enough...
> >
> > I think we already have enough kernel daemons floating around. They
> > are multiplying in an amazing way. What would be useful is to fold
> > all the memory-management background work into a single process.
> > Maybe call this memd instead? Perhaps we can fold khugepaged into
> > kswapd as well, etc.
>
> Is making kswapd slow for this "additional", "requested by user, not
> by system" work a good thing? I think a workqueue works well enough;
> it scales based on workload, if using a thread is bad.

"Making it slow" is a generic statement; kswapd is supposed to do
background reclaim, and in this case it is a special request for
unmapped pages, specifically and deliberately requested by the admin via
a boot option.
--
Three Cheers,
Balbir
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
On Fri, 28 Jan 2011 09:20:02 -0600 (CST)
Christoph Lameter wrote:

> On Fri, 28 Jan 2011, KAMEZAWA Hiroyuki wrote:
>
> > > > I see it as a tradeoff of when to check? add_to_page_cache or
> > > > when we want more free memory (due to allocation). It is OK to
> > > > wake up kswapd while allocating memory; somehow, for this purpose
> > > > (global page cache), add_to_page_cache or
> > > > add_to_page_cache_locked does not seem the right place to hook
> > > > into. I'd be open to comments/suggestions though from others as
> > > > well.
> >
> > I don't like adding a hook here.
> > AND I don't want to run kswapd, because 'kswapd' has been a sign
> > that there is memory shortage. (Reusing the code is OK.)
> >
> > How about adding a new daemon? Recently, khugepaged and ksmd work
> > for managing memory. Adding one more daemon for a special purpose is
> > not very bad, I think. Then you can
> > - wake up without a hook
> > - throttle its work
> > - balance the whole system rather than a zone.
> >   I think per-node balance is enough...
>
> I think we already have enough kernel daemons floating around. They
> are multiplying in an amazing way. What would be useful is to fold all
> the memory-management background work into a single process. Maybe
> call this memd instead? Perhaps we can fold khugepaged into kswapd as
> well, etc.

Is making kswapd slow for this "additional", "requested by user, not by
system" work a good thing? I think a workqueue works well enough; it
scales based on workload, if using a thread is bad.

Thanks,
-Kame
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
On Fri, 28 Jan 2011, KAMEZAWA Hiroyuki wrote:

> > > I see it as a tradeoff of when to check? add_to_page_cache or when
> > > we want more free memory (due to allocation). It is OK to wake up
> > > kswapd while allocating memory; somehow, for this purpose (global
> > > page cache), add_to_page_cache or add_to_page_cache_locked does
> > > not seem the right place to hook into. I'd be open to
> > > comments/suggestions though from others as well.
>
> I don't like adding a hook here.
> AND I don't want to run kswapd, because 'kswapd' has been a sign that
> there is memory shortage. (Reusing the code is OK.)
>
> How about adding a new daemon? Recently, khugepaged and ksmd work for
> managing memory. Adding one more daemon for a special purpose is not
> very bad, I think. Then you can
> - wake up without a hook
> - throttle its work
> - balance the whole system rather than a zone.
>   I think per-node balance is enough...

I think we already have enough kernel daemons floating around. They are
multiplying in an amazing way. What would be useful is to fold all the
memory-management background work into a single process. Maybe call this
memd instead? Perhaps we can fold khugepaged into kswapd as well, etc.
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
* KAMEZAWA Hiroyuki [2011-01-28 17:17:44]:

> On Fri, 28 Jan 2011 13:49:28 +0530
> Balbir Singh wrote:
>
> > * KAMEZAWA Hiroyuki [2011-01-28 16:56:05]:
> >
> > > BTW, it seems this doesn't work when some apps use huge shmem.
> > > How to handle the issue ?
> > >
> >
> > Could you elaborate further?
> >
> ==
> static inline unsigned long zone_unmapped_file_pages(struct zone *zone)
> {
>         unsigned long file_mapped = zone_page_state(zone, NR_FILE_MAPPED);
>         unsigned long file_lru = zone_page_state(zone, NR_INACTIVE_FILE) +
>                 zone_page_state(zone, NR_ACTIVE_FILE);
>
>         /*
>          * It's possible for there to be more file mapped pages than
>          * accounted for by the pages on the file LRU lists because
>          * tmpfs pages accounted for as ANON can also be FILE_MAPPED
>          */
>         return (file_lru > file_mapped) ? (file_lru - file_mapped) : 0;
> }

Yes, I did :) The word "huge" confused me. I am not sure if there is an
easy accounting fix for this one, though given the approximate nature of
the control, I am not sure it would matter very much. But you do have a
very good point.

--
Three Cheers,
Balbir
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
* MinChan Kim [2011-01-28 16:24:19]:

> >
> > But the assumption for LRU order to change happens only if the page
> > cannot be successfully freed, which means it is in some way active..
> > and needs to be moved no?
>
> 1. pages held by someone
> 2. mapped pages
> 3. active pages
>
> 1 is rare, so it isn't the problem.
> Of course, in case of 3, we have to activate it, so no problem.
> The problem is 2.
>

2 is a problem, but due to the size aspects not a big one. Like you
said, even lumpy reclaim affects it. Maybe the reclaim code could honour
may_unmap much earlier.

--
Three Cheers,
Balbir
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
On Fri, 28 Jan 2011 13:49:28 +0530
Balbir Singh wrote:

> * KAMEZAWA Hiroyuki [2011-01-28 16:56:05]:
>
> > BTW, it seems this doesn't work when some apps use huge shmem.
> > How to handle the issue ?
> >
>
> Could you elaborate further?
>
==
static inline unsigned long zone_unmapped_file_pages(struct zone *zone)
{
        unsigned long file_mapped = zone_page_state(zone, NR_FILE_MAPPED);
        unsigned long file_lru = zone_page_state(zone, NR_INACTIVE_FILE) +
                zone_page_state(zone, NR_ACTIVE_FILE);

        /*
         * It's possible for there to be more file mapped pages than
         * accounted for by the pages on the file LRU lists because
         * tmpfs pages accounted for as ANON can also be FILE_MAPPED
         */
        return (file_lru > file_mapped) ? (file_lru - file_mapped) : 0;
}
==

Did you read it?

Thanks,
-Kame
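The shmem skew KAMEZAWA is pointing at in the function above is easy to
see numerically. Below is a user-space Python rendition of the same
arithmetic (the counter values are made-up numbers for illustration):
because mapped tmpfs/shmem pages sit on the ANON LRU yet still increment
NR_FILE_MAPPED, a large shmem workload inflates `file_mapped` past
`file_lru`, and the estimate clamps to zero even when plenty of unmapped
file-LRU pages exist.

```python
def zone_unmapped_file_pages(nr_inactive_file, nr_active_file,
                             nr_file_mapped):
    """Mirror of the kernel helper quoted above, on plain integers.

    Returns max(file_lru - file_mapped, 0): an *estimate* of unmapped
    file pages, which under-counts when shmem pages (accounted as ANON
    on the LRU) are mapped and so bump NR_FILE_MAPPED.
    """
    file_lru = nr_inactive_file + nr_active_file
    return file_lru - nr_file_mapped if file_lru > nr_file_mapped else 0


# Normal case: 800 file-LRU pages, 300 of them mapped -> 500 unmapped.
print(zone_unmapped_file_pages(500, 300, 300))

# Huge-shmem case: the same 800 file-LRU pages, but NR_FILE_MAPPED has
# been pushed to 1000 by mapped shmem -> estimate collapses to 0.
print(zone_unmapped_file_pages(500, 300, 1000))
```

This is why the control stays approximate: the zone counters alone
cannot separate mapped shmem from mapped regular file pages.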
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
* KAMEZAWA Hiroyuki [2011-01-28 16:56:05]: > On Fri, 28 Jan 2011 16:24:19 +0900 > Minchan Kim wrote: > > > On Fri, Jan 28, 2011 at 3:48 PM, Balbir Singh > > wrote: > > > * MinChan Kim [2011-01-28 14:44:50]: > > > > > >> On Fri, Jan 28, 2011 at 11:56 AM, Balbir Singh > > >> wrote: > > >> > On Thu, Jan 27, 2011 at 4:42 AM, Minchan Kim > > >> > wrote: > > >> > [snip] > > >> > > > >> >>> index 7b56473..2ac8549 100644 > > >> >>> --- a/mm/page_alloc.c > > >> >>> +++ b/mm/page_alloc.c > > >> >>> @@ -1660,6 +1660,9 @@ zonelist_scan: > > >> >>> unsigned long mark; > > >> >>> int ret; > > >> >>> > > >> >>> + if (should_reclaim_unmapped_pages(zone)) > > >> >>> + wakeup_kswapd(zone, order, > > >> >>> classzone_idx); > > >> >>> + > > >> >> > > >> >> Do we really need the check in fastpath? > > >> >> There are lost of caller of alloc_pages. > > >> >> Many of them are not related to mapped pages. > > >> >> Could we move the check into add_to_page_cache_locked? > > >> > > > >> > The check is a simple check to see if the unmapped pages need > > >> > balancing, the reason I placed this check here is to allow other > > >> > allocations to benefit as well, if there are some unmapped pages to be > > >> > freed. add_to_page_cache_locked (check under a critical section) is > > >> > even worse, IMHO. > > >> > > >> It just moves the overhead from general into specific case(ie, > > >> allocates page for just page cache). > > >> Another cases(ie, allocates pages for other purpose except page cache, > > >> ex device drivers or fs allocation for internal using) aren't > > >> affected. > > >> So, It would be better. > > >> > > >> The goal in this patch is to remove only page cache page, isn't it? > > >> So I think we could the balance check in add_to_page_cache and trigger > > >> reclaim. > > >> If we do so, what's the problem? > > >> > > > > > > I see it as a tradeoff of when to check? add_to_page_cache or when we > > > are want more free memory (due to allocation). 
It is OK to wakeup > > > kswapd while allocating memory, somehow for this purpose (global page > > > cache), add_to_page_cache or add_to_page_cache_locked does not seem > > > the right place to hook into. I'd be open to comments/suggestions > > > though from others as well. > > I don't like add hook here. > AND I don't want to run kswapd because 'kswapd' has been a sign as > there are memory shortage. (reusing code is ok.) > > How about adding new daemon ? Recently, khugepaged, ksmd works for > managing memory. Adding one more daemon for special purpose is not > very bad, I think. Then, you can do > - wake up without hook > - throttle its work. > - balance the whole system rather than zone. >I think per-node balance is enough... > > > > > Honestly, I did look at that option, but balancing via kswapd seemed like the best option. Creating a new thread/daemon did not make sense because 1. The control is very lose 2. kswapd can deal with it while balancing other things, in fact imagine kswapd waking up to free memory, but there being other free memory easily available. Parallel reclaim, zone lock contention addition does not help, IMHO. 3. kswapd does not indicate memory shortage per-se, please see min_free_kbytes_sysctl_handler, kswapd is to balance the nodes/zone. 
If you tune min_free_kbytes and kswapd runs, it does not mean memory shortage on the system > > > > > > > > >> > > > >> > > > >> >> > > >> >>> mark = zone->watermark[alloc_flags & > > >> >>> ALLOC_WMARK_MASK]; > > >> >>> if (zone_watermark_ok(zone, order, mark, > > >> >>> classzone_idx, alloc_flags)) > > >> >>> @@ -4167,8 +4170,12 @@ static void __paginginit > > >> >>> free_area_init_core(struct pglist_data *pgdat, > > >> >>> > > >> >>> zone->spanned_pages = size; > > >> >>> zone->present_pages = realsize; > > >> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA) > > >> >>> zone->min_unmapped_pages = > > >> >>> (realsize*sysctl_min_unmapped_ratio) > > >> >>> / 100; > > >> >>> + zone->max_unmapped_pages = > > >> >>> (realsize*sysctl_max_unmapped_ratio) > > >> >>> + / 100; > > >> >>> +#endif > > >> >>> #ifdef CONFIG_NUMA > > >> >>> zone->node = nid; > > >> >>> zone->min_slab_pages = (realsize * > > >> >>> sysctl_min_slab_ratio) / 100; > > >> >>> @@ -5084,6 +5091,7 @@ int min_free_kbytes_sysctl_handler(ctl_table > > >> >>> *table, int write, > > >> >>> return 0; > > >> >>> } > > >> >>> > > >> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA) > > >> >>> int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int > > >> >>> write, > > >> >>> void __user *buffer, si
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
On Fri, 28 Jan 2011 16:24:19 +0900 Minchan Kim wrote: > On Fri, Jan 28, 2011 at 3:48 PM, Balbir Singh > wrote: > > * MinChan Kim [2011-01-28 14:44:50]: > > > >> On Fri, Jan 28, 2011 at 11:56 AM, Balbir Singh > >> wrote: > >> > On Thu, Jan 27, 2011 at 4:42 AM, Minchan Kim > >> > wrote: > >> > [snip] > >> > > >> >>> index 7b56473..2ac8549 100644 > >> >>> --- a/mm/page_alloc.c > >> >>> +++ b/mm/page_alloc.c > >> >>> @@ -1660,6 +1660,9 @@ zonelist_scan: > >> >>> unsigned long mark; > >> >>> int ret; > >> >>> > >> >>> + if (should_reclaim_unmapped_pages(zone)) > >> >>> + wakeup_kswapd(zone, order, > >> >>> classzone_idx); > >> >>> + > >> >> > >> >> Do we really need the check in fastpath? > >> >> There are lost of caller of alloc_pages. > >> >> Many of them are not related to mapped pages. > >> >> Could we move the check into add_to_page_cache_locked? > >> > > >> > The check is a simple check to see if the unmapped pages need > >> > balancing, the reason I placed this check here is to allow other > >> > allocations to benefit as well, if there are some unmapped pages to be > >> > freed. add_to_page_cache_locked (check under a critical section) is > >> > even worse, IMHO. > >> > >> It just moves the overhead from general into specific case(ie, > >> allocates page for just page cache). > >> Another cases(ie, allocates pages for other purpose except page cache, > >> ex device drivers or fs allocation for internal using) aren't > >> affected. > >> So, It would be better. > >> > >> The goal in this patch is to remove only page cache page, isn't it? > >> So I think we could the balance check in add_to_page_cache and trigger > >> reclaim. > >> If we do so, what's the problem? > >> > > > > I see it as a tradeoff of when to check? add_to_page_cache or when we > > are want more free memory (due to allocation). 
It is OK to wakeup > > kswapd while allocating memory, somehow for this purpose (global page > > cache), add_to_page_cache or add_to_page_cache_locked does not seem > > the right place to hook into. I'd be open to comments/suggestions > > though from others as well. I don't like add hook here. AND I don't want to run kswapd because 'kswapd' has been a sign as there are memory shortage. (reusing code is ok.) How about adding new daemon ? Recently, khugepaged, ksmd works for managing memory. Adding one more daemon for special purpose is not very bad, I think. Then, you can do - wake up without hook - throttle its work. - balance the whole system rather than zone. I think per-node balance is enough... > > > >> > > >> > > >> >> > >> >>> mark = zone->watermark[alloc_flags & > >> >>> ALLOC_WMARK_MASK]; > >> >>> if (zone_watermark_ok(zone, order, mark, > >> >>> classzone_idx, alloc_flags)) > >> >>> @@ -4167,8 +4170,12 @@ static void __paginginit > >> >>> free_area_init_core(struct pglist_data *pgdat, > >> >>> > >> >>> zone->spanned_pages = size; > >> >>> zone->present_pages = realsize; > >> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA) > >> >>> zone->min_unmapped_pages = > >> >>> (realsize*sysctl_min_unmapped_ratio) > >> >>> / 100; > >> >>> + zone->max_unmapped_pages = > >> >>> (realsize*sysctl_max_unmapped_ratio) > >> >>> + / 100; > >> >>> +#endif > >> >>> #ifdef CONFIG_NUMA > >> >>> zone->node = nid; > >> >>> zone->min_slab_pages = (realsize * > >> >>> sysctl_min_slab_ratio) / 100; > >> >>> @@ -5084,6 +5091,7 @@ int min_free_kbytes_sysctl_handler(ctl_table > >> >>> *table, int write, > >> >>> return 0; > >> >>> } > >> >>> > >> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA) > >> >>> int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int > >> >>> write, > >> >>> void __user *buffer, size_t *length, loff_t *ppos) > >> >>> { > >> >>> @@ -5100,6 +5108,23 @@ int > >> >>> 
sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write, > >> >>> return 0; > >> >>> } > >> >>> > >> >>> +int sysctl_max_unmapped_ratio_sysctl_handler(ctl_table *table, int > >> >>> write, > >> >>> + void __user *buffer, size_t *length, loff_t *ppos) > >> >>> +{ > >> >>> + struct zone *zone; > >> >>> + int rc; > >> >>> + > >> >>> + rc = proc_dointvec_minmax(table, write, buffer, length, ppos); > >> >>> + if (rc) > >> >>> + return rc; > >> >>> + > >> >>> + for_each_zone(zone) > >> >>> + zone->max_unmapped_pages = (zone->present_pages * > >> >>> + sysctl_max_unmapped_ratio) / 100; > >> >>> + return 0; > >> >>> +} > >> >>> +#endif > >> >>> + > >> >>> #ifdef CONFIG_NUMA > >> >>
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
On Fri, Jan 28, 2011 at 3:48 PM, Balbir Singh wrote: > * MinChan Kim [2011-01-28 14:44:50]: > >> On Fri, Jan 28, 2011 at 11:56 AM, Balbir Singh >> wrote: >> > On Thu, Jan 27, 2011 at 4:42 AM, Minchan Kim wrote: >> > [snip] >> > >> >>> index 7b56473..2ac8549 100644 >> >>> --- a/mm/page_alloc.c >> >>> +++ b/mm/page_alloc.c >> >>> @@ -1660,6 +1660,9 @@ zonelist_scan: >> >>> unsigned long mark; >> >>> int ret; >> >>> >> >>> + if (should_reclaim_unmapped_pages(zone)) >> >>> + wakeup_kswapd(zone, order, >> >>> classzone_idx); >> >>> + >> >> >> >> Do we really need the check in fastpath? >> >> There are lost of caller of alloc_pages. >> >> Many of them are not related to mapped pages. >> >> Could we move the check into add_to_page_cache_locked? >> > >> > The check is a simple check to see if the unmapped pages need >> > balancing, the reason I placed this check here is to allow other >> > allocations to benefit as well, if there are some unmapped pages to be >> > freed. add_to_page_cache_locked (check under a critical section) is >> > even worse, IMHO. >> >> It just moves the overhead from general into specific case(ie, >> allocates page for just page cache). >> Another cases(ie, allocates pages for other purpose except page cache, >> ex device drivers or fs allocation for internal using) aren't >> affected. >> So, It would be better. >> >> The goal in this patch is to remove only page cache page, isn't it? >> So I think we could the balance check in add_to_page_cache and trigger >> reclaim. >> If we do so, what's the problem? >> > > I see it as a tradeoff of when to check? add_to_page_cache or when we > are want more free memory (due to allocation). It is OK to wakeup > kswapd while allocating memory, somehow for this purpose (global page > cache), add_to_page_cache or add_to_page_cache_locked does not seem > the right place to hook into. I'd be open to comments/suggestions > though from others as well. 
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
* MinChan Kim [2011-01-28 14:44:50]: > On Fri, Jan 28, 2011 at 11:56 AM, Balbir Singh > wrote: > > On Thu, Jan 27, 2011 at 4:42 AM, Minchan Kim wrote: > > [snip] > > > >>> index 7b56473..2ac8549 100644 > >>> --- a/mm/page_alloc.c > >>> +++ b/mm/page_alloc.c > >>> @@ -1660,6 +1660,9 @@ zonelist_scan: > >>> unsigned long mark; > >>> int ret; > >>> > >>> + if (should_reclaim_unmapped_pages(zone)) > >>> + wakeup_kswapd(zone, order, classzone_idx); > >>> + > >> > >> Do we really need the check in the fastpath? > >> There are lots of callers of alloc_pages. > >> Many of them are not related to mapped pages. > >> Could we move the check into add_to_page_cache_locked? > > > > The check is a simple check to see if the unmapped pages need > > balancing, the reason I placed this check here is to allow other > > allocations to benefit as well, if there are some unmapped pages to be > > freed. add_to_page_cache_locked (check under a critical section) is > > even worse, IMHO. > > It just moves the overhead from the general case into a specific one (i.e., > allocating a page just for the page cache). > Other cases (i.e., pages allocated for purposes other than the page cache, > e.g. by device drivers or for internal fs use) aren't > affected. > So it would be better. > > The goal of this patch is to reclaim only page cache pages, isn't it? > So I think we could do the balance check in add_to_page_cache and trigger > reclaim. > If we do so, what's the problem? > I see it as a tradeoff of when to check: add_to_page_cache, or when we want more free memory (due to allocation). It is OK to wake up kswapd while allocating memory, but for this purpose (global page cache), add_to_page_cache or add_to_page_cache_locked somehow does not seem the right place to hook into. I'd be open to comments/suggestions from others as well, though.
> > > > > >> > >>> mark = zone->watermark[alloc_flags & > >>> ALLOC_WMARK_MASK]; > >>> if (zone_watermark_ok(zone, order, mark, > >>> classzone_idx, alloc_flags)) > >>> @@ -4167,8 +4170,12 @@ static void __paginginit > >>> free_area_init_core(struct pglist_data *pgdat, > >>> > >>> zone->spanned_pages = size; > >>> zone->present_pages = realsize; > >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA) > >>> zone->min_unmapped_pages = > >>> (realsize*sysctl_min_unmapped_ratio) > >>> / 100; > >>> + zone->max_unmapped_pages = > >>> (realsize*sysctl_max_unmapped_ratio) > >>> + / 100; > >>> +#endif > >>> #ifdef CONFIG_NUMA > >>> zone->node = nid; > >>> zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) > >>> / 100; > >>> @@ -5084,6 +5091,7 @@ int min_free_kbytes_sysctl_handler(ctl_table > >>> *table, int write, > >>> return 0; > >>> } > >>> > >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA) > >>> int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write, > >>> void __user *buffer, size_t *length, loff_t *ppos) > >>> { > >>> @@ -5100,6 +5108,23 @@ int > >>> sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write, > >>> return 0; > >>> } > >>> > >>> +int sysctl_max_unmapped_ratio_sysctl_handler(ctl_table *table, int write, > >>> + void __user *buffer, size_t *length, loff_t *ppos) > >>> +{ > >>> + struct zone *zone; > >>> + int rc; > >>> + > >>> + rc = proc_dointvec_minmax(table, write, buffer, length, ppos); > >>> + if (rc) > >>> + return rc; > >>> + > >>> + for_each_zone(zone) > >>> + zone->max_unmapped_pages = (zone->present_pages * > >>> + sysctl_max_unmapped_ratio) / 100; > >>> + return 0; > >>> +} > >>> +#endif > >>> + > >>> #ifdef CONFIG_NUMA > >>> int sysctl_min_slab_ratio_sysctl_handler(ctl_table *table, int write, > >>> void __user *buffer, size_t *length, loff_t *ppos) > >>> diff --git a/mm/vmscan.c b/mm/vmscan.c > >>> index 02cc82e..6377411 100644 > >>> --- a/mm/vmscan.c > >>> +++ 
b/mm/vmscan.c > >>> @@ -159,6 +159,29 @@ static DECLARE_RWSEM(shrinker_rwsem); > >>> #define scanning_global_lru(sc) (1) > >>> #endif > >>> > >>> +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) > >>> +static unsigned long reclaim_unmapped_pages(int priority, struct zone > >>> *zone, > >>> + struct scan_control *sc); > >>> +static int unmapped_page_control __read_mostly; > >>> + > >>> +static int __init unmapped_page_control_parm(char *str) > >>> +{ > >>> + unmapped_page_control = 1; > >>> + /* > >>> + * XXX: Should we tweak swappiness here? > >>> + */ > >>> + return 1; > >>> +} > >>> +
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
* Christoph Lameter [2011-01-26 10:57:37]: > > Reviewed-by: Christoph Lameter > Thanks for the review! -- Three Cheers, Balbir
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
On Fri, Jan 28, 2011 at 11:56 AM, Balbir Singh wrote: > On Thu, Jan 27, 2011 at 4:42 AM, Minchan Kim wrote: > [snip] > >>> index 7b56473..2ac8549 100644 >>> --- a/mm/page_alloc.c >>> +++ b/mm/page_alloc.c >>> @@ -1660,6 +1660,9 @@ zonelist_scan: >>> unsigned long mark; >>> int ret; >>> >>> + if (should_reclaim_unmapped_pages(zone)) >>> + wakeup_kswapd(zone, order, classzone_idx); >>> + >> >> Do we really need the check in the fastpath? >> There are lots of callers of alloc_pages. >> Many of them are not related to mapped pages. >> Could we move the check into add_to_page_cache_locked? > > The check is a simple check to see if the unmapped pages need > balancing, the reason I placed this check here is to allow other > allocations to benefit as well, if there are some unmapped pages to be > freed. add_to_page_cache_locked (check under a critical section) is > even worse, IMHO. It just moves the overhead from the general case into a specific one (i.e., allocating a page just for the page cache). Other cases (i.e., pages allocated for purposes other than the page cache, e.g. by device drivers or for internal fs use) aren't affected. So it would be better. The goal of this patch is to reclaim only page cache pages, isn't it? So I think we could do the balance check in add_to_page_cache and trigger reclaim. If we do so, what's the problem?
> > >> >>> mark = zone->watermark[alloc_flags & >>> ALLOC_WMARK_MASK]; >>> if (zone_watermark_ok(zone, order, mark, >>> classzone_idx, alloc_flags)) >>> @@ -4167,8 +4170,12 @@ static void __paginginit free_area_init_core(struct >>> pglist_data *pgdat, >>> >>> zone->spanned_pages = size; >>> zone->present_pages = realsize; >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA) >>> zone->min_unmapped_pages = >>> (realsize*sysctl_min_unmapped_ratio) >>> / 100; >>> + zone->max_unmapped_pages = >>> (realsize*sysctl_max_unmapped_ratio) >>> + / 100; >>> +#endif >>> #ifdef CONFIG_NUMA >>> zone->node = nid; >>> zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / >>> 100; >>> @@ -5084,6 +5091,7 @@ int min_free_kbytes_sysctl_handler(ctl_table *table, >>> int write, >>> return 0; >>> } >>> >>> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA) >>> int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write, >>> void __user *buffer, size_t *length, loff_t *ppos) >>> { >>> @@ -5100,6 +5108,23 @@ int >>> sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write, >>> return 0; >>> } >>> >>> +int sysctl_max_unmapped_ratio_sysctl_handler(ctl_table *table, int write, >>> + void __user *buffer, size_t *length, loff_t *ppos) >>> +{ >>> + struct zone *zone; >>> + int rc; >>> + >>> + rc = proc_dointvec_minmax(table, write, buffer, length, ppos); >>> + if (rc) >>> + return rc; >>> + >>> + for_each_zone(zone) >>> + zone->max_unmapped_pages = (zone->present_pages * >>> + sysctl_max_unmapped_ratio) / 100; >>> + return 0; >>> +} >>> +#endif >>> + >>> #ifdef CONFIG_NUMA >>> int sysctl_min_slab_ratio_sysctl_handler(ctl_table *table, int write, >>> void __user *buffer, size_t *length, loff_t *ppos) >>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>> index 02cc82e..6377411 100644 >>> --- a/mm/vmscan.c >>> +++ b/mm/vmscan.c >>> @@ -159,6 +159,29 @@ static DECLARE_RWSEM(shrinker_rwsem); >>> #define scanning_global_lru(sc) (1) >>> #endif >>> 
>>> +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) >>> +static unsigned long reclaim_unmapped_pages(int priority, struct zone >>> *zone, >>> + struct scan_control *sc); >>> +static int unmapped_page_control __read_mostly; >>> + >>> +static int __init unmapped_page_control_parm(char *str) >>> +{ >>> + unmapped_page_control = 1; >>> + /* >>> + * XXX: Should we tweak swappiness here? >>> + */ >>> + return 1; >>> +} >>> +__setup("unmapped_page_control", unmapped_page_control_parm); >>> + >>> +#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */ >>> +static inline unsigned long reclaim_unmapped_pages(int priority, >>> + struct zone *zone, struct scan_control *sc) >>> +{ >>> + return 0; >>> +} >>> +#endif >>> + >>> static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone, >>> struct scan_control *sc) >>> { >>> @@ -2359,6 +2382,12 @@ loop_again: >>> shrink_active_list(SWAP_CLUSTER_MAX, zone, >>>
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
On Thu, Jan 27, 2011 at 4:42 AM, Minchan Kim wrote: [snip] >> index 7b56473..2ac8549 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -1660,6 +1660,9 @@ zonelist_scan: >> unsigned long mark; >> int ret; >> >> + if (should_reclaim_unmapped_pages(zone)) >> + wakeup_kswapd(zone, order, classzone_idx); >> + > > Do we really need the check in the fastpath? > There are lots of callers of alloc_pages. > Many of them are not related to mapped pages. > Could we move the check into add_to_page_cache_locked? The check is a simple one to see whether the unmapped pages need balancing; I placed it here to allow other allocations to benefit as well, if there are some unmapped pages to be freed. add_to_page_cache_locked (a check under a critical section) is even worse, IMHO. > >> mark = zone->watermark[alloc_flags & >> ALLOC_WMARK_MASK]; >> if (zone_watermark_ok(zone, order, mark, >> classzone_idx, alloc_flags)) >> @@ -4167,8 +4170,12 @@ static void __paginginit free_area_init_core(struct >> pglist_data *pgdat, >> >> zone->spanned_pages = size; >> zone->present_pages = realsize; >> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA) >> zone->min_unmapped_pages = >> (realsize*sysctl_min_unmapped_ratio) >> / 100; >> + zone->max_unmapped_pages = >> (realsize*sysctl_max_unmapped_ratio) >> + / 100; >> +#endif >> #ifdef CONFIG_NUMA >> zone->node = nid; >> zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / >> 100; >> @@ -5084,6 +5091,7 @@ int min_free_kbytes_sysctl_handler(ctl_table *table, >> int write, >> return 0; >> } >> >> +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA) >> int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write, >> void __user *buffer, size_t *length, loff_t *ppos) >> { >> @@ -5100,6 +5108,23 @@ int >> sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write, >> return 0; >> } >> >> +int sysctl_max_unmapped_ratio_sysctl_handler(ctl_table *table, int write, >> + void
__user *buffer, size_t *length, loff_t *ppos) >> +{ >> + struct zone *zone; >> + int rc; >> + >> + rc = proc_dointvec_minmax(table, write, buffer, length, ppos); >> + if (rc) >> + return rc; >> + >> + for_each_zone(zone) >> + zone->max_unmapped_pages = (zone->present_pages * >> + sysctl_max_unmapped_ratio) / 100; >> + return 0; >> +} >> +#endif >> + >> #ifdef CONFIG_NUMA >> int sysctl_min_slab_ratio_sysctl_handler(ctl_table *table, int write, >> void __user *buffer, size_t *length, loff_t *ppos) >> diff --git a/mm/vmscan.c b/mm/vmscan.c >> index 02cc82e..6377411 100644 >> --- a/mm/vmscan.c >> +++ b/mm/vmscan.c >> @@ -159,6 +159,29 @@ static DECLARE_RWSEM(shrinker_rwsem); >> #define scanning_global_lru(sc) (1) >> #endif >> >> +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) >> +static unsigned long reclaim_unmapped_pages(int priority, struct zone *zone, >> + struct scan_control *sc); >> +static int unmapped_page_control __read_mostly; >> + >> +static int __init unmapped_page_control_parm(char *str) >> +{ >> + unmapped_page_control = 1; >> + /* >> + * XXX: Should we tweak swappiness here? >> + */ >> + return 1; >> +} >> +__setup("unmapped_page_control", unmapped_page_control_parm); >> + >> +#else /* !CONFIG_UNMAPPED_PAGECACHE_CONTROL */ >> +static inline unsigned long reclaim_unmapped_pages(int priority, >> + struct zone *zone, struct scan_control *sc) >> +{ >> + return 0; >> +} >> +#endif >> + >> static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone, >> struct scan_control *sc) >> { >> @@ -2359,6 +2382,12 @@ loop_again: >> shrink_active_list(SWAP_CLUSTER_MAX, zone, >> &sc, priority, 0); >> >> + /* >> + * We do unmapped page reclaim once here and once >> + * below, so that we don't lose out >> + */ >> + reclaim_unmapped_pages(priority, zone, &sc); >> + >> if (!zone_watermark_ok_safe(zone, order, >> high_wmark_pages(zone), 0, 0)) { >> end_zone = i; >> @@ -2396,6 +2425,11 @@ loop_again: >> continue; >> >> sc.nr_scanned = 0; >> +
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
Hi Balbir, On Tue, Jan 25, 2011 at 2:10 PM, Balbir Singh wrote: > Changelog v4 > 1. Add max_unmapped_ratio and use that as the upper limit > to check when to shrink the unmapped page cache (Christoph > Lameter) > > Changelog v2 > 1. Use a config option to enable the code (Andrew Morton) > 2. Explain the magic tunables in the code or at-least attempt > to explain them (General comment) > 3. Hint uses of the boot parameter with unlikely (Andrew Morton) > 4. Use better names (balanced is not a good naming convention) > > Provide control using zone_reclaim() and a boot parameter. The > code reuses functionality from zone_reclaim() to isolate unmapped > pages and reclaim them as a priority, ahead of other mapped pages. > A new sysctl for max_unmapped_ratio is provided and set to 16, > indicating 16% of the total zone pages are unmapped, we start > shrinking unmapped page cache. > > Signed-off-by: Balbir Singh > --- > Documentation/kernel-parameters.txt | 8 +++ > include/linux/mmzone.h | 5 ++ > include/linux/swap.h | 23 - > init/Kconfig | 12 + > kernel/sysctl.c | 11 > mm/page_alloc.c | 25 ++ > mm/vmscan.c | 87 > +++ > 7 files changed, 166 insertions(+), 5 deletions(-) > > diff --git a/Documentation/kernel-parameters.txt > b/Documentation/kernel-parameters.txt > index fee5f57..65a4ee6 100644 > --- a/Documentation/kernel-parameters.txt > +++ b/Documentation/kernel-parameters.txt > @@ -2500,6 +2500,14 @@ and is between 256 and 4096 characters. It is defined > in the file > [X86] > Set unknown_nmi_panic=1 early on boot. > > + unmapped_page_control > + [KNL] Available if CONFIG_UNMAPPED_PAGECACHE_CONTROL > + is enabled. It controls the amount of unmapped memory > + that is present in the system. This boot option plus > + vm.min_unmapped_ratio (sysctl) provide granular > control > + over how much unmapped page cache can exist in the > system > + before kswapd starts reclaiming unmapped page cache > pages. 
> + > usbcore.autosuspend= > [USB] The autosuspend time delay (in seconds) used > for newly-detected USB devices (default 2). This > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index 2485acc..18f0f09 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -306,7 +306,10 @@ struct zone { > /* > * zone reclaim becomes active if more unmapped pages exist. > */ > +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA) > unsigned long min_unmapped_pages; > + unsigned long max_unmapped_pages; > +#endif > #ifdef CONFIG_NUMA > int node; > unsigned long min_slab_pages; > @@ -773,6 +776,8 @@ int percpu_pagelist_fraction_sysctl_handler(struct > ctl_table *, int, > void __user *, size_t *, loff_t *); > int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *, int, > void __user *, size_t *, loff_t *); > +int sysctl_max_unmapped_ratio_sysctl_handler(struct ctl_table *, int, > + void __user *, size_t *, loff_t *); > int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *, int, > void __user *, size_t *, loff_t *); > > diff --git a/include/linux/swap.h b/include/linux/swap.h > index 7b75626..ae62a03 100644 > --- a/include/linux/swap.h > +++ b/include/linux/swap.h > @@ -255,19 +255,34 @@ extern int vm_swappiness; > extern int remove_mapping(struct address_space *mapping, struct page *page); > extern long vm_total_pages; > > +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA) > extern int sysctl_min_unmapped_ratio; > +extern int sysctl_max_unmapped_ratio; > + > extern int zone_reclaim(struct zone *, gfp_t, unsigned int); > -#ifdef CONFIG_NUMA > -extern int zone_reclaim_mode; > -extern int sysctl_min_slab_ratio; > #else > -#define zone_reclaim_mode 0 > static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int > order) > { > return 0; > } > #endif > > +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) > +extern bool should_reclaim_unmapped_pages(struct zone *zone); > +#else > +static inline bool 
should_reclaim_unmapped_pages(struct zone *zone) > +{ > + return false; > +} > +#endif > + > +#ifdef CONFIG_NUMA > +extern int zone_reclaim_mode; > +extern int sysctl_min_slab_ratio; > +#else > +#define zone_reclaim_mode 0 > +#endif > + > extern int page_evictable(struct page *page, struct vm_area_struct *vma); > extern void scan_mapping_unevictable_pages(struct address_space *); > > diff --git a/init/Kconfi
Re: [PATCH 3/3] Provide control over unmapped pages (v4)
Reviewed-by: Christoph Lameter
[PATCH 3/3] Provide control over unmapped pages (v4)
Changelog v4 1. Add max_unmapped_ratio and use that as the upper limit to check when to shrink the unmapped page cache (Christoph Lameter) Changelog v2 1. Use a config option to enable the code (Andrew Morton) 2. Explain the magic tunables in the code or at least attempt to explain them (General comment) 3. Hint uses of the boot parameter with unlikely (Andrew Morton) 4. Use better names (balanced is not a good naming convention) Provide control using zone_reclaim() and a boot parameter. The code reuses functionality from zone_reclaim() to isolate unmapped pages and reclaim them as a priority, ahead of mapped pages. A new sysctl, max_unmapped_ratio, is provided and defaults to 16, meaning that once unmapped pages exceed 16% of the total zone pages, we start shrinking the unmapped page cache. Signed-off-by: Balbir Singh --- Documentation/kernel-parameters.txt | 8 +++ include/linux/mmzone.h | 5 ++ include/linux/swap.h | 23 - init/Kconfig | 12 + kernel/sysctl.c | 11 mm/page_alloc.c | 25 ++ mm/vmscan.c | 87 +++ 7 files changed, 166 insertions(+), 5 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index fee5f57..65a4ee6 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2500,6 +2500,14 @@ and is between 256 and 4096 characters. It is defined in the file [X86] Set unknown_nmi_panic=1 early on boot. + unmapped_page_control + [KNL] Available if CONFIG_UNMAPPED_PAGECACHE_CONTROL + is enabled. It controls the amount of unmapped memory + that is present in the system. This boot option plus + vm.min_unmapped_ratio (sysctl) provide granular control + over how much unmapped page cache can exist in the system + before kswapd starts reclaiming unmapped page cache pages. + usbcore.autosuspend= [USB] The autosuspend time delay (in seconds) used for newly-detected USB devices (default 2).
This diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 2485acc..18f0f09 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -306,7 +306,10 @@ struct zone { /* * zone reclaim becomes active if more unmapped pages exist. */ +#if defined(CONFIG_UNMAPPED_PAGE_CONTROL) || defined(CONFIG_NUMA) unsigned long min_unmapped_pages; + unsigned long max_unmapped_pages; +#endif #ifdef CONFIG_NUMA int node; unsigned long min_slab_pages; @@ -773,6 +776,8 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); +int sysctl_max_unmapped_ratio_sysctl_handler(struct ctl_table *, int, + void __user *, size_t *, loff_t *); int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); diff --git a/include/linux/swap.h b/include/linux/swap.h index 7b75626..ae62a03 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -255,19 +255,34 @@ extern int vm_swappiness; extern int remove_mapping(struct address_space *mapping, struct page *page); extern long vm_total_pages; +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) || defined(CONFIG_NUMA) extern int sysctl_min_unmapped_ratio; +extern int sysctl_max_unmapped_ratio; + extern int zone_reclaim(struct zone *, gfp_t, unsigned int); -#ifdef CONFIG_NUMA -extern int zone_reclaim_mode; -extern int sysctl_min_slab_ratio; #else -#define zone_reclaim_mode 0 static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order) { return 0; } #endif +#if defined(CONFIG_UNMAPPED_PAGECACHE_CONTROL) +extern bool should_reclaim_unmapped_pages(struct zone *zone); +#else +static inline bool should_reclaim_unmapped_pages(struct zone *zone) +{ + return false; +} +#endif + +#ifdef CONFIG_NUMA +extern int zone_reclaim_mode; +extern int sysctl_min_slab_ratio; +#else +#define zone_reclaim_mode 0 +#endif + extern int 
page_evictable(struct page *page, struct vm_area_struct *vma); extern void scan_mapping_unevictable_pages(struct address_space *); diff --git a/init/Kconfig b/init/Kconfig index 4f6cdbf..2dfbc09 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -828,6 +828,18 @@ config SCHED_AUTOGROUP config MM_OWNER bool +config UNMAPPED_PAGECACHE_CONTROL + bool "Provide control over unmapped page cache" + default n + help +