Re: [RFC][PATCH] split file and anonymous page queues #3
Rik van Riel writes:
> Nikita Danilov wrote:
> > Generally speaking, multi-queue replacement mechanisms were tried in
> > the past, and they all suffer from a common drawback: once the
> > scanning rate differs between queues, so does the notion of
> > "hotness" as measured by the scanner. As a result a multi-queue
> > scanner fails to capture the working set properly.
>
> You realize that the current "single" queue in the 2.6 kernel
> has this problem in a much worse way: when swappiness is low
> and the kernel does not want to reclaim mapped pages, it will
> randomly rotate those pages around the list.

Agree. Some time ago I tried to solve this very problem with the
dont-rotate-active-list patch
(http://linuxhacker.ru/~nikita/patches/2.6.12-rc6/2005.06.11/vm_03-dont-rotate-active-list.patch),
but it had problems of its own.

> In addition, the referenced bit on unmapped page cache pages
> was ignored completely, making it impossible for the VM to
> separate the page cache working set from transient pages due
> to streaming IO.

Yes: basically FIFO for clean file system pages and FIFO-second-chance
for dirty file pages. Very bad.

> I agree that we should put some more negative feedback in
> place if it turns out we need it. I have refault code ready
> that can be plugged into this patch, but I don't want to add
> the overhead of such code if it turns out we do not actually
> need it.

In my humble opinion the VM already has too many mechanisms that are
supposed to help in corner cases, and there is little to be done about
that short of a major rewrite.

Nikita.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
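The FIFO-second-chance behavior Nikita describes for dirty file pages can be sketched in a few lines. The following is an illustrative toy model (made-up structures, not kernel code): the scanner walks the queue in FIFO order, a page whose referenced bit is set gets the bit cleared and one more lap around the queue, and an unreferenced page is evicted.

```c
#include <assert.h>

/* Toy page: just an id and a referenced bit. */
struct toy_page {
	int id;
	int referenced;
};

/* Evict one page from a circular queue of n pages; *head is the FIFO
 * hand. A referenced page is spared once (its bit is cleared and the
 * hand moves past it); the first unreferenced page found is evicted.
 * Returns the id of the evicted page. */
static int evict_second_chance(struct toy_page *q, int n, int *head)
{
	for (;;) {
		struct toy_page *p = &q[*head];

		*head = (*head + 1) % n;
		if (p->referenced)
			p->referenced = 0;	/* second chance */
		else
			return p->id;		/* cold page: evict */
	}
}
```

Plain FIFO for clean pages is the same loop with the `referenced` test removed, which is exactly why streaming IO could flush the working set: the bit is never consulted.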
Re: [RFC][PATCH] split file and anonymous page queues #3
Matt Mackall wrote:
> On Tue, Mar 20, 2007 at 06:08:10PM -0400, Rik van Riel wrote:
>
> > -		"Active:       %8lu kB\n"
> > -		"Inactive:     %8lu kB\n"
> ...
> > +		"Active(anon):   %8lu kB\n"
> > +		"Inactive(anon): %8lu kB\n"
> > +		"Active(file):   %8lu kB\n"
> > +		"Inactive(file): %8lu kB\n"
>
> Potentially incompatible change. How about preserving the original
> fields (by totalling), then adding the other fields in a second patch?

Fixed in the attached patch.

> >  	if (!pagevec_add(&lru_pvec, page))
> > -		__pagevec_lru_add(&lru_pvec);
> > +		__pagevec_lru_add_file(&lru_pvec);
>
> Wouldn't lru_file_add or file_lru_add be a better name? If the object
> is a "file lru" then sticking "add" in the middle is a little ugly.

Not sure about this one. Does anybody else have an opinion here?

> >  	spin_lock_irq(&zone->lru_lock);
> >  	if (PageLRU(page) && !PageActive(page)) {
> > -		del_page_from_inactive_list(zone, page);
> > +		if (page_anon(page)) {
> > +			del_page_from_inactive_anon_list(zone, page);
> >  		SetPageActive(page);
> > -		add_page_to_active_list(zone, page);
> > +			add_page_to_active_anon_list(zone, page);
> > +		} else {
> > +			del_page_from_inactive_file_list(zone, page);
> > +			SetPageActive(page);
> > +			add_page_to_active_file_list(zone, page);
> > +		}
> >  		__count_vm_event(PGACTIVATE);
> >  	}
>
> Missing a level of indentation.

Fixed.

-- 
All Rights Reversed

--- linux-2.6.20.x86_64/fs/proc/proc_misc.c.vmsplit	2007-03-19 20:09:22.0 -0400
+++ linux-2.6.20.x86_64/fs/proc/proc_misc.c	2007-03-21 15:10:25.0 -0400
@@ -147,43 +147,53 @@ static int meminfo_read_proc(char *page,
 	 * Tagged format, for easy grepping and expansion.
 	 */
 	len = sprintf(page,
-		"MemTotal:     %8lu kB\n"
-		"MemFree:      %8lu kB\n"
-		"Buffers:      %8lu kB\n"
-		"Cached:       %8lu kB\n"
-		"SwapCached:   %8lu kB\n"
-		"Active:       %8lu kB\n"
-		"Inactive:     %8lu kB\n"
+		"MemTotal:       %8lu kB\n"
+		"MemFree:        %8lu kB\n"
+		"Buffers:        %8lu kB\n"
+		"Cached:         %8lu kB\n"
+		"SwapCached:     %8lu kB\n"
+		"Active:         %8lu kB\n"
+		"Inactive:       %8lu kB\n"
+		"Active(anon):   %8lu kB\n"
+		"Inactive(anon): %8lu kB\n"
+		"Active(file):   %8lu kB\n"
+		"Inactive(file): %8lu kB\n"
 #ifdef CONFIG_HIGHMEM
-		"HighTotal:    %8lu kB\n"
-		"HighFree:     %8lu kB\n"
-		"LowTotal:     %8lu kB\n"
-		"LowFree:      %8lu kB\n"
-#endif
-		"SwapTotal:    %8lu kB\n"
-		"SwapFree:     %8lu kB\n"
-		"Dirty:        %8lu kB\n"
-		"Writeback:    %8lu kB\n"
-		"AnonPages:    %8lu kB\n"
-		"Mapped:       %8lu kB\n"
-		"Slab:         %8lu kB\n"
-		"SReclaimable: %8lu kB\n"
-		"SUnreclaim:   %8lu kB\n"
-		"PageTables:   %8lu kB\n"
-		"NFS_Unstable: %8lu kB\n"
-		"Bounce:       %8lu kB\n"
-		"CommitLimit:  %8lu kB\n"
-		"Committed_AS: %8lu kB\n"
-		"VmallocTotal: %8lu kB\n"
-		"VmallocUsed:  %8lu kB\n"
-		"VmallocChunk: %8lu kB\n",
+		"HighTotal:      %8lu kB\n"
+		"HighFree:       %8lu kB\n"
+		"LowTotal:       %8lu kB\n"
+		"LowFree:        %8lu kB\n"
+#endif
+		"SwapTotal:      %8lu kB\n"
+		"SwapFree:       %8lu kB\n"
+		"Dirty:          %8lu kB\n"
+		"Writeback:      %8lu kB\n"
+		"AnonPages:      %8lu kB\n"
+		"Mapped:         %8lu kB\n"
+		"Slab:           %8lu kB\n"
+		"SReclaimable:   %8lu kB\n"
+		"SUnreclaim:     %8lu kB\n"
+		"PageTables:     %8lu kB\n"
+		"NFS_Unstable:   %8lu kB\n"
+		"Bounce:         %8lu kB\n"
+		"CommitLimit:    %8lu kB\n"
+		"Committed_AS:   %8lu kB\n"
+		"VmallocTotal:   %8lu kB\n"
+		"VmallocUsed:    %8lu kB\n"
+		"VmallocChunk:   %8lu kB\n",
 		K(i.totalram),
 		K(i.freeram),
 		K(i.bufferram),
 		K(cached),
 		K(total_swapcache_pages),
-		K(global_page_state(NR_ACTIVE)),
-		K(global_page_state(NR_INACTIVE)),
+		K(global_page_state(NR_ACTIVE_ANON) +
+		  global_page_state(NR_ACTIVE_FILE)),
+		K(global_page_state(NR_INACTIVE_ANON) +
+		  global_page_state(NR_INACTIVE_FILE)),
+		K(global_page_state(NR_ACTIVE_ANON)),
+		K(global_page_state(NR_INACTIVE_ANON)),
+		K(global_page_state(NR_ACTIVE_FILE)),
+		K(global_page_state(NR_INACTIVE_FILE)),
 #ifdef CONFIG_HIGHMEM
 		K(i.totalhigh),
 		K(i.freehigh),
--- linux-2.6.20.x86_64/fs/mpage.c.vmsplit	2007-02-04 13:44:54.0 -0500
+++ linux-2.6.20.x86_64/fs/mpage.c	2007-03-19 20:09:36.0 -0400
@@ -408,12 +408,12 @@ mpage_readpages(struct address_space *ma
 					&first_logical_block, get_block);
 			if (!pagevec_add(&lru_pvec, page))
-				__pagevec_lru_add(&lru_pvec);
+				__pagevec_lru_add_file(&lru_pvec);
 		} else {
 			page_cache_release(page);
 		}
 	}
-	pagevec_lru_add(&lru_pvec);
+	pagevec_lru_add_file(&lru_pvec);
 	BUG_ON(!list_empty(pages));
 	if (bio)
 		mpage_bio_submit(READ, bio);
--- linux-2.6.20.x86_64/fs/cifs/file.c.vmsplit	2007-03-19 20:09:21.0 -0400
+++ linux-2.6.20.x86_64/fs/cifs/file.c	2007-03-19
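For userspace consumers, the compatibility point Matt raised boils down to an invariant: the old `Active`/`Inactive` lines survive and must equal the sums of the new per-type counters. A small sketch checking that invariant against a made-up meminfo sample (the parser and the numbers are illustrative, not kernel code):

```c
#include <stdio.h>
#include <string.h>

/* Sample output in the post-patch format; the numbers are invented,
 * but satisfy Active = Active(anon) + Active(file), etc. */
static const char *sample_meminfo =
	"Active:           300 kB\n"
	"Inactive:         500 kB\n"
	"Active(anon):     100 kB\n"
	"Inactive(anon):   150 kB\n"
	"Active(file):     200 kB\n"
	"Inactive(file):   350 kB\n";

/* Pull one "Label: N kB" value out of a meminfo-style buffer;
 * returns 0 if the label is missing. */
static unsigned long meminfo_field(const char *buf, const char *label)
{
	const char *p = strstr(buf, label);
	unsigned long val = 0;

	if (p)
		sscanf(p + strlen(label), ":%lu", &val);
	return val;
}
```

An old parser that only knows `Active` and `Inactive` keeps working unchanged, which is the whole point of totalling rather than replacing the fields.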
Re: [RFC][PATCH] split file and anonymous page queues #3
Chuck Ebbert wrote:
> I think you're going to have to use refault rates. AIX 3.5 had to add
> that. Something like:
>
> 	if refault_rate(anonymous/mmap) > refault_rate(pagecache)
> 		drop a pagecache page
> 	else
> 		drop either

How about the opposite? If the page cache refault rate is way higher
than the anonymous refault rate, shouldn't we favor the page cache?

Btw, just a higher fault rate will already make that cache grow faster
than the other, simply because it will have more allocations than the
other cache, and they both get shrunk to some degree...

> You do have anonymous memory and mmapped executables in the same
> queue, right?

Nope. It is very hard to tell the difference between mmapped
executables and mmapped data from any structure linked off the
struct page...

-- 
All Rights Reversed
Re: [RFC][PATCH] split file and anonymous page queues #3
Rik van Riel wrote:
> Nikita Danilov wrote:
>
>> Probably I am missing something, but I don't see how that can help.
>> For example, suppose (for simplicity) that we have swappiness of
>> 100%, and that the fraction of referenced anon pages gets slightly
>> less than that of file pages. get_scan_ratio() increases
>> anon_percent, and shrink_zone() starts scanning the anon queue more
>> aggressively. As a result, pages spend less time there, and have
>> less chance of ever being accessed, reducing the fraction of
>> referenced anon pages further, and triggering a further increase in
>> the amount of scanning, etc. Doesn't this introduce a positive
>> feedback loop?
>
> It's a possibility, but I don't think it will be much of an
> issue in practice.
>
> If it is, we can always use refaults as a correcting
> mechanism - which would have the added benefit of being
> able to do streaming IO without putting any pressure on
> the active list, essentially clock-pro replacement with
> just some tweaks to shrink_list()...

I think you're going to have to use refault rates. AIX 3.5 had to add
that. Something like:

	if refault_rate(anonymous/mmap) > refault_rate(pagecache)
		drop a pagecache page
	else
		drop either

You do have anonymous memory and mmapped executables in the same
queue, right?
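Chuck's policy could be expressed roughly as follows. The per-queue refault counters and all of the names here are hypothetical (no such fields existed in the kernel at the time); the comparison uses cross-multiplication so that refaults-per-eviction can be compared without floating point:

```c
#include <assert.h>

/* Hypothetical per-queue refault bookkeeping: a "refault" is an
 * evicted page that was faulted back in soon after eviction. */
struct refault_stats {
	unsigned long refaults;		/* evicted pages that came back */
	unsigned long evictions;	/* total pages evicted */
};

enum reclaim_pref { DROP_PAGECACHE, DROP_EITHER };

/* If the anon/mmap queue refaults more per eviction than the page
 * cache, prefer evicting page cache; otherwise either queue is fair
 * game. Compares a.refaults/a.evictions > f.refaults/f.evictions by
 * cross-multiplying. */
static enum reclaim_pref pick_victim(const struct refault_stats *anon,
				     const struct refault_stats *file)
{
	if (anon->refaults * file->evictions >
	    file->refaults * anon->evictions)
		return DROP_PAGECACHE;
	return DROP_EITHER;
}
```

This is only the decision rule; where the counters get bumped (the fault path, on re-reading a recently evicted page) is the hard part Rik's refault code would have to supply.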
Re: [RFC][PATCH] split file and anonymous page queues #3
Nikita Danilov wrote:
> Generally speaking, multi-queue replacement mechanisms were tried in
> the past, and they all suffer from a common drawback: once the
> scanning rate differs between queues, so does the notion of "hotness"
> as measured by the scanner. As a result a multi-queue scanner fails
> to capture the working set properly.

You realize that the current "single" queue in the 2.6 kernel
has this problem in a much worse way: when swappiness is low
and the kernel does not want to reclaim mapped pages, it will
randomly rotate those pages around the list.

In addition, the referenced bit on unmapped page cache pages
was ignored completely, making it impossible for the VM to
separate the page cache working set from transient pages due
to streaming IO.

I agree that we should put some more negative feedback in
place if it turns out we need it. I have refault code ready
that can be plugged into this patch, but I don't want to add
the overhead of such code if it turns out we do not actually
need it.

-- 
All Rights Reversed
Re: [RFC][PATCH] split file and anonymous page queues #3
Rik van Riel writes:
> Rik van Riel wrote:
> > Nikita Danilov wrote:
> >
> >> Probably I am missing something, but I don't see how that can
> >> help. For example, suppose (for simplicity) that we have
> >> swappiness of 100%, and that the fraction of referenced anon pages
> >> gets slightly less than that of file pages. get_scan_ratio()
> >> increases anon_percent, and shrink_zone() starts scanning the anon
> >> queue more aggressively. As a result, pages spend less time there,
> >> and have less chance of ever being accessed, reducing the fraction
> >> of referenced anon pages further, and triggering a further
> >> increase in the amount of scanning, etc. Doesn't this introduce a
> >> positive feedback loop?
> >
> > It's a possibility, but I don't think it will be much of an
> > issue in practice.
> >
> > If it is, we can always use refaults as a correcting
> > mechanism - which would have the added benefit of being
> > able to do streaming IO without putting any pressure on
> > the active list, essentially clock-pro replacement with
> > just some tweaks to shrink_list()...
>
> As an aside, due to the use-once algorithm file pages are at a
> natural disadvantage already. I believe it would be really hard to
> construct a workload where anon pages suffer the positive feedback
> loop you describe...

That scenario works for file queues too. Of course, all this is but
theoretical speculation at this point, but I am concerned that

- that loop would tend to happen under various border conditions,
  making it hard to isolate, diagnose, and debug, and

- long before it becomes explicitly visible (say, as excessive CPU
  consumption by the scanner), it would ruin the global LRU ordering,
  degrading overall performance.

Generally speaking, multi-queue replacement mechanisms were tried in
the past, and they all suffer from a common drawback: once the
scanning rate differs between queues, so does the notion of "hotness"
as measured by the scanner. As a result a multi-queue scanner fails to
capture the working set properly.

Nikita.

> --
> Politics is the struggle between those who want to make their country
> the best in the world, and those who believe it already is. Each
> group calls the other unpatriotic.
Re: [RFC][PATCH] split file and anonymous page queues #3
Rik van Riel wrote:
> Nikita Danilov wrote:
>
>> Probably I am missing something, but I don't see how that can help.
>> For example, suppose (for simplicity) that we have swappiness of
>> 100%, and that the fraction of referenced anon pages gets slightly
>> less than that of file pages. get_scan_ratio() increases
>> anon_percent, and shrink_zone() starts scanning the anon queue more
>> aggressively. As a result, pages spend less time there, and have
>> less chance of ever being accessed, reducing the fraction of
>> referenced anon pages further, and triggering a further increase in
>> the amount of scanning, etc. Doesn't this introduce a positive
>> feedback loop?
>
> It's a possibility, but I don't think it will be much of an
> issue in practice.
>
> If it is, we can always use refaults as a correcting
> mechanism - which would have the added benefit of being
> able to do streaming IO without putting any pressure on
> the active list, essentially clock-pro replacement with
> just some tweaks to shrink_list()...

As an aside, due to the use-once algorithm file pages are at a natural
disadvantage already. I believe it would be really hard to construct a
workload where anon pages suffer the positive feedback loop you
describe...

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
Re: [RFC][PATCH] split file and anonymous page queues #3
Nikita Danilov wrote:
> Probably I am missing something, but I don't see how that can help.
> For example, suppose (for simplicity) that we have swappiness of
> 100%, and that the fraction of referenced anon pages gets slightly
> less than that of file pages. get_scan_ratio() increases
> anon_percent, and shrink_zone() starts scanning the anon queue more
> aggressively. As a result, pages spend less time there, and have less
> chance of ever being accessed, reducing the fraction of referenced
> anon pages further, and triggering a further increase in the amount
> of scanning, etc. Doesn't this introduce a positive feedback loop?

It's a possibility, but I don't think it will be much of an
issue in practice.

If it is, we can always use refaults as a correcting
mechanism - which would have the added benefit of being
able to do streaming IO without putting any pressure on
the active list, essentially clock-pro replacement with
just some tweaks to shrink_list()...

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
Re: [RFC][PATCH] split file and anonymous page queues #3
Rik van Riel writes:
> Nikita Danilov wrote:
> > Rik van Riel writes:
> > > [ OK, I suck. I edited yesterday's email with the new info, but
> > >   forgot to change the attachment to today's patch. Here is
> > >   today's patch. ]
> > >
> > > Split the anonymous and file backed pages out onto their own
> > > pageout queues. This way we do not unnecessarily churn through
> > > lots of anonymous pages when we do not want to swap them out
> > > anyway.
> >
> > Won't this re-introduce problems similar to the ones caused by the
> > split inactive_clean/inactive_dirty queues we had in the past?
> >
> > For example, by rotating anon queues faster than file queues, the
> > kernel would end up reclaiming anon pages that are hotter (in
> > "absolute" LRU order) than some file pages.
>
> That is why we check the fraction of referenced pages in each
> queue. Please look at the get_scan_ratio() and shrink_zone()
> code in my patch.

Probably I am missing something, but I don't see how that can help.
For example, suppose (for simplicity) that we have swappiness of 100%,
and that the fraction of referenced anon pages gets slightly less than
that of file pages. get_scan_ratio() increases anon_percent, and
shrink_zone() starts scanning the anon queue more aggressively. As a
result, pages spend less time there, and have less chance of ever
being accessed, reducing the fraction of referenced anon pages
further, and triggering a further increase in the amount of scanning,
etc. Doesn't this introduce a positive feedback loop?

Nikita.
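Nikita's concern can be illustrated with a toy numerical model. The dynamics below are pure assumptions made for illustration (referenced fraction inversely proportional to a queue's scan share; scan share proportional to a queue's fraction of unreferenced pages), not anything derived from the patch; the point is only that, under such dynamics, a tiny initial imbalance runs away instead of settling:

```c
#include <assert.h>

/* Toy model of the positive feedback loop: two queues, anon and file,
 * sharing the scanner. share_anon is the anon queue's share of the
 * total scanning effort. */
static double anon_share_after(int iterations)
{
	double share_anon = 0.505;	/* small initial imbalance */
	int i;

	for (i = 0; i < iterations; i++) {
		/* Assumed: scanning a queue twice as hard halves the
		 * fraction of its pages found referenced (pages get
		 * less time to be touched before the hand returns). */
		double ref_anon = 0.49 * 0.5 / share_anon;
		double ref_file = 0.50 * 0.5 / (1.0 - share_anon);

		if (ref_anon > 0.95)
			ref_anon = 0.95;
		if (ref_file > 0.95)
			ref_file = 0.95;

		/* The balancer gives each queue a scan share
		 * proportional to its cold (unreferenced) fraction. */
		share_anon = (1.0 - ref_anon) /
			((1.0 - ref_anon) + (1.0 - ref_file));
	}
	return share_anon;
}
```

Starting from a 50.5% share, the anon share climbs monotonically and ends up above 90% within a few dozen iterations of this model, i.e. the scanner's own reaction manufactures the "coldness" it then responds to.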
Re: [RFC][PATCH] split file and anonymous page queues #3
Nikita Danilov wrote:
> Rik van Riel writes:
> > [ OK, I suck. I edited yesterday's email with the new info, but
> >   forgot to change the attachment to today's patch. Here is today's
> >   patch. ]
> >
> > Split the anonymous and file backed pages out onto their own
> > pageout queues. This way we do not unnecessarily churn through lots
> > of anonymous pages when we do not want to swap them out anyway.
>
> Won't this re-introduce problems similar to the ones caused by the
> split inactive_clean/inactive_dirty queues we had in the past?
>
> For example, by rotating anon queues faster than file queues, the
> kernel would end up reclaiming anon pages that are hotter (in
> "absolute" LRU order) than some file pages.

That is why we check the fraction of referenced pages in each
queue. Please look at the get_scan_ratio() and shrink_zone()
code in my patch.

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
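The balancing Rik refers to could look roughly like this. The function below is a simplified sketch of the idea only (scan each queue in proportion to the fraction of its scanned pages found unreferenced), not the actual get_scan_ratio() from the patch, and it deliberately ignores swappiness and the other inputs the real code would consult:

```c
#include <assert.h>

/* Split 100% of the scanning effort between the anon and file queues
 * according to how "cold" each queue looked during recent scanning:
 * a queue where most scanned pages were referenced gets scanned less. */
static void toy_scan_ratio(unsigned long anon_scanned,
			   unsigned long anon_referenced,
			   unsigned long file_scanned,
			   unsigned long file_referenced,
			   unsigned int *anon_percent,
			   unsigned int *file_percent)
{
	/* percentage of cold (unreferenced) pages seen in each queue */
	unsigned long anon_cold = anon_scanned ?
		100 * (anon_scanned - anon_referenced) / anon_scanned : 0;
	unsigned long file_cold = file_scanned ?
		100 * (file_scanned - file_referenced) / file_scanned : 0;

	if (anon_cold + file_cold == 0) {
		/* no signal yet: split evenly */
		*anon_percent = *file_percent = 50;
		return;
	}
	*anon_percent = 100 * anon_cold / (anon_cold + file_cold);
	*file_percent = 100 - *anon_percent;
}
```

With 90% of scanned anon pages referenced but only 50% of file pages, this splits the effort roughly 16/84 toward the file queue, which is the negative feedback Rik is pointing at; Nikita's objection is about what happens when that feedback itself perturbs the referenced fractions.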
Re: [RFC][PATCH] split file and anonymous page queues #3
Rik van Riel writes:
> [ OK, I suck. I edited yesterday's email with the new info, but
>   forgot to change the attachment to today's patch. Here is today's
>   patch. ]
>
> Split the anonymous and file backed pages out onto their own pageout
> queues. This way we do not unnecessarily churn through lots of
> anonymous pages when we do not want to swap them out anyway.

Won't this re-introduce problems similar to the ones caused by the
split inactive_clean/inactive_dirty queues we had in the past?

For example, by rotating anon queues faster than file queues, the
kernel would end up reclaiming anon pages that are hotter (in
"absolute" LRU order) than some file pages.

Nikita.
Re: [RFC][PATCH] split file and anonymous page queues #3
Rik van Riel writes: [ OK, I suck. I edited yesterday's email with the new info, but forgot to change the attachment to today's patch. Here is today's patch. ] Split the anonymous and file backed pages out onto their own pageout queues. This we do not unnecessarily churn through lots of anonymous pages when we do not want to swap them out anyway. Won't this re-introduce problems similar to ones due to split inactive_clean/inactive_dirty queues we had in the past? For example, by rotating anon queues faster than file queues, kernel would end up reclaiming anon pages that are hotter (in absolute LRU order) than some file pages. Nikita. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] split file and anonymous page queues #3
Nikita Danilov wrote: Rik van Riel writes: [ OK, I suck. I edited yesterday's email with the new info, but forgot to change the attachment to today's patch. Here is today's patch. ] Split the anonymous and file backed pages out onto their own pageout queues. This we do not unnecessarily churn through lots of anonymous pages when we do not want to swap them out anyway. Won't this re-introduce problems similar to ones due to split inactive_clean/inactive_dirty queues we had in the past? For example, by rotating anon queues faster than file queues, kernel would end up reclaiming anon pages that are hotter (in absolute LRU order) than some file pages. That is why we check the fraction of referenced pages in each queue. Please look at the get_scan_ratio() and shrink_zone() code in my patch. -- Politics is the struggle between those who want to make their country the best in the world, and those who believe it already is. Each group calls the other unpatriotic. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] split file and anonymous page queues #3
Rik van Riel writes: Nikita Danilov wrote: Rik van Riel writes: [ OK, I suck. I edited yesterday's email with the new info, but forgot to change the attachment to today's patch. Here is today's patch. ] Split the anonymous and file backed pages out onto their own pageout queues. This we do not unnecessarily churn through lots of anonymous pages when we do not want to swap them out anyway. Won't this re-introduce problems similar to ones due to split inactive_clean/inactive_dirty queues we had in the past? For example, by rotating anon queues faster than file queues, kernel would end up reclaiming anon pages that are hotter (in absolute LRU order) than some file pages. That is why we check the fraction of referenced pages in each queue. Please look at the get_scan_ratio() and shrink_zone() code in my patch. Probably I am missing something, but I don't see how that can help. For example, suppose (for simplicity) that we have swappiness of 100%, and that fraction of referenced anon pages gets slightly less than of file pages. get_scan_ratio() increases anon_percent, and shrink_zone() starts scanning anon queue more aggressively. As a result, pages spend less time there, and have less chance of ever being accessed, reducing fraction of referenced anon pages further, and triggering further increase in the amount of scanning, etc. Doesn't this introduce positive feed-back loop? Nikita. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] split file and anonymous page queues #3
Nikita Danilov wrote: Probably I am missing something, but I don't see how that can help. For example, suppose (for simplicity) that we have swappiness of 100%, and that fraction of referenced anon pages gets slightly less than of file pages. get_scan_ratio() increases anon_percent, and shrink_zone() starts scanning anon queue more aggressively. As a result, pages spend less time there, and have less chance of ever being accessed, reducing fraction of referenced anon pages further, and triggering further increase in the amount of scanning, etc. Doesn't this introduce positive feed-back loop? It's a possibility, but I don't think it will be much of an issue in practice. If it is, we can always use refaults as a correcting mechanism - which would have the added benefit of being able to do streaming IO without putting any pressure on the active list, essentially clock-pro replacement with just some tweaks to shrink_list()... -- Politics is the struggle between those who want to make their country the best in the world, and those who believe it already is. Each group calls the other unpatriotic. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] split file and anonymous page queues #3
Rik van Riel wrote: Nikita Danilov wrote: Probably I am missing something, but I don't see how that can help. For example, suppose (for simplicity) that we have swappiness of 100%, and that fraction of referenced anon pages gets slightly less than of file pages. get_scan_ratio() increases anon_percent, and shrink_zone() starts scanning anon queue more aggressively. As a result, pages spend less time there, and have less chance of ever being accessed, reducing fraction of referenced anon pages further, and triggering further increase in the amount of scanning, etc. Doesn't this introduce positive feed-back loop? It's a possibility, but I don't think it will be much of an issue in practice. If it is, we can always use refaults as a correcting mechanism - which would have the added benefit of being able to do streaming IO without putting any pressure on the active list, essentially clock-pro replacement with just some tweaks to shrink_list()... As an aside, due to the use-once algorithm file pages are at a natural disadvantage already. I believe it would be really hard to construct a workload where anon pages suffer the positive feedback loop you describe... -- Politics is the struggle between those who want to make their country the best in the world, and those who believe it already is. Each group calls the other unpatriotic. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] split file and anonymous page queues #3
Rik van Riel writes: Rik van Riel wrote: Nikita Danilov wrote: Probably I am missing something, but I don't see how that can help. For example, suppose (for simplicity) that we have swappiness of 100%, and that fraction of referenced anon pages gets slightly less than of file pages. get_scan_ratio() increases anon_percent, and shrink_zone() starts scanning anon queue more aggressively. As a result, pages spend less time there, and have less chance of ever being accessed, reducing fraction of referenced anon pages further, and triggering further increase in the amount of scanning, etc. Doesn't this introduce positive feed-back loop? It's a possibility, but I don't think it will be much of an issue in practice. If it is, we can always use refaults as a correcting mechanism - which would have the added benefit of being able to do streaming IO without putting any pressure on the active list, essentially clock-pro replacement with just some tweaks to shrink_list()... As an aside, due to the use-once algorithm file pages are at a natural disadvantage already. I believe it would be really hard to construct a workload where anon pages suffer the positive feedback loop you describe... That scenario works for file queues too. Of course, all this is but a theoretical speculation at this point, but I am concerned that - that loop would tend to happen under various border conditions, making it hard to isolate, diagnose, and debug, and - long before it becomes explicitly visible (say, as an excessive cpu consumption by scanner), it would ruin global lru ordering, degrading overall performance. Generally speaking, multi-queue replacement mechanisms were tried in the past, and they all suffer from the common drawback: once scanning rate is different for different queues, so is the notion of hotness, measured by scanner. As a result multi-queue scanner fails to capture working set properly. Nikita. 
-- Politics is the struggle between those who want to make their country the best in the world, and those who believe it already is. Each group calls the other unpatriotic. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] split file and anonymous page queues #3
Nikita Danilov wrote: Generally speaking, multi-queue replacement mechanisms were tried in the past, and they all suffer from the common drawback: once scanning rate is different for different queues, so is the notion of hotness, measured by scanner. As a result multi-queue scanner fails to capture working set properly. You realize that the current single queue in the 2.6 kernel has this problem in a much worse way: when swappiness is low and the kernel does not want to reclaim mapped pages, it will randomly rotate those pages around the list. In addition, the referenced bit on unmapped page cache pages was ignored completely, making it impossible for the VM to separate the page cache working set from transient pages due to streaming IO. I agree that we should put some more negative feedback in place if it turns out we need it. I have refault code ready that can be plugged into this patch, but I don't want to add the overhead of such code if it turns out we do not actually need it. -- All Rights Reversed - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] split file and anonymous page queues #3
Rik van Riel wrote: Nikita Danilov wrote: Probably I am missing something, but I don't see how that can help. For example, suppose (for simplicity) that we have swappiness of 100%, and that fraction of referenced anon pages gets slightly less than of file pages. get_scan_ratio() increases anon_percent, and shrink_zone() starts scanning anon queue more aggressively. As a result, pages spend less time there, and have less chance of ever being accessed, reducing fraction of referenced anon pages further, and triggering further increase in the amount of scanning, etc. Doesn't this introduce positive feed-back loop? It's a possibility, but I don't think it will be much of an issue in practice. If it is, we can always use refaults as a correcting mechanism - which would have the added benefit of being able to do streaming IO without putting any pressure on the active list, essentially clock-pro replacement with just some tweaks to shrink_list()... I think you're going to have to use refault rates. AIX 3.5 had to add that. Something like: if refault_rate(anonymous/mmap) refault_rate(pagecache) drop a pagecache page else drop either You do have anonymous memory and mmapped executables in the same queue, right? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] split file and anonymous page queues #3
Chuck Ebbert wrote: I think you're going to have to use refault rates. AIX 3.5 had to add that. Something like: if refault_rate(anonymous/mmap) refault_rate(pagecache) drop a pagecache page else drop either How about the opposite? If the page cache refault rate is way higher than the anonymous refault rate, did you favor page cache? Btw, just a higher fault rate will already make that cache grow faster than the other, simply because it will have more allocations than the other cache and they both get shrunk to some degree... You do have anonymous memory and mmapped executables in the same queue, right? Nope. It is very hard to see the difference between mmapped executables and mmapped data from any structure linked off the struct page... -- All Rights Reversed - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] split file and anonymous page queues #3
Matt Mackall wrote:
> On Tue, Mar 20, 2007 at 06:08:10PM -0400, Rik van Riel wrote:
>> -	"Active:        %8lu kB\n"
>> -	"Inactive:      %8lu kB\n"
> ...
>> +	"Active(anon):   %8lu kB\n"
>> +	"Inactive(anon): %8lu kB\n"
>> +	"Active(file):   %8lu kB\n"
>> +	"Inactive(file): %8lu kB\n"
>
> Potentially incompatible change. How about preserving the original
> fields (by totalling), then adding the other fields in a second patch.

Fixed in the attached patch.

>>	if (!pagevec_add(&lru_pvec, page))
>> -		__pagevec_lru_add(&lru_pvec);
>> +		__pagevec_lru_add_file(&lru_pvec);
>
> Wouldn't lru_file_add or file_lru_add be a better name? If the object
> is a "file lru" then sticking "add" in the middle is a little ugly.

Not sure about this one. Does anybody else have an opinion here?

>>	spin_lock_irq(&zone->lru_lock);
>>	if (PageLRU(page) && !PageActive(page)) {
>> -		del_page_from_inactive_list(zone, page);
>> +		if (page_anon(page)) {
>> +			del_page_from_inactive_anon_list(zone, page);
>>			SetPageActive(page);
>> -			add_page_to_active_list(zone, page);
>> +			add_page_to_active_anon_list(zone, page);
>> +		} else {
>> +			del_page_from_inactive_file_list(zone, page);
>> +			SetPageActive(page);
>> +			add_page_to_active_file_list(zone, page);
>> +		}
>>		__count_vm_event(PGACTIVATE);
>>	}
>
> Missing a level of indentation.

Fixed.

--
All Rights Reversed

--- linux-2.6.20.x86_64/fs/proc/proc_misc.c.vmsplit	2007-03-19 20:09:22.0 -0400
+++ linux-2.6.20.x86_64/fs/proc/proc_misc.c	2007-03-21 15:10:25.0 -0400
@@ -147,43 +147,53 @@ static int meminfo_read_proc(char *page,
 	 * Tagged format, for easy grepping and expansion.
 	 */
 	len = sprintf(page,
-		"MemTotal:     %8lu kB\n"
-		"MemFree:      %8lu kB\n"
-		"Buffers:      %8lu kB\n"
-		"Cached:       %8lu kB\n"
-		"SwapCached:   %8lu kB\n"
-		"Active:       %8lu kB\n"
-		"Inactive:     %8lu kB\n"
+		"MemTotal:       %8lu kB\n"
+		"MemFree:        %8lu kB\n"
+		"Buffers:        %8lu kB\n"
+		"Cached:         %8lu kB\n"
+		"SwapCached:     %8lu kB\n"
+		"Active:         %8lu kB\n"
+		"Inactive:       %8lu kB\n"
+		"Active(anon):   %8lu kB\n"
+		"Inactive(anon): %8lu kB\n"
+		"Active(file):   %8lu kB\n"
+		"Inactive(file): %8lu kB\n"
 #ifdef CONFIG_HIGHMEM
-		"HighTotal:    %8lu kB\n"
-		"HighFree:     %8lu kB\n"
-		"LowTotal:     %8lu kB\n"
-		"LowFree:      %8lu kB\n"
-#endif
-		"SwapTotal:    %8lu kB\n"
-		"SwapFree:     %8lu kB\n"
-		"Dirty:        %8lu kB\n"
-		"Writeback:    %8lu kB\n"
-		"AnonPages:    %8lu kB\n"
-		"Mapped:       %8lu kB\n"
-		"Slab:         %8lu kB\n"
-		"SReclaimable: %8lu kB\n"
-		"SUnreclaim:   %8lu kB\n"
-		"PageTables:   %8lu kB\n"
-		"NFS_Unstable: %8lu kB\n"
-		"Bounce:       %8lu kB\n"
-		"CommitLimit:  %8lu kB\n"
-		"Committed_AS: %8lu kB\n"
-		"VmallocTotal: %8lu kB\n"
-		"VmallocUsed:  %8lu kB\n"
-		"VmallocChunk: %8lu kB\n",
+		"HighTotal:      %8lu kB\n"
+		"HighFree:       %8lu kB\n"
+		"LowTotal:       %8lu kB\n"
+		"LowFree:        %8lu kB\n"
+#endif
+		"SwapTotal:      %8lu kB\n"
+		"SwapFree:       %8lu kB\n"
+		"Dirty:          %8lu kB\n"
+		"Writeback:      %8lu kB\n"
+		"AnonPages:      %8lu kB\n"
+		"Mapped:         %8lu kB\n"
+		"Slab:           %8lu kB\n"
+		"SReclaimable:   %8lu kB\n"
+		"SUnreclaim:     %8lu kB\n"
+		"PageTables:     %8lu kB\n"
+		"NFS_Unstable:   %8lu kB\n"
+		"Bounce:         %8lu kB\n"
+		"CommitLimit:    %8lu kB\n"
+		"Committed_AS:   %8lu kB\n"
+		"VmallocTotal:   %8lu kB\n"
+		"VmallocUsed:    %8lu kB\n"
+		"VmallocChunk:   %8lu kB\n",
 		K(i.totalram),
 		K(i.freeram),
 		K(i.bufferram),
 		K(cached),
 		K(total_swapcache_pages),
-		K(global_page_state(NR_ACTIVE)),
-		K(global_page_state(NR_INACTIVE)),
+		K(global_page_state(NR_ACTIVE_ANON) +
+		  global_page_state(NR_ACTIVE_FILE)),
+		K(global_page_state(NR_INACTIVE_ANON) +
+		  global_page_state(NR_INACTIVE_FILE)),
+		K(global_page_state(NR_ACTIVE_ANON)),
+		K(global_page_state(NR_INACTIVE_ANON)),
+		K(global_page_state(NR_ACTIVE_FILE)),
+		K(global_page_state(NR_INACTIVE_FILE)),
 #ifdef CONFIG_HIGHMEM
 		K(i.totalhigh),
 		K(i.freehigh),
--- linux-2.6.20.x86_64/fs/mpage.c.vmsplit	2007-02-04 13:44:54.0 -0500
+++ linux-2.6.20.x86_64/fs/mpage.c	2007-03-19 20:09:36.0 -0400
@@ -408,12 +408,12 @@ mpage_readpages(struct address_space *ma
 					first_logical_block, get_block);
 			if (!pagevec_add(&lru_pvec, page))
-				__pagevec_lru_add(&lru_pvec);
+				__pagevec_lru_add_file(&lru_pvec);
 		} else {
 			page_cache_release(page);
 		}
 	}
-	pagevec_lru_add(&lru_pvec);
+	pagevec_lru_add_file(&lru_pvec);
 	BUG_ON(!list_empty(pages));
 	if (bio)
 		mpage_bio_submit(READ, bio);
--- linux-2.6.20.x86_64/fs/cifs/file.c.vmsplit	2007-03-19 20:09:21.0 -0400
+++ linux-2.6.20.x86_64/fs/cifs/file.c	2007-03-19 20:09:36.0 -0400
@@ -1746,7 +1746,7 @@ static void cifs_copy_cache_pages(struct
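Matt's backward-compatibility request — keep the legacy Active/Inactive fields as totals of the new split counters — boils down to simple sums, as the proc_misc.c hunk above does with global_page_state(). A minimal user-space sketch of the same idea (the struct and function names here are illustrative, not the kernel's):

```c
#include <assert.h>

/* Illustrative mirror of the four split LRU counters. */
struct lru_stats {
	unsigned long active_anon;
	unsigned long inactive_anon;
	unsigned long active_file;
	unsigned long inactive_file;
};

/* Legacy "Active:" field: total of both active lists. */
unsigned long total_active(const struct lru_stats *s)
{
	return s->active_anon + s->active_file;
}

/* Legacy "Inactive:" field: total of both inactive lists. */
unsigned long total_inactive(const struct lru_stats *s)
{
	return s->inactive_anon + s->inactive_file;
}
```

Existing /proc/meminfo consumers keep parsing the summed fields unchanged, while new tools can read the four per-type fields.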
Re: [RFC][PATCH] split file and anonymous page queues #3
Rik van Riel writes:
> Nikita Danilov wrote:
>> Generally speaking, multi-queue replacement mechanisms were tried in
>> the past, and they all suffer from a common drawback: once the
>> scanning rate is different for different queues, so is the notion of
>> "hotness" as measured by the scanner. As a result, a multi-queue
>> scanner fails to capture the working set properly.
>
> You realize that the current "single" queue in the 2.6 kernel
> has this problem in a much worse way: when swappiness is low
> and the kernel does not want to reclaim mapped pages, it will
> randomly rotate those pages around the list.

Agree. Some time ago I tried to solve this very problem with the
dont-rotate-active-list patch
(http://linuxhacker.ru/~nikita/patches/2.6.12-rc6/2005.06.11/vm_03-dont-rotate-active-list.patch),
but it had problems of its own.

> In addition, the referenced bit on unmapped page cache pages
> was ignored completely, making it impossible for the VM to
> separate the page cache working set from transient pages due
> to streaming IO.

Yes, basically FIFO for clean file system pages and FIFO-second-chance
for dirty file pages. Very bad.

> I agree that we should put some more negative feedback in
> place if it turns out we need it. I have refault code ready
> that can be plugged into this patch, but I don't want to add
> the overhead of such code if it turns out we do not actually
> need it.

In my humble opinion the VM already has too many mechanisms that are
supposed to help in corner cases, and there is little to be done about
that, except for a major rewrite.

Nikita.
Re: [RFC][PATCH] split file and anonymous page queues #3
On Tue, Mar 20, 2007 at 06:08:10PM -0400, Rik van Riel wrote:
> -	"Active:        %8lu kB\n"
> -	"Inactive:      %8lu kB\n"
...
> +	"Active(anon):   %8lu kB\n"
> +	"Inactive(anon): %8lu kB\n"
> +	"Active(file):   %8lu kB\n"
> +	"Inactive(file): %8lu kB\n"

Potentially incompatible change. How about preserving the original
fields (by totalling), then adding the other fields in a second patch.

>	if (!pagevec_add(&lru_pvec, page))
> -		__pagevec_lru_add(&lru_pvec);
> +		__pagevec_lru_add_file(&lru_pvec);

Wouldn't lru_file_add or file_lru_add be a better name? If the object
is a "file lru" then sticking "add" in the middle is a little ugly.

>	spin_lock_irq(&zone->lru_lock);
>	if (PageLRU(page) && !PageActive(page)) {
> -		del_page_from_inactive_list(zone, page);
> +		if (page_anon(page)) {
> +			del_page_from_inactive_anon_list(zone, page);
>			SetPageActive(page);
> -			add_page_to_active_list(zone, page);
> +			add_page_to_active_anon_list(zone, page);
> +		} else {
> +			del_page_from_inactive_file_list(zone, page);
> +			SetPageActive(page);
> +			add_page_to_active_file_list(zone, page);
> +		}
>		__count_vm_event(PGACTIVATE);
>	}

Missing a level of indentation.

--
Mathematics is the supreme nostalgia of our time.