Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Nikita Danilov
Rik van Riel writes:
 > Nikita Danilov wrote:
 > 
 > > Generally speaking, multi-queue replacement mechanisms were tried in the
 > > past, and they all suffer from a common drawback: once the scanning rate
 > > differs between queues, so does the notion of "hotness" as measured by
 > > the scanner. As a result, a multi-queue scanner fails to capture the
 > > working set properly.
 > 
 > You realize that the current "single" queue in the 2.6 kernel
 > has this problem in a much worse way: when swappiness is low
 > and the kernel does not want to reclaim mapped pages, it will
 > randomly rotate those pages around the list.

Agreed. Some time ago I tried to solve this very problem with the
dont-rotate-active-list patch
(http://linuxhacker.ru/~nikita/patches/2.6.12-rc6/2005.06.11/vm_03-dont-rotate-active-list.patch),
but it had problems of its own.

 > 
 > In addition, the referenced bit on unmapped page cache pages
 > was ignored completely, making it impossible for the VM to
 > separate the page cache working set from transient pages due
 > to streaming IO.

Yes, basically FIFO for clean file system pages and FIFO-second-chance
for dirty file pages. Very bad.

 > 
 > I agree that we should put some more negative feedback in
 > place if it turns out we need it.  I have refault code ready
 > that can be plugged into this patch, but I don't want to add
 > the overhead of such code if it turns out we do not actually
 > need it.

In my humble opinion, the VM already has too many mechanisms that are
supposed to help in corner cases, but there is little that can be done
about that short of a major rewrite.

Nikita.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Rik van Riel

Matt Mackall wrote:
> On Tue, Mar 20, 2007 at 06:08:10PM -0400, Rik van Riel wrote:
>> -   "Active:   %8lu kB\n"
>> -   "Inactive: %8lu kB\n"
>> ...
>> +   "Active(anon):   %8lu kB\n"
>> +   "Inactive(anon): %8lu kB\n"
>> +   "Active(file):   %8lu kB\n"
>> +   "Inactive(file): %8lu kB\n"
>
> Potentially incompatible change. How about preserving the original
> fields (by totalling), then adding the other fields in a second patch.


Fixed in the attached patch.
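
The totalling idea can be sketched in user-space C as follows. This is only an illustration of the output layout, not the kernel code: struct lru_counts and format_lru_fields() are hypothetical names, and the counters are assumed to be already converted to kB.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the four per-list counters, in kB. */
struct lru_counts {
	unsigned long active_anon;
	unsigned long inactive_anon;
	unsigned long active_file;
	unsigned long inactive_file;
};

/*
 * Emit the legacy "Active"/"Inactive" fields as sums of the split
 * counters, followed by the four new fields, so that existing
 * parsers keep seeing the totals they expect.
 * Returns the value of snprintf().
 */
int format_lru_fields(char *buf, size_t len, const struct lru_counts *c)
{
	assert(buf != NULL && c != NULL);
	return snprintf(buf, len,
		"Active:         %8lu kB\n"
		"Inactive:       %8lu kB\n"
		"Active(anon):   %8lu kB\n"
		"Inactive(anon): %8lu kB\n"
		"Active(file):   %8lu kB\n"
		"Inactive(file): %8lu kB\n",
		c->active_anon + c->active_file,
		c->inactive_anon + c->inactive_file,
		c->active_anon,
		c->inactive_anon,
		c->active_file,
		c->inactive_file);
}
```

A tool that only reads "Active:" and "Inactive:" sees the same totals as before, while newer tools can pick up the per-type breakdown.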


>> if (!pagevec_add(&lru_pvec, page))
>> -   __pagevec_lru_add(&lru_pvec);
>> +   __pagevec_lru_add_file(&lru_pvec);
>
> Wouldn't lru_file_add or file_lru_add be a better name? If the object
> is a "file lru" then sticking "add" in the middle is a little ugly.


Not sure about this one.  Does anybody else have an opinion here?


>> spin_lock_irq(&zone->lru_lock);
>> if (PageLRU(page) && !PageActive(page)) {
>> -   del_page_from_inactive_list(zone, page);
>> +   if (page_anon(page)) {
>> +   del_page_from_inactive_anon_list(zone,page);
>> SetPageActive(page);
>> -   add_page_to_active_list(zone, page);
>> +   add_page_to_active_anon_list(zone, page);
>> +   } else {
>> +   del_page_from_inactive_file_list(zone, page);
>> +   SetPageActive(page);
>> +   add_page_to_active_file_list(zone, page);
>> +   }
>> __count_vm_event(PGACTIVATE);
>> }
>
> Missing a level of indentation.


Fixed.

--
All Rights Reversed
--- linux-2.6.20.x86_64/fs/proc/proc_misc.c.vmsplit	2007-03-19 20:09:22.0 -0400
+++ linux-2.6.20.x86_64/fs/proc/proc_misc.c	2007-03-21 15:10:25.0 -0400
@@ -147,43 +147,53 @@ static int meminfo_read_proc(char *page,
 	 * Tagged format, for easy grepping and expansion.
 	 */
 	len = sprintf(page,
-		"MemTotal: %8lu kB\n"
-		"MemFree:  %8lu kB\n"
-		"Buffers:  %8lu kB\n"
-		"Cached:   %8lu kB\n"
-		"SwapCached:   %8lu kB\n"
-		"Active:   %8lu kB\n"
-		"Inactive: %8lu kB\n"
+		"MemTotal:   %8lu kB\n"
+		"MemFree:%8lu kB\n"
+		"Buffers:%8lu kB\n"
+		"Cached: %8lu kB\n"
+		"SwapCached: %8lu kB\n"
+		"Active: %8lu kB\n"
+		"Inactive:   %8lu kB\n"
+		"Active(anon):   %8lu kB\n"
+		"Inactive(anon): %8lu kB\n"
+		"Active(file):   %8lu kB\n"
+		"Inactive(file): %8lu kB\n"
 #ifdef CONFIG_HIGHMEM
-		"HighTotal:%8lu kB\n"
-		"HighFree: %8lu kB\n"
-		"LowTotal: %8lu kB\n"
-		"LowFree:  %8lu kB\n"
-#endif
-		"SwapTotal:%8lu kB\n"
-		"SwapFree: %8lu kB\n"
-		"Dirty:%8lu kB\n"
-		"Writeback:%8lu kB\n"
-		"AnonPages:%8lu kB\n"
-		"Mapped:   %8lu kB\n"
-		"Slab: %8lu kB\n"
-		"SReclaimable: %8lu kB\n"
-		"SUnreclaim:   %8lu kB\n"
-		"PageTables:   %8lu kB\n"
-		"NFS_Unstable: %8lu kB\n"
-		"Bounce:   %8lu kB\n"
-		"CommitLimit:  %8lu kB\n"
-		"Committed_AS: %8lu kB\n"
-		"VmallocTotal: %8lu kB\n"
-		"VmallocUsed:  %8lu kB\n"
-		"VmallocChunk: %8lu kB\n",
+		"HighTotal:  %8lu kB\n"
+		"HighFree:   %8lu kB\n"
+		"LowTotal:   %8lu kB\n"
+		"LowFree:%8lu kB\n"
+#endif
+		"SwapTotal:  %8lu kB\n"
+		"SwapFree:   %8lu kB\n"
+		"Dirty:  %8lu kB\n"
+		"Writeback:  %8lu kB\n"
+		"AnonPages:  %8lu kB\n"
+		"Mapped: %8lu kB\n"
+		"Slab:   %8lu kB\n"
+		"SReclaimable:   %8lu kB\n"
+		"SUnreclaim: %8lu kB\n"
+		"PageTables: %8lu kB\n"
+		"NFS_Unstable:   %8lu kB\n"
+		"Bounce: %8lu kB\n"
+		"CommitLimit:%8lu kB\n"
+		"Committed_AS:   %8lu kB\n"
+		"VmallocTotal:   %8lu kB\n"
+		"VmallocUsed:%8lu kB\n"
+		"VmallocChunk:   %8lu kB\n",
 		K(i.totalram),
 		K(i.freeram),
 		K(i.bufferram),
 		K(cached),
 		K(total_swapcache_pages),
-		K(global_page_state(NR_ACTIVE)),
-		K(global_page_state(NR_INACTIVE)),
+		K(global_page_state(NR_ACTIVE_ANON) +
+			global_page_state(NR_ACTIVE_FILE)),
+		K(global_page_state(NR_INACTIVE_ANON) +
+			global_page_state(NR_INACTIVE_FILE)),
+		K(global_page_state(NR_ACTIVE_ANON)),
+		K(global_page_state(NR_INACTIVE_ANON)),
+		K(global_page_state(NR_ACTIVE_FILE)),
+		K(global_page_state(NR_INACTIVE_FILE)),
 #ifdef CONFIG_HIGHMEM
 		K(i.totalhigh),
 		K(i.freehigh),
--- linux-2.6.20.x86_64/fs/mpage.c.vmsplit	2007-02-04 13:44:54.0 -0500
+++ linux-2.6.20.x86_64/fs/mpage.c	2007-03-19 20:09:36.0 -0400
@@ -408,12 +408,12 @@ mpage_readpages(struct address_space *ma
 					&first_logical_block,
 					get_block);
 			if (!pagevec_add(&lru_pvec, page))
-				__pagevec_lru_add(&lru_pvec);
+				__pagevec_lru_add_file(&lru_pvec);
 		} else {
 			page_cache_release(page);
 		}
 	}
-	pagevec_lru_add(&lru_pvec);
+	pagevec_lru_add_file(&lru_pvec);
 	BUG_ON(!list_empty(pages));
 	if (bio)
 		mpage_bio_submit(READ, bio);
--- linux-2.6.20.x86_64/fs/cifs/file.c.vmsplit	2007-03-19 20:09:21.0 -0400
+++ linux-2.6.20.x86_64/fs/cifs/file.c	2007-03-19 

Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Rik van Riel

Chuck Ebbert wrote:
> I think you're going to have to use refault rates. AIX 3.5 had
> to add that. Something like:
>
> if refault_rate(anonymous/mmap) > refault_rate(pagecache)
>    drop a pagecache page
> else
>    drop either

How about the opposite?

If the page cache refault rate is way higher than the
anonymous refault rate, did you favor page cache?

Btw, just a higher fault rate will already make that
cache grow faster than the other, simply because it
will have more allocations than the other cache and
they both get shrunk to some degree...

> You do have anonymous memory and mmapped executables in the same
> queue, right?

Nope.  It is very hard to see the difference between mmapped
executables and mmapped data from any structure linked off
the struct page...

--
All Rights Reversed


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Chuck Ebbert
Rik van Riel wrote:
> Nikita Danilov wrote:
> 
>> Probably I am missing something, but I don't see how that can help. For
>> example, suppose (for simplicity) that we have swappiness of 100%, and
>> that the fraction of referenced anon pages becomes slightly lower than
>> that of file pages. get_scan_ratio() increases anon_percent, and
>> shrink_zone() starts scanning the anon queue more aggressively. As a
>> result, pages spend less time there and have less chance of ever being
>> accessed, reducing the fraction of referenced anon pages further and
>> triggering a further increase in the amount of scanning, etc. Doesn't
>> this introduce a positive feedback loop?
> 
> It's a possibility, but I don't think it will be much of an
> issue in practice.
> 
> If it is, we can always use refaults as a correcting
> mechanism - which would have the added benefit of being
> able to do streaming IO without putting any pressure on
> the active list, essentially clock-pro replacement with
> just some tweaks to shrink_list()...
> 

I think you're going to have to use refault rates. AIX 3.5 had
to add that. Something like:

if refault_rate(anonymous/mmap) > refault_rate(pagecache)
   drop a pagecache page
else
   drop either

You do have anonymous memory and mmapped executables in the same
queue, right?




Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Rik van Riel

Nikita Danilov wrote:
> Generally speaking, multi-queue replacement mechanisms were tried in the
> past, and they all suffer from a common drawback: once the scanning rate
> differs between queues, so does the notion of "hotness" as measured by
> the scanner. As a result, a multi-queue scanner fails to capture the
> working set properly.


You realize that the current "single" queue in the 2.6 kernel
has this problem in a much worse way: when swappiness is low
and the kernel does not want to reclaim mapped pages, it will
randomly rotate those pages around the list.

In addition, the referenced bit on unmapped page cache pages
was ignored completely, making it impossible for the VM to
separate the page cache working set from transient pages due
to streaming IO.

I agree that we should put some more negative feedback in
place if it turns out we need it.  I have refault code ready
that can be plugged into this patch, but I don't want to add
the overhead of such code if it turns out we do not actually
need it.

--
All Rights Reversed


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Nikita Danilov
Rik van Riel writes:
 > Rik van Riel wrote:
 > > Nikita Danilov wrote:
 > > 
 > >> Probably I am missing something, but I don't see how that can help. For
 > >> example, suppose (for simplicity) that we have swappiness of 100%, and
 > >> that the fraction of referenced anon pages becomes slightly lower than
 > >> that of file pages. get_scan_ratio() increases anon_percent, and
 > >> shrink_zone() starts scanning the anon queue more aggressively. As a
 > >> result, pages spend less time there and have less chance of ever being
 > >> accessed, reducing the fraction of referenced anon pages further and
 > >> triggering a further increase in the amount of scanning, etc. Doesn't
 > >> this introduce a positive feedback loop?
 > > 
 > > It's a possibility, but I don't think it will be much of an
 > > issue in practice.
 > > 
 > > If it is, we can always use refaults as a correcting
 > > mechanism - which would have the added benefit of being
 > > able to do streaming IO without putting any pressure on
 > > the active list, essentially clock-pro replacement with
 > > just some tweaks to shrink_list()...
 > 
 > As an aside, due to the use-once algorithm file pages are at a
 > natural disadvantage already.  I believe it would be really
 > hard to construct a workload where anon pages suffer the positive
 > feedback loop you describe...

That scenario works for file queues too. Of course, all this is just
theoretical speculation at this point, but I am concerned that

 - that loop would tend to happen under various boundary conditions,
   making it hard to isolate, diagnose, and debug, and

 - long before it becomes explicitly visible (say, as excessive CPU
   consumption by the scanner), it would ruin global LRU ordering,
   degrading overall performance.

Generally speaking, multi-queue replacement mechanisms were tried in the
past, and they all suffer from a common drawback: once the scanning rate
differs between queues, so does the notion of "hotness" as measured by
the scanner. As a result, a multi-queue scanner fails to capture the
working set properly.

Nikita.


 > 
 > -- 
 > Politics is the struggle between those who want to make their country
 > the best in the world, and those who believe it already is.  Each group
 > calls the other unpatriotic.



Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Rik van Riel

Rik van Riel wrote:
> Nikita Danilov wrote:
>> Probably I am missing something, but I don't see how that can help. For
>> example, suppose (for simplicity) that we have swappiness of 100%, and
>> that the fraction of referenced anon pages becomes slightly lower than
>> that of file pages. get_scan_ratio() increases anon_percent, and
>> shrink_zone() starts scanning the anon queue more aggressively. As a
>> result, pages spend less time there and have less chance of ever being
>> accessed, reducing the fraction of referenced anon pages further and
>> triggering a further increase in the amount of scanning, etc. Doesn't
>> this introduce a positive feedback loop?
>
> It's a possibility, but I don't think it will be much of an
> issue in practice.
>
> If it is, we can always use refaults as a correcting
> mechanism - which would have the added benefit of being
> able to do streaming IO without putting any pressure on
> the active list, essentially clock-pro replacement with
> just some tweaks to shrink_list()...


As an aside, due to the use-once algorithm file pages are at a
natural disadvantage already.  I believe it would be really
hard to construct a workload where anon pages suffer the positive
feedback loop you describe...

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Rik van Riel

Nikita Danilov wrote:
> Probably I am missing something, but I don't see how that can help. For
> example, suppose (for simplicity) that we have swappiness of 100%, and
> that the fraction of referenced anon pages becomes slightly lower than
> that of file pages. get_scan_ratio() increases anon_percent, and
> shrink_zone() starts scanning the anon queue more aggressively. As a
> result, pages spend less time there and have less chance of ever being
> accessed, reducing the fraction of referenced anon pages further and
> triggering a further increase in the amount of scanning, etc. Doesn't
> this introduce a positive feedback loop?


It's a possibility, but I don't think it will be much of an
issue in practice.

If it is, we can always use refaults as a correcting
mechanism - which would have the added benefit of being
able to do streaming IO without putting any pressure on
the active list, essentially clock-pro replacement with
just some tweaks to shrink_list()...

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Nikita Danilov
Rik van Riel writes:
 > Nikita Danilov wrote:
 > > Rik van Riel writes:
 > >  > [ OK, I suck.  I edited yesterday's email with the new info, but forgot
 > >  >   to change the attachment to today's patch.  Here is today's patch. ]
 > >  > 
 > >  > Split the anonymous and file backed pages out onto their own pageout
 > >  > queues.  This way we do not unnecessarily churn through lots of anonymous
 > >  > pages when we do not want to swap them out anyway.
 > > 
 > > Won't this re-introduce problems similar to the ones caused by the split
 > > inactive_clean/inactive_dirty queues we had in the past?
 > > 
 > > For example, by rotating anon queues faster than file queues, the kernel
 > > would end up reclaiming anon pages that are hotter (in "absolute" LRU
 > > order) than some file pages.
 > 
 > That is why we check the fraction of referenced pages in each
 > queue.  Please look at the get_scan_ratio() and shrink_zone()
 > code in my patch.

Probably I am missing something, but I don't see how that can help. For
example, suppose (for simplicity) that we have swappiness of 100%, and
that the fraction of referenced anon pages becomes slightly lower than
that of file pages. get_scan_ratio() increases anon_percent, and
shrink_zone() starts scanning the anon queue more aggressively. As a
result, pages spend less time there and have less chance of ever being
accessed, reducing the fraction of referenced anon pages further and
triggering a further increase in the amount of scanning, etc. Doesn't
this introduce a positive feedback loop?

Nikita.
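
The concern above can be illustrated with a deliberately simple toy model in C. It is not the get_scan_ratio() code from the patch and makes no claim about how the patch behaves. All names and the linear formulas are assumptions chosen so that the feedback is easy to see: a queue's referenced fraction is assumed to fall linearly with its share of the scanning, and the balancer is assumed to shift scanning toward the queue with the smaller referenced fraction.

```c
#include <assert.h>

/*
 * Toy model: fraction of pages found referenced in a queue that gets
 * `share` (0..1) of all scanning. More scanning means less residence
 * time, so fewer pages are referenced before the scanner sees them.
 */
static double referenced_fraction(double hotness, double share)
{
	return hotness * (1.0 - share);
}

/*
 * One hypothetical rebalancing step: move scanning toward the queue
 * with the lower referenced fraction, then clamp to [0, 1].
 */
double next_anon_share(double anon_share, double hotness, double gain)
{
	assert(anon_share >= 0.0 && anon_share <= 1.0);

	double ref_anon = referenced_fraction(hotness, anon_share);
	double ref_file = referenced_fraction(hotness, 1.0 - anon_share);
	double next = anon_share + gain * (ref_file - ref_anon);

	if (next < 0.0)
		next = 0.0;
	if (next > 1.0)
		next = 1.0;
	return next;
}

/* Repeat the rebalancing step and return the final anon scan share. */
double simulate_anon_share(double start, double hotness, double gain,
			   int rounds)
{
	double share = start;

	for (int i = 0; i < rounds; i++)
		share = next_anon_share(share, hotness, gain);
	return share;
}
```

In this model a perfectly balanced start (anon share 0.5) stays balanced, but a start of 0.51 with hotness 0.5 and gain 0.5 drifts until all scanning goes to the anon queue, because each step multiplies the deviation from 0.5 by 1.5. Whether a real balancer runs away like this depends on how strongly the referenced fraction reacts to scanning and on any negative feedback (such as refault tracking) that is added; the model only shows the shape of the loop.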



Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Rik van Riel

Nikita Danilov wrote:
> Rik van Riel writes:
>  > [ OK, I suck.  I edited yesterday's email with the new info, but forgot
>  >   to change the attachment to today's patch.  Here is today's patch. ]
>  >
>  > Split the anonymous and file backed pages out onto their own pageout
>  > queues.  This way we do not unnecessarily churn through lots of anonymous
>  > pages when we do not want to swap them out anyway.
>
> Won't this re-introduce problems similar to the ones caused by the split
> inactive_clean/inactive_dirty queues we had in the past?
>
> For example, by rotating anon queues faster than file queues, the kernel
> would end up reclaiming anon pages that are hotter (in "absolute" LRU
> order) than some file pages.


That is why we check the fraction of referenced pages in each
queue.  Please look at the get_scan_ratio() and shrink_zone()
code in my patch.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Nikita Danilov
Rik van Riel writes:
 > [ OK, I suck.  I edited yesterday's email with the new info, but forgot
 >   to change the attachment to today's patch.  Here is today's patch. ]
 > 
 > Split the anonymous and file backed pages out onto their own pageout
 > queues.  This way we do not unnecessarily churn through lots of anonymous
 > pages when we do not want to swap them out anyway.

Won't this re-introduce problems similar to the ones caused by the split
inactive_clean/inactive_dirty queues we had in the past?

For example, by rotating anon queues faster than file queues, the kernel
would end up reclaiming anon pages that are hotter (in "absolute" LRU
order) than some file pages.

Nikita.



Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Nikita Danilov
Rik van Riel writes:
  [ OK, I suck.  I edited yesterday's email with the new info, but forgot
 to change the attachment to today's patch.  Here is today's patch. ]
  
  Split the anonymous and file backed pages out onto their own pageout
  queues.  This we do not unnecessarily churn through lots of anonymous
  pages when we do not want to swap them out anyway.

Won't this re-introduce problems similar to ones due to split
inactive_clean/inactive_dirty queues we had in the past?

For example, by rotating anon queues faster than file queues, kernel
would end up reclaiming anon pages that are hotter (in absolute LRU
order) than some file pages.

Nikita.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Rik van Riel

Nikita Danilov wrote:

Rik van Riel writes:
  [ OK, I suck.  I edited yesterday's email with the new info, but forgot
 to change the attachment to today's patch.  Here is today's patch. ]
  
  Split the anonymous and file backed pages out onto their own pageout

  queues.  This we do not unnecessarily churn through lots of anonymous
  pages when we do not want to swap them out anyway.

Won't this re-introduce problems similar to ones due to split
inactive_clean/inactive_dirty queues we had in the past?

For example, by rotating anon queues faster than file queues, kernel
would end up reclaiming anon pages that are hotter (in absolute LRU
order) than some file pages.


That is why we check the fraction of referenced pages in each
queue.  Please look at the get_scan_ratio() and shrink_zone()
code in my patch.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Nikita Danilov
Rik van Riel writes:
  Nikita Danilov wrote:
   Rik van Riel writes:
 [ OK, I suck.  I edited yesterday's email with the new info, but forgot
to change the attachment to today's patch.  Here is today's patch. ]
 
 Split the anonymous and file backed pages out onto their own pageout
 queues.  This we do not unnecessarily churn through lots of anonymous
 pages when we do not want to swap them out anyway.
   
   Won't this re-introduce problems similar to ones due to split
   inactive_clean/inactive_dirty queues we had in the past?
   
   For example, by rotating anon queues faster than file queues, kernel
   would end up reclaiming anon pages that are hotter (in absolute LRU
   order) than some file pages.
  
  That is why we check the fraction of referenced pages in each
  queue.  Please look at the get_scan_ratio() and shrink_zone()
  code in my patch.

Probably I am missing something, but I don't see how that can help. For
example, suppose (for simplicity) that we have swappiness of 100%, and
that fraction of referenced anon pages gets slightly less than of file
pages. get_scan_ratio() increases anon_percent, and shrink_zone() starts
scanning anon queue more aggressively. As a result, pages spend less
time there, and have less chance of ever being accessed, reducing
fraction of referenced anon pages further, and triggering further
increase in the amount of scanning, etc. Doesn't this introduce positive
feed-back loop?

Nikita.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Rik van Riel

Nikita Danilov wrote:


Probably I am missing something, but I don't see how that can help. For
example, suppose (for simplicity) that we have swappiness of 100%, and
that fraction of referenced anon pages gets slightly less than of file
pages. get_scan_ratio() increases anon_percent, and shrink_zone() starts
scanning anon queue more aggressively. As a result, pages spend less
time there, and have less chance of ever being accessed, reducing
fraction of referenced anon pages further, and triggering further
increase in the amount of scanning, etc. Doesn't this introduce positive
feed-back loop?


It's a possibility, but I don't think it will be much of an
issue in practice.

If it is, we can always use refaults as a correcting
mechanism - which would have the added benefit of being
able to do streaming IO without putting any pressure on
the active list, essentially clock-pro replacement with
just some tweaks to shrink_list()...

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Rik van Riel

Rik van Riel wrote:

Nikita Danilov wrote:


Probably I am missing something, but I don't see how that can help. For
example, suppose (for simplicity) that we have swappiness of 100%, and
that fraction of referenced anon pages gets slightly less than of file
pages. get_scan_ratio() increases anon_percent, and shrink_zone() starts
scanning anon queue more aggressively. As a result, pages spend less
time there, and have less chance of ever being accessed, reducing
fraction of referenced anon pages further, and triggering further
increase in the amount of scanning, etc. Doesn't this introduce positive
feed-back loop?


It's a possibility, but I don't think it will be much of an
issue in practice.

If it is, we can always use refaults as a correcting
mechanism - which would have the added benefit of being
able to do streaming IO without putting any pressure on
the active list, essentially clock-pro replacement with
just some tweaks to shrink_list()...


As an aside, due to the use-once algorithm file pages are at a
natural disadvantage already.  I believe it would be really
hard to construct a workload where anon pages suffer the positive
feedback loop you describe...

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Nikita Danilov
Rik van Riel writes:
  Rik van Riel wrote:
   Nikita Danilov wrote:
   
   Probably I am missing something, but I don't see how that can help. For
   example, suppose (for simplicity) that we have swappiness of 100%, and
   that fraction of referenced anon pages gets slightly less than of file
   pages. get_scan_ratio() increases anon_percent, and shrink_zone() starts
   scanning anon queue more aggressively. As a result, pages spend less
   time there, and have less chance of ever being accessed, reducing
   fraction of referenced anon pages further, and triggering further
   increase in the amount of scanning, etc. Doesn't this introduce positive
   feed-back loop?
   
   It's a possibility, but I don't think it will be much of an
   issue in practice.
   
   If it is, we can always use refaults as a correcting
   mechanism - which would have the added benefit of being
   able to do streaming IO without putting any pressure on
   the active list, essentially clock-pro replacement with
   just some tweaks to shrink_list()...
  
  As an aside, due to the use-once algorithm file pages are at a
  natural disadvantage already.  I believe it would be really
  hard to construct a workload where anon pages suffer the positive
  feedback loop you describe...

That scenario works for file queues too. Of course, all this is but a
theoretical speculation at this point, but I am concerned that

 - that loop would tend to happen under various border conditions,
 making it hard to isolate, diagnose, and debug, and

 - long before it becomes explicitly visible (say, as an excessive cpu
 consumption by scanner), it would ruin global lru ordering, degrading
 overall performance.

Generally speaking, multi-queue replacement mechanisms were tried in the
past, and they all suffer from the common drawback: once scanning rate
is different for different queues, so is the notion of hotness,
measured by scanner. As a result multi-queue scanner fails to capture
working set properly.

Nikita.


  
  -- 
  Politics is the struggle between those who want to make their country
  the best in the world, and those who believe it already is.  Each group
  calls the other unpatriotic.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Rik van Riel

Nikita Danilov wrote:


Generally speaking, multi-queue replacement mechanisms were tried in the
past, and they all suffer from the common drawback: once scanning rate
is different for different queues, so is the notion of hotness,
measured by scanner. As a result multi-queue scanner fails to capture
working set properly.


You realize that the current single queue in the 2.6 kernel
has this problem in a much worse way: when swappiness is low
and the kernel does not want to reclaim mapped pages, it will
randomly rotate those pages around the list.

In addition, the referenced bit on unmapped page cache pages
was ignored completely, making it impossible for the VM to
separate the page cache working set from transient pages due
to streaming IO.

I agree that we should put some more negative feedback in
place if it turns out we need it.  I have refault code ready
that can be plugged into this patch, but I don't want to add
the overhead of such code if it turns out we do not actually
need it.

--
All Rights Reversed
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Chuck Ebbert
Rik van Riel wrote:
> Nikita Danilov wrote:
>
>> Probably I am missing something, but I don't see how that can help. For
>> example, suppose (for simplicity) that we have swappiness of 100%, and
>> that fraction of referenced anon pages gets slightly less than of file
>> pages. get_scan_ratio() increases anon_percent, and shrink_zone() starts
>> scanning anon queue more aggressively. As a result, pages spend less
>> time there, and have less chance of ever being accessed, reducing
>> fraction of referenced anon pages further, and triggering further
>> increase in the amount of scanning, etc. Doesn't this introduce positive
>> feed-back loop?
>
> It's a possibility, but I don't think it will be much of an
> issue in practice.
>
> If it is, we can always use refaults as a correcting
> mechanism - which would have the added benefit of being
> able to do streaming IO without putting any pressure on
> the active list, essentially clock-pro replacement with
> just some tweaks to shrink_list()...
>

I think you're going to have to use refault rates. AIX 3.5 had
to add that. Something like:

if refault_rate(anonymous/mmap) > refault_rate(pagecache)
   drop a pagecache page
else
   drop either

You do have anonymous memory and mmapped executables in the same
queue, right?
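
The refault-rate rule above can be sketched in C roughly as follows. This is only an illustration of the pseudocode, not AIX or Linux code: struct refault_stats, enum reclaim_target and pick_reclaim_target() are hypothetical names, the comparison is read as "greater than", and both counters are assumed to be sampled over the same interval so that comparing counts is the same as comparing rates.

```c
/* Which pool the reclaimer should take its next victim from. */
enum reclaim_target {
	RECLAIM_PAGECACHE,	/* prefer dropping a page cache page */
	RECLAIM_EITHER		/* no preference between the pools */
};

/* Hypothetical refault counters, sampled over the same interval. */
struct refault_stats {
	unsigned long anon_refaults;	/* anonymous/mmap pages faulted back in */
	unsigned long file_refaults;	/* page cache pages read back in */
};

/*
 * If anonymous/mmap pages are refaulting more often than page cache
 * pages, evicting them is costing more, so drop page cache instead.
 * Otherwise either pool is an acceptable source of victims.
 */
enum reclaim_target pick_reclaim_target(const struct refault_stats *s)
{
	if (s->anon_refaults > s->file_refaults)
		return RECLAIM_PAGECACHE;
	return RECLAIM_EITHER;
}
```

Note that the rule is asymmetric as written: a high page cache refault rate never produces a preference for dropping anonymous pages.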




Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Rik van Riel

Chuck Ebbert wrote:

> I think you're going to have to use refault rates. AIX 3.5 had
> to add that. Something like:
>
> if refault_rate(anonymous/mmap) > refault_rate(pagecache)
>    drop a pagecache page
> else
>    drop either


How about the opposite?

If the page cache refault rate is way higher than the
anonymous refault rate, did you favor page cache?

Btw, just a higher fault rate will already make that
cache grow faster than the other, simply because it
will have more allocations than the other cache and
they both get shrunk to some degree...
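
A toy model of that last point, as a sketch only: two caches share a fixed number of pages, each allocates at its own constant rate, and reclaim then shrinks both in proportion to their size. The proportional shrink, the struct two_caches type and simulate_growth() are assumptions made for illustration, not how the kernel scans its lists.

```c
/* Sizes of the two caches, in pages (fractional for simplicity). */
struct two_caches {
	double a;
	double b;
};

/*
 * Start with the memory split evenly. On every step each cache
 * allocates its own number of new pages, then both are scaled down
 * by the same factor so that the total fits in `limit` again.
 */
struct two_caches simulate_growth(double limit, double alloc_a,
				  double alloc_b, int steps)
{
	struct two_caches c = { limit / 2.0, limit / 2.0 };

	for (int i = 0; i < steps; i++) {
		c.a += alloc_a;
		c.b += alloc_b;

		/* Proportional reclaim: shrink both by the same ratio. */
		double scale = limit / (c.a + c.b);
		c.a *= scale;
		c.b *= scale;
	}
	return c;
}
```

In this model the sizes settle at a split proportional to the allocation rates: with a 3:1 allocation ratio the first cache ends up with about three quarters of the memory, without any explicit refault-based preference.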


> You do have anonymous memory and mmapped executables in the same
> queue, right?


Nope.  It is very hard to see the difference between mmapped
executables and mmapped data from any structure linked off
the struct page...

--
All Rights Reversed


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-21 Thread Rik van Riel

Matt Mackall wrote:

> On Tue, Mar 20, 2007 at 06:08:10PM -0400, Rik van Riel wrote:
>> - "Active:   %8lu kB\n"
>> - "Inactive: %8lu kB\n"
> ...
>> + "Active(anon):   %8lu kB\n"
>> + "Inactive(anon): %8lu kB\n"
>> + "Active(file):   %8lu kB\n"
>> + "Inactive(file): %8lu kB\n"
>
> Potentially incompatible change. How about preserving the original
> fields (by totalling), then adding the other fields in a second patch.

Fixed in the attached patch.

>>   if (!pagevec_add(&lru_pvec, page))
>> - __pagevec_lru_add(&lru_pvec);
>> + __pagevec_lru_add_file(&lru_pvec);
>
> Wouldn't lru_file_add or file_lru_add be a better name? If the object
> is a "file lru" then sticking "add" in the middle is a little ugly.

Not sure about this one.  Does anybody else have an opinion here?

>>   spin_lock_irq(&zone->lru_lock);
>>   if (PageLRU(page) && !PageActive(page)) {
>> - del_page_from_inactive_list(zone, page);
>> + if (page_anon(page)) {
>> + del_page_from_inactive_anon_list(zone,page);
>>   SetPageActive(page);
>> - add_page_to_active_list(zone, page);
>> + add_page_to_active_anon_list(zone, page);
>> + } else {
>> + del_page_from_inactive_file_list(zone, page);
>> + SetPageActive(page);
>> + add_page_to_active_file_list(zone, page);
>> + }
>>   __count_vm_event(PGACTIVATE);
>>   }
>
> Missing a level of indentation.

Fixed.

--
All Rights Reversed
--- linux-2.6.20.x86_64/fs/proc/proc_misc.c.vmsplit	2007-03-19 20:09:22.000000000 -0400
+++ linux-2.6.20.x86_64/fs/proc/proc_misc.c	2007-03-21 15:10:25.000000000 -0400
@@ -147,43 +147,53 @@ static int meminfo_read_proc(char *page,
 	 * Tagged format, for easy grepping and expansion.
 	 */
 	len = sprintf(page,
-		"MemTotal:     %8lu kB\n"
-		"MemFree:      %8lu kB\n"
-		"Buffers:      %8lu kB\n"
-		"Cached:       %8lu kB\n"
-		"SwapCached:   %8lu kB\n"
-		"Active:       %8lu kB\n"
-		"Inactive:     %8lu kB\n"
+		"MemTotal:       %8lu kB\n"
+		"MemFree:        %8lu kB\n"
+		"Buffers:        %8lu kB\n"
+		"Cached:         %8lu kB\n"
+		"SwapCached:     %8lu kB\n"
+		"Active:         %8lu kB\n"
+		"Inactive:       %8lu kB\n"
+		"Active(anon):   %8lu kB\n"
+		"Inactive(anon): %8lu kB\n"
+		"Active(file):   %8lu kB\n"
+		"Inactive(file): %8lu kB\n"
 #ifdef CONFIG_HIGHMEM
-		"HighTotal:    %8lu kB\n"
-		"HighFree:     %8lu kB\n"
-		"LowTotal:     %8lu kB\n"
-		"LowFree:      %8lu kB\n"
-#endif
-		"SwapTotal:    %8lu kB\n"
-		"SwapFree:     %8lu kB\n"
-		"Dirty:        %8lu kB\n"
-		"Writeback:    %8lu kB\n"
-		"AnonPages:    %8lu kB\n"
-		"Mapped:       %8lu kB\n"
-		"Slab:         %8lu kB\n"
-		"SReclaimable: %8lu kB\n"
-		"SUnreclaim:   %8lu kB\n"
-		"PageTables:   %8lu kB\n"
-		"NFS_Unstable: %8lu kB\n"
-		"Bounce:       %8lu kB\n"
-		"CommitLimit:  %8lu kB\n"
-		"Committed_AS: %8lu kB\n"
-		"VmallocTotal: %8lu kB\n"
-		"VmallocUsed:  %8lu kB\n"
-		"VmallocChunk: %8lu kB\n",
+		"HighTotal:      %8lu kB\n"
+		"HighFree:       %8lu kB\n"
+		"LowTotal:       %8lu kB\n"
+		"LowFree:        %8lu kB\n"
+#endif
+		"SwapTotal:      %8lu kB\n"
+		"SwapFree:       %8lu kB\n"
+		"Dirty:          %8lu kB\n"
+		"Writeback:      %8lu kB\n"
+		"AnonPages:      %8lu kB\n"
+		"Mapped:         %8lu kB\n"
+		"Slab:           %8lu kB\n"
+		"SReclaimable:   %8lu kB\n"
+		"SUnreclaim:     %8lu kB\n"
+		"PageTables:     %8lu kB\n"
+		"NFS_Unstable:   %8lu kB\n"
+		"Bounce:         %8lu kB\n"
+		"CommitLimit:    %8lu kB\n"
+		"Committed_AS:   %8lu kB\n"
+		"VmallocTotal:   %8lu kB\n"
+		"VmallocUsed:    %8lu kB\n"
+		"VmallocChunk:   %8lu kB\n",
 		K(i.totalram),
 		K(i.freeram),
 		K(i.bufferram),
 		K(cached),
 		K(total_swapcache_pages),
-		K(global_page_state(NR_ACTIVE)),
-		K(global_page_state(NR_INACTIVE)),
+		K(global_page_state(NR_ACTIVE_ANON) +
+				global_page_state(NR_ACTIVE_FILE)),
+		K(global_page_state(NR_INACTIVE_ANON) +
+				global_page_state(NR_INACTIVE_FILE)),
+		K(global_page_state(NR_ACTIVE_ANON)),
+		K(global_page_state(NR_INACTIVE_ANON)),
+		K(global_page_state(NR_ACTIVE_FILE)),
+		K(global_page_state(NR_INACTIVE_FILE)),
 #ifdef CONFIG_HIGHMEM
 		K(i.totalhigh),
 		K(i.freehigh),
--- linux-2.6.20.x86_64/fs/mpage.c.vmsplit	2007-02-04 13:44:54.000000000 -0500
+++ linux-2.6.20.x86_64/fs/mpage.c	2007-03-19 20:09:36.000000000 -0400
@@ -408,12 +408,12 @@ mpage_readpages(struct address_space *ma
 					&first_logical_block,
 					get_block);
 			if (!pagevec_add(&lru_pvec, page))
-				__pagevec_lru_add(&lru_pvec);
+				__pagevec_lru_add_file(&lru_pvec);
 		} else {
 			page_cache_release(page);
 		}
 	}
-	pagevec_lru_add(&lru_pvec);
+	pagevec_lru_add_file(&lru_pvec);
 	BUG_ON(!list_empty(pages));
 	if (bio)
 		mpage_bio_submit(READ, bio);
--- linux-2.6.20.x86_64/fs/cifs/file.c.vmsplit	2007-03-19 20:09:21.000000000 -0400
+++ linux-2.6.20.x86_64/fs/cifs/file.c	2007-03-19 20:09:36.000000000 -0400
@@ -1746,7 +1746,7 @@ static void cifs_copy_cache_pages(struct
 		


Re: [RFC][PATCH] split file and anonymous page queues #3

2007-03-20 Thread Matt Mackall
On Tue, Mar 20, 2007 at 06:08:10PM -0400, Rik van Riel wrote:
> - "Active:   %8lu kB\n"
> - "Inactive: %8lu kB\n"
...
> + "Active(anon):   %8lu kB\n"
> + "Inactive(anon): %8lu kB\n"
> + "Active(file):   %8lu kB\n"
> + "Inactive(file): %8lu kB\n"

Potentially incompatible change. How about preserving the original
fields (by totalling), then adding the other fields in a second patch.
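
A minimal sketch of the totalling approach, for illustration only: the legacy Active and Inactive labels are kept and computed as sums of the split counters, and the new per-list fields are appended after them. struct lru_kb and format_lru_fields() are hypothetical names, and the values are assumed to be in kB already; this is not the /proc/meminfo code itself.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical per-list sizes in kB; not the kernel's data structures. */
struct lru_kb {
	unsigned long active_anon;
	unsigned long inactive_anon;
	unsigned long active_file;
	unsigned long inactive_file;
};

/*
 * Print the legacy totals first, so tools that parse "Active:" and
 * "Inactive:" keep working, then the split fields.
 * Returns whatever snprintf() returns.
 */
int format_lru_fields(char *buf, size_t len, const struct lru_kb *c)
{
	return snprintf(buf, len,
		"Active:         %8lu kB\n"
		"Inactive:       %8lu kB\n"
		"Active(anon):   %8lu kB\n"
		"Inactive(anon): %8lu kB\n"
		"Active(file):   %8lu kB\n"
		"Inactive(file): %8lu kB\n",
		c->active_anon + c->active_file,
		c->inactive_anon + c->inactive_file,
		c->active_anon,
		c->inactive_anon,
		c->active_file,
		c->inactive_file);
}
```
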

>   if (!pagevec_add(&lru_pvec, page))
> - __pagevec_lru_add(&lru_pvec);
> + __pagevec_lru_add_file(&lru_pvec);

Wouldn't lru_file_add or file_lru_add be a better name? If the object
is a "file lru" then sticking "add" in the middle is a little ugly.

>   spin_lock_irq(&zone->lru_lock);
>   if (PageLRU(page) && !PageActive(page)) {
> - del_page_from_inactive_list(zone, page);
> + if (page_anon(page)) {
> + del_page_from_inactive_anon_list(zone,page);
>   SetPageActive(page);
> - add_page_to_active_list(zone, page);
> + add_page_to_active_anon_list(zone, page);
> + } else {
> + del_page_from_inactive_file_list(zone, page);
> + SetPageActive(page);
> + add_page_to_active_file_list(zone, page);
> + }
>   __count_vm_event(PGACTIVATE);
>   }

Missing a level of indentation.

-- 
Mathematics is the supreme nostalgia of our time.

