Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-27 Thread Rik van Riel

On Fri, 27 Apr 2001, Mike Galbraith wrote:

> virgin pre7 +Rik
> real11m44.088s
> user7m57.720s
> sys 0m36.420s

> None of them make much difference.

Good, then I suppose we can put in the cleanup from my code, since
it makes the balancing a bit more predictable and should keep the
background aging within bounds.

I'll send a fixed patch tonight (with that last small thinko you
and marcelo discovered removed).

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-27 Thread Mike Galbraith

On Thu, 26 Apr 2001, Linus Torvalds wrote:

> Have you looked at "free_pte()"? I don't like that function, and it might
> make a difference. There are several small nits with it:

snip

> I _think_ the logic should be something along the lines of: "freeing the
> page amounts to a implied down-aging of the page, but the 'accessed' bit
> would have aged it up, so the two take each other out". But if so, the
> free_pte() logic should have something like
>
>   if (page->mapping) {
>   if (!pte_young(pte) || PageSwapCache(page))
>   age_page_down_ageonly(page);
>   if (!page->age)
>   deactivate_page(page);
>   }

Hi,

I tried this out today after some more reading.

virgin pre7 +Rik
real11m44.088s
user7m57.720s
sys 0m36.420s

+Rik +Linus
real11m48.597s
user7m55.620s
sys 0m37.860s

+Rik +Linus +HarshAging
real11m17.758s
user7m57.650s
sys 0m36.350s

None of them make much difference.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-27 Thread Mike Galbraith

On Fri, 27 Apr 2001, Marcelo Tosatti wrote:

> On Fri, 27 Apr 2001, Mike Galbraith wrote:
>
> > I decided to take a break from pondering input and see why the thing
> > ran itself into the ground.  Methinks I was sent the wrooong patch :)
>
> Mike,
>
> Please apply this patch on top of Rik's v2 patch otherwise you'll get the
> livelock easily:

test results:

real11m44.088s
user7m57.720s
sys 0m36.420s

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-27 Thread Marcelo Tosatti



On Fri, 27 Apr 2001, Mike Galbraith wrote:

> On Thu, 26 Apr 2001, Rik van Riel wrote:
> 
> > On Thu, 26 Apr 2001, Mike Galbraith wrote:
> >
> > > > > > No.  It livelocked on me with almost all active pages exausted.
> > > > > Misspoke.. I didn't try the two mixed.  Rik's patch livelocked me.
> > > >
> > > > Interesting. The semantics of my patch are practically the same as
> > > > those of the stock kernel ... can you get the stock kernel to
> > > > livelock on you, too ?
> > >
> > > Generally no.  Let kswapd continue to run?  Yes, but not always.
> >
> > OK, then I guess we should find out WHY the thing livelocked...
> 
> Hi Rik,
> 
> I decided to take a break from pondering input and see why the thing
> ran itself into the ground.  Methinks I was sent the wrooong patch :)

Mike,

Please apply this patch on top of Rik's v2 patch otherwise you'll get the
livelock easily:

--- linux.orig/mm/vmscan.c  Fri Apr 27 04:32:52 2001
+++ linux/mm/vmscan.c   Fri Apr 27 04:32:34 2001
@@ -644,6 +644,7 @@
struct page * page;
int maxscan = nr_active_pages >> priority;
int page_active = 0;
+   int start_count = count;

/*
 * If no count was specified, we do background page aging.
@@ -725,7 +726,7 @@
}
spin_unlock(_lru_lock);

-   return count;
+   return (start_count - count);
 }

 /*




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-27 Thread Mike Galbraith

On Thu, 26 Apr 2001, Rik van Riel wrote:

> On Thu, 26 Apr 2001, Mike Galbraith wrote:
>
> > > > > No.  It livelocked on me with almost all active pages exausted.
> > > > Misspoke.. I didn't try the two mixed.  Rik's patch livelocked me.
> > >
> > > Interesting. The semantics of my patch are practically the same as
> > > those of the stock kernel ... can you get the stock kernel to
> > > livelock on you, too ?
> >
> > Generally no.  Let kswapd continue to run?  Yes, but not always.
>
> OK, then I guess we should find out WHY the thing livelocked...

Hi Rik,

I decided to take a break from pondering input and see why the thing
ran itself into the ground.  Methinks I was sent the wrooong patch :)

--- linux-2.4.4-pre7/mm/vmscan.c.orig   Wed Apr 25 23:59:48 2001
+++ linux-2.4.4-pre7/mm/vmscan.cThu Apr 26 00:31:31 2001
@@ -24,6 +24,8 @@

 #include 

+#define MAX(a,b) ((a) > (b) ? (a) : (b))
+
 /*
  * The swap-out function returns 1 if it successfully
  * scanned all the pages it was asked to (`count').
@@ -631,17 +633,45 @@
 /**
  * refill_inactive_scan - scan the active list and find pages to deactivate
  * @priority: the priority at which to scan
- * @oneshot: exit after deactivating one page
+ * @count: the number of pages to deactivate
  *
  * This function will scan a portion of the active list to find
  * unused pages, those pages will then be moved to the inactive list.
  */
-int refill_inactive_scan(unsigned int priority, int oneshot)
+int refill_inactive_scan(unsigned int priority, int count)
 {
struct list_head * page_lru;
struct page * page;
-   int maxscan, page_active = 0;
-   int ret = 0;
+   int maxscan = nr_active_pages >> priority;
+   int page_active = 0;
+
+   /*
+* If no count was specified, we do background page aging.
+* This is done so, after periods of little VM activity, we
+* know which pages to swap out and we can handle load spikes.
+* However, if we scan unlimited and deactivate all pages,
+* we still wouldn't know which pages to swap ...
+*
+* The obvious solution is to do less background scanning when
+* we have lots of inactive pages and to completely stop if we
+* have tons of them...
+*/
+   if (!count) {
+   int nr_active, nr_inactive;
+
+   /* Active pages can be "hidden" in ptes, take a saner number. */
+   nr_active = MAX(nr_active_pages, num_physpages / 2);
+   nr_inactive = nr_inactive_dirty_pages + nr_free_pages() +
+   nr_inactive_clean_pages();
+
+   if (nr_inactive * 10 < nr_active) {
+   maxscan = nr_active_pages >> 4;
+   } else if (nr_inactive * 3 < nr_active_pages) {
+   maxscan = nr_active >> 8;
+   } else {
+   maxscan = 0;
+   }
+   }

/* Take the lock while messing with the list... */
spin_lock(_lru_lock);
@@ -690,14 +720,13 @@
list_del(page_lru);
list_add(page_lru, _list);
} else {
-   ret = 1;
-   if (oneshot)
+   if (--count <= 0)
break;
}
}
spin_unlock(_lru_lock);

-   return ret;
+   return count;
 }

 /*
@@ -805,10 +834,9 @@
schedule();
}

-   while (refill_inactive_scan(DEF_PRIORITY, 1)) {
-   if (--count <= 0)
-   goto done;
-   }
+   count -= refill_inactive_scan(DEF_PRIORITY, count);
+   if (--count <= 0)
+   goto done;

/* If refill_inactive_scan failed, try to page stuff out.. */
swap_out(DEF_PRIORITY, gfp_mask);


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-27 Thread Mike Galbraith

On Thu, 26 Apr 2001, Rik van Riel wrote:

 On Thu, 26 Apr 2001, Mike Galbraith wrote:

 No.  It livelocked on me with almost all active pages exausted.
Misspoke.. I didn't try the two mixed.  Rik's patch livelocked me.
  
   Interesting. The semantics of my patch are practically the same as
   those of the stock kernel ... can you get the stock kernel to
   livelock on you, too ?
 
  Generally no.  Let kswapd continue to run?  Yes, but not always.

 OK, then I guess we should find out WHY the thing livelocked...

Hi Rik,

I decided to take a break from pondering input and see why the thing
ran itself into the ground.  Methinks I was sent the wrooong patch :)

--- linux-2.4.4-pre7/mm/vmscan.c.orig   Wed Apr 25 23:59:48 2001
+++ linux-2.4.4-pre7/mm/vmscan.cThu Apr 26 00:31:31 2001
@@ -24,6 +24,8 @@

 #include asm/pgalloc.h

+#define MAX(a,b) ((a)  (b) ? (a) : (b))
+
 /*
  * The swap-out function returns 1 if it successfully
  * scanned all the pages it was asked to (`count').
@@ -631,17 +633,45 @@
 /**
  * refill_inactive_scan - scan the active list and find pages to deactivate
  * @priority: the priority at which to scan
- * @oneshot: exit after deactivating one page
+ * @count: the number of pages to deactivate
  *
  * This function will scan a portion of the active list to find
  * unused pages, those pages will then be moved to the inactive list.
  */
-int refill_inactive_scan(unsigned int priority, int oneshot)
+int refill_inactive_scan(unsigned int priority, int count)
 {
struct list_head * page_lru;
struct page * page;
-   int maxscan, page_active = 0;
-   int ret = 0;
+   int maxscan = nr_active_pages  priority;
+   int page_active = 0;
+
+   /*
+* If no count was specified, we do background page aging.
+* This is done so, after periods of little VM activity, we
+* know which pages to swap out and we can handle load spikes.
+* However, if we scan unlimited and deactivate all pages,
+* we still wouldn't know which pages to swap ...
+*
+* The obvious solution is to do less background scanning when
+* we have lots of inactive pages and to completely stop if we
+* have tons of them...
+*/
+   if (!count) {
+   int nr_active, nr_inactive;
+
+   /* Active pages can be hidden in ptes, take a saner number. */
+   nr_active = MAX(nr_active_pages, num_physpages / 2);
+   nr_inactive = nr_inactive_dirty_pages + nr_free_pages() +
+   nr_inactive_clean_pages();
+
+   if (nr_inactive * 10  nr_active) {
+   maxscan = nr_active_pages  4;
+   } else if (nr_inactive * 3  nr_active_pages) {
+   maxscan = nr_active  8;
+   } else {
+   maxscan = 0;
+   }
+   }

/* Take the lock while messing with the list... */
spin_lock(pagemap_lru_lock);
@@ -690,14 +720,13 @@
list_del(page_lru);
list_add(page_lru, active_list);
} else {
-   ret = 1;
-   if (oneshot)
+   if (--count = 0)
break;
}
}
spin_unlock(pagemap_lru_lock);

-   return ret;
+   return count;
 }

 /*
@@ -805,10 +834,9 @@
schedule();
}

-   while (refill_inactive_scan(DEF_PRIORITY, 1)) {
-   if (--count = 0)
-   goto done;
-   }
+   count -= refill_inactive_scan(DEF_PRIORITY, count);
+   if (--count = 0)
+   goto done;

/* If refill_inactive_scan failed, try to page stuff out.. */
swap_out(DEF_PRIORITY, gfp_mask);


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-27 Thread Marcelo Tosatti



On Fri, 27 Apr 2001, Mike Galbraith wrote:

 On Thu, 26 Apr 2001, Rik van Riel wrote:
 
  On Thu, 26 Apr 2001, Mike Galbraith wrote:
 
  No.  It livelocked on me with almost all active pages exausted.
 Misspoke.. I didn't try the two mixed.  Rik's patch livelocked me.
   
Interesting. The semantics of my patch are practically the same as
those of the stock kernel ... can you get the stock kernel to
livelock on you, too ?
  
   Generally no.  Let kswapd continue to run?  Yes, but not always.
 
  OK, then I guess we should find out WHY the thing livelocked...
 
 Hi Rik,
 
 I decided to take a break from pondering input and see why the thing
 ran itself into the ground.  Methinks I was sent the wrooong patch :)

Mike,

Please apply this patch on top of Rik's v2 patch otherwise you'll get the
livelock easily:

--- linux.orig/mm/vmscan.c  Fri Apr 27 04:32:52 2001
+++ linux/mm/vmscan.c   Fri Apr 27 04:32:34 2001
@@ -644,6 +644,7 @@
struct page * page;
int maxscan = nr_active_pages  priority;
int page_active = 0;
+   int start_count = count;

/*
 * If no count was specified, we do background page aging.
@@ -725,7 +726,7 @@
}
spin_unlock(pagemap_lru_lock);

-   return count;
+   return (start_count - count);
 }

 /*




-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-27 Thread Mike Galbraith

On Thu, 26 Apr 2001, Linus Torvalds wrote:

 Have you looked at free_pte()? I don't like that function, and it might
 make a difference. There are several small nits with it:

snip

 I _think_ the logic should be something along the lines of: freeing the
 page amounts to a implied down-aging of the page, but the 'accessed' bit
 would have aged it up, so the two take each other out. But if so, the
 free_pte() logic should have something like

   if (page-mapping) {
   if (!pte_young(pte) || PageSwapCache(page))
   age_page_down_ageonly(page);
   if (!page-age)
   deactivate_page(page);
   }

Hi,

I tried this out today after some more reading.

virgin pre7 +Rik
real11m44.088s
user7m57.720s
sys 0m36.420s

+Rik +Linus
real11m48.597s
user7m55.620s
sys 0m37.860s

+Rik +Linus +HarshAging
real11m17.758s
user7m57.650s
sys 0m36.350s

None of them make much difference.

-Mike

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-27 Thread Rik van Riel

On Fri, 27 Apr 2001, Mike Galbraith wrote:

 virgin pre7 +Rik
 real11m44.088s
 user7m57.720s
 sys 0m36.420s

 None of them make much difference.

Good, then I suppose we can put in the cleanup from my code, since
it makes the balancing a bit more predictable and should keep the
background aging within bounds.

I'll send a fixed patch tonight (with that last small thinko you
and marcelo discovered removed).

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Rik van Riel

On Thu, 26 Apr 2001, Mike Galbraith wrote:

> > > > No.  It livelocked on me with almost all active pages exausted.
> > > Misspoke.. I didn't try the two mixed.  Rik's patch livelocked me.
> >
> > Interesting. The semantics of my patch are practically the same as
> > those of the stock kernel ... can you get the stock kernel to
> > livelock on you, too ?
>
> Generally no.  Let kswapd continue to run?  Yes, but not always.

OK, then I guess we should find out WHY the thing livelocked...

I've heard reports that it's possible to livelock the kernel,
but for some reason you find it easier to livelock the kernel
with my patch applied.

Maybe this is enough of a clue to find out some things on why
the kernel livelocked?  Maybe we should add some instrumentation
to the kernel to find out why things like this happen?

IMHO having good instrumentation in the kernel makes sense
anyway, since it will allow us to do easier performance analysis
of people's machines, so we'll have less guesswork and it'll be
easier to get the kernel to perform well on more machines...

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Mike Galbraith wrote:

> On Thu, 26 Apr 2001, Rik van Riel wrote:
>
> > On Thu, 26 Apr 2001, Mike Galbraith wrote:
> >
> > > 1. pagecache is becoming swapcache and must be aged before anything is
> > > done.  Meanwhile we're calling refill_inactive_scan() so fast that noone
> > > has a chance to touch a page.   Age becomes a simple counter.. I think.
> > > When you hit a big surge, swap pages are at the back of all lists, so all
> > > of your valuable cache gets reclaimed before we write even one swap page.
> >
> > Does the patch I sent to [EMAIL PROTECTED] last night help in
> > this ?
> >
> > I found that the way refill_inactive_scan() and swap_out() are being
> > called from the main loop in refill_inactive() aren't equal and have
> > fixed that in a way which (IMHO) also beautifies the code a bit.
> >
> > (and makes sure background aging doesn't get out of hand with a few
> > simple checks)
>
> That patch livelocked my box with only ~1000 pages on any list.
>
> I can go back and test some more if you want.

I put it back in, the livelock is 100% repeatable (10 repeats).  It's
deactivating/laundering itself to death.  :) 3mb for my 386-20/0.96.9
would have been enough to frolic (slowly) in.. this box can't live.

   procs  memoryswap  io system cpu
 r  b  w   swpd   free   buff  cache  si  sobibo   incs  us  sy  id
 ...
37  3  0988   1940   1912  32368   0   04049  117   293  90  10   0
47  0  0   1348   1940   1320  26816   0   08017  121   415  86  14   0
39  1  0   1364   1972   1076  20728   0   0   184 0  124   294  84  16   0
39  3  2   1456   1940232  14720   0 57267   638  159  2225  85  15   0
29 10  2   1456   1612416   6908   0   0   25272  235  2253  84  16   0
17 18  2   1292   1420608   4216   0  60   340   454  279  3289  29  20  51
33  1  2   1296   1408488   3464   0   4   317   752  295  2432  11  47  42
locked here

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Rik van Riel wrote:

> On Thu, 26 Apr 2001, Mike Galbraith wrote:
> > On Thu, 26 Apr 2001, Mike Galbraith wrote:
> >
> > > > limit the runtime of refill_inactive_scan(). This is similar to Rik's
> > > > reclaim-limit+aging-tuning patch to linux-mm yesterday. could you try
> > > > Rik's patch with your patch except this jiffies hack, does it still
> > > > achieve the same improvement?
> > >
> > > No.  It livelocked on me with almost all active pages exausted.
> >
> > Misspoke.. I didn't try the two mixed.  Rik's patch livelocked me.
>
> Interesting. The semantics of my patch are practically the same as
> those of the stock kernel ... can you get the stock kernel to
> livelock on you, too ?

Generally no.  Let kswapd continue to run?  Yes, but not always.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Rik van Riel wrote:

> On Thu, 26 Apr 2001, Mike Galbraith wrote:
>
> > 1. pagecache is becoming swapcache and must be aged before anything is
> > done.  Meanwhile we're calling refill_inactive_scan() so fast that noone
> > has a chance to touch a page.   Age becomes a simple counter.. I think.
> > When you hit a big surge, swap pages are at the back of all lists, so all
> > of your valuable cache gets reclaimed before we write even one swap page.
>
> Does the patch I sent to [EMAIL PROTECTED] last night help in
> this ?
>
> I found that the way refill_inactive_scan() and swap_out() are being
> called from the main loop in refill_inactive() aren't equal and have
> fixed that in a way which (IMHO) also beautifies the code a bit.
>
> (and makes sure background aging doesn't get out of hand with a few
> simple checks)

That patch livelocked my box with only ~1000 pages on any list.

I can go back and test some more if you want.  (I've seen this so
many times here that I generally just curse a lot [frustration] and
burn the whole tree to it's roots as soon as it shows up)

SysRq: Show Memory
Mem-info:
Free pages:1404kB ( 0kB HighMem)
( Active: 975, inactive_dirty: 28, inactive_clean: 0, free: 351 (351 702 1053) )
0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB = 512kB)
1*4kB 5*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB = 892kB)
= 0kB)
Swap cache: add 72, delete 63, find 17/67
Free swap:   264864kB
32752 pages of RAM
0 pages of HIGHMEM
1183 reserved pages
27657 pages shared
9 pages swap cached
0 pages in page table cache
Buffer memory:  112kB

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Hugh Dickins

On Thu, 26 Apr 2001, Linus Torvalds wrote:
> 
> On the other hand, to offset some of these, we actually count the page
> accessed _twice_ sometimes: we count it on lookup, and we count it when we
> see the accessed bit in vmscan.c. Which results in some pages getting aged
> up twice for just one access if we go through the vmscan logic, while if
> we just map and unmap them they get counted just once.

And sometimes three times, if you count the PAGE_AGE_START bonus
points you get whenever your age is found to be 0 (or less than
PAGE_AGE_START).  I think I see the idea, but seems more voodoo.

If you're looking to _simplify_ in this area, there's a confusing
host (9) of intercoupled age-up-and-down de/activate functions.
Aren't those better decoupled? i.e. the ageing ones ageonly,
the de/activate ones not messing with age at all.

Then I think you're left with just age_page_up() and age_page_down()
(maybe inlines as below, assuming the PAGE_AGE_START voodoo), plus
activate_page(), deactivate_page() and deactivate_page_nolock().

static inline void age_page_up(struct page *page)
{
page->age += PAGE_AGE_ADV;
if (page->age > PAGE_AGE_MAX)
page->age = PAGE_AGE_MAX;
else if (page->age < PAGE_AGE_START + PAGE_AGE_ADV)
page->age = PAGE_AGE_START + PAGE_AGE_ADV;
}

static inline void age_page_down(struct page *page)
{
page->age >>= 1;
}

But this is no more than tidying, don't let me distract you.

Hugh

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Linus Torvalds wrote:

> On Thu, 26 Apr 2001, Mike Galbraith wrote:
> >
> > 2.4.4.pre7.virgin
> > real11m33.589s
> > user7m57.790s
> > sys 0m38.730s
> >
> > 2.4.4.pre7.sillyness
> > real9m30.336s
> > user7m55.270s
> > sys 0m38.510s
>
> Well, I actually like parts of this. The "always swap out current mm" one
> looks rather dangerous, and the lastscan jiffy thing is too ugly for
> words,

Compared to my tree of a couple days ago (8m53s) this is fine art ;-)

(snip nice list of things to think about)

> Comments?

If my parser ever comes out of down_trygrok().  I'll put this in
a terminal next to one from Ingo.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Rik van Riel

On Thu, 26 Apr 2001, Mike Galbraith wrote:
> On Thu, 26 Apr 2001, Mike Galbraith wrote:
> 
> > > limit the runtime of refill_inactive_scan(). This is similar to Rik's
> > > reclaim-limit+aging-tuning patch to linux-mm yesterday. could you try
> > > Rik's patch with your patch except this jiffies hack, does it still
> > > achieve the same improvement?
> >
> > No.  It livelocked on me with almost all active pages exausted.
> 
> Misspoke.. I didn't try the two mixed.  Rik's patch livelocked me.

Interesting. The semantics of my patch are practically the same as
those of the stock kernel ... can you get the stock kernel to
livelock on you, too ?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Rik van Riel

On Thu, 26 Apr 2001, Mike Galbraith wrote:

> 1. pagecache is becoming swapcache and must be aged before anything is
> done.  Meanwhile we're calling refill_inactive_scan() so fast that noone
> has a chance to touch a page.   Age becomes a simple counter.. I think.
> When you hit a big surge, swap pages are at the back of all lists, so all
> of your valuable cache gets reclaimed before we write even one swap page.

Does the patch I sent to [EMAIL PROTECTED] last night help in
this ?

I found that the way refill_inactive_scan() and swap_out() are being
called from the main loop in refill_inactive() aren't equal and have
fixed that in a way which (IMHO) also beautifies the code a bit.

(and makes sure background aging doesn't get out of hand with a few
simple checks)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Linus Torvalds


On Thu, 26 Apr 2001, Mike Galbraith wrote:
> 
> 2.4.4.pre7.virgin
> real11m33.589s
> user7m57.790s
> sys 0m38.730s
> 
> 2.4.4.pre7.sillyness
> real9m30.336s
> user7m55.270s
> sys 0m38.510s

Well, I actually like parts of this. The "always swap out current mm" one
looks rather dangerous, and the lastscan jiffy thing is too ugly for
words, but refill_inactive() looks much nicer. There is beauty in
simplicity. 

The page aging in drop_pte feels pretty harsh, though.

Have you looked at "free_pte()"? I don't like that function, and it might
make a difference. There are several small nits with it:

 - it should probably try to deactivate the page. If drop_pte does that
   when it deacctivates a page involuntarily, why not do it for a real "we
   just free'd the page voluntarily"?

 - swap-cache pages should probably not just be de-activated, but actively
   aged down. Right now, they are neither, so we have to work all the 
   way through refill_inactive() and then page_launder() to clear them
   out. Even though the page may be entirely useless by now (we had a
   complex special case that caught and short-circuited some of the pages,
   and maybe it was worth it. But maybe the right thing is to just age
   them down and naturally deactivate them?)

   After all, we aged them up for references to this virtual
   mapping, and free_pte() just made it go away. Unlike normal page cache
   pages, we don't get any advantage from trying to cache the things
   across multiple VM's.

 - we're dropping the accessed bit on the floor. In the vmscan case the
   accessed bit would have aged the page up. 

On the other hand, to offset some of these, we actually count the page
accessed _twice_ sometimes: we count it on lookup, and we count it when we
see the accessed bit in vmscan.c. Which results in some pages getting aged
up twice for just one access if we go through the vmscan logic, while if
we just map and unmap them they get counted just once.

Obviously the page aging logic seems to be making a noticeable difference
to you. So looking at page aging logic issues in the bigger picture migth
be worthwhile - not just staring at the actual swap-out code. The fact is,
the swap-out-code cannot get the aging right if the rest of the system
ignores it or does it only for some cases.

I _think_ the logic should be something along the lines of: "freeing the
page amounts to a implied down-aging of the page, but the 'accessed' bit
would have aged it up, so the two take each other out". But if so, the
free_pte() logic should have something like

if (page->mapping) {
if (!pte_young(pte) || PageSwapCache(page))
age_page_down_ageonly(page);
if (!page->age)
deactivate_page(page);
}

instead of just ignoring these issues completely.

Comments? 

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Marcelo Tosatti wrote:

> On Thu, 26 Apr 2001, Mike Galbraith wrote:
>
> > > (i cannot see how this chunk affects the VM, AFAICS this too makes the
> > > zapping of the cache less agressive.)
> >
> > (more folks get snagged on write.. they can't eat cache so fast)
>
> What about GFP_BUFFER allocations ? :)
>
> I suspect the jiffies hack is avoiding GFP_BUFFER allocations to eat cache
> insanely.

(I think it's aging speed in general.  If user tasks aren't doing it,
kswapd is.)

> Easy way to confirm that: add the kswapd wait queue again and make
> allocators which don't have __GFP_IO set wait on that in
> try_to_free_pages().

I've tried not allowing those to enter try_to_free_pages() [nogo].
I'll try a waitqueue.  wait_event(waitqueue_active(_wait)) ok?

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Marcelo Tosatti


On Thu, 26 Apr 2001, Mike Galbraith wrote:

> > (i cannot see how this chunk affects the VM, AFAICS this too makes the
> > zapping of the cache less agressive.)
> 
> (more folks get snagged on write.. they can't eat cache so fast)

What about GFP_BUFFER allocations ? :)

I suspect the jiffies hack is avoiding GFP_BUFFER allocations to eat cache
insanely.

Easy way to confirm that: add the kswapd wait queue again and make
allocators which don't have __GFP_IO set wait on that in
try_to_free_pages(). 




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Mike Galbraith wrote:

> > limit the runtime of refill_inactive_scan(). This is similar to Rik's
> > reclaim-limit+aging-tuning patch to linux-mm yesterday. could you try
> > Rik's patch with your patch except this jiffies hack, does it still
> > achieve the same improvement?
>
> No.  It livelocked on me with almost all active pages exausted.

Misspoke.. I didn't try the two mixed.  Rik's patch livelocked me.
Still want me to try mixing?

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Marcelo Tosatti wrote:

> Have you tried to tune SWAP_SHIFT and the priority used inside swap_out()
> to see if you can make pte deactivation less aggressive ?

Many many many times.. no dice.

(more agressive is much better for surge regulation.. power brakes!)

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Ingo Molnar wrote:

> On Thu, 26 Apr 2001, Mike Galbraith wrote:
>
> > 2.4.4.pre7.virgin
> > real11m33.589s
>
> > 2.4.4.pre7.sillyness
> > real9m30.336s
>
> very interesting. Looks like there are still reserves in the VM, for heavy
> workloads. (and swapping is all about heavy workloads.)
>
> it would be interesting to see why your patch has such a good effect.
> (and it would be nice get the same improvement in a clean way.)

It's not good.. it's an ugly beaste from hell ;-)

> > -   if (!page->age)
> > -   deactivate_page(page);
> > +   age_page_down(page);
>
> this one preserves the cache a bit more agressively.

(intent)

>
> > /* Always start by trying to penalize the process that is allocating memory */
> > if (mm)
> > -   retval = swap_out_mm(mm, swap_amount(mm));
> > +   return swap_out_mm(mm, swap_amount(mm));
>
> keep swap-out activity more focused to the process that is generating the
> VM pressure. It might make sense to test this single change in isolation.
> (While we cannot ignore to swap out other contexts under memory pressure,
> we could do something to make it focused on the current MM a bit more.)

(also the intent.. make 'em pagein like a bugger to slow down cache munh)

> > +   static unsigned long lastscan;
> > +
> > +   if (lastscan == jiffies)
> > +   return 0;
>
> limit the runtime of refill_inactive_scan(). This is similar to Rik's
> reclaim-limit+aging-tuning patch to linux-mm yesterday. could you try
> Rik's patch with your patch except this jiffies hack, does it still
> achieve the same improvement?

No.  It livelocked on me with almost all active pages exausted.

> > +   int shortage = inactive_shortage();
> >
> > +   if (refill_inactive_scan(DEF_PRIORITY, 0) < shortage)
> > /* If refill_inactive_scan failed, try to page stuff out.. */
> > swap_out(DEF_PRIORITY, gfp_mask);
> >
> > +   return 0;
>
> (i cannot see how this chunk affects the VM, AFAICS this too makes the
> zapping of the cache less agressive.)

(more folks get snagged on write.. they can't eat cache so fast)

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Ingo Molnar


On Thu, 26 Apr 2001, Mike Galbraith wrote:

> 2.4.4.pre7.virgin
> real11m33.589s

> 2.4.4.pre7.sillyness
> real9m30.336s

very interesting. Looks like there are still reserves in the VM, for heavy
workloads. (and swapping is all about heavy workloads.)

it would be interesting to see why your patch has such a good effect.
(and it would be nice get the same improvement in a clean way.)

> - if (!page->age)
> - deactivate_page(page);
> + age_page_down(page);

this one preserves the cache a bit more agressively.

>   /* Always start by trying to penalize the process that is allocating memory */
>   if (mm)
> - retval = swap_out_mm(mm, swap_amount(mm));
> + return swap_out_mm(mm, swap_amount(mm));

keep swap-out activity more focused to the process that is generating the
VM pressure. It might make sense to test this single change in isolation.
(While we cannot ignore to swap out other contexts under memory pressure,
we could do something to make it focused on the current MM a bit more.)

> + static unsigned long lastscan;
> +
> + if (lastscan == jiffies)
> + return 0;

limit the runtime of refill_inactive_scan(). This is similar to Rik's
reclaim-limit+aging-tuning patch to linux-mm yesterday. could you try
Rik's patch with your patch except this jiffies hack, does it still
achieve the same improvement?

> + int shortage = inactive_shortage();
>
> + if (refill_inactive_scan(DEF_PRIORITY, 0) < shortage)
>   /* If refill_inactive_scan failed, try to page stuff out.. */
>   swap_out(DEF_PRIORITY, gfp_mask);
>
> + return 0;

(i cannot see how this chunk affects the VM, AFAICS this too makes the
zapping of the cache less agressive.)

perhaps the best would be to first test Rik's patch on pre7-vanilla, it
should go in the same direction your changes go, i think?

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Marcelo Tosatti



On Thu, 26 Apr 2001, Mike Galbraith wrote:

> On Thu, 26 Apr 2001, Marcelo Tosatti wrote:
> 
> > > (I can get it to under 9 with MUCH extremely ugly tinkering.  I've done
> > > this enough to know that I _should_ be able to do 8 1/2 minutes ~easily)
> >
> > Which kind of changes you're doing to get better performance on this test?
> 
> :)
> 
> 2.4.4.pre7.virgin
> real11m33.589s
> user7m57.790s
> sys 0m38.730s
> 
> 2.4.4.pre7.sillyness
> real9m30.336s
> user7m55.270s
> sys 0m38.510s

Have you tried to tune SWAP_SHIFT and the priority used inside swap_out()
to see if you can make pte deactivation less aggressive ?

If you get the desired effect tuning those values and you end up with the
conclusion that this tuning is a good change for most "common workloads",
it can be integrated in the main kernel.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Ingo Molnar wrote:

>
> On Thu, 26 Apr 2001, Mike Galbraith wrote:
>
> > More of a question.  Neither Ingo's nor your patch makes any
> > difference on my UP box (128mb PIII/500) doing make -j30. [...]
>
> (the patch Marcelo sent is the -B3 patch plus Linus' suggested async
> interface cleanup, so it should be functionally equivalent to the -B3
> patch.)
>
>   Ingo

I know.. read 'em all with much interest :)

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Marcelo Tosatti wrote:

> > (I can get it to under 9 with MUCH extremely ugly tinkering.  I've done
> > this enough to know that I _should_ be able to do 8 1/2 minutes ~easily)
>
> Which kind of changes you're doing to get better performance on this test?

:)

2.4.4.pre7.virgin
real11m33.589s
user7m57.790s
sys 0m38.730s

2.4.4.pre7.sillyness
real9m30.336s
user7m55.270s
sys 0m38.510s

--- mm/vmscan.c.org Thu Apr 26 06:35:19 2001
+++ mm/vmscan.c Thu Apr 26 08:25:52 2001
@@ -72,8 +72,7 @@
set_pte(page_table, swp_entry_to_pte(entry));
 drop_pte:
mm->rss--;
-   if (!page->age)
-   deactivate_page(page);
+   age_page_down(page);
UnlockPage(page);
page_cache_release(page);
return;
@@ -282,7 +281,7 @@

/* Always start by trying to penalize the process that is allocating memory */
if (mm)
-   retval = swap_out_mm(mm, swap_amount(mm));
+   return swap_out_mm(mm, swap_amount(mm));

/* Then, look at the other mm's */
counter = mmlist_nr >> priority;
@@ -642,6 +641,10 @@
struct page * page;
int maxscan, page_active = 0;
int ret = 0;
+   static unsigned long lastscan;
+
+   if (lastscan == jiffies)
+   return 0;

/* Take the lock while messing with the list... */
spin_lock(_lru_lock);
@@ -695,6 +698,7 @@
break;
}
}
+   lastscan = jiffies;
spin_unlock(_lru_lock);

return ret;
@@ -791,35 +795,13 @@
 #define DEF_PRIORITY (6)
 static int refill_inactive(unsigned int gfp_mask, int user)
 {
-   int count, start_count, maxtry;
-
-   count = inactive_shortage() + free_shortage();
-   if (user)
-   count = (1 << page_cluster);
-   start_count = count;
-
-   maxtry = 6;
-   do {
-   if (current->need_resched) {
-   __set_current_state(TASK_RUNNING);
-   schedule();
-   }
-
-   while (refill_inactive_scan(DEF_PRIORITY, 1)) {
-   if (--count <= 0)
-   goto done;
-   }
+   int shortage = inactive_shortage();

+   if (refill_inactive_scan(DEF_PRIORITY, 0) < shortage)
/* If refill_inactive_scan failed, try to page stuff out.. */
swap_out(DEF_PRIORITY, gfp_mask);

-   if (--maxtry <= 0)
-   return 0;
-
-   } while (inactive_shortage());
-
-done:
-   return (count < start_count);
+   return 0;
 }

 static int do_try_to_free_pages(unsigned int gfp_mask, int user)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Ingo Molnar


On Thu, 26 Apr 2001, Mike Galbraith wrote:

> More of a question.  Neither Ingo's nor your patch makes any
> difference on my UP box (128mb PIII/500) doing make -j30. [...]

(the patch Marcelo sent is the -B3 patch plus Linus' suggested async
interface cleanup, so it should be functionally equivalent to the -B3
patch.)

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Linus Torvalds wrote:

 On Thu, 26 Apr 2001, Mike Galbraith wrote:
 
  2.4.4.pre7.virgin
  real11m33.589s
  user7m57.790s
  sys 0m38.730s
 
  2.4.4.pre7.sillyness
  real9m30.336s
  user7m55.270s
  sys 0m38.510s

 Well, I actually like parts of this. The always swap out current mm one
 looks rather dangerous, and the lastscan jiffy thing is too ugly for
 words,

Compared to my tree of a couple days ago (8m53s) this is fine art ;-)

(snip nice list of things to think about)

 Comments?

If my parser ever comes out of down_trygrok().  I'll put this in
a terminal next to one from Ingo.

-Mike

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Hugh Dickins

On Thu, 26 Apr 2001, Linus Torvalds wrote:
 
 On the other hand, to offset some of these, we actually count the page
 accessed _twice_ sometimes: we count it on lookup, and we count it when we
 see the accessed bit in vmscan.c. Which results in some pages getting aged
 up twice for just one access if we go through the vmscan logic, while if
 we just map and unmap them they get counted just once.

And sometimes three times, if you count the PAGE_AGE_START bonus
points you get whenever your age is found to be 0 (or less than
PAGE_AGE_START).  I think I see the idea, but seems more voodoo.

If you're looking to _simplify_ in this area, there's a confusing
host (9) of intercoupled age-up-and-down de/activate functions.
Aren't those better decoupled? i.e. the ageing ones ageonly,
the de/activate ones not messing with age at all.

Then I think you're left with just age_page_up() and age_page_down()
(maybe inlines as below, assuming the PAGE_AGE_START voodoo), plus
activate_page(), deactivate_page() and deactivate_page_nolock().

static inline void age_page_up(struct page *page)
{
page-age += PAGE_AGE_ADV;
if (page-age  PAGE_AGE_MAX)
page-age = PAGE_AGE_MAX;
else if (page-age  PAGE_AGE_START + PAGE_AGE_ADV)
page-age = PAGE_AGE_START + PAGE_AGE_ADV;
}

static inline void age_page_down(struct page *page)
{
page-age = 1;
}

But this is no more than tidying, don't let me distract you.

Hugh

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Rik van Riel wrote:

 On Thu, 26 Apr 2001, Mike Galbraith wrote:

  1. pagecache is becoming swapcache and must be aged before anything is
  done.  Meanwhile we're calling refill_inactive_scan() so fast that noone
  has a chance to touch a page.   Age becomes a simple counter.. I think.
  When you hit a big surge, swap pages are at the back of all lists, so all
  of your valuable cache gets reclaimed before we write even one swap page.

 Does the patch I sent to [EMAIL PROTECTED] last night help in
 this ?

 I found that the way refill_inactive_scan() and swap_out() are being
 called from the main loop in refill_inactive() aren't equal and have
 fixed that in a way which (IMHO) also beautifies the code a bit.

 (and makes sure background aging doesn't get out of hand with a few
 simple checks)

That patch livelocked my box with only ~1000 pages on any list.

I can go back and test some more if you want.  (I've seen this so
many times here that I generally just curse a lot [frustration] and
burn the whole tree to it's roots as soon as it shows up)

SysRq: Show Memory
Mem-info:
Free pages:1404kB ( 0kB HighMem)
( Active: 975, inactive_dirty: 28, inactive_clean: 0, free: 351 (351 702 1053) )
0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB = 512kB)
1*4kB 5*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB = 892kB)
= 0kB)
Swap cache: add 72, delete 63, find 17/67
Free swap:   264864kB
32752 pages of RAM
0 pages of HIGHMEM
1183 reserved pages
27657 pages shared
9 pages swap cached
0 pages in page table cache
Buffer memory:  112kB

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Rik van Riel wrote:

 On Thu, 26 Apr 2001, Mike Galbraith wrote:
  On Thu, 26 Apr 2001, Mike Galbraith wrote:
 
limit the runtime of refill_inactive_scan(). This is similar to Rik's
reclaim-limit+aging-tuning patch to linux-mm yesterday. could you try
Rik's patch with your patch except this jiffies hack, does it still
achieve the same improvement?
  
   No.  It livelocked on me with almost all active pages exausted.
 
  Misspoke.. I didn't try the two mixed.  Rik's patch livelocked me.

 Interesting. The semantics of my patch are practically the same as
 those of the stock kernel ... can you get the stock kernel to
 livelock on you, too ?

Generally no.  Let kswapd continue to run?  Yes, but not always.

-Mike

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Rik van Riel

On Thu, 26 Apr 2001, Mike Galbraith wrote:

No.  It livelocked on me with almost all active pages exausted.
   Misspoke.. I didn't try the two mixed.  Rik's patch livelocked me.
 
  Interesting. The semantics of my patch are practically the same as
  those of the stock kernel ... can you get the stock kernel to
  livelock on you, too ?

 Generally no.  Let kswapd continue to run?  Yes, but not always.

OK, then I guess we should find out WHY the thing livelocked...

I've heard reports that it's possible to livelock the kernel,
but for some reason you find it easier to livelock the kernel
with my patch applied.

Maybe this is enough of a clue to find out some things on why
the kernel livelocked?  Maybe we should add some instrumentation
to the kernel to find out why things like this happen?

IMHO having good instrumentation in the kernel makes sense
anyway, since it will allow us to do easier performance analysis
of people's machines, so we'll have less guesswork and it'll be
easier to get the kernel to perform well on more machines...

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Ingo Molnar


On Thu, 26 Apr 2001, Mike Galbraith wrote:

 More of a question.  Neither Ingo's nor your patch makes any
 difference on my UP box (128mb PIII/500) doing make -j30. [...]

(the patch Marcelo sent is the -B3 patch plus Linus' suggested async
interface cleanup, so it should be functionally equivalent to the -B3
patch.)

Ingo

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Marcelo Tosatti wrote:

  (I can get it to under 9 with MUCH extremely ugly tinkering.  I've done
  this enough to know that I _should_ be able to do 8 1/2 minutes ~easily)

 Which kind of changes you're doing to get better performance on this test?

:)

2.4.4.pre7.virgin
real11m33.589s
user7m57.790s
sys 0m38.730s

2.4.4.pre7.sillyness
real9m30.336s
user7m55.270s
sys 0m38.510s

--- mm/vmscan.c.org Thu Apr 26 06:35:19 2001
+++ mm/vmscan.c Thu Apr 26 08:25:52 2001
@@ -72,8 +72,7 @@
set_pte(page_table, swp_entry_to_pte(entry));
 drop_pte:
mm-rss--;
-   if (!page-age)
-   deactivate_page(page);
+   age_page_down(page);
UnlockPage(page);
page_cache_release(page);
return;
@@ -282,7 +281,7 @@

/* Always start by trying to penalize the process that is allocating memory */
if (mm)
-   retval = swap_out_mm(mm, swap_amount(mm));
+   return swap_out_mm(mm, swap_amount(mm));

/* Then, look at the other mm's */
counter = mmlist_nr  priority;
@@ -642,6 +641,10 @@
struct page * page;
int maxscan, page_active = 0;
int ret = 0;
+   static unsigned long lastscan;
+
+   if (lastscan == jiffies)
+   return 0;

/* Take the lock while messing with the list... */
spin_lock(pagemap_lru_lock);
@@ -695,6 +698,7 @@
break;
}
}
+   lastscan = jiffies;
spin_unlock(pagemap_lru_lock);

return ret;
@@ -791,35 +795,13 @@
 #define DEF_PRIORITY (6)
 static int refill_inactive(unsigned int gfp_mask, int user)
 {
-   int count, start_count, maxtry;
-
-   count = inactive_shortage() + free_shortage();
-   if (user)
-   count = (1  page_cluster);
-   start_count = count;
-
-   maxtry = 6;
-   do {
-   if (current-need_resched) {
-   __set_current_state(TASK_RUNNING);
-   schedule();
-   }
-
-   while (refill_inactive_scan(DEF_PRIORITY, 1)) {
-   if (--count = 0)
-   goto done;
-   }
+   int shortage = inactive_shortage();

+   if (refill_inactive_scan(DEF_PRIORITY, 0)  shortage)
/* If refill_inactive_scan failed, try to page stuff out.. */
swap_out(DEF_PRIORITY, gfp_mask);

-   if (--maxtry = 0)
-   return 0;
-
-   } while (inactive_shortage());
-
-done:
-   return (count  start_count);
+   return 0;
 }

 static int do_try_to_free_pages(unsigned int gfp_mask, int user)

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Ingo Molnar wrote:


 On Thu, 26 Apr 2001, Mike Galbraith wrote:

  More of a question.  Neither Ingo's nor your patch makes any
  difference on my UP box (128mb PIII/500) doing make -j30. [...]

 (the patch Marcelo sent is the -B3 patch plus Linus' suggested async
 interface cleanup, so it should be functionally equivalent to the -B3
 patch.)

   Ingo

I know.. read 'em all with much interest :)

-Mike

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Marcelo Tosatti



On Thu, 26 Apr 2001, Mike Galbraith wrote:

 On Thu, 26 Apr 2001, Marcelo Tosatti wrote:
 
   (I can get it to under 9 with MUCH extremely ugly tinkering.  I've done
   this enough to know that I _should_ be able to do 8 1/2 minutes ~easily)
 
  Which kind of changes you're doing to get better performance on this test?
 
 :)
 
 2.4.4.pre7.virgin
 real11m33.589s
 user7m57.790s
 sys 0m38.730s
 
 2.4.4.pre7.sillyness
 real9m30.336s
 user7m55.270s
 sys 0m38.510s

Have you tried to tune SWAP_SHIFT and the priority used inside swap_out()
to see if you can make pte deactivation less aggressive ?

If you get the desired effect tuning those values and you end up with the
conclusion that this tuning is a good change for most common workloads,
it can be integrated in the main kernel.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Ingo Molnar


On Thu, 26 Apr 2001, Mike Galbraith wrote:

 2.4.4.pre7.virgin
 real11m33.589s

 2.4.4.pre7.sillyness
 real9m30.336s

very interesting. Looks like there are still reserves in the VM, for heavy
workloads. (and swapping is all about heavy workloads.)

it would be interesting to see why your patch has such a good effect.
(and it would be nice get the same improvement in a clean way.)

 - if (!page-age)
 - deactivate_page(page);
 + age_page_down(page);

this one preserves the cache a bit more agressively.

   /* Always start by trying to penalize the process that is allocating memory */
   if (mm)
 - retval = swap_out_mm(mm, swap_amount(mm));
 + return swap_out_mm(mm, swap_amount(mm));

keep swap-out activity more focused to the process that is generating the
VM pressure. It might make sense to test this single change in isolation.
(While we cannot ignore to swap out other contexts under memory pressure,
we could do something to make it focused on the current MM a bit more.)

 + static unsigned long lastscan;
 +
 + if (lastscan == jiffies)
 + return 0;

limit the runtime of refill_inactive_scan(). This is similar to Rik's
reclaim-limit+aging-tuning patch to linux-mm yesterday. could you try
Rik's patch with your patch except this jiffies hack, does it still
achieve the same improvement?

 + int shortage = inactive_shortage();

 + if (refill_inactive_scan(DEF_PRIORITY, 0)  shortage)
   /* If refill_inactive_scan failed, try to page stuff out.. */
   swap_out(DEF_PRIORITY, gfp_mask);

 + return 0;

(i cannot see how this chunk affects the VM, AFAICS this too makes the
zapping of the cache less agressive.)

perhaps the best would be to first test Rik's patch on pre7-vanilla, it
should go in the same direction your changes go, i think?

Ingo

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Ingo Molnar wrote:

 On Thu, 26 Apr 2001, Mike Galbraith wrote:

  2.4.4.pre7.virgin
  real11m33.589s

  2.4.4.pre7.sillyness
  real9m30.336s

 very interesting. Looks like there are still reserves in the VM, for heavy
 workloads. (and swapping is all about heavy workloads.)

 it would be interesting to see why your patch has such a good effect.
 (and it would be nice get the same improvement in a clean way.)

It's not good.. it's an ugly beaste from hell ;-)

  -   if (!page-age)
  -   deactivate_page(page);
  +   age_page_down(page);

 this one preserves the cache a bit more agressively.

(intent)


  /* Always start by trying to penalize the process that is allocating memory */
  if (mm)
  -   retval = swap_out_mm(mm, swap_amount(mm));
  +   return swap_out_mm(mm, swap_amount(mm));

 keep swap-out activity more focused to the process that is generating the
 VM pressure. It might make sense to test this single change in isolation.
 (While we cannot ignore to swap out other contexts under memory pressure,
 we could do something to make it focused on the current MM a bit more.)

(also the intent.. make 'em pagein like a bugger to slow down cache munh)

  +   static unsigned long lastscan;
  +
  +   if (lastscan == jiffies)
  +   return 0;

 limit the runtime of refill_inactive_scan(). This is similar to Rik's
 reclaim-limit+aging-tuning patch to linux-mm yesterday. could you try
 Rik's patch with your patch except this jiffies hack, does it still
 achieve the same improvement?

No.  It livelocked on me with almost all active pages exausted.

  +   int shortage = inactive_shortage();
 
  +   if (refill_inactive_scan(DEF_PRIORITY, 0)  shortage)
  /* If refill_inactive_scan failed, try to page stuff out.. */
  swap_out(DEF_PRIORITY, gfp_mask);
 
  +   return 0;

 (i cannot see how this chunk affects the VM, AFAICS this too makes the
 zapping of the cache less agressive.)

(more folks get snagged on write.. they can't eat cache so fast)

-Mike

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Marcelo Tosatti wrote:

 Have you tried to tune SWAP_SHIFT and the priority used inside swap_out()
 to see if you can make pte deactivation less aggressive ?

Many many many times.. no dice.

(more agressive is much better for surge regulation.. power brakes!)

-Mike

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Mike Galbraith wrote:

  limit the runtime of refill_inactive_scan(). This is similar to Rik's
  reclaim-limit+aging-tuning patch to linux-mm yesterday. could you try
  Rik's patch with your patch except this jiffies hack, does it still
  achieve the same improvement?

 No.  It livelocked on me with almost all active pages exausted.

Misspoke.. I didn't try the two mixed.  Rik's patch livelocked me.
Still want me to try mixing?

-Mike

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Marcelo Tosatti


On Thu, 26 Apr 2001, Mike Galbraith wrote:

  (i cannot see how this chunk affects the VM, AFAICS this too makes the
  zapping of the cache less agressive.)
 
 (more folks get snagged on write.. they can't eat cache so fast)

What about GFP_BUFFER allocations ? :)

I suspect the jiffies hack is avoiding GFP_BUFFER allocations to eat cache
insanely.

Easy way to confirm that: add the kswapd wait queue again and make
allocators which don't have __GFP_IO set wait on that in
try_to_free_pages(). 




-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Marcelo Tosatti wrote:

 On Thu, 26 Apr 2001, Mike Galbraith wrote:

   (i cannot see how this chunk affects the VM, AFAICS this too makes the
   zapping of the cache less agressive.)
 
  (more folks get snagged on write.. they can't eat cache so fast)

 What about GFP_BUFFER allocations ? :)

 I suspect the jiffies hack is avoiding GFP_BUFFER allocations to eat cache
 insanely.

(I think it's aging speed in general.  If user tasks aren't doing it,
kswapd is.)

 Easy way to confirm that: add the kswapd wait queue again and make
 allocators which don't have __GFP_IO set wait on that in
 try_to_free_pages().

I've tried not allowing those to enter try_to_free_pages() [nogo].
I'll try a waitqueue.  wait_event(waitqueue_active(kswapd_wait)) ok?

-Mike

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Linus Torvalds


On Thu, 26 Apr 2001, Mike Galbraith wrote:
 
 2.4.4.pre7.virgin
 real11m33.589s
 user7m57.790s
 sys 0m38.730s
 
 2.4.4.pre7.sillyness
 real9m30.336s
 user7m55.270s
 sys 0m38.510s

Well, I actually like parts of this. The always swap out current mm one
looks rather dangerous, and the lastscan jiffy thing is too ugly for
words, but refill_inactive() looks much nicer. There is beauty in
simplicity. 

The page aging in drop_pte feels pretty harsh, though.

Have you looked at free_pte()? I don't like that function, and it might
make a difference. There are several small nits with it:

 - it should probably try to deactivate the page. If drop_pte does that
   when it deacctivates a page involuntarily, why not do it for a real we
   just free'd the page voluntarily?

 - swap-cache pages should probably not just be de-activated, but actively
   aged down. Right now, they are neither, so we have to work all the 
   way through refill_inactive() and then page_launder() to clear them
   out. Even though the page may be entirely useless by now (we had a
   complex special case that caught and short-circuited some of the pages,
   and maybe it was worth it. But maybe the right thing is to just age
   them down and naturally deactivate them?)

   After all, we aged them up for references to this virtual
   mapping, and free_pte() just made it go away. Unlike normal page cache
   pages, we don't get any advantage from trying to cache the things
   across multiple VM's.

 - we're dropping the accessed bit on the floor. In the vmscan case the
   accessed bit would have aged the page up. 

On the other hand, to offset some of these, we actually count the page
accessed _twice_ sometimes: we count it on lookup, and we count it when we
see the accessed bit in vmscan.c. Which results in some pages getting aged
up twice for just one access if we go through the vmscan logic, while if
we just map and unmap them they get counted just once.

Obviously the page aging logic seems to be making a noticeable difference
to you. So looking at page aging logic issues in the bigger picture migth
be worthwhile - not just staring at the actual swap-out code. The fact is,
the swap-out-code cannot get the aging right if the rest of the system
ignores it or does it only for some cases.

I _think_ the logic should be something along the lines of: freeing the
page amounts to a implied down-aging of the page, but the 'accessed' bit
would have aged it up, so the two take each other out. But if so, the
free_pte() logic should have something like

if (page-mapping) {
if (!pte_young(pte) || PageSwapCache(page))
age_page_down_ageonly(page);
if (!page-age)
deactivate_page(page);
}

instead of just ignoring these issues completely.

Comments? 

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Rik van Riel

On Thu, 26 Apr 2001, Mike Galbraith wrote:

 1. pagecache is becoming swapcache and must be aged before anything is
 done.  Meanwhile we're calling refill_inactive_scan() so fast that noone
 has a chance to touch a page.   Age becomes a simple counter.. I think.
 When you hit a big surge, swap pages are at the back of all lists, so all
 of your valuable cache gets reclaimed before we write even one swap page.

Does the patch I sent to [EMAIL PROTECTED] last night help in
this ?

I found that the way refill_inactive_scan() and swap_out() are being
called from the main loop in refill_inactive() aren't equal and have
fixed that in a way which (IMHO) also beautifies the code a bit.

(and makes sure background aging doesn't get out of hand with a few
simple checks)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Rik van Riel

On Thu, 26 Apr 2001, Mike Galbraith wrote:
 On Thu, 26 Apr 2001, Mike Galbraith wrote:
 
   limit the runtime of refill_inactive_scan(). This is similar to Rik's
   reclaim-limit+aging-tuning patch to linux-mm yesterday. could you try
   Rik's patch with your patch except this jiffies hack, does it still
   achieve the same improvement?
 
  No.  It livelocked on me with almost all active pages exausted.
 
 Misspoke.. I didn't try the two mixed.  Rik's patch livelocked me.

Interesting. The semantics of my patch are practically the same as
those of the stock kernel ... can you get the stock kernel to
livelock on you, too ?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-26 Thread Mike Galbraith

On Thu, 26 Apr 2001, Mike Galbraith wrote:

 On Thu, 26 Apr 2001, Rik van Riel wrote:

  On Thu, 26 Apr 2001, Mike Galbraith wrote:
 
   1. pagecache is becoming swapcache and must be aged before anything is
   done.  Meanwhile we're calling refill_inactive_scan() so fast that noone
   has a chance to touch a page.   Age becomes a simple counter.. I think.
   When you hit a big surge, swap pages are at the back of all lists, so all
   of your valuable cache gets reclaimed before we write even one swap page.
 
  Does the patch I sent to [EMAIL PROTECTED] last night help in
  this ?
 
  I found that the way refill_inactive_scan() and swap_out() are being
  called from the main loop in refill_inactive() aren't equal and have
  fixed that in a way which (IMHO) also beautifies the code a bit.
 
  (and makes sure background aging doesn't get out of hand with a few
  simple checks)

 That patch livelocked my box with only ~1000 pages on any list.

 I can go back and test some more if you want.

I put it back in, the livelock is 100% repeatable (10 repeats).  It's
deactivating/laundering itself to death.  :) 3mb for my 386-20/0.96.9
would have been enough to frolic (slowly) in.. this box can't live.

   procs  memoryswap  io system cpu
 r  b  w   swpd   free   buff  cache  si  sobibo   incs  us  sy  id
 ...
37  3  0988   1940   1912  32368   0   04049  117   293  90  10   0
47  0  0   1348   1940   1320  26816   0   08017  121   415  86  14   0
39  1  0   1364   1972   1076  20728   0   0   184 0  124   294  84  16   0
39  3  2   1456   1940232  14720   0 57267   638  159  2225  85  15   0
29 10  2   1456   1612416   6908   0   0   25272  235  2253  84  16   0
17 18  2   1292   1420608   4216   0  60   340   454  279  3289  29  20  51
33  1  2   1296   1408488   3464   0   4   317   752  295  2432  11  47  42
locked here

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-25 Thread Mike Galbraith

On Thu, 26 Apr 2001, Marcelo Tosatti wrote:

> On Thu, 26 Apr 2001, Mike Galbraith wrote:
>
> > > Comments?
> >
> > More of a question.  Neither Ingo's nor your patch makes any difference
> > on my UP box (128mb PIII/500) doing make -j30.
>
> Well, my patch incorporates Ingo's patch.
>
> It is now integrated into pre7, btw.
>
> >  It is taking me 11 1/2
> >  minutes to do this test (that's horrible).  Any idea why?~
>
> Not really.

(darn)

> If you have concurrent swapping activity, pre7 should improve the
> performance since all swap IO is asynchronous now. Only paths which really
> need to stop and wait for the swap data are doing it. (eg do_swap_page)

I'll grab virgin pre7 in a few.

> > (I can get it to under 9 with MUCH extremely ugly tinkering.  I've done
> > this enough to know that I _should_ be able to do 8 1/2 minutes ~easily)
>
> Which kind of changes you're doing to get better performance on this test?

Prevent cache collapse at all cost is #one.  Matching deactivation rate
to launder/reclaim.. et al.  Trying HARD to give PG_referenced a chance
to happen between aging scans [1].

-Mike

1. pagecache is becoming swapcache and must be aged before anything is
done.  Meanwhile we're calling refill_inactive_scan() so fast that noone
has a chance to touch a page.   Age becomes a simple counter.. I think.
When you hit a big surge, swap pages are at the back of all lists, so all
of your valuable cache gets reclaimed before we write even one swap page.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-25 Thread Marcelo Tosatti



On Thu, 26 Apr 2001, Mike Galbraith wrote:

> > Comments?
> 
> More of a question.  Neither Ingo's nor your patch makes any difference
> on my UP box (128mb PIII/500) doing make -j30.

Well, my patch incorporates Ingo's patch.

It is now integrated into pre7, btw. 

>  It is taking me 11 1/2
>  minutes to do this test (that's horrible).  Any idea why?~

Not really.

If you have concurrent swapping activity, pre7 should improve the
performance since all swap IO is asynchronous now. Only paths which really
need to stop and wait for the swap data are doing it. (eg do_swap_page)

> (I can get it to under 9 with MUCH extremely ugly tinkering.  I've done
> this enough to know that I _should_ be able to do 8 1/2 minutes ~easily)

Which kind of changes you're doing to get better performance on this test?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-25 Thread Mike Galbraith

On Wed, 25 Apr 2001, Marcelo Tosatti wrote:

> On Tue, 24 Apr 2001, Linus Torvalds wrote:
>
> > Basically, I don't want to mix synchronous and asynchronous
> > interfaces. Everything should be asynchronous by default, and waiting
> > should be explicit.
>
> The following patch changes all swap IO functions to be asynchronous by
> default and changes the callers to wait when needed (except
> rw_swap_page_base which will block writers if there are too many in flight
> swap IOs).
>
> Ingo's find_get_swapcache_page() does not wait on locked pages anymore,
> which is now up to the callers.
>
> time make -j32 test with 4 CPUs machine, 128M ram and 128M swap:
>
> pre5
> 
> 228.04user 28.14system 5:16.52elapsed 80%CPU (0avgtext+0avgdata
> 0maxresident)k
> 0inputs+0outputs (525113major+678617minor)pagefaults 0swaps
>
> pre5 + attached patch
> 
> 227.18user 25.49system 3:40.53elapsed 114%CPU (0avgtext+0avgdata
> 0maxresident)k
> 0inputs+0outputs (495387major+644924minor)pagefaults 0swaps
>
>
> Comments?

More of a question.  Neither Ingo's nor your patch makes any difference
on my UP box (128mb PIII/500) doing make -j30.  It is taking me 11 1/2
minutes to do this test (that's horrible).  Any idea why?

(I can get it to under 9 with MUCH extremely ugly tinkering.  I've done
this enough to know that I _should_ be able to do 8 1/2 minutes ~easily)

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-25 Thread Marcelo Tosatti


Resending...

-- Forwarded message --
Date: Tue, 24 Apr 2001 23:28:38 -0300 (BRT)
From: Marcelo Tosatti <[EMAIL PROTECTED]>
To: Linus Torvalds <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>, Alan Cox <[EMAIL PROTECTED]>,
 Linux Kernel List <[EMAIL PROTECTED]>, [EMAIL PROTECTED],
 Rik van Riel <[EMAIL PROTECTED]>,
 Szabolcs Szakacsits <[EMAIL PROTECTED]>
Subject: Re: [patch] swap-speedup-2.4.3-B3



On Tue, 24 Apr 2001, Linus Torvalds wrote:

> Basically, I don't want to mix synchronous and asynchronous
> interfaces. Everything should be asynchronous by default, and waiting
> should be explicit.

The following patch changes all swap IO functions to be asynchronous by
default and changes the callers to wait when needed (except
rw_swap_page_base which will block writers if there are too many in flight
swap IOs). 

Ingo's find_get_swapcache_page() does not wait on locked pages anymore,
which is now up to the callers.

time make -j32 test with 4 CPUs machine, 128M ram and 128M swap: 

pre5

228.04user 28.14system 5:16.52elapsed 80%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (525113major+678617minor)pagefaults 0swaps

pre5 + attached patch 

227.18user 25.49system 3:40.53elapsed 114%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (495387major+644924minor)pagefaults 0swaps


Comments? 


diff -Nur linux.orig/include/linux/pagemap.h linux/include/linux/pagemap.h
--- linux.orig/include/linux/pagemap.h  Wed Apr 25 00:51:36 2001
+++ linux/include/linux/pagemap.h   Wed Apr 25 00:53:04 2001
@@ -77,7 +77,12 @@
unsigned long index, struct page **hash);
 extern void lock_page(struct page *page);
 #define find_lock_page(mapping, index) \
-   __find_lock_page(mapping, index, page_hash(mapping, index))
+   __find_lock_page(mapping, index, page_hash(mapping, index))
+
+extern struct page * __find_get_swapcache_page (struct address_space * mapping,
+   unsigned long index, struct page **hash);
+#define find_get_swapcache_page(mapping, index) \
+   __find_get_swapcache_page(mapping, index, page_hash(mapping, index))
 
 extern void __add_page_to_hash_queue(struct page * page, struct page **p);
 
diff -Nur linux.orig/include/linux/swap.h linux/include/linux/swap.h
--- linux.orig/include/linux/swap.h Wed Apr 25 00:51:36 2001
+++ linux/include/linux/swap.h  Wed Apr 25 00:53:04 2001
@@ -111,8 +111,8 @@
 extern int try_to_free_pages(unsigned int gfp_mask);
 
 /* linux/mm/page_io.c */
-extern void rw_swap_page(int, struct page *, int);
-extern void rw_swap_page_nolock(int, swp_entry_t, char *, int);
+extern void rw_swap_page(int, struct page *);
+extern void rw_swap_page_nolock(int, swp_entry_t, char *);
 
 /* linux/mm/page_alloc.c */
 
@@ -121,8 +121,7 @@
 extern void add_to_swap_cache(struct page *, swp_entry_t);
 extern int swap_check_entry(unsigned long);
 extern struct page * lookup_swap_cache(swp_entry_t);
-extern struct page * read_swap_cache_async(swp_entry_t, int);
-#define read_swap_cache(entry) read_swap_cache_async(entry, 1);
+extern struct page * read_swap_cache_async(swp_entry_t);
 
 /* linux/mm/oom_kill.c */
 extern int out_of_memory(void);
diff -Nur linux.orig/mm/filemap.c linux/mm/filemap.c
--- linux.orig/mm/filemap.c Wed Apr 25 00:51:35 2001
+++ linux/mm/filemap.c  Wed Apr 25 00:53:04 2001
@@ -678,6 +678,34 @@
 }
 
 /*
+ * Find a swapcache page (and get a reference) or return NULL.
+ * The SwapCache check is protected by the pagecache lock.
+ */
+struct page * __find_get_swapcache_page(struct address_space *mapping,
+ unsigned long offset, struct page **hash)
+{
+   struct page *page;
+
+   /*
+* We need the LRU lock to protect against page_launder().
+*/
+
+   spin_lock(_lock);
+   page = __find_page_nolock(mapping, offset, *hash);
+   if (page) {
+   spin_lock(_lru_lock);
+   if (PageSwapCache(page)) 
+   page_cache_get(page);
+   else
+   page = NULL;
+   spin_unlock(_lru_lock);
+   }
+   spin_unlock(_lock);
+
+   return page;
+}
+
+/*
  * Same as the above, but lock the page too, verifying that
  * it's still valid once we own it.
  */
diff -Nur linux.orig/mm/memory.c linux/mm/memory.c
--- linux.orig/mm/memory.c  Wed Apr 25 00:51:35 2001
+++ linux/mm/memory.c   Wed Apr 25 00:53:04 2001
@@ -1040,7 +1040,7 @@
break;
}
/* Ok, do the async read-ahead now */
-   new_page = read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset), 
0);
+   new_page = read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset));
if (new_page != NULL)
page_cache_release(new_page);
swap_free(SWP_ENTRY(SWP_TYPE(entry), 

Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-25 Thread Marcelo Tosatti


Resending...

-- Forwarded message --
Date: Tue, 24 Apr 2001 23:28:38 -0300 (BRT)
From: Marcelo Tosatti [EMAIL PROTECTED]
To: Linus Torvalds [EMAIL PROTECTED]
Cc: Ingo Molnar [EMAIL PROTECTED], Alan Cox [EMAIL PROTECTED],
 Linux Kernel List [EMAIL PROTECTED], [EMAIL PROTECTED],
 Rik van Riel [EMAIL PROTECTED],
 Szabolcs Szakacsits [EMAIL PROTECTED]
Subject: Re: [patch] swap-speedup-2.4.3-B3



On Tue, 24 Apr 2001, Linus Torvalds wrote:

 Basically, I don't want to mix synchronous and asynchronous
 interfaces. Everything should be asynchronous by default, and waiting
 should be explicit.

The following patch changes all swap IO functions to be asynchronous by
default and changes the callers to wait when needed (except
rw_swap_page_base which will block writers if there are too many in flight
swap IOs). 

Ingo's find_get_swapcache_page() does not wait on locked pages anymore,
which is now up to the callers.

time make -j32 test with 4 CPUs machine, 128M ram and 128M swap: 

pre5

228.04user 28.14system 5:16.52elapsed 80%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (525113major+678617minor)pagefaults 0swaps

pre5 + attached patch 

227.18user 25.49system 3:40.53elapsed 114%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (495387major+644924minor)pagefaults 0swaps


Comments? 


diff -Nur linux.orig/include/linux/pagemap.h linux/include/linux/pagemap.h
--- linux.orig/include/linux/pagemap.h  Wed Apr 25 00:51:36 2001
+++ linux/include/linux/pagemap.h   Wed Apr 25 00:53:04 2001
@@ -77,7 +77,12 @@
unsigned long index, struct page **hash);
 extern void lock_page(struct page *page);
 #define find_lock_page(mapping, index) \
-   __find_lock_page(mapping, index, page_hash(mapping, index))
+   __find_lock_page(mapping, index, page_hash(mapping, index))
+
+extern struct page * __find_get_swapcache_page (struct address_space * mapping,
+   unsigned long index, struct page **hash);
+#define find_get_swapcache_page(mapping, index) \
+   __find_get_swapcache_page(mapping, index, page_hash(mapping, index))
 
 extern void __add_page_to_hash_queue(struct page * page, struct page **p);
 
diff -Nur linux.orig/include/linux/swap.h linux/include/linux/swap.h
--- linux.orig/include/linux/swap.h Wed Apr 25 00:51:36 2001
+++ linux/include/linux/swap.h  Wed Apr 25 00:53:04 2001
@@ -111,8 +111,8 @@
 extern int try_to_free_pages(unsigned int gfp_mask);
 
 /* linux/mm/page_io.c */
-extern void rw_swap_page(int, struct page *, int);
-extern void rw_swap_page_nolock(int, swp_entry_t, char *, int);
+extern void rw_swap_page(int, struct page *);
+extern void rw_swap_page_nolock(int, swp_entry_t, char *);
 
 /* linux/mm/page_alloc.c */
 
@@ -121,8 +121,7 @@
 extern void add_to_swap_cache(struct page *, swp_entry_t);
 extern int swap_check_entry(unsigned long);
 extern struct page * lookup_swap_cache(swp_entry_t);
-extern struct page * read_swap_cache_async(swp_entry_t, int);
-#define read_swap_cache(entry) read_swap_cache_async(entry, 1);
+extern struct page * read_swap_cache_async(swp_entry_t);
 
 /* linux/mm/oom_kill.c */
 extern int out_of_memory(void);
diff -Nur linux.orig/mm/filemap.c linux/mm/filemap.c
--- linux.orig/mm/filemap.c Wed Apr 25 00:51:35 2001
+++ linux/mm/filemap.c  Wed Apr 25 00:53:04 2001
@@ -678,6 +678,34 @@
 }
 
 /*
+ * Find a swapcache page (and get a reference) or return NULL.
+ * The SwapCache check is protected by the pagecache lock.
+ */
+struct page * __find_get_swapcache_page(struct address_space *mapping,
+ unsigned long offset, struct page **hash)
+{
+   struct page *page;
+
+   /*
+* We need the LRU lock to protect against page_launder().
+*/
+
+   spin_lock(pagecache_lock);
+   page = __find_page_nolock(mapping, offset, *hash);
+   if (page) {
+   spin_lock(pagemap_lru_lock);
+   if (PageSwapCache(page)) 
+   page_cache_get(page);
+   else
+   page = NULL;
+   spin_unlock(pagemap_lru_lock);
+   }
+   spin_unlock(pagecache_lock);
+
+   return page;
+}
+
+/*
  * Same as the above, but lock the page too, verifying that
  * it's still valid once we own it.
  */
diff -Nur linux.orig/mm/memory.c linux/mm/memory.c
--- linux.orig/mm/memory.c  Wed Apr 25 00:51:35 2001
+++ linux/mm/memory.c   Wed Apr 25 00:53:04 2001
@@ -1040,7 +1040,7 @@
break;
}
/* Ok, do the async read-ahead now */
-   new_page = read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset), 
0);
+   new_page = read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset));
if (new_page != NULL)
page_cache_release(new_page);
swap_free(SWP_ENTRY(SWP_TYPE(entry), offset));
@@ -1063,13 +1063,13

Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-25 Thread Mike Galbraith

On Wed, 25 Apr 2001, Marcelo Tosatti wrote:

 On Tue, 24 Apr 2001, Linus Torvalds wrote:

  Basically, I don't want to mix synchronous and asynchronous
  interfaces. Everything should be asynchronous by default, and waiting
  should be explicit.

 The following patch changes all swap IO functions to be asynchronous by
 default and changes the callers to wait when needed (except
 rw_swap_page_base which will block writers if there are too many in flight
 swap IOs).

 Ingo's find_get_swapcache_page() does not wait on locked pages anymore,
 which is now up to the callers.

 time make -j32 test with 4 CPUs machine, 128M ram and 128M swap:

 pre5
 
 228.04user 28.14system 5:16.52elapsed 80%CPU (0avgtext+0avgdata
 0maxresident)k
 0inputs+0outputs (525113major+678617minor)pagefaults 0swaps

 pre5 + attached patch
 
 227.18user 25.49system 3:40.53elapsed 114%CPU (0avgtext+0avgdata
 0maxresident)k
 0inputs+0outputs (495387major+644924minor)pagefaults 0swaps


 Comments?

More of a question.  Neither Ingo's nor your patch makes any difference
on my UP box (128mb PIII/500) doing make -j30.  It is taking me 11 1/2
minutes to do this test (that's horrible).  Any idea why?

(I can get it to under 9 with MUCH extremely ugly tinkering.  I've done
this enough to know that I _should_ be able to do 8 1/2 minutes ~easily)

-Mike

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-25 Thread Marcelo Tosatti



On Thu, 26 Apr 2001, Mike Galbraith wrote:

  Comments?
 
 More of a question.  Neither Ingo's nor your patch makes any difference
 on my UP box (128mb PIII/500) doing make -j30.

Well, my patch incorporates Ingo's patch.

It is now integrated into pre7, btw. 

  It is taking me 11 1/2
  minutes to do this test (that's horrible).  Any idea why?~

Not really.

If you have concurrent swapping activity, pre7 should improve the
performance since all swap IO is asynchronous now. Only paths which really
need to stop and wait for the swap data are doing it. (eg do_swap_page)

 (I can get it to under 9 with MUCH extremely ugly tinkering.  I've done
 this enough to know that I _should_ be able to do 8 1/2 minutes ~easily)

Which kind of changes you're doing to get better performance on this test?

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3 (fwd)

2001-04-25 Thread Mike Galbraith

On Thu, 26 Apr 2001, Marcelo Tosatti wrote:

 On Thu, 26 Apr 2001, Mike Galbraith wrote:

   Comments?
 
  More of a question.  Neither Ingo's nor your patch makes any difference
  on my UP box (128mb PIII/500) doing make -j30.

 Well, my patch incorporates Ingo's patch.

 It is now integrated into pre7, btw.

   It is taking me 11 1/2
   minutes to do this test (that's horrible).  Any idea why?~

 Not really.

(darn)

 If you have concurrent swapping activity, pre7 should improve the
 performance since all swap IO is asynchronous now. Only paths which really
 need to stop and wait for the swap data are doing it. (eg do_swap_page)

I'll grab virgin pre7 in a few.

  (I can get it to under 9 with MUCH extremely ugly tinkering.  I've done
  this enough to know that I _should_ be able to do 8 1/2 minutes ~easily)

 Which kind of changes you're doing to get better performance on this test?

Prevent cache collapse at all cost is #one.  Matching deactivation rate
to launder/reclaim.. et al.  Trying HARD to give PG_referenced a chance
to happen between aging scans [1].

-Mike

1. pagecache is becoming swapcache and must be aged before anything is
done.  Meanwhile we're calling refill_inactive_scan() so fast that noone
has a chance to touch a page.   Age becomes a simple counter.. I think.
When you hit a big surge, swap pages are at the back of all lists, so all
of your valuable cache gets reclaimed before we write even one swap page.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3

2001-04-24 Thread Marcelo Tosatti



On Tue, 24 Apr 2001, Linus Torvalds wrote:

> Basically, I don't want to mix synchronous and asynchronous
> interfaces. Everything should be asynchronous by default, and waiting
> should be explicit.

The following patch changes all swap IO functions to be asynchronous by
default and changes the callers to wait when needed (except
rw_swap_page_base which will block writers if there are too many in flight
swap IOs). 

Ingo's find_get_swapcache_page() does not wait on locked pages anymore,
which is now up to the callers.

time make -j32 test with 4 CPUs machine, 128M ram and 128M swap: 

pre5

228.04user 28.14system 5:16.52elapsed 80%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (525113major+678617minor)pagefaults 0swaps

pre5 + attached patch 

227.18user 25.49system 3:40.53elapsed 114%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (495387major+644924minor)pagefaults 0swaps


Comments? 


diff -Nur linux.orig/include/linux/pagemap.h linux/include/linux/pagemap.h
--- linux.orig/include/linux/pagemap.h  Wed Apr 25 00:51:36 2001
+++ linux/include/linux/pagemap.h   Wed Apr 25 00:53:04 2001
@@ -77,7 +77,12 @@
unsigned long index, struct page **hash);
 extern void lock_page(struct page *page);
 #define find_lock_page(mapping, index) \
-   __find_lock_page(mapping, index, page_hash(mapping, index))
+   __find_lock_page(mapping, index, page_hash(mapping, index))
+
+extern struct page * __find_get_swapcache_page (struct address_space * mapping,
+   unsigned long index, struct page **hash);
+#define find_get_swapcache_page(mapping, index) \
+   __find_get_swapcache_page(mapping, index, page_hash(mapping, index))
 
 extern void __add_page_to_hash_queue(struct page * page, struct page **p);
 
diff -Nur linux.orig/include/linux/swap.h linux/include/linux/swap.h
--- linux.orig/include/linux/swap.h Wed Apr 25 00:51:36 2001
+++ linux/include/linux/swap.h  Wed Apr 25 00:53:04 2001
@@ -111,8 +111,8 @@
 extern int try_to_free_pages(unsigned int gfp_mask);
 
 /* linux/mm/page_io.c */
-extern void rw_swap_page(int, struct page *, int);
-extern void rw_swap_page_nolock(int, swp_entry_t, char *, int);
+extern void rw_swap_page(int, struct page *);
+extern void rw_swap_page_nolock(int, swp_entry_t, char *);
 
 /* linux/mm/page_alloc.c */
 
@@ -121,8 +121,7 @@
 extern void add_to_swap_cache(struct page *, swp_entry_t);
 extern int swap_check_entry(unsigned long);
 extern struct page * lookup_swap_cache(swp_entry_t);
-extern struct page * read_swap_cache_async(swp_entry_t, int);
-#define read_swap_cache(entry) read_swap_cache_async(entry, 1);
+extern struct page * read_swap_cache_async(swp_entry_t);
 
 /* linux/mm/oom_kill.c */
 extern int out_of_memory(void);
diff -Nur linux.orig/mm/filemap.c linux/mm/filemap.c
--- linux.orig/mm/filemap.c Wed Apr 25 00:51:35 2001
+++ linux/mm/filemap.c  Wed Apr 25 00:53:04 2001
@@ -678,6 +678,34 @@
 }
 
 /*
+ * Find a swapcache page (and get a reference) or return NULL.
+ * The SwapCache check is protected by the pagecache lock.
+ */
+struct page * __find_get_swapcache_page(struct address_space *mapping,
+ unsigned long offset, struct page **hash)
+{
+   struct page *page;
+
+   /*
+* We need the LRU lock to protect against page_launder().
+*/
+
+   spin_lock(_lock);
+   page = __find_page_nolock(mapping, offset, *hash);
+   if (page) {
+   spin_lock(_lru_lock);
+   if (PageSwapCache(page)) 
+   page_cache_get(page);
+   else
+   page = NULL;
+   spin_unlock(_lru_lock);
+   }
+   spin_unlock(_lock);
+
+   return page;
+}
+
+/*
  * Same as the above, but lock the page too, verifying that
  * it's still valid once we own it.
  */
diff -Nur linux.orig/mm/memory.c linux/mm/memory.c
--- linux.orig/mm/memory.c  Wed Apr 25 00:51:35 2001
+++ linux/mm/memory.c   Wed Apr 25 00:53:04 2001
@@ -1040,7 +1040,7 @@
break;
}
/* Ok, do the async read-ahead now */
-   new_page = read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset), 
0);
+   new_page = read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset));
if (new_page != NULL)
page_cache_release(new_page);
swap_free(SWP_ENTRY(SWP_TYPE(entry), offset));
@@ -1063,13 +1063,13 @@
if (!page) {
lock_kernel();
swapin_readahead(entry);
-   page = read_swap_cache(entry);
+   page = read_swap_cache_async(entry);
unlock_kernel();
if (!page) {
spin_lock(>page_table_lock);
return -1;
}
-
+   wait_on_page(page);
flush_page_to_ram(page);

Re: [patch] swap-speedup-2.4.3-B3

2001-04-24 Thread Linus Torvalds


On Tue, 24 Apr 2001, Ingo Molnar wrote:
> 
> the latest swap-speedup patch can be found at:

Please don't add more of those horrible "wait" arguments.

Make two different versions of a function instead. It's going to clean up
and simplify the code, and there really isn't any reason to do what you're
doing.

You should split up the logic differently: if you want to wait for the
page, then DO so:

page = lookup_swap_cache(..);
if (page) {
wait_for_swap_cache:valid(page);
.. use page ..
}

Note how much more readable and UNDERSTANDABLE the above is, compared to

page = lookup_swap_cache(..., 1);
if (page) {
...

and note also how splitting up the waiting will

 - simplify the swap cache lookup function, making it faster for people
   who do _NOT_ want to wait.

 - make it easier to statically check the correctness of programs by just
   eye-balling them ("Hey, he's calling 'wait' with the spinlock held").

 - more easily moving the wait around, allowing for more concurrency.

Basically, I don't want to mix synchronous and asynchronous
interfaces. Everything should be asynchronous by default, and waiting
should be explicit.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



[patch] swap-speedup-2.4.3-B3

2001-04-24 Thread Ingo Molnar


the latest swap-speedup patch can be found at:

  http://people.redhat.com/mingo/swap-speedup/swap-speedup-2.4.3-B3

(the patch is against 2.4.4-pre6 or 2.4.3-ac13.)

-B3 includes Marcelo's patch for another area that blocks unnecesserily on
locked swapcache pages: async swapcache readahead. Marcello did some tests
which shows that this fix brought some nice improvements too.

"make -j32 bzImage" using 128MB mem, 128MB swap, 4 CPUs:

  stock 2.4.3-ac13
  
  real4m0.678s
  user4m2.870s
  sys 0m38.920s

  swap-speedup-A2
  ---
  real3m24.190s
  user4m1.070s
  sys 0m31.950s

  swap-speedup-B3 (A2 + Marcelo's swapin-readahead non-blocking patch)
  ---
  real3m7.410s
  user4m0.940s
  sys 0m28.680s

ie. for this kernel compile test:

   swap-speedup-A2 is a 18% speedup relative to stock 2.4.3-ac13
   swap-speedup-B3 is a 28% speedup relative to stock 2.4.3-ac13

and the amount of CPU time spent in the kernel has been reduced
significantly as well.

I believe all the correctness and SMP-locking issues have been taken care
of in -B3 as well.

Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



[patch] swap-speedup-2.4.3-B3

2001-04-24 Thread Ingo Molnar


the latest swap-speedup patch can be found at:

  http://people.redhat.com/mingo/swap-speedup/swap-speedup-2.4.3-B3

(the patch is against 2.4.4-pre6 or 2.4.3-ac13.)

-B3 includes Marcelo's patch for another area that blocks unnecesserily on
locked swapcache pages: async swapcache readahead. Marcello did some tests
which shows that this fix brought some nice improvements too.

make -j32 bzImage using 128MB mem, 128MB swap, 4 CPUs:

  stock 2.4.3-ac13
  
  real4m0.678s
  user4m2.870s
  sys 0m38.920s

  swap-speedup-A2
  ---
  real3m24.190s
  user4m1.070s
  sys 0m31.950s

  swap-speedup-B3 (A2 + Marcelo's swapin-readahead non-blocking patch)
  ---
  real3m7.410s
  user4m0.940s
  sys 0m28.680s

ie. for this kernel compile test:

   swap-speedup-A2 is a 18% speedup relative to stock 2.4.3-ac13
   swap-speedup-B3 is a 28% speedup relative to stock 2.4.3-ac13

and the amount of CPU time spent in the kernel has been reduced
significantly as well.

I believe all the correctness and SMP-locking issues have been taken care
of in -B3 as well.

Ingo

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] swap-speedup-2.4.3-B3

2001-04-24 Thread Linus Torvalds


On Tue, 24 Apr 2001, Ingo Molnar wrote:
 
 the latest swap-speedup patch can be found at:

Please don't add more of those horrible wait arguments.

Make two different versions of a function instead. It's going to clean up
and simplify the code, and there really isn't any reason to do what you're
doing.

You should split up the logic differently: if you want to wait for the
page, then DO so:

page = lookup_swap_cache(..);
if (page) {
wait_for_swap_cache:valid(page);
.. use page ..
}

Note how much more readable and UNDERSTANDABLE the above is, compared to

page = lookup_swap_cache(..., 1);
if (page) {
...

and note also how splitting up the waiting will

 - simplify the swap cache lookup function, making it faster for people
   who do _NOT_ want to wait.

 - make it easier to statically check the correctness of programs by just
   eye-balling them (Hey, he's calling 'wait' with the spinlock held).

 - more easily moving the wait around, allowing for more concurrency.

Basically, I don't want to mix synchronous and asynchronous
interfaces. Everything should be asynchronous by default, and waiting
should be explicit.

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/