Re: [NEED TESTERS] remove swapin_readahead Re: shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3

2001-04-18 Thread Christoph Rohland

Hi Stephen,

On Tue, 17 Apr 2001, Stephen C. Tweedie wrote:
> I don't see the problem.  shmem_getpage_locked appears to back off
> correctly when it encounters an existing swap-cached page, i.e. when
> swapin_readahead has installed the page first, at least with the
> code in 2.4.3-ac5.

But the swap count can be increased by anybody without holding the
page lock, so the check triggers spuriously and is bogus. See my old
patch.
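
To illustrate (a hypothetical sketch, not the actual 2.4.3-ac5 check):

	/* CPU 0: shmem_getpage_locked() */
	lock_page(page);
	/* ... CPU 1 meanwhile: swap_duplicate(entry), e.g. from fork()
	 * copying a swapped-out pte; it bumps swap_map[] and never
	 * takes the page lock ... */
	if (swap_count(page) > 1)	/* now true, although nothing */
		goto repeat_lookup;	/* is really inconsistent     */

The label and the exact test are made up; the point is only that the
page lock does not stabilize the swap count.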

> There *does* appear to be a race, but it's swapin_readahead racing
> with shmem_writepage.  That code does not search for an existing
> entry in the swap cache when it decides to move a shmem page to
> swap, so we can install the page twice and end up doing a lookup on
> the wrong physical page if there is swap readahead going on.

I cannot follow you here. How can we have a swap cache entry if there
is no swap entry? ... Oh, stop: you mean swapin_readahead can swap in
some totally bogus page into the swap cache after we did
__get_swap_page? I never thought about that!
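
So the interleaving would be something like this (my reconstruction
from the discussion, not a trace; function names as in 2.4.4-pre3):

	shmem_writepage(), CPU 0	swapin_readahead(), CPU 1
	------------------------	-------------------------
	entry = __get_swap_page();
	  (entry allocated, but not
	   yet in the swap cache)
					entry falls into the readahead
					cluster, so:
					read_swap_cache_async(entry, 0)
					  installs a page of stale disk
					  data for entry
	add_to_swap_cache(page, entry);
	  (a second page for the same
	   entry; later lookups can
	   return the stale one)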

> To fix that, shmem_writepage needs to do a swap cache lookup and
> lock before installing the new page --- it should probably just copy
> the new page into the old one if it finds one already there.

OK I will look into that.

Greetings
Christoph





Re: [NEED TESTERS] remove swapin_readahead Re: shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3

2001-04-17 Thread Stephen C. Tweedie

Hi,

On Sat, Apr 14, 2001 at 08:31:07PM -0300, Marcelo Tosatti wrote:
> On Sat, 14 Apr 2001, Rik van Riel wrote:
> > On Sat, 14 Apr 2001, Marcelo Tosatti wrote:
> > 
> > > There is a nasty race between shmem_getpage_locked() and
> > > swapin_readahead() with the new shmem code (introduced in
> > > 2.4.3-ac3 and merged in the main tree in 2.4.4-pre3):

> Test (multiple shm-stress) runs fine without swapin_readahead(), as
> expected.

> Stephen/Linus? 

I don't see the problem.  shmem_getpage_locked appears to back off
correctly when it encounters an existing swap-cached page, i.e. when
swapin_readahead has installed the page first, at least with the code
in 2.4.3-ac5.

There *does* appear to be a race, but it's swapin_readahead racing
with shmem_writepage.  That code does not search for an existing entry
in the swap cache when it decides to move a shmem page to swap, so we
can install the page twice and end up doing a lookup on the wrong
physical page if there is swap readahead going on.

To fix that, shmem_writepage needs to do a swap cache lookup and lock
before installing the new page --- it should probably just copy the
new page into the old one if it finds one already there.
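
Roughly (a sketch of the idea against 2.4-era interfaces; the names
and locking details here are guesses, not a patch):

	/* in shmem_writepage(), after allocating "entry": */
	cached = lookup_swap_cache(entry);
	if (cached) {
		/* Readahead already instantiated this entry: reuse
		 * its page rather than installing a second one. */
		lock_page(cached);
		copy_highpage(cached, page);	/* overwrite stale data */
		SetPageDirty(cached);
		UnlockPage(cached);
		page_cache_release(cached);	/* ref from the lookup */
	} else
		add_to_swap_cache(page, entry);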

--Stephen



[NEED TESTERS] remove swapin_readahead Re: shmem_getpage_locked()/ swapin_readahead() race in 2.4.4-pre3

2001-04-14 Thread Marcelo Tosatti


On Sat, 14 Apr 2001, Rik van Riel wrote:

> On Sat, 14 Apr 2001, Marcelo Tosatti wrote:
> 
> > There is a nasty race between shmem_getpage_locked() and
> > swapin_readahead() with the new shmem code (introduced in
> > 2.4.3-ac3 and merged in the main tree in 2.4.4-pre3):
> 
> > I don't see any clean fix for this one.
> > Suggestions ?
> 
> As we discussed with Alan on irc, we could remove the (physical)
> swapin_readahead() and get 2.4 stable. Once 2.4 is stable we
> could (if needed) introduce a virtual address based readahead
> strategy for swap-backed things ... most of that code has been
> ready for months thanks to Ben LaHaise.
> 
> A virtual-address based readahead not only makes much more sense
> from a performance POV, it also cleanly gets the ugly locking
> issues out of the way.
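
The virtual-address based readahead Rik describes might look roughly
like this (a sketch against assumed 2.4 interfaces; this is not Ben
LaHaise's actual code):

	/* Walk the ptes following the faulting address and start async
	 * reads for any swapped-out entries, instead of reading
	 * physically adjacent swap slots.  Assumes the caller holds
	 * the page_table_lock; a real version would also stop at the
	 * page table boundary. */
	static void swapin_readahead_virtual(struct vm_area_struct *vma,
					     unsigned long address,
					     pte_t *pte)
	{
		int i;
		struct page *page;

		for (i = 1; i < (1 << page_cluster); i++) {
			address += PAGE_SIZE;
			pte++;
			if (address >= vma->vm_end)
				break;
			if (pte_present(*pte) || pte_none(*pte))
				continue;	/* only swapped-out ptes */
			page = read_swap_cache_async(pte_to_swp_entry(*pte), 0);
			if (page)
				page_cache_release(page);
		}
	}

Presumably this also sidesteps the shmem races: shmem faults go
through shmem_nopage(), not do_swap_page(), so nothing would read
shmem swap entries behind shmem's back.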

Test (multiple shm-stress) runs fine without swapin_readahead(), as
expected.

I tried a "make -j32" test (128M RAM, 4 CPUs) and got 4m17s without
readahead against 3m40s with readahead, on average. We need really
swap-intensive workloads to know how much removing it hurts, though.

People with swap-intensive workloads: please test this and report results.

Stephen/Linus? 


Patch against 2.4.4-pre3.


diff -Nur linux.orig/include/linux/mm.h linux/include/linux/mm.h
--- linux.orig/include/linux/mm.h   Sat Apr 14 21:31:38 2001
+++ linux/include/linux/mm.hSat Apr 14 21:30:44 2001
@@ -425,7 +425,6 @@
 extern void mem_init(void);
 extern void show_mem(void);
 extern void si_meminfo(struct sysinfo * val);
-extern void swapin_readahead(swp_entry_t);
 
 /* mmap.c */
 extern void lock_vma_mappings(struct vm_area_struct *);
diff -Nur linux.orig/include/linux/swap.h linux/include/linux/swap.h
--- linux.orig/include/linux/swap.h Sat Apr 14 21:31:38 2001
+++ linux/include/linux/swap.h  Sat Apr 14 21:30:28 2001
@@ -145,7 +145,6 @@
struct inode **);
 extern int swap_duplicate(swp_entry_t);
 extern int swap_count(struct page *);
-extern int valid_swaphandles(swp_entry_t, unsigned long *);
 #define get_swap_page() __get_swap_page(1)
 extern void __swap_free(swp_entry_t, unsigned short);
 #define swap_free(entry) __swap_free((entry), 1)
diff -Nur linux.orig/mm/memory.c linux/mm/memory.c
--- linux.orig/mm/memory.c  Sat Apr 14 21:31:38 2001
+++ linux/mm/memory.c   Sat Apr 14 21:28:34 2001
@@ -1012,42 +1012,6 @@
return;
 }
 
-
-
-/* 
- * Primitive swap readahead code. We simply read an aligned block of
- * (1 << page_cluster) entries in the swap area. This method is chosen
- * because it doesn't cost us any seek time.  We also make sure to queue
- * the 'original' request together with the readahead ones...  
- */
-void swapin_readahead(swp_entry_t entry)
-{
-   int i, num;
-   struct page *new_page;
-   unsigned long offset;
-
-   /*
-* Get the number of handles we should do readahead io to. Also,
-* grab temporary references on them, releasing them as io completes.
-*/
-   num = valid_swaphandles(entry, &offset);
-   for (i = 0; i < num; offset++, i++) {
-   /* Don't block on I/O for read-ahead */
-   if (atomic_read(&nr_async_pages) >= pager_daemon.swap_cluster
-   * (1 << page_cluster)) {
-   while (i++ < num)
-   swap_free(SWP_ENTRY(SWP_TYPE(entry), offset++));
-   break;
-   }
-   /* Ok, do the async read-ahead now */
-   new_page = read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset), 0);
-   if (new_page != NULL)
-   page_cache_release(new_page);
-   swap_free(SWP_ENTRY(SWP_TYPE(entry), offset));
-   }
-   return;
-}
-
 /*
  * We hold the mm semaphore and the page_table_lock on entry and exit.
  */
@@ -1062,7 +1026,6 @@
page = lookup_swap_cache(entry);
if (!page) {
lock_kernel();
-   swapin_readahead(entry);
page = read_swap_cache(entry);
unlock_kernel();
if (!page) {
diff -Nur linux.orig/mm/shmem.c linux/mm/shmem.c
--- linux.orig/mm/shmem.c   Sat Apr 14 21:31:38 2001
+++ linux/mm/shmem.cSat Apr 14 21:28:44 2001
@@ -328,7 +328,6 @@
if (!page) {
spin_unlock (&info->lock);
lock_kernel();
-   swapin_readahead(*entry);
page = read_swap_cache(*entry);
unlock_kernel();
if (!page) 
diff -Nur linux.orig/mm/swapfile.c linux/mm/swapfile.c
--- linux.orig/mm/swapfile.cThu Mar 22 14:22:15 2001
+++ linux/mm/swapfile.c Sat Apr 14 21:30:04 2001
@@ -955,34 +955,3 @@
}
return;
 }
-
-/*
- * Kernel_lock protects against swap device deletion. Grab an extra
- * reference on the swaphandle so that it does not become unused.
- */
-int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
-{
-