Re: [NEED TESTERS] remove swapin_readahead Re: shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3
Hi Stephen,

On Tue, 17 Apr 2001, Stephen C. Tweedie wrote:
> I don't see the problem.  shmem_getpage_locked appears to back off
> correctly if it encounters an already existing swap-cached page
> (i.e. if swapin_readahead has installed the page first), at least
> with the code in 2.4.3-ac5.

But the swap count can be increased by anybody without holding the page
lock, so the check can trigger spuriously and is bogus. See my old patch.

> There *does* appear to be a race, but it's swapin_readahead racing
> with shmem_writepage.  That code does not search for an existing
> entry in the swap cache when it decides to move a shmem page to
> swap, so we can install the page twice and end up doing a lookup on
> the wrong physical page if there is swap readahead going on.

I cannot follow you here. How can we have a swap cache entry if there
is no swap entry...  Oh, wait: you mean swapin_readahead can swap some
totally bogus page into the swap cache after we did __get_swap_page? I
never thought about that!

> To fix that, shmem_writepage needs to do a swap cache lookup and
> lock before installing the new page --- it should probably just copy
> the new page into the old one if it finds one already there.

OK, I will look into that.

Greetings
		Christoph
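Christoph's first point is the general one that an assertion on a reference
count is unreliable when other code may bump that count without taking the
lock the assertion runs under. Below is a minimal user-space sketch of that
hazard, not kernel code: page_lock, swap_count and readahead() are just
illustrative stand-ins for the page lock, the swap map count and
swapin_readahead(), and whether the message prints is timing-dependent.

/*
 * Hypothetical user-space model, NOT kernel code.  Build with: cc -pthread.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int swap_count = 2;           /* the count shmem expects */

/* Models swapin_readahead(): bumps the count, takes no page lock. */
static void *readahead(void *arg)
{
	atomic_fetch_add(&swap_count, 1);   /* swap_duplicate() analogue */
	usleep(10 * 1000);                  /* readahead I/O in flight */
	atomic_fetch_sub(&swap_count, 1);   /* swap_free() analogue */
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, readahead, NULL);
	usleep(1000);        /* give the extra reference time to land */

	pthread_mutex_lock(&page_lock);
	/*
	 * Analogue of "if (swap_count(page) > 2) BUG();": holding
	 * page_lock does not stop the other thread, so the check can
	 * trigger even though nothing is actually wrong.
	 */
	if (atomic_load(&swap_count) > 2)
		printf("spurious BUG(): count is transiently %d\n",
		       atomic_load(&swap_count));
	pthread_mutex_unlock(&page_lock);

	pthread_join(t, NULL);
	return 0;
}

This is also why Christoph's patch further down the thread simply removes
the swap_count() > 2 BUG() from shmem_getpage_locked() rather than trying
to strengthen the check.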
Re: [NEED TESTERS] remove swapin_readahead Re: shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3
Hi,

On Sat, Apr 14, 2001 at 08:31:07PM -0300, Marcelo Tosatti wrote:
> On Sat, 14 Apr 2001, Rik van Riel wrote:
> > On Sat, 14 Apr 2001, Marcelo Tosatti wrote:
> > >
> > > There is a nasty race between shmem_getpage_locked() and
> > > swapin_readahead() with the new shmem code (introduced in
> > > 2.4.3-ac3 and merged in the main tree in 2.4.4-pre3):
>
> Test (multiple shm-stress) runs fine without swapin_readahead(), as
> expected.
>
> Stephen/Linus?

I don't see the problem.  shmem_getpage_locked appears to back off
correctly if it encounters an already existing swap-cached page
(i.e. if swapin_readahead has installed the page first), at least
with the code in 2.4.3-ac5.

There *does* appear to be a race, but it's swapin_readahead racing
with shmem_writepage.  That code does not search for an existing
entry in the swap cache when it decides to move a shmem page to
swap, so we can install the page twice and end up doing a lookup on
the wrong physical page if there is swap readahead going on.

To fix that, shmem_writepage needs to do a swap cache lookup and
lock before installing the new page --- it should probably just copy
the new page into the old one if it finds one already there.

--Stephen
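The fix Stephen sketches is the usual "look up under the lock before you
install" pattern: search the cache for the entry while holding the relevant
lock, and if a page is already there (installed by readahead), copy the data
into that page instead of adding a second page for the same entry. The
following is a small user-space model of that pattern only; swap_cache,
cache_lock and install_page() are invented names, not the 2.4 kernel API,
and a real shmem_writepage would operate on struct page with the real
locking and error handling.

/* User-space sketch only.  Build with: cc -pthread. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NENTRIES  64
#define PAGE_SIZE 4096

static char *swap_cache[NENTRIES];                 /* one page per entry */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Install "data" as the cached page for "entry".  The lookup and the
 * install happen under the same lock, so a concurrent readahead and a
 * writepage can never leave two pages behind for one entry: whoever
 * comes second copies into the page that is already there.
 */
static char *install_page(unsigned int entry, const char *data)
{
	char *page;

	pthread_mutex_lock(&cache_lock);
	page = swap_cache[entry];
	if (!page) {
		page = malloc(PAGE_SIZE);
		if (!page) {
			pthread_mutex_unlock(&cache_lock);
			return NULL;
		}
		swap_cache[entry] = page;
	}
	memcpy(page, data, PAGE_SIZE);   /* reuse or fill the single page */
	pthread_mutex_unlock(&cache_lock);
	return page;
}

int main(void)
{
	char buf[PAGE_SIZE] = "shmem page contents";
	char *a = install_page(3, buf);  /* say, readahead got here first */
	char *b = install_page(3, buf);  /* writepage installs afterwards */

	printf("%s\n", a == b ? "one physical page per swap entry"
			      : "duplicate pages: the race described above");
	return 0;
}

Both callers get the same physical page back, so a later lookup for that
swap entry cannot land on the wrong one.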
Re: shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3
Hi,

On Sat, 14 Apr 2001, Marcelo Tosatti wrote:
> There is a nasty race between shmem_getpage_locked() and
> swapin_readahead() with the new shmem code (introduced in 2.4.3-ac3
> and merged in the main tree in 2.4.4-pre3):
>
> shmem_getpage_locked() finds a page in the swapcache and moves it to
> the pagecache as an shmem page, freeing the swapcache and the swap
> map entry for this page. (which causes a BUG() in mm/shmem.c:353
> since the swap map entry is being used)
>
> In the meanwhile, swapin_readahead() is allocating a page and adding
> it to the swapcache.

Oh, I was just chasing this also.

> I don't see any clean fix for this one.

I think the actual check for swap_count is not necessary: if
swapin_readahead allocates a new swap cache page for the entry, that's
not a real bug. On memory pressure this page will be reclaimed.

Actually we have to make shmem much more unfriendly to the swap cache
to make it correct: I think we have to drop the whole "drop swap cache
pages on truncate" logic, since it uses lookup_swap_cache and
delete_from_swap_cache, which both lock the page, while holding a
spinlock :-(

The appended patch implements both changes and relies on the page
stealer to shrink the swap cache. It also integrates fixes which
Marcelo sent earlier.

Greetings
		Christoph

--- 2.4.4-pre3/mm/shmem.c	Sat Apr 14 11:12:54 2001
+++ u2.4.3/mm/shmem.c	Sun Apr 15 13:45:58 2001
@@ -123,10 +123,19 @@
 		entry = *ptr;
 		*ptr = (swp_entry_t){0};
 		freed++;
+#if 0
+		/*
+		 * This does not work since it may sleep while holding
+		 * a spinlock
+		 *
+		 * We rely on the page stealer to free up the
+		 * allocated swap space later
+		 */
 		if ((page = lookup_swap_cache(entry)) != NULL) {
 			delete_from_swap_cache(page);
 			page_cache_release(page);
 		}
+#endif
 		swap_free (entry);
 	}
 	return freed;
@@ -236,8 +245,10 @@
 	/* Only move to the swap cache if there are no other users of
 	 * the page. */
-	if (atomic_read(&page->count) > 2)
-		goto out;
+	if (atomic_read(&page->count) > 2) {
+		set_page_dirty(page);
+		goto out;
+	}
 
 	inode = page->mapping->host;
 	info = &inode->u.shmem_i;
@@ -348,9 +359,6 @@
 	if (TryLockPage(page))
 		goto wait_retry;
 
-	if (swap_count(page) > 2)
-		BUG();
-
 	swap_free(*entry);
 	*entry = (swp_entry_t) {0};
 	delete_from_swap_cache_nolock(page);
@@ -432,6 +440,7 @@
 	*ptr = NOPAGE_SIGBUS;
 	return error;
 sigbus:
+	up (&inode->i_sem);
 	*ptr = NOPAGE_SIGBUS;
 	return -EFAULT;
 }
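The #if 0 block in the patch is there because of the "may sleep while
holding a spinlock" problem Christoph mentions: lookup_swap_cache() and
delete_from_swap_cache() both lock the page, which can block, while the
surrounding truncate code holds a spinlock. As a rough user-space
illustration of why that pattern is bad, here is a toy program with a
POSIX spinlock and a usleep() standing in for the kernel spinlock and the
page-lock wait; in the kernel the consequences can be worse than wasted
CPU, up to a deadlock. The structure is assumed for illustration only.

/* User-space model only.  Build with: cc -pthread. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_spinlock_t lock;

/* A second CPU/thread that also needs the lock: it can do nothing but
 * burn cycles for as long as the holder "sleeps". */
static void *other_cpu(void *arg)
{
	pthread_spin_lock(&lock);
	pthread_spin_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);

	pthread_spin_lock(&lock);
	pthread_create(&t, NULL, other_cpu, NULL);
	usleep(100 * 1000);      /* "sleeping" with the spinlock held */
	pthread_spin_unlock(&lock);

	pthread_join(t, NULL);
	pthread_spin_destroy(&lock);
	puts("done; the waiter spun for the whole 100ms");
	return 0;
}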
[NEED TESTERS] remove swapin_readahead Re: shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3
On Sat, 14 Apr 2001, Rik van Riel wrote:
> On Sat, 14 Apr 2001, Marcelo Tosatti wrote:
> >
> > There is a nasty race between shmem_getpage_locked() and
> > swapin_readahead() with the new shmem code (introduced in
> > 2.4.3-ac3 and merged in the main tree in 2.4.4-pre3):
> >
> > I don't see any clean fix for this one.
> >
> > Suggestions ?
>
> As we discussed with Alan on irc, we could remove the (physical)
> swapin_readahead() and get 2.4 stable. Once 2.4 is stable we
> could (if needed) introduce a virtual address based readahead
> strategy for swap-backed things ... most of that code has been
> ready for months thanks to Ben LaHaise.
>
> A virtual-address based readahead not only makes much more sense
> from a performance POV, it also cleanly gets the ugly locking
> issues out of the way.

Test (multiple shm-stress) runs fine without swapin_readahead(), as
expected.

I tried a "make -j32" test (128M RAM, 4 CPUs) and got 4m17 without
readahead against 3m40 with readahead, on average. We need really
swap-intensive workloads to know how much the removal actually hurts,
though.

People with swap-intensive workloads: please test this and report
results.

Stephen/Linus?

Patch against 2.4.4-pre3.

diff -Nur linux.orig/include/linux/mm.h linux/include/linux/mm.h
--- linux.orig/include/linux/mm.h	Sat Apr 14 21:31:38 2001
+++ linux/include/linux/mm.h	Sat Apr 14 21:30:44 2001
@@ -425,7 +425,6 @@
 extern void mem_init(void);
 extern void show_mem(void);
 extern void si_meminfo(struct sysinfo * val);
-extern void swapin_readahead(swp_entry_t);
 
 /* mmap.c */
 extern void lock_vma_mappings(struct vm_area_struct *);
diff -Nur linux.orig/include/linux/swap.h linux/include/linux/swap.h
--- linux.orig/include/linux/swap.h	Sat Apr 14 21:31:38 2001
+++ linux/include/linux/swap.h	Sat Apr 14 21:30:28 2001
@@ -145,7 +145,6 @@
 					struct inode **);
 extern int swap_duplicate(swp_entry_t);
 extern int swap_count(struct page *);
-extern int valid_swaphandles(swp_entry_t, unsigned long *);
 #define get_swap_page() __get_swap_page(1)
 extern void __swap_free(swp_entry_t, unsigned short);
 #define swap_free(entry) __swap_free((entry), 1)
diff -Nur linux.orig/mm/memory.c linux/mm/memory.c
--- linux.orig/mm/memory.c	Sat Apr 14 21:31:38 2001
+++ linux/mm/memory.c	Sat Apr 14 21:28:34 2001
@@ -1012,42 +1012,6 @@
 	return;
 }
 
-
-
-/*
- * Primitive swap readahead code. We simply read an aligned block of
- * (1 << page_cluster) entries in the swap area. This method is chosen
- * because it doesn't cost us any seek time. We also make sure to queue
- * the 'original' request together with the readahead ones...
- */
-void swapin_readahead(swp_entry_t entry)
-{
-	int i, num;
-	struct page *new_page;
-	unsigned long offset;
-
-	/*
-	 * Get the number of handles we should do readahead io to. Also,
-	 * grab temporary references on them, releasing them as io completes.
-	 */
-	num = valid_swaphandles(entry, &offset);
-	for (i = 0; i < num; offset++, i++) {
-		/* Don't block on I/O for read-ahead */
-		if (atomic_read(&nr_async_pages) >= pager_daemon.swap_cluster
-				* (1 << page_cluster)) {
-			while (i++ < num)
-				swap_free(SWP_ENTRY(SWP_TYPE(entry), offset++));
-			break;
-		}
-		/* Ok, do the async read-ahead now */
-		new_page = read_swap_cache_async(SWP_ENTRY(SWP_TYPE(entry), offset), 0);
-		if (new_page != NULL)
-			page_cache_release(new_page);
-		swap_free(SWP_ENTRY(SWP_TYPE(entry), offset));
-	}
-	return;
-}
-
 /*
  * We hold the mm semaphore and the page_table_lock on entry and exit.
 */
@@ -1062,7 +1026,6 @@
 	page = lookup_swap_cache(entry);
 	if (!page) {
 		lock_kernel();
-		swapin_readahead(entry);
 		page = read_swap_cache(entry);
 		unlock_kernel();
 		if (!page) {
diff -Nur linux.orig/mm/shmem.c linux/mm/shmem.c
--- linux.orig/mm/shmem.c	Sat Apr 14 21:31:38 2001
+++ linux/mm/shmem.c	Sat Apr 14 21:28:44 2001
@@ -328,7 +328,6 @@
 	if (!page) {
 		spin_unlock (&info->lock);
 		lock_kernel();
-		swapin_readahead(*entry);
 		page = read_swap_cache(*entry);
 		unlock_kernel();
 		if (!page)
diff -Nur linux.orig/mm/swapfile.c linux/mm/swapfile.c
--- linux.orig/mm/swapfile.c	Thu Mar 22 14:22:15 2001
+++ linux/mm/swapfile.c	Sat Apr 14 21:30:04 2001
@@ -955,34 +955,3 @@
 	}
 	return;
 }
-
-/*
- * Kernel_lock protects against swap device deletion. Grab an extra
- * reference on the swaphandle so that it does not become unused.
- */
-int valid_swaphandles(swp_entry_t entry
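For a rough sense of scale, Marcelo's kernel-build figures work out as
follows; this is only arithmetic on the reported averages, not a new
measurement, and by his own caveat the test is not particularly
swap-intensive:

    with readahead:     3m40 = 220 s
    without readahead:  4m17 = 257 s
    slowdown:           (257 - 220) / 220 ~= 0.17, i.e. roughly 17 % longer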
Re: shmem_getpage_locked() / swapin_readahead() race in 2.4.4-pre3
On Sat, 14 Apr 2001, Marcelo Tosatti wrote:
> There is a nasty race between shmem_getpage_locked() and
> swapin_readahead() with the new shmem code (introduced in
> 2.4.3-ac3 and merged in the main tree in 2.4.4-pre3):
>
> I don't see any clean fix for this one.
>
> Suggestions ?

As we discussed with Alan on irc, we could remove the (physical)
swapin_readahead() and get 2.4 stable. Once 2.4 is stable we could (if
needed) introduce a virtual address based readahead strategy for
swap-backed things ... most of that code has been ready for months
thanks to Ben LaHaise.

A virtual-address based readahead not only makes much more sense from a
performance POV, it also cleanly gets the ugly locking issues out of
the way.

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://www.conectiva.com/
http://distro.conectiva.com/