Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload
yep, this has done the trick, the deadlock is gone. I've attached the full
VM-fixes patch (this fix included) against vanilla test9-pre5.

	Ingo

--- linux/fs/buffer.c.orig	Fri Sep 22 02:31:07 2000
+++ linux/fs/buffer.c	Fri Sep 22 02:31:13 2000
@@ -706,9 +706,7 @@
 static void refill_freelist(int size)
 {
 	if (!grow_buffers(size)) {
-		balance_dirty(NODEV);
-		wakeup_kswapd(0);	/* We can't wait because of __GFP_IO */
-		schedule();
+		try_to_free_pages(GFP_BUFFER);
 	}
 }
--- linux/mm/filemap.c.orig	Fri Sep 22 02:31:07 2000
+++ linux/mm/filemap.c	Fri Sep 22 02:31:13 2000
@@ -255,7 +255,7 @@
 	 * up kswapd.
 	 */
 	age_page_up(page);
-	if (inactive_shortage() > (inactive_target * 3) / 4)
+	if (inactive_shortage() > inactive_target / 2 && free_shortage())
 		wakeup_kswapd(0);
 not_found:
 	return page;
--- linux/mm/page_alloc.c.orig	Fri Sep 22 02:31:07 2000
+++ linux/mm/page_alloc.c	Fri Sep 22 02:31:13 2000
@@ -444,7 +444,8 @@
 	 * processes, etc).
 	 */
 	if (gfp_mask & __GFP_WAIT) {
-		wakeup_kswapd(1);
+		try_to_free_pages(gfp_mask);
+		memory_pressure++;
 		goto try_again;
 	}
 }
--- linux/mm/swap.c.orig	Fri Sep 22 02:31:07 2000
+++ linux/mm/swap.c	Fri Sep 22 02:31:13 2000
@@ -233,27 +233,11 @@
 	spin_lock(&pagemap_lru_lock);
 	if (!PageLocked(page))
 		BUG();
-	/*
-	 * Heisenbug Compensator(tm)
-	 * This bug shouldn't trigger, but for unknown reasons it
-	 * sometimes does. If there are no signs of list corruption,
-	 * we ignore the problem. Else we BUG()...
-	 */
-	if (PageActive(page) || PageInactiveDirty(page) ||
-	    PageInactiveClean(page)) {
-		struct list_head * page_lru = &page->lru;
-		if (page_lru->next->prev != page_lru) {
-			printk("VM: lru_cache_add, bit or list corruption..\n");
-			BUG();
-		}
-		printk("VM: lru_cache_add, page already in list!\n");
-		goto page_already_on_list;
-	}
+	DEBUG_ADD_PAGE
 	add_page_to_active_list(page);
 	/* This should be relatively rare */
 	if (!page->age)
 		deactivate_page_nolock(page);
-page_already_on_list:
 	spin_unlock(&pagemap_lru_lock);
 }
--- linux/mm/vmscan.c.orig	Fri Sep 22 02:31:07 2000
+++ linux/mm/vmscan.c	Fri Sep 22 02:31:27 2000
@@ -377,7 +377,7 @@
 #define SWAP_SHIFT 5
 #define SWAP_MIN 8

-static int swap_out(unsigned int priority, int gfp_mask)
+static int swap_out(unsigned int priority, int gfp_mask, unsigned long idle_time)
 {
 	struct task_struct * p;
 	int counter;
@@ -407,6 +407,7 @@
 	struct mm_struct *best = NULL;
 	int pid = 0;
 	int assign = 0;
+	int found_task = 0;
 select:
 	read_lock(&tasklist_lock);
 	p = init_task.next_task;
@@ -416,6 +417,11 @@
 			continue;
 		if (mm->rss <= 0)
 			continue;
+		/* Skip tasks which haven't slept long enough yet when
+		   idle-swapping. */
+		if (idle_time && !assign && (!(p->state & TASK_INTERRUPTIBLE) ||
+		     time_before(p->sleep_time + idle_time * HZ, jiffies)))
+			continue;
+		found_task++;
 		/* Refresh swap_cnt? */
 		if (assign == 1) {
 			mm->swap_cnt = (mm->rss >> SWAP_SHIFT);
@@ -430,7 +436,7 @@
 	}
 	read_unlock(&tasklist_lock);
 	if (!best) {
-		if (!assign) {
+		if (!assign && found_task > 0) {
 			assign = 1;
 			goto select;
 		}
@@ -691,9 +697,9 @@
 		 * Now the page is really freeable, so we
 		 * move it to the inactive_clean list.
 		 */
-		UnlockPage(page);
 		del_page_from_inactive_dirty_list(page);
 		add_page_to_inactive_clean_list(page);
+		UnlockPage(page);
 		cleaned_pages++;
 	} else {
 		/*
 		 * It's no use keeping it here, so we move it to
 		 * the active list.
 		 *
Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload
On Fri, 22 Sep 2000, Rik van Riel wrote:

> 894	if (current->need_resched && !(gfp_mask & __GFP_IO)) {
> 895		__set_current_state(TASK_RUNNING);
> 896		schedule();
> 897	}
>
> The idea was to not allow processes which have IO locks
> to schedule away, but as you can see, the check is
> reversed ...

thanks ... sounds good. Will have this tested in about 15 mins.

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload
btw. - there is no swap device here.

	Ingo
test9-pre5+t9p2-vmpatch VM deadlock during write-intensive workload
I'm still getting VM-related lockups during heavy write load, in
test9-pre5 + your 2.4.0-t9p2-vmpatch (which I understand is your latest
VM-related fix-patch, correct?). Here is a profiling histogram of such a
lockup:

     1 Trace; 4010a720 <__switch_to+38/e8>
     5 Trace; 4010a74b <__switch_to+63/e8>
    13 Trace; 4010abc4
   819 Trace; 4010abca
  1806 Trace; 4010abce
     1 Trace; 4010abd0
     2 Trace; 4011af51
     1 Trace; 4011af77
     1 Trace; 4011b010
     3 Trace; 4011b018
     1 Trace; 4011b02d
     1 Trace; 4011b051
     1 Trace; 4011b056
     2 Trace; 4011b05c
     3 Trace; 4011b06d
     4 Trace; 4011b076
   537 Trace; 4011b2bb
     2 Trace; 4011b2c6
     1 Trace; 4011b2c9
     4 Trace; 4011b2d5
    31 Trace; 4011b31a
     1 Trace; 4011b31d
     1 Trace; 4011b32a
     1 Trace; 4011b346
    11 Trace; 4011b378
     2 Trace; 4011b381
     5 Trace; 4011b3f8
    17 Trace; 4011b404
     9 Trace; 4011b43f
     1 Trace; 4011b450
     1 Trace; 4011b457
     2 Trace; 4011b48c
     1 Trace; 4011b49c
   428 Trace; 4011b4cd
     6 Trace; 4011b4f7
     4 Trace; 4011b500
     2 Trace; 4011b509
     1 Trace; 4011b560
     1 Trace; 4011b809 <__wake_up+79/3f0>
     1 Trace; 4011b81b <__wake_up+8b/3f0>
     8 Trace; 4011b81e <__wake_up+8e/3f0>
   310 Trace; 4011ba90 <__wake_up+300/3f0>
     1 Trace; 4011bb7b <__wake_up+3eb/3f0>
     2 Trace; 4011c32b
   244 Trace; 4011d40e
     1 Trace; 4011d411
     1 Trace; 4011d56c
   618 Trace; 4011d62e
     2 Trace; 40122f28
     2 Trace; 40126c3c
     1 Trace; 401377ab
     1 Trace; 401377c8
     5 Trace; 401377cc
    15 Trace; 401377d4
    11 Trace; 401377dc
     2 Trace; 401377e0
     6 Trace; 401377ee
     8 Trace; 4013783c
     1 Trace; 401378f8
     3 Trace; 4013792d
     2 Trace; 401379af
     2 Trace; 401379f3
     1 Trace; 40138524 <__alloc_pages+7c/4b8>
     1 Trace; 4013852b <__alloc_pages+83/4b8>

(The first column is the number of profiling hits; hits were taken on all
CPUs.) Unfortunately I haven't captured which processes are running.

This is an 8-CPU SMP box with 8 write-intensive processes running; they
create new 1k-1MB files in new directories, a total of many gigabytes.
This lockup happens both with vanilla test9-pre5 and with
2.4.0-t9p2-vmpatch.
Your patch makes the lockup happen a bit later than before, but it still
happens. During the lockup all dirty buffers are written out to disk until
it reaches a state like this:

2162688 pages of RAM
1343488 pages of HIGHMEM
 116116 reserved pages
 652826 pages shared
      0 pages swap cached
      0 pages in page table cache
Buffer memory:    52592kB
    CLEAN: 664 buffers, 2302 kbyte, 5 used (last=93), 0 locked, 0 protected, 0 dirty
   LOCKED: 661752 buffers, 2646711 kbyte, 37 used (last=661397), 0 locked, 0 protected, 0 dirty
    DIRTY: 17 buffers, 26 kbyte, 1 used (last=1), 0 locked, 0 protected, 17 dirty

No disk IO happens anymore, but the lockup persists. The histogram was
taken after all disk IO had stopped.

	Ingo