[PATCH] invalidate/o_direct livelock {was Re: [RFC] ext3/jbd race: releasing in-use journal_heads}
Hi, On Tue, 2005-03-08 at 09:28, Stephen C. Tweedie wrote: > I think it should be OK just to move the page->mapping != mapping test > above the page>index > end test. Sure, if all the pages have been > stolen by the time we see them, then we'll repeat without advancing > "next"; but we're still making progress in that case because pages that > were previously in this mapping *have* been removed, just by a different > process. If we're really concerned about that case we can play the > trick that invalidate_mapping_pages() tries and do a "next++" in that > case. Variant on akpm's last patch with this change, plus one extra s/page->index/page_index/ that he missed. I've moved the whole next=page_index+1 and wrapping check up to just after we've tested the mapping, so even if we break early all of that updating has already been done reliably and we don't end up setting next in two places in the code. A 400-task fsstress used to livelock reliably within minutes inside O_DIRECT without any fix; this patch has been running for over an hour now without problem. --Stephen Fix livelock in invalidate_inode_pages2_range invalidate_inode_pages2_range can loop indefinitely if pagevec_lookup returns only pages beyond the end of the range being invalidated. Fix it by advancing the restart pointer, "next", every time we see a page on the correct mapping, not just if the page is within the range; and break out early when we encounter such a page. Based on a proposed fix by akpm. Signed-off-by: Stephen Tweedie <[EMAIL PROTECTED]> --- linux-2.6-ext3/mm/truncate.c.=K0002=.orig +++ linux-2.6-ext3/mm/truncate.c @@ -268,25 +268,31 @@ int invalidate_inode_pages2_range(struct min(end - next, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) { for (i = 0; !ret && i < pagevec_count(); i++) { struct page *page = pvec.pages[i]; + pgoff_t page_index; int was_dirty; lock_page(page); - if (page->mapping != mapping || page->index > end) { + if (page->mapping != mapping) { unlock_page(page); continue; } - wait_on_page_writeback(page); - next = page->index + 1; + page_index = page->index; + next = page_index + 1; if (next == 0) wrapped = 1; + if (page_index > end) { + unlock_page(page); + break; + } + wait_on_page_writeback(page); while (page_mapped(page)) { if (!did_range_unmap) { /* * Zap the rest of the file in one hit. */ unmap_mapping_range(mapping, - page->index << PAGE_CACHE_SHIFT, - (end - page->index + 1) + page_index << PAGE_CACHE_SHIFT, + (end - page_index + 1) << PAGE_CACHE_SHIFT, 0); did_range_unmap = 1; @@ -295,7 +301,7 @@ int invalidate_inode_pages2_range(struct * Just zap this page */ unmap_mapping_range(mapping, - page->index << PAGE_CACHE_SHIFT, + page_index << PAGE_CACHE_SHIFT, PAGE_CACHE_SIZE, 0); } }
[PATCH] invalidate/o_direct livelock {was Re: [RFC] ext3/jbd race: releasing in-use journal_heads}
Hi, On Tue, 2005-03-08 at 09:28, Stephen C. Tweedie wrote: I think it should be OK just to move the page-mapping != mapping test above the pageindex end test. Sure, if all the pages have been stolen by the time we see them, then we'll repeat without advancing next; but we're still making progress in that case because pages that were previously in this mapping *have* been removed, just by a different process. If we're really concerned about that case we can play the trick that invalidate_mapping_pages() tries and do a next++ in that case. Variant on akpm's last patch with this change, plus one extra s/page-index/page_index/ that he missed. I've moved the whole next=page_index+1 and wrapping check up to just after we've tested the mapping, so even if we break early all of that updating has already been done reliably and we don't end up setting next in two places in the code. A 400-task fsstress used to livelock reliably within minutes inside O_DIRECT without any fix; this patch has been running for over an hour now without problem. --Stephen Fix livelock in invalidate_inode_pages2_range invalidate_inode_pages2_range can loop indefinitely if pagevec_lookup returns only pages beyond the end of the range being invalidated. Fix it by advancing the restart pointer, next, every time we see a page on the correct mapping, not just if the page is within the range; and break out early when we encounter such a page. Based on a proposed fix by akpm. Signed-off-by: Stephen Tweedie [EMAIL PROTECTED] --- linux-2.6-ext3/mm/truncate.c.=K0002=.orig +++ linux-2.6-ext3/mm/truncate.c @@ -268,25 +268,31 @@ int invalidate_inode_pages2_range(struct min(end - next, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) { for (i = 0; !ret i pagevec_count(pvec); i++) { struct page *page = pvec.pages[i]; + pgoff_t page_index; int was_dirty; lock_page(page); - if (page-mapping != mapping || page-index end) { + if (page-mapping != mapping) { unlock_page(page); continue; } - wait_on_page_writeback(page); - next = page-index + 1; + page_index = page-index; + next = page_index + 1; if (next == 0) wrapped = 1; + if (page_index end) { + unlock_page(page); + break; + } + wait_on_page_writeback(page); while (page_mapped(page)) { if (!did_range_unmap) { /* * Zap the rest of the file in one hit. */ unmap_mapping_range(mapping, - page-index PAGE_CACHE_SHIFT, - (end - page-index + 1) + page_index PAGE_CACHE_SHIFT, + (end - page_index + 1) PAGE_CACHE_SHIFT, 0); did_range_unmap = 1; @@ -295,7 +301,7 @@ int invalidate_inode_pages2_range(struct * Just zap this page */ unmap_mapping_range(mapping, - page-index PAGE_CACHE_SHIFT, + page_index PAGE_CACHE_SHIFT, PAGE_CACHE_SIZE, 0); } }