Re: [patch] truncate_inode_pages
Am Dienstag, 12. Juni 2001 14:32 schrieb Daniel Phillips: > On Tuesday 12 June 2001 02:00, you wrote: > > Now with -ac13 and the third try. > > Is it final? > > There have been no further problems reported, but it's Andrew Morton's > patch, his decision. > > -- > Daniel Hello Andrew, have you forwarded it to Alan for apply? Thanks, Dieter - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] truncate_inode_pages
Daniel Phillips wrote: > > On Monday 11 June 2001 14:45, Andrew Morton wrote: > > Daniel Phillips wrote: > > > On Sunday 10 June 2001 03:31, Andrew Morton wrote: > > > > Daniel Phillips wrote: > > > > > This is easy, just set the list head to the page about to be > > > > > truncated. > > > > > > > > Works for me. > > > > > > It looks good, but it's black magic > > > > No, it's wrong. I'm getting BUG()s in clear_inode(): > > [...] > > The lists are mangled. > > curr is being advanced in the wrong place. Yes. > /me makes note to self: never resist the temptation to clean things up the > rest of the way > > I'll actually apply the patch and try it this time ;-) The bug is surprisingly hard to trigger. Take three: --- linux-2.4.5/mm/filemap.cMon May 28 13:31:49 2001 +++ linux-akpm/mm/filemap.c Mon Jun 11 23:31:08 2001 @@ -230,17 +230,17 @@ unsigned long offset; page = list_entry(curr, struct page, list); - curr = curr->next; offset = page->index; /* Is one of the pages to truncate? */ if ((offset >= start) || (*partial && (offset + 1) == start)) { + list_del(head); + list_add(head, curr); if (TryLockPage(page)) { page_cache_get(page); spin_unlock(&pagecache_lock); wait_on_page(page); - page_cache_release(page); - return 1; + goto out_restart; } page_cache_get(page); spin_unlock(&pagecache_lock); @@ -252,11 +252,15 @@ truncate_complete_page(page); UnlockPage(page); - page_cache_release(page); - return 1; + goto out_restart; } + curr = curr->next; } return 0; +out_restart: + page_cache_release(page); + spin_lock(&pagecache_lock); + return 1; } @@ -273,15 +277,19 @@ { unsigned long start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; unsigned partial = lstart & (PAGE_CACHE_SIZE - 1); + int complete; -repeat: spin_lock(&pagecache_lock); - if (truncate_list_pages(&mapping->clean_pages, start, &partial)) - goto repeat; - if (truncate_list_pages(&mapping->dirty_pages, start, &partial)) - goto repeat; - if (truncate_list_pages(&mapping->locked_pages, start, &partial)) - goto repeat; + do { + complete = 1; + while (truncate_list_pages(&mapping->clean_pages, start, &partial)) + complete = 0; + while (truncate_list_pages(&mapping->dirty_pages, start, &partial)) + complete = 0; + while (truncate_list_pages(&mapping->locked_pages, start, &partial)) + complete = 0; + } while (!complete); + /* Traversed all three lists without dropping the lock */ spin_unlock(&pagecache_lock); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] truncate_inode_pages
On Monday 11 June 2001 14:45, Andrew Morton wrote: > Daniel Phillips wrote: > > On Sunday 10 June 2001 03:31, Andrew Morton wrote: > > > Daniel Phillips wrote: > > > > This is easy, just set the list head to the page about to be > > > > truncated. > > > > > > Works for me. > > > > It looks good, but it's black magic > > No, it's wrong. I'm getting BUG()s in clear_inode(): > [...] > The lists are mangled. curr is being advanced in the wrong place. /me makes note to self: never resist the temptation to clean things up the rest of the way I'll actually apply the patch and try it this time ;-) -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] truncate_inode_pages
Daniel Phillips wrote: > > On Sunday 10 June 2001 03:31, Andrew Morton wrote: > > Daniel Phillips wrote: > > > This is easy, just set the list head to the page about to be truncated. > > > > Works for me. > > It looks good, but it's black magic No, it's wrong. I'm getting BUG()s in clear_inode(): 485 if (inode->i_data.nrpages) 486 BUG(); (gdb) p inode->i_data $5 = {clean_pages = {next = 0xd7423ae4, prev = 0xc1628280}, dirty_pages = {next = 0xd7423aec, prev = 0xd7423aec}, locked_pages = {next = 0xd7423af4, prev = 0xd7423af4}, nrpages = 0x1, a_ops = 0xc02e4940, host = 0xd7423a40, i_mmap = 0x0, i_mmap_shared = 0x0, i_shared_lock = {lock = 0x1}, gfp_mask = 0x5} (gdb) p *(struct page *)0xd7423ae4 $6 = {list = {next = 0xd7423ae4, prev = 0xc1628280}, mapping = 0xd7423aec, index = 0xd7423aec, next_hash = 0xd7423af4, count = {counter = 0xd7423af4}, flags = 0x1, lru = {next = 0xc02e4940, prev = 0xd7423a40}, age = 0x0, wait = {lock = {lock = 0x0}, task_list = {next = 0x1, prev = 0x5}}, pprev_hash = 0x0, buffers = 0x0, virtual = 0x0, zone = 0x0} (gdb) p &inode->i_data The lists are mangled. I think what's happening is: curr = curr->next; offset = page->index; /* Is one of the pages to truncate? */ if ((offset >= start) || (*partial && (offset + 1) == start)) { list_del(head); list_add_tail(head, curr); `curr' can point to `head' when we get inside the `if'. So we do, effectively, list_del(head); list_add_tail(head, head); Which turns mapping->clean_pages into an empty list, and the target page.list thinks it's on a list, but really isn't. A subsequent list_del() on the page screws things up somehow. Ho-hum. I'll have another go tomorrow. Probably a simple `if (curr != head)' will cure it. - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] truncate_inode_pages
> Daniel Phillips wrote: > > > > This is easy, just set the list head to the page about to be truncated. > > Works for me. > > --- linux-2.4.5/mm/filemap.cMon May 28 13:31:49 2001 > +++ linux-akpm/mm/filemap.c Sun Jun 10 11:29:19 2001 > @@ -235,12 +235,13 @@ [snip] Works for me 12 times faster on my Athlon 550 and ReiserFS. System: Athlon 550 (old one, 0.25 µm) MSI MS-6167 Rev 1.0B (Irongate C4) 320 MB SDRAM PC100-2-2-2 AHA-2940UW IBM DDYS-T18350N 18 GB (UW/U160) Kernel 2.4.5-ac12 Glibc-2.2 (SuSE 7.1) SunWave1>cat /proc/version Linux version 2.4.5-ac12 (root@SunWave1) (gcc version 2.95.2 19991024 (release)) #1 Sat Jun 9 17:41:07 CEST 2001 SunWave1>time ./ftruncate 0.430u 54.790s 1:00.09 91.8%0+0k 0+0io 32887pf+0w With the mm/filemap.c fix: SunWave1>cat /proc/version Linux version 2.4.5-ac12 (root@SunWave1) (gcc version 2.95.2 19991024 (release)) #1 Sun Jun 10 22:49:07 CEST 2001 SunWave1>time ./ftruncate 0.220u 1.670s 0:05.13 36.8% 0+0k 0+0io 32852pf+0w Thanks, Dieter - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] truncate_inode_pages
On Sunday 10 June 2001 03:31, Andrew Morton wrote: > Daniel Phillips wrote: > > This is easy, just set the list head to the page about to be truncated. > > Works for me. It looks good, but it's black magic - it could use a comment along the lines of: /* * Ensure at least one pass through all three lists with lock held */ There is some fuzz on the original code that could use a scrubbing. "start" should not be the 1st full page needing truncating, it should be the first page needing truncating at all, full or partial. The truncation condition becomes: - if ((offset >= start) || (*partial && (offset + 1) == start)) { + if (offset >= start) { [...] - if (*partial && (offset + 1) == start) + if (*partial && offset == start) Then the conditions are so easy to test it's not worth passing the &partial, just pass lstart... and... oh bother, it's not broke, why fix it ;-) -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] truncate_inode_pages
Daniel Phillips wrote: > > This is easy, just set the list head to the page about to be truncated. Works for me. --- linux-2.4.5/mm/filemap.cMon May 28 13:31:49 2001 +++ linux-akpm/mm/filemap.c Sun Jun 10 11:29:19 2001 @@ -235,12 +235,13 @@ /* Is one of the pages to truncate? */ if ((offset >= start) || (*partial && (offset + 1) == start)) { + list_del(head); + list_add_tail(head, curr); if (TryLockPage(page)) { page_cache_get(page); spin_unlock(&pagecache_lock); wait_on_page(page); - page_cache_release(page); - return 1; + goto out_restart; } page_cache_get(page); spin_unlock(&pagecache_lock); @@ -252,11 +253,14 @@ truncate_complete_page(page); UnlockPage(page); - page_cache_release(page); - return 1; + goto out_restart; } } return 0; +out_restart: + page_cache_release(page); + spin_lock(&pagecache_lock); + return 1; } @@ -273,15 +277,18 @@ { unsigned long start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; unsigned partial = lstart & (PAGE_CACHE_SIZE - 1); + int complete; -repeat: spin_lock(&pagecache_lock); - if (truncate_list_pages(&mapping->clean_pages, start, &partial)) - goto repeat; - if (truncate_list_pages(&mapping->dirty_pages, start, &partial)) - goto repeat; - if (truncate_list_pages(&mapping->locked_pages, start, &partial)) - goto repeat; + do { + complete = 1; + while (truncate_list_pages(&mapping->clean_pages, start, &partial)) + complete = 0; + while (truncate_list_pages(&mapping->dirty_pages, start, &partial)) + complete = 0; + while (truncate_list_pages(&mapping->locked_pages, start, &partial)) + complete = 0; + } while (!complete); spin_unlock(&pagecache_lock); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] truncate_inode_pages
On Saturday 09 June 2001 19:40, Alexander Viro wrote: > > takes 45 seconds CPU time due to the O(clean * dirty) algorithm in > > truncate_inode_pages(). The machine is locked up for the duration. > > The patch reduces this to 20 milliseconds via an O(clean + dirty) > > algorithm. > > Unfortunately, it's _not_ O(clean + dirty). > > > + while (truncate_list_pages(&mapping->clean_pages, start, > > &partial)) { + spin_lock(&pagecache_lock); > > + complete = 0; > > + } > > Cool. Now think what happens if pages with large indices are in the > very end of list. Half of them. You skip clean/2 pages on each of > clean/2 passes. Hardly a linear behaviour - all you need is a > different program to trigger it. > > Now, having a separate pass that would reorder the pages on list, > moving the to-kill ones in the beginning might help. This is easy, just set the list head to the page about to be truncated. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] truncate_inode_pages
> takes 45 seconds CPU time due to the O(clean * dirty) algorithm in > truncate_inode_pages(). The machine is locked up for the duration. > The patch reduces this to 20 milliseconds via an O(clean + dirty) > algorithm. Unfortunately, it's _not_ O(clean + dirty). > + while (truncate_list_pages(&mapping->clean_pages, start, &partial)) { > + spin_lock(&pagecache_lock); > + complete = 0; > + } Cool. Now think what happens if pages with large indices are in the very end of list. Half of them. You skip clean/2 pages on each of clean/2 passes. Hardly a linear behaviour - all you need is a different program to trigger it. Now, having a separate pass that would reorder the pages on list, moving the to-kill ones in the beginning might help. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] truncate_inode_pages
The ftruncate() in this program: #include #include #include int clean = 64 * 1024 * 1024; int dirty = 64 * 1024 * 1024; main() { int fd = open("foo", O_RDWR|O_TRUNC|O_CREAT, 0666); void *mem; ftruncate(fd, clean + dirty); mem = mmap(0, clean + dirty, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); memset(mem, 0, clean + dirty); msync(mem, clean, MS_SYNC); ftruncate(fd, clean); } takes 45 seconds CPU time due to the O(clean * dirty) algorithm in truncate_inode_pages(). The machine is locked up for the duration. The patch reduces this to 20 milliseconds via an O(clean + dirty) algorithm. --- linux-2.4.5/mm/filemap.cMon May 28 13:31:49 2001 +++ linux-akpm/mm/filemap.c Sun Jun 10 01:27:09 2001 @@ -273,15 +273,24 @@ { unsigned long start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT; unsigned partial = lstart & (PAGE_CACHE_SIZE - 1); + int complete; -repeat: spin_lock(&pagecache_lock); - if (truncate_list_pages(&mapping->clean_pages, start, &partial)) - goto repeat; - if (truncate_list_pages(&mapping->dirty_pages, start, &partial)) - goto repeat; - if (truncate_list_pages(&mapping->locked_pages, start, &partial)) - goto repeat; + do { + complete = 1; + while (truncate_list_pages(&mapping->clean_pages, start, &partial)) { + spin_lock(&pagecache_lock); + complete = 0; + } + while (truncate_list_pages(&mapping->dirty_pages, start, &partial)) { + spin_lock(&pagecache_lock); + complete = 0; + } + while (truncate_list_pages(&mapping->locked_pages, start, &partial)) { + spin_lock(&pagecache_lock); + complete = 0; + } + } while (!complete); spin_unlock(&pagecache_lock); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/