Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Friday, January 05, 2001 04:32:50 PM -0200 Marcelo Tosatti
<[EMAIL PROTECTED]> wrote:

>> > I think we want to remove flush_dirty_buffers() from bdflush.
>> >
>>
>> Whoops.  If bdflush doesn't balance the dirty list, who does?
>
> Who marks buffers dirty.
>
> Linus changed mark_buffer_dirty() to use flush_dirty_buffers() in case
> there are too many dirty buffers.
>

Yes, but mark_buffer_dirty only ends up calling flush_dirty_buffers when
balance_dirty_state returns 1.  This means the only ones doing the
balancing are the procs themselves (not some async daemon), and the
writing only starts once we are over the hard dirty limit.

-chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Fri, 5 Jan 2001, Marcelo Tosatti wrote:
> On Fri, 5 Jan 2001, Rik van Riel wrote:
>
> > Also, you do not want the writer to block on writing out buffers
> > if bdflush could write them out asynchronously while the dirty
> > buffer producer keeps working in the background.
>
> flush_dirty_buffers() does not wait for the buffers it writes to
> become clean.

This sounds more like a bug we should fix than a reason not to use
flush_dirty_buffers().

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/		http://distro.conectiva.com.br/
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Fri, 5 Jan 2001, Rik van Riel wrote:

> Also, you do not want the writer to block on writing out buffers
> if bdflush could write them out asynchronously while the dirty
> buffer producer keeps working in the background.

flush_dirty_buffers() does not wait for the buffers it writes to become
clean.
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Fri, 5 Jan 2001, Marcelo Tosatti wrote:
> On Fri, 5 Jan 2001, Chris Mason wrote:
> > On Friday, January 05, 2001 01:43:07 PM -0200 Marcelo Tosatti
> > <[EMAIL PROTECTED]> wrote:
> > > On Fri, 5 Jan 2001, Chris Mason wrote:
> > >
> > >>
> > >> Here's the latest version of the patch, against 2.4.0.  The
> > >> biggest open issues are what to do with bdflush, since
> > >> page_launder could do everything bdflush does.
> > >
> > > I think we want to remove flush_dirty_buffers() from bdflush.
> > >
> >
> > Whoops.  If bdflush doesn't balance the dirty list, who does?
>
> Who marks buffers dirty.
>
> Linus changed mark_buffer_dirty() to use flush_dirty_buffers() in case
> there are too many dirty buffers.
>
> Also, I think in practice page_launder will help on balancing.

Chris is right.  It is possible (not very likely, but possible
nonetheless) that the majority of the dirty pages are _active_ pages ...
in that case page_launder() won't help one bit and only
flush_dirty_buffers() can do the job.

Also, you do not want the writer to block on writing out buffers
if bdflush could write them out asynchronously while the dirty
buffer producer keeps working in the background.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/		http://distro.conectiva.com.br/
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Fri, 5 Jan 2001, Chris Mason wrote:
> On Friday, January 05, 2001 01:43:07 PM -0200 Marcelo Tosatti
> <[EMAIL PROTECTED]> wrote:
> > On Fri, 5 Jan 2001, Chris Mason wrote:
> >
> >>
> >> Here's the latest version of the patch, against 2.4.0.  The
> >> biggest open issues are what to do with bdflush, since
> >> page_launder could do everything bdflush does.
> >
> > I think we want to remove flush_dirty_buffers() from bdflush.
> >
>
> Whoops.  If bdflush doesn't balance the dirty list, who does?

Who marks buffers dirty.

Linus changed mark_buffer_dirty() to use flush_dirty_buffers() in case
there are too many dirty buffers.

Also, I think in practice page_launder will help on balancing.
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Friday, January 05, 2001 01:43:07 PM -0200 Marcelo Tosatti
<[EMAIL PROTECTED]> wrote:
>
> On Fri, 5 Jan 2001, Chris Mason wrote:
>
>>
>> Here's the latest version of the patch, against 2.4.0.  The
>> biggest open issues are what to do with bdflush, since
>> page_launder could do everything bdflush does.
>
> I think we want to remove flush_dirty_buffers() from bdflush.
>

Whoops.  If bdflush doesn't balance the dirty list, who does?

But, we can do this in bdflush:

	if (free_shortage())
		flushed = page_launder(GFP_KERNEL, 0) ;
	else
		flushed = flush_dirty_buffers(0) ;

The idea is that page_launder knows best which pages to free when memory
is low, and flush_dirty_buffers knows best which pages to write when
things are unbalanced.  Not the cleanest idea ever, since I'd prefer that
the first pages written when we need balancing are also the ones most
likely to be freed by page_launder.

In other words, I'd like to make a flush_inactive_dirty() inside
mm/vmscan.c, and have bdflush call that before working on the buffer
cache dirty list.

-chris
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
Hi Chris,

I don't know if this is already known, but this patch seems to solve the
loop block device problem too.  I've tested it several times and did not
get any kernel lockups after applying the patch.  Until now this was a
showstopper for me, but you gave the solution.

Thank you very much.

Juergen Schneider
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
Marcelo Tosatti wrote:
>
> On Fri, 5 Jan 2001, Chris Mason wrote:
>
> >
> > Here's the latest version of the patch, against 2.4.0.  The
> > biggest open issues are what to do with bdflush, since
> > page_launder could do everything bdflush does.
>
> I think we want to remove flush_dirty_buffers() from bdflush.
>
> While we are trying to be smart and do write clustering at the
> ->writepage operation, flush_dirty_buffers() is "dumb" and will
> interfere with the write clustering.

Actually, I found it doesn't interfere that much.  Remember, there's
still an elevator in there.  Even if bdflush and clustered page flushing
are running at the same time, the elevator makes the final decision about
what gets written when.

--
Daniel
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Friday, January 05, 2001 01:43:07 PM -0200 Marcelo Tosatti
<[EMAIL PROTECTED]> wrote:
>
> On Fri, 5 Jan 2001, Chris Mason wrote:
>
>>
>> Here's the latest version of the patch, against 2.4.0.  The
>> biggest open issues are what to do with bdflush, since
>> page_launder could do everything bdflush does.
>
> I think we want to remove flush_dirty_buffers() from bdflush.
>

I think you're right.  Now that bdflush calls page_launder with
GFP_KERNEL, the flush_dirty_buffers call isn't needed there.

I think the current bdflush (with or without the flush_dirty_buffers
call) will be more aggressive at freeing buffer cache pages from the
inactive_dirty list, and it will be interesting to see how it performs.
I think it will be better, but the blocksize < pagesize case might screw
us up.

> While we are trying to be smart and do write clustering at the
> ->writepage operation, flush_dirty_buffers() is "dumb" and will
> interfere with the write clustering.
>

Only for the buffer cache pages.  For actual file data,
flush_dirty_buffers is calling the writepage func, and we should still
be able to cluster it.

-chris
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Fri, 5 Jan 2001, Chris Mason wrote:
>
> Here's the latest version of the patch, against 2.4.0.  The
> biggest open issues are what to do with bdflush, since
> page_launder could do everything bdflush does.

I think we want to remove flush_dirty_buffers() from bdflush.

While we are trying to be smart and do write clustering at the
->writepage operation, flush_dirty_buffers() is "dumb" and will
interfere with the write clustering.
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
Here's the latest version of the patch, against 2.4.0.  The biggest open
issues are what to do with bdflush, since page_launder could do
everything bdflush does.  And, how to deal with writepage funcs that
return 1 for fsync_inode_buffers.

-chris

diff -urN linux.2.4.0/fs/buffer.c linux/fs/buffer.c
--- linux.2.4.0/fs/buffer.c	Thu Jan  4 23:12:28 2001
+++ linux/fs/buffer.c	Thu Jan  4 18:58:09 2001
@@ -97,6 +97,20 @@
 static int grow_buffers(int size);
 static void __refile_buffer(struct buffer_head *);
 
+static int block_write_anon_page(struct page *);
+static void end_buffer_io_async(struct buffer_head * bh, int uptodate) ;
+
+static struct address_space_operations anon_space_ops = {
+	writepage: block_write_anon_page,
+	sync_page: block_sync_page,
+} ;
+static struct address_space anon_space_mapping = {
+	LIST_HEAD_INIT(anon_space_mapping.clean_pages),
+	LIST_HEAD_INIT(anon_space_mapping.dirty_pages),
+	LIST_HEAD_INIT(anon_space_mapping.locked_pages),
+	0, /* nr_pages */
+	&anon_space_ops,
+} ;
 
 /* This is used by some architectures to estimate available memory. */
 atomic_t buffermem_pages = ATOMIC_INIT(0);
@@ -161,6 +175,73 @@
 	atomic_dec(&bh->b_count);
 }
 
+/* just for use with anon pages, or pages that don't provide their own
+** writepage func.  We just want to write bh, not the whole page, so we
+** queue that io here instead of calling writepage.
+*/
+static int __dirty_list_writepage(struct page *page, struct buffer_head *bh) {
+	int other_dirty = 0 ;
+	struct buffer_head *cur ;
+
+	/* check for other dirty buffers on this page.  If there are none,
+	** clear the page dirty bit
+	*/
+	cur = bh->b_this_page ;
+	while(cur != bh) {
+		other_dirty += buffer_dirty(cur) ;
+		cur = cur->b_this_page ;
+	}
+	if (other_dirty == 0) {
+		ClearPageDirty(page) ;
+	}
+
+	/* we want the page available for locking again right away.
+	** someone walking the dirty buffer list might find another
+	** buffer from this page, and we don't want them to skip it in
+	** favor of a younger buffer.
+	*/
+	atomic_inc(&bh->b_count) ;
+	UnlockPage(page) ;
+	ll_rw_block(WRITE, 1, &bh) ;
+	atomic_dec(&bh->b_count) ;
+	return 0 ;
+}
+
+/*
+** util function for sync_buffers and flush_dirty_buffers
+** uses either the writepage func supplied in the page's mapping,
+** or the anon address space writepage
+*/
+static int dirty_list_writepage(struct page *page, struct buffer_head *bh) {
+	int (*writepage)(struct page *) ;
+	int ret ;
+
+	/* someone wrote this page while we were locking before */
+	if (!PageDirty(page) && !buffer_dirty(bh)) {
+		UnlockPage(page) ;
+		return 0 ;
+	}
+	writepage = page->mapping->a_ops->writepage ;
+
+	/* For anon pages, and pages that don't have a writepage
+	** func, just write this one dirty buffer.  __dirty_list_writepage
+	** does a little more work to make sure the page dirty bit is cleared
+	** when we are the only dirty buffer on this page
+	*/
+	if (!writepage || page->mapping == &anon_space_mapping) {
+		writepage = anon_space_ops.writepage ;
+		return __dirty_list_writepage(page, bh) ;
+	}
+
+	ClearPageDirty(page) ;
+	ret = writepage(page) ;
+	if (ret == 1) {
+		SetPageDirty(page) ;
+		UnlockPage(page) ;
+	}
+	return ret ;
+}
+
 /* Call sync_buffers with wait!=0 to ensure that the call does not
  * return until all buffer writes have completed.  Sync() may return
  * before the writes have finished; fsync() may not.
@@ -175,6 +256,7 @@
 {
 	int i, retry, pass = 0, err = 0;
 	struct buffer_head * bh, *next;
+	struct page *page ;
 
 	/* One pass for no-wait, three for wait:
 	 * 0) write out all dirty, unlocked buffers;
@@ -230,10 +312,27 @@
 			if (!buffer_dirty(bh) || pass >= 2)
 				continue;
 
-			atomic_inc(&bh->b_count);
+			page = bh->b_page ;
+			page_cache_get(page) ;
+			if (TryLockPage(page)) {
+				if (!wait || !pass) {
+					retry = 1 ;
+					continue ;
+				}
+				spin_unlock(&lru_list_lock);
+				wait_on_page(page) ;
+				page_cache_release(page) ;
+				goto repeat ;
+			}
 			spin_unlock(&lru_list_lock);
-			ll_rw_block(WRITE, 1, &bh);
-			atomic_dec(&bh->b_count);
+
+			/* if the writepage func returns 1, it is
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Friday, December 29, 2000 06:58:01 PM +0100 Daniel Phillips
<[EMAIL PROTECTED]> wrote:

> Chris Mason wrote:
>>
>> BTW, the last anon space mapping patch I sent also works on
>> test13-pre5.  The block_truncate_page fix does help my patch, since I
>> have bdflush locking pages ( thanks Marcelo )
>
> Yes it does, but the little additional patch you posted no longer
> applies.  Your patch is suffering badly under pre5 here, reducing
> throughput from around 10 MB/sec to 5-6 MB/sec, and highly variable.
> If you're not seeing this you should probably try to get a few others
> to run it.
>

The additional patch I sent (for skipping writepage calls on the first
loop in page_launder) was applied to test13-pre5, in a slightly better
form.  So, you'll either need to benchmark in pre4, or make a reversed
version for pre5.

Porting to the prerelease now ;-)

-chris
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Fri, 29 Dec 2000, Marcelo Tosatti wrote:

> Now the ext2 changes will for sure make a difference since right now
> the superblock lock is suffering from contention.

The superblock lock is suffering from more than just contention.
Consider the following: sys_ustat() does get_super(), which leaves the
sb unlocked.  We are going to call ->statfs().  In the meanwhile, the fs
is umounted.  Whoops...

In other words, get_super() callers are oopsen waiting to happen.
That's what is fundamentally wrong with
lock_super()/wait_super()/unlock_super() - we have no reference counter
on superblocks, and we blindly grab references to them assuming that
they will stay.

The main source of get_super() calls is in device drivers.  I propose to
add

void invalidate_dev(kdev_t dev, int sync_flag)
{
	struct super_block *sb = get_super(dev);
	switch (sync_flag) {
		case 1: sync_dev(dev); break;
		case 2: fsync_dev(dev); break;
	}
	if (sb)
		invalidate_inodes(sb);
	invalidate_buffers(dev);
}

to fs/buffer.c, export it and let drivers call that instead of doing the
thing by hand.  Then we have a chance to deal with the problem inside
the kernel proper.  Right now almost every block device has that
sequence in its BLKRRPART handling, and fixing each of them separately
is going to hurt like hell.

Linus, I realize that it's changing the block devices near to release,
but AFAICS we have to change them anyway.  I'm asking for permission to
take get_super() out.

Cheers,

Al
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Fri, 29 Dec 2000, Chris Mason wrote:

> BTW, the last anon space mapping patch I sent also works on
> test13-pre5.  The block_truncate_page fix does help my patch, since I
> have bdflush locking pages ( thanks Marcelo )

Good to know.  I thought that fix would not change performance
noticeably.

Now the ext2 changes will for sure make a difference, since right now
the superblock lock is suffering from contention.
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
Chris Mason wrote:
>
> On Thursday, December 28, 2000 11:29:01 AM -0800 Linus Torvalds
> <[EMAIL PROTECTED]> wrote:
> [ skipping io on the first walk in page_launder ]
>
> >
> > There are some arguments for starting the writeout early, but there
> > are tons of arguments against it too (the main one being "avoid
> > doing IO if you can do so"), so your patch is probably fine.  In the
> > end, the performance characteristics are what matters.  Does the
> > patch make for smoother behaviour and better performance?
>
> My dbench speeds have always varied from run to run, but the average
> speed went up about 9% with the anon space mapping patch and the
> page_launder change.  I could not find much difference in a pure
> test13-pre4, probably because dbench doesn't generate much swap on my
> machine.  I'll do more tests when I get back on Monday night.
>
> Daniel, sounds like dbench varies less on your machine, what did the
> patch do for you?
>
> BTW, the last anon space mapping patch I sent also works on
> test13-pre5.  The block_truncate_page fix does help my patch, since I
> have bdflush locking pages ( thanks Marcelo )

Yes it does, but the little additional patch you posted no longer
applies.  Your patch is suffering badly under pre5 here, reducing
throughput from around 10 MB/sec to 5-6 MB/sec, and highly variable.  If
you're not seeing this you should probably try to get a few others to
run it.

--
Daniel
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Thursday, December 28, 2000 11:29:01 AM -0800 Linus Torvalds
<[EMAIL PROTECTED]> wrote:

[ skipping io on the first walk in page_launder ]
>
> There are some arguments for starting the writeout early, but there
> are tons of arguments against it too (the main one being "avoid doing
> IO if you can do so"), so your patch is probably fine.  In the end,
> the performance characteristics are what matters.  Does the patch make
> for smoother behaviour and better performance?

My dbench speeds have always varied from run to run, but the average
speed went up about 9% with the anon space mapping patch and the
page_launder change.  I could not find much difference in a pure
test13-pre4, probably because dbench doesn't generate much swap on my
machine.  I'll do more tests when I get back on Monday night.

Daniel, sounds like dbench varies less on your machine, what did the
patch do for you?

BTW, the last anon space mapping patch I sent also works on test13-pre5.
The block_truncate_page fix does help my patch, since I have bdflush
locking pages ( thanks Marcelo )

-chris
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Thu, 28 Dec 2000, Chris Mason wrote: > > Linus and Rik are cc'd in to find out if this is a good idea in > general. Probably. There are some arguments for starting the writeout early, but there are tons of arguments against it too (the main one being "avoid doing IO if you can do so"), so your patch is probably fine. In the end, the performance characteristics are what matters. Does the patch make for smoother behaviour and better performance? Anyway, the "can_get_io_locks" check is subsumed by the "launder_loop" check: we will never set "launder_loop" to non-zero if we can't get the io_locks, so you might as well just make the test be /* First loop through? Don't start IO, just move it to the back of the list */ if (!launder_loop) { and be done with it. I'd like to hear what that does for dbench. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Thursday, December 28, 2000 16:49:14 +0100 Daniel Phillips <[EMAIL PROTECTED]> wrote:

[ dbench 48 test on the anon space mapping patch ]

> This benchmark doesn't seem to suffer a lot from noise, so the 7%
> slowdown with your patch is likely real.

Ok, page_launder is supposed to run through the inactive dirty list twice, and on the second run, it wants to start i/o. But, if the page is dirty, writepage is called on the first run. With my patch, this flushes lots more data than it used to. I have writepage doing all the i/o, and try_to_free_buffers only waits on it.

This diff makes it so writepage is only called on the second loop through the inactive dirty list; could you please give it a try (slightly faster in my tests)? Linus and Rik are cc'd in to find out if this is a good idea in general.

-chris

--- linux-test13-pre4/mm/vmscan.c	Sat Dec 23 13:14:26 2000
+++ linux/mm/vmscan.c	Thu Dec 28 15:02:08 2000
@@ -609,7 +609,7 @@
 			goto page_active;
 
 		/* Can't start IO? Move it to the back of the list */
-		if (!can_get_io_locks) {
+		if (!launder_loop || !can_get_io_locks) {
 			list_del(page_lru);
 			list_add(page_lru, &inactive_dirty_list);
 			UnlockPage(page);

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
Chris Mason wrote:
>
> On Wednesday, December 27, 2000 21:26:02 +0100 Daniel Phillips
> <[EMAIL PROTECTED]> wrote:
>
> > Hi Chris. I took your patch for a test drive under dbench and it seems
> > impressively stable under load, but there are performance problems.
> >
> > Test machine: 64 meg, 500 Mhz K6, IDE, Ext2, Blocksize=4K
> > Without patch: 9.5 MB/sec, 11 min 6 secs
> > With patch: 3.12 MB/sec, 33 min 51 sec
>
> Cool, thanks for the testing. Which benchmark are you using? bonnie and
> dbench don't show any changes on my scsi disks, I'll give IDE a try as well.

Chris, this was an error, I had accidentally booted the wrong kernel. The 'With patch' results above are for 2.2.16, not your patch. With correct results you are looking much better:

Test machine: 64 meg, 500 Mhz K6, IDE, Ext2, Blocksize=4K
Test: dbench 48

pre13 without patch:  9.5 MB/sec  11 min 6 secs
pre13 with patch:     8.9 MB/sec  11 min 46 secs
2.2.16:               3.1 MB/sec  33 min 51 sec

This benchmark doesn't seem to suffer a lot from noise, so the 7% slowdown with your patch is likely real. We've come a long way from 2.2.16, haven't we? I'll run some of these tests against 2.2 pre19 kernels and maybe fan the flames of the 2.2/2.4 competition a little.

--
Daniel

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Wednesday, December 27, 2000 21:26:02 +0100 Daniel Phillips <[EMAIL PROTECTED]> wrote:

> Hi Chris. I took your patch for a test drive under dbench and it seems
> impressively stable under load, but there are performance problems.
>
> Test machine: 64 meg, 500 Mhz K6, IDE, Ext2, Blocksize=4K
> Without patch: 9.5 MB/sec, 11 min 6 secs
> With patch: 3.12 MB/sec, 33 min 51 sec

Cool, thanks for the testing. Which benchmark are you using? bonnie and dbench don't show any changes on my scsi disks; I'll give IDE a try as well.

> Philosophically, I wonder if it's right for the buffer flush mechanism
> to be calling into the filesystem. It seems like the buffer lists
> should stay sitting between the filesystem and the block layer, it
> actually does a pretty good job.

What I'm looking for is a separation of the write management (aging, memory pressure, etc.) from the actual write method. The lists (VM, buffer.c, whatever) should do the management, and the FS should do the i/o. This patch is not a perfect solution by any means, but it's a start.

-chris

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
Chris Mason wrote: > > Hi guys, > > Here's my latest code, which uses ll_rw_block for anon pages (or > pages without a writepage func) when flush_dirty_buffers, > sync_buffers, or fsync_inode_buffers are flushing things. This > seems to have fixed my slowdown on 1k buffer sizes, but I > haven't done extensive benchmarks yet. > > Other changes: After freeing a page with buffers, page_launder > now stops if (!free_shortage()). This is a mod of the check where > page_launder checked free_shortage after freeing a buffer cache > page. Code outside buffer.c can't detect buffer cache pages with > this patch, so the old check doesn't apply. > > My change doesn't seem quite right though, if page_launder wants > to stop when there isn't a shortage, it should do that regardless of > if the page it just freed had buffers. It looks like this was added > so bdflush could call page_launder, and get an early out after > freeing some buffer heads, but I'm not sure. > > In test13-pre4, invalidate_buffers skips buffers on a page > with a mapping. I changed that to skip mappings other than the > anon space mapping. > > Comments and/or suggestions on how to make better use of this stuff > are more than welcome ;-) Hi Chris. I took your patch for a test drive under dbench and it seems impressively stable under load, but there are performance problems. Test machine: 64 meg, 500 Mhz K6, IDE, Ext2, Blocksize=4K Without patch: 9.5 MB/sec, 11 min 6 secs With patch: 3.12 MB/sec, 33 min 51 sec Philosophically, I wonder if it's right for the buffer flush mechanism to be calling into the filesystem. It seems like the buffer lists should stay sitting between the filesystem and the block layer, it actually does a pretty good job. What about just bumping the buffers of pages found on the inactive_dirty list to the front of the buffer LRU list? Then we can do write clustering by being selective about the bumping. 
-- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Tue, 26 Dec 2000, Chris Mason wrote: > > Hi guys, > > Here's my latest code, which uses ll_rw_block for anon pages (or > pages without a writepage func) when flush_dirty_buffers, > sync_buffers, or fsync_inode_buffers are flushing things. This > seems to have fixed my slowdown on 1k buffer sizes, but I > haven't done extensive benchmarks yet. Great. I'll run some benchmarks around here too and let you know. > Other changes: After freeing a page with buffers, page_launder > now stops if (!free_shortage()). This is a mod of the check where > page_launder checked free_shortage after freeing a buffer cache > page. Code outside buffer.c can't detect buffer cache pages with > this patch, so the old check doesn't apply. > > My change doesn't seem quite right though, if page_launder wants > to stop when there isn't a shortage, it should do that regardless of > if the page it just freed had buffers. It looks like this was added > so bdflush could call page_launder, and get an early out after > freeing some buffer heads, but I'm not sure. > > In test13-pre4, invalidate_buffers skips buffers on a page > with a mapping. I changed that to skip mappings other than the > anon space mapping. > > Comments and/or suggestions on how to make better use of this stuff > are more than welcome ;-) Well, the best use of this patch seems to be the ability to do write clustering in the ->writepage() operation for normal filesystems. I'll try to do a lightweight write clustering patch for block_write_full_page soon. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
Hi guys,

Here's my latest code, which uses ll_rw_block for anon pages (or pages without a writepage func) when flush_dirty_buffers, sync_buffers, or fsync_inode_buffers are flushing things. This seems to have fixed my slowdown on 1k buffer sizes, but I haven't done extensive benchmarks yet.

Other changes: after freeing a page with buffers, page_launder now stops if (!free_shortage()). This is a mod of the check where page_launder checked free_shortage after freeing a buffer cache page. Code outside buffer.c can't detect buffer cache pages with this patch, so the old check doesn't apply.

My change doesn't seem quite right though: if page_launder wants to stop when there isn't a shortage, it should do that regardless of whether the page it just freed had buffers. It looks like this was added so bdflush could call page_launder and get an early out after freeing some buffer heads, but I'm not sure.

In test13-pre4, invalidate_buffers skips buffers on a page with a mapping. I changed that to skip mappings other than the anon space mapping.

Comments and/or suggestions on how to make better use of this stuff are more than welcome ;-)

-chris

diff -urN linux-test13-pre4/fs/buffer.c linux-anon-space/fs/buffer.c
--- linux-test13-pre4/fs/buffer.c	Sat Dec 23 13:14:48 2000
+++ linux-anon-space/fs/buffer.c	Tue Dec 26 00:58:06 2000
@@ -97,6 +97,17 @@
 static int grow_buffers(int size);
 static void __refile_buffer(struct buffer_head *);
 
+static int block_write_anon_page(struct page *);
+static void end_buffer_io_async(struct buffer_head * bh, int uptodate) ;
+
+static struct address_space_operations anon_space_ops = {
+	writepage: block_write_anon_page,
+	sync_page: block_sync_page,
+} ;
+static struct address_space anon_space_mapping = {
+	pages: { &anon_space_mapping.pages, &anon_space_mapping.pages },
+	a_ops: &anon_space_ops,
+} ;
 
 /* This is used by some architectures to estimate available memory.
  */
 atomic_t buffermem_pages = ATOMIC_INIT(0);
@@ -161,6 +172,73 @@
 	atomic_dec(&bh->b_count);
 }
 
+/* just for use with anon pages, or pages that don't provide their own
+** writepage func.  We just want to write bh, not the whole page, so we
+** queue that io here instead of calling writepage.
+*/
+static int __dirty_list_writepage(struct page *page, struct buffer_head *bh) {
+	int other_dirty = 0 ;
+	struct buffer_head *cur ;
+
+	/* check for other dirty buffers on this page.  If there are none,
+	** clear the page dirty bit
+	*/
+	cur = bh->b_this_page ;
+	while(cur != bh) {
+		other_dirty += buffer_dirty(cur) ;
+		cur = cur->b_this_page ;
+	}
+	if (other_dirty == 0) {
+		ClearPageDirty(page) ;
+	}
+
+	/* we want the page available for locking again right away.
+	** someone walking the dirty buffer list might find another
+	** buffer from this page, and we don't want them to skip it in
+	** favor of a younger buffer.
+	*/
+	atomic_inc(&bh->b_count) ;
+	ll_rw_block(WRITE, 1, &bh) ;
+	atomic_dec(&bh->b_count) ;
+	UnlockPage(page) ;
+	return 0 ;
+}
+
+/*
+** util function for sync_buffers and flush_dirty_buffers
+** uses either the writepage func supplied in the page's mapping,
+** or the anon address space writepage
+*/
+static int dirty_list_writepage(struct page *page, struct buffer_head *bh) {
+	int (*writepage)(struct page *) ;
+	int ret ;
+
+	/* someone wrote this page while we were locking before */
+	if (!PageDirty(page) && !buffer_dirty(bh)) {
+		UnlockPage(page) ;
+		return 0 ;
+	}
+	writepage = page->mapping->a_ops->writepage ;
+
+	/* For anon pages, and pages that don't have a writepage
+	** func, just write this one dirty buffer.  __dirty_list_writepage
+	** does a little more work to make sure the page dirty bit is cleared
+	** when we are the only dirty buffer on this page
+	*/
+	if (!writepage || page->mapping == &anon_space_mapping) {
+		writepage = anon_space_ops.writepage ;
+		return __dirty_list_writepage(page, bh) ;
+	}
+
+	ClearPageDirty(page) ;
+	ret = writepage(page) ;
+	if (ret == 1) {
+		SetPageDirty(page) ;
+		UnlockPage(page) ;
+	}
+	return ret ;
+}
+
 /* Call sync_buffers with wait!=0 to ensure that the call does not
  * return until all buffer writes have completed.  Sync() may return
  * before the writes have finished; fsync() may not.
@@ -175,6 +253,7 @@
 {
 	int i, retry, pass = 0, err = 0;
 	struct buffer_head * bh, *next;
+	struct page *page ;
 
 	/* One pass for no-wait, three for wait:
 	 * 0) write out all dirty, unlocked buffers;
@@ -230,10 +309,27 @@
 		if (!buffer_dirty(bh) || pass >= 2)
 			continue;
-
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Saturday, December 23, 2000 11:02:53 -0800 Linus Torvalds <[EMAIL PROTECTED]> wrote: > > Which is why I prefer the higher layers handling the dirty/uptodate/xxx > bits. > Grin, I should have taken the hint when we talked about the buffer up to date checks for block_read_full_page, it made sense then too. brw_page doesn't clear the dirty bit in pre4, was that on purpose? -chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Sat, 23 Dec 2000, Chris Mason wrote:
>
> I've updated to test13-pre4, and removed the hunk for submit_bh.
> Looks as though pre4 changed the submit_bh callers to clear the dirty
> bit, so my code does the same.

Basically, I wanted to think of "submit_bh()" as a pure IO thing. When we call submit_bh(), that is basically the same as "start IO on this thing". Which implies to me that submit_bh() doesn't care, or know, about why the higher layers did this.

Which is why I prefer the higher layers handling the dirty/uptodate/xxx bits.

Linus

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Friday, December 22, 2000 21:26:33 -0200 Marcelo Tosatti wrote:

> If we use ll_rw_block directly on buffers of anonymous pages
> (page->mapping == &anon_space_mapping) instead of using
> dirty_list_writepage() (which will end up calling block_write_anon_page)
> we can fix the buffer flushtime issue.

Ok, I'm just being stubborn. The point of the patch was to get rid of the ll_rw_block calls, so I'm resisting adding one back in ;-) But, your way does seem like the least complicated method to honor the flushtime param for anon pages, and I'll switch over to that.

I've updated to test13-pre4, and removed the hunk for submit_bh. Looks as though pre4 changed the submit_bh callers to clear the dirty bit, so my code does the same. Other changes: sync_page_buffers doesn't write blocks on the page, it only waits on them. Minor change to when the page count is increased before writepage is called.

I still need to update fsync_inode_buffers... doesn't look like that will happen before I head off to visit family, where for some reason they've banned me from reading email ;-) I'll be back on Tuesday, hope everyone has a good holiday. New patch below:

-chris

diff -urN linux-test13-pre4/fs/buffer.c linux-anon-space/fs/buffer.c
--- linux-test13-pre4/fs/buffer.c	Sat Dec 23 13:14:48 2000
+++ linux-anon-space/fs/buffer.c	Sat Dec 23 13:30:24 2000
@@ -97,6 +97,16 @@
 static int grow_buffers(int size);
 static void __refile_buffer(struct buffer_head *);
 
+static int block_write_anon_page(struct page *);
+
+static struct address_space_operations anon_space_ops = {
+	writepage: block_write_anon_page,
+	sync_page: block_sync_page,
+} ;
+static struct address_space anon_space_mapping = {
+	pages: { &anon_space_mapping.pages, &anon_space_mapping.pages },
+	a_ops: &anon_space_ops,
+} ;
 
 /* This is used by some architectures to estimate available memory.
  */
 atomic_t buffermem_pages = ATOMIC_INIT(0);
@@ -161,6 +171,30 @@
 	atomic_dec(&bh->b_count);
 }
 
+/*
+** util function for sync_buffers and flush_dirty_buffers
+** uses either the writepage func supplied in the page's mapping,
+** or the anon address space writepage
+*/
+static int dirty_list_writepage(struct page *page) {
+	int (*writepage)(struct page *) ;
+	int ret ;
+
+	writepage = page->mapping->a_ops->writepage ;
+
+	if (!writepage) {
+		writepage = anon_space_ops.writepage ;
+	}
+
+	ClearPageDirty(page) ;
+	ret = writepage(page) ;
+	if (ret == 1) {
+		SetPageDirty(page) ;
+		UnlockPage(page) ;
+	}
+	return ret ;
+}
+
 /* Call sync_buffers with wait!=0 to ensure that the call does not
  * return until all buffer writes have completed.  Sync() may return
  * before the writes have finished; fsync() may not.
@@ -175,6 +209,7 @@
 {
 	int i, retry, pass = 0, err = 0;
 	struct buffer_head * bh, *next;
+	struct page *page ;
 
 	/* One pass for no-wait, three for wait:
 	 * 0) write out all dirty, unlocked buffers;
@@ -230,10 +265,22 @@
 		if (!buffer_dirty(bh) || pass >= 2)
 			continue;
-		atomic_inc(&bh->b_count);
+		page = bh->b_page ;
+		page_cache_get(page) ;
+		if (TryLockPage(page)) {
+			if (!wait || !pass) {
+				retry = 1 ;
+				continue ;
+			}
+			spin_unlock(&lru_list_lock);
+			wait_on_page(page) ;
+			page_cache_release(page) ;
+			goto repeat ;
+		}
 		spin_unlock(&lru_list_lock);
-		ll_rw_block(WRITE, 1, &bh);
-		atomic_dec(&bh->b_count);
+
+		dirty_list_writepage(page) ;
+		page_cache_release(page) ;
 		retry = 1;
 		goto repeat;
 	}
@@ -1101,8 +1148,10 @@
 		int dispose = BUF_CLEAN;
 		if (buffer_locked(bh))
 			dispose = BUF_LOCKED;
-		if (buffer_dirty(bh))
+		if (buffer_dirty(bh)) {
 			dispose = BUF_DIRTY;
+			SetPageDirty(bh->b_page) ;
+		}
 		if (buffer_protected(bh))
 			dispose = BUF_PROTECTED;
 		if (dispose != bh->b_list) {
@@ -1478,6 +1527,53 @@
  * "Dirty" is valid only with the last case (mapped+uptodate).
  */
 
+static int block_write_anon_page(struct page *page)
+{
+	struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
+	int i, nr = 0 ;
+	int partial = 0 ;
+	int ret = 0 ;
+
+	if (!PageLocked(page))
+		BUG();
+
+	if (!page->buffers)
+		BUG() ;
+
+	head = page->buffers;
+	bh = head;
+
+	/* Stage 1:
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
Chris Mason wrote:

> It is enough to leave buffer heads we don't flush on the dirty list (and
> redirty the page), they'll get written by a future loop through
> flush_dirty_pages, or by page_launder.  We could use ll_rw_block instead,
> even though anon pages do have a writepage with this patch (just check if
> page->mapping == &anon_space_mapping).

anon_space_mapping... this is really useful.  This would make a nice patch
on its own.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Sat, 23 Dec 2000, Chris Mason wrote:

> I've updated to test13-pre4, and removed the hunk for submit_bh.  Looks
> as though pre4 changed the submit_bh callers to clear the dirty bit, so
> my code does the same.

Basically, I wanted to think of "submit_bh()" as a pure IO thing.  When we
call submit_bh(), that is basically the same as "start IO on this thing".
Which implies to me that submit_bh() doesn't care, or know, about why the
higher layers did this.

Which is why I prefer the higher layers handling the dirty/uptodate/xxx
bits.

		Linus
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Saturday, December 23, 2000 11:02:53 -0800 Linus Torvalds
<[EMAIL PROTECTED]> wrote:

> Which is why I prefer the higher layers handling the dirty/uptodate/xxx
> bits.

Grin, I should have taken the hint when we talked about the buffer up to
date checks for block_read_full_page, it made sense then too.

brw_page doesn't clear the dirty bit in pre4, was that on purpose?

-chris
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Fri, 22 Dec 2000, Chris Mason wrote:

> It is enough to leave buffer heads we don't flush on the dirty list (and
> redirty the page), they'll get written by a future loop through
> flush_dirty_pages, or by page_launder.  We could use ll_rw_block instead,
> even though anon pages do have a writepage with this patch (just check if
> page->mapping == &anon_space_mapping).

If we use ll_rw_block directly on buffers of anonymous pages
(page->mapping == &anon_space_mapping) instead of using
dirty_list_writepage() (which will end up calling block_write_anon_page)
we can fix the buffer flushtime issue.

> There are lots of things we could try here, including some logic inside
> block_write_anon_page to check the current memory pressure/dirty levels
> to see how much work should be done.

At writepage() context we do not know if we should care about the
flushtime.

> I'll play with this a bit, but more ideas would be appreciated.
>
> BTW, recent change to my local code was to remove the ll_rw_block call
> from sync_page_buffers.  It seemed cleaner than making try_to_free_buffers
> understand that sometimes writepage will be called, and sometimes the
> page will end up unlocked because of it...comments?

Could you please send me your latest patch?

Thanks
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Friday, December 22, 2000 17:52:28 -0200 Marcelo Tosatti
<[EMAIL PROTECTED]> wrote:

> There is one more nasty issue to deal with.
>
> You only want to take into account the buffer flushtime if
> "check_flushtime" parameter is passed as true to flush_dirty_buffers
> (which is done by kupdate).
>
> Thinking a bit more about the issue, I dont see any reason why we want to
> write all buffers of an anonymous page at
> sync_buffers/flush_dirty_buffers.
>
> I think we could simply write the buffer with ll_rw_block() if the page
> which this buffer is sitting on does not have mapping->a_ops->writepage
> operation defined.

It is enough to leave buffer heads we don't flush on the dirty list (and
redirty the page), they'll get written by a future loop through
flush_dirty_pages, or by page_launder.  We could use ll_rw_block instead,
even though anon pages do have a writepage with this patch (just check if
page->mapping == &anon_space_mapping).

There are lots of things we could try here, including some logic inside
block_write_anon_page to check the current memory pressure/dirty levels to
see how much work should be done.  I'll play with this a bit, but more
ideas would be appreciated.

BTW, recent change to my local code was to remove the ll_rw_block call
from sync_page_buffers.  It seemed cleaner than making try_to_free_buffers
understand that sometimes writepage will be called, and sometimes the page
will end up unlocked because of it...comments?

-chris
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Friday, December 22, 2000 17:45:57 +0100 Daniel Phillips
<[EMAIL PROTECTED]> wrote:

[ flushing a page at a time in bdflush ]

> Um.  Why cater to the uncommon case of 1K blocks?  Just let
> bdflush/kupdated deal with them in the normal way - it's pretty
> efficient.  Only try to do the clustering optimization when buffer size
> matches memory page size.

This isn't really an attempt at a clustering optimization.  The problem at
hand is that buffer cache buffers can be on relatively random pages.  So,
a page might have buffers that are very far apart, where one needs
flushing and the other doesn't.

In the blocksize == page size case, this won't happen, and we don't lose
any speed over the existing code.  In the blocksize < pagesize case, my
new code is slower, so my goal is to fix just that problem.

Real write clustering would be a different issue entirely, and is worth
doing ;-)

-chris
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
Chris Mason wrote:
>
> On Thursday, December 21, 2000 22:38:04 -0200 Marcelo Tosatti
> <[EMAIL PROTECTED]> wrote:
>
> > On Thu, 21 Dec 2000, Andreas Dilger wrote:
> >
> >> Marcelo Tosatti writes:
> >> > It seems your code has a problem with bh flush time.
> >> >
> >> > In flush_dirty_buffers(), a buffer may (if being called from kupdate)
> >> > only be written in case its old enough. (bh->b_flushtime)
> >> >
> >> > If the flush happens for an anonymous buffer, you'll end up writing
> >> > all buffers which are sitting on the same page (with
> >> > block_write_anon_page), but these other buffers are not necessarily
> >> > old enough to be flushed.
> >>
> >> This isn't really a "problem" however.  The page is the _maximum_ age
> >> of the buffer before it needs to be written.  If we can efficiently
> >> write it out with another buffer
> >> (essentially for free if they are on the same spot on disk)
> >
> > Are you sure this is true for buffer pages in most cases?
>
> It's a good point.  block_write_anon_page could be changed to just
> write the oldest buffer and redirty the page (if the buffers are
> far apart).  If memory is tight, and we *really* need the page back,
> it will be flushed by try_to_free_buffers.
>
> It seems a bit nasty to me though...writepage should write the page.

Um.  Why cater to the uncommon case of 1K blocks?  Just let
bdflush/kupdated deal with them in the normal way - it's pretty
efficient.  Only try to do the clustering optimization when buffer size
matches memory page size.

--
Daniel
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Thursday, December 21, 2000 22:38:04 -0200 Marcelo Tosatti
<[EMAIL PROTECTED]> wrote:

>> Marcelo Tosatti writes:
>> > It seems your code has a problem with bh flush time.
>> >
>> > In flush_dirty_buffers(), a buffer may (if being called from kupdate)
>> > only be written in case its old enough. (bh->b_flushtime)
>> >
>> > If the flush happens for an anonymous buffer, you'll end up writing all
>> > buffers which are sitting on the same page (with
>> > block_write_anon_page), but these other buffers are not necessarily
>> > old enough to be flushed.

A quick benchmark shows there's room for improvement here.  I'll play
around with a version of block_write_anon_page that tries to be more
selective when flushing things out.

-chris
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Thursday, December 21, 2000 22:38:04 -0200 Marcelo Tosatti
<[EMAIL PROTECTED]> wrote:

> On Thu, 21 Dec 2000, Andreas Dilger wrote:
>
>> Marcelo Tosatti writes:
>> > It seems your code has a problem with bh flush time.
>> >
>> > In flush_dirty_buffers(), a buffer may (if being called from kupdate)
>> > only be written in case its old enough. (bh->b_flushtime)
>> >
>> > If the flush happens for an anonymous buffer, you'll end up writing all
>> > buffers which are sitting on the same page (with block_write_anon_page),
>> > but these other buffers are not necessarily old enough to be flushed.
>>
>> This isn't really a "problem" however.  The page is the _maximum_ age of
>> the buffer before it needs to be written.  If we can efficiently write it
>> out with another buffer
>> (essentially for free if they are on the same spot on disk)
>
> Are you sure this is true for buffer pages in most cases?

It's a good point.  block_write_anon_page could be changed to just write
the oldest buffer and redirty the page (if the buffers are far apart).  If
memory is tight, and we *really* need the page back, it will be flushed by
try_to_free_buffers.

It seems a bit nasty to me though...writepage should write the page.

-chris
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Thursday, December 21, 2000 20:54:09 -0500 Alexander Viro
<[EMAIL PROTECTED]> wrote:

> On Thu, 21 Dec 2000, Chris Mason wrote:
>
>> Obvious bug, block_write_full_page zeros out the bits past the end of
>> file every time.  This should not be needed for normal file writes.
>
> Unfortunately, it _is_ needed for pageout path.  mmap() the last page
> of file.  Dirty the data past the EOF (MMU will not catch that access).
> Let the pageout send the page to disk.  You don't want to have the data
> past EOF end up on the backstore.

Sorry, I wasn't very clear.  I'd like to find a way to just do the memset
in the one case it is needed (a real writepage), and not do it when we are
using writepage just to flush dirty buffers.  I don't see how to do it
though, without making a new flush operation.

That would also solve my concerns about the change in writepage semantics
(even though every FS I could find in the kernel tree was using
block_write_full_page).

thanks,
Chris
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Thu, 21 Dec 2000, Andreas Dilger wrote:

> Marcelo Tosatti writes:
> > It seems your code has a problem with bh flush time.
> >
> > In flush_dirty_buffers(), a buffer may (if being called from kupdate)
> > only be written in case its old enough. (bh->b_flushtime)
> >
> > If the flush happens for an anonymous buffer, you'll end up writing all
> > buffers which are sitting on the same page (with block_write_anon_page),
> > but these other buffers are not necessarily old enough to be flushed.
>
> This isn't really a "problem" however.  The page is the _maximum_ age of
> the buffer before it needs to be written.  If we can efficiently write it
> out with another buffer
> (essentially for free if they are on the same spot on disk)

Are you sure this is true for buffer pages in most cases?
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
Marcelo Tosatti writes:
> It seems your code has a problem with bh flush time.
>
> In flush_dirty_buffers(), a buffer may (if being called from kupdate)
> only be written in case its old enough. (bh->b_flushtime)
>
> If the flush happens for an anonymous buffer, you'll end up writing all
> buffers which are sitting on the same page (with block_write_anon_page),
> but these other buffers are not necessarily old enough to be flushed.

This isn't really a "problem" however.  The page is the _maximum_ age of
the buffer before it needs to be written.  If we can efficiently write it
out with another buffer (essentially for free if they are on the same spot
on disk), then there is less work for us to do later.

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/        -- Dogbert
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Thu, 21 Dec 2000, Chris Mason wrote:

> Ok guys, I think I've taken Linus' suggestion to have buffer.c use its
> own writepage a bit too far.  This patch marks pages dirty when the
> buffer head is marked dirty, and changes flush_dirty_buffers and
> sync_buffers to use writepage instead of ll_rw_block.  The idea is
> to allow filesystems to use the buffer lists and provide their own
> i/o mechanism.
>
> The result is a serious semantics change for writepage, which now is
> expected to do partial page writes when the page isn't up to date and
> there are dirty buffers inside.  For all the obvious reasons, this isn't
> fit for 2.4.0, and if you all feel it is a 2.5 thing I'll send along
> the shorter patch Linus originally suggested.  But, I think it would
> be pretty useful for the new filesystems (once I also fix
> fsync_inode_buffers and sync_page_buffers).

It is very powerful.  With this in place, the filesystem is able to do
write clustering at its writepage() function by checking if the on-disk
physically nearby pages are dirty.

> Other changes: submit_bh now cleans the buffers.  I don't see how
> they were getting cleaned before, it must have been try_to_free_buffers
> sending the page through sync_page_buffers, meaning they were probably
> getting written twice.  Unless someone throws a clue my way, I'll send
> this out as a separate diff.

Ouch.

> page_launder doesn't fiddle with the buffermem_pages counter, it is done
> in try_to_free_buffers instead.
>
> Obvious bug, block_write_full_page zeros out the bits past the end of
> file every time.  This should not be needed for normal file writes.
>
> Most testing was on ext2, who actually calls mark_buffer_dirty, and
> supports blocksizes < intel page size.  More tests are running
> overnight.

It seems your code has a problem with bh flush time.

In flush_dirty_buffers(), a buffer may (if being called from kupdate) only
be written in case it's old enough (bh->b_flushtime).

If the flush happens for an anonymous buffer, you'll end up writing all
buffers which are sitting on the same page (with block_write_anon_page),
but these other buffers are not necessarily old enough to be flushed.
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Thu, 21 Dec 2000, Chris Mason wrote:

> Obvious bug, block_write_full_page zeros out the bits past the end of
> file every time.  This should not be needed for normal file writes.

Unfortunately, it _is_ needed for pageout path.  mmap() the last page
of file.  Dirty the data past the EOF (MMU will not catch that access).
Let the pageout send the page to disk.  You don't want to have the data
past EOF end up on the backstore.

Al, slowly getting back to life and sanity.  Fsck the flu...
[RFC] changes to buffer.c (was Test12 ll_rw_block error)
Ok guys, I think I've taken Linus' suggestion to have buffer.c use its
own writepage a bit too far.  This patch marks pages dirty when the
buffer head is marked dirty, and changes flush_dirty_buffers and
sync_buffers to use writepage instead of ll_rw_block.  The idea is
to allow filesystems to use the buffer lists and provide their own
i/o mechanism.

The result is a serious semantics change for writepage, which now is
expected to do partial page writes when the page isn't up to date and
there are dirty buffers inside.  For all the obvious reasons, this isn't
fit for 2.4.0, and if you all feel it is a 2.5 thing I'll send along
the shorter patch Linus originally suggested.  But, I think it would
be pretty useful for the new filesystems (once I also fix
fsync_inode_buffers and sync_page_buffers).

Other changes: submit_bh now cleans the buffers.  I don't see how
they were getting cleaned before, it must have been try_to_free_buffers
sending the page through sync_page_buffers, meaning they were probably
getting written twice.  Unless someone throws a clue my way, I'll send
this out as a separate diff.

page_launder doesn't fiddle with the buffermem_pages counter, it is done
in try_to_free_buffers instead.

Obvious bug, block_write_full_page zeros out the bits past the end of
file every time.  This should not be needed for normal file writes.

Most testing was on ext2, who actually calls mark_buffer_dirty, and
supports blocksizes < intel page size.  More tests are running overnight.
-chris

diff -urN linux.test13p3p/drivers/block/ll_rw_blk.c linux-test12/drivers/block/ll_rw_blk.c
--- linux.test13p3p/drivers/block/ll_rw_blk.c	Tue Dec 19 05:07:50 2000
+++ linux.test13p3/drivers/block/ll_rw_blk.c	Thu Dec 21 16:56:43 2000
@@ -954,6 +954,7 @@
 	if (!test_bit(BH_Lock, &bh->b_state))
 		BUG();
 
+	mark_buffer_clean(bh) ;
 	set_bit(BH_Req, &bh->b_state);
 
 	/*
diff -urN linux.test13p3p/fs/buffer.c linux-test12/fs/buffer.c
--- linux.test13p3p/fs/buffer.c	Thu Dec 21 17:24:17 2000
+++ linux.test13p3/fs/buffer.c	Thu Dec 21 17:28:46 2000
@@ -97,6 +97,16 @@
 static int grow_buffers(int size);
 static void __refile_buffer(struct buffer_head *);
+static int block_write_anon_page(struct page *);
+
+static struct address_space_operations anon_space_ops = {
+	writepage: block_write_anon_page,
+	sync_page: block_sync_page,
+} ;
+static struct address_space anon_space_mapping = {
+	pages: { &anon_space_mapping.pages, &anon_space_mapping.pages },
+	a_ops: &anon_space_ops,
+} ;
 
 /* This is used by some architectures to estimate available memory. */
 atomic_t buffermem_pages = ATOMIC_INIT(0);
@@ -161,6 +171,37 @@
 	atomic_dec(&bh->b_count);
 }
 
+/*
+** util function for sync_buffers and flush_dirty_buffers
+** uses either the writepage func supplied in the page's mapping,
+** or the anon address space writepage
+*/
+static int dirty_list_writepage(struct page *page) {
+	int (*writepage)(struct page *) ;
+	int ret ;
+
+	writepage = page->mapping->a_ops->writepage ;
+
+	if (!writepage) {
+		writepage = anon_space_ops.writepage ;
+	}
+
+	page_cache_get(page) ;
+	ClearPageDirty(page) ;
+	ret = writepage(page) ;
+	/* note, when writepage returns 1, we mark the page dirty again
+	** but the writepage func is responsible for making sure any
+	** buffers that need to stay dirty really do stay dirty
+	** ick.
+	*/
+	if (ret == 1) {
+		SetPageDirty(page) ;
+		UnlockPage(page) ;
+	}
+	page_cache_release(page) ;
+	return ret ;
+}
+
 /* Call sync_buffers with wait!=0 to ensure that the call does not
  * return until all buffer writes have completed.  Sync() may return
  * before the writes have finished; fsync() may not.
@@ -175,6 +216,7 @@
 {
 	int i, retry, pass = 0, err = 0;
 	struct buffer_head * bh, *next;
+	struct page *page ;
 
 	/* One pass for no-wait, three for wait:
 	 * 0) write out all dirty, unlocked buffers;
@@ -230,10 +272,19 @@
 			if (!buffer_dirty(bh) || pass >= 2)
 				continue;
 
-			atomic_inc(&bh->b_count);
+			page = bh->b_page ;
+			if (TryLockPage(page)) {
+				if (!wait || !pass) {
+					retry = 1 ;
+					continue ;
+				}
+				spin_unlock(&lru_list_lock);
+				wait_on_page(page) ;
+				goto repeat ;
+			}
 			spin_unlock(&lru_list_lock);
-			ll_rw_block(WRITE, 1, &bh);
-			atomic_dec(&bh->b_count);
+
+			dirty_list_writepage(page) ;
 			retry = 1;
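A small user-space model of the patch's dirty_list_writepage() control flow (the types and helper names here are hypothetical stand-ins, not the kernel's): prefer the writepage supplied in the page's mapping, fall back to the anonymous default when there is none, and re-mark the page dirty when writepage returns 1 to decline the write.

```c
#include <assert.h>
#include <stddef.h>

struct page_model {
    int dirty;
    int (*writepage)(struct page_model *); /* mapping's op, may be NULL */
};

/* Stand-in for block_write_anon_page: always accepts the write. */
static int anon_writepage(struct page_model *p)
{
    (void)p;
    return 0;
}

/* Stand-in for a writepage that refuses and expects the page to
 * stay dirty, as the patch's "ret == 1" convention allows. */
static int declining_writepage(struct page_model *p)
{
    (void)p;
    return 1;
}

static int dirty_list_writepage_model(struct page_model *p)
{
    int (*writepage)(struct page_model *) = p->writepage;
    int ret;

    if (!writepage)
        writepage = anon_writepage;    /* anon_space_ops fallback */

    p->dirty = 0;                      /* models ClearPageDirty    */
    ret = writepage(p);
    if (ret == 1)
        p->dirty = 1;                  /* declined: SetPageDirty   */
    return ret;
}
```

The "ick" in the patch comment is the subtle part this models: clearing the dirty bit before calling writepage means a declining writepage must have the bit restored afterwards, and any per-buffer dirty state is the writepage implementation's problem.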
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Thu, 21 Dec 2000, Chris Mason wrote:

> Ok guys, I think I've taken Linus' suggestion to have buffer.c use its
> own writepage a bit too far. This patch marks pages dirty when the
> buffer head is marked dirty, and changes flush_dirty_buffers and
> sync_buffers to use writepage instead of ll_rw_block. The idea is to
> allow filesystems to use the buffer lists and provide their own i/o
> mechanism. The result is a serious semantics change for writepage,
> which is now expected to do partial page writes when the page isn't up
> to date and there are dirty buffers inside. For all the obvious
> reasons, this isn't fit for 2.4.0, and if you all feel it is a 2.5
> thing I'll send along the shorter patch Linus originally suggested.
> But I think it would be pretty useful for the new filesystems (once I
> also fix fsync_inode_buffers and sync_page_buffers).

It is very powerful. With this in place, the filesystem is able to do
write clustering at its writepage() function by checking if the on-disk
physically nearby pages are dirty.

> Other changes: submit_bh now cleans the buffers. I don't see how they
> were getting cleaned before; it must have been try_to_free_buffers
> sending the page through sync_page_buffers, meaning they were probably
> getting written twice. Unless someone throws a clue my way, I'll send
> this out as a separate diff.

Ouch.

> page_launder doesn't fiddle with the buffermem_pages counter; that is
> done in try_to_free_buffers instead. Obvious bug:
> block_write_full_page zeros out the bits past the end of file every
> time. This should not be needed for normal file writes. Most testing
> was on ext2, which actually calls mark_buffer_dirty and supports
> blocksizes < intel page size. More tests are running overnight.

It seems your code has a problem with bh flush time.

In flush_dirty_buffers(), a buffer may (if being called from kupdate)
only be written in case it's old enough (bh->b_flushtime).

If the flush happens for an anonymous buffer, you'll end up writing all
buffers which are sitting on the same page (with block_write_anon_page),
but these other buffers are not necessarily old enough to be flushed.
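The concern above can be made concrete with a small user-space model (hypothetical names; `flushtime` stands in for bh->b_flushtime): kupdate writes a dirty buffer only once it is old enough, but a page-granular write pushes every dirty buffer on the page, including ones not yet due.

```c
#include <assert.h>

struct bh_model {
    int dirty;
    unsigned long flushtime;   /* models bh->b_flushtime */
};

/* kupdate's per-buffer rule: write only dirty buffers past their age. */
static int due_for_kupdate(const struct bh_model *bh, unsigned long jiffies)
{
    return bh->dirty && jiffies >= bh->flushtime;
}

/* Count buffers a whole-page write would flush before their time:
 * nonzero whenever one old buffer drags young neighbors out early. */
static int early_writes_on_page(const struct bh_model *bhs, int n,
                                unsigned long jiffies)
{
    int i, any_due = 0, early = 0;

    for (i = 0; i < n; i++)
        if (due_for_kupdate(&bhs[i], jiffies))
            any_due = 1;
    if (!any_due)
        return 0;              /* page not written at all */
    for (i = 0; i < n; i++)
        if (bhs[i].dirty && jiffies < bhs[i].flushtime)
            early++;
    return early;
}
```

With two dirty buffers on one page, flush times 100 and 500, a kupdate pass at jiffies 200 writes the page for the first buffer and takes the second with it 300 jiffies early; whether that is a bug or a free write is exactly the disagreement in the replies that follow.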
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
Marcelo Tosatti writes:

> It seems your code has a problem with bh flush time.
>
> In flush_dirty_buffers(), a buffer may (if being called from kupdate)
> only be written in case it's old enough (bh->b_flushtime).
>
> If the flush happens for an anonymous buffer, you'll end up writing
> all buffers which are sitting on the same page (with
> block_write_anon_page), but these other buffers are not necessarily
> old enough to be flushed.

This isn't really a "problem", however. The flush time is the _maximum_
age of the buffer before it needs to be written. If we can efficiently
write it out with another buffer (essentially for free if they are on
the same spot on disk), then there is less work for us to do later.

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of
                 \  antipasto, would they cancel out, leaving him
                  \  still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/  -- Dogbert
Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)
On Thu, 21 Dec 2000, Andreas Dilger wrote:

> Marcelo Tosatti writes:
> > It seems your code has a problem with bh flush time.
> >
> > In flush_dirty_buffers(), a buffer may (if being called from
> > kupdate) only be written in case it's old enough (bh->b_flushtime).
> >
> > If the flush happens for an anonymous buffer, you'll end up writing
> > all buffers which are sitting on the same page (with
> > block_write_anon_page), but these other buffers are not necessarily
> > old enough to be flushed.
>
> This isn't really a "problem", however. The flush time is the
> _maximum_ age of the buffer before it needs to be written. If we can
> efficiently write it out with another buffer (essentially for free if
> they are on the same spot on disk)

Are you sure this is true for buffer pages in most cases?