Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2001-01-05 Thread Chris Mason



On Friday, January 05, 2001 04:32:50 PM -0200 Marcelo Tosatti
<[EMAIL PROTECTED]> wrote:
 
>> > I think we want to remove flush_dirty_buffers() from bdflush. 
>> > 
>> 
>> Whoops.  If bdflush doesn't balance the dirty list, who does?
> 
> Who marks buffers dirty. 
> 
> Linus changed mark_buffer_dirty() to use flush_dirty_buffers() in case
> there are too many dirty buffers.
> 

Yes, but mark_buffer_dirty only ends up calling flush_dirty_buffers when
balance_dirty_state returns 1.  This means the only ones doing the balancing
are the writing procs themselves (not some async daemon), and writing only
starts once we are over the hard dirty limit.

-chris
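
A minimal, self-contained model of the balancing Chris describes; the
thresholds, names, and return convention here are illustrative only, not
the exact 2.4.0 buffer.c code:

#include <stdio.h>

#define TOTAL_PAGES 1000
#define SOFT_LIMIT  (TOTAL_PAGES * 40 / 100)   /* wake the flush daemon */
#define HARD_LIMIT  (TOTAL_PAGES * 60 / 100)   /* dirtier must flush too */

/* Toy balance_dirty_state: a process dirtying buffers checks the dirty
 * level.  Past the soft limit the daemon would be woken; only past the
 * hard limit does the dirtying process start writing synchronously. */
static int balance_dirty_state(int dirty)
{
        if (dirty > HARD_LIMIT)
                return 1;       /* caller flushes synchronously */
        if (dirty > SOFT_LIMIT)
                return 0;       /* just wake bdflush */
        return -1;              /* balanced, nothing to do */
}

int main(void)
{
        int samples[] = { 100, 450, 700 };
        int i;

        for (i = 0; i < 3; i++)
                printf("dirty=%d -> state=%d\n", samples[i],
                       balance_dirty_state(samples[i]));
        return 0;
}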




Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2001-01-05 Thread Rik van Riel

On Fri, 5 Jan 2001, Marcelo Tosatti wrote:
> On Fri, 5 Jan 2001, Rik van Riel wrote:
> 
> > Also, you do not want the writer to block on writing out buffers
> > if bdflush could write them out asynchronously while the dirty
> > buffer producer can work on in the background.
> 
> flush_dirty_buffers() does not wait on the buffers written to get clean.

This sounds more like a bug we should fix than a reason
not to use flush_dirty_buffers().

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to loose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2001-01-05 Thread Marcelo Tosatti



On Fri, 5 Jan 2001, Rik van Riel wrote:

> Also, you do not want the writer to block on writing out buffers
> if bdflush could write them out asynchronously while the dirty
> buffer producer can work on in the background.

flush_dirty_buffers() does not wait on the buffers written to get clean.





Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2001-01-05 Thread Rik van Riel

On Fri, 5 Jan 2001, Marcelo Tosatti wrote:
> On Fri, 5 Jan 2001, Chris Mason wrote:
> > On Friday, January 05, 2001 01:43:07 PM -0200 Marcelo Tosatti
> > <[EMAIL PROTECTED]> wrote:
> > > On Fri, 5 Jan 2001, Chris Mason wrote:
> > > 
> > >> 
> > >> Here's the latest version of the patch, against 2.4.0.  The
> > >> biggest open issues are what to do with bdflush, since
> > >> page_launder could do everything bdflush does.  
> > > 
> > > I think we want to remove flush_dirty_buffers() from bdflush. 
> > > 
> > 
> > Whoops.  If bdflush doesn't balance the dirty list, who does?
> 
> Who marks buffers dirty. 
> 
> Linus changed mark_buffer_dirty() to use flush_dirty_buffers() in case
> there are too many dirty buffers.
> 
> Also, I think in practice page_launder will help with balancing.

Chris is right. It is possible (not very likely, but possible
nonetheless) that the majority of the dirty pages are _active_
pages ... in that case page_launder() won't help one bit and
only flush_dirty_buffers() can do the job.
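
To see the point concretely, here is a toy model (not kernel code) of why
a launderer that scans only the inactive-dirty list misses dirty active
pages:

#include <stdio.h>

struct page { int dirty; };

/* Two page lists, as in the 2.4 VM; page_launder() only ever scans the
 * inactive-dirty list. */
static struct page active_list[]   = { {1}, {1}, {0} };
static struct page inactive_list[] = { {0}, {1} };

static int launder(struct page *list, int n)
{
        int i, cleaned = 0;

        for (i = 0; i < n; i++)
                if (list[i].dirty) {
                        list[i].dirty = 0;
                        cleaned++;
                }
        return cleaned;
}

int main(void)
{
        int i, left = 0;

        /* page_launder reaches only the inactive list ... */
        printf("laundered %d inactive page(s)\n", launder(inactive_list, 2));

        /* ... dirty pages on the active list stay dirty; only a walk of
         * the dirty buffer list (which ignores page aging) writes them. */
        for (i = 0; i < 3; i++)
                left += active_list[i].dirty;
        printf("%d dirty active page(s) untouched\n", left);
        return 0;
}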

Also, you do not want the writer to block on writing out buffers
if bdflush could write them out asynchronously while the dirty
buffer producer can work on in the background.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to loose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2001-01-05 Thread Marcelo Tosatti


On Fri, 5 Jan 2001, Chris Mason wrote:

> 
> 
> On Friday, January 05, 2001 01:43:07 PM -0200 Marcelo Tosatti
> <[EMAIL PROTECTED]> wrote:
> 
> > 
> > On Fri, 5 Jan 2001, Chris Mason wrote:
> > 
> >> 
> >> Here's the latest version of the patch, against 2.4.0.  The
> >> biggest open issues are what to do with bdflush, since
> >> page_launder could do everything bdflush does.  
> > 
> > I think we want to remove flush_dirty_buffers() from bdflush. 
> > 
> 
> Whoops.  If bdflush doesn't balance the dirty list, who does?

Who marks buffers dirty. 

Linus changed mark_buffer_dirty() to use flush_dirty_buffers() in case
there are too many dirty buffers.

Also, I think in practice page_launder will help with balancing.




Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2001-01-05 Thread Chris Mason



On Friday, January 05, 2001 01:43:07 PM -0200 Marcelo Tosatti
<[EMAIL PROTECTED]> wrote:

> 
> On Fri, 5 Jan 2001, Chris Mason wrote:
> 
>> 
>> Here's the latest version of the patch, against 2.4.0.  The
>> biggest open issues are what to do with bdflush, since
>> page_launder could do everything bdflush does.  
> 
> I think we want to remove flush_dirty_buffers() from bdflush. 
> 

Whoops.  If bdflush doesn't balance the dirty list, who does?

But, we can do this in bdflush:

if (free_shortage())
        flushed = page_launder(GFP_KERNEL, 0) ;
else
        flushed = flush_dirty_buffers(0) ;

The idea is that page_launder knows best which pages to free when memory is
low, and flush_dirty_buffers knows best which pages to write when things
are unbalanced.  Not the cleanest idea ever, since I'd prefer that the
first pages written when we need balancing also be the ones most likely to
be freed by page_launder.

In other words, I'd like to make a flush_inactive_dirty() inside
mm/vmscan.c, and have bdflush call that before working on the buffer cache
dirty list.

-chris
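
A sketch of that ordering; flush_inactive_dirty() is the hypothetical
helper named above, the other declaration stands in for the existing
fs/buffer.c entry point, and none of this is real 2.4.0 code:

int flush_inactive_dirty(void);                 /* proposed, mm/vmscan.c */
int flush_dirty_buffers(int check_flushtime);   /* existing, fs/buffer.c */

/* Write the buffers of inactive-dirty pages first, so the balancing
 * writes double as freeable-page writes; fall back to the general dirty
 * list only when that finds nothing. */
static int bdflush_balance(void)
{
        int flushed = flush_inactive_dirty();

        if (!flushed)
                flushed = flush_dirty_buffers(0);
        return flushed;
}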




Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2001-01-05 Thread Juergen Schneider

Hi Chris,

I don't know if this is already known,
but this patch seems to solve the loop block device problem too.
I've tested it several times and did not get any kernel lockups after
applying the patch.
Until now this was a showstopper for me, but you gave the solution.
Thank you very much.

Juergen Schneider



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2001-01-05 Thread Daniel Phillips

Marcelo Tosatti wrote:
> 
> On Fri, 5 Jan 2001, Chris Mason wrote:
> 
> >
> > Here's the latest version of the patch, against 2.4.0.  The
> > biggest open issues are what to do with bdflush, since
> > page_launder could do everything bdflush does.
> 
> I think we want to remove flush_dirty_buffers() from bdflush.
> 
> While we are trying to be smart and do write clustering at the ->writepage
> operation, flush_dirty_buffers() is "dumb" and will interfere with the
> write clustering.

Actually, I found it doesn't interfere that much.  Remember, there's
still an elevator in there.  Even if bdflush and clustered page flushing
are running at the same time, the elevator makes the final decision about
what gets written when.

--
Daniel



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2001-01-05 Thread Chris Mason



On Friday, January 05, 2001 01:43:07 PM -0200 Marcelo Tosatti
<[EMAIL PROTECTED]> wrote:

> 
> On Fri, 5 Jan 2001, Chris Mason wrote:
> 
>> 
>> Here's the latest version of the patch, against 2.4.0.  The
>> biggest open issues are what to do with bdflush, since
>> page_launder could do everything bdflush does.  
> 
> I think we want to remove flush_dirty_buffers() from bdflush. 
> 

I think you're right.  Now that bdflush calls page_launder with GFP_KERNEL,
the flush_dirty_buffers call isn't needed there.  I think the current
bdflush (with or without the flush_dirty_buffers call) will be more
aggressive at freeing buffer cache pages from the inactive_dirty list, and
it will be interesting to see how it performs.  I think it will be better,
but the blocksize < pagesize case might screw us up.

> While we are trying to be smart and do write clustering at the ->writepage
> operation, flush_dirty_buffers() is "dumb" and will interfere with the
> write clustering. 
> 
Only for the buffer cache pages.  For actual file data, flush_dirty_buffers
is calling the writepage func, and we should still be able to cluster it.

-chris




Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2001-01-05 Thread Marcelo Tosatti


On Fri, 5 Jan 2001, Chris Mason wrote:

> 
> Here's the latest version of the patch, against 2.4.0.  The
> biggest open issues are what to do with bdflush, since
> page_launder could do everything bdflush does.  

I think we want to remove flush_dirty_buffers() from bdflush. 

While we are trying to be smart and do write clustering at the ->writepage
operation, flush_dirty_buffers() is "dumb" and will interfere with the
write clustering. 






Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2001-01-05 Thread Chris Mason


Here's the latest version of the patch, against 2.4.0.  The
biggest open issues are what to do with bdflush, since
page_launder could do everything bdflush does.  And, how to
deal with writepage funcs that return 1 for fsync_inode_buffers.

-chris

diff -urN linux.2.4.0/fs/buffer.c linux/fs/buffer.c
--- linux.2.4.0/fs/buffer.c Thu Jan  4 23:12:28 2001
+++ linux/fs/buffer.c   Thu Jan  4 18:58:09 2001
@@ -97,6 +97,20 @@
 
 static int grow_buffers(int size);
 static void __refile_buffer(struct buffer_head *);
+static int block_write_anon_page(struct page *);
+static void end_buffer_io_async(struct buffer_head * bh, int uptodate) ;
+
+static struct address_space_operations anon_space_ops = {
+   writepage: block_write_anon_page,
+   sync_page: block_sync_page,
+} ;
+static struct address_space anon_space_mapping = {
+   LIST_HEAD_INIT(anon_space_mapping.clean_pages),
+   LIST_HEAD_INIT(anon_space_mapping.dirty_pages),
+   LIST_HEAD_INIT(anon_space_mapping.locked_pages),
+   0, /* nr_pages */
+   &anon_space_ops,
+} ;
 
 /* This is used by some architectures to estimate available memory. */
 atomic_t buffermem_pages = ATOMIC_INIT(0);
@@ -161,6 +175,73 @@
atomic_dec(&bh->b_count);
 }
 
+/* just for use with anon pages, or pages that don't provide their own
+** writepage func.  We just want to write bh, not the whole page, so we
+** queue that io here instead of calling writepage.
+*/
+static int __dirty_list_writepage(struct page *page, struct buffer_head *bh) {
+   int other_dirty = 0 ;
+   struct buffer_head *cur ;
+
+   /* check for other dirty buffers on this page.  If there are none,
+   ** clear the page dirty bit
+   */
+   cur = bh->b_this_page ;
+   while(cur != bh) {
+   other_dirty += buffer_dirty(cur) ;  
+   cur = cur->b_this_page ;
+   } 
+   if (other_dirty == 0) {
+   ClearPageDirty(page) ;
+   } 
+
+   /* we want the page available for locking again right away.  
+   ** someone walking the dirty buffer list might find another
+   ** buffer from this page, and we don't want them to skip it in
+   ** favor of a younger buffer.
+   */
+   atomic_inc(&bh->b_count) ;
+   UnlockPage(page) ;
+   ll_rw_block(WRITE, 1, &bh) ;
+   atomic_dec(&bh->b_count) ;
+   return 0 ;
+}
+
+/*
+** util function for sync_buffers and flush_dirty_buffers
+** uses either the writepage func supplied in the page's mapping,
+** or the anon address space writepage
+*/
+static int dirty_list_writepage(struct page *page, struct buffer_head *bh) {
+   int (*writepage)(struct page *)  ;
+   int ret ;
+
+   /* someone wrote this page while we were locking before */
+   if (!PageDirty(page) && !buffer_dirty(bh)) {
+   UnlockPage(page) ;
+   return 0 ;
+   }
+   writepage = page->mapping->a_ops->writepage ;
+
+   /* For anon pages, and pages that don't have a writepage
+   ** func, just write this one dirty buffer.  __dirty_list_writepage
+   ** does a little more work to make sure the page dirty bit is cleared
+   ** when we are the only dirty buffer on this page
+   */
+   if (!writepage || page->mapping == &anon_space_mapping) {
+   writepage = anon_space_ops.writepage ;
+   return __dirty_list_writepage(page, bh) ;
+   }
+
+   ClearPageDirty(page) ;
+   ret = writepage(page) ;
+   if (ret == 1) {
+   SetPageDirty(page) ;
+   UnlockPage(page) ;
+   }
+   return ret ;
+}
+
 /* Call sync_buffers with wait!=0 to ensure that the call does not
  * return until all buffer writes have completed.  Sync() may return
  * before the writes have finished; fsync() may not.
@@ -175,6 +256,7 @@
 {
int i, retry, pass = 0, err = 0;
struct buffer_head * bh, *next;
+   struct page *page ;
 
/* One pass for no-wait, three for wait:
 * 0) write out all dirty, unlocked buffers;
@@ -230,10 +312,27 @@
if (!buffer_dirty(bh) || pass >= 2)
continue;
 
-   atomic_inc(&bh->b_count);
+   page = bh->b_page ;
+   page_cache_get(page) ;
+   if (TryLockPage(page)) {
+   if (!wait || !pass) {
+   retry = 1 ;
+   continue ;
+   }
+   spin_unlock(&lru_list_lock);
+   wait_on_page(page) ;
+   page_cache_release(page) ;
+   goto repeat ;
+   }
spin_unlock(&lru_list_lock);
-   ll_rw_block(WRITE, 1, &bh);
-   atomic_dec(&bh->b_count);
+
+   /* if the writepage func returns 1, it is 
+   

Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2001-01-02 Thread Chris Mason



On Friday, December 29, 2000 06:58:01 PM +0100 Daniel Phillips
<[EMAIL PROTECTED]> wrote:

> Chris Mason wrote:
>> 
>> BTW, the last anon space mapping patch I sent also works on test13-pre5.
>> The block_truncate_page fix does help my patch, since I have bdflush
>> locking pages ( thanks Marcelo )
> 
> Yes it does, but the little additional patch you posted no longer
> applies.  Your patch is suffering badly under pre5 here, reducing
> throughput from around 10 MB/sec to 5-6 MB/sec, and highly variable.  If
> you're not seeing this you should probably try to get a few others to
> run it.
> 

The additional patch I sent (for skipping writepage calls on the first loop
in page_launder) was applied to test13-pre5, in a slightly better form.
So, you'll either need to benchmark in pre4, or make a reversed version for
pre5.

Porting to the prerelease now ;-)

-chris






Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-29 Thread Alexander Viro



On Fri, 29 Dec 2000, Marcelo Tosatti wrote:

> Now the ext2 changes will for sure make a difference since right now the
> superblock lock is suffering from contention. 

superblock lock is suffering from more than just contention. Consider the
following:

sys_ustat() does get_super(), which leaves the sb unlocked.
We are going to call ->statfs().
In the meanwhile, fs is umounted. Woops...

In other words, get_super() callers are oopsen waiting to happen. That's
what is fundamentally wrong with lock_super()/wait_super()/unlock_super() -
we have no reference counter on superblocks and we blindly grab references
to them assuming that they will stay.

The main source of get_super() calls is in device drivers. I propose to add

void invalidate_dev(kdev_t dev, int sync_flag)
{
        struct super_block *sb = get_super(dev);
        switch (sync_flag) {
        case 1: sync_dev(dev); break;
        case 2: fsync_dev(dev); break;
        }
        if (sb)
                invalidate_inodes(sb);
        invalidate_buffers(dev);
}

to fs/buffer.c, export it and let drivers call that instead of doing the
thing by hands. Then we have a chance to deal with the problem inside the
kernel proper. Right now almost every block device has that sequence in
BLKRRPART handling and fixing each of them separately is going to hurt like
hell.
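
As a rough illustration of the driver side (the type and the context are
stubbed for the sketch; no particular driver's code is being quoted),
BLKRRPART handling would shrink to something like:

typedef unsigned short kdev_t;                  /* stub for the sketch */
void invalidate_dev(kdev_t dev, int sync_flag); /* the proposal above */

/* Before: get_super() + fsync_dev() + invalidate_inodes() +
 * invalidate_buffers() open-coded in every driver.  After: one call,
 * so the superblock refcount problem becomes fixable in one place
 * inside fs/buffer.c. */
static int reread_partitions(kdev_t dev)
{
        invalidate_dev(dev, 2);         /* 2 = fsync before invalidating */
        /* ... re-read the partition table here ... */
        return 0;
}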

Linus, I realize that it's changing the block devices near to release, but
AFAICS we have to change them anyway. I'm asking for permission to take
get_super() out.
Cheers,
Al




Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-29 Thread Marcelo Tosatti



On Fri, 29 Dec 2000, Chris Mason wrote:

> BTW, the last anon space mapping patch I sent also works on test13-pre5.
> The block_truncate_page fix does help my patch, since I have bdflush
> locking pages ( thanks Marcelo )

Good to know. I thought that fix would not change performance noticeably.

Now the ext2 changes will for sure make a difference since right now the
superblock lock is suffering from contention. 




Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-29 Thread Daniel Phillips

Chris Mason wrote:
> 
> On Thursday, December 28, 2000 11:29:01 AM -0800 Linus Torvalds
> <[EMAIL PROTECTED]> wrote:
> [ skipping io on the first walk in page_launder ]
> >
> > There are some arguments for starting the writeout early, but there are
> > tons of arguments against it too (the main one being "avoid doing IO if
> > you can do so"), so your patch is probably fine. In the end, the
> > performance characteristics are what matters. Does the patch make for
> > smoother behaviour and better performance?
> 
> My dbench speeds have always varied from run to run, but the average speed
> went up about 9% with the anon space mapping patch and the page_launder
> change.  I could not find much difference in a pure test13-pre4, probably
> because dbench doesn't generate much swap on my machine.  I'll do more
> tests when I get back on Monday night.
> 
> Daniel, sounds like dbench varies less on your machine, what did the patch
> do for you?
> 
> BTW, the last anon space mapping patch I sent also works on test13-pre5.
> The block_truncate_page fix does help my patch, since I have bdflush
> locking pages ( thanks Marcelo )

Yes it does, but the little additional patch you posted no longer
applies.  Your patch is suffering badly under pre5 here, reducing
throughput from around 10 MB/sec to 5-6 MB/sec, and highly variable.  If
you're not seeing this you should probably try to get a few others to
run it.

--
Daniel



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-29 Thread Chris Mason



On Thursday, December 28, 2000 11:29:01 AM -0800 Linus Torvalds
<[EMAIL PROTECTED]> wrote:
[ skipping io on the first walk in page_launder ]
> 
> There are some arguments for starting the writeout early, but there are
> tons of arguments against it too (the main one being "avoid doing IO if
> you can do so"), so your patch is probably fine. In the end, the
> performance characteristics are what matters. Does the patch make for
> smoother behaviour and better performance?

My dbench speeds have always varied from run to run, but the average speed
went up about 9% with the anon space mapping patch and the page_launder
change.  I could not find much difference in a pure test13-pre4, probably
because dbench doesn't generate much swap on my machine.  I'll do more
tests when I get back on Monday night.

Daniel, sounds like dbench varies less on your machine, what did the patch
do for you?

BTW, the last anon space mapping patch I sent also works on test13-pre5.
The block_truncate_page fix does help my patch, since I have bdflush
locking pages ( thanks Marcelo )

-chris




Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-28 Thread Linus Torvalds



On Thu, 28 Dec 2000, Chris Mason wrote:
> 
> Linus and Rik are cc'd in to find out if this is a good idea in
> general.

Probably. 

There are some arguments for starting the writeout early, but there are
tons of arguments against it too (the main one being "avoid doing IO if
you can do so"), so your patch is probably fine. In the end, the
performance characteristics are what matters. Does the patch make for
smoother behaviour and better performance?

Anyway, the "can_get_io_locks" check is subsumed by the "launder_loop"
check: we will never set "launder_loop" to non-zero if we can't get the
io_locks, so you might as well just make the test be

/* First loop through? Don't start IO, just move it to the back of the list */
if (!launder_loop) {


and be done with it.

I'd like to hear what that does for dbench.

Linus




Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-28 Thread Chris Mason



On Thursday, December 28, 2000 16:49:14 +0100 Daniel Phillips <[EMAIL PROTECTED]> 
wrote:

[ dbench 48 test on the anon space mapping patch ]

> 
> This benchmark doesn't seem to suffer a lot from noise, so the 7%
> slowdown with your patch is likely real.
> 

Ok, page_launder is supposed to run through the inactive dirty
list twice, and on the second run, it wants to start i/o.  But,
if the page is dirty, writepage is called on the first run.  With
my patch, this flushes lots more data than it used to.  

I have writepage doing all the i/o, and try_to_free_buffers
only waits on it.  This diff makes it so writepage is only called
on the second loop through the inactive dirty list; could you
please give it a try? (It's slightly faster in my tests.)

Linus and Rik are cc'd in to find out if this is a good idea in
general.

-chris

--- linux-test13-pre4/mm/vmscan.c   Sat Dec 23 13:14:26 2000
+++ linux/mm/vmscan.c   Thu Dec 28 15:02:08 2000
@@ -609,7 +609,7 @@
goto page_active;
 
/* Can't start IO? Move it to the back of the list */
-   if (!can_get_io_locks) {
+   if (!launder_loop || !can_get_io_locks) {
list_del(page_lru);
list_add(page_lru, &inactive_dirty_list);
UnlockPage(page);




Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-28 Thread Daniel Phillips

Chris Mason wrote:
> 
> On Wednesday, December 27, 2000 21:26:02 +0100 Daniel Phillips
> <[EMAIL PROTECTED]> wrote:
> 
> > Hi Chris.  I took your patch for a test drive under dbench and it seems
> > impressively stable under load, but there are performance problems.
> >
> >Test machine: 64 meg, 500 Mhz K6, IDE, Ext2, Blocksize=4K
> >Without patch: 9.5 MB/sec, 11 min 6 secs
> >With patch: 3.12 MB/sec, 33 min 51 sec
> >
> 
> Cool, thanks for the testing.  Which benchmark are you using?  bonnie and
> dbench don't show any changes on my scsi disks, I'll give IDE a try as well.

Chris, this was an error, I had accidentally booted the wrong kernel.  The
'With patch' results above are for 2.2.16, not your patch.

With correct results you are looking much better:

  Test machine: 64 meg, 500 Mhz K6, IDE, Ext2, Blocksize=4K
  Test: dbench 48

  pre13 without patch:  9.5 MB/sec   11 min  6 secs
  pre13 with patch:     8.9 MB/sec   11 min 46 secs
  2.2.16:               3.1 MB/sec   33 min 51 secs

This benchmark doesn't seem to suffer a lot from noise, so the 7%
slowdown with your patch is likely real.

We've come a long way from 2.2.16, haven't we?  I'll run some of these
tests against 2.2 pre19 kernels and maybe fan the flames of the 2.2/2.4
competition a little.

--
Daniel



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-27 Thread Chris Mason



On Wednesday, December 27, 2000 21:26:02 +0100 Daniel Phillips
<[EMAIL PROTECTED]> wrote:

> Hi Chris.  I took your patch for a test drive under dbench and it seems
> impressively stable under load, but there are performance problems.
> 
>Test machine: 64 meg, 500 Mhz K6, IDE, Ext2, Blocksize=4K
>Without patch: 9.5 MB/sec, 11 min 6 secs
>With patch: 3.12 MB/sec, 33 min 51 sec
> 

Cool, thanks for the testing.  Which benchmark are you using?  bonnie and
dbench don't show any changes on my scsi disks; I'll give IDE a try as well.

> Philosophically, I wonder if it's right for the buffer flush mechanism
> to be calling into the filesystem.  It seems like the buffer lists
> should stay sitting between the filesystem and the block layer; they
> actually do a pretty good job.
> 
What I'm looking for is a separation of the write management (aging, memory
pressure, etc, etc) from the actual write method.  The lists (VM, buffer.c,
whatever) should do the management, and the FS should do the i/o.  This
patch is not a perfect solution by any means, but it's a start.

-chris





Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-27 Thread Daniel Phillips

Chris Mason wrote:
> 
> Hi guys,
> 
> Here's my latest code, which uses ll_rw_block for anon pages (or
> pages without a writepage func) when flush_dirty_buffers,
> sync_buffers, or fsync_inode_buffers are flushing things.  This
> seems to have fixed my slowdown on 1k buffer sizes, but I
> haven't done extensive benchmarks yet.
> 
> Other changes:  After freeing a page with buffers, page_launder
> now stops if (!free_shortage()).  This is a mod of the check where
> page_launder checked free_shortage after freeing a buffer cache
> page.  Code outside buffer.c can't detect buffer cache pages with
> this patch, so the old check doesn't apply.
> 
> My change doesn't seem quite right, though: if page_launder wants
> to stop when there isn't a shortage, it should do that regardless of
> whether the page it just freed had buffers.  It looks like this was added
> so bdflush could call page_launder, and get an early out after
> freeing some buffer heads, but I'm not sure.
> 
> In test13-pre4, invalidate_buffers skips buffers on a page
> with a mapping.  I changed that to skip mappings other than the
> anon space mapping.
> 
> Comments and/or suggestions on how to make better use of this stuff
> are more than welcome ;-)

Hi Chris.  I took your patch for a test drive under dbench and it seems
impressively stable under load, but there are performance problems.

Test machine: 64 meg, 500 Mhz K6, IDE, Ext2, Blocksize=4K
Without patch: 9.5 MB/sec, 11 min 6 secs
With patch: 3.12 MB/sec, 33 min 51 sec

Philosophically, I wonder if it's right for the buffer flush mechanism
to be calling into the filesystem.  It seems like the buffer lists
should stay sitting between the filesystem and the block layer; they
actually do a pretty good job.

What about just bumping the buffers of pages found on the inactive_dirty
list to the front of the buffer LRU list?  Then we can do write
clustering by being selective about the bumping.
 
--
Daniel
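
A toy of the bumping idea above (illustrative only): rather than writing
the buffer immediately, move it to the front of the flush order; being
selective about which buffers get bumped is what would give clustering.

#include <stdio.h>

int main(void)
{
        /* Buffer LRU as block numbers; the front is flushed first. */
        int lru[] = { 5, 9, 2, 8, 3 };
        int n = 5, want = 8;    /* buffer of an inactive-dirty page */
        int i, j;

        /* Bump 'want' to the front instead of writing it now. */
        for (i = 0; i < n; i++) {
                if (lru[i] == want) {
                        for (j = i; j > 0; j--)
                                lru[j] = lru[j - 1];
                        lru[0] = want;
                        break;
                }
        }
        for (i = 0; i < n; i++)
                printf("%d ", lru[i]);  /* prints: 8 5 9 2 3 */
        printf("\n");
        return 0;
}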



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-26 Thread Marcelo Tosatti



On Tue, 26 Dec 2000, Chris Mason wrote:

> 
> Hi guys,
> 
> Here's my latest code, which uses ll_rw_block for anon pages (or
> pages without a writepage func) when flush_dirty_buffers, 
> sync_buffers, or fsync_inode_buffers are flushing things.  This
> seems to have fixed my slowdown on 1k buffer sizes, but I
> haven't done extensive benchmarks yet.

Great.

I'll run some benchmarks around here too and let you know. 

> Other changes:  After freeing a page with buffers, page_launder
> now stops if (!free_shortage()).  This is a mod of the check where 
> page_launder checked free_shortage after freeing a buffer cache 
> page.  Code outside buffer.c can't detect buffer cache pages with 
> this patch, so the old check doesn't apply.  
> 
> My change doesn't seem quite right, though: if page_launder wants
> to stop when there isn't a shortage, it should do that regardless of
> whether the page it just freed had buffers.  It looks like this was added
> so bdflush could call page_launder, and get an early out after
> freeing some buffer heads, but I'm not sure.
> 
> In test13-pre4, invalidate_buffers skips buffers on a page
> with a mapping.  I changed that to skip mappings other than the
> anon space mapping.
> 
> Comments and/or suggestions on how to make better use of this stuff
> are more than welcome ;-)

Well, the best use of this patch seems to be the ability to do write
clustering in the ->writepage() operation for normal filesystems.

I'll try to do a lightweight write clustering patch for
block_write_full_page soon.
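
A toy model of the clustering itself (illustrative, not the planned
block_write_full_page patch): coalesce runs of adjacent dirty block
numbers into single larger writes, the way a clustered writepage would
hand contiguous buffers to the elevator in one go.

#include <stdio.h>

int main(void)
{
        int dirty[] = { 7, 8, 9, 20, 21, 40 };
        int n = 6, i = 0;

        while (i < n) {
                int start = i;

                /* extend the run while block numbers stay contiguous */
                while (i + 1 < n && dirty[i + 1] == dirty[i] + 1)
                        i++;
                printf("write blocks %d-%d as one request\n",
                       dirty[start], dirty[i]);
                i++;
        }
        return 0;
}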





Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-26 Thread Chris Mason


Hi guys,

Here's my latest code, which uses ll_rw_block for anon pages (or
pages without a writepage func) when flush_dirty_buffers, 
sync_buffers, or fsync_inode_buffers are flushing things.  This
seems to have fixed my slowdown on 1k buffer sizes, but I
haven't done extensive benchmarks yet.

Other changes:  After freeing a page with buffers, page_launder
now stops if (!free_shortage()).  This is a mod of the check where 
page_launder checked free_shortage after freeing a buffer cache 
page.  Code outside buffer.c can't detect buffer cache pages with 
this patch, so the old check doesn't apply.  

My change doesn't seem quite right, though: if page_launder wants
to stop when there isn't a shortage, it should do that regardless of
whether the page it just freed had buffers.  It looks like this was added
so bdflush could call page_launder, and get an early out after
freeing some buffer heads, but I'm not sure.

In test13-pre4, invalidate_buffers skips buffers on a page
with a mapping.  I changed that to skip mappings other than the
anon space mapping.

Comments and/or suggestions on how to make better use of this stuff
are more than welcome ;-)

-chris

diff -urN linux-test13-pre4/fs/buffer.c linux-anon-space/fs/buffer.c
--- linux-test13-pre4/fs/buffer.c   Sat Dec 23 13:14:48 2000
+++ linux-anon-space/fs/buffer.cTue Dec 26 00:58:06 2000
@@ -97,6 +97,17 @@
 
 static int grow_buffers(int size);
 static void __refile_buffer(struct buffer_head *);
+static int block_write_anon_page(struct page *);
+static void end_buffer_io_async(struct buffer_head * bh, int uptodate) ;
+
+static struct address_space_operations anon_space_ops = {
+   writepage: block_write_anon_page,
+   sync_page: block_sync_page,
+} ;
+static struct address_space anon_space_mapping = {
+   pages: { &anon_space_mapping.pages, &anon_space_mapping.pages },
+   a_ops: &anon_space_ops,
+} ;
 
 /* This is used by some architectures to estimate available memory. */
 atomic_t buffermem_pages = ATOMIC_INIT(0);
@@ -161,6 +172,73 @@
atomic_dec(&bh->b_count);
 }
 
+/* just for use with anon pages, or pages that don't provide their own
+** writepage func.  We just want to write bh, not the whole page, so we
+** queue that io here instead of calling writepage.
+*/
+static int __dirty_list_writepage(struct page *page, struct buffer_head *bh) {
+   int other_dirty = 0 ;
+   struct buffer_head *cur ;
+
+   /* check for other dirty buffers on this page.  If there are none,
+   ** clear the page dirty bit
+   */
+   cur = bh->b_this_page ;
+   while(cur != bh) {
+   other_dirty += buffer_dirty(cur) ;  
+   cur = cur->b_this_page ;
+   } 
+   if (other_dirty == 0) {
+   ClearPageDirty(page) ;
+   } 
+
+   /* we want the page available for locking again right away.  
+   ** someone walking the dirty buffer list might find another
+   ** buffer from this page, and we don't want them to skip it in
+   ** favor of a younger buffer.
+   */
+   atomic_inc(&bh->b_count) ;
+   ll_rw_block(WRITE, 1, &bh) ;
+   atomic_dec(&bh->b_count) ;
+   UnlockPage(page) ;
+   return 0 ;
+}
+
+/*
+** util function for sync_buffers and flush_dirty_buffers
+** uses either the writepage func supplied in the page's mapping,
+** or the anon address space writepage
+*/
+static int dirty_list_writepage(struct page *page, struct buffer_head *bh) {
+   int (*writepage)(struct page *)  ;
+   int ret ;
+
+   /* someone wrote this page while we were locking before */
+   if (!PageDirty(page) && !buffer_dirty(bh)) {
+   UnlockPage(page) ;
+   return 0 ;
+   }
+   writepage = page->mapping->a_ops->writepage ;
+
+   /* For anon pages, and pages that don't have a writepage
+   ** func, just write this one dirty buffer.  __dirty_list_writepage
+   ** does a little more work to make sure the page dirty bit is cleared
+   ** when we are the only dirty buffer on this page
+   */
+   if (!writepage || page->mapping == &anon_space_mapping) {
+   writepage = anon_space_ops.writepage ;
+   return __dirty_list_writepage(page, bh) ;
+   }
+
+   ClearPageDirty(page) ;
+   ret = writepage(page) ;
+   if (ret == 1) {
+   SetPageDirty(page) ;
+   UnlockPage(page) ;
+   }
+   return ret ;
+}
+
 /* Call sync_buffers with wait!=0 to ensure that the call does not
  * return until all buffer writes have completed.  Sync() may return
  * before the writes have finished; fsync() may not.
@@ -175,6 +253,7 @@
 {
int i, retry, pass = 0, err = 0;
struct buffer_head * bh, *next;
+   struct page *page ;
 
/* One pass for no-wait, three for wait:
 * 0) write out all dirty, unlocked buffers;
@@ -230,10 +309,27 @@
if (!buffer_dirty(bh) || pass >= 2)
continue;
 
-


Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-26 Thread Marcelo Tosatti



On Tue, 26 Dec 2000, Chris Mason wrote:

 
> Hi guys,
> 
> Here's my latest code, which uses ll_rw_block for anon pages (or
> pages without a writepage func) when flush_dirty_buffers, 
> sync_buffers, or fsync_inode_buffers are flushing things.  This
> seems to have fixed my slowdown on 1k buffer sizes, but I
> haven't done extensive benchmarks yet.

Great.

I'll run some benchmarks around here too and let you know. 

> Other changes:  After freeing a page with buffers, page_launder
> now stops if (!free_shortage()).  This is a mod of the check where 
> page_launder checked free_shortage after freeing a buffer cache 
> page.  Code outside buffer.c can't detect buffer cache pages with 
> this patch, so the old check doesn't apply.  
> 
> My change doesn't seem quite right though, if page_launder wants 
> to stop when there isn't a shortage, it should do that regardless of
> if the page it just freed had buffers.  It looks like this was added
> so bdflush could call page_launder, and get an early out after
> freeing some buffer heads, but I'm not sure.
> 
> In test13-pre4, invalidate_buffers skips buffers on a page
> with a mapping.  I changed that to skip mappings other than the
> anon space mapping.
> 
> Comments and/or suggestions on how to make better use of this stuff
> are more than welcome ;-)

Well, the best use of this patch seems to be the ability to do write
clustering in the ->writepage() operation for normal filesystems.

I'll try to do a lightweight write clustering patch for
block_write_full_page soon.
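
For illustration, a rough sketch of what such clustering could look
like at writepage() time (not code from any posted patch; CLUSTER_PAGES
and the find_lock_page()-style lookup are assumptions):

/* Sketch only: called for an already-locked target page whose dirty
** bit the caller has cleared, as dirty_list_writepage does.  After
** the target is queued, opportunistically write adjacent dirty pages
** of the same file.
*/
#define CLUSTER_PAGES 8	/* made-up tunable */

static int writepage_cluster(struct page *page)
{
	struct address_space *mapping = page->mapping;
	unsigned long index = page->index;
	unsigned long i;
	int ret;

	ret = mapping->a_ops->writepage(page);	/* unlocks the target */

	for (i = index + 1; i < index + CLUSTER_PAGES; i++) {
		struct page *p = find_lock_page(mapping, i); /* assumed helper */
		if (!p)
			break;			/* hole: stop clustering */
		if (PageDirty(p)) {
			ClearPageDirty(p);
			mapping->a_ops->writepage(p);	/* unlocks p */
		} else
			UnlockPage(p);
		page_cache_release(p);
	}
	return ret;
}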


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-23 Thread Chris Mason


On Saturday, December 23, 2000 11:02:53 -0800 Linus Torvalds
<[EMAIL PROTECTED]> wrote:
> 
> Which is why I prefer the higher layers handling the dirty/uptodate/xxx
> bits. 
> 

Grin, I should have taken the hint when we talked about the buffer up to
date checks for block_read_full_page, it made sense then too.  

brw_page doesn't clear the dirty bit in pre4, was that on purpose?

-chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-23 Thread Linus Torvalds



On Sat, 23 Dec 2000, Chris Mason wrote:
> 
> I've updated to test13-pre4, and removed the hunk for submit_bh.
> Looks as though pre4 changed the submit_bh callers to clear the dirty
> bit, so my code does the same.

Basically, I wanted to think of "submit_bh()" as a pure IO thing. When we
call submit_bh(), that is basically the same as "start IO on this thing".
Which implies to me that submit_bh() doesn't care, or know, about why the
higher layers did this.

Which is why I prefer the higher layers handling the dirty/uptodate/xxx
bits. 

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-23 Thread Chris Mason



On Friday, December 22, 2000 21:26:33 -0200 Marcelo Tosatti
<[EMAIL PROTECTED]> wrote:

> If we use ll_rw_block directly on buffers of anonymous pages
> (page->mapping == &anon_space_mapping) instead of using
> dirty_list_writepage() (which will end up calling block_write_anon_page)
> we can fix the buffer flushtime issue.
> 

Ok, I'm just being stubborn.  The point of the patch was to get rid
of the ll_rw_block calls, so I'm resisting adding one back in ;-)
But, your way does seem like the least complicated method to honor 
the flushtime param for anon pages, and I'll switch over to that.

I've updated to test13-pre4, and removed the hunk for submit_bh.
Looks as though pre4 changed the submit_bh callers to clear the dirty
bit, so my code does the same.

Other changes: sync_page_buffers doesn't write blocks on the page, it 
only waits on them.  Minor change to when the page count is
increased before writepage is called.

I still need to update fsync_inode_buffers... doesn't look like that
will happen before I head off to visit family, where for some reason
they've banned me from reading email ;-)  I'll be back on Tuesday,
hope everyone has a good holiday.  New patch below:

-chris

diff -urN linux-test13-pre4/fs/buffer.c linux-anon-space/fs/buffer.c
--- linux-test13-pre4/fs/buffer.c   Sat Dec 23 13:14:48 2000
+++ linux-anon-space/fs/buffer.cSat Dec 23 13:30:24 2000
@@ -97,6 +97,16 @@
 
 static int grow_buffers(int size);
 static void __refile_buffer(struct buffer_head *);
+static int block_write_anon_page(struct page *);
+
+static struct address_space_operations anon_space_ops = {
+   writepage: block_write_anon_page,
+   sync_page: block_sync_page,
+} ;
+static struct address_space anon_space_mapping = {
+   pages: { &anon_space_mapping.pages, &anon_space_mapping.pages },
+   a_ops: &anon_space_ops,
+} ;
 
 /* This is used by some architectures to estimate available memory. */
 atomic_t buffermem_pages = ATOMIC_INIT(0);
@@ -161,6 +171,30 @@
	atomic_dec(&bh->b_count);
 }
 
+/*
+** util function for sync_buffers and flush_dirty_buffers
+** uses either the writepage func supplied in the page's mapping,
+** or the anon address space writepage
+*/
+static int dirty_list_writepage(struct page *page) {
+   int (*writepage)(struct page *)  ;
+   int ret ;
+
+   writepage = page->mapping->a_ops->writepage ;
+
+   if (!writepage) {
+   writepage = anon_space_ops.writepage ;
+   }
+
+   ClearPageDirty(page) ;
+   ret = writepage(page) ;
+   if (ret == 1) {
+   SetPageDirty(page) ;
+   UnlockPage(page) ;
+   }
+   return ret ;
+}
+
 /* Call sync_buffers with wait!=0 to ensure that the call does not
  * return until all buffer writes have completed.  Sync() may return
  * before the writes have finished; fsync() may not.
@@ -175,6 +209,7 @@
 {
int i, retry, pass = 0, err = 0;
struct buffer_head * bh, *next;
+   struct page *page ;
 
/* One pass for no-wait, three for wait:
 * 0) write out all dirty, unlocked buffers;
@@ -230,10 +265,22 @@
if (!buffer_dirty(bh) || pass >= 2)
continue;
 
-   atomic_inc(&bh->b_count);
+   page = bh->b_page ;
+   page_cache_get(page) ;
+   if (TryLockPage(page)) {
+   if (!wait || !pass) {
+   retry = 1 ;
+   continue ;
+   }
spin_unlock(&lru_list_lock);
+   wait_on_page(page) ;
+   page_cache_release(page) ;
+   goto repeat ;
+   }
	spin_unlock(&lru_list_lock);
-   ll_rw_block(WRITE, 1, &bh);
-   atomic_dec(&bh->b_count);
+
+   dirty_list_writepage(page) ;
+   page_cache_release(page) ;
retry = 1;
goto repeat;
}
@@ -1101,8 +1148,10 @@
int dispose = BUF_CLEAN;
if (buffer_locked(bh))
dispose = BUF_LOCKED;
-   if (buffer_dirty(bh))
+   if (buffer_dirty(bh)) {
dispose = BUF_DIRTY;
+   SetPageDirty(bh->b_page) ;
+   }
if (buffer_protected(bh))
dispose = BUF_PROTECTED;
if (dispose != bh->b_list) {
@@ -1478,6 +1527,53 @@
  * "Dirty" is valid only with the last case (mapped+uptodate).
  */
 
+static int block_write_anon_page(struct page *page) 
+{
+   struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
+   int i, nr = 0 ;
+   int partial = 0 ;
+   int ret = 0 ;
+
+   if (!PageLocked(page))
+   BUG();
+
+   if (!page->buffers)
+   BUG() ;
+
+   head = page->buffers;
+   bh = head;
+
+   /* Stage 1: 

Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-23 Thread Daniel Phillips

Chris Mason wrote:
> It is enough to leave buffer heads we don't flush on the dirty list (and
> redirty the page), they'll get written by a future loop through
> flush_dirty_pages, or by page_launder.  We could use ll_rw_block instead,
> even though anon pages do have a writepage with this patch (just check if
> page->mapping == &anon_space_mapping).

anon_space_mapping... this is really useful.  This would make a nice
patch on its own.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-22 Thread Marcelo Tosatti


On Fri, 22 Dec 2000, Chris Mason wrote:

> It is enough to leave buffer heads we don't flush on the dirty list (and
> redirty the page), they'll get written by a future loop through
> flush_dirty_pages, or by page_launder.  We could use ll_rw_block instead,
> even though anon pages do have a writepage with this patch (just check if
> page->mapping == &anon_space_mapping).

If we use ll_rw_block directly on buffers of anonymous pages
(page->mapping == &anon_space_mapping) instead of using
dirty_list_writepage() (which will end up calling block_write_anon_page)
we can fix the buffer flushtime issue.
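
As a sketch, that branch of the flush loop could look like this
(assumed shape, reusing the locking pattern from the sync_buffers
hunks in the patch, not posted code):

/* Sketch: bh has already passed the b_flushtime check in the kupdate
** case.  Anonymous pages get a single-buffer write so the other
** buffers on the page keep their own flush times; file-backed pages
** go through the filesystem's writepage.
*/
page = bh->b_page;
if (page->mapping == &anon_space_mapping) {
	atomic_inc(&bh->b_count);
	spin_unlock(&lru_list_lock);
	ll_rw_block(WRITE, 1, &bh);
	atomic_dec(&bh->b_count);
} else {
	page_cache_get(page);
	if (TryLockPage(page)) {
		page_cache_release(page);
		continue;	/* busy, revisit on a later pass */
	}
	spin_unlock(&lru_list_lock);
	dirty_list_writepage(page);	/* unlocks the page */
	page_cache_release(page);
}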

> There are lots of things we could try here, including some logic inside
> block_write_anon_page to check the current memory pressure/dirty levels to
> see how much work should be done.

At writepage() context we do not know if we should care about the
flushtime.

> I'll play with this a bit, but more ideas would be appreciated.
>
> BTW, recent change to my local code was to remove the ll_rw_block call from
> sync_page_buffers.  It seemed cleaner than making try_to_free_buffers
> understand that sometimes writepage will be called, and sometimes the page
> will end up unlocked because of it... comments?

Could you please send me your latest patch ?

Thanks

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-22 Thread Chris Mason



On Friday, December 22, 2000 17:52:28 -0200 Marcelo Tosatti
<[EMAIL PROTECTED]> wrote:

> There is one more nasty issue to deal with. 
> 
> You only want to take into account the buffer flushtime if
> "check_flushtime" parameter is passed as true to flush_dirty_buffers
> (which is done by kupdate).
> 
> Thinking a bit more about the issue, I dont see any reason why we want to
> write all buffers of an anonymous page at
> sync_buffers/flush_dirty_buffers.
> 
> I think we could simply write the buffer with ll_rw_block() if the page
> which this buffer is sitting on does not have mapping->a_ops->writepage
> operation defined.
> 

It is enough to leave buffer heads we don't flush on the dirty list (and
redirty the page), they'll get written by a future loop through
flush_dirty_pages, or by page_launder.  We could use ll_rw_block instead,
even though anon pages do have a writepage with this patch (just check if
page->mapping == &anon_space_mapping).

There are lots of things we could try here, including some logic inside
block_write_anon_page to check the current memory pressure/dirty levels to
see how much work should be done.
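
One possible shape for that logic, sketched against the buffer
collection loop in block_write_anon_page (using free_shortage() as the
pressure test and b_flushtime as the age test is an assumption, not
posted code):

/* Sketch only: under memory pressure flush every dirty buffer on
** the page, otherwise only the ones past their flush time.
*/
do {
	if (buffer_dirty(bh) &&
	    (free_shortage() || time_after(jiffies, bh->b_flushtime)))
		arr[nr++] = bh;
	bh = bh->b_this_page;
} while (bh != head);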

I'll play with this a bit, but more ideas would be appreciated.

BTW, recent change to my local code was to remove the ll_rw_block call from
sync_page_buffers.  It seemed cleaner than making try_to_free_buffers
understand that sometimes writepage will be called, and sometimes the page
will end up unlocked because of it... comments?

-chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-22 Thread Chris Mason



On Friday, December 22, 2000 17:45:57 +0100 Daniel Phillips
<[EMAIL PROTECTED]> wrote:

[ flushing a page at a time in bdflush ]

> Um.  Why cater to the uncommon case of 1K blocks?  Just let
> bdflush/kupdated deal with them in the normal way - it's pretty
> efficient.  Only try to do the clustering optimization when buffer size
> matches memory page size.
> 

This isn't really an attempt at a clustering optimization.  The problem at
hand is that buffer cache buffers can be on relatively random pages.  So, a
page might have buffers that are very far apart, where one needs flushing
and the other doesn't.

In the blocksize == page size case, this won't happen, and we don't lose
any speed over the existing code.  In the blocksize < pagesize case, my new
code is slower, so my goal is to fix just that problem.

Real write clustering would be a different issue entirely, and is worth
doing ;-)

-chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-22 Thread Daniel Phillips

Chris Mason wrote:
> 
> On Thursday, December 21, 2000 22:38:04 -0200 Marcelo Tosatti 
> <[EMAIL PROTECTED]> wrote:
> >
> > On Thu, 21 Dec 2000, Andreas Dilger wrote:
> >
> >> Marcelo Tosatti writes:
> >> > It seems your code has a problem with bh flush time.
> >> >
> >> > In flush_dirty_buffers(), a buffer may (if being called from kupdate) only
> >> > be written in case its old enough. (bh->b_flushtime)
> >> >
> >> > If the flush happens for an anonymous buffer, you'll end up writing all
> >> > buffers which are sitting on the same page (with block_write_anon_page),
> >> > but these other buffers are not necessarily old enough to be flushed.
> >>
> >> This isn't really a "problem" however.  The page is the _maximum_ age of
> >> the buffer before it needs to be written.  If we can efficiently write it
> >> out with another buffer
> >
> >> (essentially for free if they are on the same spot on disk)
> >
> > Are you sure this is true for buffer pages in most cases?
> 
> It's a good point.  block_write_anon_page could be changed to just 
> write the oldest buffer and redirty the page (if the buffers are 
> far apart).  If memory is tight, and we *really* need the page back,
> it will be flushed by try_to_free_buffers.
> 
> It seems a bit nasty to me though...writepage should write the page.

Um.  Why cater to the uncommon case of 1K blocks?  Just let
bdflush/kupdated deal with them in the normal way - it's pretty
efficient.  Only try to do the clustering optimization when buffer size
matches memory page size.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-22 Thread Chris Mason



On Thursday, December 21, 2000 22:38:04 -0200 Marcelo Tosatti
<[EMAIL PROTECTED]> wrote:

>> Marcelo Tosatti writes:
>> > It seems your code has a problem with bh flush time.
>> > 
>> > In flush_dirty_buffers(), a buffer may (if being called from kupdate)
>> > only be written in case its old enough. (bh->b_flushtime)
>> > 
>> > If the flush happens for an anonymous buffer, you'll end up writing all
>> > buffers which are sitting on the same page (with
>> > block_write_anon_page), but these other buffers are not necessarily
>> > old enough to be flushed.
>> 

A quick benchmark shows there's room for improvement here.  I'll play
around with a version of block_write_anon_page that tries to be more
selective when flushing things out.

-chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-22 Thread Chris Mason



On Thursday, December 21, 2000 22:38:04 -0200 Marcelo Tosatti 
<[EMAIL PROTECTED]> wrote:

> 
> On Thu, 21 Dec 2000, Andreas Dilger wrote:
> 
>> Marcelo Tosatti writes:
>> > It seems your code has a problem with bh flush time.
>> > 
>> > In flush_dirty_buffers(), a buffer may (if being called from kupdate) only
>> > be written in case its old enough. (bh->b_flushtime)
>> > 
>> > If the flush happens for an anonymous buffer, you'll end up writing all
>> > buffers which are sitting on the same page (with block_write_anon_page),
>> > but these other buffers are not necessarily old enough to be flushed.
>> 
>> This isn't really a "problem" however.  The page is the _maximum_ age of
>> the buffer before it needs to be written.  If we can efficiently write it
>> out with another buffer 
> 
> 
>> (essentially for free if they are on the same spot on disk)
> 
> Are you sure this is true for buffer pages in most cases? 
> 
> 

It's a good point.  block_write_anon_page could be changed to just write the oldest 
buffer and redirty the page (if the buffers are far apart).  If memory is tight, and 
we *really* need the page back, it will be flushed by try_to_free_buffers.
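
A sketch of that variant (hypothetical, not posted code; it follows
the ret == 1 convention of dirty_list_writepage, where the caller
redirties and unlocks the page):

static int write_oldest_buffer(struct page *page)
{
	struct buffer_head *head = page->buffers;
	struct buffer_head *cur = head, *oldest = NULL;
	int dirty = 0;

	/* find the dirty buffer with the earliest flush time */
	do {
		if (buffer_dirty(cur)) {
			if (!oldest || time_before(cur->b_flushtime,
						   oldest->b_flushtime))
				oldest = cur;
			dirty++;
		}
		cur = cur->b_this_page;
	} while (cur != head);

	if (!oldest) {
		UnlockPage(page);
		return 0;
	}
	atomic_inc(&oldest->b_count);
	ll_rw_block(WRITE, 1, &oldest);
	atomic_dec(&oldest->b_count);
	if (dirty > 1)
		return 1;	/* others remain: caller redirties + unlocks */
	UnlockPage(page);
	return 0;
}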

It seems a bit nasty to me though...writepage should write the page.

-chris



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-22 Thread Chris Mason



On Thursday, December 21, 2000 20:54:09 -0500 Alexander Viro <[EMAIL PROTECTED]> wrote:

> 
> 
> On Thu, 21 Dec 2000, Chris Mason wrote:
> 
>> Obvious bug, block_write_full_page zeros out the bits past the end of 
>> file every time.  This should not be needed for normal file writes.
> 
> Unfortunately, it _is_ needed for pageout path. mmap() the last page
> of file. Dirty the data past the EOF (MMU will not catch that access).
> Let the pageout send the page to disk. You don't want to have the data
> past EOF end up on the backstore.
> 

Sorry, I wasn't very clear.  I'd like to find a way to just do the memset in the one 
case it is needed (a real writepage), and not do it when we are using writepage just 
to flush dirty buffers.  I don't see how to do it though, without making a new flush 
operation.  That would also solve my concerns about the change in writepage semantics 
(even though every FS I could find in the kernel tree was using 
block_write_full_page). 

thanks,
Chris


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-21 Thread Marcelo Tosatti


On Thu, 21 Dec 2000, Andreas Dilger wrote:

> Marcelo Tosatti writes:
> > It seems your code has a problem with bh flush time.
> > 
> > In flush_dirty_buffers(), a buffer may (if being called from kupdate) only
> > be written in case its old enough. (bh->b_flushtime)
> > 
> > If the flush happens for an anonymous buffer, you'll end up writing all
> > buffers which are sitting on the same page (with block_write_anon_page),
> > but these other buffers are not necessarily old enough to be flushed.
> 
> This isn't really a "problem" however.  The page is the _maximum_ age of
> the buffer before it needs to be written.  If we can efficiently write it
> out with another buffer 


> (essentially for free if they are on the same spot on disk)

Are you sure this is true for buffer pages in most cases? 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-21 Thread Andreas Dilger

Marcelo Tosatti writes:
> It seems your code has a problem with bh flush time.
> 
> In flush_dirty_buffers(), a buffer may (if being called from kupdate) only
> be written in case its old enough. (bh->b_flushtime)
> 
> If the flush happens for an anonymous buffer, you'll end up writing all
> buffers which are sitting on the same page (with block_write_anon_page),
> but these other buffers are not necessarily old enough to be flushed.

This isn't really a "problem" however.  The page is the _maximum_ age of
the buffer before it needs to be written.  If we can efficiently write it
out with another buffer (essentially for free if they are on the same
spot on disk), then there is less work for us to do later.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-21 Thread Marcelo Tosatti


On Thu, 21 Dec 2000, Chris Mason wrote:

> Ok guys, I think I've taken Linus' suggestion to have buffer.c use its
> own writepage a bit too far.  This patch marks pages dirty when the 
> buffer head is marked dirty, and changes flush_dirty_buffers and 
> sync_buffers to use writepage instead of ll_rw_block.  The idea is 
> to allow filesystems to use the buffer lists and provide their own 
> i/o mechanism.
> 
> The result is a serious semantics change for writepage, which now is 
> expected to do partial page writes when the page isn't up to date and 
> there are dirty buffers inside.  For all the obvious reasons, this isn't 
> fit for 2.4.0, and if you all feel it is a 2.5. thing I'll send along 
> the  shorter  patch Linus originally suggested.  But, I think it would 
> be  pretty useful for the new filesystems (once I also fix 
> fsync_inode_buffers and sync_page_buffers).

It is very powerful.

With this on place, the filesystem is able to do write clustering at its
writepage() function by checking if the on-disk physically nearby pages
are dirty.
  
> Other changes:  submit_bh now cleans the buffers.  I don't see how 
> they were getting cleaned before, it must have been try_to_free_buffers 
> sending the page through sync_page_buffers, meaning they were probably 
> getting written twice.  Unless someone throws a clue my way, I'll send 
> this out as a separate diff.

Ouch.

> page_launder doesn't fiddle with the buffermem_pages counter, it is done 
> in try_to_free_buffers instead.
>
> Obvious bug, block_write_full_page zeros out the bits past the end of 
> file every time.  This should not be needed for normal file writes.
> 
> Most testing was on ext2, who actually calls mark_buffer_dirty, and 
> supports blocksizes < intel page size.  More tests are running 
> overnight.

It seems your code has a problem with bh flush time.

In flush_dirty_buffers(), a buffer may (if being called from kupdate) only
be written in case its old enough. (bh->b_flushtime)

If the flush happens for an anonymous buffer, you'll end up writing all
buffers which are sitting on the same page (with block_write_anon_page),
but these other buffers are not necessarily old enough to be flushed.
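
For reference, the check being described is roughly this fragment of
2.4's flush_dirty_buffers(), where check_flushtime is only passed as
true by kupdate:

	if (check_flushtime) {
		/* the dirty lru list is chronologically ordered, so
		** if this bh is too young, all the following ones
		** are too young as well */
		if (time_before(jiffies, bh->b_flushtime))
			goto out_unlock;
	}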

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-21 Thread Alexander Viro



On Thu, 21 Dec 2000, Chris Mason wrote:

> Obvious bug, block_write_full_page zeros out the bits past the end of 
> file every time.  This should not be needed for normal file writes.

Unfortunately, it _is_ needed for pageout path. mmap() the last page
of file. Dirty the data past the EOF (MMU will not catch that access).
Let the pageout send the page to disk. You don't want to have the data
past EOF end up on the backstore.
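
Concretely, the tail-zeroing amounts to something like this on the
page that contains i_size (a sketch in 2.4 page-cache terms, not the
exact block_write_full_page code):

	struct inode *inode = page->mapping->host;
	unsigned long end_index = inode->i_size >> PAGE_CACHE_SHIFT;
	unsigned offset = inode->i_size & (PAGE_CACHE_SIZE - 1);

	if (page->index == end_index && offset) {
		/* zero from EOF to the end of the page so mmap()ed
		** scribbles past i_size never reach the backing store */
		char *kaddr = kmap(page);
		memset(kaddr + offset, 0, PAGE_CACHE_SIZE - offset);
		kunmap(page);
	}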

Al, slowly getting back to life and sanity. Fsck the flu...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



[RFC] changes to buffer.c (was Test12 ll_rw_block error)

2000-12-21 Thread Chris Mason



Ok guys, I think I've taken Linus' suggestion to have buffer.c use its
own writepage a bit too far.  This patch marks pages dirty when the 
buffer head is marked dirty, and changes flush_dirty_buffers and 
sync_buffers to use writepage instead of ll_rw_block.  The idea is 
to allow filesystems to use the buffer lists and provide their own 
i/o mechanism.

The result is a serious semantics change for writepage, which now is 
expected to do partial page writes when the page isn't up to date and 
there are dirty buffers inside.  For all the obvious reasons, this isn't 
fit for 2.4.0, and if you all feel it is a 2.5. thing I'll send along 
the  shorter  patch Linus originally suggested.  But, I think it would 
be  pretty useful for the new filesystems (once I also fix 
fsync_inode_buffers and sync_page_buffers).

Other changes:  submit_bh now cleans the buffers.  I don't see how 
they were getting cleaned before, it must have been try_to_free_buffers 
sending the page through sync_page_buffers, meaning they were probably 
getting written twice.  Unless someone throws a clue my way, I'll send 
this out as a separate diff.

page_launder doesn't fiddle with the buffermem_pages counter, it is done 
in try_to_free_buffers instead.

Obvious bug, block_write_full_page zeros out the bits past the end of 
file every time.  This should not be needed for normal file writes.

Most testing was on ext2, who actually calls mark_buffer_dirty, and 
supports blocksizes < intel page size.  More tests are running 
overnight.

-chris

diff -urN linux.test13p3p/drivers/block/ll_rw_blk.c 
linux-test12/drivers/block/ll_rw_blk.c
--- linux.test13p3p/drivers/block/ll_rw_blk.c   Tue Dec 19 05:07:50 2000
+++ linux.test13p3/drivers/block/ll_rw_blk.cThu Dec 21 16:56:43 2000
@@ -954,6 +954,7 @@
	if (!test_bit(BH_Lock, &bh->b_state))
BUG();
 
+   mark_buffer_clean(bh) ;
	set_bit(BH_Req, &bh->b_state);
 
/*
diff -urN linux.test13p3p/fs/buffer.c linux-test12/fs/buffer.c
--- linux.test13p3p/fs/buffer.c Thu Dec 21 17:24:17 2000
+++ linux.test13p3/fs/buffer.c  Thu Dec 21 17:28:46 2000
@@ -97,6 +97,16 @@
 
 static int grow_buffers(int size);
 static void __refile_buffer(struct buffer_head *);
+static int block_write_anon_page(struct page *);
+
+static struct address_space_operations anon_space_ops = {
+   writepage: block_write_anon_page,
+   sync_page: block_sync_page,
+} ;
+static struct address_space anon_space_mapping = {
+   pages: { &anon_space_mapping.pages, &anon_space_mapping.pages },
+   a_ops: &anon_space_ops,
+} ;
 
 /* This is used by some architectures to estimate available memory. */
 atomic_t buffermem_pages = ATOMIC_INIT(0);
@@ -161,6 +171,37 @@
	atomic_dec(&bh->b_count);
 }
 
+/*
+** util function for sync_buffers and flush_dirty_buffers
+** uses either the writepage func supplied in the page's mapping,
+** or the anon address space writepage
+*/
+static int dirty_list_writepage(struct page *page) {
+   int (*writepage)(struct page *)  ;
+   int ret ;
+
+   writepage = page->mapping->a_ops->writepage ;
+
+   if (!writepage) {
+   writepage = anon_space_ops.writepage ;
+   }
+
+   page_cache_get(page) ;
+   ClearPageDirty(page) ;
+   ret = writepage(page) ;
+   /* note, when writepage returns 1, we mark the page dirty again
+   ** but the writepage func is responsible for making sure any
+   ** buffers that need to stay dirty really do stay dirty
+   ** ick.
+   */
+   if (ret == 1) {
+   SetPageDirty(page) ;
+   UnlockPage(page) ;
+   }
+   page_cache_release(page) ;
+   return ret ;
+}
+
 /* Call sync_buffers with wait!=0 to ensure that the call does not
  * return until all buffer writes have completed.  Sync() may return
  * before the writes have finished; fsync() may not.
@@ -175,6 +216,7 @@
 {
int i, retry, pass = 0, err = 0;
struct buffer_head * bh, *next;
+   struct page *page ;
 
/* One pass for no-wait, three for wait:
 * 0) write out all dirty, unlocked buffers;
@@ -230,10 +272,19 @@
if (!buffer_dirty(bh) || pass >= 2)
continue;
 
-   atomic_inc(&bh->b_count);
+   page = bh->b_page ;
+   if (TryLockPage(page)) {
+   if (!wait || !pass) {
+   retry = 1 ;
+   continue ;
+   }
+   spin_unlock(&lru_list_lock);
+   wait_on_page(page) ;
+   goto repeat ;
+   }
	spin_unlock(&lru_list_lock);
-   ll_rw_block(WRITE, 1, &bh);
-   atomic_dec(&bh->b_count);
+
+   dirty_list_writepage(page) ;
retry = 1;
 
