On Sun, Sep 23, 2007 at 03:02:35PM +0200, Peter Zijlstra wrote: > On Sun, 23 Sep 2007 09:20:49 +0800 Fengguang Wu <[EMAIL PROTECTED]> > wrote: > > > On Sat, Sep 22, 2007 at 03:16:22PM +0200, Peter Zijlstra wrote: > > > On Sat, 22 Sep 2007 09:55:09 +0800 Fengguang Wu <[EMAIL PROTECTED]> > > > wrote: > > > > > > > --- linux-2.6.22.orig/mm/page-writeback.c > > > > +++ linux-2.6.22/mm/page-writeback.c > > > > @@ -426,6 +426,14 @@ static void balance_dirty_pages(struct a > > > > bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK); > > > > } > > > > > > > > + printk(KERN_DEBUG "balance_dirty_pages written %lu %lu > > > > congested %d limits %lu %lu %lu %lu %lu %ld\n", > > > > + pages_written, > > > > + write_chunk - wbc.nr_to_write, > > > > + bdi_write_congested(bdi), > > > > + background_thresh, dirty_thresh, > > > > + bdi_thresh, bdi_nr_reclaimable, > > > > bdi_nr_writeback, > > > > + bdi_thresh - bdi_nr_reclaimable - > > > > bdi_nr_writeback); > > > > + > > > > if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh) > > > > break; > > > > if (pages_written >= write_chunk) > > > > > > > > > > > [ 1305.361511] balance_dirty_pages written 0 0 congested 0 limits 48869 > > > > 195477 5801 5760 288 -247 > > > > > > <snip long series of mostly identical lines> > > > > > > Could you perhaps instrument the writeback_inodes() path to see why > > > nothing is written out? - the attached patch would be a nice start. > > > > Curiously the lockup problem disappeared after upgrading to 2.6.23-rc6-mm1. > > (need to watch it in a longer time window). > > > > Anyway here's the output of your patch: > > sb_locked 0 > > sb_empty 97011 > > It this the delta during one of these lockups? If so, it would seem
delta since boot time, for 2.6.23-rc6-mm1, no lockups ;-) > that although dirty pages are reported against the BDI, no actual dirty > inodes could be found. no lockups, therefore not necessarily. There are many other calls into writeback_inodes(). > [ note to self: writeback_inodes() seems to write out to any superblock > in the system. Might want to limit that to superblocks on wbc->bdi ] generic_sync_sb_inodes() does have something like: if (wbc->bdi && bdi != wbc->bdi) continue; > You say that switching to .23-rc6-mm1 solved it in your case. You are > developing in the writeback_inodes() path, right? Could it be one of > your local changes that confused it here? There are a lot of changes between them: - bdi-v9 vs bdi-v10; - a lot writeback patches in -mm - some writeback patches maintained locally I just rebased my patches to .23-rc6-mm1... > > > Most peculiar. It seems writeback_inodes() doesn't even attempt to > > > write out stuff. Nor are outstanding writeback pages completed. > > > > Still true. Another problem is that balance_dirty_pages() is being called > > even > > when there are only 54 dirty pages. That could slow down writers > > unnecessarily. > > > > balance_dirty_pages() should not be entered at all with small nr_dirty. > > > > Look at these lines: > > [ 197.471619] balance_dirty_pages for tar written 405 405 congested 0 > > global 196554 54 403 196097 bdi 0 0 398 -398 > > [ 197.472196] balance_dirty_pages for tar written 405 0 congested 0 global > > 196554 54 372 196128 bdi 0 0 380 -380 > > [ 197.472893] balance_dirty_pages for tar written 405 0 congested 0 global > > 196554 54 372 196128 bdi 23 0 369 -346 > > [ 197.473158] balance_dirty_pages for tar written 405 0 congested 0 global > > 196554 54 372 196128 bdi 23 0 366 -343 > > [ 197.473403] balance_dirty_pages for tar written 405 0 congested 0 global > > 196554 54 372 196128 bdi 23 0 365 -342 > > [ 197.473674] balance_dirty_pages for tar written 405 0 congested 0 global > > 196554 54 372 196128 bdi 23 0 364 -341 > > [ 197.474265] balance_dirty_pages for tar written 405 0 congested 0 global > > 196554 54 372 196128 bdi 23 0 362 -339 > > [ 197.475440] balance_dirty_pages for tar written 405 0 congested 0 global > > 196554 54 341 196159 bdi 47 0 327 -280 > > [ 197.476970] balance_dirty_pages for tar written 405 0 congested 0 global > > 196546 54 279 196213 bdi 95 0 279 -184 > > [ 197.477773] balance_dirty_pages for tar written 405 0 congested 0 global > > 196546 54 248 196244 bdi 95 0 255 -160 > > [ 197.479463] balance_dirty_pages for tar written 405 0 congested 0 global > > 196546 54 217 196275 bdi 143 0 210 -67 > > [ 197.479656] balance_dirty_pages for tar written 405 0 congested 0 global > > 196546 54 217 196275 bdi 143 0 209 -66 > > [ 197.481159] balance_dirty_pages for tar written 405 0 congested 0 global > > 196546 54 155 196337 bdi 167 0 163 4 > > That is an interesting idea how about this: It looks like a workaround, but it does solve the most important problem. And it is a good logic by itself. So I'd vote for it. The fundamental problem is that the per-bdi-writeback-completion based estimation is not accurate under light loads. The problem remains for a light-load sda when there is a heavy-load sdb. One more workaround could be to grant bdi(s) a minimal bdi_thresh. Or better to adjust the estimation logic? > --- > Subject: mm: speed up writeback ramp-up on clean systems > > We allow violation of bdi limits if there is a lot of room on the > system. Once we hit half the total limit we start enforcing bdi limits > and bdi ramp-up should happen. Doing it this way avoids many small > writeouts on an otherwise idle system and should also speed up the > ramp-up. > > Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]> > --- > > Index: linux-2.6/mm/page-writeback.c > =================================================================== > --- linux-2.6.orig/mm/page-writeback.c > +++ linux-2.6/mm/page-writeback.c > @@ -355,8 +355,8 @@ get_dirty_limits(long *pbackground, long > */ > static void balance_dirty_pages(struct address_space *mapping) > { > - long bdi_nr_reclaimable; > - long bdi_nr_writeback; > + long nr_reclaimable, bdi_nr_reclaimable; > + long nr_writeback, bdi_nr_writeback; > long background_thresh; > long dirty_thresh; > long bdi_thresh; > @@ -376,9 +376,24 @@ static void balance_dirty_pages(struct a > > get_dirty_limits(&background_thresh, &dirty_thresh, > &bdi_thresh, bdi); > + > + nr_reclaimable = global_page_state(NR_FILE_DIRTY) + > + global_page_state(NR_UNSTABLE_NFS); > + nr_writeback = global_page_state(NR_WRITEBACK); > + > bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE); > bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK); > - if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh) > + > + /* > + * break out early when: > + * - we're below the bdi limit > + * - we're below half the total limit > + * > + * we let the numbers exceed the strict bdi limit if the total > + * numbers are too low, this avoids (excessive) small writeouts. > + */ > + if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh || > + nr_reclaimable + nr_writeback < dirty_thresh / 2) > break; This may be slightly better: if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh) break; /* * Throttle it only when the background writeback cannot catchup. */ if (nr_reclaimable + nr_writeback < (background_thresh + dirty_thresh) / 2) break; > if (!bdi->dirty_exceeded) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/