Re: [PATCH 09/12] mm: remove throttle_vm_writeback

2007-09-26 Thread Peter Zijlstra

On Thu, 2007-04-05 at 15:44 -0700, Andrew Morton wrote:
> On Thu, 05 Apr 2007 19:42:18 +0200
> [EMAIL PROTECTED] wrote:
> 
> > rely on accurate dirty page accounting to provide enough push back
> 
> I think we'd like to see a bit more justification than that, please.

it should read like this:

for ( ; ; ) {
get_dirty_limits(_thresh, _thresh, NULL, NULL);

/*
 * Boost the allowable dirty threshold a bit for page
 * allocators so they don't get DoS'ed by heavy writers
 */
dirty_thresh += dirty_thresh / 10;  /* wh... */

if (global_page_state(NR_FILE_DIRTY) + 
global_page_state(NR_UNSTABLE_NFS) +
global_page_state(NR_WRITEBACK) <= dirty_thresh)
break;

congestion_wait(WRITE, HZ/10);
}

[ note the extra NR_FILE_DIRTY ]

now, balance_dirty_pages() is there to ensure:

  nr_dirty + nr_unstable + nr_writeback < dirty_thresh  (1)

reclaim will (with the introduction of dirty page tracking) never
generate dirty pages, so the only disturbance of that equation is an
increase in nr_writeback.

[ pageout() sets wbc.for_reclaim=1, so NFS traffic will not generate
  unstable pages ]

So, what throttle_vm_writeout() does is limit the number of added
writeback pages to 10% of the total limit.

pageout() seems to avoid stuffing pages down a congested bdi 
(TODO: has details), along with the much smaller io-queues, the initial
purpose of this function - which was to avoid all memory getting stuck
in io-queues - seems to be handled.

Now the problems...

Trouble is that it currently does not take nr_dirty into account which
in the worst case limits it to 110% of the limit.

Also, I'm seeing (2.6.23-rc8-mm1) live-locks in throttle_vm_writeback()
where nr_dirty + nr_unstable > thresh - which according to (1) should
not happen, and will not change without explicit action.

Hmm maybe the 10% is < nr_cpus * ratelimit_pages.

2 cpus, mem=128M -> ratelimit_pages ~ 512
threshold ~ 1500

so indeed: 150 < 1024.

Still not conclusive but at least getting somewhere.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/12] mm: remove throttle_vm_writeback

2007-09-26 Thread Peter Zijlstra

On Thu, 2007-04-05 at 15:44 -0700, Andrew Morton wrote:
 On Thu, 05 Apr 2007 19:42:18 +0200
 [EMAIL PROTECTED] wrote:
 
  rely on accurate dirty page accounting to provide enough push back
 
 I think we'd like to see a bit more justification than that, please.

it should read like this:

for ( ; ; ) {
get_dirty_limits(background_thresh, dirty_thresh, NULL, NULL);

/*
 * Boost the allowable dirty threshold a bit for page
 * allocators so they don't get DoS'ed by heavy writers
 */
dirty_thresh += dirty_thresh / 10;  /* wh... */

if (global_page_state(NR_FILE_DIRTY) + 
global_page_state(NR_UNSTABLE_NFS) +
global_page_state(NR_WRITEBACK) = dirty_thresh)
break;

congestion_wait(WRITE, HZ/10);
}

[ note the extra NR_FILE_DIRTY ]

now, balance_dirty_pages() is there to ensure:

  nr_dirty + nr_unstable + nr_writeback  dirty_thresh  (1)

reclaim will (with the introduction of dirty page tracking) never
generate dirty pages, so the only disturbance of that equation is an
increase in nr_writeback.

[ pageout() sets wbc.for_reclaim=1, so NFS traffic will not generate
  unstable pages ]

So, what throttle_vm_writeout() does is limit the number of added
writeback pages to 10% of the total limit.

pageout() seems to avoid stuffing pages down a congested bdi 
(TODO: has details), along with the much smaller io-queues, the initial
purpose of this function - which was to avoid all memory getting stuck
in io-queues - seems to be handled.

Now the problems...

Trouble is that it currently does not take nr_dirty into account which
in the worst case limits it to 110% of the limit.

Also, I'm seeing (2.6.23-rc8-mm1) live-locks in throttle_vm_writeback()
where nr_dirty + nr_unstable  thresh - which according to (1) should
not happen, and will not change without explicit action.

Hmm maybe the 10% is  nr_cpus * ratelimit_pages.

2 cpus, mem=128M - ratelimit_pages ~ 512
threshold ~ 1500

so indeed: 150  1024.

Still not conclusive but at least getting somewhere.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/12] mm: remove throttle_vm_writeback

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 19:42:18 +0200
[EMAIL PROTECTED] wrote:

> rely on accurate dirty page accounting to provide enough push back

I think we'd like to see a bit more justification than that, please.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 09/12] mm: remove throttle_vm_writeback

2007-04-05 Thread root
rely on accurate dirty page accounting to provide enough push back

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---

Index: linux-2.6-mm/include/linux/writeback.h
===
--- linux-2.6-mm.orig/include/linux/writeback.h 2007-04-05 13:23:51.0 
+0200
+++ linux-2.6-mm/include/linux/writeback.h  2007-04-05 13:24:11.0 
+0200
@@ -84,7 +84,6 @@ static inline void wait_on_inode(struct 
 int wakeup_pdflush(long nr_pages);
 void laptop_io_completion(void);
 void laptop_sync_completion(void);
-void throttle_vm_writeout(gfp_t gfp_mask);
 
 extern struct timer_list laptop_mode_wb_timer;
 static inline int laptop_spinned_down(void)
Index: linux-2.6-mm/mm/page-writeback.c
===
--- linux-2.6-mm.orig/mm/page-writeback.c   2007-04-05 13:23:51.0 
+0200
+++ linux-2.6-mm/mm/page-writeback.c2007-04-05 13:24:38.0 +0200
@@ -437,37 +437,6 @@ void balance_dirty_pages_ratelimited_nr(
 }
 EXPORT_SYMBOL(balance_dirty_pages_ratelimited_nr);
 
-void throttle_vm_writeout(gfp_t gfp_mask)
-{
-   long background_thresh;
-   long dirty_thresh;
-
-   if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {
-   /*
-* The caller might hold locks which can prevent IO completion
-* or progress in the filesystem.  So we cannot just sit here
-* waiting for IO to complete.
-*/
-   congestion_wait(WRITE, HZ/10);
-   return;
-   }
-
-for ( ; ; ) {
-   get_dirty_limits(_thresh, _thresh, NULL, NULL);
-
-/*
- * Boost the allowable dirty threshold a bit for page
- * allocators so they don't get DoS'ed by heavy writers
- */
-dirty_thresh += dirty_thresh / 10;  /* wh... */
-
-if (global_page_state(NR_UNSTABLE_NFS) +
-   global_page_state(NR_WRITEBACK) <= dirty_thresh)
-   break;
-congestion_wait(WRITE, HZ/10);
-}
-}
-
 /*
  * writeback at least _min_pages, and keep writing until the amount of dirty
  * memory is less than the background threshold, or until we're all clean.
Index: linux-2.6-mm/mm/vmscan.c
===
--- linux-2.6-mm.orig/mm/vmscan.c   2007-04-03 12:17:57.0 +0200
+++ linux-2.6-mm/mm/vmscan.c2007-04-05 13:24:03.0 +0200
@@ -1047,8 +1047,6 @@ static unsigned long shrink_zone(int pri
}
}
 
-   throttle_vm_writeout(sc->gfp_mask);
-
atomic_dec(>reclaim_in_progress);
return nr_reclaimed;
 }

--

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 09/12] mm: remove throttle_vm_writeback

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 19:42:18 +0200
[EMAIL PROTECTED] wrote:

 rely on accurate dirty page accounting to provide enough push back

I think we'd like to see a bit more justification than that, please.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 09/12] mm: remove throttle_vm_writeback

2007-04-05 Thread root
rely on accurate dirty page accounting to provide enough push back

Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
---

Index: linux-2.6-mm/include/linux/writeback.h
===
--- linux-2.6-mm.orig/include/linux/writeback.h 2007-04-05 13:23:51.0 
+0200
+++ linux-2.6-mm/include/linux/writeback.h  2007-04-05 13:24:11.0 
+0200
@@ -84,7 +84,6 @@ static inline void wait_on_inode(struct 
 int wakeup_pdflush(long nr_pages);
 void laptop_io_completion(void);
 void laptop_sync_completion(void);
-void throttle_vm_writeout(gfp_t gfp_mask);
 
 extern struct timer_list laptop_mode_wb_timer;
 static inline int laptop_spinned_down(void)
Index: linux-2.6-mm/mm/page-writeback.c
===
--- linux-2.6-mm.orig/mm/page-writeback.c   2007-04-05 13:23:51.0 
+0200
+++ linux-2.6-mm/mm/page-writeback.c2007-04-05 13:24:38.0 +0200
@@ -437,37 +437,6 @@ void balance_dirty_pages_ratelimited_nr(
 }
 EXPORT_SYMBOL(balance_dirty_pages_ratelimited_nr);
 
-void throttle_vm_writeout(gfp_t gfp_mask)
-{
-   long background_thresh;
-   long dirty_thresh;
-
-   if ((gfp_mask  (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {
-   /*
-* The caller might hold locks which can prevent IO completion
-* or progress in the filesystem.  So we cannot just sit here
-* waiting for IO to complete.
-*/
-   congestion_wait(WRITE, HZ/10);
-   return;
-   }
-
-for ( ; ; ) {
-   get_dirty_limits(background_thresh, dirty_thresh, NULL, NULL);
-
-/*
- * Boost the allowable dirty threshold a bit for page
- * allocators so they don't get DoS'ed by heavy writers
- */
-dirty_thresh += dirty_thresh / 10;  /* wh... */
-
-if (global_page_state(NR_UNSTABLE_NFS) +
-   global_page_state(NR_WRITEBACK) = dirty_thresh)
-   break;
-congestion_wait(WRITE, HZ/10);
-}
-}
-
 /*
  * writeback at least _min_pages, and keep writing until the amount of dirty
  * memory is less than the background threshold, or until we're all clean.
Index: linux-2.6-mm/mm/vmscan.c
===
--- linux-2.6-mm.orig/mm/vmscan.c   2007-04-03 12:17:57.0 +0200
+++ linux-2.6-mm/mm/vmscan.c2007-04-05 13:24:03.0 +0200
@@ -1047,8 +1047,6 @@ static unsigned long shrink_zone(int pri
}
}
 
-   throttle_vm_writeout(sc-gfp_mask);
-
atomic_dec(zone-reclaim_in_progress);
return nr_reclaimed;
 }

--

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/