Re: [PATCH] fix page_alloc for larger I/O segments (improved)
On Fri, Dec 14, 2007 at 11:13:40AM -0700, Matthew Wilcox wrote:
> I'll send it to our DB team to see if this improves our numbers at all.

It does, by approximately 0.67%. This is about double the margin of
error, and a significant improvement. Thanks!

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix page_alloc for larger I/O segments (improved)
On (14/12/07 13:07), Mark Lord didst pronounce:
>
> That (also) works for me here, regularly generating 64KB I/O segments with
> SLAB.
>

Brilliant. Thanks a lot Mark.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
Re: [PATCH] fix page_alloc for larger I/O segments (improved)
Matthew Wilcox wrote:
> On Fri, Dec 14, 2007 at 05:42:37PM +, Mel Gorman wrote:
>> Regrettably this interferes with anti-fragmentation because the "next" page
>> on the list on return from rmqueue_bulk is not guaranteed to be of the right
>> mobility type. I fixed it as an additional patch but it adds additional cost
>> that should not be necessary and it's visible in microbenchmark results on
>> at least one machine.
>
> Is this patch to be preferred to the one Andrew Morton posted to do
> list_for_each_entry_reverse?
..

This patch replaces my earlier patch that Andrew has:

-		list_add(&page->lru, list);
+		list_add_tail(&page->lru, list);

Which, in turn, replaced the even-earlier list_for_each_entry_reverse patch.

-ml
Re: [PATCH] fix page_alloc for larger I/O segments (improved)
On Fri, Dec 14, 2007 at 05:42:37PM +, Mel Gorman wrote:
> Regrettably this interferes with anti-fragmentation because the "next" page
> on the list on return from rmqueue_bulk is not guaranteed to be of the right
> mobility type. I fixed it as an additional patch but it adds additional cost
> that should not be necessary and it's visible in microbenchmark results on
> at least one machine.

Is this patch to be preferred to the one Andrew Morton posted to do
list_for_each_entry_reverse?

I'll send it to our DB team to see if this improves our numbers at all.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
Re: [PATCH] fix page_alloc for larger I/O segments (improved)
Mel Gorman wrote:
> On (13/12/07 19:46), Mark Lord didst pronounce:
>> "Improved version", more similar to the 2.6.23 code:
>>
>> Fix page allocator to give better chance of larger contiguous segments
>> (again).
>>
>> Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
>
> Regrettably this interferes with anti-fragmentation because the "next" page
> on the list on return from rmqueue_bulk is not guaranteed to be of the right
> mobility type. I fixed it as an additional patch but it adds additional cost
> that should not be necessary and it's visible in microbenchmark results on
> at least one machine.
>
> The following patch should fix the page ordering problem without incurring an
> additional cost or interfering with anti-fragmentation. However, I haven't
> anything in place yet to verify that the physical page ordering is correct
> but it makes sense. Can you verify it fixes the problem please?
>
> It'll still be some time before I have a full set of performance results but
> initially at least, this fix seems to avoid any impact.
>
> ==
> Subject: Fix page allocation for larger I/O segments
>
> In some cases the IO subsystem is able to merge requests if the pages are
> adjacent in physical memory. This was achieved in the allocator by having
> expand() return pages in physically contiguous order in situations where
> a large buddy was split. However, list-based anti-fragmentation changed the
> order pages were returned in to avoid searching in buffered_rmqueue() for a
> page of the appropriate migrate type.
>
> This patch restores behaviour of rmqueue_bulk() preserving the physical order
> of pages returned by the allocator without incurring increased search costs
> for anti-fragmentation.
>
> Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
> ---
>  page_alloc.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc5-clean/mm/page_alloc.c linux-2.6.24-rc5-giveback-physorder-listmove/mm/page_alloc.c
> --- linux-2.6.24-rc5-clean/mm/page_alloc.c	2007-12-14 11:55:13.0 +
> +++ linux-2.6.24-rc5-giveback-physorder-listmove/mm/page_alloc.c	2007-12-14 15:33:12.0 +
> @@ -847,8 +847,19 @@ static int rmqueue_bulk(struct zone *zon
>  		struct page *page = __rmqueue(zone, order, migratetype);
>  		if (unlikely(page == NULL))
>  			break;
> +
> +		/*
> +		 * Split buddy pages returned by expand() are received here
> +		 * in physical page order. The page is added to the callers and
> +		 * list and the list head then moves forward. From the callers
> +		 * perspective, the linked list is ordered by page number in
> +		 * some conditions. This is useful for IO devices that can
> +		 * merge IO requests if the physical pages are ordered
> +		 * properly.
> +		 */
>  		list_add(&page->lru, list);
>  		set_page_private(page, migratetype);
> +		list = &page->lru;
>  	}
>  	spin_unlock(&zone->lock);
>  	return i;
..

That (also) works for me here, regularly generating 64KB I/O segments with SLAB.

Cheers
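To make the "larger I/O segments" effect concrete, here is a toy userspace model, not the block layer's real merging code. It assumes a segment can only grow when the next page's frame number is exactly one higher than the previous one, and that a segment is capped at a fixed number of pages. The function name count_segments() and the cap parameter are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (an assumption, not kernel code): walk page frame numbers in
 * the order the allocator handed them out and count how many I/O segments
 * result if a page can only extend the current segment when it directly
 * follows the previous page in physical memory.
 */
static size_t count_segments(const unsigned long *pfns, size_t n,
			     size_t max_pages_per_seg)
{
	size_t segs = 0;	/* segments started so far */
	size_t run = 0;		/* pages in the current segment */
	size_t i;

	assert(max_pages_per_seg > 0);

	for (i = 0; i < n; i++) {
		if (i > 0 && pfns[i] == pfns[i - 1] + 1 &&
		    run < max_pages_per_seg) {
			run++;		/* physically adjacent: merge */
		} else {
			segs++;		/* gap, wrong order, or cap hit */
			run = 1;
		}
	}
	return segs;
}
```

Under this model, 16 pages handed out in ascending order with a 16-page cap collapse into a single segment, while the same 16 pages in descending order give 16 one-page segments. If pages are 4KB (an assumption; the thread does not state the page size), a 16-page run corresponds to the 64KB segments reported above.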
Re: [PATCH] fix page_alloc for larger I/O segments (improved)
On (13/12/07 19:46), Mark Lord didst pronounce:
>
> "Improved version", more similar to the 2.6.23 code:
>
> Fix page allocator to give better chance of larger contiguous segments
> (again).
>
> Signed-off-by: Mark Lord <[EMAIL PROTECTED]>

Regrettably this interferes with anti-fragmentation because the "next" page
on the list on return from rmqueue_bulk is not guaranteed to be of the right
mobility type. I fixed it as an additional patch but it adds additional cost
that should not be necessary and it's visible in microbenchmark results on
at least one machine.

The following patch should fix the page ordering problem without incurring an
additional cost or interfering with anti-fragmentation. However, I haven't
anything in place yet to verify that the physical page ordering is correct
but it makes sense. Can you verify it fixes the problem please?

It'll still be some time before I have a full set of performance results but
initially at least, this fix seems to avoid any impact.

==
Subject: Fix page allocation for larger I/O segments

In some cases the IO subsystem is able to merge requests if the pages are
adjacent in physical memory. This was achieved in the allocator by having
expand() return pages in physically contiguous order in situations where
a large buddy was split. However, list-based anti-fragmentation changed the
order pages were returned in to avoid searching in buffered_rmqueue() for a
page of the appropriate migrate type.

This patch restores behaviour of rmqueue_bulk() preserving the physical order
of pages returned by the allocator without incurring increased search costs
for anti-fragmentation.

Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
---
 page_alloc.c | 11 +++
 1 file changed, 11 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc5-clean/mm/page_alloc.c linux-2.6.24-rc5-giveback-physorder-listmove/mm/page_alloc.c
--- linux-2.6.24-rc5-clean/mm/page_alloc.c	2007-12-14 11:55:13.0 +
+++ linux-2.6.24-rc5-giveback-physorder-listmove/mm/page_alloc.c	2007-12-14 15:33:12.0 +
@@ -847,8 +847,19 @@ static int rmqueue_bulk(struct zone *zon
 		struct page *page = __rmqueue(zone, order, migratetype);
 		if (unlikely(page == NULL))
 			break;
+
+		/*
+		 * Split buddy pages returned by expand() are received here
+		 * in physical page order. The page is added to the callers and
+		 * list and the list head then moves forward. From the callers
+		 * perspective, the linked list is ordered by page number in
+		 * some conditions. This is useful for IO devices that can
+		 * merge IO requests if the physical pages are ordered
+		 * properly.
+		 */
 		list_add(&page->lru, list);
 		set_page_private(page, migratetype);
+		list = &page->lru;
 	}
 	spin_unlock(&zone->lock);
 	return i;
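The "list head then moves forward" idea in the patch above can be tried outside the kernel. What follows is a minimal userspace sketch: struct list_head and list_add() are re-implemented here as stand-ins with the same insertion semantics as the kernel helpers, and struct fake_page, bulk_add_in_order() and read_pfns() are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Same semantics as the kernel helper: insert 'entry' right after 'head'. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Hypothetical page descriptor: a frame number plus a list link. */
struct fake_page {
	unsigned long pfn;
	struct list_head lru;
};

/*
 * Mimics the patched loop: each page is added after the previous one
 * because the local 'list' pointer is advanced to the page just added.
 */
static void bulk_add_in_order(struct list_head *list,
			      struct fake_page *pages, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		list_add(&pages[i].lru, list);
		list = &pages[i].lru;	/* move the insertion point forward */
	}
}

/* Copy the pfns out in front-to-back list order; returns the count. */
static size_t read_pfns(struct list_head *head, unsigned long *out,
			size_t max)
{
	struct list_head *pos;
	size_t n = 0;

	for (pos = head->next; pos != head; pos = pos->next) {
		struct fake_page *p = (struct fake_page *)
			((char *)pos - offsetof(struct fake_page, lru));
		assert(n < max);
		out[n++] = p->pfn;
	}
	return n;
}
```

If the list already holds one older page and three pages with ascending pfns are then added this way, the new pages end up at the front of the list in ascending order, ahead of the older entry. With list_add_tail() the new pages would instead be queued behind whatever was already on the list, which appears to be the situation the anti-fragmentation concern is about.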
Re: [PATCH] fix page_alloc for larger I/O segments (improved)
Andrew Morton wrote:
> On Thu, 13 Dec 2007 19:57:29 -0500 James Bottomley <[EMAIL PROTECTED]> wrote:
>> On Thu, 2007-12-13 at 19:46 -0500, Mark Lord wrote:
>>> "Improved version", more similar to the 2.6.23 code:
>>>
>>> Fix page allocator to give better chance of larger contiguous segments
>>> (again).
>>>
>>> Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
>>> ---
>>>
>>> --- old/mm/page_alloc.c	2007-12-13 19:25:15.0 -0500
>>> +++ linux-2.6/mm/page_alloc.c	2007-12-13 19:43:07.0 -0500
>>> @@ -760,7 +760,7 @@
>>>  		struct page *page = __rmqueue(zone, order, migratetype);
>>>  		if (unlikely(page == NULL))
>>>  			break;
>>> -		list_add(&page->lru, list);
>>> +		list_add_tail(&page->lru, list);
>>
>> Could we put a big comment above this explaining to the would be vm
>> tweakers why this has to be a list_add_tail, so we don't end up back in
>> this position after another two years?
>
> Already done ;)
..

I thought of the comment as I rushed off for dinner.
Thanks, Andrew!

> --- a/mm/page_alloc.c~fix-page_alloc-for-larger-i-o-segments-fix
> +++ a/mm/page_alloc.c
> @@ -847,6 +847,10 @@ static int rmqueue_bulk(struct zone *zon
>  		struct page *page = __rmqueue(zone, order, migratetype);
>  		if (unlikely(page == NULL))
>  			break;
> +		/*
> +		 * Doing a list_add_tail() here helps us to hand out pages in
> +		 * ascending physical-address order.
> +		 */
>  		list_add_tail(&page->lru, list);
>  		set_page_private(page, migratetype);
>  	}
> _
Re: [PATCH] fix page_alloc for larger I/O segments (improved)
On Thu, 13 Dec 2007 19:57:29 -0500 James Bottomley <[EMAIL PROTECTED]> wrote:
>
> On Thu, 2007-12-13 at 19:46 -0500, Mark Lord wrote:
> > "Improved version", more similar to the 2.6.23 code:
> >
> > Fix page allocator to give better chance of larger contiguous segments
> > (again).
> >
> > Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
> > ---
> >
> > --- old/mm/page_alloc.c	2007-12-13 19:25:15.0 -0500
> > +++ linux-2.6/mm/page_alloc.c	2007-12-13 19:43:07.0 -0500
> > @@ -760,7 +760,7 @@
> >  		struct page *page = __rmqueue(zone, order, migratetype);
> >  		if (unlikely(page == NULL))
> >  			break;
> > -		list_add(&page->lru, list);
> > +		list_add_tail(&page->lru, list);
>
> Could we put a big comment above this explaining to the would be vm
> tweakers why this has to be a list_add_tail, so we don't end up back in
> this position after another two years?
>

Already done ;)

--- a/mm/page_alloc.c~fix-page_alloc-for-larger-i-o-segments-fix
+++ a/mm/page_alloc.c
@@ -847,6 +847,10 @@ static int rmqueue_bulk(struct zone *zon
 		struct page *page = __rmqueue(zone, order, migratetype);
 		if (unlikely(page == NULL))
 			break;
+		/*
+		 * Doing a list_add_tail() here helps us to hand out pages in
+		 * ascending physical-address order.
+		 */
 		list_add_tail(&page->lru, list);
 		set_page_private(page, migratetype);
 	}
_
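For anyone wondering why the one-word change matters, the difference between list_add() and list_add_tail() can be reproduced in userspace. This is only a sketch: the list helpers below are stand-ins written to match the kernel helpers' insertion semantics, and struct fake_page, list_insert() and fill_and_read() are hypothetical names used for the demonstration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link 'entry' between two known neighbours. */
static void list_insert(struct list_head *entry,
			struct list_head *prev, struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

/* Same semantics as the kernel helper: insert right after 'head'. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	list_insert(entry, head, head->next);
}

/* Same semantics as the kernel helper: insert right before 'head'. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	list_insert(entry, head->prev, head);
}

/* Hypothetical page descriptor: just a frame number and a list link. */
struct fake_page {
	unsigned long pfn;
	struct list_head lru;
};

/*
 * Queue n pages with ascending pfns (first_pfn, first_pfn + 1, ...) using
 * either helper, then write the front-to-back list order into out[].
 */
static void fill_and_read(int use_tail, unsigned long first_pfn,
			  struct fake_page *pages, unsigned long *out,
			  size_t n)
{
	struct list_head head, *pos;
	size_t i;

	init_list_head(&head);
	for (i = 0; i < n; i++) {
		pages[i].pfn = first_pfn + i;
		if (use_tail)
			list_add_tail(&pages[i].lru, &head);
		else
			list_add(&pages[i].lru, &head);
	}

	i = 0;
	for (pos = head.next; pos != &head; pos = pos->next) {
		struct fake_page *p = (struct fake_page *)
			((char *)pos - offsetof(struct fake_page, lru));
		assert(i < n);
		out[i++] = p->pfn;
	}
}
```

Feeding in pages in ascending pfn order, list_add() leaves the list reversed (highest pfn first), while list_add_tail() keeps it ascending. A consumer that takes pages from the front of the list therefore sees ascending physical addresses only in the list_add_tail() case.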
Re: [PATCH] fix page_alloc for larger I/O segments (improved)
On Thu, 2007-12-13 at 19:46 -0500, Mark Lord wrote:
> "Improved version", more similar to the 2.6.23 code:
>
> Fix page allocator to give better chance of larger contiguous segments
> (again).
>
> Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
> ---
>
> --- old/mm/page_alloc.c	2007-12-13 19:25:15.0 -0500
> +++ linux-2.6/mm/page_alloc.c	2007-12-13 19:43:07.0 -0500
> @@ -760,7 +760,7 @@
>  		struct page *page = __rmqueue(zone, order, migratetype);
>  		if (unlikely(page == NULL))
>  			break;
> -		list_add(&page->lru, list);
> +		list_add_tail(&page->lru, list);

Could we put a big comment above this explaining to the would be vm
tweakers why this has to be a list_add_tail, so we don't end up back in
this position after another two years?

James
[PATCH] fix page_alloc for larger I/O segments (improved)
"Improved version", more similar to the 2.6.23 code: Fix page allocator to give better chance of larger contiguous segments (again). Signed-off-by: Mark Lord <[EMAIL PROTECTED] --- --- old/mm/page_alloc.c 2007-12-13 19:25:15.0 -0500 +++ linux-2.6/mm/page_alloc.c 2007-12-13 19:43:07.0 -0500 @@ -760,7 +760,7 @@ struct page *page = __rmqueue(zone, order, migratetype); if (unlikely(page == NULL)) break; - list_add(&page->lru, list); + list_add_tail(&page->lru, list); set_page_private(page, migratetype); } spin_unlock(&zone->lock); - To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html