Re: [PATCH] fix page_alloc for larger I/O segments (improved)

2007-12-20 Matthew Wilcox
On Fri, Dec 14, 2007 at 11:13:40AM -0700, Matthew Wilcox wrote:
> I'll send it to our DB team to see if this improves our numbers at all.

It does, by approximately 0.67%.  This is about double the margin of
error, and a significant improvement.  Thanks!

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix page_alloc for larger I/O segments (improved)

2007-12-16 Mel Gorman
On (14/12/07 13:07), Mark Lord didst pronounce:
> 
> 
> That (also) works for me here, regularly generating 64KB I/O segments with 
> SLAB.
> 

Brilliant. Thanks a lot Mark.

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


Re: [PATCH] fix page_alloc for larger I/O segments (improved)

2007-12-14 Mark Lord

Matthew Wilcox wrote:
> On Fri, Dec 14, 2007 at 05:42:37PM +, Mel Gorman wrote:
> > Regrettably this interferes with anti-fragmentation because the "next" page
> > on the list on return from rmqueue_bulk is not guaranteed to be of the right
> > mobility type. I fixed it as an additional patch but it adds additional cost
> > that should not be necessary and it's visible in microbenchmark results on
> > at least one machine.
> 
> Is this patch to be preferred to the one Andrew Morton posted to do
> list_for_each_entry_reverse?

Mel's patch replaces my earlier patch, which Andrew already has:

-		list_add(&page->lru, list);
+		list_add_tail(&page->lru, list);

That one, in turn, replaced the even earlier list_for_each_entry_reverse patch.

-ml


Re: [PATCH] fix page_alloc for larger I/O segments (improved)

2007-12-14 Matthew Wilcox
On Fri, Dec 14, 2007 at 05:42:37PM +, Mel Gorman wrote:
> Regrettably this interferes with anti-fragmentation because the "next" page
> on the list on return from rmqueue_bulk is not guaranteed to be of the right
> mobility type. I fixed it as an additional patch but it adds additional cost
> that should not be necessary and it's visible in microbenchmark results on
> at least one machine.

Is this patch to be preferred to the one Andrew Morton posted to do
list_for_each_entry_reverse?

I'll send it to our DB team to see if this improves our numbers at all.



Re: [PATCH] fix page_alloc for larger I/O segments (improved)

2007-12-14 Mark Lord

Mel Gorman wrote:
> The following patch should fix the page ordering problem without incurring an
> additional cost or interfering with anti-fragmentation. However, I haven't
> anything in place yet to verify that the physical page ordering is correct
> but it makes sense. Can you verify it fixes the problem please?

That (also) works for me here, regularly generating 64KB I/O segments with SLAB.
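For scale: assuming 4 KiB pages (page size is architecture-dependent), a 64KB segment corresponds to 16 physically consecutive pages. The sketch below is a simplified model, not the block layer's actual merge logic: the hypothetical helper largest_segment_bytes() lets a page extend the current segment only when its frame number is exactly one past the previous page's, and it ignores real limits such as a queue's maximum segment size.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096UL   /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: walk page frame numbers in submission order and
 * return the size in bytes of the longest physically contiguous run.
 * A page extends the current run only if its frame number is exactly one
 * past the previous page's (a simplified adjacency rule).
 */
static unsigned long largest_segment_bytes(const unsigned long *pfn, size_t n)
{
	unsigned long best = 0, run = 0;

	assert(pfn != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (i > 0 && pfn[i] == pfn[i - 1] + 1)
			run++;          /* physically adjacent: grow the segment */
		else
			run = 1;        /* not adjacent: start a new segment */
		if (run > best)
			best = run;
	}
	return best * PAGE_SIZE_BYTES;
}
```

Feeding it 16 ascending frame numbers yields 65536 bytes; the same frames in descending order yield 4096, because no page directly follows its predecessor.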

Cheers


Re: [PATCH] fix page_alloc for larger I/O segments (improved)

2007-12-14 Mel Gorman
On (13/12/07 19:46), Mark Lord didst pronounce:
> 
> "Improved version", more similar to the 2.6.23 code:
> 
> Fix page allocator to give better chance of larger contiguous segments 
> (again).
> 
> Signed-off-by: Mark Lord <[EMAIL PROTECTED]>

Regrettably this interferes with anti-fragmentation because the "next" page
on the list on return from rmqueue_bulk is not guaranteed to be of the right
mobility type. I fixed it as an additional patch but it adds additional cost
that should not be necessary and it's visible in microbenchmark results on
at least one machine.
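A simplified user-space model of this concern, under two stated assumptions: the caller that triggered the refill takes whatever sits at list->next afterwards, and the per-CPU list may still hold leftover pages of another migrate type. MT_UNMOVABLE/MT_MOVABLE, struct fake_page and refill_tail_then_peek() are hypothetical stand-ins, and list_add_tail() is re-implemented here only to mirror the kernel helper's semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

/* Hypothetical stand-in for struct page: only what this model needs. */
struct fake_page {
	int migratetype;        /* models set_page_private(page, migratetype) */
	struct list_head lru;
};

#define page_of(ptr) \
	((struct fake_page *)((char *)(ptr) - offsetof(struct fake_page, lru)))

enum { MT_UNMOVABLE, MT_MOVABLE };  /* hypothetical migrate types */

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Mirrors list_add_tail(): insert 'n' just before 'head', i.e. at the end. */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Append 'count' pages tagged 'mt' at the tail, as the list_add_tail()
 * version of rmqueue_bulk() would, then report the migrate type of the
 * first entry, which is what a caller taking list->next would receive.
 */
static int refill_tail_then_peek(struct list_head *list,
				 struct fake_page *pages, size_t count, int mt)
{
	for (size_t i = 0; i < count; i++) {
		pages[i].migratetype = mt;
		list_add_tail(&pages[i].lru, list);
	}
	assert(list->next != list);
	return page_of(list->next)->migratetype;
}
```

Refilling with movable pages via list_add_tail() leaves a stale unmovable page at the front, so in this model the caller would get a page of the wrong mobility type unless it searched the list again.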

The following patch should fix the page ordering problem without incurring an
additional cost or interfering with anti-fragmentation. However, I haven't
anything in place yet to verify that the physical page ordering is correct
but it makes sense. Can you verify it fixes the problem please?

It'll still be some time before I have a full set of performance results,
but initially at least, this fix seems to avoid any performance impact.

==
Subject: Fix page allocation for larger I/O segments

In some cases the IO subsystem is able to merge requests if the pages are
adjacent in physical memory. This was achieved in the allocator by having
expand() return pages in physically contiguous order in situations where
a large buddy was split. However, list-based anti-fragmentation changed
the order in which pages were returned, to avoid searching in
buffered_rmqueue() for a page of the appropriate migrate type.

This patch restores the behaviour of rmqueue_bulk() of preserving the physical
order of pages returned by the allocator, without incurring increased search
costs for anti-fragmentation.

Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
--- 
 page_alloc.c |   11 +++
 1 file changed, 11 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc5-clean/mm/page_alloc.c linux-2.6.24-rc5-giveback-physorder-listmove/mm/page_alloc.c
--- linux-2.6.24-rc5-clean/mm/page_alloc.c	2007-12-14 11:55:13.0 +
+++ linux-2.6.24-rc5-giveback-physorder-listmove/mm/page_alloc.c	2007-12-14 15:33:12.0 +
@@ -847,8 +847,19 @@ static int rmqueue_bulk(struct zone *zon
 		struct page *page = __rmqueue(zone, order, migratetype);
 		if (unlikely(page == NULL))
 			break;
+
+		/*
+		 * Split buddy pages returned by expand() are received here
+		 * in physical page order. The page is added to the callers and
+		 * list and the list head then moves forward. From the callers
+		 * perspective, the linked list is ordered by page number in
+		 * some conditions. This is useful for IO devices that can
+		 * merge IO requests if the physical pages are ordered
+		 * properly.
+		 */
 		list_add(&page->lru, list);
 		set_page_private(page, migratetype);
+		list = &page->lru;
 	}
 	spin_unlock(&zone->lock);
 	return i;
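The effect of the added `list = &page->lru;` line can be checked with a small user-space model: each page is inserted after the previously inserted one, so a batch arriving in ascending order stays ascending and lands in front of any older entries. The names struct fake_page, bulk_add_in_order() and to_array() are hypothetical; list_add() is re-implemented to mirror the kernel helper's semantics, and the model assumes, as the patch comment states, that pages arrive in physical order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

struct fake_page {
	unsigned long pfn;      /* physical frame number */
	struct list_head lru;
};

#define page_of(ptr) \
	((struct fake_page *)((char *)(ptr) - offsetof(struct fake_page, lru)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Mirrors list_add(): insert 'n' just after 'head'. */
static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/*
 * Model of the patched rmqueue_bulk() loop: pages arrive in ascending pfn
 * order and each one is inserted after the previously inserted page.
 */
static void bulk_add_in_order(struct list_head *list, struct fake_page *pages,
			      size_t count, unsigned long first_pfn)
{
	for (size_t i = 0; i < count; i++) {
		pages[i].pfn = first_pfn + i;
		list_add(&pages[i].lru, list);
		list = &pages[i].lru;   /* move the insertion point forward */
	}
}

/* Copy pfns in list order (head->next first) into out[]; return count. */
static size_t to_array(const struct list_head *head, unsigned long *out,
		       size_t max)
{
	size_t n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next) {
		assert(n < max);
		out[n++] = page_of(p)->pfn;
	}
	return n;
}
```

After queueing frames 10 and 11 and then a second batch of 200 to 202, the list reads 200, 201, 202, 10, 11: the fresh batch is in physical order and sits at the head, where a caller reading list->next would find it.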


Re: [PATCH] fix page_alloc for larger I/O segments (improved)

2007-12-13 Mark Lord

Andrew Morton wrote:
> On Thu, 13 Dec 2007 19:57:29 -0500
> James Bottomley <[EMAIL PROTECTED]> wrote:
> 
> > On Thu, 2007-12-13 at 19:46 -0500, Mark Lord wrote:
> > > "Improved version", more similar to the 2.6.23 code:
> > > 
> > > Fix page allocator to give better chance of larger contiguous segments (again).
> > > 
> > > Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
> > > ---
> > > 
> > > --- old/mm/page_alloc.c 2007-12-13 19:25:15.0 -0500
> > > +++ linux-2.6/mm/page_alloc.c   2007-12-13 19:43:07.0 -0500
> > > @@ -760,7 +760,7 @@
> > >  		struct page *page = __rmqueue(zone, order, migratetype);
> > >  		if (unlikely(page == NULL))
> > >  			break;
> > > -		list_add(&page->lru, list);
> > > +		list_add_tail(&page->lru, list);
> > 
> > Could we put a big comment above this explaining to the would be vm
> > tweakers why this has to be a list_add_tail, so we don't end up back in
> > this position after another two years?
> 
> Already done ;)

I thought of the comment as I rushed off for dinner.
Thanks, Andrew!




Re: [PATCH] fix page_alloc for larger I/O segments (improved)

2007-12-13 Andrew Morton
On Thu, 13 Dec 2007 19:57:29 -0500
James Bottomley <[EMAIL PROTECTED]> wrote:

> 
> On Thu, 2007-12-13 at 19:46 -0500, Mark Lord wrote:
> > "Improved version", more similar to the 2.6.23 code:
> > 
> > Fix page allocator to give better chance of larger contiguous segments 
> > (again).
> > 
> > Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
> > ---
> > 
> > --- old/mm/page_alloc.c 2007-12-13 19:25:15.0 -0500
> > +++ linux-2.6/mm/page_alloc.c   2007-12-13 19:43:07.0 -0500
> > @@ -760,7 +760,7 @@
> >  		struct page *page = __rmqueue(zone, order, migratetype);
> >  		if (unlikely(page == NULL))
> >  			break;
> > -		list_add(&page->lru, list);
> > +		list_add_tail(&page->lru, list);
> 
> Could we put a big comment above this explaining to the would be vm
> tweakers why this has to be a list_add_tail, so we don't end up back in
> this position after another two years?
> 

Already done ;)

--- a/mm/page_alloc.c~fix-page_alloc-for-larger-i-o-segments-fix
+++ a/mm/page_alloc.c
@@ -847,6 +847,10 @@ static int rmqueue_bulk(struct zone *zon
 		struct page *page = __rmqueue(zone, order, migratetype);
 		if (unlikely(page == NULL))
 			break;
+		/*
+		 * Doing a list_add_tail() here helps us to hand out pages in
+		 * ascending physical-address order.
+		 */
 		list_add_tail(&page->lru, list);
 		set_page_private(page, migratetype);
 	}
_



Re: [PATCH] fix page_alloc for larger I/O segments (improved)

2007-12-13 James Bottomley

On Thu, 2007-12-13 at 19:46 -0500, Mark Lord wrote:
> "Improved version", more similar to the 2.6.23 code:
> 
> Fix page allocator to give better chance of larger contiguous segments 
> (again).
> 
> Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
> ---
> 
> --- old/mm/page_alloc.c   2007-12-13 19:25:15.0 -0500
> +++ linux-2.6/mm/page_alloc.c 2007-12-13 19:43:07.0 -0500
> @@ -760,7 +760,7 @@
>  		struct page *page = __rmqueue(zone, order, migratetype);
>  		if (unlikely(page == NULL))
>  			break;
> -		list_add(&page->lru, list);
> +		list_add_tail(&page->lru, list);

Could we put a big comment above this explaining to the would be vm
tweakers why this has to be a list_add_tail, so we don't end up back in
this position after another two years?

James




[PATCH] fix page_alloc for larger I/O segments (improved)

2007-12-13 Mark Lord


"Improved version", more similar to the 2.6.23 code:

Fix page allocator to give better chance of larger contiguous segments (again).

Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
---

--- old/mm/page_alloc.c 2007-12-13 19:25:15.0 -0500
+++ linux-2.6/mm/page_alloc.c   2007-12-13 19:43:07.0 -0500
@@ -760,7 +760,7 @@
 		struct page *page = __rmqueue(zone, order, migratetype);
 		if (unlikely(page == NULL))
 			break;
-		list_add(&page->lru, list);
+		list_add_tail(&page->lru, list);
 		set_page_private(page, migratetype);
 	}
 	spin_unlock(&zone->lock);
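A minimal user-space sketch of what this one-line change does to ordering, assuming (as the comments elsewhere in this thread state) that __rmqueue() returns the pages of a split buddy in ascending physical order. struct fake_page, bulk_queue() and to_array() are hypothetical; list_add() and list_add_tail() are re-implemented here only to mirror the documented behaviour of the kernel helpers (insert after the head vs. before it).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

struct fake_page {
	unsigned long pfn;      /* physical frame number */
	struct list_head lru;
};

#define page_of(ptr) \
	((struct fake_page *)((char *)(ptr) - offsetof(struct fake_page, lru)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Link 'n' between two adjacent entries 'prev' and 'next'. */
static void link_between(struct list_head *n, struct list_head *prev,
			 struct list_head *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Mirrors list_add(): insert right after the head (stack order). */
static void list_add(struct list_head *n, struct list_head *head)
{
	link_between(n, head, head->next);
}

/* Mirrors list_add_tail(): insert right before the head (queue order). */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
	link_between(n, head->prev, head);
}

/* Queue 'count' pages that arrive in ascending pfn order (100, 101, ...). */
static void bulk_queue(struct list_head *list, struct fake_page *pages,
		       size_t count, int use_tail)
{
	for (size_t i = 0; i < count; i++) {
		pages[i].pfn = 100 + i;
		if (use_tail)
			list_add_tail(&pages[i].lru, list);
		else
			list_add(&pages[i].lru, list);
	}
}

/* Copy pfns in list order (head->next first) into out[]; return count. */
static size_t to_array(const struct list_head *head, unsigned long *out,
		       size_t max)
{
	size_t n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next) {
		assert(n < max);
		out[n++] = page_of(p)->pfn;
	}
	return n;
}
```

With list_add() the batch reads 103, 102, 101, 100 (each page pushed in front of the last); with list_add_tail() it reads 100 to 103, the ascending order that lets adjacent pages be merged into larger I/O segments.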