Re: [PATCH 7/7] readahead: basic support of interleaved reads

2007-07-21 Thread Fengguang Wu
On Sat, Jul 21, 2007 at 05:24:39PM +1000, Rusty Russell wrote:
>   This is really clever!

Thank you :)

>   Only one slight complaint: I wonder if "radix_tree_scan_hole" could be
> expressed as "radix_tree_extent_size" which return the number of

In fact the full context based readahead requires three functions:

- radix_tree_scan_hole(root, index, max_scan)   

- radix_tree_scan_hole_backward(root, index, max_scan)  

- radix_tree_scan_data_backward(root, index, max_scan)  


I would be very glad if you can work up a consistent naming scheme
for them all ;)

> populated indices up to "max_scan".  Returning a length seems more
> intuitive to me (and I think gets rid of the wraparound error case?)

No, changing the return value won't eliminate the wraparound case.

But I guess we can safely assume that it won't wrap around at all?
The last page in the pagecache address space will not be
granted/allocated thanks to the not-too-aggressive MAX_LFS_FILESIZE?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/7] readahead: basic support of interleaved reads

2007-07-21 Thread Rusty Russell
On Sat, 2007-07-21 at 12:43 +0800, Fengguang Wu wrote:
> plain text document attachment (readahead-interleaved-reads.patch)
> This is a simplified version of the pagecache context based readahead.
> It handles the case of multiple threads reading on the same fd and 
> invalidating
> each others' readahead state. It does the trick by scanning the pagecache and
> recovering the current read stream's readahead status.
> 
> The algorithm works in a opportunistic way, in that it do not try to detect
> interleaved reads _actively_, which requires a probe into the page cache(which
> means a little more overheads for random reads). It only tries to handle a
> previously started sequential readahead whose state was overwritten by
> another concurrent stream, and it can do this job pretty well.

Hi Fengguang,

This is really clever!

Only one slight complaint: I wonder if "radix_tree_scan_hole" could be
expressed as "radix_tree_extent_size" which return the number of
populated indices up to "max_scan".  Returning a length seems more
intuitive to me (and I think gets rid of the wraparound error case?)

Cheers,
Rusty.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/7] readahead: basic support of interleaved reads

2007-07-21 Thread Rusty Russell
On Sat, 2007-07-21 at 12:43 +0800, Fengguang Wu wrote:
 plain text document attachment (readahead-interleaved-reads.patch)
 This is a simplified version of the pagecache context based readahead.
 It handles the case of multiple threads reading on the same fd and 
 invalidating
 each others' readahead state. It does the trick by scanning the pagecache and
 recovering the current read stream's readahead status.
 
 The algorithm works in a opportunistic way, in that it do not try to detect
 interleaved reads _actively_, which requires a probe into the page cache(which
 means a little more overheads for random reads). It only tries to handle a
 previously started sequential readahead whose state was overwritten by
 another concurrent stream, and it can do this job pretty well.

Hi Fengguang,

This is really clever!

Only one slight complaint: I wonder if radix_tree_scan_hole could be
expressed as radix_tree_extent_size which return the number of
populated indices up to max_scan.  Returning a length seems more
intuitive to me (and I think gets rid of the wraparound error case?)

Cheers,
Rusty.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/7] readahead: basic support of interleaved reads

2007-07-21 Thread Fengguang Wu
On Sat, Jul 21, 2007 at 05:24:39PM +1000, Rusty Russell wrote:
   This is really clever!

Thank you :)

   Only one slight complaint: I wonder if radix_tree_scan_hole could be
 expressed as radix_tree_extent_size which return the number of

In fact the full context based readahead requires three functions:

- radix_tree_scan_hole(root, index, max_scan)   

- radix_tree_scan_hole_backward(root, index, max_scan)  

- radix_tree_scan_data_backward(root, index, max_scan)  


I would be very glad if you can work up a consistent naming scheme
for them all ;)

 populated indices up to max_scan.  Returning a length seems more
 intuitive to me (and I think gets rid of the wraparound error case?)

No, changing the return value won't eliminate the wraparound case.

But I guess we can safely assume that it won't wrap around at all?
The last page in the pagecache address space will not be
granted/allocated thanks to the not-too-aggressive MAX_LFS_FILESIZE?

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 7/7] readahead: basic support of interleaved reads

2007-07-20 Thread Fengguang Wu
This is a simplified version of the pagecache context based readahead.
It handles the case of multiple threads reading on the same fd and invalidating
each others' readahead state. It does the trick by scanning the pagecache and
recovering the current read stream's readahead status.

The algorithm works in a opportunistic way, in that it do not try to detect
interleaved reads _actively_, which requires a probe into the page cache(which
means a little more overheads for random reads). It only tries to handle a
previously started sequential readahead whose state was overwritten by
another concurrent stream, and it can do this job pretty well.

Negative and positive examples(or what you can expect from it):

1) it cannot detect and serve perfect request-by-request interleaved reads
   right:
timestream 1  stream 2
0   1 
1 1001
2   2
3 1002
4   3
5 1003
6   4
7 1004
8   5
9 1005
Here no single readahead will be carried out.

2) However, if it's two concurrent reads by two threads, the chance of the
   initial sequential readahead be started is huge. Once the first sequential
   readahead is started for a stream, this patch will ensure that the readahead
   window continues to rampup and won't be disturbed by other streams.

timestream 1  stream 2
0   1 
1   2
2 1001
3   3
4 1002
5 1003
6   4
7   5
8 1004
9   6
101005
11  7
121006
131007
Here steam 1 will start a readahead at page 2, and stream 2 will start its
first readahead at page 1003. From then on the two streams will be served right.

Cc: Rusty Russell <[EMAIL PROTECTED]>
Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 mm/readahead.c |   33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc6-mm1/mm/readahead.c
@@ -363,6 +363,29 @@ ondemand_readahead(struct address_space 
}
 
/*
+* Hit a marked page without valid readahead state.
+* E.g. interleaved reads.
+* Query the pagecache for async_size, which normally equals to
+* readahead size. Ramp it up and use it as the new readahead size.
+*/
+   if (hit_readahead_marker) {
+   pgoff_t start;
+
+   read_lock_irq(>tree_lock);
+   start = radix_tree_scan_hole(>page_tree, offset, 
max+1);
+   read_unlock_irq(>tree_lock);
+
+   if (!start || start - offset > max)
+   return 0;
+
+   ra->start = start;
+   ra->size = start - offset;  /* old async_size */
+   ra->size = get_next_ra_size(ra, max);
+   ra->async_size = ra->size;
+   goto readit;
+   }
+
+   /*
 * It may be one of
 *  - first read on start of file
 *  - sequential cache miss
@@ -373,16 +396,6 @@ ondemand_readahead(struct address_space 
ra->size = get_init_ra_size(req_size, max);
ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
 
-   /*
-* Hit on a marked page without valid readahead state.
-* E.g. interleaved reads.
-* Not knowing its readahead pos/size, bet on the minimal possible one.
-*/
-   if (hit_readahead_marker) {
-   ra->start++;
-   ra->size = get_next_ra_size(ra, max);
-   }
-
 readit:
return ra_submit(ra, mapping, filp);
 }

--
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 7/7] readahead: basic support of interleaved reads

2007-07-20 Thread Fengguang Wu
This is a simplified version of the pagecache context based readahead.
It handles the case of multiple threads reading on the same fd and invalidating
each others' readahead state. It does the trick by scanning the pagecache and
recovering the current read stream's readahead status.

The algorithm works in a opportunistic way, in that it do not try to detect
interleaved reads _actively_, which requires a probe into the page cache(which
means a little more overheads for random reads). It only tries to handle a
previously started sequential readahead whose state was overwritten by
another concurrent stream, and it can do this job pretty well.

Negative and positive examples(or what you can expect from it):

1) it cannot detect and serve perfect request-by-request interleaved reads
   right:
timestream 1  stream 2
0   1 
1 1001
2   2
3 1002
4   3
5 1003
6   4
7 1004
8   5
9 1005
Here no single readahead will be carried out.

2) However, if it's two concurrent reads by two threads, the chance of the
   initial sequential readahead be started is huge. Once the first sequential
   readahead is started for a stream, this patch will ensure that the readahead
   window continues to rampup and won't be disturbed by other streams.

timestream 1  stream 2
0   1 
1   2
2 1001
3   3
4 1002
5 1003
6   4
7   5
8 1004
9   6
101005
11  7
121006
131007
Here steam 1 will start a readahead at page 2, and stream 2 will start its
first readahead at page 1003. From then on the two streams will be served right.

Cc: Rusty Russell [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 mm/readahead.c |   33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc6-mm1/mm/readahead.c
@@ -363,6 +363,29 @@ ondemand_readahead(struct address_space 
}
 
/*
+* Hit a marked page without valid readahead state.
+* E.g. interleaved reads.
+* Query the pagecache for async_size, which normally equals to
+* readahead size. Ramp it up and use it as the new readahead size.
+*/
+   if (hit_readahead_marker) {
+   pgoff_t start;
+
+   read_lock_irq(mapping-tree_lock);
+   start = radix_tree_scan_hole(mapping-page_tree, offset, 
max+1);
+   read_unlock_irq(mapping-tree_lock);
+
+   if (!start || start - offset  max)
+   return 0;
+
+   ra-start = start;
+   ra-size = start - offset;  /* old async_size */
+   ra-size = get_next_ra_size(ra, max);
+   ra-async_size = ra-size;
+   goto readit;
+   }
+
+   /*
 * It may be one of
 *  - first read on start of file
 *  - sequential cache miss
@@ -373,16 +396,6 @@ ondemand_readahead(struct address_space 
ra-size = get_init_ra_size(req_size, max);
ra-async_size = ra-size  req_size ? ra-size - req_size : ra-size;
 
-   /*
-* Hit on a marked page without valid readahead state.
-* E.g. interleaved reads.
-* Not knowing its readahead pos/size, bet on the minimal possible one.
-*/
-   if (hit_readahead_marker) {
-   ra-start++;
-   ra-size = get_next_ra_size(ra, max);
-   }
-
 readit:
return ra_submit(ra, mapping, filp);
 }

--
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/