Re: [RFC][PATCH] on-demand readahead

2007-04-26 Thread Andrew Morton
On Thu, 26 Apr 2007 00:04:00 +0800 Fengguang Wu <[EMAIL PROTECTED]> wrote:

> If this new algorithm has been further tested and approved, I'll
> re-submit the patch in a cleaner, standalone form. The adaptive
> readahead patches can be dropped then. They may better be reworked as
> a kernel module.

that would be appreciated - I just don't know how to move the current patches
forward, and they do get in the way quite regularly.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] on-demand readahead

2007-04-26 Thread Andrew Morton
On Thu, 26 Apr 2007 00:04:00 +0800 Fengguang Wu [EMAIL PROTECTED] wrote:

 If this new algorithm has been further tested and approved, I'll
 re-submit the patch in a cleaner, standalone form. The adaptive
 readahead patches can be dropped then. They may better be reworked as
 a kernel module.

that would be appreciated - I just don't know how to move the current patches
forward, and they do get in the way quite regularly.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] on-demand readahead

2007-04-25 Thread Fengguang Wu
On Wed, Apr 25, 2007 at 06:08:44PM +0200, Andi Kleen wrote:
> > Yeah, the on-demand readahead can avoid _all_ lookups for small in-cache 
> > files.
> 
> How?

In filemap.c:
if (!page) {
page_cache_readahead_adaptive(mapping,
, filp, page,
index, last_index - index);
page = find_get_page(mapping, index);
}
if (page && PageReadahead(page)) {
page_cache_readahead_adaptive(mapping,
, filp, page,
index, last_index - index);
}

Cache hot files neither have missing pages (!page) or lookahead
pages (PageReadahead(page)).  So it will not even be called.

> > > You seem to have a lot of magic numbers. They probably all need symbols 
> > > and 
> > > explanations.
> > 
> > The magic numbers are for easier testings, and will be removed in
> > future.  For now, they enables convenient comparing of the two
> > algorithms in one kernel.
> 
> I mean the 16 and 4 not the sysctl

The numbers and the code in get_next_ra_size2() is simply copied from
get_next_ra_size():

if (cur < max / 16) {
newsize = 4 * cur;
} else {
newsize = 2 * cur;
}

It's a trick to ramp up small sizes more quickly.
That trick is documented in the related get_init_ra_size().
So, it would be better to put the two routines together to make it clear.

> > 
> > If this new algorithm has been further tested and approved, I'll
> > re-submit the patch in a cleaner, standalone form. The adaptive
> > readahead patches can be dropped then. They may better be reworked as
> > a kernel module.
> 
> If they actually help and don't cause regressions they shouldn't be a module, 
> but integrated eventually Just it has to be all step by step.

Yeah, the adaptive readahead is complex and the possible workloads diverse.
It becomes obvious that there is a long way to go, and kernel module makes
life easier.

> > > Your white space also needs some work.
> > 
> > White space in patch description?
> 
> In the code indentation.

Ah, got it: a silly copy/paste mistake.

Thank you,
Wu

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] on-demand readahead

2007-04-25 Thread Andi Kleen
> Yeah, the on-demand readahead can avoid _all_ lookups for small in-cache 
> files.

How?

> But what do you mean by AS?

struct address_space

> > You seem to have a lot of magic numbers. They probably all need symbols and 
> > explanations.
> 
> The magic numbers are for easier testings, and will be removed in
> future.  For now, they enables convenient comparing of the two
> algorithms in one kernel.

I mean the 16 and 4 not the sysctl

> 
> If this new algorithm has been further tested and approved, I'll
> re-submit the patch in a cleaner, standalone form. The adaptive
> readahead patches can be dropped then. They may better be reworked as
> a kernel module.

If they actually help and don't cause regressions they shouldn't be a module, 
but integrated eventually Just it has to be all step by step.
> 
> > Your white space also needs some work.
> 
> White space in patch description?

In the code indentation.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] on-demand readahead

2007-04-25 Thread Fengguang Wu
On Wed, Apr 25, 2007 at 04:37:41PM +0200, Andi Kleen wrote:
> Fengguang Wu <[EMAIL PROTECTED]> writes:
> 
> > OVERHEADS
> > 
> > The new code reduced the overheads of
> > 
> > - excessively calling the readahead routine on small sized reads
> >   (the current readahead code insists on seeing all requests)
> > 
> > - doing a lot of pointless page-cache lookups for small cached files
> >   (the current readahead only turns itself off after 256 cache hits,
> >   unfortunately most files are < 1MB, so never see that chance)
> 
> Would it make sense to keep track in the AS if the file is completely in 
> cache?
> Then you could probably avoid a lot of these lookups for small in cache files

Yeah, the on-demand readahead can avoid _all_ lookups for small in-cache files.
But what do you mean by AS?

> > --- linux-2.6.21-rc7-mm1.orig/mm/readahead.c
> > +++ linux-2.6.21-rc7-mm1/mm/readahead.c
> > @@ -733,6 +733,11 @@ unsigned long max_sane_readahead(unsigne
> 
> Quite simple patch, why is it that much simpler than your earlier patchkits?
> Or is that on top of them?

The earlier ones focus on features, while this one aims to be simple.
Even simpler than the current readahead, while keeping the same feature set.

The on-demand readahead is now on top of them, but can/will be made
independent.

> You seem to have a lot of magic numbers. They probably all need symbols and 
> explanations.

The magic numbers are for easier testings, and will be removed in
future.  For now, they enables convenient comparing of the two
algorithms in one kernel.

If this new algorithm has been further tested and approved, I'll
re-submit the patch in a cleaner, standalone form. The adaptive
readahead patches can be dropped then. They may better be reworked as
a kernel module.

> Your white space also needs some work.

White space in patch description?
OK, thanks.

Wu

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] on-demand readahead

2007-04-25 Thread Andi Kleen
Fengguang Wu <[EMAIL PROTECTED]> writes:

> OVERHEADS
> 
> The new code reduced the overheads of
> 
>   - excessively calling the readahead routine on small sized reads
> (the current readahead code insists on seeing all requests)
> 
>   - doing a lot of pointless page-cache lookups for small cached files
> (the current readahead only turns itself off after 256 cache hits,
> unfortunately most files are < 1MB, so never see that chance)

Would it make sense to keep track in the AS if the file is completely in cache?
Then you could probably avoid a lot of these lookups for small in cache files

> --- linux-2.6.21-rc7-mm1.orig/mm/readahead.c
> +++ linux-2.6.21-rc7-mm1/mm/readahead.c
> @@ -733,6 +733,11 @@ unsigned long max_sane_readahead(unsigne

Quite simple patch, why is it that much simpler than your earlier patchkits?
Or is that on top of them?

You seem to have a lot of magic numbers. They probably all need symbols and 
explanations.

Your white space also needs some work.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH] on-demand readahead

2007-04-25 Thread Fengguang Wu
Andrew,

This is a minimal readahead algorithm that aims to replace the current one.
It is more flexible and reliable, while maintaining almost the same behavior
and performance. Also it is full integrated with adaptive readahead.

It is designed to be called on demand:
- on a missing page, to do synchronous readahead
- on a lookahead page, to do asynchronous readahead

In this way it eliminated the awkward workarounds for cache hit/miss,
readahead thrashing, retried read, and unaligned read. It also adopts the
data structure introduced by adaptive readahead, parameterizes readahead
pipelining with `lookahead_index', and reduces the current/ahead windows
to one single window.

The patch is made convenient for testing out.
Do a
# echo 2 > /proc/sys/vm/readahead_ratio
and it is selected.
Do a
# echo 1 > /proc/sys/vm/readahead_ratio
and the vanilla readahead is selected.

Comments and benchmark numbers are welcome, thank you.


HEURISTICS

The logic deals with four cases:

- sequential-next
found a consistent readahead window, so push it forward

- random
standalone small read, so read as is

- sequential-first
create a new readahead window for a sequential/oversize request

- lookahead-clueless
hit a lookahead page not associated with the readahead window,
so create a new readahead window and ramp it up

In each case, three parameters are determined:

- readahead index: where the next readahead begins
- readahead size:  how much to readahead
- lookahead size:  when to do the next readahead (for pipelining)


BEHAVIORS

The old behaviors are maximally preserved for trivial sequential/random reads.
Notable changes are:

- It no longer imposes strict sequential checks.
  It might help some interleaved cases, and clustered random reads.
  It does introduce risks of a random lookahead hit triggering an
  unexpected readahead. But in general it is more likely to do good
  than to do evil.

- Interleaved reads are supported in a minimal way.
  Their chances of being detected and proper handled are still low.

- Readahead thrashings are better handled.
  The current readahead leads to tiny average I/O sizes, because it
  never turn back for the thrashed pages.  They have to be fault in
  by do_generic_mapping_read() one by one.  Whereas the on-demand
  readahead will redo readahead for them.


OVERHEADS

The new code reduced the overheads of

- excessively calling the readahead routine on small sized reads
  (the current readahead code insists on seeing all requests)

- doing a lot of pointless page-cache lookups for small cached files
  (the current readahead only turns itself off after 256 cache hits,
  unfortunately most files are < 1MB, so never see that chance)

That accounts for speedup of
- 0.3% on 1-page sequential reads on sparse file
- 1.2% on 1-page cache hot sequential reads
- 3.2% on 256-page cache hot sequential reads
- 1.3% on cache hot `tar /lib`

However, it does introduce one extra page-cache lookup per cache miss, which
impacts random reads slightly. That's 1% overheads for 1-page random reads on
sparse file.


PERFORMANCE

The basic benchmark setup is
- 2.6.20 kernel with on-demand readahead
- 1MB max readahead size
- 2.9GHz Intel Core 2 CPU
- 2GB memory
- 160G/8M Hitachi SATA II 7200 RPM disk

The benchmarks show that
- it maintains the same performance for trivial sequential/random reads
- sysbench/OLTP performance on MySQL gains up to 8%
- performance on readahead thrashing gains up to 3 times


iozone throughput (KB/s): roughly the same
==
iozone -c -t1 -s 4096m -r 64k

   2.6.20  on-demand  gain
first run
  "  Initial write "   61437.2764521.53  +5.0%
  "Rewrite "   47893.0248335.20  +0.9%
  "   Read "   62111.8462141.49  +0.0%
  "Re-read "   62242.6662193.17  -0.1%
  "   Reverse Read "   50031.4649989.79  -0.1%
  "Stride read "8657.61 8652.81  -0.1%
  "Random read "   13914.2813898.23  -0.1%
  " Mixed workload "   19069.2719033.32  -0.2%
  "   Random write "   14849.8014104.38  -5.0%
  " Pwrite "   62955.3065701.57  +4.4%
  "  Pread "   62209.9962256.26  +0.1%

second run
  "  Initial write "   60810.3166258.69  +9.0%
  "Rewrite "   49373.8957833.66 +17.1%
  "   Read "   62059.39

[RFC][PATCH] on-demand readahead

2007-04-25 Thread Fengguang Wu
Andrew,

This is a minimal readahead algorithm that aims to replace the current one.
It is more flexible and reliable, while maintaining almost the same behavior
and performance. Also it is full integrated with adaptive readahead.

It is designed to be called on demand:
- on a missing page, to do synchronous readahead
- on a lookahead page, to do asynchronous readahead

In this way it eliminated the awkward workarounds for cache hit/miss,
readahead thrashing, retried read, and unaligned read. It also adopts the
data structure introduced by adaptive readahead, parameterizes readahead
pipelining with `lookahead_index', and reduces the current/ahead windows
to one single window.

The patch is made convenient for testing out.
Do a
# echo 2  /proc/sys/vm/readahead_ratio
and it is selected.
Do a
# echo 1  /proc/sys/vm/readahead_ratio
and the vanilla readahead is selected.

Comments and benchmark numbers are welcome, thank you.


HEURISTICS

The logic deals with four cases:

- sequential-next
found a consistent readahead window, so push it forward

- random
standalone small read, so read as is

- sequential-first
create a new readahead window for a sequential/oversize request

- lookahead-clueless
hit a lookahead page not associated with the readahead window,
so create a new readahead window and ramp it up

In each case, three parameters are determined:

- readahead index: where the next readahead begins
- readahead size:  how much to readahead
- lookahead size:  when to do the next readahead (for pipelining)


BEHAVIORS

The old behaviors are maximally preserved for trivial sequential/random reads.
Notable changes are:

- It no longer imposes strict sequential checks.
  It might help some interleaved cases, and clustered random reads.
  It does introduce risks of a random lookahead hit triggering an
  unexpected readahead. But in general it is more likely to do good
  than to do evil.

- Interleaved reads are supported in a minimal way.
  Their chances of being detected and proper handled are still low.

- Readahead thrashings are better handled.
  The current readahead leads to tiny average I/O sizes, because it
  never turn back for the thrashed pages.  They have to be fault in
  by do_generic_mapping_read() one by one.  Whereas the on-demand
  readahead will redo readahead for them.


OVERHEADS

The new code reduced the overheads of

- excessively calling the readahead routine on small sized reads
  (the current readahead code insists on seeing all requests)

- doing a lot of pointless page-cache lookups for small cached files
  (the current readahead only turns itself off after 256 cache hits,
  unfortunately most files are  1MB, so never see that chance)

That accounts for speedup of
- 0.3% on 1-page sequential reads on sparse file
- 1.2% on 1-page cache hot sequential reads
- 3.2% on 256-page cache hot sequential reads
- 1.3% on cache hot `tar /lib`

However, it does introduce one extra page-cache lookup per cache miss, which
impacts random reads slightly. That's 1% overheads for 1-page random reads on
sparse file.


PERFORMANCE

The basic benchmark setup is
- 2.6.20 kernel with on-demand readahead
- 1MB max readahead size
- 2.9GHz Intel Core 2 CPU
- 2GB memory
- 160G/8M Hitachi SATA II 7200 RPM disk

The benchmarks show that
- it maintains the same performance for trivial sequential/random reads
- sysbench/OLTP performance on MySQL gains up to 8%
- performance on readahead thrashing gains up to 3 times


iozone throughput (KB/s): roughly the same
==
iozone -c -t1 -s 4096m -r 64k

   2.6.20  on-demand  gain
first run
Initial write61437.2764521.53  +5.0%
  Rewrite47893.0248335.20  +0.9%
 Read62111.8462141.49  +0.0%
  Re-read62242.6662193.17  -0.1%
 Reverse Read50031.4649989.79  -0.1%
  Stride read 8657.61 8652.81  -0.1%
  Random read13914.2813898.23  -0.1%
   Mixed workload19069.2719033.32  -0.2%
 Random write14849.8014104.38  -5.0%
   Pwrite62955.3065701.57  +4.4%
Pread62209.9962256.26  +0.1%

second run
Initial write60810.3166258.69  +9.0%
  Rewrite49373.8957833.66 +17.1%
 Read62059.3962251.28  +0.3%
  

Re: [RFC][PATCH] on-demand readahead

2007-04-25 Thread Andi Kleen
Fengguang Wu [EMAIL PROTECTED] writes:

 OVERHEADS
 
 The new code reduced the overheads of
 
   - excessively calling the readahead routine on small sized reads
 (the current readahead code insists on seeing all requests)
 
   - doing a lot of pointless page-cache lookups for small cached files
 (the current readahead only turns itself off after 256 cache hits,
 unfortunately most files are  1MB, so never see that chance)

Would it make sense to keep track in the AS if the file is completely in cache?
Then you could probably avoid a lot of these lookups for small in cache files

 --- linux-2.6.21-rc7-mm1.orig/mm/readahead.c
 +++ linux-2.6.21-rc7-mm1/mm/readahead.c
 @@ -733,6 +733,11 @@ unsigned long max_sane_readahead(unsigne

Quite simple patch, why is it that much simpler than your earlier patchkits?
Or is that on top of them?

You seem to have a lot of magic numbers. They probably all need symbols and 
explanations.

Your white space also needs some work.

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] on-demand readahead

2007-04-25 Thread Fengguang Wu
On Wed, Apr 25, 2007 at 04:37:41PM +0200, Andi Kleen wrote:
 Fengguang Wu [EMAIL PROTECTED] writes:
 
  OVERHEADS
  
  The new code reduced the overheads of
  
  - excessively calling the readahead routine on small sized reads
(the current readahead code insists on seeing all requests)
  
  - doing a lot of pointless page-cache lookups for small cached files
(the current readahead only turns itself off after 256 cache hits,
unfortunately most files are  1MB, so never see that chance)
 
 Would it make sense to keep track in the AS if the file is completely in 
 cache?
 Then you could probably avoid a lot of these lookups for small in cache files

Yeah, the on-demand readahead can avoid _all_ lookups for small in-cache files.
But what do you mean by AS?

  --- linux-2.6.21-rc7-mm1.orig/mm/readahead.c
  +++ linux-2.6.21-rc7-mm1/mm/readahead.c
  @@ -733,6 +733,11 @@ unsigned long max_sane_readahead(unsigne
 
 Quite simple patch, why is it that much simpler than your earlier patchkits?
 Or is that on top of them?

The earlier ones focus on features, while this one aims to be simple.
Even simpler than the current readahead, while keeping the same feature set.

The on-demand readahead is now on top of them, but can/will be made
independent.

 You seem to have a lot of magic numbers. They probably all need symbols and 
 explanations.

The magic numbers are for easier testings, and will be removed in
future.  For now, they enables convenient comparing of the two
algorithms in one kernel.

If this new algorithm has been further tested and approved, I'll
re-submit the patch in a cleaner, standalone form. The adaptive
readahead patches can be dropped then. They may better be reworked as
a kernel module.

 Your white space also needs some work.

White space in patch description?
OK, thanks.

Wu

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] on-demand readahead

2007-04-25 Thread Andi Kleen
 Yeah, the on-demand readahead can avoid _all_ lookups for small in-cache 
 files.

How?

 But what do you mean by AS?

struct address_space

  You seem to have a lot of magic numbers. They probably all need symbols and 
  explanations.
 
 The magic numbers are for easier testings, and will be removed in
 future.  For now, they enables convenient comparing of the two
 algorithms in one kernel.

I mean the 16 and 4 not the sysctl

 
 If this new algorithm has been further tested and approved, I'll
 re-submit the patch in a cleaner, standalone form. The adaptive
 readahead patches can be dropped then. They may better be reworked as
 a kernel module.

If they actually help and don't cause regressions they shouldn't be a module, 
but integrated eventually Just it has to be all step by step.
 
  Your white space also needs some work.
 
 White space in patch description?

In the code indentation.

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] on-demand readahead

2007-04-25 Thread Fengguang Wu
On Wed, Apr 25, 2007 at 06:08:44PM +0200, Andi Kleen wrote:
  Yeah, the on-demand readahead can avoid _all_ lookups for small in-cache 
  files.
 
 How?

In filemap.c:
if (!page) {
page_cache_readahead_adaptive(mapping,
ra, filp, page,
index, last_index - index);
page = find_get_page(mapping, index);
}
if (page  PageReadahead(page)) {
page_cache_readahead_adaptive(mapping,
ra, filp, page,
index, last_index - index);
}

Cache hot files neither have missing pages (!page) or lookahead
pages (PageReadahead(page)).  So it will not even be called.

   You seem to have a lot of magic numbers. They probably all need symbols 
   and 
   explanations.
  
  The magic numbers are for easier testings, and will be removed in
  future.  For now, they enables convenient comparing of the two
  algorithms in one kernel.
 
 I mean the 16 and 4 not the sysctl

The numbers and the code in get_next_ra_size2() is simply copied from
get_next_ra_size():

if (cur  max / 16) {
newsize = 4 * cur;
} else {
newsize = 2 * cur;
}

It's a trick to ramp up small sizes more quickly.
That trick is documented in the related get_init_ra_size().
So, it would be better to put the two routines together to make it clear.

  
  If this new algorithm has been further tested and approved, I'll
  re-submit the patch in a cleaner, standalone form. The adaptive
  readahead patches can be dropped then. They may better be reworked as
  a kernel module.
 
 If they actually help and don't cause regressions they shouldn't be a module, 
 but integrated eventually Just it has to be all step by step.

Yeah, the adaptive readahead is complex and the possible workloads diverse.
It becomes obvious that there is a long way to go, and kernel module makes
life easier.

   Your white space also needs some work.
  
  White space in patch description?
 
 In the code indentation.

Ah, got it: a silly copy/paste mistake.

Thank you,
Wu

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/