Re: [RFC][PATCH] on-demand readahead
On Thu, 26 Apr 2007 00:04:00 +0800 Fengguang Wu <[EMAIL PROTECTED]> wrote: > If this new algorithm has been further tested and approved, I'll > re-submit the patch in a cleaner, standalone form. The adaptive > readahead patches can be dropped then. They may better be reworked as > a kernel module. that would be appreciated - I just don't know how to move the current patches forward, and they do get in the way quite regularly. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] on-demand readahead
On Thu, 26 Apr 2007 00:04:00 +0800 Fengguang Wu [EMAIL PROTECTED] wrote: If this new algorithm has been further tested and approved, I'll re-submit the patch in a cleaner, standalone form. The adaptive readahead patches can be dropped then. They may better be reworked as a kernel module. that would be appreciated - I just don't know how to move the current patches forward, and they do get in the way quite regularly. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] on-demand readahead
On Wed, Apr 25, 2007 at 06:08:44PM +0200, Andi Kleen wrote: > > Yeah, the on-demand readahead can avoid _all_ lookups for small in-cache > > files. > > How? In filemap.c: if (!page) { page_cache_readahead_adaptive(mapping, , filp, page, index, last_index - index); page = find_get_page(mapping, index); } if (page && PageReadahead(page)) { page_cache_readahead_adaptive(mapping, , filp, page, index, last_index - index); } Cache hot files neither have missing pages (!page) or lookahead pages (PageReadahead(page)). So it will not even be called. > > > You seem to have a lot of magic numbers. They probably all need symbols > > > and > > > explanations. > > > > The magic numbers are for easier testings, and will be removed in > > future. For now, they enables convenient comparing of the two > > algorithms in one kernel. > > I mean the 16 and 4 not the sysctl The numbers and the code in get_next_ra_size2() is simply copied from get_next_ra_size(): if (cur < max / 16) { newsize = 4 * cur; } else { newsize = 2 * cur; } It's a trick to ramp up small sizes more quickly. That trick is documented in the related get_init_ra_size(). So, it would be better to put the two routines together to make it clear. > > > > If this new algorithm has been further tested and approved, I'll > > re-submit the patch in a cleaner, standalone form. The adaptive > > readahead patches can be dropped then. They may better be reworked as > > a kernel module. > > If they actually help and don't cause regressions they shouldn't be a module, > but integrated eventually Just it has to be all step by step. Yeah, the adaptive readahead is complex and the possible workloads diverse. It becomes obvious that there is a long way to go, and kernel module makes life easier. > > > Your white space also needs some work. > > > > White space in patch description? > > In the code indentation. Ah, got it: a silly copy/paste mistake. Thank you, Wu - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] on-demand readahead
> Yeah, the on-demand readahead can avoid _all_ lookups for small in-cache > files. How? > But what do you mean by AS? struct address_space > > You seem to have a lot of magic numbers. They probably all need symbols and > > explanations. > > The magic numbers are for easier testings, and will be removed in > future. For now, they enables convenient comparing of the two > algorithms in one kernel. I mean the 16 and 4 not the sysctl > > If this new algorithm has been further tested and approved, I'll > re-submit the patch in a cleaner, standalone form. The adaptive > readahead patches can be dropped then. They may better be reworked as > a kernel module. If they actually help and don't cause regressions they shouldn't be a module, but integrated eventually Just it has to be all step by step. > > > Your white space also needs some work. > > White space in patch description? In the code indentation. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] on-demand readahead
On Wed, Apr 25, 2007 at 04:37:41PM +0200, Andi Kleen wrote: > Fengguang Wu <[EMAIL PROTECTED]> writes: > > > OVERHEADS > > > > The new code reduced the overheads of > > > > - excessively calling the readahead routine on small sized reads > > (the current readahead code insists on seeing all requests) > > > > - doing a lot of pointless page-cache lookups for small cached files > > (the current readahead only turns itself off after 256 cache hits, > > unfortunately most files are < 1MB, so never see that chance) > > Would it make sense to keep track in the AS if the file is completely in > cache? > Then you could probably avoid a lot of these lookups for small in cache files Yeah, the on-demand readahead can avoid _all_ lookups for small in-cache files. But what do you mean by AS? > > --- linux-2.6.21-rc7-mm1.orig/mm/readahead.c > > +++ linux-2.6.21-rc7-mm1/mm/readahead.c > > @@ -733,6 +733,11 @@ unsigned long max_sane_readahead(unsigne > > Quite simple patch, why is it that much simpler than your earlier patchkits? > Or is that on top of them? The earlier ones focus on features, while this one aims to be simple. Even simpler than the current readahead, while keeping the same feature set. The on-demand readahead is now on top of them, but can/will be made independent. > You seem to have a lot of magic numbers. They probably all need symbols and > explanations. The magic numbers are for easier testings, and will be removed in future. For now, they enables convenient comparing of the two algorithms in one kernel. If this new algorithm has been further tested and approved, I'll re-submit the patch in a cleaner, standalone form. The adaptive readahead patches can be dropped then. They may better be reworked as a kernel module. > Your white space also needs some work. White space in patch description? OK, thanks. Wu - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] on-demand readahead
Fengguang Wu <[EMAIL PROTECTED]> writes: > OVERHEADS > > The new code reduced the overheads of > > - excessively calling the readahead routine on small sized reads > (the current readahead code insists on seeing all requests) > > - doing a lot of pointless page-cache lookups for small cached files > (the current readahead only turns itself off after 256 cache hits, > unfortunately most files are < 1MB, so never see that chance) Would it make sense to keep track in the AS if the file is completely in cache? Then you could probably avoid a lot of these lookups for small in cache files > --- linux-2.6.21-rc7-mm1.orig/mm/readahead.c > +++ linux-2.6.21-rc7-mm1/mm/readahead.c > @@ -733,6 +733,11 @@ unsigned long max_sane_readahead(unsigne Quite simple patch, why is it that much simpler than your earlier patchkits? Or is that on top of them? You seem to have a lot of magic numbers. They probably all need symbols and explanations. Your white space also needs some work. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH] on-demand readahead
Andrew, This is a minimal readahead algorithm that aims to replace the current one. It is more flexible and reliable, while maintaining almost the same behavior and performance. Also it is full integrated with adaptive readahead. It is designed to be called on demand: - on a missing page, to do synchronous readahead - on a lookahead page, to do asynchronous readahead In this way it eliminated the awkward workarounds for cache hit/miss, readahead thrashing, retried read, and unaligned read. It also adopts the data structure introduced by adaptive readahead, parameterizes readahead pipelining with `lookahead_index', and reduces the current/ahead windows to one single window. The patch is made convenient for testing out. Do a # echo 2 > /proc/sys/vm/readahead_ratio and it is selected. Do a # echo 1 > /proc/sys/vm/readahead_ratio and the vanilla readahead is selected. Comments and benchmark numbers are welcome, thank you. HEURISTICS The logic deals with four cases: - sequential-next found a consistent readahead window, so push it forward - random standalone small read, so read as is - sequential-first create a new readahead window for a sequential/oversize request - lookahead-clueless hit a lookahead page not associated with the readahead window, so create a new readahead window and ramp it up In each case, three parameters are determined: - readahead index: where the next readahead begins - readahead size: how much to readahead - lookahead size: when to do the next readahead (for pipelining) BEHAVIORS The old behaviors are maximally preserved for trivial sequential/random reads. Notable changes are: - It no longer imposes strict sequential checks. It might help some interleaved cases, and clustered random reads. It does introduce risks of a random lookahead hit triggering an unexpected readahead. But in general it is more likely to do good than to do evil. - Interleaved reads are supported in a minimal way. Their chances of being detected and proper handled are still low. - Readahead thrashings are better handled. The current readahead leads to tiny average I/O sizes, because it never turn back for the thrashed pages. They have to be fault in by do_generic_mapping_read() one by one. Whereas the on-demand readahead will redo readahead for them. OVERHEADS The new code reduced the overheads of - excessively calling the readahead routine on small sized reads (the current readahead code insists on seeing all requests) - doing a lot of pointless page-cache lookups for small cached files (the current readahead only turns itself off after 256 cache hits, unfortunately most files are < 1MB, so never see that chance) That accounts for speedup of - 0.3% on 1-page sequential reads on sparse file - 1.2% on 1-page cache hot sequential reads - 3.2% on 256-page cache hot sequential reads - 1.3% on cache hot `tar /lib` However, it does introduce one extra page-cache lookup per cache miss, which impacts random reads slightly. That's 1% overheads for 1-page random reads on sparse file. PERFORMANCE The basic benchmark setup is - 2.6.20 kernel with on-demand readahead - 1MB max readahead size - 2.9GHz Intel Core 2 CPU - 2GB memory - 160G/8M Hitachi SATA II 7200 RPM disk The benchmarks show that - it maintains the same performance for trivial sequential/random reads - sysbench/OLTP performance on MySQL gains up to 8% - performance on readahead thrashing gains up to 3 times iozone throughput (KB/s): roughly the same == iozone -c -t1 -s 4096m -r 64k 2.6.20 on-demand gain first run " Initial write " 61437.2764521.53 +5.0% "Rewrite " 47893.0248335.20 +0.9% " Read " 62111.8462141.49 +0.0% "Re-read " 62242.6662193.17 -0.1% " Reverse Read " 50031.4649989.79 -0.1% "Stride read "8657.61 8652.81 -0.1% "Random read " 13914.2813898.23 -0.1% " Mixed workload " 19069.2719033.32 -0.2% " Random write " 14849.8014104.38 -5.0% " Pwrite " 62955.3065701.57 +4.4% " Pread " 62209.9962256.26 +0.1% second run " Initial write " 60810.3166258.69 +9.0% "Rewrite " 49373.8957833.66 +17.1% " Read " 62059.39
[RFC][PATCH] on-demand readahead
Andrew, This is a minimal readahead algorithm that aims to replace the current one. It is more flexible and reliable, while maintaining almost the same behavior and performance. Also it is full integrated with adaptive readahead. It is designed to be called on demand: - on a missing page, to do synchronous readahead - on a lookahead page, to do asynchronous readahead In this way it eliminated the awkward workarounds for cache hit/miss, readahead thrashing, retried read, and unaligned read. It also adopts the data structure introduced by adaptive readahead, parameterizes readahead pipelining with `lookahead_index', and reduces the current/ahead windows to one single window. The patch is made convenient for testing out. Do a # echo 2 /proc/sys/vm/readahead_ratio and it is selected. Do a # echo 1 /proc/sys/vm/readahead_ratio and the vanilla readahead is selected. Comments and benchmark numbers are welcome, thank you. HEURISTICS The logic deals with four cases: - sequential-next found a consistent readahead window, so push it forward - random standalone small read, so read as is - sequential-first create a new readahead window for a sequential/oversize request - lookahead-clueless hit a lookahead page not associated with the readahead window, so create a new readahead window and ramp it up In each case, three parameters are determined: - readahead index: where the next readahead begins - readahead size: how much to readahead - lookahead size: when to do the next readahead (for pipelining) BEHAVIORS The old behaviors are maximally preserved for trivial sequential/random reads. Notable changes are: - It no longer imposes strict sequential checks. It might help some interleaved cases, and clustered random reads. It does introduce risks of a random lookahead hit triggering an unexpected readahead. But in general it is more likely to do good than to do evil. - Interleaved reads are supported in a minimal way. Their chances of being detected and proper handled are still low. - Readahead thrashings are better handled. The current readahead leads to tiny average I/O sizes, because it never turn back for the thrashed pages. They have to be fault in by do_generic_mapping_read() one by one. Whereas the on-demand readahead will redo readahead for them. OVERHEADS The new code reduced the overheads of - excessively calling the readahead routine on small sized reads (the current readahead code insists on seeing all requests) - doing a lot of pointless page-cache lookups for small cached files (the current readahead only turns itself off after 256 cache hits, unfortunately most files are 1MB, so never see that chance) That accounts for speedup of - 0.3% on 1-page sequential reads on sparse file - 1.2% on 1-page cache hot sequential reads - 3.2% on 256-page cache hot sequential reads - 1.3% on cache hot `tar /lib` However, it does introduce one extra page-cache lookup per cache miss, which impacts random reads slightly. That's 1% overheads for 1-page random reads on sparse file. PERFORMANCE The basic benchmark setup is - 2.6.20 kernel with on-demand readahead - 1MB max readahead size - 2.9GHz Intel Core 2 CPU - 2GB memory - 160G/8M Hitachi SATA II 7200 RPM disk The benchmarks show that - it maintains the same performance for trivial sequential/random reads - sysbench/OLTP performance on MySQL gains up to 8% - performance on readahead thrashing gains up to 3 times iozone throughput (KB/s): roughly the same == iozone -c -t1 -s 4096m -r 64k 2.6.20 on-demand gain first run Initial write61437.2764521.53 +5.0% Rewrite47893.0248335.20 +0.9% Read62111.8462141.49 +0.0% Re-read62242.6662193.17 -0.1% Reverse Read50031.4649989.79 -0.1% Stride read 8657.61 8652.81 -0.1% Random read13914.2813898.23 -0.1% Mixed workload19069.2719033.32 -0.2% Random write14849.8014104.38 -5.0% Pwrite62955.3065701.57 +4.4% Pread62209.9962256.26 +0.1% second run Initial write60810.3166258.69 +9.0% Rewrite49373.8957833.66 +17.1% Read62059.3962251.28 +0.3%
Re: [RFC][PATCH] on-demand readahead
Fengguang Wu [EMAIL PROTECTED] writes: OVERHEADS The new code reduced the overheads of - excessively calling the readahead routine on small sized reads (the current readahead code insists on seeing all requests) - doing a lot of pointless page-cache lookups for small cached files (the current readahead only turns itself off after 256 cache hits, unfortunately most files are 1MB, so never see that chance) Would it make sense to keep track in the AS if the file is completely in cache? Then you could probably avoid a lot of these lookups for small in cache files --- linux-2.6.21-rc7-mm1.orig/mm/readahead.c +++ linux-2.6.21-rc7-mm1/mm/readahead.c @@ -733,6 +733,11 @@ unsigned long max_sane_readahead(unsigne Quite simple patch, why is it that much simpler than your earlier patchkits? Or is that on top of them? You seem to have a lot of magic numbers. They probably all need symbols and explanations. Your white space also needs some work. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] on-demand readahead
On Wed, Apr 25, 2007 at 04:37:41PM +0200, Andi Kleen wrote: Fengguang Wu [EMAIL PROTECTED] writes: OVERHEADS The new code reduced the overheads of - excessively calling the readahead routine on small sized reads (the current readahead code insists on seeing all requests) - doing a lot of pointless page-cache lookups for small cached files (the current readahead only turns itself off after 256 cache hits, unfortunately most files are 1MB, so never see that chance) Would it make sense to keep track in the AS if the file is completely in cache? Then you could probably avoid a lot of these lookups for small in cache files Yeah, the on-demand readahead can avoid _all_ lookups for small in-cache files. But what do you mean by AS? --- linux-2.6.21-rc7-mm1.orig/mm/readahead.c +++ linux-2.6.21-rc7-mm1/mm/readahead.c @@ -733,6 +733,11 @@ unsigned long max_sane_readahead(unsigne Quite simple patch, why is it that much simpler than your earlier patchkits? Or is that on top of them? The earlier ones focus on features, while this one aims to be simple. Even simpler than the current readahead, while keeping the same feature set. The on-demand readahead is now on top of them, but can/will be made independent. You seem to have a lot of magic numbers. They probably all need symbols and explanations. The magic numbers are for easier testings, and will be removed in future. For now, they enables convenient comparing of the two algorithms in one kernel. If this new algorithm has been further tested and approved, I'll re-submit the patch in a cleaner, standalone form. The adaptive readahead patches can be dropped then. They may better be reworked as a kernel module. Your white space also needs some work. White space in patch description? OK, thanks. Wu - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] on-demand readahead
Yeah, the on-demand readahead can avoid _all_ lookups for small in-cache files. How? But what do you mean by AS? struct address_space You seem to have a lot of magic numbers. They probably all need symbols and explanations. The magic numbers are for easier testings, and will be removed in future. For now, they enables convenient comparing of the two algorithms in one kernel. I mean the 16 and 4 not the sysctl If this new algorithm has been further tested and approved, I'll re-submit the patch in a cleaner, standalone form. The adaptive readahead patches can be dropped then. They may better be reworked as a kernel module. If they actually help and don't cause regressions they shouldn't be a module, but integrated eventually Just it has to be all step by step. Your white space also needs some work. White space in patch description? In the code indentation. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] on-demand readahead
On Wed, Apr 25, 2007 at 06:08:44PM +0200, Andi Kleen wrote: Yeah, the on-demand readahead can avoid _all_ lookups for small in-cache files. How? In filemap.c: if (!page) { page_cache_readahead_adaptive(mapping, ra, filp, page, index, last_index - index); page = find_get_page(mapping, index); } if (page PageReadahead(page)) { page_cache_readahead_adaptive(mapping, ra, filp, page, index, last_index - index); } Cache hot files neither have missing pages (!page) or lookahead pages (PageReadahead(page)). So it will not even be called. You seem to have a lot of magic numbers. They probably all need symbols and explanations. The magic numbers are for easier testings, and will be removed in future. For now, they enables convenient comparing of the two algorithms in one kernel. I mean the 16 and 4 not the sysctl The numbers and the code in get_next_ra_size2() is simply copied from get_next_ra_size(): if (cur max / 16) { newsize = 4 * cur; } else { newsize = 2 * cur; } It's a trick to ramp up small sizes more quickly. That trick is documented in the related get_init_ra_size(). So, it would be better to put the two routines together to make it clear. If this new algorithm has been further tested and approved, I'll re-submit the patch in a cleaner, standalone form. The adaptive readahead patches can be dropped then. They may better be reworked as a kernel module. If they actually help and don't cause regressions they shouldn't be a module, but integrated eventually Just it has to be all step by step. Yeah, the adaptive readahead is complex and the possible workloads diverse. It becomes obvious that there is a long way to go, and kernel module makes life easier. Your white space also needs some work. White space in patch description? In the code indentation. Ah, got it: a silly copy/paste mistake. Thank you, Wu - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/