Hi Rusty, On Tue, Jun 12, 2007 at 02:36:26PM +1000, Rusty Russell wrote: > On Thu, 2007-05-17 at 06:47 +0800, Fengguang Wu wrote: > > +static unsigned long > > +ondemand_readahead(struct address_space *mapping, > > + struct file_ra_state *ra, struct file *filp, > > + struct page *page, pgoff_t offset, > > + unsigned long req_size) > > +{ > > + unsigned long max; /* max readahead pages */ > > + pgoff_t ra_index; /* readahead index */ > > + unsigned long ra_size; /* readahead size */ > > + unsigned long la_size; /* lookahead size */ > > + int sequential; > > + > > + max = ra->ra_pages; > > + sequential = (offset - ra->prev_index <= 1UL) || (req_size > max); > > This <= 1UL seems weird. prev_index is end of last request, so I'd > expect offset == prev_index + 1 for sequential reads? Does offset == > ra->prev_index happen? If not, this would be clearer as (offset == > ra->prev_index + 1).
It's possible to have (offset == ra->prev_index) when someone is doing 1K reads or 10K reads, which do not always align to page boundaries. > (prev_index is not a great name either, but that's not your patch 8). It was just renamed from `prev_page', hehe. > > + /* > > + * Lookahead/readahead hit, assume sequential access. > > + * Ramp up sizes, and push forward the readahead window. > > + */ > > + if (offset && (offset == ra->lookahead_index || > > + offset == ra->readahead_index)) { > > + ra_index = ra->readahead_index; > > + ra_size = get_next_ra_size2(ra, max); > > + la_size = ra_size; > > + goto fill_ra; > > + } > > Will offset hit lookahead_index or readahead_index exactly? Should this > be checking the range from offset to offset + req_size? Yes, normally lookahead_index will be hit(1). But in case readahead is canceled because of IO congestion at that time, readahead_index will be hit later(2). The readahead code is called on two possible conditions: (1) page != NULL and PageReadahead(page) It will be an asynchronous readahead. In this case, (offset == ra->lookahead_index) indicates sequential reads that have been associated with a valid readahead window. (2) page == NULL It will be a synchronous readahead. In this case, (offset == ra->readahead_index) indicates sequential reads that has just consumed all of the readahead pages. > > + ra_index = offset; > > + ra_size = get_init_ra_size(req_size, max); > > + la_size = ra_size > req_size ? ra_size - req_size : ra_size; > > So if we're doing a big sequential read, ra_size < req_size, so next > time offset will be > ra->readahead_index and the "ramp up sizes" code > won't get run? For big reads, (ra_size = max) and (max < req_size), la_size will be equal to ra_size, or max. So after this readahead invocation submits max pages of I/O and returns to do_generic_mapping_read(), it will *immediately* be called again because of lookahead hit: if (!page) { page_cache_readahead_ondemand(mapping, &ra, filp, page, index, last_index - index); page = find_get_page(mapping, index); if (unlikely(page == NULL)) goto no_cached_page; } lookahead hit: if (PageReadahead(page)) { page_cache_readahead_ondemand(mapping, &ra, filp, page, index, last_index - index); } Then it will submit another max pages of readahead I/O, whether or not the size ramp up code will be executed: either the remaining request size is still > max and get_init_ra_size() is called, or the remaining request size is <= max and get_next_ra_size() is called, in both cases they will return max. This behavior is inherited from the current readahead, and makes sense. > > + /* > > + * Hit on a lookahead page without valid readahead state. > > + * E.g. interleaved reads. > > + * Not knowing its readahead pos/size, bet on the minimal possible one. > > + */ > > + if (page) { > > + ra_index++; > > + ra_size = min(4 * ra_size, max); > > + } > > If I understand correctly, it's expected to happen when we have multiple > streams: we previously marked the lookahead page, but then the other > stream changed the ra to somewhere else in the file. We now change it > back to our stream, but we've lost information so we make it up. Yeah, exactly! > This seems a little like two functions crammed into one. Do you think > page_cache_readahead_ondemand() should be split into > "page_cache_readahead()" which doesn't take a page*, and > "page_cache_check_readahead_page()" which is an inline which does the > PageReadahead(page) check as well? page_cache_check_readahead_page(..., page) is a good idea. But which part of the code should we put to page_cache_readahead() that does not take a page param? Thank you, Fengguang - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/