Hi Rusty,

On Tue, Jun 12, 2007 at 02:36:26PM +1000, Rusty Russell wrote:
> On Thu, 2007-05-17 at 06:47 +0800, Fengguang Wu wrote:
> > +static unsigned long
> > +ondemand_readahead(struct address_space *mapping,
> > +              struct file_ra_state *ra, struct file *filp,
> > +              struct page *page, pgoff_t offset,
> > +              unsigned long req_size)
> > +{
> > +   unsigned long max;      /* max readahead pages */
> > +   pgoff_t ra_index;       /* readahead index */
> > +   unsigned long ra_size;  /* readahead size */
> > +   unsigned long la_size;  /* lookahead size */
> > +   int sequential;
> > +
> > +   max = ra->ra_pages;
> > +   sequential = (offset - ra->prev_index <= 1UL) || (req_size > max);
>
> This <= 1UL seems weird.  prev_index is end of last request, so I'd
> expect offset == prev_index + 1 for sequential reads?  Does offset ==
> ra->prev_index happen?  If not, this would be clearer as (offset ==
> ra->prev_index + 1).

It's possible to have (offset == ra->prev_index) when someone is doing
1K reads or 10K reads, which do not always align to page boundaries.

> (prev_index is not a great name either, but that's not your patch 8).

It was just renamed from `prev_page', hehe.

> > +   /*
> > +    * Lookahead/readahead hit, assume sequential access.
> > +    * Ramp up sizes, and push forward the readahead window.
> > +    */
> > +   if (offset && (offset == ra->lookahead_index ||
> > +                   offset == ra->readahead_index)) {
> > +           ra_index = ra->readahead_index;
> > +           ra_size = get_next_ra_size2(ra, max);
> > +           la_size = ra_size;
> > +           goto fill_ra;
> > +   }
>
> Will offset hit lookahead_index or readahead_index exactly?  Should this
> be checking the range from offset to offset + req_size?

Yes, normally lookahead_index will be hit(1). But in case readahead is
canceled because of IO congestion at that time, readahead_index will
be hit later(2).

The readahead code is called on two possible conditions:
(1) page != NULL and PageReadahead(page)
It will be an asynchronous readahead.
In this case, (offset == ra->lookahead_index) indicates sequential
reads that have been associated with a valid readahead window.

(2) page == NULL
It will be a synchronous readahead.
In this case, (offset == ra->readahead_index) indicates sequential
reads that has just consumed all of the readahead pages.

> > +   ra_index = offset;
> > +   ra_size = get_init_ra_size(req_size, max);
> > +   la_size = ra_size > req_size ? ra_size - req_size : ra_size;
>
> So if we're doing a big sequential read, ra_size < req_size, so next
> time offset will be > ra->readahead_index and the "ramp up sizes" code
> won't get run?

For big reads, (ra_size = max) and (max < req_size), la_size will be
equal to ra_size, or max. So after this readahead invocation submits
max pages of I/O and returns to do_generic_mapping_read(), it will
*immediately* be called again because of lookahead hit:

                if (!page) {
                        page_cache_readahead_ondemand(mapping,
                                        &ra, filp, page,
                                        index, last_index - index);
                        page = find_get_page(mapping, index);
                        if (unlikely(page == NULL))
                                goto no_cached_page;
                }
lookahead hit:  if (PageReadahead(page)) {
                        page_cache_readahead_ondemand(mapping,
                                        &ra, filp, page,
                                        index, last_index - index);
                }

Then it will submit another max pages of readahead I/O, whether or not
the size ramp up code will be executed: either the remaining request
size is still > max and get_init_ra_size() is called, or the remaining
request size is <= max and get_next_ra_size() is called, in both cases
they will return max.  This behavior is inherited from the current
readahead, and makes sense.

> > +   /*
> > +    * Hit on a lookahead page without valid readahead state.
> > +    * E.g. interleaved reads.
> > +    * Not knowing its readahead pos/size, bet on the minimal possible one.
> > +    */
> > +   if (page) {
> > +           ra_index++;
> > +           ra_size = min(4 * ra_size, max);
> > +   }
>
> If I understand correctly, it's expected to happen when we have multiple
> streams: we previously marked the lookahead page, but then the other
> stream changed the ra to somewhere else in the file.  We now change it
> back to our stream, but we've lost information so we make it up.

Yeah, exactly!

> This seems a little like two functions crammed into one.  Do you think
> page_cache_readahead_ondemand() should be split into
> "page_cache_readahead()" which doesn't take a page*, and
> "page_cache_check_readahead_page()" which is an inline which does the
> PageReadahead(page) check as well?

page_cache_check_readahead_page(..., page) is a good idea.
But which part of the code should we put to page_cache_readahead()
that does not take a page param?

Thank you,
Fengguang

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to