Fengguang Wu wrote:
2007/5/2, Eric Dumazet <[EMAIL PROTECTED]>:
Since you work on readahead, could you please find out why the
following program triggers a problem in the splice() syscall?
Description:
I tried to use splice(SPLICE_F_NONBLOCK) in a non-blocking
environment, in an attempt to get cheap AIO together with the
zero-copy feature of splice().
I quickly found that readahead in splice() is not really working.
To demonstrate the problem, just compile the attached program, and
use it to pipe a big file (not yet in cache) to /dev/null :
$ gcc -o spliceout spliceout.c
$ spliceout -d BIGFILE | cat >/dev/null
offset=49152 ret=49152
offset=65536 ret=16384
offset=131072 ret=65536
...no more progress... (splice() returns -1 and EAGAIN)
Reading the splice(SPLICE_F_NONBLOCK) syscall implementation, I expected
to exploit its ability to call readahead() and to make some progress
whenever pages are ready in the cache.
But apparently, even on an idle machine, it does not work as expected.
Eric Dumazet, thank you for disclosing this bug.
Readahead logic somehow fails to populate the page range with data.
It can be because
1) the readahead routine is not always called in the following lines of
fs/splice.c:
	if (!loff || nr_pages > 1)
		page_cache_readahead(mapping, &in->f_ra, in, index,
				     nr_pages);
2) even when it is called, page_cache_readahead() won't guarantee that the
pages are there. It won't submit readahead I/O for pages already in the
radix tree, or when (ra_pages == 0), or after 256 cache hits.
In your case, it should be because of the retried reads, which lead to
excessive cache hits and eventually disable readahead.
And that _one_ failure of readahead blocks the whole read process.
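As a rough illustration of the three conditions in point 2, here is a toy
user-space model in C. It is only a sketch: struct toy_ra,
TOY_MAX_CACHE_HITS and both functions are made-up names, not kernel code,
and resetting the hit counter on a miss is an assumption of the model. The
only numbers taken from the text above are (ra_pages == 0) and the limit of
256 cache hits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limit quoted in the message; the constant name is invented for this toy. */
#define TOY_MAX_CACHE_HITS 256

/* Toy readahead state; NOT the kernel's struct file_ra_state. */
struct toy_ra {
	unsigned long ra_pages;   /* 0 means readahead is disabled */
	unsigned long cache_hits; /* consecutive lookups that found the page */
};

/* Would this toy readahead submit I/O for one page? */
static bool toy_ra_submits_io(const struct toy_ra *ra, bool page_in_radix_tree)
{
	assert(ra != NULL);

	if (ra->ra_pages == 0)
		return false;	/* readahead switched off */
	if (ra->cache_hits >= TOY_MAX_CACHE_HITS)
		return false;	/* "cache hit mode": too many hits in a row */
	if (page_in_radix_tree)
		return false;	/* page already present, even if still blank */
	return true;
}

/* Record one lookup. Assumption: a miss resets the hit counter. */
static void toy_ra_note_lookup(struct toy_ra *ra, bool page_in_radix_tree)
{
	assert(ra != NULL);

	if (page_in_radix_tree)
		ra->cache_hits++;
	else
		ra->cache_hits = 0;
}
```

In this model a retried read keeps finding the same blank page in the radix
tree, so every retry is counted as a hit. Once the counter reaches the limit,
toy_ra_submits_io() returns false even for pages that are not cached at all.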
The application receives EAGAIN and retries the read, but
__generic_file_splice_read() refuses to make progress:
- in the previous invocation, it allocated a blank page and inserted
  it into the radix tree, but never had the chance to start I/O for it:
  the test of SPLICE_F_NONBLOCK comes before that.
- in the retried invocation, the readahead code will neither get out of
  the cache hit mode, nor will it submit I/O for an already existing page.
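The two points above can be played through with a small user-space toy in
C. This is a sketch under assumptions, not kernel code: struct toy_page and
all toy_* names are invented, toy_test_set_locked() only imitates what
TestSetPageLocked() returns (the old value of the lock bit), and I/O
completion is simulated by calling toy_io_done() by hand. It compares
"break before taking the lock" with "try the lock without sleeping".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page-cache page; NOT the kernel's struct page. */
struct toy_page {
	bool locked;	/* held while a read is in flight */
	bool uptodate;	/* set once the data has arrived */
};

enum toy_result { TOY_EAGAIN, TOY_PROGRESS };

/* Imitates TestSetPageLocked(): set the bit, return its old value. */
static bool toy_test_set_locked(struct toy_page *p)
{
	bool was_locked = p->locked;

	p->locked = true;
	return was_locked;
}

/* Simulated I/O completion: only a page with a read in flight is filled. */
static void toy_io_done(struct toy_page *p)
{
	if (p->locked) {
		p->uptodate = true;
		p->locked = false;
	}
}

/* Non-blocking attempt that gives up before ever taking the lock. */
static enum toy_result toy_nonblock_old(struct toy_page *p)
{
	assert(p != NULL);

	if (p->uptodate)
		return TOY_PROGRESS;
	return TOY_EAGAIN;	/* no read is started for the blank page */
}

/* Non-blocking attempt that tries the lock instead of sleeping on it. */
static enum toy_result toy_nonblock_trylock(struct toy_page *p)
{
	assert(p != NULL);

	if (p->uptodate)
		return TOY_PROGRESS;
	if (toy_test_set_locked(p))
		return TOY_EAGAIN;	/* already locked: a read is in flight */
	/* The lock is ours: this is where the read would be submitted. */
	return TOY_PROGRESS;
}
```

With toy_nonblock_old() a blank, unlocked page stays blank no matter how
often the caller retries, because nothing ever starts the read. With
toy_nonblock_trylock() the first call takes the lock and starts the read,
and EAGAIN is returned only while that read is still in flight.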
The attached patch should fix the critical splice bug. Sorry for not
being able to test it locally for now - I'm at home and running Knoppix.
The readahead bug will be fixed by the upcoming on-demand readahead
patch. I should be back to submit it in about a week.
Thank you,
Fengguang Wu
------------------------------------------------------------------------
--- linux-2.6.21.1/fs/splice.c.old 2007-05-05 04:40:38.000000000 -0400
+++ linux-2.6.21.1/fs/splice.c 2007-05-05 04:41:59.000000000 -0400
@@ -378,10 +378,11 @@
 			 * If in nonblock mode then dont block on waiting
 			 * for an in-flight io page
 			 */
-			if (flags & SPLICE_F_NONBLOCK)
-				break;
-
-			lock_page(page);
+			if (flags & SPLICE_F_NONBLOCK) {
+				if (TestSetPageLocked(page))
+					break;
+			} else
+				lock_page(page);
 			/*
 			 * page was truncated, stop here. if this isn't the
Sorry for the delay.
This patch solves the problem, thank you!