On Tue, Aug 3, 2021 at 12:18 PM Andreas Gruenbacher <agrue...@redhat.com> wrote: > > > With this patch queue, fstest generic/208 (aio-dio-invalidate-failure.c) > endlessly spins in gfs2_file_direct_write. It looks as if there's a bug > in get_user_pages_fast when called with FOLL_FAST_ONLY: > > (1) The test case performs an aio write into a 32 MB buffer. > > (2) The buffer is initially not in memory, so when iomap_dio_rw() -> > ... -> bio_iov_iter_get_pages() is called with the iter->noio flag > set, we get to get_user_pages_fast() with FOLL_FAST_ONLY set. > get_user_pages_fast() returns 0, which causes > bio_iov_iter_get_pages to return -EFAULT. > > (3) Then gfs2_file_direct_write faults in the entire buffer with > fault_in_iov_iter_readable(), which succeeds. > > (4) With the buffer in memory, we retry the iomap_dio_rw() -> > ... -> bio_iov_iter_get_pages() -> ... -> get_user_pages_fast(). > This should succeed now, but get_user_pages_fast() still returns 0.
Hmm. Have you tried to figure out why that "still returns 0" happens? One option - for debugging only - would be to introduce a new flag to get_user_pages_fast() tyhat says "print out reason if failed" and make the retry (but not the original one) have that flag set. There are a couple of things of note when it comes to "get_user_pages_fast()": (a) some architectures don't even enable it (b) it can be very picky about the page table contents, and wants the accessed bit to already be set (or the dirty bit, in the case of a write). but (a) shouldn't be an issue on any common platform and (b) shouldn't be an issue with fault_in_iov_iter_readable() that actually does a __get_user() so it will access through the page tables. (It might be more of an issue with fault_in_iov_iter_writable() due to walking the page tables by hand - if we don't do the proper access/dirty setting, I could see get_user_pages_fast() failing). Anyway, for reason (a) I do think that eventually we should probably introduce FOLL_NOFAULT, and allow the full "slow" page table walk - just not calling down to handle_mm_fault() if it fails. But (a) should be a non-issue in your test environment, and so it would be interesting to hear what it is that fails. Because scanning through the patches, they all _look_ fine to me (apart from the one comment about return values, which is more about being consistent with copy_to/from_user() and making the code simpler - not about correctness) Linus