Re: [Cluster-devel] [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

2022-02-24 Thread Dave Chinner
On Wed, Feb 23, 2022 at 10:50:09PM -0500, Theodore Ts'o wrote:
> On Thu, Feb 24, 2022 at 12:48:42PM +1100, Dave Chinner wrote:
> > > Fair enough; on the other hand, we could also view this as making ext4
> > > more robust against buggy code in other subsystems, and while other
> > > file systems may be losing user data if they are actually trying to do
> > > remote memory access to file-backed memory, apparently other file
> > > systems aren't noticing and so they're not crashing.
> > 
> > Oh, we've noticed them, no question about that.  We've got bug
> > reports going back years for systems being crashed, triggering BUGs
> > and/or corrupting data on both XFS and ext4 filesystems due to users
> > trying to run RDMA applications with file backed pages.
> 
> Is this issue causing XFS to crash?  I didn't know that.

I have no idea if it crashes nowadays - go back a few years and
search for XFS BUGging out in ->invalidatepage (or was it
->releasepage?) because of unexpected dirty pages. I think it could
also trigger BUGs in writeback when ->writepages tripped over a
dirty page without a delayed allocation mapping over the hole...

We were pretty aggressive about telling people reporting such issues
that they get to keep all the borken bits to themselves and to stop
wasting our time with unsolvable problems caused by their
broken-by-design RDMA applications. Hence people have largely
stopped bothering us with random filesystem crashes on systems using
RDMA on file-backed pages...

> I tried the Syzbot reproducer with XFS mounted, and it didn't trigger
> any crashes.  I'm sure data was getting corrupted, but I figured I
> should bring ext4 to the XFS level of "at least we're not reliably
> killing the kernel".

Oh, well, good to know XFS didn't die a horrible death immediately.
Thanks for checking, Ted.

> On ext4, an unprivileged process can use process_vm_writev(2) to crash
> the system.  I don't know how quickly we can get a fix into mm/gup.c,
> but if some other kernel path tries calling set_page_dirty() on a
> file-backed page without first asking permission from the file system,
> it would be nice if the file system doesn't BUG() --- as near as I
> can tell, xfs isn't crashing in this case, but ext4 is.

iomap is probably refusing to map holes for writepage - we've
cleaned up most of the weird edge cases to return errors, so I'm
guessing iomap is just ignoring such pages these days.

Yeah, see iomap_writepage_map():

		error = wpc->ops->map_blocks(wpc, inode, pos);
		if (error)
			break;
		if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
			continue;
		if (wpc->iomap.type == IOMAP_HOLE)
			continue;

Yeah, so if writeback maps a hole rather than converts a delalloc
region to IOMAP_MAPPED, it'll just skip over the block/page.  IIRC,
they essentially become uncleanable pages, and I think eventually
inode reclaim will just toss them out of memory.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com



Re: [Cluster-devel] [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

2022-02-23 Thread Theodore Ts'o
On Wed, Feb 23, 2022 at 04:44:07PM -0800, John Hubbard wrote:
> 
> Actually...I can confirm that real customers really are doing *exactly* 
> that! Despite the kernel crashes--because the crashes don't always 
> happen unless you have a large (supercomputer-sized) installation. And 
> even then it is not always root-caused properly.

Interesting.  The syzbot reproducer triggers *reliably* on ext4 using
a 2 CPU qemu kernel running on a laptop, and it doesn't require root,
so it's reasonable that Lee is pushing for a fix --- even if, for
Android O or newer, seccomp can probably prohibit
process_vm_writev(2), it seems unfortunate if, say, someone running
a Docker container could take down the entire host OS.
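
For what it's worth, masking the system call only takes a few lines
of seccomp-bpf.  Here's a minimal, untested sketch, just to
illustrate the idea (this is not what Android actually ships):

#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

static int deny_process_vm_writev(void)
{
	struct sock_filter filter[] = {
		/* Load the syscall number. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* If it's process_vm_writev, fail it with EPERM... */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
			 __NR_process_vm_writev, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
		/* ...and allow everything else.  (A real filter would
		 * also check seccomp_data->arch first.) */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};

	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}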

  - Ted



Re: [Cluster-devel] [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

2022-02-23 Thread Theodore Ts'o
On Thu, Feb 24, 2022 at 12:48:42PM +1100, Dave Chinner wrote:
> > Fair enough; on the other hand, we could also view this as making ext4
> > more robust against buggy code in other subsystems, and while other
> > file systems may be losing user data if they are actually trying to do
> > remote memory access to file-backed memory, apparently other file
> > systems aren't noticing and so they're not crashing.
> 
> Oh, we've noticed them, no question about that.  We've got bug
> reports going back years for systems being crashed, triggering BUGs
> and/or corrupting data on both XFS and ext4 filesystems due to users
> trying to run RDMA applications with file backed pages.

Is this issue causing XFS to crash?  I didn't know that.

I tried the Syzbot reproducer with XFS mounted, and it didn't trigger
any crashes.  I'm sure data was getting corrupted, but I figured I
should bring ext4 to the XFS level of "at least we're not reliably
killing the kernel".

On ext4, an unprivileged process can use process_vm_writev(2) to crash
the system.  I don't know how quickly we can get a fix into mm/gup.c,
but if some other kernel path tries calling set_page_dirty() on a
file-backed page without first asking permission from the file system,
it would be nice if the file system doesn't BUG() --- as near as I
can tell, xfs isn't crashing in this case, but ext4 is.

- Ted



Re: [Cluster-devel] [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

2022-02-23 Thread Dave Chinner
On Wed, Feb 23, 2022 at 06:35:54PM -0500, Theodore Ts'o wrote:
> On Fri, Feb 18, 2022 at 08:51:54AM +0100, Greg Kroah-Hartman wrote:
> > > The challenge is that fixing this "the right way" is probably not
> > > something we can backport into an LTS kernel, whether it's 5.15 or
> > > 5.10... or 4.19.
> > 
> > Don't worry about stable backports to start with.  Do it the "right way"
> > first and then we can consider if it needs to be backported or not.
> 
> Fair enough; on the other hand, we could also view this as making ext4
> more robust against buggy code in other subsystems, and while other
> file systems may be losing user data if they are actually trying to do
> remote memory access to file-backed memory, apparently other file
> systems aren't noticing and so they're not crashing.

Oh, we've noticed them, no question about that.  We've got bug
reports going back years for systems being crashed, triggering BUGs
and/or corrupting data on both XFS and ext4 filesystems due to users
trying to run RDMA applications with file backed pages.

Most of the people doing this now know that we won't support such
applications until the RDMA stack/hardware can trigger on-demand
write page faults the same way CPUs do when they first write to a
clean page. They don't have this, so mostly these people don't
bother reporting these class of problems to us anymore.  The
gup/RDMA infrastructure to make this all work is slowly moving
forwards, but it's not here yet.

> Issuing a
> warning and then not crashing is arguably a better way for ext4 to
> react, especially if there are other parts of the kernel that are
> randomly calling set_page_dirty() on file-backed memory without
> properly first informing the file system in a context where it can
> block and potentially do I/O to do things like allocate blocks.

I'm not sure that replacing the BUG() with a warning is good enough
- it's still indicative of an application doing something dangerous
that could result in silent data corruption and/or other problems.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com



Re: [Cluster-devel] [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

2022-02-23 Thread Theodore Ts'o
On Fri, Feb 18, 2022 at 08:51:54AM +0100, Greg Kroah-Hartman wrote:
> > The challenge is that fixing this "the right way" is probably not
> > something we can backport into an LTS kernel, whether it's 5.15 or
> > 5.10... or 4.19.
> 
> Don't worry about stable backports to start with.  Do it the "right way"
> first and then we can consider if it needs to be backported or not.

Fair enough; on the other hand, we could also view this as making ext4
more robust against buggy code in other subsystems, and while other
file systems may be losing user data if they are actually trying to do
remote memory access to file-backed memory, apparently other file
systems aren't noticing and so they're not crashing.  Issuing a
warning and then not crashing is arguably a better way for ext4 to
react, especially if there are other parts of the kernel that are
randomly calling set_page_dirty() on file-backed memory without
properly first informing the file system in a context where it can
block and potentially do I/O to do things like allocate blocks.

 - Ted



Re: [Cluster-devel] [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

2022-02-23 Thread Theodore Ts'o
On Thu, Feb 17, 2022 at 10:33:34PM -0800, John Hubbard wrote:
> 
> Just a small thing I'll say once, to get it out of my system. No action
> required here, I just want it understood:
> 
> Before commit 803e4572d7c5 ("mm/process_vm_access: set FOLL_PIN via
> pin_user_pages_remote()"), you would have written that like this:
> 
> "process_vm_writev() is dirtying pages without properly warning the file
> system in advance..."
> 
> Because, there were many callers that were doing this:
> 
> get_user_pages*()
> ...use the pages...
> 
> for_each_page() {
> set_page_dirty*()
> put_page()
> }

Sure, but that's not sufficient when modifying file-backed pages.
Previously, there were only two ways of modifying file-backed pages in
the page cache --- either using the write(2) system call, or when an
mmap'ed page is modified by userspace.

In the case of write(2), the file system gets notified before the page
cache is modified by a call to the address space operation's
write_begin() function, and after the page cache is modified, the
address space operation's write_end() function lets the file system
know that the modification is done.  After the write is done, the 30
second writeback timer is triggered, and in the case of ext4's
data=journal mode, we close the ext4 micro-transaction (and therefore
the time between the write_begin and write_end calls needs to be
minimal; otherwise this can block ext4 transactions).
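
Roughly, the per-page loop in the write path looks like this
(hand-wavy sketch of generic_perform_write() circa 5.16, with
details and error handling elided; not the literal code):

	/* The file system sees write_begin() before, and write_end()
	 * after, every modification of the page cache. */
	status = a_ops->write_begin(file, mapping, pos, bytes, flags,
				    &page, &fsdata);
	if (unlikely(status < 0))
		break;
	copied = copy_page_from_iter_atomic(page, offset, bytes, i);
	status = a_ops->write_end(file, mapping, pos, bytes, copied,
				  page, fsdata);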

In the case of a user page fault, the vm operation's page_mkwrite()
is called, and at that point we will allocate any blocks needed to
back the memory if necessary (in the case of delayed allocation, file
system space has to get reserved).  The problem here for remote
access is that page_mkwrite() can block, and it can potentially fail
(e.g., with ENOSPC or ENOMEM).  This is also why we can't just add
the page buffers and do the file system allocation in
set_page_dirty(), since set_page_dirty() isn't allowed to block.
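
For reference, the hook in question is wired up via the vm
operations; from memory, fs/ext4/file.c looks roughly like this:

	static const struct vm_operations_struct ext4_file_vm_ops = {
		.fault		= ext4_filemap_fault,
		.map_pages	= filemap_map_pages,
		.page_mkwrite	= ext4_page_mkwrite,	/* may block, may fail */
	};

so an mmap'ed write, unlike a gup-based write, always goes through a
path where blocking (and failing) is allowed.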

One approach might be to make all of the pages writeable when
pin_user_pages_remote() is called.  That would mean that in the
case of remote access via process_vm_writev() or RDMA, all of the
blocks would be allocated early.  But that's probably better, since
at that point the userspace code is in a position to receive the
error when setting up the RDMA memory, and we don't want to be asking
the file system to do block allocation when incoming data is arriving
from Infiniband or iWARP.

> I see that ext4_warning_inode() has rate limiting, but it doesn't look
> like it's really intended for a per-page rate. It looks like it is
> per-superblock (yes?), so I suspect one instance of this problem, with
> many pages involved, could hit the limit.
> 
> Often, WARN_ON_ONCE() is used with per-page assertions. That's not great
> either, but it might be better here. OTOH, I have minimal experience
> with ext4_warning_inode() and maybe it really is just fine with per-page
> failure rates.

With the syzbot reproducer, we're not actually triggering the rate
limiter, since the ext4 warning is only getting hit a bit more than
once every 4 seconds.  And I'm guessing that in the real world, people
aren't actually trying to do remote direct access to file-backed
memory, at least not using ext4, since that's an invitation to a
kernel crash, and we would have gotten user complaints.  If some user
actually tries to use process_vm_writev for realsies, as opposed to a
random fuzzer or a malicious program, we do want to warn them
about the potential data loss, so I'd prefer to warn once for each
inode --- but I'm not convinced that it's worth the effort.
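
If we did decide it was worth the effort, something along these
lines would do.  Note that EXT4_STATE_GUP_WARNED is a state bit I
just made up for this sketch, not something in the tree:

	/* Hypothetical: warn only once per inode.  The test/set pair
	 * is racy, but that's fine for a warning. */
	if (!ext4_test_inode_state(inode, EXT4_STATE_GUP_WARNED)) {
		ext4_set_inode_state(inode, EXT4_STATE_GUP_WARNED);
		ext4_warning_inode(inode,
			"page %lu does not have buffers attached",
			page->index);
	}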

Cheers,

- Ted



Re: [Cluster-devel] [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

2022-02-17 Thread Greg Kroah-Hartman
On Thu, Feb 17, 2022 at 11:08:38PM -0500, Theodore Ts'o wrote:
> On Thu, Feb 17, 2022 at 05:06:45PM -0800, John Hubbard wrote:
> > Yes. And looking at the pair of backtraces below, this looks very much
> > like another aspect of the "get_user_pages problem" [1], originally
> > described in Jan Kara's 2018 email [2].
> 
> Hmm... I just posted my analysis, which tracks with yours; but I had
> forgotten about Jan's 2018 e-mail on the matter.
> 
> > I'm getting close to posting an RFC for the direct IO conversion to
> > FOLL_PIN, but even after that, various parts of the kernel (reclaim,
> > filesystems/block layer) still need to be changed so as to use
> > page_maybe_dma_pinned() to help avoid this problem. There's a bit
> > more than that, actually.
> 
> The challenge is that fixing this "the right way" is probably not
> something we can backport into an LTS kernel, whether it's 5.15 or
> 5.10... or 4.19.

Don't worry about stable backports to start with.  Do it the "right way"
first and then we can consider if it needs to be backported or not.

thanks,

greg k-h



Re: [Cluster-devel] [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

2022-02-17 Thread Theodore Ts'o
On Fri, Feb 18, 2022 at 04:24:20AM +, Matthew Wilcox wrote:
> On Thu, Feb 17, 2022 at 09:54:30PM -0500, Theodore Ts'o wrote:
> > process_vm_writev() uses [un]pin_user_pages_remote(), which is the same
> > interface used for RDMA.  But it's not clear this is ever supposed to
> > work for memory in an mmap'ed region backed by a file.
> > pin_user_pages_remote() appears to assume that it is an anonymous
> > region, since the get_user_pages functions in mm/gup.c don't call
> > readpage() to read data into any pages that might not be mapped in.
> 
> ... it doesn't end up calling handle_mm_fault() in faultin_page()?

Ah yes, sorry, I missed that.  This is what happens when a syzbot bug
is thrown to a file system developer, who then has to wade through mm
code which he does not understand.

- Ted




Re: [Cluster-devel] [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

2022-02-17 Thread Matthew Wilcox
On Thu, Feb 17, 2022 at 09:54:30PM -0500, Theodore Ts'o wrote:
> process_vm_writev() uses [un]pin_user_pages_remote(), which is the same
> interface used for RDMA.  But it's not clear this is ever supposed to
> work for memory in an mmap'ed region backed by a file.
> pin_user_pages_remote() appears to assume that it is an anonymous
> region, since the get_user_pages functions in mm/gup.c don't call
> readpage() to read data into any pages that might not be mapped in.

... it doesn't end up calling handle_mm_fault() in faultin_page()?



Re: [Cluster-devel] [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

2022-02-17 Thread Theodore Ts'o
On Thu, Feb 17, 2022 at 05:06:45PM -0800, John Hubbard wrote:
> Yes. And looking at the pair of backtraces below, this looks very much
> like another aspect of the "get_user_pages problem" [1], originally
> described in Jan Kara's 2018 email [2].

Hmm... I just posted my analysis, which tracks with yours; but I had
forgotten about Jan's 2018 e-mail on the matter.

> I'm getting close to posting an RFC for the direct IO conversion to
> FOLL_PIN, but even after that, various parts of the kernel (reclaim,
> filesystems/block layer) still need to be changed so as to use
> page_maybe_dma_pinned() to help avoid this problem. There's a bit
> more than that, actually.

The challenge is that fixing this "the right way" is probably not
something we can backport into an LTS kernel, whether it's 5.15 or
5.10... or 4.19.

The only thing which can probably survive getting backported is
something like this.  It won't make the right thing happen if someone
is trying to RDMA or call process_vm_writev() into a file-backed
memory region --- but I'm not sure I care.  Certainly as far as
Android kernels are concerned, I'm pretty sure they aren't using RDMA,
and I suspect they are probably masking out the process_vm_writev(2)
system call (at least for Android O and newer).  So if the goal is
just to close some Syzbot bugs, what do folks think about this?

- Ted

commit 7711b1fda6f7f04274fa1cba6f092410262b0296
Author: Theodore Ts'o 
Date:   Thu Feb 17 22:54:03 2022 -0500

ext4: work around bugs in mm/gup.c that can cause ext4 to BUG()

[un]pin_user_pages_remote is dirtying pages without properly warning
the file system in advance (or faulting in the file data if the page
is not yet in the page cache).  This was noted by Jan Kara in 2018[1]
and more recently has resulted in bug reports by Syzbot in various
Android kernels[2].

Fixing this for real is non-trivial, and will never be backportable
into stable kernels.  So this is a simple workaround that stops the
kernel from BUG()'ing.  The changed pages will not be properly written
back, but given that the current gup code is missing the "read" in
"read-modify-write", the dirty page in the page cache might contain
corrupted data anyway.

[1] https://www.spinics.net/lists/linux-mm/msg142700.html
[2] https://lore.kernel.org/r/yg0m6ijcnmfas...@google.com

Reported-by: syzbot+d59332e2db681cf18f0318a06e994ebbb529a...@syzkaller.appspotmail.com
Reported-by: Lee Jones 
Signed-off-by: Theodore Ts'o 

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 01c9e4f743ba..3b2f336a90d1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1993,6 +1993,15 @@ static int ext4_writepage(struct page *page,
 	else
 		len = PAGE_SIZE;
 
+	/* Should never happen but for buggy gup code */
+	if (!page_has_buffers(page)) {
+		ext4_warning_inode(inode,
+			"page %lu does not have buffers attached", page->index);
+		ClearPageDirty(page);
+		unlock_page(page);
+		return 0;
+	}
+
 	page_bufs = page_buffers(page);
 	/*
 	 * We cannot do block allocation or other extent handling in this
@@ -2594,6 +2603,14 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			wait_on_page_writeback(page);
 			BUG_ON(PageWriteback(page));
 
+			/* Should never happen but for buggy gup code */
+			if (!page_has_buffers(page)) {
+				ext4_warning_inode(mpd->inode, "page %lu does not have buffers attached", page->index);
+				ClearPageDirty(page);
+				unlock_page(page);
+				continue;
+			}
+
 			if (mpd->map.m_len == 0)
 				mpd->first_page = page->index;
 			mpd->next_page = page->index + 1;



Re: [Cluster-devel] [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

2022-02-17 Thread Theodore Ts'o
On Wed, Feb 16, 2022 at 04:31:36PM +, Lee Jones wrote:
> 
> After recently receiving a bug report from Syzbot [0] which was raised
> specifically against the Android v5.10 kernel, I spent some time
> trying to get to the crux.  Firstly I reproduced the issue on the
> reported kernel, then did the same using the latest release kernel
> v5.16.
> 
> The full kernel trace can be seen below at [1].
> 

Lee, thanks for your work in trimming down the syzkaller reproducer.
The moral equivalent of what it is doing (except each system call is
done in a separate thread, with synchronization so each gets executed
in order, but perhaps on a different CPU) is:

int dest_fd, src_fd, truncate_fd, mmap_fd;
pid_t tid;
struct iovec local_iov, remote_iov;

dest_fd = open("./bus", O_RDWR|O_CREAT|O_NONBLOCK|O_SYNC|
	       O_DIRECT|O_LARGEFILE|O_NOATIME, 0);
src_fd = openat(AT_FDCWD, "/bin/bash", O_RDONLY);
truncate_fd = syscall(__NR_open, "./bus",
		      O_RDWR|O_CREAT|O_SYNC|O_NOATIME|O_ASYNC);
ftruncate(truncate_fd, 0x2008002ul);
mmap((void *) 0x2000ul /* addr */, 0x60ul /* length */,
     PROT_WRITE|PROT_EXEC|PROT_SEM|0x70, MAP_FIXED|MAP_SHARED,
     mmap_fd, 0);
sendfile(dest_fd, src_fd, 0 /* offset */, 0x8005ul /* count */);
local_iov.iov_base = (void *) 0x2034afa4;
local_iov.iov_len = 0x1f80;
remote_iov.iov_base = (void *) 0x2080;
remote_iov.iov_len = 0x2034afa5;
process_vm_writev(gettid(), &local_iov, 1, &remote_iov, 1, 0);
sendfile(dest_fd, src_fd, 0 /* offset */, 0x1f05ul /* count */);

Which is then executed repeatedly, over and over again.  (With the
file descriptors closed at each loop, so the file is reopened each time.)

So basically, we have a scratch file which is initialized via
sendfile() using O_DIRECT.  The scratch file is also mmap'ed into the
process's address space, and we then *modify* that mmap'ed region
using process_vm_writev() --- and this is where the problem starts.

process_vm_writev() uses [un]pin_user_pages_remote(), which is the same
interface used for RDMA.  But it's not clear this is ever supposed to
work for memory in an mmap'ed region backed by a file.
pin_user_pages_remote() appears to assume that it is an anonymous
region, since the get_user_pages functions in mm/gup.c don't call
readpage() to read data into any pages that might not be mapped in.

They also don't follow the mm / file system protocol of calling the
file system's write_begin() or page_mkwrite() before modifying a page,
and so when process_vm_writev() calls unpin_user_pages_remote(),
this tries to dirty the page; but since page_mkwrite() or
write_begin() hasn't been called, buffers haven't been attached to the
page, and so that triggers the following ext4 WARN_ON:

[ 1451.095939] WARNING: CPU: 1 PID: 449 at fs/ext4/inode.c:3565 ext4_set_page_dirty+0x48/0x50
  ...
[ 1451.139877] Call Trace:
[ 1451.140833]  
[ 1451.141889]  folio_mark_dirty+0x2f/0x60
[ 1451.143408]  set_page_dirty_lock+0x3e/0x60
[ 1451.145130]  unpin_user_pages_dirty_lock+0x108/0x130
[ 1451.147405]  process_vm_rw_single_vec.constprop.0+0x1b9/0x260
[ 1451.150135]  process_vm_rw_core.constprop.0+0x156/0x280
[ 1451.159576]  process_vm_rw+0xc4/0x110


Then when ext4_writepages() gets called, we trigger the BUG because
buffers haven't been attached to the page.  We can certainly avoid the
BUG by checking to see if buffers are attached first, and if not, skip
writing the page, trigger a WARN_ON, and then forcibly clear the
page's dirty flag so we don't permanently leak memory and the file
system can still be unmounted.  (Note: we can't fix the problem by
attaching the buffers in set_page_dirty(), since it is occasionally
called under spinlocks and without the page being locked, so we can't
do any kind of allocation --- so ix-nay on calling
create_empty_buffers(), which calls alloc_buffer_head().)

BUT, that is really papering over the problem, since it's not clear
it's valid to try to use get_user_pages() and friends (including
[un]pin_user_pages_remote()) on file-backed memory.

And if it is supposed to be valid, then mm/gup.c needs to first call
readpage() if the page is not present --- if process_vm_writev()
is only modifying a few bytes in the mmap'ed region, we need to fault
in the page first --- and then mm/gup.c needs to inform the file
system, to make sure that if the pinned memory region is not yet
allocated, whatever needs to happen to allocate space (via
page_mkwrite()) has taken place.  (And by the way, that means that
pin_user_pages_remote() may need to return ENOSPC if there is no free
space in the file system, and hence ENOSPC may need to be reflected
all the way back to process_vm_writev().)

Alternatively, if we don't expect process_vm_writev() to work on
file-backed memory, perhaps it and pin_user_pages_remote() should
return some kind of error?
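
As a strawman, the check could look something like the following in
the gup path.  FOLL_FILE_BACKED_OK is a flag I just invented for this
sketch; it does not exist:

	/* Refuse to pin file-backed pages for remote access unless
	 * the caller explicitly opts in. */
	if (vma->vm_file && (gup_flags & FOLL_PIN) &&
	    !(gup_flags & FOLL_FILE_BACKED_OK))
		return -EOPNOTSUPP;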

  - Ted



[Cluster-devel] [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

2022-02-16 Thread Lee Jones
Good afternoon,

After recently receiving a bug report from Syzbot [0] which was raised
specifically against the Android v5.10 kernel, I spent some time
trying to get to the crux.  Firstly I reproduced the issue on the
reported kernel, then did the same using the latest release kernel
v5.16.

The full kernel trace can be seen below at [1].

I managed to seemingly bisect the issue down to commit:

  60263d5889e6d ("iomap: fall back to buffered writes for invalidation failures")

Although it appears to be the belief of the Filesystem community that
this is likely not the cause of the issue and should therefore not be
reverted.

It took quite some time, but I managed to strip the reported
kernel config (which was very large) down to x86_defconfig plus only a
few additional config symbols.  Most of these are platform (qemu in
this case) config options; the other is KASAN, which is required to
successfully reproduce this:

  CONFIG_HYPERVISOR_GUEST=y  
  CONFIG_PARAVIRT=y  
  CONFIG_PARAVIRT_DEBUG=y
  CONFIG_PARAVIRT_SPINLOCKS=y
  CONFIG_KASAN=y

The (most likely non-optimised) qemu command currently being used is:

qemu-system-x86_64 -smp 8 -m 16G -enable-kvm -cpu max,migratable=off -no-reboot \
    -kernel ${BUILDDIR}/arch/x86/boot/bzImage -nographic \
    -hda ${IMAGEDIR}/wheezy-auto-repro.img \
    -chardev stdio,id=char0,mux=on,logfile=serial.out,signal=off \
    -serial chardev:char0 -mon chardev=char0 \
    -append "root=/dev/sda rw console=ttyS0"

Darrick seems to suggest that:

  "The BUG report came from page_buffers failing to find any buffer heads
   attached to the page."

If the reproducer, also massively stripped down from the original
report, would be of any use to you, it can be found further down at
[2].

I don't know how true this is, but it is my current belief that
user-space should not be able to force the kernel to BUG.  This seems
to be a temporary DoS issue.  So although it is not a critically
serious security problem involving memory leakage or data corruption,
it could potentially cause a nuisance if not rectified.

Any well meaning help with this would be gratefully received.

Kind regards,
Lee

[0] https://syzkaller.appspot.com/bug?extid=41c966bf0729a530bd8d

[1]
[   15.200920] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
[   15.215877] File: /syzkaller.IsS3Yc/0/bus PID: 1497 Comm: repro
[   16.718970] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
[   16.734250] File: /syzkaller.IsS3Yc/5/bus PID: 1512 Comm: repro
[   17.013871] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
[   17.028193] File: /syzkaller.IsS3Yc/6/bus PID: 1515 Comm: repro
[   17.320498] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
[   17.336115] File: /syzkaller.IsS3Yc/7/bus PID: 1518 Comm: repro
[   17.617921] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
[   17.633063] File: /syzkaller.IsS3Yc/8/bus PID: 1521 Comm: repro
[   18.527260] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
[   18.544236] File: /syzkaller.IsS3Yc/11/bus PID: 1530 Comm: repro
[   18.810347] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
[   18.824721] File: /syzkaller.IsS3Yc/12/bus PID: 1533 Comm: repro
[   19.099315] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
[   19.114151] File: /syzkaller.IsS3Yc/13/bus PID: 1536 Comm: repro
[   19.403882] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
[   19.418467] File: /syzkaller.IsS3Yc/14/bus PID: 1539 Comm: repro
[   19.703934] Page cache invalidation failure on direct I/O.  Possible data corruption due to collision with buffered I/O!
[   19.718400] File: /syzkaller.IsS3Yc/15/bus PID: 1542 Comm: repro
[   26.533129] [ cut here ]
[   26.540473] WARNING: CPU: 1 PID: 1612 at fs/ext4/inode.c:3576 ext4_set_page_dirty+0xaf/0xc0
[   26.553171] Modules linked in:
[   26.557354] CPU: 1 PID: 1612 Comm: repro Not tainted 5.16.0+ #169
[   26.565238] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   26.576182] RIP: 0010:ext4_set_page_dirty+0xaf/0xc0
[   26.583077] Code: 4c 89 ff e8 e3 86 e7 ff 49 f7 07 00 20 00 00 74 19 4c 89 ff 5b 41 5e 41 5f e9 8d 05 f0 ff 48 83 c0 ff 48 89 c3 e9 76 ff ff ff <0f> 0b eb e3 48 83 c0 ff 48 89 c3 eb 9e 0f 0b eb b8 55 48 89 e5 41
[   26.607402] RSP: 0018:88810f4ffa10 EFLAGS: