On Mon, 2022-11-21 at 08:35 -0500, Benjamin Coddington wrote:
> Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
> GFP_NOIO flag on sk_allocation which the networking system uses to decide
> when it is safe to use current->task_frag. The results of this are
> unexpected corruption in task_frag when SUNRPC is involved in memory
> reclaim.
>
> The corruption can be seen in crashes, but the root cause is often
> difficult to ascertain as a crashing machine's stack trace will have no
> evidence of being near NFS or SUNRPC code. I believe this problem to
> be much more pervasive than reports to the community may indicate.
>
> Fix this by having kernel users of sockets that may corrupt task_frag due
> to reclaim set sk_use_task_frag = false. Preemptively correcting this
> situation for users that still set sk_allocation allows them to convert to
> memalloc_nofs_save/restore without the same unexpected corruptions that are
> sure to follow, unlikely to show up in testing, and difficult to bisect.
>
> CC: Philipp Reisner <philipp.reis...@linbit.com>
> CC: Lars Ellenberg <lars.ellenb...@linbit.com>
> CC: "Christoph Böhmwalder" <christoph.boehmwal...@linbit.com>
> CC: Jens Axboe <ax...@kernel.dk>
> CC: Josef Bacik <jo...@toxicpanda.com>
> CC: Keith Busch <kbu...@kernel.org>
> CC: Christoph Hellwig <h...@lst.de>
> CC: Sagi Grimberg <s...@grimberg.me>
> CC: Lee Duncan <ldun...@suse.com>
> CC: Chris Leech <cle...@redhat.com>
> CC: Mike Christie <michael.chris...@oracle.com>
> CC: "James E.J. Bottomley" <j...@linux.ibm.com>
> CC: "Martin K. Petersen" <martin.peter...@oracle.com>
> CC: Valentina Manea <valentina.mane...@gmail.com>
> CC: Shuah Khan <sh...@kernel.org>
> CC: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> CC: David Howells <dhowe...@redhat.com>
> CC: Marc Dionne <marc.dio...@auristor.com>
> CC: Steve French <sfre...@samba.org>
> CC: Christine Caulfield <ccaul...@redhat.com>
> CC: David Teigland <teigl...@redhat.com>
> CC: Mark Fasheh <m...@fasheh.com>
> CC: Joel Becker <jl...@evilplan.org>
> CC: Joseph Qi <joseph...@linux.alibaba.com>
> CC: Eric Van Hensbergen <eri...@gmail.com>
> CC: Latchesar Ionkov <lu...@ionkov.net>
> CC: Dominique Martinet <asmad...@codewreck.org>
> CC: "David S. Miller" <da...@davemloft.net>
> CC: Eric Dumazet <eduma...@google.com>
> CC: Jakub Kicinski <k...@kernel.org>
> CC: Paolo Abeni <pab...@redhat.com>
> CC: Ilya Dryomov <idryo...@gmail.com>
> CC: Xiubo Li <xiu...@redhat.com>
> CC: Chuck Lever <chuck.le...@oracle.com>
> CC: Jeff Layton <jlay...@kernel.org>
> CC: Trond Myklebust <trond.mykleb...@hammerspace.com>
> CC: Anna Schumaker <a...@kernel.org>
> CC: drbd-...@lists.linbit.com
> CC: linux-bl...@vger.kernel.org
> CC: linux-ker...@vger.kernel.org
> CC: n...@other.debian.org
> CC: linux-n...@lists.infradead.org
> CC: open-is...@googlegroups.com
> CC: linux-s...@vger.kernel.org
> CC: linux-...@vger.kernel.org
> CC: linux-...@lists.infradead.org
> CC: linux-c...@vger.kernel.org
> CC: samba-techni...@lists.samba.org
> CC: cluster-devel@redhat.com
> CC: ocfs2-de...@oss.oracle.com
> CC: v9fs-develo...@lists.sourceforge.net
> CC: net...@vger.kernel.org
> CC: ceph-de...@vger.kernel.org
> CC: linux-...@vger.kernel.org
>
> Suggested-by: Guillaume Nault <gna...@redhat.com>
> Signed-off-by: Benjamin Coddington <bcodd...@redhat.com>
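To make the failure mode in the quoted description concrete, here is a minimal userspace model in C. It is a simplified sketch under stated assumptions, not kernel code: `struct sock`, `task_frag`, `model_sk_page_frag()`, `model_send()` and `reclaim_data_survives()` are hypothetical stand-ins, with `model_sk_page_frag()` loosely mirroring how the kernel's `sk_page_frag()` chooses between `current->task_frag` and a per-socket fragment, and the `sk_use_task_frag` field modelling the flag the patch clears. It assumes the corruption arises when a send on one socket has picked an offset in the shared per-task fragment and then enters memory reclaim, which sends on another socket (for example an NFS transport) using that same fragment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model: none of these names are kernel APIs. */
struct page_frag {
	char buf[64];
	size_t offset;			/* next free byte in buf */
};

struct sock {
	bool sk_use_task_frag;		/* models the flag the patch clears */
	struct page_frag sk_frag;	/* private per-socket fragment */
};

/* Stands in for current->task_frag: one fragment shared by every
 * socket send issued by the running task. */
static struct page_frag task_frag;

/* Loose model of sk_page_frag(): per-task fragment if allowed,
 * otherwise the socket's own fragment. */
static struct page_frag *model_sk_page_frag(struct sock *sk)
{
	return sk->sk_use_task_frag ? &task_frag : &sk->sk_frag;
}

/* Append data to sk's fragment. If reclaim_sk is non-NULL, simulate
 * memory reclaim kicking in after the outer send has chosen its slot
 * but before it copies: reclaim sends reclaim_data on reclaim_sk. */
static void model_send(struct sock *sk, const char *data,
		       struct sock *reclaim_sk, const char *reclaim_data)
{
	struct page_frag *pf = model_sk_page_frag(sk);
	size_t off = pf->offset;	/* slot chosen by this send */
	size_t len = strlen(data);

	assert(off + len <= sizeof(pf->buf));
	if (reclaim_sk)			/* nested send during "reclaim" */
		model_send(reclaim_sk, reclaim_data, NULL, NULL);
	memcpy(pf->buf + off, data, len);
	pf->offset = off + len;
}

/* Returns true if the bytes written by the reclaim path are intact
 * after the outer send completes. */
static bool reclaim_data_survives(bool reclaim_sock_uses_task_frag)
{
	struct sock app = { .sk_use_task_frag = true };
	struct sock rpc = { .sk_use_task_frag = reclaim_sock_uses_task_frag };
	struct page_frag *rpc_pf;

	memset(&task_frag, 0, sizeof(task_frag));
	model_send(&app, "AAAA", &rpc, "BBBB");
	rpc_pf = model_sk_page_frag(&rpc);
	return memcmp(rpc_pf->buf, "BBBB", 4) == 0;
}
```

With `reclaim_data_survives(true)` both sockets share the per-task fragment, so the outer send overwrites the bytes the nested reclaim send wrote at the same offset and the function returns false. With `reclaim_data_survives(false)` the reclaim socket writes into its own fragment and its data survives, which is the effect the quoted patch aims for by setting `sk_use_task_frag = false` on sockets that may be used during reclaim.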
I think this is the most feasible way out of the existing issue, and I think
this patchset should go via the networking tree, targeting Linux 6.2.

If anyone disagrees with the above, please speak up!

Thanks,

Paolo