On Mon, 2022-11-21 at 08:35 -0500, Benjamin Coddington wrote:
> Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
> GFP_NOIO flag on sk_allocation which the networking system uses to decide
> when it is safe to use current->task_frag. The results of this are
> unexpected corruption in task_frag when SUNRPC is involved in memory
> reclaim.
>
> The corruption can be seen in crashes, but the root cause is often
> difficult to ascertain as a crashing machine's stack trace will have no
> evidence of being near NFS or SUNRPC code. I believe this problem to
> be much more pervasive than reports to the community may indicate.
>
> Fix this by having kernel users of sockets that may corrupt task_frag due
> to reclaim set sk_use_task_frag = false. Preemptively correcting this
> situation for users that still set sk_allocation allows them to convert to
> memalloc_nofs_save/restore without the same unexpected corruptions that are
> sure to follow, unlikely to show up in testing, and difficult to bisect.
>
> CC: Philipp Reisner <[email protected]>
> CC: Lars Ellenberg <[email protected]>
> CC: "Christoph Böhmwalder" <[email protected]>
> CC: Jens Axboe <[email protected]>
> CC: Josef Bacik <[email protected]>
> CC: Keith Busch <[email protected]>
> CC: Christoph Hellwig <[email protected]>
> CC: Sagi Grimberg <[email protected]>
> CC: Lee Duncan <[email protected]>
> CC: Chris Leech <[email protected]>
> CC: Mike Christie <[email protected]>
> CC: "James E.J. Bottomley" <[email protected]>
> CC: "Martin K. Petersen" <[email protected]>
> CC: Valentina Manea <[email protected]>
> CC: Shuah Khan <[email protected]>
> CC: Greg Kroah-Hartman <[email protected]>
> CC: David Howells <[email protected]>
> CC: Marc Dionne <[email protected]>
> CC: Steve French <[email protected]>
> CC: Christine Caulfield <[email protected]>
> CC: David Teigland <[email protected]>
> CC: Mark Fasheh <[email protected]>
> CC: Joel Becker <[email protected]>
> CC: Joseph Qi <[email protected]>
> CC: Eric Van Hensbergen <[email protected]>
> CC: Latchesar Ionkov <[email protected]>
> CC: Dominique Martinet <[email protected]>
> CC: "David S. Miller" <[email protected]>
> CC: Eric Dumazet <[email protected]>
> CC: Jakub Kicinski <[email protected]>
> CC: Paolo Abeni <[email protected]>
> CC: Ilya Dryomov <[email protected]>
> CC: Xiubo Li <[email protected]>
> CC: Chuck Lever <[email protected]>
> CC: Jeff Layton <[email protected]>
> CC: Trond Myklebust <[email protected]>
> CC: Anna Schumaker <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
>
> Suggested-by: Guillaume Nault <[email protected]>
> Signed-off-by: Benjamin Coddington <[email protected]>

I think this is the most feasible way out of the existing issue, and I
think this patchset should go via the networking tree, targeting
Linux 6.2.

If anyone disagrees with the above, please speak up!

Thanks,

Paolo
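
For readers following the thread, the fix described in the quoted patch can be modelled roughly as below. This is a userspace sketch under stated assumptions, not the kernel code: the struct names, the sk_frag member, and the helpers pick_frag() and mark_reclaim_socket() are hypothetical stand-ins. Only sk_use_task_frag and task_frag are names taken from the patch description. The idea is that the choice of page fragment depends on an explicit per-socket flag instead of being inferred from sk_allocation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, far smaller than the kernel structures. */
struct page_frag_model {
	size_t offset;
};

struct task_model {
	struct page_frag_model task_frag;	/* models current->task_frag */
};

struct sock_model {
	bool sk_use_task_frag;			/* flag named in the patch description */
	struct page_frag_model sk_frag;		/* assumed per-socket fragment */
};

/*
 * Choose the fragment for a send. The decision reads the explicit flag,
 * so it no longer depends on which GFP flags the socket owner set.
 */
static struct page_frag_model *pick_frag(struct sock_model *sk,
					 struct task_model *cur)
{
	return sk->sk_use_task_frag ? &cur->task_frag : &sk->sk_frag;
}

/*
 * What a kernel socket user that can run during memory reclaim would do
 * at socket setup: opt out of the shared per-task fragment.
 */
static void mark_reclaim_socket(struct sock_model *sk)
{
	sk->sk_use_task_frag = false;
}
```

In this model, a socket that has been passed to mark_reclaim_socket() always gets its own fragment from pick_frag(), so a send issued from reclaim cannot touch a per-task fragment that an interrupted send in the same task is still using.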
