On Mon, 2022-11-21 at 08:35 -0500, Benjamin Coddington wrote:
> Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
> GFP_NOIO flag on sk_allocation which the networking system uses to decide
> when it is safe to use current->task_frag.  The result is unexpected
> corruption in task_frag when SUNRPC is involved in memory reclaim.
> 
> The corruption can be seen in crashes, but the root cause is often
> difficult to ascertain as a crashing machine's stack trace will have no
> evidence of being near NFS or SUNRPC code.  I believe this problem to
> be much more pervasive than reports to the community may indicate.
> 
> Fix this by having kernel users of sockets that may corrupt task_frag due
> to reclaim set sk_use_task_frag = false.  Preemptively correcting this
> situation for users that still set sk_allocation allows them to convert to
> memalloc_nofs_save/restore without the same unexpected corruptions that are
> sure to follow, unlikely to show up in testing, and difficult to bisect.
> 
> CC: Philipp Reisner <[email protected]>
> CC: Lars Ellenberg <[email protected]>
> CC: "Christoph Böhmwalder" <[email protected]>
> CC: Jens Axboe <[email protected]>
> CC: Josef Bacik <[email protected]>
> CC: Keith Busch <[email protected]>
> CC: Christoph Hellwig <[email protected]>
> CC: Sagi Grimberg <[email protected]>
> CC: Lee Duncan <[email protected]>
> CC: Chris Leech <[email protected]>
> CC: Mike Christie <[email protected]>
> CC: "James E.J. Bottomley" <[email protected]>
> CC: "Martin K. Petersen" <[email protected]>
> CC: Valentina Manea <[email protected]>
> CC: Shuah Khan <[email protected]>
> CC: Greg Kroah-Hartman <[email protected]>
> CC: David Howells <[email protected]>
> CC: Marc Dionne <[email protected]>
> CC: Steve French <[email protected]>
> CC: Christine Caulfield <[email protected]>
> CC: David Teigland <[email protected]>
> CC: Mark Fasheh <[email protected]>
> CC: Joel Becker <[email protected]>
> CC: Joseph Qi <[email protected]>
> CC: Eric Van Hensbergen <[email protected]>
> CC: Latchesar Ionkov <[email protected]>
> CC: Dominique Martinet <[email protected]>
> CC: "David S. Miller" <[email protected]>
> CC: Eric Dumazet <[email protected]>
> CC: Jakub Kicinski <[email protected]>
> CC: Paolo Abeni <[email protected]>
> CC: Ilya Dryomov <[email protected]>
> CC: Xiubo Li <[email protected]>
> CC: Chuck Lever <[email protected]>
> CC: Jeff Layton <[email protected]>
> CC: Trond Myklebust <[email protected]>
> CC: Anna Schumaker <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> 
> Suggested-by: Guillaume Nault <[email protected]>
> Signed-off-by: Benjamin Coddington <[email protected]>
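
For readers following along, the fix described in the quoted commit message can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the patch itself. The field name sk_use_task_frag and the per-task task_frag come from the commit message. The struct names, the per-socket sk_frag field as the fallback, the helper names, and the idea that the choice reduces to a single boolean check are assumptions made for illustration.

```c
/* Userspace model only: simplified stand-ins, NOT the kernel's definitions. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's page fragment cache. */
struct page_frag_model {
    int bytes_used;
};

/* Hypothetical stand-in for the running task (the kernel's "current"). */
struct task_model {
    struct page_frag_model task_frag;   /* per-task fragment cache */
};

/* Hypothetical stand-in for struct sock. */
struct sock_model {
    bool sk_use_task_frag;              /* flag named in the commit message */
    struct page_frag_model sk_frag;     /* assumed per-socket fallback cache */
};

/* Single global task, standing in for "current" in this model. */
struct task_model current_task;

/*
 * Assumed selection rule: use the per-task cache only when the socket
 * explicitly allows it, otherwise use the socket's own cache.
 */
struct page_frag_model *pick_page_frag(struct sock_model *sk)
{
    if (sk->sk_use_task_frag)
        return &current_task.task_frag;
    return &sk->sk_frag;
}

/*
 * What a kernel user of sockets that can be entered from memory reclaim
 * would do under the proposed fix: opt out of the per-task cache.
 */
void init_reclaim_safe_socket(struct sock_model *sk)
{
    sk->sk_use_task_frag = false;
    sk->sk_frag.bytes_used = 0;
}

/* Small self-check: an opted-out socket never touches the per-task cache. */
void model_self_check(void)
{
    struct sock_model sk;

    init_reclaim_safe_socket(&sk);
    assert(pick_page_frag(&sk) == &sk.sk_frag);
}
```

In this model, a socket that has opted out always gets its own fragment cache, so a send that happens during reclaim cannot disturb a fragment the interrupted task was in the middle of using.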

I think this is the most feasible way out of the existing issue, and I
think this patchset should go via the networking tree, targeting
Linux 6.2.

If anyone disagrees with the above, please speak up!

Thanks,

Paolo
