On Fri, Feb 11, 2022 at 8:48 PM Linus Torvalds <[email protected]> wrote:
> On Fri, Feb 11, 2022 at 9:05 AM Andreas Gruenbacher <[email protected]>
> wrote:
> >
> > * Revert debug commit that causes unexpected data corruption.
>
> Well, apparently not just unexpected, but unexplained too.
>
> That's a bit worrisome. It sounds like the corruption cause is still
> there, just hidden by the lack of __cond_resched()?

Yes, that's what it looks like. My initial suspicion was that we're
using gfs2_glock_dq() somewhere in a non-sleepable context, in a case
where we know that we're not dropping the last reference and so
gfs2_glock_dq() won't sleep. But there's no such instance in the code,
and testing would also have revealed such cases.

The corruption we've seen always affects whole pages/blocks. Maybe it's
an ordering / memory barrier issue.

Thanks,
Andreas
