Hi Ross, On Tue, 2 Apr 2019 at 00:59, Andreas Gruenbacher <agrue...@redhat.com> wrote: > thanks for the great analysis. > > On Tue, 26 Mar 2019 at 20:14, Bob Peterson <rpete...@redhat.com> wrote: > > ----- Original Message ----- > > > 6. gfs2_log_flush() continues and calls revoke_lo_after_commit() and > > > uses the freed glock (stack trace above). > > > > > > Should evict_inode call gfs2_ail_flush() or something so that the revoke > > > is written before it drops its reference to the glock? > > > > > > Or is there something else that is meant to keep the glock around if the > > > inode is evicted while there is a linked gfs2_bufdata sitting on some > > > AIL list? > > > > > > Let me know if any more specific information is needed since I now have > > > a test setup that can usually reproduce this within 10 minutes. > > > > Very interesting. > > > > It's not unusual for glocks to outlive their inodes, but I'm not sure > > what the right answer is in this case. Either the revoke should > > take an additional reference to the glock, and not let go until the > > revoke is written by some log flush, or else the evict needs to do the > > log flush to make sure the revoke is committed. But we've had issues with > > evict in the past, so we need to be careful about how we fix it. > > Andreas and I will look into the best way to fix it. Thanks again for your > > help. > > Usually, glocks are attached to inodes or other objects. When the > state of the glock changes, the go_lock, go_sync, and go_inval > operations determine what happens to the attached object. In > gfs2_evict_inode, we disassociate the inode from the glock, do the > necessary cleanup work locally, and put the glock asynchronously when > needed, though. We do that in PF_MEMALLOC context to avoid > deadlocking; see commit 71c1b2136835 ("gfs2: gfs2_evict_inode: Put > glocks asynchronously"). Even if we didn't do that, there could still > be other glock holders like the glock state machine. There is no > guarantee that the glock will go away anytime soon, or actually at > all: it could get reused by another instance of the same inode. So > waiting for the glock to go away first in gfs2_evict_inode definitely > is not an option. > > This basically answers your above question: gfs2_evict_inode should > definitely clean up before putting the glock. I'm just not sure how to > best achieve that, yet.
after more discussions, Bob convinced me that it makes perfect sense to not write out outstanding revokes in gfs2_evict_inode. We really want to delay writing revokes as long as possible so that as many of the revokes as possible will go away (either because the blocks are re-added to the journal and the revokes are "unrevoked", or because the journal wraps around). So the right fix here is indeed taking additional glock references. I've come up with a patch that takes one additional reference for each glock instead of one reference for each bufdata object; that should scale better. I'll post that and a follow-up patch separately. Could you please verify? Thanks, Andreas