On Thu, Jun 5, 2014 at 10:46 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
> To summarize, the real "problems" are:
>
> - Deref of pl_inode->refkeeper as an inode_t in the cleanup logger. It
>   is an internal artifact of pl_update_refkeeper()'s working, and nobody
>   else is supposed to "touch" it.
>
>   For this, the solution I have in mind is to
>   a. have a placeholder for the gfid in the pl_inode_t object,
>   b. remember the gfid of the inode at the time of pl_inode_t creation
>      in pl_ctx_get(), and
>   c. print pl_inode->gfid in the log message in
>      pl_{inodelk,entrylk}_log_cleanup().

OK.

> - Missing call to pl_update_refkeeper() in client_disconnect_cbk(). This
>   can result in a potential leak of an inode ref (not a leak if the same
>   inode sees locking activity from another client in the future).
>
>   As far as updates to the refkeeper during DISCONNECT are concerned,
>   Pranith and I did have discussions at length but could not find a
>   single safe place in the cleanup functions to call
>   pl_update_refkeeper() that does not lead to illegal memory access,
>   whether before or after the call to __pl_inodelk_unref() in the third
>   for loop.
>
> Case #1 explained:
> =============
>
> <snip>
> list_for_each_entry_safe() {
>         ...
>         ...
>         pl_update_refkeeper (inode);
>
>         pthread_mutex_lock (&pl_inode->mutex);
>         __pl_inodelk_unref (l);
>         pthread_mutex_unlock (&pl_inode->mutex);
> }
> </snip>
>
> This causes the last unref in pl_update_refkeeper() to implicitly call
> pl_forget(), where the pl_inode_t object is destroyed while it is still
> needed: pl_inode->mutex must still be acquired before unrefing the lock
> object.
>
> Case #2 explained:
> =============
>
> <snip>
> list_for_each_entry_safe() {
>         ...
>         ...
>         pthread_mutex_lock (&pl_inode->mutex);
>         __pl_inodelk_unref (l);
>         pthread_mutex_unlock (&pl_inode->mutex);
>
>         pl_update_refkeeper (inode);
> }
> </snip>
>
> After the first for loop is processed, the domain's lists could have
> been emptied. And at this point a second thread could well come along
> and update the refkeeper, triggering the last unref on the inode and a
> call to pl_forget() (where the pl_inode_t is freed), after which the
> DISCONNECT thread would be performing illegal memory access in its own
> call to pl_update_refkeeper() during its turn.
>
> So the solution Pranith and I came up with involves making sure that
> the inode_t object stays alive for as long as there is at least one
> lock on it:
>
> 1. refkeeper will not be used for inodelks and entrylks (but will
>    continue to be used for fcntl locks).
> 2. Each pl_inode_t object will also host an inode_t member, which is
>    initialised during a call to pl_inode_get() in
>    pl_common_{inodelk,entrylk}().
> 3. Every time a lock is granted/blocked, pl_inode->inode is ref'd (in
>    pl_inode_setlk(), after successful locking).
> 4. Every time a lock is unlocked, pl_inode->inode is unref'd.
> 5. In the disconnect codepath, as part of "released" list processing,
>    the inodes associated with the client are unref'd in the same loop,
>    right after every lock object is unref'd.
>
> With all this, there is one problem with a lock object that is found in
> both the blocked and the granted list in the transient state, when
> there's a race between the unlocking thread and the disconnecting
> thread. This is best explained with the following example:
>
> <example>
> Consider an inode I on which there is a granted lock G and a blocked
> lock B, from clients C1 and C2 respectively. With this approach, at
> this point the number of refs taken by the locks xlator on I would
> be 2.
> C1 unlocks G, at which point I's refcount becomes 1.
> Now, as part of granting other blocked locks in the unlock codepath, B
> is put in the granted list.
> Now B's client C2 sends a DISCONNECT event, which would cause I to be
> unref'd again.
> This being the last unref, pl_forget() is called on I, causing its
> pl_inode to be destroyed.
> At this point, the unlocking thread could try to lock pl_inode->mutex
> in grant_blocked_{inode,entry}_locks() (whereas pl_inode is by now
> already destroyed), leading to a crash.
> </example>
>
> The problem described in the example can be fixed by making sure
> grant_blocked_{inode,entry}_locks() refs the inode first thing, and
> then unrefs it just before returning.
> This would fix the illegal memory access.

This sounds a bit complicated. I think there is a much simpler solution:

- First, make update_refkeeper() check for blocked locks (which I
  mentioned as "optional" previously).

- Make grant_blocked_locks() double up and do the job of
  update_refkeeper() internally.

Something which looks like this:

diff --git a/xlators/features/locks/src/common.c b/xlators/features/locks/src/common.c
index f6c71c1..38df385 100644
--- a/xlators/features/locks/src/common.c
+++ b/xlators/features/locks/src/common.c
@@ -126,8 +126,14 @@ __pl_inode_is_empty (pl_inode_t *pl_inode)
                 if (!list_empty (&dom->entrylk_list))
                         is_empty = 0;

+                if (!list_empty (&dom->blocked_entrylks))
+                        is_empty = 0;
+
                 if (!list_empty (&dom->inodelk_list))
                         is_empty = 0;
+
+                if (!list_empty (&dom->blocked_inodelks))
+                        is_empty = 0;
         }

         return is_empty;
@@ -944,12 +950,18 @@ grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode)
         struct list_head  granted_list;
         posix_lock_t     *tmp = NULL;
         posix_lock_t     *lock = NULL;
+        inode_t          *unref = NULL;

         INIT_LIST_HEAD (&granted_list);

         pthread_mutex_lock (&pl_inode->mutex);
         {
                 __grant_blocked_locks (this, pl_inode, &granted_list);
+
+                if (__pl_inode_is_empty (pl_inode) && pl_inode->refkeeper) {
+                        unref = pl_inode->refkeeper;
+                        pl_inode->refkeeper = NULL;
+                }
         }
         pthread_mutex_unlock (&pl_inode->mutex);

@@ -965,6 +977,9 @@ grant_blocked_locks (xlator_t *this, pl_inode_t *pl_inode)
                 GF_FREE (lock);
         }

+        if (unref)
+                inode_unref (unref);
+
         return;
 }

This should make pl_disconnect_cbk() pretty much race free w.r.t.
refkeeper. Thoughts?
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel