+ Ravi, Anuradha
On 08/17/2015 10:39 AM, Raghavendra Gowdappa wrote:
All,
Pranith and I were discussing the implementation of compound operations like "create + lock",
"mkdir + lock", "open + lock", etc. These operations are useful in situations like:
1. To avoid locking all subvols during directory creation as part of self-heal
in dht. Currently both rmdir and lookup-heal follow the approach of locking
_all_ subvols [1].
2. To lock a file in advance so that afr transactions take less of a
performance hit.
While thinking about implementing such compound operations, it occurred to me
that one of the problems would be how we handle a racing mkdir/create and a
(named lookup - simply referred to as lookup from now on - followed by a lock).
This is because,
1. creation of the directory/file on the backend
2. linking of the inode with the gfid corresponding to that file/directory
are not atomic. There is no guarantee that the inode passed down during the
mkdir/create call is the one that survives in the inode table. Since the
posix-locks xlator maintains all of its lock state in the inode, it would be a
problem if an inode other than the one passed during mkdir/create gets linked
in the inode table. One way to solve this problem is to serialize fops (like
mkdir/create, lookup, rename, rmdir, unlink) happening on a particular dentry.
This serialization would also solve other bugs like:
1. issues solved by [2][3] and possibly many such issues.
2. Stale dentries left behind in bricks' inode tables because of a lookup
racing with dentry-modification ops (like rmdir, unlink, rename, etc.).
The initial idea I have now is to track fops in progress on a dentry in the
parent inode (maybe in the resolver code in protocol/server). Based on this we
can serialize the operations. Since we need to serialize _only_ operations on a
dentry (we don't serialize nameless lookups), a parent inode is guaranteed to
be available. Any comments/discussion on this would be appreciated.
[1] http://review.gluster.org/11725
[2] http://review.gluster.org/9913
[3] http://review.gluster.org/5240
regards,
Raghavendra.
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel