+ Ravi, Anuradha

On 08/17/2015 10:39 AM, Raghavendra Gowdappa wrote:
All,

Pranith and I were discussing the implementation of compound operations like "create + lock", 
"mkdir + lock", "open + lock" etc. These operations are useful in situations like the 
following (a rough sketch of the idea follows the list below):

1. To avoid locking all subvols during directory creation as part of self 
heal in dht. Currently we follow the approach of locking _all_ subvols in 
both rmdir and lookup-heal [1].
2. To lock a file in advance so that there is less of a performance hit during 
transactions in afr.
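
To make that concrete, here is a minimal sketch of what a "create + lock" 
compound request could look like. All names below (compound_req_t, OP_INODELK, 
etc.) are hypothetical illustrations, not an existing GlusterFS API; the point 
is only that both fops travel in one request and execute back to back on the 
brick instead of as two separate round trips.

#include <stdio.h>

/* Hypothetical compound-request descriptor; none of these names exist
 * in GlusterFS. It only sketches two fops sent in one round trip. */
typedef enum { OP_CREATE = 0, OP_MKDIR, OP_OPEN, OP_INODELK } op_t;

typedef struct {
        op_t        ops[2];   /* executed in order on the brick */
        const char *path;     /* target of both operations */
} compound_req_t;

int
main (void)
{
        /* "create + lock" in a single round trip, instead of a create
         * followed by a separate lock request that can race */
        compound_req_t req = { { OP_CREATE, OP_INODELK }, "/dir/file" };
        printf ("compound on %s: op[0]=%d op[1]=%d\n",
                req.path, req.ops[0], req.ops[1]);
        return 0;
}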

While thinking about implementing such compound operations, it occurred to me 
that one of the problems would be how to handle a racing mkdir/create and a 
named lookup (simply referred to as lookup from now on) followed by a lock. This 
is because
1. creation of the directory/file on the backend
2. linking of the inode with the gfid corresponding to that file/directory

are not atomic. It is not guaranteed that the inode passed down during the 
mkdir/create call is the one that survives in the inode table. Since the 
posix-locks xlator maintains all of its lock state in the inode, it would be a 
problem if a different inode than the one passed during mkdir/create gets 
linked in the inode table (a toy model of this race follows the list below). 
One way to solve this problem is to serialize fops (like mkdir/create, lookup, 
rename, rmdir, unlink) that operate on a particular dentry. This serialization 
would also solve other bugs, like:

1. Issues solved by [2] and [3], and possibly many similar issues.
2. Stale dentries left behind in the bricks' inode tables because of a racing 
lookup and dentry-modification ops (like rmdir, unlink, rename, etc.).
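
Here is the promised toy model of the race, written with plain pthreads and 
simplified stand-in types; nothing in it is actual GlusterFS code. A mkdir 
path and a lookup path each construct an inode for the same gfid and try to 
link it into a one-slot "inode table". Whichever thread links second gets back 
the other thread's inode, so lock state attached to mkdir's inode can end up 
on an inode that nothing references anymore:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins; nothing here is actual GlusterFS code. */
typedef struct inode {
        int gfid;           /* toy gfid */
        int lock_count;     /* stand-in for posix-locks state */
} inode_t;

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static inode_t *table_slot;  /* one-slot "inode table" for gfid 42 */

/* Link @in into the table; return whichever inode survives there. */
static inode_t *
inode_link (inode_t *in)
{
        inode_t *survivor;

        pthread_mutex_lock (&table_lock);
        if (table_slot == NULL)
                table_slot = in;        /* this thread won the race */
        survivor = table_slot;
        pthread_mutex_unlock (&table_lock);
        return survivor;
}

static void *
mkdir_path (void *arg)
{
        (void) arg;
        inode_t *in = calloc (1, sizeof (*in));
        in->gfid = 42;
        in->lock_count = 1;             /* lock state lives on this inode */
        /* step 1 (backend create) already done; step 2 is the link */
        inode_t *survivor = inode_link (in);
        if (survivor != in)
                printf ("mkdir lost: survivor has lock_count=%d\n",
                        survivor->lock_count);
        return NULL;
}

static void *
lookup_path (void *arg)
{
        (void) arg;
        inode_t *in = calloc (1, sizeof (*in));
        in->gfid = 42;                  /* lookup saw the new entry on disk */
        inode_link (in);
        return NULL;
}

int
main (void)
{
        pthread_t t1, t2;

        pthread_create (&t1, NULL, mkdir_path, NULL);
        pthread_create (&t2, NULL, lookup_path, NULL);
        pthread_join (t1, NULL);
        pthread_join (t2, NULL);
        return 0;
}

Whether the message prints depends on the thread interleaving, which is 
exactly the problem: correctness should not depend on which inode wins the 
race.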

The initial idea I have now is to track in-progress fops on a dentry in the 
parent inode (maybe in the resolver code in protocol/server). Based on this we 
can serialize the operations. Since we need to serialize _only_ operations on a 
dentry (we don't serialize nameless lookups), a parent inode is guaranteed to 
always be available (a minimal sketch of this follows). Any comments/discussion 
on this would be appreciated.
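
Below is a minimal sketch of that serialization, assuming a simplified 
parent-inode structure; parent_inode_t, dentry_op_begin and dentry_op_end are 
hypothetical names, not existing GlusterFS code. The idea is that the resolver 
would call dentry_op_begin before executing a dentry fop and dentry_op_end 
once the inode is linked:

#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_INFLIGHT 8          /* fixed size only to keep the sketch short */
#define NAME_MAX_LEN 256

/* Simplified stand-in for the parent inode's serialization state. */
typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        char inflight[MAX_INFLIGHT][NAME_MAX_LEN]; /* basenames in progress */
        int  count;
} parent_inode_t;

static int
name_in_flight (parent_inode_t *parent, const char *name)
{
        for (int i = 0; i < parent->count; i++)
                if (strcmp (parent->inflight[i], name) == 0)
                        return 1;
        return 0;
}

/* Block until no other fop is in progress on @name under @parent,
 * then record @name as in flight. */
void
dentry_op_begin (parent_inode_t *parent, const char *name)
{
        pthread_mutex_lock (&parent->lock);
        while (name_in_flight (parent, name))
                pthread_cond_wait (&parent->cond, &parent->lock);
        assert (parent->count < MAX_INFLIGHT);
        snprintf (parent->inflight[parent->count++], NAME_MAX_LEN,
                  "%s", name);
        pthread_mutex_unlock (&parent->lock);
}

/* Drop @name from the in-flight set and wake up any waiters. */
void
dentry_op_end (parent_inode_t *parent, const char *name)
{
        pthread_mutex_lock (&parent->lock);
        for (int i = 0; i < parent->count; i++) {
                if (strcmp (parent->inflight[i], name) == 0) {
                        /* move the last entry into the freed slot */
                        memmove (parent->inflight[i],
                                 parent->inflight[parent->count - 1],
                                 NAME_MAX_LEN);
                        parent->count--;
                        break;
                }
        }
        pthread_cond_broadcast (&parent->cond);
        pthread_mutex_unlock (&parent->lock);
}

int
main (void)
{
        parent_inode_t parent = { .count = 0 };

        pthread_mutex_init (&parent.lock, NULL);
        pthread_cond_init (&parent.cond, NULL);

        /* a racing lookup ("dir") would block between begin and end */
        dentry_op_begin (&parent, "dir");
        printf ("mkdir(dir): backend create + inode link, serialized\n");
        dentry_op_end (&parent, "dir");
        return 0;
}

Since the set of in-flight names per directory should be small, a linear scan 
under the parent's lock ought to be cheap; a real implementation would size 
the table dynamically rather than use a fixed array.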

[1] http://review.gluster.org/11725
[2] http://review.gluster.org/9913
[3] http://review.gluster.org/5240

regards,
Raghavendra.
