----- Original Message ----- > From: "Raghavendra Gowdappa" <rgowd...@redhat.com> > To: "Krutika Dhananjay" <kdhan...@redhat.com> > Cc: "Mohammed Rafi K C" <rkavu...@redhat.com>, "Gluster Devel" > <gluster-devel@gluster.org>, "Dan Lambright" <dlamb...@redhat.com>, "Nithya > Balachandran" <nbala...@redhat.com>, "Ben Turner" <btur...@redhat.com>, "Ben > England" <bengl...@redhat.com>, "Manoj Pillai" <mpil...@redhat.com>, > "Pranith Kumar Karampuri" <pkara...@redhat.com>, "Ravishankar > Narayanankutty" <ranar...@redhat.com>, xhernan...@datalab.es > Sent: Thursday, August 13, 2015 3:51:43 PM > Subject: Re: Inconsistent behavior due to lack of lookup on entry followed by > readdirp
> ----- Original Message ----- > > From: "Krutika Dhananjay" <kdhan...@redhat.com> > > To: "Raghavendra Gowdappa" <rgowd...@redhat.com> > > Cc: "Mohammed Rafi K C" <rkavu...@redhat.com>, "Gluster Devel" > > <gluster-devel@gluster.org>, "Dan Lambright" > > <dlamb...@redhat.com>, "Nithya Balachandran" <nbala...@redhat.com>, "Ben > > Turner" <btur...@redhat.com>, "Ben England" > > <bengl...@redhat.com>, "Manoj Pillai" <mpil...@redhat.com>, "Pranith Kumar > > Karampuri" <pkara...@redhat.com>, > > "Ravishankar Narayanankutty" <ranar...@redhat.com>, xhernan...@datalab.es > > Sent: Thursday, August 13, 2015 9:53:41 AM > > Subject: Re: Inconsistent behavior due to lack of lookup on entry followed > > by readdirp > > > > ----- Original Message ----- > > > > > From: "Raghavendra Gowdappa" <rgowd...@redhat.com> > > > To: "Krutika Dhananjay" <kdhan...@redhat.com> > > > Cc: "Mohammed Rafi K C" <rkavu...@redhat.com>, "Gluster Devel" > > > <gluster-devel@gluster.org>, "Dan Lambright" <dlamb...@redhat.com>, > > > "Nithya > > > Balachandran" <nbala...@redhat.com>, "Ben Turner" <btur...@redhat.com>, > > > "Ben > > > England" <bengl...@redhat.com>, "Manoj Pillai" <mpil...@redhat.com>, > > > "Pranith Kumar Karampuri" <pkara...@redhat.com>, "Ravishankar > > > Narayanankutty" <ranar...@redhat.com>, xhernan...@datalab.es > > > Sent: Thursday, August 13, 2015 9:06:37 AM > > > Subject: Re: Inconsistent behavior due to lack of lookup on entry > > > followed > > > by > > > readdirp > > > > > ----- Original Message ----- > > > > From: "Krutika Dhananjay" <kdhan...@redhat.com> > > > > To: "Mohammed Rafi K C" <rkavu...@redhat.com> > > > > Cc: "Gluster Devel" <gluster-devel@gluster.org>, "Dan Lambright" > > > > <dlamb...@redhat.com>, "Nithya Balachandran" > > > > <nbala...@redhat.com>, "Raghavendra Gowdappa" <rgowd...@redhat.com>, > > > > "Ben > > > > Turner" <btur...@redhat.com>, "Ben > > > > England" <bengl...@redhat.com>, "Manoj Pillai" <mpil...@redhat.com>, > > > > "Pranith Kumar Karampuri" > > > > <pkara...@redhat.com>, "Ravishankar Narayanankutty" > > > > <ranar...@redhat.com>, > > > > xhernan...@datalab.es > > > > Sent: Wednesday, August 12, 2015 9:02:44 PM > > > > Subject: Re: Inconsistent behavior due to lack of lookup on entry > > > > followed > > > > by readdirp > > > > > > > > I faced the same issue with the sharding translator. I fixed it by > > > > making > > > > its > > > > readdirp callback initialize individual entries' inode ctx, some of > > > > these > > > > being xattr values, which are filled in entry->dict by the posix > > > > translator. > > > > Here is the patch that got merged recently: > > > > http://review.gluster.org/11854 > > > > Would that be as easy to do in DHT as well? > > > > > The problem is not just filling out state in the inode. The bigger > > > problem > > > is > > > healing, which is supposed to maintain a directory/file to be in state > > > consistent with our design before a successful reply to lookup. The > > > operations can involve creating directories on missing subvols, setting > > > appropriate layout, etc. Effectively for readdirp to replace lookup, it > > > should be calling dht_lookup on each of the dentry it is passing back to > > > application. > > > > OK. > > > > > > > > > > As far as AFR is concerned, it indirectly forces LOOKUP on entries > > > > which > > > > are > > > > being retrieved for the first time through a READDIRP (and as a result > > > > do > > > > not have their inode ctx etc initialised yet) by setting entry->inode > > > > to > > > > NULL. See afr_readdir_transform_entries(). > > > > > Hmm. Then we already have "disabled" readdirp through code :). Without an > > > inode corresponding to entry, readdirp will be effectively readdir > > > stripping > > > any performance benefits by having readdirp as a "batched" lookup (of all > > > the dentries). > > No. Not every single READDIRP will be transformed into a READDIR by AFR. > > AFR > > resets the inode corresponding to an entry, before responding to its > > parent, > > _only_ under the following two conditions: > > 1) if this entry in question is being retrieved by this client for the > > first > > time through a READDIRP. In other words, this client has not _yet_ > > performed > > a LOOKUP on it. > > 2) if that sub-volume of AFR on which the parent directory is being > > READDIRP'd (remember AFR would only need to serve inode and directory reads > > from one of the replicas) does _not_ contain a good copy of the entry. > > In other words this entry needs to be healed on parent's read child. This > > is > > because we do not want the caching translators or the application itself to > > get incorrect entry attributes. > Thanks Krutika. We'll be borrowing the idea of setting entry->inode to NULL, > when dht determines that inode needs to be healed. Since afr is already > doing that for all dentries during first READDIRP (barring any lookups on > that inode before), I don't think doing this will have any further > performance degradation (As most of the setups will be > distributed-replicated). Yep. As long as we don't have http://review.gluster.org/#/c/11846/ , AFR will mask the behavior that you would be introducing in DHT. However, once Pranith is back, he should be able to give us good reasons for merging this patch. -Krutika > > > > This means that more often than not, AFR _would_ be leaving the inode > > corresponding to the entry as it is, and not setting it to NULL. > > > > > > This is the default behavior which is being made optional as part of > > > > http://review.gluster.org/#/c/11846/ which is still under review (see > > > > BZ > > > > 1250803, a performance bug :) ). > > > > > If it is made optional, when we enable setting entry->inode we still see > > > consistency issues. Also, it seems to me that there is no point in having > > > each individual xlator option controlling this behaviour. Instead we can > > > make each xlator behave in compliance to global mount option > > > "--use-readdirp=yes/no". Is there any specific reason to have an option > > > to > > > control this behaviour in afr? > > Agreed that there will be consistency issues. > > The reason to move away from letting 1) and 2) above be the default > > behavior > > is performance. :) And I guess it is also partly because AFR-v1 does not > > have these restrictions in READDIRP. But the patch is still under review. > > > > -Krutika > > > > > > > > > > -Krutika > > > > > > > > ----- Original Message ----- > > > > > > > > > From: "Mohammed Rafi K C" <rkavu...@redhat.com> > > > > > To: "Gluster Devel" <gluster-devel@gluster.org> > > > > > Cc: "Dan Lambright" <dlamb...@redhat.com>, "Nithya Balachandran" > > > > > <nbala...@redhat.com>, "Raghavendra Gowdappa" <rgowd...@redhat.com>, > > > > > "Ben > > > > > Turner" <btur...@redhat.com>, "Ben England" <bengl...@redhat.com>, > > > > > "Manoj > > > > > Pillai" <mpil...@redhat.com>, "Pranith Kumar Karampuri" > > > > > <pkara...@redhat.com>, "Ravishankar Narayanankutty" > > > > > <ranar...@redhat.com>, > > > > > kdhan...@redhat.com, xhernan...@datalab.es > > > > > Sent: Wednesday, August 12, 2015 7:29:48 PM > > > > > Subject: Inconsistent behavior due to lack of lookup on entry > > > > > followed > > > > > by > > > > > readdirp > > > > > > > > > Hi All, > > > > > > > > > We are facing some inconsistent behavior for fops like rename, unlink > > > > > etc due to lack of lookup followed by a readdirp, more specifically > > > > > if > > > > > inodes/gfid are populated via readdirp call and this nodeid is shared > > > > > with kernal, md-cache will cache this based on base-name. Then > > > > > subsequent named lookup will be served from md-cache and it > > > > > winds-back > > > > > immediately. So there is a chance to have an FOP triggered with out > > > > > having a lookup on an entry. DHT does lot of things like creating > > > > > link > > > > > files and populate inode_ctx etc, during lookup. In such scenario it > > > > > is > > > > > must to have at least one lookup to be happened on an entry. Since > > > > > readdirp preventing the lookup, it has been very hard for fops to > > > > > proceed without a first lookup on the entry. We are also suspecting > > > > > some > > > > > problems due to same with afr/ec self healing also. So If we remove > > > > > readdirp from md-cache ([1], [2]) it causes, an additional hop for > > > > > first > > > > > lookup for every entry. I'm mostly concerned with this one extra > > > > > network > > > > > call, and the performance degradation caused by the same. > > > > > > > > > Now with this, the only advantage with readdirp is, it removes one > > > > > context switch between kernal and userspace. Is it really worth to > > > > > sacrifice this for consistency ? > > > > > > > > > What do you think about removing readdirp functionality? > > > > > > > > > Please provide your input/suggestion/ideas. > > > > > > > > > [1] : http://review.gluster.org/#/c/11892/ > > > > > > > > > [2] : http://review.gluster.org/#/c/11894/ > > > > > > > > > Thanks in Advance > > > > > Rafi KC > > > > > >
_______________________________________________ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel