----- Original Message -----
> From: "Anand Avati" <av...@gluster.org>
> To: "Vijay Bellur" <vbel...@redhat.com>
> Cc: "Krishnan Parthasarathi" <kpart...@redhat.com>, "Anand Avati"
> <aav...@redhat.com>, "Raghavendra Gowdappa" <rgowd...@redhat.com>, "Varun
> Shastry" <vshas...@redhat.com>, "Pranith Kumar Karampuri"
> <pkara...@redhat.com>, "Venky Shankar" <vshan...@redhat.com>, "Kaushal M"
> <kaus...@redhat.com>, "Rajesh Joseph" <rjos...@redhat.com>, "Kotresh
> Hiremath Ravishankar" <khire...@redhat.com>, gluster-devel@nongnu.org
> Sent: Friday, March 7, 2014 12:21:54 AM
> Subject: Re: [Gluster-devel] Barrier design issues wrt volume snapshot
>
> On Thu, Mar 6, 2014 at 12:21 AM, Vijay Bellur <vbel...@redhat.com> wrote:
>
> > Adding gluster-devel.
> >
> > On 03/06/2014 01:15 PM, Krishnan Parthasarathi wrote:
> >
> > > All,
> > >
> > > In recent discussions around the design (and implementation) of the
> > > barrier feature, a couple of things came to light.
> > >
> > > 1) The changelog xlator needs the barrier xlator to block unlink and
> > > rename FOPs in the call path. This is in addition to the current list of
> > > FOPs that are blocked in their callback path.
> > >
> > > This ensures that, from the time barriering is enabled, the changelog has
> > > a bounded queue of unlink and rename FOPs to drain, commit to the
> > > changelog file and publish.
> >
> > Why is this necessary? FOPs that are still coming through after enabling
> > the barrier (assuming that the barrier is done in the call path) would end
> > up in a non-consumable changelog. For these operations, geo-rep would
> > resort to an FS crawl based on xtime, which does not handle unlinks and
> > renames.
> >
> > > 2) In a pure distribute volume, the following sequence of FOPs could
> > > result in snapshots of bricks disagreeing on the inode type of a file or
> > > directory:
> > >
> > > t1: snap b1
> > > t2: unlink /a
> > > t3: mkdir /a
> > > t4: snap b2
> > >
> > > where b1 and b2 are bricks of a pure distribute volume V.
> > >
> > > The above sequence can happen with the current barrier xlator design,
> > > since we allow unlink FOPs to go through to the disk and only block
> > > their acknowledgement to the application. This means a concurrent mkdir
> > > on the same name could succeed, since DHT, unlike AFR, doesn't serialize
> > > unlink and mkdir FOPs.
> > >
> > > Avati,
> > >
> > > I hear that you have a solution for problem 2). Could you please start
> > > the discussion on this thread?
> > >
> > > It would help us decide how to go about the barrier xlator
> > > implementation.
>
> The solution is really a long-pending implementation of dentry
> serialization in the resolver of the protocol server. Today we allow
> multiple FOPs that modify the same dentry to happen in parallel. This
> results in hairy races (including non-atomicity of rename) and has been
> kept open for a while now. Implementing dentry serialization in the
> resolver will "solve" 2 as a side effect. Hence that is a better approach
> than making changes in the barrier translator.
>
> Avati
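The side effect Avati describes can be illustrated with a toy model. This is a hypothetical Python sketch, not GlusterFS code: `BrickServer`, `submit`, `reply`, `scenario` and `conflicting_types` are invented names, and each brick is reduced to a dict of path to inode type. It assumes that /a is a regular file whose hashed brick is b1, that the barrier withholds the unlink reply until every brick has been snapshotted, that the resolver keeps a dentry busy until the FOP holding it replies, and that DHT winds mkdir to the hashed subvolume first and to the other subvolumes only after that succeeds. Without serialization the t1..t4 sequence from problem 2) leaves b1's snapshot with a file and b2's with a directory; with it, the mkdir waits on b1 and never reaches b2 before the snapshot.

```python
from collections import defaultdict, deque


class BrickServer:
    """Hypothetical toy brick behind a protocol server.

    `entries` maps a path to its inode type on this brick's disk. When
    `serialize` is True, FOPs on the same dentry queue up in the resolver
    and the dentry is released only when the FOP holding it replies.
    """

    def __init__(self, entries, serialize):
        self.entries = dict(entries)
        self.serialize = serialize
        self.queues = defaultdict(deque)  # path -> FOP holding it, then waiters

    def submit(self, path, op):
        """Wind `op` on `path`; return True if it reached the disk now."""
        if not self.serialize:
            op(self.entries)  # no serialization: every FOP goes straight through
            return True
        queue = self.queues[path]
        queue.append(op)
        if len(queue) == 1:  # dentry is free: this FOP holds it and runs
            op(self.entries)
            return True
        return False  # parked behind an earlier FOP on the same dentry

    def reply(self, path):
        """The FOP holding `path` unwinds; dispatch the next waiter, if any."""
        if not self.serialize:
            return
        queue = self.queues[path]
        queue.popleft()
        if queue:
            queue[0](self.entries)  # next waiter now holds the dentry
        else:
            del self.queues[path]


def unlink_a(entries):
    del entries["/a"]  # remove the regular file /a


def mkdir_a(entries):
    entries["/a"] = "dir"  # create directory /a


def scenario(serialize):
    # Assumption: /a is a regular file that lives on its hashed brick b1.
    b1 = BrickServer({"/a": "file"}, serialize)
    b2 = BrickServer({}, serialize)
    snaps = {"b1": dict(b1.entries)}  # t1: snap b1
    b1.submit("/a", unlink_a)  # t2: reaches disk; barrier holds the reply
    # t3: mkdir goes to the hashed brick b1 first, then to the others.
    if b1.submit("/a", mkdir_a):
        b2.submit("/a", mkdir_a)
    snaps["b2"] = dict(b2.entries)  # t4: snap b2 (barrier still enabled)
    return snaps


def conflicting_types(snaps):
    """Paths whose inode type differs between brick snapshots."""
    seen, conflicts = {}, set()
    for tree in snaps.values():
        for path, kind in tree.items():
            if seen.setdefault(path, kind) != kind:
                conflicts.add(path)
    return conflicts


print(conflicting_types(scenario(serialize=False)))  # {'/a'}
print(conflicting_types(scenario(serialize=True)))   # set()
```

The queue is keyed on the name rather than the inode because unlink /a and mkdir /a touch different inodes but the same dentry, which is exactly the collision in problem 2). The model keys on the bare path for brevity; a real resolver would presumably key on the parent directory plus the basename.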
_______________________________________________
Gluster-devel mailing list
Gluster-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel