On Wed, Mar 27, 2019 at 4:22 PM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:
> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez <jaher...@redhat.com> wrote:
>
>> Hi Raghavendra,
>>
>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:
>>
>>> All,
>>>
>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount through which those locks are held disconnects from the bricks/server. This helps Glusterfs avoid a stale lock problem later (e.g., if the application unlocks while the connection is still down). However, it also means the lock is no longer exclusive, as other applications/clients can acquire the same lock. To communicate that the locks are no longer valid, we are planning to mark an fd that holds POSIX locks bad on a disconnect, so that any future operation on that fd fails, forcing the application to re-open the fd and re-acquire the locks it needs [1].
>>
>> Wouldn't it be better to retake the locks when the brick is reconnected, if the lock is still in use?
>
> There is also the possibility that clients never reconnect. That's the primary reason why bricks assume the worst (the client will not reconnect) and clean up the locks.
>
>> BTW, the referenced bug is not public. Should we open another bug to track this?
>
> I've just opened up the comment to give enough context. I'll open a bug upstream too.
>
>>> Note that with AFR/replicate in the picture, we can prevent errors to applications as long as a quorum of children have "never ever" lost their connection to the bricks after the locks were acquired. I am using the term "never ever" because locks are not healed back after re-connection, so the first disconnect would have marked the fd bad, and the fd remains bad even after re-connection happens. So it's not just a quorum of children "currently online", but a quorum of children "never having disconnected from the bricks after the locks were acquired".
>>
>> I think this requisite is not feasible. In a distributed file system, sooner or later all bricks will be disconnected. It could be because of failures or because an upgrade is done, but it will happen.
>>
>> The difference here is how long fds are kept open. If applications open and close files frequently enough (i.e. an fd is not kept open longer than it takes for more than a quorum of bricks to disconnect), then there's no problem. The problem can only appear with applications that keep files open for a long time and also use POSIX locks. In this case, the only good solution I see is to retake the locks on brick reconnection.
>
> Agreed. But lock healing should be done only by HA layers like AFR/EC, as only they know whether there were enough online bricks to have prevented any conflicting lock. Protocol/client itself doesn't have enough information to do that. With a plain distribute volume, I don't see a way to heal locks without losing the exclusivity of the locks.
>
> What I proposed is a short-term solution. The mid- to long-term solution should be a lock-healing feature implemented in AFR/EC. In fact, I had this conversation with +Karampuri, Pranith <pkara...@redhat.com> before posting this msg to the ML.
>
>>> However, this use case is not affected if the application doesn't acquire any POSIX locks. So, I am interested in knowing:
>>> * whether your use cases use POSIX locks?
>>> * Is it feasible for your application to re-open fds and re-acquire locks on seeing EBADFD errors?
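To make the second question concrete, the recovery being asked about would look roughly like the sketch below. This is only an illustration of the pattern, not code from [1]: it assumes the bad-fd state surfaces as EBADFD (or EBADF) from subsequent syscalls, and the whole-file write lock, path handling and single-retry policy are made-up examples.

/* Sketch only: assumes the behaviour proposed in [1], i.e. that an fd
 * whose POSIX locks were cleaned up after a disconnect starts failing
 * with EBADFD (Linux-specific) or EBADF. */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int open_and_lock(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;    /* exclusive POSIX (fcntl) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 = lock the whole file */

    if (fcntl(fd, F_SETLKW, &fl) < 0) {   /* blocking lock request */
        close(fd);
        return -1;
    }
    return fd;
}

ssize_t pwrite_with_relock(int *fd, const char *path,
                           const void *buf, size_t len, off_t off)
{
    ssize_t ret = pwrite(*fd, buf, len, off);
    if (ret < 0 && (errno == EBADFD || errno == EBADF)) {
        /* The fd was marked bad after a disconnect: the lock is gone,
         * so re-open, re-acquire the lock and retry once. */
        close(*fd);
        *fd = open_and_lock(path);
        if (*fd < 0)
            return -1;
        ret = pwrite(*fd, buf, len, off);
    }
    return ret;
}

pwrite() is used rather than write() because re-opening resets the file offset. Note also that even with this pattern, another client may have held the lock between the disconnect and the re-acquire, so the application may still need to re-validate whatever state the lock protects.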
>>
>> I think that many applications are not prepared to handle that.
>>
>
> I suspected that too, and in fact I'm not too happy with the solution. But I went ahead with this mail because I heard that implementing lock healing in AFR will take time, and hence there are no alternative short-term solutions. Also, failing loudly is preferred to silently dropping locks.
>
>> Xavi
>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7
>>>
>>> regards,
>>> Raghavendra
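As for the AFR/EC lock-healing idea, the quorum rule described above ("a quorum of children never having disconnected after the locks were acquired") could conceptually drive decision logic like the sketch below. None of these types or names are GlusterFS APIs; they are invented for illustration, and the actual lock replay is elided.

/* Conceptual sketch only, not GlusterFS code. It shows just the quorum
 * rule: replaying (healing) locks on a reconnect is safe only while a
 * quorum of children has stayed connected ever since the locks were
 * acquired, because then no conflicting lock could have been granted. */
#include <stdbool.h>
#include <stdio.h>

#define CHILDREN 3
#define QUORUM   2   /* e.g. replica 3 */

struct fd_state {
    bool disconnected_since_lock[CHILDREN]; /* per-child history */
    bool bad;                               /* fd marked bad (EBADFD) */
};

/* Called when child 'idx' reconnects while the fd holds POSIX locks. */
void on_child_reconnect(struct fd_state *fd, int idx)
{
    int always_connected = 0;

    for (int i = 0; i < CHILDREN; i++)
        if (!fd->disconnected_since_lock[i])
            always_connected++;

    if (always_connected >= QUORUM) {
        /* A quorum held the locks continuously, so it is safe to
         * replay them on the reconnected child (replay elided). */
        fd->disconnected_since_lock[idx] = false;
        printf("child %d: locks healed\n", idx);
    } else {
        /* Exclusivity may have been lost: fail loudly rather than
         * silently dropping locks. */
        fd->bad = true;
        printf("child %d: fd marked bad\n", idx);
    }
}

int main(void)
{
    struct fd_state fd = { .bad = false };

    fd.disconnected_since_lock[2] = true;   /* child 2 bounced */
    on_child_reconnect(&fd, 2);             /* quorum intact: heal */

    fd.disconnected_since_lock[0] = true;   /* now children 0 and 1 */
    fd.disconnected_since_lock[1] = true;   /* have bounced as well */
    on_child_reconnect(&fd, 1);             /* quorum lost: mark bad */
    return 0;
}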