On Wed, Mar 27, 2019 at 5:13 PM Xavi Hernandez <jaher...@redhat.com> wrote:
> On Wed, Mar 27, 2019 at 11:52 AM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:
>
>> On Wed, Mar 27, 2019 at 12:56 PM Xavi Hernandez <jaher...@redhat.com> wrote:
>>
>>> Hi Raghavendra,
>>>
>>> On Wed, Mar 27, 2019 at 2:49 AM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:
>>>
>>>> All,
>>>>
>>>> Glusterfs cleans up POSIX locks held on an fd when the client/mount through which those locks are held disconnects from bricks/server. This helps Glusterfs avoid running into a stale-lock problem later (for example, if the application unlocks while the connection was still down). However, this means the lock is no longer exclusive, as other applications/clients can acquire the same lock. To communicate that the locks are no longer valid, we are planning to mark the fd (which has POSIX locks) bad on a disconnect, so that any future operations on that fd will fail, forcing the application to re-open the fd and re-acquire the locks it needs [1].
>>>
>>> Wouldn't it be better to retake the locks when the brick is reconnected, if the lock is still in use?
>>
>> There is also a possibility that clients may never reconnect. That's the primary reason why bricks assume the worst (the client will not reconnect) and clean up the locks.
>
> True, so it's fine to clean up the locks. I'm not saying that locks shouldn't be released on disconnect. The assumption is that if the client has really died, it will also disconnect from the other bricks, which will release the locks. So, eventually, another client will have enough quorum to attempt a lock that will succeed. In other words, if a client gets disconnected from too many bricks simultaneously (loses quorum), then that client can be considered bad and can return errors to the application. This should also cause the locks on the remaining connected bricks to be released.
>
> On the other hand, if the disconnection is very short and the client has not died, it will keep enough locked files (it has quorum) to prevent other clients from successfully acquiring a lock. In this case, if the brick is reconnected, all existing locks should be reacquired to recover the original state before the disconnection.
>
>>> BTW, the referenced bug is not public. Should we open another bug to track this?
>>
>> I've just opened up the comment to give enough context. I'll open a bug upstream too.
>>
>>>> Note that with AFR/replicate in the picture we can prevent errors to the application as long as a quorum number of children "never ever" lost connection with bricks after the locks have been acquired. I am using the term "never ever" because locks are not healed back after reconnection, so the first disconnect would've marked the fd bad and the fd remains so even after reconnection happens. So, it's not just a quorum number of children "currently online", but a quorum number of children "never having disconnected from bricks after the locks were acquired".
>>>
>>> I think this requirement is not feasible. In a distributed file system, sooner or later all bricks will be disconnected. It could be because of failures or because an upgrade is done, but it will happen.
>>>
>>> The difference here is how long fds are kept open. If applications open and close files frequently enough (i.e. the fd is not kept open longer than it takes for more than a quorum of bricks to disconnect), then there's no problem.
>>> The problem can only appear in applications that open files for a long time and also use POSIX locks. In this case, the only good solution I see is to retake the locks on brick reconnection.
>>
>> Agree. But lock healing should be done only by HA layers like AFR/EC, as only they know whether there were enough online bricks to have prevented any conflicting lock. Protocol/client itself doesn't have enough information to do that. If it's a plain distribute volume, I don't see a way to heal locks without losing the property of exclusivity of locks.
>
> Lock healing of locks acquired while a brick was disconnected needs to be handled by AFR/EC. However, locks already present at the moment of disconnection could be recovered by the client xlator itself, as long as the file has not been closed (which the client xlator already knows).

What if another client (say mount-2) took locks at the time of the disconnect from mount-1, modified the file, and unlocked? The client xlator doing the heal may not be a good idea.

> Xavi
>
>> What I proposed is a short-term solution. The mid-to-long-term solution should be a lock-healing feature implemented in AFR/EC. In fact, I had this conversation with +Karampuri, Pranith <pkara...@redhat.com> before posting this msg to the ML.
>>
>>>> However, this use case is not affected if the application doesn't acquire any POSIX locks. So, I am interested in knowing:
>>>> * whether your use cases use POSIX locks?
>>>> * Is it feasible for your application to re-open fds and re-acquire locks on seeing EBADFD errors?
>>>
>>> I think that many applications are not prepared to handle that.
>>
>> I too suspected that, and am in fact not too happy with the solution. But I went ahead with this mail as I heard implementing lock heal in AFR will take time, and hence there are no alternative short-term solutions.
>>
>>> Xavi
>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1689375#c7
>>>>
>>>> regards,
>>>> Raghavendra

--
Pranith
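P.S. For anyone wondering what "re-open fds and re-acquire locks on seeing EBADFD errors" would mean on the application side, below is a minimal sketch in plain C of that pattern, assuming a whole-file fcntl() write lock and a hypothetical file path. This is not GlusterFS code, just an illustration of the handling the proposal would push onto applications.

/*
 * Illustrative only -- not GlusterFS code. Shows the re-open/re-lock
 * pattern an application would need if an fd is marked bad after a
 * disconnect. The path and whole-file write lock are assumptions.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int open_and_lock(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;    /* exclusive write lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 = lock the whole file */

    if (fcntl(fd, F_SETLKW, &fl) < 0) {  /* block until the lock is granted */
        close(fd);
        return -1;
    }
    return fd;
}

int main(void)
{
    const char *path = "/mnt/glusterfs/data.db";  /* hypothetical mount path */
    const char *msg = "some update\n";

    int fd = open_and_lock(path);
    if (fd < 0) {
        perror("open_and_lock");
        return 1;
    }

    if (write(fd, msg, strlen(msg)) < 0 &&
        (errno == EBADF || errno == EBADFD)) {
        /*
         * The fd was marked bad after a disconnect: the old lock is gone,
         * so re-open, re-acquire the lock and re-validate any state that
         * was protected by it before retrying.
         */
        close(fd);
        fd = open_and_lock(path);
        if (fd < 0 || write(fd, msg, strlen(msg)) < 0) {
            perror("retry after re-open/re-lock");
            return 1;
        }
    }

    close(fd);
    return 0;
}

Note that EBADFD ("File descriptor in bad state") is a Linux/glibc errno distinct from POSIX EBADF, so an application would likely want to check for both.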
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel