On Fri, May 25, 2018 at 1:58 PM, Amar Tumballi <atumb...@redhat.com> wrote:
> Thanks for the detailed email, Xavi!! It helps a lot to clarify the
> intention of the patch.
>
> Raghavendra / Nithya / Susant,
>
> Considering the patch surely needs one of your reviews to make sure DHT
> locking is not impacted by the patch, can you review it?
>
> As for the locking code itself, Krutika actually did take a look. Xavi
> and I are waiting on one of your reviews to go forward on this. Others,
> more reviews on the patch are helpful too.

Xavi and I have been interacting on this - mainly for me to understand the
issues he ran into. I'll do a review.

> -Amar
>
> On Fri, May 18, 2018 at 12:21 PM, Xavi Hernandez <xhernan...@redhat.com>
> wrote:
>
>> Hi all,
>>
>> tl;dr: to solve a bug [1] I've written a patch [2] that needs some hacks
>> to prevent deadlocks. I think the approach is quite good, and maybe we
>> should check whether the hacks can be eliminated by making use of the
>> new behavior in DHT and other xlators.
>>
>> Now a more detailed explanation:
>>
>> As part of debugging an issue [1] for EC where directories got into a
>> bad state when there were concurrent renames and mkdirs, I found that
>> the problem really came from the fact that DHT does additional
>> operations after creating a directory, and it was these operations that
>> were conflicting with the renames. This is not really a problem in DHT,
>> but it served to detect the issue.
>>
>> In fact, any concurrent modification of an entry while it's being
>> removed can cause this bug. Normally, if you modify a file that is being
>> deleted, the modification could fail (depending on the state it can fail
>> with multiple errors, even EIO), but the file will be deleted anyway, so
>> no big problem here. The problem is more important in the scenario
>> described in the bug (though it's unlikely to be a real use case).
>>
>> The root cause of the problem is that entry modification operations take
>> locks on the entry itself, while entry creation/deletion operations take
>> locks on the parent directory. This allows both operations to arrive at
>> brick level in parallel, so some bricks can execute them in one order
>> while other bricks execute them in the reverse order.
>>
>> In the case of EC, having an entry disappear in the middle of another
>> operation can be very bad. It can disappear in multiple stages:
>>
>> 1. While acquiring locks. This is not too bad, but it can return
>> unwanted errors (EIO).
>> 2. While requesting size and version information. This is very bad and
>> will surely lead to EIO errors (instead of ESTALE, which would be the
>> most useful error) or mark healthy bricks as bad.
>> 3. While the main operation is executed. In this case we can even have
>> data corruption (though the file will be deleted anyway, this is more
>> important for directories).
>> 4. While updating backend info. Many combinations are possible here,
>> depending on what failed and on how many bricks.
>>
>> To prevent this I've taken the following approach [2]:
>>
>> 1. Take ESTALE errors into account when acquiring locks, so that EC is
>> able to report ESTALE instead of EIO in most cases.
>> 2. Once a lock is acquired, the locks xlator will prevent the entry from
>> being deleted until the locks are released again, giving a lot of
>> stability to the client side.
>>
>> I've tested this approach and it works pretty well. Under heavy
>> contention of modifications and removals, the client side sees
>> completely stable and consistent updates, which is very good.
>>
>> The problem I've seen is that it's quite easy to get into deadlocks with
>> this approach. The main reason for the deadlocks is how DHT tries to
>> solve the same or similar problems at the DHT level (I've found this
>> document [3] that explains them).
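The ordering problem described above can be sketched with a small, purely
hypothetical simulation (not Gluster code; the operation names, entry names,
and xattr key are made up for illustration). A setxattr locks the entry and a
rename locks the parent, so nothing serializes them, and two bricks that
receive them in opposite orders end up with different contents:

```python
def apply_ops(ops):
    """Apply a sequence of operations to one brick's tiny namespace.

    The namespace is just name -> dict of xattrs. A rename moves the
    entry; a setxattr on a name that no longer exists fails with ESTALE.
    """
    state = {"src": {}}
    errors = []
    for op in ops:
        kind = op[0]
        if kind == "rename":
            _, old, new = op
            if old in state:
                state[new] = state.pop(old)
        elif kind == "setxattr":
            _, name, key = op
            if name in state:
                state[name][key] = "value"
            else:
                errors.append("ESTALE")
    return state, errors

# The same two operations reach two bricks in opposite orders.
brick1, err1 = apply_ops([("setxattr", "src", "trusted.ec.version"),
                          ("rename", "src", "dst")])
brick2, err2 = apply_ops([("rename", "src", "dst"),
                          ("setxattr", "src", "trusted.ec.version")])

print(brick1)  # {'dst': {'trusted.ec.version': 'value'}}
print(brick2)  # {'dst': {}} - the xattr update was lost with ESTALE
```

The bricks diverge even though each executed both operations, which is the
inconsistency the per-entry pinning in the patch is meant to rule out.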
>>
>> To prevent deadlocks I've had to completely ignore locks coming from
>> special PIDs (< 0), but I've also had to ignore locks from the domain
>> "dht.file.migrate". Otherwise I have no way to prevent deadlocks on
>> concurrent renames to the same destination from multiple clients.
>>
>> I don't like to use these hacks in the code because they tend to be
>> hard to maintain and prone to errors. Do you think it makes sense to see
>> if some of the DHT synchronization methods can be simplified by making
>> use of this feature? With some changes, we would probably be able to
>> remove the hacks.
>>
>> Other ideas on how to implement this without delaying removes (thus
>> preventing deadlocks) are also welcome.
>>
>> Xavi
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1578405
>> [2] https://review.gluster.org/20025
>> [3] https://review.gluster.org/16876
>
>
> --
> Amar Tumballi (amarts)
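For readers following along, the "hacks" Xavi mentions amount to a filter on
which granted locks pin the entry against deletion. A minimal sketch of such
a predicate (hypothetical, not the actual patch; the function name and domain
set are assumptions based only on the exemptions named in the email):

```python
# Lock domains that, per the email, must not pin entries to avoid deadlocks.
EXEMPT_DOMAINS = {"dht.file.migrate"}

def lock_pins_entry(client_pid, domain):
    """Return True if a granted lock should keep the entry alive.

    Internal operations are identified here only by a negative client
    PID, as the email describes; everything else pins the entry until
    the lock is released.
    """
    if client_pid < 0:              # special/internal PIDs are exempt
        return False
    if domain in EXEMPT_DOMAINS:    # e.g. DHT file migration locks
        return False
    return True
```

Keeping the exemptions in one predicate like this at least concentrates the
hack in a single place, but it still encodes knowledge about other xlators,
which is exactly the maintenance concern raised above.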
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel