https://bugzilla.redhat.com/show_bug.cgi?id=1366818 is the bug I am referring to in the mail below. (Thanks sankarshan for pointing out that I missed the link :-) )

On Tue, Oct 25, 2016 at 3:14 PM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

> One of the Red Hat QE engineers (Nag Pavan) found a day 1 bug in entry
> self-heal where the file with good data can be replaced with the file with
> bad data when renames + self-heal are involved in a particular way.
>
> Sample steps (from the bz):
> 1) have a plain replica volume with 2 bricks. start the volume and mount
> it.
> 2) mkdir dir && mkdir newdir && touch file1
> 3) bring first brick down
> 4) echo abc > dir/file1
> 5) bring the first brick back up and quickly bring the second brick down
> before self-heal can be triggered.
> 6) do mv dir/file1 newdir/file2 <<--- note that this is an empty file.
>
> Now bring the second brick back up. If entry self-heal of 'dir' happens
> first, it deletes file1 with content 'abc'. When the 'newdir' heal happens
> afterwards, it creates an empty file, and the data in the file is lost.
>
> The same can be achieved using 'link' + 'unlink' as well.
>
> The main reason for this problem is that afr entry self-heal at the moment
> doesn't fully take link counts into account before deleting the final link
> of an inode, so it always does the unlink, recreates the file and does data
> heal. In this corner case the unlink happens on the good copy of the file,
> and we either lose data or get stale data, depending on what data is
> present on the sink file.
>
> The solution we are proposing is the following:
>
> 1) Posix will maintain a hidden directory '.glusterfs/anoninode' (we can
> call it lost+found as well) which will be used by afr/ec for keeping the
> 'inodes' until their names are resolved.
> 2) When afr or ec heals a directory and a 'name' has to be deleted while
> the inode is still present on the other bricks, it renames the file to
> 'anoninode/<gfid-of-file/dir>' instead of doing unlink/rmdir on it.
> 3) For files:
>    a) Both afr and ec already have logic to do 'link' instead of new file
>    creation if a gfid already exists in the brick. So when a name is
>    resolved, it does exactly what it does now.
>    b) Self-heal daemon will periodically crawl the first level of the
>    'anoninode' directory to make sure it deletes the 'inodes' represented
>    as files with gfid-string as names whenever the link count is > 1. It
>    will also delete the files if the gfid ceases to exist on the other
>    bricks.
> 4) For directories:
>    a) Both afr and ec need to perform 'rename' of 'anoninode/dir-gfid' to
>    the name it will be resolved to as part of entry self-heal, instead of
>    'mkdir'.
>    b) If the self-heal daemon crawl detects that a directory is deleted on
>    the other bricks, then it has to scan the files inside the deleted
>    directory and move them into 'anoninode' if the gfid of the
>    file/directory exists on the other bricks. Otherwise they can be safely
>    deleted.
>
> Please let us know if you see any issues with this approach.
>
> --
> Pranith

--
Pranith
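
To make the file case of the proposal above (steps 2, 3a and 3b) a bit more concrete, here is a rough Python sketch of the idea on an ordinary local directory standing in for a brick. It is only an illustration under assumptions, not how afr/ec or posix would implement it: the function names, the gfid string and the `gfid_exists_elsewhere` callback are all hypothetical, and it does not model GlusterFS's actual on-disk layout, xattrs or locking.

```python
import os
import tempfile

# Directory name taken from the proposal; everything else here is hypothetical.
ANON_DIR = os.path.join(".glusterfs", "anoninode")


def park_instead_of_unlink(brick, rel_path, gfid, gfid_on_other_bricks):
    """Step 2: when heal must delete a name, keep the inode if the gfid
    is still present on the other bricks."""
    src = os.path.join(brick, rel_path)
    if not gfid_on_other_bricks:
        os.unlink(src)  # nothing else refers to this gfid, safe to delete
        return None
    anon = os.path.join(brick, ANON_DIR)
    os.makedirs(anon, exist_ok=True)
    parked = os.path.join(anon, gfid)
    os.rename(src, parked)  # the data survives under anoninode/<gfid>
    return parked


def resolve_name(brick, rel_path, gfid):
    """Step 3a: hard-link the parked inode to the name being healed
    instead of creating a new, empty file."""
    parked = os.path.join(brick, ANON_DIR, gfid)
    os.link(parked, os.path.join(brick, rel_path))


def crawl_anoninode(brick, gfid_exists_elsewhere):
    """Step 3b: remove parked files that gained another link
    (link count > 1) or whose gfid no longer exists elsewhere."""
    anon = os.path.join(brick, ANON_DIR)
    removed = []
    for gfid in sorted(os.listdir(anon)):
        parked = os.path.join(anon, gfid)
        if os.path.isdir(parked):
            continue  # directories follow the separate rename-based rule
        if os.stat(parked).st_nlink > 1 or not gfid_exists_elsewhere(gfid):
            os.unlink(parked)
            removed.append(gfid)
    return removed


def demo():
    """Replay the rename scenario on one 'brick' holding the good copy."""
    with tempfile.TemporaryDirectory() as brick:
        os.makedirs(os.path.join(brick, "dir"))
        os.makedirs(os.path.join(brick, "newdir"))
        with open(os.path.join(brick, "dir", "file1"), "w") as f:
            f.write("abc\n")
        gfid = "hypothetical-gfid-0001"
        # Heal of 'dir' runs first: park the inode instead of deleting it.
        park_instead_of_unlink(brick, os.path.join("dir", "file1"), gfid, True)
        # Heal of 'newdir' runs next: link the parked inode to the new name.
        resolve_name(brick, os.path.join("newdir", "file2"), gfid)
        # The periodic crawl then cleans up the parked entry.
        removed = crawl_anoninode(brick, lambda g: True)
        with open(os.path.join(brick, "newdir", "file2")) as f:
            return f.read(), removed
```

Calling `demo()` walks through the order of heals that loses data today ('dir' healed before 'newdir'). In the sketch, the content written to dir/file1 is still readable through newdir/file2 afterwards, because the name removal became a rename into 'anoninode' rather than an unlink.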
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel