RE: [Bug] data loss with cyclic alternates
-Original Message- From: Jeff King [mailto:p...@peff.net] Sent: Friday, July 11, 2014 10:57 PM To: Keller, Jacob E Cc: gits...@pobox.com; dr.kh...@gmail.com; git@vger.kernel.org Subject: Re: [Bug] data loss with cyclic alternates On Fri, Jul 11, 2014 at 06:01:46PM +, Keller, Jacob E wrote: Yeah, don't do that. A thinks eh, the other guy must have it and B thinks the same. In general, do not prune or gc a repository other repositories borrow from, even if there is no cycle, because the borrowee does not know anythning about objects that it itself no longer needs but are still needed by its borrowers. Doesn't gc get run automatically at some points? Is the automatic gc run setup to avoid this problem? No, the automatic gc doesn't avoid this. It can't in the general case, as the parent repository does not know how many or which children are pointed to it as an alternate. And the borrowing repository does not even need to have write permission to the parent, so it cannot write a backpointer. If people are using alternates, they should probably turn off gc.auto in the borrowee (it doesn't seem unreasonable to me to do so automatically via clone -s in cases where we can write to the alternates repo, and to issue a warning otherwise). -Peff Yes, if this is a known problem and we can avoid it by disabling garbage collection, that makes sense to me to do so.. Thanks, Jake
Re: [Bug] data loss with cyclic alternates
Am 11.07.14 18:01, schrieb Junio C Hamano: Ephrim Khong dr.kh...@gmail.com writes: git seems to have issues with alternates when cycles are present (repo A has B/objects as alternates, B has A/objects as alternates). Yeah, don't do that. A thinks eh, the other guy must have it and B thinks the same. In general, do not prune or gc a repository other repositories borrow from, even if there is no cycle, because the borrowee does not know anythning about objects that it itself no longer needs but are still needed by its borrowers. It seems that there is a safeguard against this in sha1_file.c, link_alt_odb_entry(), that doesn't work as intended: if (!strcmp(ent-base, objdir)) { free(ent); return -1; } However, printf-debugging tells me that ent-base is absolute and objdir is relative (.git/objects) at this point, so the strings are different even though the files are the same. I never submitted a patch to git. Do you think someone can fix this hickup, otherwise I'll give it a shot next week. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug] data loss with cyclic alternates
Hi, git seems to have issues with alternates when cycles are present (repo A has B/objects as alternates, B has A/objects as alternates). In such cases, gc and repack might delete objects that are present in only one of the alternates, leading to data loss. I understand that this is no big use case, and requires manual editing of objects/info/alternates. But for the sake of preventing unneccesary data loss, and since I found no warning regarding alternate cycles in the documentation, it might make sense to handle such cases properly (maybe it's a simple after finding all alternates directories, remove duplicates?). Here is a small test case that adds the object storage of a repository as alternate to itsself. After git repack -adl, all objects are deleted. --- rm -rf test_repo mkdir test_repo cd test_repo git init touch a git add a git commit -m c1 git repack -adl echo $PWD/.git/objects .git/objects/info/alternates echo re-packing... git repack -adl echo git fsck... git fsck --- Output: --- Initialized empty Git repository in /somewhere/test_repo/.git/ [master (root-commit) ab9e123] c1 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 a Counting objects: 3, done. Writing objects: 100% (3/3), done. Total 3 (delta 0), reused 0 (delta 0) re-packing... Nothing new to pack. error: refs/heads/master does not point to a valid object! git fsck... Checking object directories: 100% (256/256), done. Checking object directories: 100% (256/256), done. error: HEAD: invalid sha1 pointer 1494ec24356cbbbd66e19f22cef762dd83de7f54 error: refs/heads/master does not point to a valid object! notice: No default references missing blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 --- Thanks - Eph -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug] data loss with cyclic alternates
Ephrim Khong dr.kh...@gmail.com writes: git seems to have issues with alternates when cycles are present (repo A has B/objects as alternates, B has A/objects as alternates). Yeah, don't do that. A thinks eh, the other guy must have it and B thinks the same. In general, do not prune or gc a repository other repositories borrow from, even if there is no cycle, because the borrowee does not know anythning about objects that it itself no longer needs but are still needed by its borrowers. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug] data loss with cyclic alternates
On Fri, 2014-07-11 at 09:01 -0700, Junio C Hamano wrote: Ephrim Khong dr.kh...@gmail.com writes: git seems to have issues with alternates when cycles are present (repo A has B/objects as alternates, B has A/objects as alternates). Yeah, don't do that. A thinks eh, the other guy must have it and B thinks the same. In general, do not prune or gc a repository other repositories borrow from, even if there is no cycle, because the borrowee does not know anythning about objects that it itself no longer needs but are still needed by its borrowers. Doesn't gc get run automatically at some points? Is the automatic gc run setup to avoid this problem? Thanks, Jake -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug] data loss with cyclic alternates
On Fri, Jul 11, 2014 at 06:01:46PM +, Keller, Jacob E wrote: Yeah, don't do that. A thinks eh, the other guy must have it and B thinks the same. In general, do not prune or gc a repository other repositories borrow from, even if there is no cycle, because the borrowee does not know anythning about objects that it itself no longer needs but are still needed by its borrowers. Doesn't gc get run automatically at some points? Is the automatic gc run setup to avoid this problem? No, the automatic gc doesn't avoid this. It can't in the general case, as the parent repository does not know how many or which children are pointed to it as an alternate. And the borrowing repository does not even need to have write permission to the parent, so it cannot write a backpointer. If people are using alternates, they should probably turn off gc.auto in the borrowee (it doesn't seem unreasonable to me to do so automatically via clone -s in cases where we can write to the alternates repo, and to issue a warning otherwise). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html