RE: [Bug] data loss with cyclic alternates

2014-07-14 Thread Keller, Jacob E


 -Original Message-
 From: Jeff King [mailto:p...@peff.net]
 Sent: Friday, July 11, 2014 10:57 PM
 To: Keller, Jacob E
 Cc: gits...@pobox.com; dr.kh...@gmail.com; git@vger.kernel.org
 Subject: Re: [Bug] data loss with cyclic alternates
 
 On Fri, Jul 11, 2014 at 06:01:46PM +, Keller, Jacob E wrote:
 
   Yeah, don't do that.  A thinks eh, the other guy must have it and
   B thinks the same.  In general, do not prune or gc a repository
   other repositories borrow from, even if there is no cycle, because
   the borrowee does not know anythning about objects that it itself no
   longer needs but are still needed by its borrowers.
  
 
  Doesn't gc get run automatically at some points? Is the automatic gc run
  setup to avoid this problem?
 
 No, the automatic gc doesn't avoid this. It can't in the general case,
 as the parent repository does not know how many or which children are
 pointed to it as an alternate. And the borrowing repository does not
 even need to have write permission to the parent, so it cannot write a
 backpointer.
 
 If people are using alternates, they should probably turn off gc.auto in
 the borrowee (it doesn't seem unreasonable to me to do so automatically
 via clone -s in cases where we can write to the alternates repo, and
 to issue a warning otherwise).
 
 -Peff

Yes, if this is a known problem and we can avoid it by disabling garbage 
collection, that makes sense to me to do so..

Thanks,
Jake


Re: [Bug] data loss with cyclic alternates

2014-07-13 Thread Ephrim Khong

Am 11.07.14 18:01, schrieb Junio C Hamano:

Ephrim Khong dr.kh...@gmail.com writes:


git seems to have issues with alternates when cycles are present (repo
A has B/objects as alternates, B has A/objects as alternates).


Yeah, don't do that.  A thinks eh, the other guy must have it and
B thinks the same.  In general, do not prune or gc a repository
other repositories borrow from, even if there is no cycle, because
the borrowee does not know anythning about objects that it itself no
longer needs but are still needed by its borrowers.


It seems that there is a safeguard against this in sha1_file.c, 
link_alt_odb_entry(), that doesn't work as intended:


if (!strcmp(ent-base, objdir)) {
free(ent);
return -1;
}

However, printf-debugging tells me that ent-base is absolute and objdir 
is relative (.git/objects) at this point, so the strings are different 
even though the files are the same.


I never submitted a patch to git. Do you think someone can fix this 
hickup, otherwise I'll give it a shot next week.



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug] data loss with cyclic alternates

2014-07-11 Thread Ephrim Khong

Hi,

git seems to have issues with alternates when cycles are present (repo A 
has B/objects as alternates, B has A/objects as alternates). In such 
cases, gc and repack might delete objects that are present in only one 
of the alternates, leading to data loss.


I understand that this is no big use case, and requires manual editing 
of objects/info/alternates. But for the sake of preventing unneccesary 
data loss, and since I found no warning regarding alternate cycles in 
the documentation, it might make sense to handle such cases properly 
(maybe it's a simple after finding all alternates directories, remove 
duplicates?).


Here is a small test case that adds the object storage of a repository 
as alternate to itsself. After git repack -adl, all objects are deleted.


---
rm -rf test_repo 
mkdir test_repo 
cd test_repo 
git init 
touch a 
git add a 
git commit -m c1 
git repack -adl 
echo $PWD/.git/objects  .git/objects/info/alternates 
echo  re-packing... 
git repack -adl 
echo  git fsck... 
git fsck
---

Output:

---
Initialized empty Git repository in /somewhere/test_repo/.git/
[master (root-commit) ab9e123] c1
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 a
Counting objects: 3, done.
Writing objects: 100% (3/3), done.
Total 3 (delta 0), reused 0 (delta 0)
 re-packing...
Nothing new to pack.
error: refs/heads/master does not point to a valid object!
 git fsck...
Checking object directories: 100% (256/256), done.
Checking object directories: 100% (256/256), done.
error: HEAD: invalid sha1 pointer 1494ec24356cbbbd66e19f22cef762dd83de7f54
error: refs/heads/master does not point to a valid object!
notice: No default references
missing blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
---

Thanks
- Eph
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug] data loss with cyclic alternates

2014-07-11 Thread Junio C Hamano
Ephrim Khong dr.kh...@gmail.com writes:

 git seems to have issues with alternates when cycles are present (repo
 A has B/objects as alternates, B has A/objects as alternates).

Yeah, don't do that.  A thinks eh, the other guy must have it and
B thinks the same.  In general, do not prune or gc a repository
other repositories borrow from, even if there is no cycle, because
the borrowee does not know anythning about objects that it itself no
longer needs but are still needed by its borrowers.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug] data loss with cyclic alternates

2014-07-11 Thread Keller, Jacob E
On Fri, 2014-07-11 at 09:01 -0700, Junio C Hamano wrote:
 Ephrim Khong dr.kh...@gmail.com writes:
 
  git seems to have issues with alternates when cycles are present (repo
  A has B/objects as alternates, B has A/objects as alternates).
 
 Yeah, don't do that.  A thinks eh, the other guy must have it and
 B thinks the same.  In general, do not prune or gc a repository
 other repositories borrow from, even if there is no cycle, because
 the borrowee does not know anythning about objects that it itself no
 longer needs but are still needed by its borrowers.
 

Doesn't gc get run automatically at some points? Is the automatic gc run
setup to avoid this problem? 

Thanks,
Jake

 --
 To unsubscribe from this list: send the line unsubscribe git in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: [Bug] data loss with cyclic alternates

2014-07-11 Thread Jeff King
On Fri, Jul 11, 2014 at 06:01:46PM +, Keller, Jacob E wrote:

  Yeah, don't do that.  A thinks eh, the other guy must have it and
  B thinks the same.  In general, do not prune or gc a repository
  other repositories borrow from, even if there is no cycle, because
  the borrowee does not know anythning about objects that it itself no
  longer needs but are still needed by its borrowers.
  
 
 Doesn't gc get run automatically at some points? Is the automatic gc run
 setup to avoid this problem?

No, the automatic gc doesn't avoid this. It can't in the general case,
as the parent repository does not know how many or which children are
pointed to it as an alternate. And the borrowing repository does not
even need to have write permission to the parent, so it cannot write a
backpointer.

If people are using alternates, they should probably turn off gc.auto in
the borrowee (it doesn't seem unreasonable to me to do so automatically
via clone -s in cases where we can write to the alternates repo, and
to issue a warning otherwise).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html