Re: [gitorious] git gc'ing all repositories destroys hardlinked clones

2011-02-08 Marius Mårnes Mathiesen
On Tue, Feb 8, 2011 at 5:47 PM, Marc Guenther y...@schli.ch wrote:
 Well, from what I understand, if you have an alternates file in repo2 which
 points to repo1, then a git gc in repo2 will remove all object files which
 also exist in repo1. This would solve the problem in this particular
 situation. You could do this by using git clone -s ... when creating the
 clone.

 The downside of this is that repo2 now depends on repo1, so you cannot
 delete repo1 without first regenerating all the objects in repo2 (using
 something like git repack -ad). And repo1 does not know which other repos
 depend on it, so that information has to be stored somewhere else.
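As a rough illustration of the bookkeeping problem described above, here is a minimal Python sketch that reads a repository's objects/info/alternates file (one path per line, with relative paths resolved against that repository's own objects directory) and scans a list of repositories for the ones that borrow objects from a given parent. It assumes bare repositories laid out as <repo>/objects and ignores anything beyond plain path entries; read_alternates and find_dependents are hypothetical names, not part of Git or Gitorious.

```python
import os
from pathlib import Path


def read_alternates(git_dir):
    """Return the alternate object directories a repository borrows from.

    Simplified reader for <git_dir>/objects/info/alternates: one path per
    line, blank lines skipped, relative entries resolved against this
    repository's objects directory. Paths are returned fully resolved.
    """
    objects_dir = Path(git_dir) / "objects"
    alternates = objects_dir / "info" / "alternates"
    if not alternates.is_file():
        return []
    result = []
    for line in alternates.read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue
        path = Path(entry)
        if not path.is_absolute():
            path = objects_dir / path
        result.append(os.path.realpath(path))
    return result


def find_dependents(parent_git_dir, candidate_git_dirs):
    """List the candidate repositories whose alternates point at the parent."""
    parent_objects = os.path.realpath(Path(parent_git_dir) / "objects")
    return [
        repo for repo in candidate_git_dirs
        if parent_objects in read_alternates(repo)
    ]
```

Every repository such a scan returns would still borrow objects from the parent, so it would need something like git repack -a -d (and removal of its alternates entry) before the parent could safely be deleted. Keeping the relation in the database avoids scanning the disk at all.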

Thanks for the insight! I remember we discussed using alternates in
Gitorious about two years ago, but I didn't remember why we ended up
not doing so - I guess this explains it.

All repositories that have been cloned through Gitorious keep a
relation to their parent in the database, so on the model level we
should be able to keep track of the relationship between a parent
and a child repository. The issue here is that parents can be
deleted, in which case the relationship is nullified and the
relation disappears.

My colleague Christian is working on a new feature in Gitorious:
promoting another repository to be the mainline. His work is in the
reassign_mainline branch:
http://gitorious.org/~cjohansen/gitorious/cjohansens-mainline/commits/reassign_mainline
This feature is intended to help people who have started a
repository and want to step down as maintainers: they will be able to
pick one of the clones of their repository and make it the new parent.
One thing we could do is prevent users from removing repositories
that have clones/children, and encourage them to use the promote
feature instead.

If we choose to do so, we could take care of the alternates in that
process. I suspect, however, that updating
$GITDIR/objects/info/alternates to point to the new parent will not be
sufficient - but perhaps it would be if we repack the new parent first?
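One possible shape for the alternates step of a promotion, sketched in Python under explicit assumptions: repoint_alternates (a hypothetical helper) rewrites a sibling's alternates entry from the old parent's objects directory to the new parent's. This is only safe if the new parent already contains every object the sibling used to borrow, for example after git repack -a -d has been run in the new parent and its own alternates entry removed. Even then, objects that were reachable only from the old parent's refs (say, a branch the new parent never fetched) would not have been copied, so running git repack -a -d in the sibling first is the conservative option.

```python
import os
from pathlib import Path


def repoint_alternates(git_dir, old_objects, new_objects):
    """Point a child repository's alternates at a new parent (hypothetical helper).

    Entries that resolve to old_objects are replaced by the resolved path of
    new_objects; all other entries are kept as written. Returns True if the
    file was changed. Assumes the new parent already holds every object the
    child used to borrow from the old parent.
    """
    objects_dir = Path(git_dir) / "objects"
    alternates = objects_dir / "info" / "alternates"
    if not alternates.is_file():
        return False
    old_real = os.path.realpath(old_objects)
    new_real = os.path.realpath(new_objects)
    if new_real == os.path.realpath(objects_dir):
        # Only siblings should be repointed, never the new parent itself.
        raise ValueError("a repository cannot borrow objects from itself")
    lines, changed = [], False
    for line in alternates.read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue
        path = Path(entry)
        # Relative entries are relative to this repository's objects directory.
        resolved = os.path.realpath(path if path.is_absolute() else objects_dir / path)
        if resolved == old_real:
            lines.append(new_real)
            changed = True
        else:
            lines.append(entry)
    if changed:
        # Write a temporary file and rename it over the original, so readers
        # never see a half-written alternates file.
        tmp = alternates.with_name("alternates.tmp")
        tmp.write_text("\n".join(lines) + "\n")
        os.replace(tmp, alternates)
    return changed
```

The rename only makes the file update atomic; it does nothing to guarantee the objects are actually present in the new parent, which is the part that still needs the repack.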

Cheers,
- Marius

-- 
To post to this group, send email to gitorious@googlegroups.com
To unsubscribe from this group, send email to
gitorious+unsubscr...@googlegroups.com


Re: [gitorious] git gc'ing all repositories destroys hardlinked clones

2011-01-27 Marius Mårnes Mathiesen
On Wed, Jan 26, 2011 at 10:08 PM, Marc Guenther y...@schli.ch wrote:
 Hi,

 We have a local installation of Gitorious. As seems to be good practice with 
 git, I wanted to regularly run git gc on all our repositories, so I added a 
 small cronjob which does this.

Marc,
First of all: there is already a script in the Gitorious distribution
that does this for you: script/repo_housekeeping. Gitorious already
records the number of pushes to each repository, and this script uses
some heuristics to find which repositories are due for a gc. Whenever
a repository is gc-ed, we reset its push counter and record how much
disk space the repository takes up.
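The script's actual heuristics aren't spelled out here, so the following is only a hypothetical Python sketch of the idea just described: keep a per-repository push counter, gc the busiest repositories first, then reset the counter and record the new disk usage. RepoStats, due_for_gc, record_gc and the min_pushes/batch_size thresholds are all made up for illustration and are not Gitorious code.

```python
from dataclasses import dataclass


@dataclass
class RepoStats:
    """Hypothetical per-repository bookkeeping, modelled on the push counter."""
    path: str
    pushes_since_gc: int
    disk_usage_bytes: int = 0


def due_for_gc(repos, min_pushes=10, batch_size=5):
    """Pick the repositories with the most pushes since their last gc.

    min_pushes and batch_size are made-up tuning knobs, not Gitorious settings.
    """
    candidates = [r for r in repos if r.pushes_since_gc >= min_pushes]
    candidates.sort(key=lambda r: r.pushes_since_gc, reverse=True)
    return candidates[:batch_size]


def record_gc(repo, disk_usage_bytes):
    """After a gc: reset the push counter and remember the new disk usage."""
    repo.pushes_since_gc = 0
    repo.disk_usage_bytes = disk_usage_bytes
```

Limiting each run to a small batch spreads the I/O cost of repacking over several cron invocations instead of gc-ing everything at once.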

 And this caused our disk space to explode. We have a repository which is
 about 3.5 GB in size. It is cloned 10 times inside Gitorious, which isn't
 a problem, since git uses hardlinks for clones, so the complete disk usage is
 still 3.5 GB.

 It turns out that git gc --aggressive breaks these hardlinks. After running it
 everywhere, the size of the repository shrank to 2.2 GB, but now I have
 10 copies of it. So the situation is actually worse than before.
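To see why the numbers behave this way, it helps to measure disk usage the way the filesystem sees it: a hardlinked file shares one inode between all clones, so its bytes are stored only once. The Python sketch below sums apparent file sizes while counting each (device, inode) pair once, and lists files whose link count is 1, which is what a pack file freshly written by gc would look like. It assumes all clones live on the same filesystem (hardlinks cannot cross filesystems); disk_usage and unshared_files are hypothetical names.

```python
import os


def disk_usage(paths):
    """Sum apparent file sizes under the given directories, one count per inode.

    Files hardlinked between clones share an inode, so they are only counted
    the first time they are seen (similar in spirit to what du does).
    """
    seen = set()
    total = 0
    for top in paths:
        for root, _dirs, files in os.walk(top):
            for name in files:
                st = os.lstat(os.path.join(root, name))
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue
                seen.add(key)
                total += st.st_size
    return total


def unshared_files(path):
    """List files under path that have a link count of 1 (no longer hardlinked)."""
    result = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.lstat(full).st_nlink == 1:
                result.append(full)
    return sorted(result)
```

Run over the ten hardlinked clones, disk_usage would report roughly the size of one repository; after each clone has been repacked on its own, every clone's new packs show up in unshared_files and are counted separately.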

The script I mentioned above will use the Repository class's gc!
method, which will call out to Git for you. I suppose a repack will
regenerate the pack files, which will probably fill up your disk - do
you have any suggestions on how alternates could be used in this
setting?

Cheers,
- Marius

-- 
To post to this group, send email to gitorious@googlegroups.com
To unsubscribe from this group, send email to
gitorious+unsubscr...@googlegroups.com