Re: [gitorious] git gc'ing all repositories destroys hardlinked clones
On Tue, Feb 8, 2011 at 5:47 PM, Marc Guenther y...@schli.ch wrote:
> Well, from what I understand, if you have an alternates file in repo2 which points to repo1, then a git gc in repo2 will remove all object files that also exist in repo1. This would solve the problem in this particular situation. You could do this by using git clone -s ... when creating the clone.
>
> The downside of this is that repo2 now depends on repo1, so you cannot delete repo1 without first regenerating all the objects in repo2 (using something like git repack -ad). And repo1 does not know which other repos depend on it, so that information has to be stored somewhere else.

Thanks for the insight! I remember we discussed using alternates in Gitorious about two years ago, but I didn't remember why we ended up not doing so; I guess this explains it.

All repositories that have been cloned through Gitorious keep a relation to their parent in the database, so on the model level we should be able to keep track of the relationship between a parent and a child repository. The issue here is that parents can be deleted, in which case the relationship is nullified: the relation disappears.

My colleague Christian is working on a new feature in Gitorious: promoting another repository to be the mainline. His work is in the reassign_mainline branch: http://gitorious.org/~cjohansen/gitorious/cjohansens-mainline/commits/reassign_mainline . This feature is intended to help people who have started a repository and want to step down as maintainers: they will be able to pick one of the clones of their repository and make it the new parent.

One thing we could do is prevent users from removing repositories that have clones/children, and encourage them to use the promote feature instead. If we choose to do so, we could take care of the alternates in that process.
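Marc's point that repo1 does not know who depends on it can at least be checked on disk. A minimal sketch, assuming bare repositories (object store at `<repo>/objects`) that borrow objects through `objects/info/alternates`, one object-store path per line, with relative entries resolved against the objects directory. `read_alternates`, `dependents` and `safe_to_delete` are hypothetical helpers, not part of Gitorious or git. Paths are only compared after `os.path.normpath`, so setups with symlinked paths would need `os.path.realpath` instead.

```python
import os
from collections import defaultdict


def read_alternates(repo_dir):
    """Return normalized object-store paths that repo_dir borrows from.

    Assumes a bare repository, so its object store is repo_dir/objects.
    """
    objects_dir = os.path.join(repo_dir, "objects")
    alt_file = os.path.join(objects_dir, "info", "alternates")
    if not os.path.isfile(alt_file):
        return []
    stores = []
    with open(alt_file) as fh:
        for line in fh:
            entry = line.rstrip("\r\n")
            # Skip blank lines and comment lines.
            if not entry or entry.startswith("#"):
                continue
            # Relative entries are resolved against the objects directory,
            # not the repository root; absolute entries are used as-is.
            stores.append(os.path.normpath(os.path.join(objects_dir, entry)))
    return stores


def dependents(repo_dirs):
    """Map each borrowed object store to the repositories borrowing from it."""
    borrowed_by = defaultdict(list)
    for repo in repo_dirs:
        for store in read_alternates(repo):
            borrowed_by[store].append(repo)
    return dict(borrowed_by)


def safe_to_delete(repo_dir, all_repos):
    """True if no repository in all_repos borrows objects from repo_dir."""
    own_store = os.path.normpath(os.path.join(repo_dir, "objects"))
    return own_store not in dependents(all_repos)
```

A check like this could back the "refuse to delete a repository that still has clones" rule as a safety net alongside the parent/child relation kept in the database, since it looks at what git will actually read rather than at the model.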
I suspect, however, that updating $GITDIR/objects/info/alternates to point to the new parent will not be sufficient, but maybe it would be if we do a repack on the new parent first?

Cheers,
- Marius

-- 
To post to this group, send email to gitorious@googlegroups.com
To unsubscribe from this group, send email to gitorious+unsubscr...@googlegroups.com
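On the question of whether re-pointing alternates is enough: a clone may borrow objects that exist only in the old mainline and are not in the new parent, so simply re-pointing the file could leave it with missing objects. One conservative ordering is sketched below, under stated assumptions: all repositories are bare and currently borrow from the old mainline, and the old mainline is still on disk. First make the new parent self-contained with `git repack -a -d` (without `-l`, which also packs borrowed objects; `git clone --dissociate` later adopted the same repack-then-remove-alternates approach) and drop its alternates file. Then, one clone at a time, make the clone self-contained, point its alternates at the new parent, and run `git repack -a -d -l` to drop its local copies of objects the new parent already has. `reparent_plan` and `apply_plan` are hypothetical names, not part of Gitorious.

```python
import os
import subprocess


def alternates_path(repo):
    """Path of the alternates file in a bare repository."""
    return os.path.join(repo, "objects", "info", "alternates")


def reparent_plan(new_parent, other_children):
    """List the actions for promoting new_parent to mainline.

    Assumes every repository is bare and currently borrows objects from the
    old mainline, which must still exist while the plan runs. Actions are
    ("git", repo, args) or ("set_alternates", repo, store_path_or_None).
    """
    plan = [
        # Copy every reachable object, borrowed ones included, into
        # new_parent's own pack, then stop borrowing.
        ("git", new_parent, ["repack", "-a", "-d"]),
        ("set_alternates", new_parent, None),
    ]
    new_store = os.path.join(new_parent, "objects")
    # Handle clones one at a time so only one extra full copy exists at once.
    for repo in other_children:
        plan.append(("git", repo, ["repack", "-a", "-d"]))
        plan.append(("set_alternates", repo, new_store))
        # -l: leave out objects that are available from the new parent.
        plan.append(("git", repo, ["repack", "-a", "-d", "-l"]))
    return plan


def apply_plan(plan, dry_run=True):
    """Run the planned actions, or with dry_run only describe them."""
    log = []
    for kind, repo, arg in plan:
        if kind == "git":
            cmd = ["git", "--git-dir=" + repo, *arg]
            log.append(" ".join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)
        elif arg is None:
            log.append("rm " + alternates_path(repo))
            if not dry_run and os.path.exists(alternates_path(repo)):
                os.remove(alternates_path(repo))
        else:
            log.append("echo %s > %s" % (arg, alternates_path(repo)))
            if not dry_run:
                with open(alternates_path(repo), "w") as fh:
                    fh.write(arg + "\n")
    return log
```

Once clones borrow from the new parent, the hazard the `git clone -s` documentation warns about moves there: if a branch is deleted in the new parent and its objects are later pruned, clones relying on them break.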
Re: [gitorious] git gc'ing all repositories destroys hardlinked clones
On Wed, Jan 26, 2011 at 10:08 PM, Marc Guenther y...@schli.ch wrote:
> Hi,
>
> We have a local installation of Gitorious. As seems to be good practice with git, I wanted to regularly run git gc on all our repositories, so I added a small cronjob which does this.

Marc,

First of all: there is already a script in the Gitorious distribution that does this for you, in script/repo_housekeeping. Gitorious already records the number of pushes to its repositories, and this script uses some heuristics to find which repositories are due for a gc. Whenever a repository is gc-ed, we reset the counter that holds the push count and record how much space the repository takes up on disk.

> And this caused our disk space to explode. We have a repository which is about 3.5GB in size. It is cloned 10 times inside of Gitorious, which isn't a problem, since git uses hardlinks for local clones, so the complete disk usage is still 3.5GB. It turns out that git gc --aggressive breaks these hardlinks. After running it everywhere, the size of the repository shrank down to 2.2 GB, but now I have 10 copies of it. So the situation is actually worse than before.

The script I mentioned above uses the Repository class's gc! method, which calls out to Git for you. I suppose a repack will regenerate the pack files, which will probably fill up your disk. Do you have any suggestions on how alternates could be used in this setting?

Cheers,
- Marius
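The push-count bookkeeping described above can be sketched as follows. This is not the code in script/repo_housekeeping: `RepoStats`, `due_for_gc`, `record_gc` and the threshold of 50 pushes are hypothetical placeholders, and the actual gc call (the Repository class's gc! method in Gitorious) is left out.

```python
from dataclasses import dataclass


@dataclass
class RepoStats:
    """Hypothetical bookkeeping record; not Gitorious's actual schema."""
    path: str
    push_count: int = 0
    disk_usage_bytes: int = 0


def due_for_gc(repos, min_pushes=50):
    """Return repositories with at least min_pushes pushes since their last
    gc, busiest first. The default threshold is an arbitrary placeholder."""
    due = [r for r in repos if r.push_count >= min_pushes]
    return sorted(due, key=lambda r: r.push_count, reverse=True)


def record_gc(repo, measured_bytes):
    """Bookkeeping after a successful gc: reset the counter, store the size."""
    repo.push_count = 0
    repo.disk_usage_bytes = measured_bytes
```

Note that a per-repository size recorded this way cannot tell whether pack files are shared with clones through hardlinks, which is exactly what hides the blow-up Marc describes.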
[gitorious] git gc'ing all repositories destroys hardlinked clones
Hi,

We have a local installation of Gitorious. As seems to be good practice with git, I wanted to regularly run git gc on all our repositories, so I added a small cronjob which does this.

And this caused our disk space to explode. We have a repository which is about 3.5GB in size. It is cloned 10 times inside of Gitorious, which isn't a problem, since git uses hardlinks for local clones, so the complete disk usage is still 3.5GB. It turns out that git gc --aggressive breaks these hardlinks. After running it everywhere, the size of the repository shrank down to 2.2 GB, but now I have 10 copies of it. So the situation is actually worse than before.

What is the recommended way to do this? From the manpage it seems that git gc honors the alternates file, but Gitorious doesn't use that when creating clones. Any other ideas?

Thanks,
Marc
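The arithmetic behind the blow-up: hardlinked clones share one set of pack files (about 3.5GB in total), but once gc writes a new pack in each repository, every copy is a separate file, roughly 10 × 2.2 GB = 22 GB by the counts above. A minimal sketch for measuring this, assuming the repositories live on one filesystem: it counts each (device, inode) pair once, so hardlinked files are not double-counted. `disk_usage` is a hypothetical helper, and it reports file lengths rather than allocated blocks.

```python
import os
import stat


def disk_usage(paths):
    """Return (apparent_bytes, unique_bytes) for regular files under paths.

    apparent_bytes adds up every file as if it were a separate copy.
    unique_bytes counts each (device, inode) pair once, so files hardlinked
    between clones are only counted once. Sizes are file lengths.
    """
    seen = set()
    apparent = unique = 0
    for top in paths:
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                if not stat.S_ISREG(st.st_mode):
                    continue  # skip symlinks and special files
                apparent += st.st_size
                key = (st.st_dev, st.st_ino)
                if key not in seen:
                    seen.add(key)
                    unique += st.st_size
    return apparent, unique
```

Running it over all clones before and after a gc pass shows the shared total turning into the sum of the individual repositories.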