Re: [HACKERS] managing git disk space usage
On Wed, 2010-07-21 at 23:06 +0200, Dimitri Fontaine wrote:
> Alvaro Herrera alvhe...@commandprompt.com writes:
> > This does not work as cleanly as you suppose, because some build objects are stored in the source tree. configure being one of them. So if you switch branches, configure is rerun even in a VPATH build, which is undesirable.
>
> Ouch. Reading -hackers led me to thinking this had received a cleaning effort in the Makefiles, so that any generated file appears in the build directory. Sorry to learn that's not (yet?) the case.

It is, but not in the back branches.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] managing git disk space usage
On Wed, Jul 21, 2010 at 6:17 AM, Abhijit Menon-Sen a...@toroid.org wrote:
> At 2010-07-20 13:04:12 -0400, robertmh...@gmail.com wrote:
> > 1. Clone the origin. Then, clone the clone n times locally. This uses hard links, so it saves disk space. But, every time you want to pull, you first have to pull to the main clone, and then to each of the slave clones. And same thing when you want to push.
>
> If your extra clones are for occasionally-touched back branches, then:
>
> (a) In my experience, it is almost always much easier to work with many branches and move patches between them rather than use multiple clones; but
>
> (b) You don't need to do the double-pull and push. Clone your local repository as many times as needed, but create new git-remote(1)s in each extra clone and pull/push only the branch you care about directly from or to the remote. That way, you'll start off with the bulk of the storage shared with your main local repository, and waste a few KB when you make (presumably infrequent) new changes.

Ah, that is clever. Perhaps we need to write up directions on how to do that.

> But that brings me to another point:
>
> In my experience (doing exactly this kind of old-branch-maintenance with Archiveopteryx), git doesn't help you much if you want to backport (i.e. cherry-pick) changes from a development branch to old release branches. It is much more helpful when you make changes to the *oldest* applicable branch and bring it *forward* to your development branch (by merging the old branch into your master). Cherry-picking can be done, but it becomes painful after a while.

Well, per previous discussion, we're not going to change that at this point, or maybe ever.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] managing git disk space usage
On Wed, Jul 21, 2010 at 12:39, Robert Haas robertmh...@gmail.com wrote:
> On Wed, Jul 21, 2010 at 6:17 AM, Abhijit Menon-Sen a...@toroid.org wrote:
> > At 2010-07-20 13:04:12 -0400, robertmh...@gmail.com wrote:
> > > 1. Clone the origin. Then, clone the clone n times locally. This uses hard links, so it saves disk space. But, every time you want to pull, you first have to pull to the main clone, and then to each of the slave clones. And same thing when you want to push.
> >
> > If your extra clones are for occasionally-touched back branches, then:
> >
> > (a) In my experience, it is almost always much easier to work with many branches and move patches between them rather than use multiple clones; but
> >
> > (b) You don't need to do the double-pull and push. Clone your local repository as many times as needed, but create new git-remote(1)s in each extra clone and pull/push only the branch you care about directly from or to the remote. That way, you'll start off with the bulk of the storage shared with your main local repository, and waste a few KB when you make (presumably infrequent) new changes.
>
> Ah, that is clever. Perhaps we need to write up directions on how to do that.

Yeah, that's the way I work with some projects at least.

> > But that brings me to another point:
> >
> > In my experience (doing exactly this kind of old-branch-maintenance with Archiveopteryx), git doesn't help you much if you want to backport (i.e. cherry-pick) changes from a development branch to old release branches. It is much more helpful when you make changes to the *oldest* applicable branch and bring it *forward* to your development branch (by merging the old branch into your master). Cherry-picking can be done, but it becomes painful after a while.
>
> Well, per previous discussion, we're not going to change that at this point, or maybe ever.

Nope, the deal was definitely that we stick to the current workflow.

Yes, this means we can't use git cherry-pick or similar git-specific tools to make life easier. But it shouldn't make life harder than it is *now*, with cvs.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] managing git disk space usage
At 2010-07-20 13:04:12 -0400, robertmh...@gmail.com wrote:
> 1. Clone the origin. Then, clone the clone n times locally. This uses hard links, so it saves disk space. But, every time you want to pull, you first have to pull to the main clone, and then to each of the slave clones. And same thing when you want to push.

If your extra clones are for occasionally-touched back branches, then:

(a) In my experience, it is almost always much easier to work with many branches and move patches between them rather than use multiple clones; but

(b) You don't need to do the double-pull and push. Clone your local repository as many times as needed, but create new git-remote(1)s in each extra clone and pull/push only the branch you care about directly from or to the remote. That way, you'll start off with the bulk of the storage shared with your main local repository, and waste a few KB when you make (presumably infrequent) new changes.

But that brings me to another point:

In my experience (doing exactly this kind of old-branch-maintenance with Archiveopteryx), git doesn't help you much if you want to backport (i.e. cherry-pick) changes from a development branch to old release branches. It is much more helpful when you make changes to the *oldest* applicable branch and bring it *forward* to your development branch (by merging the old branch into your master). Cherry-picking can be done, but it becomes painful after a while.

See http://toroid.org/ams/etc/git-merge-vs-p4-integrate for more.

-- ams
Re: [HACKERS] managing git disk space usage
At 2010-07-21 06:39:28 -0400, robertmh...@gmail.com wrote:
> Perhaps we need to write up directions on how to do that.

I'll write them if you tell me where to put them. It's trivial.

> Well, per previous discussion, we're not going to change that at this point, or maybe ever.

Sure. I just wanted to mention it, because it's something I learned the hard way.

It's also true that back-porting changes is a bigger deal for Postgres than it was for me (in the sense that it's an exception rather than a routine activity), and individual changes are usually backported as soon as, or very soon after, they are committed; so it should be less painful on the whole.

Another point, in response to Magnus's followup:

At 2010-07-21 12:42:03 +0200, mag...@hagander.net wrote:
> Yes, this means we can't use git cherry-pick or similar git-specific tools to make life easier.

No, that's not right. You *can* use cherry-pick; in fact, it's the sane way to backport the occasional change. What you can't do is efficiently manage a queue of changes to be backported to multiple branches. But as I said above, that's not exactly what we want to do for Postgres, so it should not matter too much.

-- ams
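For readers who have not backported with git before, the occasional cherry-pick described above can be sketched as follows. This is only an illustration in a throwaway repository: the branch name REL_DEMO_STABLE, the file names, and the commit messages are all made up, and the -x flag (which records the original commit id in the new commit message) is a choice of this sketch, not something the thread prescribes.

```shell
#!/bin/sh
# Sketch: backport one commit from master to a back branch with cherry-pick.
# All repository, branch, and file names here are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # force the branch name "master"
git config user.name "Demo Committer"
git config user.email "demo@example.invalid"

# Common history, then a release branch forks off.
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
git branch REL_DEMO_STABLE

# Development continues on master: an unrelated change, then a bug fix.
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature (not for back branches)"
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix bug"
fix_commit=$(git rev-parse HEAD)

# Backport only the fix; -x appends "(cherry picked from commit ...)".
git checkout -q REL_DEMO_STABLE
git cherry-pick -x "$fix_commit" >/dev/null

# Show the back branch history: the initial commit plus the backported fix.
git log --oneline
```

After this runs, the back branch contains fix.txt but not feature.txt, which is the "occasional change" case; it does not address managing a whole queue of pending backports.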
Re: [HACKERS] managing git disk space usage
On Wed, Jul 21, 2010 at 6:56 AM, Abhijit Menon-Sen a...@toroid.org wrote:
> At 2010-07-21 06:39:28 -0400, robertmh...@gmail.com wrote:
> > Perhaps we need to write up directions on how to do that.
>
> I'll write them if you tell me where to put them. It's trivial.

Post 'em here or drop them on the wiki and post a link.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] managing git disk space usage
At 2010-07-21 06:57:53 -0400, robertmh...@gmail.com wrote:
> Post 'em here or drop them on the wiki and post a link.

1. Clone the remote repository as usual:

    git clone git://git.postgresql.org/git/postgresql.git

2. Create as many local clones as you want:

    git clone postgresql foobar

3. In each clone (supposing you care about branch xyzzy):

3.1. git remote set-url origin ssh://whatever/postgresql.git

3.2. git remote update
     git remote prune origin

3.3. git checkout -t origin/xyzzy

3.4. git branch -d master

3.5. Edit .git/config and set origin.fetch thus:

    [remote "origin"]
        fetch = +refs/heads/xyzzy:refs/remotes/origin/xyzzy

    (You can git config remote.origin.fetch '+refs/...' if you're squeamish about editing the config file.)

3.6. That's it. git pull and git push will work correctly.

(This will replace the origin remote that pointed at your local postgresql.git clone with one that points to the real remote; but you could also add a remote definition named something other than origin, in which case you'd need to git push thatname etc.)

-- ams
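The steps above can be tried end to end without network access. The following is a minimal sketch, assuming a local bare repository (upstream.git, a made-up name) stands in for the real remote, and that xyzzy is the one back branch of interest; the seed repository exists only to give the stand-in remote some history.

```shell
#!/bin/sh
# Sketch of the "local clone, real remote" setup using throwaway repositories.
# upstream.git and seed are hypothetical stand-ins, not part of the recipe.
set -eu

work=$(mktemp -d)
cd "$work"

# Stand-in for the real remote (normally a git:// or ssh:// URL).
git init -q --bare upstream.git
git --git-dir=upstream.git symbolic-ref HEAD refs/heads/master

# Seed it with a master branch and a back branch named xyzzy.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "initial"
git branch xyzzy
git push -q ../upstream.git master xyzzy
cd ..

# Step 1: the main clone of the remote.
git clone -q upstream.git postgresql

# Step 2: a local clone of the main clone (objects are hard-linked).
git clone -q postgresql foobar
cd foobar

# Step 3: repoint origin at the remote and track only xyzzy.
git remote set-url origin "$work/upstream.git"
git remote update >/dev/null 2>&1
git remote prune origin
git checkout -q -t origin/xyzzy
git branch -q -d master
git config remote.origin.fetch '+refs/heads/xyzzy:refs/remotes/origin/xyzzy'

# Show the result: current branch and the restricted fetch refspec.
git symbolic-ref --short HEAD
git config --get remote.origin.fetch
```

Note that foobar only learns about origin/xyzzy after step 3.2, because a clone of the main clone sees just the main clone's local branches; that is why the update has to come before the checkout.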
Re: [HACKERS] managing git disk space usage
Aidan Van Dyk ai...@highrise.ca writes:
> * Robert Haas robertmh...@gmail.com [100720 13:04]:
> > 3. Clone the origin once. Apply patches to multiple branches by switching branches. Playing around with it, this is probably a tolerable way to work when you're only going back one or two branches but it's certainly a big nuisance when you're going back 5-7 branches.
>
> This is what I do when I'm working on a project that has completely proper dependencies, and you don't need to always re-run configure between different branches. I use ccache heavily, so configure takes longer than a complete build with a couple-dozen actually-not-previously-seen changes...
>
> But *all* dependencies need to be proper in the build system, or you end up needing a git-clean-type-cleanup between branch switches, forcing a new configure run too, which takes too much time... Maybe this will cause make dependencies to be refined in PG ;-)

Well, there's also the VPATH possibility, where all your build objects are stored out of the way of the repo. So you could check out the branch you're interested in, change to the associated build directory and build there. And automate that of course.

Regards,
-- dim
Re: [HACKERS] managing git disk space usage
Excerpts from Dimitri Fontaine's message of Wed Jul 21 15:00:48 -0400 2010:
> Well, there's also the VPATH possibility, where all your build objects are stored out of the way of the repo. So you could checkout the branch you're interrested in, change to the associated build directory and build there. And automate that of course.

This does not work as cleanly as you suppose, because some build objects are stored in the source tree, configure being one of them. So if you switch branches, configure is rerun even in a VPATH build, which is undesirable.
Re: [HACKERS] managing git disk space usage
Alvaro Herrera alvhe...@commandprompt.com writes:
> This does not work as cleanly as you suppose, because some build objects are stored in the source tree. configure being one of them. So if you switch branches, configure is rerun even in a VPATH build, which is undesirable.

Ouch. Reading -hackers led me to thinking this had received a cleaning effort in the Makefiles, so that any generated file appears in the build directory. Sorry to learn that's not (yet?) the case.

Regards,
-- dim
[HACKERS] managing git disk space usage
Tom and, I believe, also Andrew have expressed some concerns about the space that will be taken up by having multiple copies of the git repository on their systems. While most users can probably get by with a single repository, committers will likely need one for each back-branch that they work with, and we have quite a few of those.

After playing around with this a bit, I've come to the conclusion that there are a couple of possible options but they've all got some drawbacks.

1. Clone the origin. Then, clone the clone n times locally. This uses hard links, so it saves disk space. But, every time you want to pull, you first have to pull to the main clone, and then to each of the slave clones. And same thing when you want to push.

2. Clone the origin n times. Use more disk space. Live with it. :-)

3. Clone the origin once. Apply patches to multiple branches by switching branches. Playing around with it, this is probably a tolerable way to work when you're only going back one or two branches but it's certainly a big nuisance when you're going back 5-7 branches.

4. Clone the origin. Use that to get at the master branch. Then clone that clone n-1 times, one for each back-branch. This makes it a bit easier to push and pull when you're only dealing with the master branch, but you still have the double push/double pull problem for all the other branches.

5. Use git clone --shared or git clone --reference or git-new-workdir. While I once thought this was the solution, I can't take very seriously any solution that has a warning in the manual that says, essentially, git gc may corrupt your repository if you do this.

I'm not really sure which of these I'm going to do yet, and I'm not sure what to recommend to others, either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] managing git disk space usage
* Robert Haas robertmh...@gmail.com [100720 13:04]:
> 3. Clone the origin once. Apply patches to multiple branches by switching branches. Playing around with it, this is probably a tolerable way to work when you're only going back one or two branches but it's certainly a big nuisance when you're going back 5-7 branches.

This is what I do when I'm working on a project that has completely proper dependencies, and you don't need to always re-run configure between different branches. I use ccache heavily, so configure takes longer than a complete build with a couple-dozen actually-not-previously-seen changes...

But *all* dependencies need to be proper in the build system, or you end up needing a git-clean-type-cleanup between branch switches, forcing a new configure run too, which takes too much time... Maybe this will cause make dependencies to be refined in PG ;-)

It has the advantage that if back patching (or in reality, forward patching) all happens in 1 repository, the git conflict machinery is all using the same cache of resolutions, meaning that if you apply the same patch to 2 different branches, with identical code/conflict, you don't need to do the whole conflict resolution by hand from scratch in the 2nd branch.

> 5. Use git clone --shared or git clone --reference or git-new-workdir. While I once thought this was the solution, I can't take very seriously any solution that has a warning in the manual that says, essentially, git gc may corrupt your repository if you do this.

This is the type of setup I often use. I have a central set of git repos that are automatic, straight mirror-clones of project repositories, and they are kept up-to-date via cron. Any time I clone a work repo, I use --reference.

Since I make sure I don't remove anything from the reference repo, I don't have to worry about losing objects other repositories might be using from the cache repo. In case anyone is wondering, that's:

    git clone --mirror $REPO /data/src/cache/$project.git
    git --git-dir=/data/src/cache/$project.git config gc.auto 0

And then in crontab:

    git --git-dir=/data/src/cache/$project.git fetch --quiet --all

With gc.auto disabled, and the only commands ever run being git fetch, no objects are removed, even if a remote rewinds and throws away commits.

But this way means that the separate repos only share the past, from central repository history, which means that you have to jump through hoops if you want to be able to use git's handy merging/cherry-picking/conflict tools when trying to rebase/port patches between branches. You're pretty much limited to exporting a patch, changing to the new branch-repository, and applying the patch.

a.

--
Aidan Van Dyk    ai...@highrise.ca    http://www.highrise.ca/
Create like a god, command like a king, work like a slave.
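The message above gives the cache-side commands but not the work-clone side. A minimal sketch of the whole arrangement follows, assuming throwaway local directories (project-src, cache.git, work-repo are made-up names) in place of a real project URL and the /data/src/cache path. The file:// URL is used so that the clone goes through the normal fetch path and can borrow objects from the cache.

```shell
#!/bin/sh
# Sketch: a never-pruned mirror cache plus a work clone made with --reference.
# project-src, cache.git and work-repo are hypothetical names.
set -eu

work=$(mktemp -d)
cd "$work"

# Stand-in for the project's public repository.
git init -q project-src
cd project-src
git symbolic-ref HEAD refs/heads/master
git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "initial"
cd ..

# The cache: a mirror clone with automatic garbage collection turned off.
git clone -q --mirror "file://$work/project-src" cache.git
git --git-dir=cache.git config gc.auto 0

# What the cron job would run to keep the cache current.
git --git-dir=cache.git fetch --quiet --all

# A work clone that borrows objects from the cache instead of copying them.
git clone -q --reference "$work/cache.git" "file://$work/project-src" work-repo

# The borrowing is recorded here; this is why the cache must never lose objects.
cat work-repo/.git/objects/info/alternates
```

The alternates file printed at the end is the link the git manual warns about: if the cache were ever pruned, work-repo could be left pointing at objects that no longer exist.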
Re: [HACKERS] managing git disk space usage
Robert Haas robertmh...@gmail.com wrote:
> 2. Clone the origin n times. Use more disk space. Live with it. :-)

But each copy uses almost 0.36% of the formatted space on my 150GB drive!

-Kevin
Re: [HACKERS] managing git disk space usage
On Tue, 2010-07-20 at 13:28 -0400, Aidan Van Dyk wrote:
> But *all* dependencies need to be proper in the build system, or you end up needing a git-clean-type-cleanup between branch switches, forcing a new configure run too, which takes too much time...

This realization, while true, doesn't really help, because we are talking about maintaining 5+ year old back branches, where we are not going to fiddle with the build system at this time.

Also, the switch from 9.0 to 9.1 the other day showed everyone who cared to watch that the dependencies are currently not correct for major version switches, so this method will definitely not work at the moment.
Re: [HACKERS] managing git disk space usage
On Tue, 2010-07-20 at 13:04 -0400, Robert Haas wrote:
> 2. Clone the origin n times. Use more disk space. Live with it. :-)

Well, I plan to use cp -a to avoid cloning over the network n times, but other than that, that was my plan.

My .git directory currently takes 283 MB, so I think I can just about live with that.
Re: [HACKERS] managing git disk space usage
Robert Haas wrote:
> Tom and, I believe, also Andrew have expressed some concerns about the space that will be taken up by having multiple copies of the git repository on their systems. While most users can probably get by with a single repository, committers will likely need one for each back-branch that they work with, and we have quite a few of those.
>
> After playing around with this a bit, I've come to the conclusion that there are a couple of possible options but they've all got some drawbacks.
>
> 1. Clone the origin. Then, clone the clone n times locally. This uses hard links, so it saves disk space. But, every time you want to pull, you first have to pull to the main clone, and then to each of the slave clones. And same thing when you want to push.

You can have a cron job that does the first pull fairly frequently. It should be a fairly cheap operation unless the git protocol is dumber than I think.

The second pull is the equivalent of what we do now with cvs update.

Given that, you could push commits direct to the authoritative repo and wait for the cron job to catch up your local base clone.

I think that's the pattern I will probably try to follow.

cheers

andrew