Re: [HACKERS] managing git disk space usage

2010-07-22 Thread Peter Eisentraut
On Wed, 2010-07-21 at 23:06 +0200, Dimitri Fontaine wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  This does not work as cleanly as you suppose, because some build
  objects are stored in the source tree.  configure being one of them.
  So if you switch branches, configure is rerun even in a VPATH build,
  which is undesirable.
 
 Ouch. Reading -hackers led me to think this had received a cleaning
 effort in the Makefiles, so that any generated file appears in the build
 directory. Sorry to learn that's not (yet?) the case.

It is, but not in the back branches.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Robert Haas
On Wed, Jul 21, 2010 at 6:17 AM, Abhijit Menon-Sen a...@toroid.org wrote:
 At 2010-07-20 13:04:12 -0400, robertmh...@gmail.com wrote:

 1. Clone the origin.  Then, clone the clone n times locally.  This
 uses hard links, so it saves disk space.  But, every time you want to
 pull, you first have to pull to the main clone, and then to each of
 the slave clones.  And same thing when you want to push.

 If your extra clones are for occasionally-touched back branches, then:

 (a) In my experience, it is almost always much easier to work with many
 branches and move patches between them rather than use multiple clones;
 but

 (b) You don't need to do the double-pull and push. Clone your local
 repository as many times as needed, but create new git-remote(1)s in
 each extra clone and pull/push only the branch you care about directly
 from or to the remote. That way, you'll start off with the bulk of the
 storage shared with your main local repository, and waste a few KB
 when you make (presumably infrequent) new changes.

Ah, that is clever.  Perhaps we need to write up directions on how to do that.

 But that brings me to another point:

 In my experience (doing exactly this kind of old-branch-maintenance with
 Archiveopteryx), git doesn't help you much if you want to backport (i.e.
 cherry-pick) changes from a development branch to old release branches.
 It is much more helpful when you make changes to the *oldest* applicable
 branch and bring it *forward* to your development branch (by merging the
 old branch into your master). Cherry-picking can be done, but it becomes
 painful after a while.

Well, per previous discussion, we're not going to change that at this
point, or maybe ever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Magnus Hagander
On Wed, Jul 21, 2010 at 12:39, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jul 21, 2010 at 6:17 AM, Abhijit Menon-Sen a...@toroid.org wrote:
 At 2010-07-20 13:04:12 -0400, robertmh...@gmail.com wrote:

 1. Clone the origin.  Then, clone the clone n times locally.  This
 uses hard links, so it saves disk space.  But, every time you want to
 pull, you first have to pull to the main clone, and then to each of
 the slave clones.  And same thing when you want to push.

 If your extra clones are for occasionally-touched back branches, then:

 (a) In my experience, it is almost always much easier to work with many
 branches and move patches between them rather than use multiple clones;
 but

 (b) You don't need to do the double-pull and push. Clone your local
 repository as many times as needed, but create new git-remote(1)s in
 each extra clone and pull/push only the branch you care about directly
 from or to the remote. That way, you'll start off with the bulk of the
 storage shared with your main local repository, and waste a few KB
 when you make (presumably infrequent) new changes.

 Ah, that is clever.  Perhaps we need to write up directions on how to do that.

Yeah, that's the way I work with some projects at least.


 But that brings me to another point:

 In my experience (doing exactly this kind of old-branch-maintenance with
 Archiveopteryx), git doesn't help you much if you want to backport (i.e.
 cherry-pick) changes from a development branch to old release branches.
 It is much more helpful when you make changes to the *oldest* applicable
 branch and bring it *forward* to your development branch (by merging the
 old branch into your master). Cherry-picking can be done, but it becomes
 painful after a while.

 Well, per previous discussion, we're not going to change that at this
 point, or maybe ever.

Nope, the deal was definitely that we stick to the current workflow.

Yes, this means we can't use git cherry-pick or similar git-specific
tools to make life easier. But it shouldn't make life harder than it
is *now*, with cvs.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Abhijit Menon-Sen
At 2010-07-20 13:04:12 -0400, robertmh...@gmail.com wrote:

 1. Clone the origin.  Then, clone the clone n times locally.  This
 uses hard links, so it saves disk space.  But, every time you want to
 pull, you first have to pull to the main clone, and then to each of
 the slave clones.  And same thing when you want to push.

If your extra clones are for occasionally-touched back branches, then:

(a) In my experience, it is almost always much easier to work with many
branches and move patches between them rather than use multiple clones;
but

(b) You don't need to do the double-pull and push. Clone your local
repository as many times as needed, but create new git-remote(1)s in
each extra clone and pull/push only the branch you care about directly
from or to the remote. That way, you'll start off with the bulk of the
storage shared with your main local repository, and waste a few KB
when you make (presumably infrequent) new changes.

But that brings me to another point:

In my experience (doing exactly this kind of old-branch-maintenance with
Archiveopteryx), git doesn't help you much if you want to backport (i.e.
cherry-pick) changes from a development branch to old release branches.
It is much more helpful when you make changes to the *oldest* applicable
branch and bring it *forward* to your development branch (by merging the
old branch into your master). Cherry-picking can be done, but it becomes
painful after a while.

See http://toroid.org/ams/etc/git-merge-vs-p4-integrate for more.

-- ams



Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Abhijit Menon-Sen
At 2010-07-21 06:39:28 -0400, robertmh...@gmail.com wrote:

 Perhaps we need to write up directions on how to do that.

I'll write them if you tell me where to put them. It's trivial.

 Well, per previous discussion, we're not going to change that at this
 point, or maybe ever.

Sure. I just wanted to mention it, because it's something I learned the
hard way. It's also true that back-porting changes is a bigger deal for
Postgres than it was for me (in the sense that it's an exception rather
than a routine activity), and individual changes are usually backported
as soon as, or very soon after, they are committed; so it should be less
painful on the whole.

Another point, in response to Magnus's followup:

At 2010-07-21 12:42:03 +0200, mag...@hagander.net wrote:

 Yes, this means we can't use git cherry-pick or similar git-specific
 tools to make life easier.

No, that's not right. You *can* use cherry-pick; in fact, it's the sane
way to backport the occasional change. What you can't do is efficiently
manage a queue of changes to be backported to multiple branches. But as
I said above, that's not exactly what we want to do for Postgres, so it
should not matter too much.
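
For the occasional backport, the cherry-pick route might look like the sketch below. It runs in a throwaway repository; the branch name REL9_0_STABLE, the file names, and the commit messages are illustrative assumptions, not taken from this thread. The -x flag asks git to note the original commit ID in the new commit message.

```shell
#!/bin/sh
set -e
# Throwaway identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name

# A common base, then a back branch forked from it (name is hypothetical).
echo base > a.txt
git add a.txt
git commit -q -m "base"
git branch REL9_0_STABLE

# The change lands on the development branch first.
echo fix > fix.txt
git add fix.txt
git commit -q -m "Fix something on master"
fix=$(git rev-parse HEAD)

# Backport that single commit to the back branch.
git checkout -q REL9_0_STABLE
git cherry-pick -x "$fix" >/dev/null
git log -1 --format=%B
```

This covers the one-off case only; it does nothing to track a queue of changes still waiting to be backported.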

-- ams



Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Robert Haas
On Wed, Jul 21, 2010 at 6:56 AM, Abhijit Menon-Sen a...@toroid.org wrote:
 At 2010-07-21 06:39:28 -0400, robertmh...@gmail.com wrote:

 Perhaps we need to write up directions on how to do that.

 I'll write them if you tell me where to put them. It's trivial.

Post 'em here or drop them on the wiki and post a link.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Abhijit Menon-Sen
At 2010-07-21 06:57:53 -0400, robertmh...@gmail.com wrote:

 Post 'em here or drop them on the wiki and post a link.

1. Clone the remote repository as usual:

git clone git://git.postgresql.org/git/postgresql.git

2. Create as many local clones as you want:

git clone postgresql foobar

3. In each clone (supposing you care about branch xyzzy):

3.1. git remote set-url origin ssh://whatever/postgresql.git

3.2. git remote update && git remote prune origin

3.3. git checkout -t origin/xyzzy

3.4. git branch -d master

3.5. Edit .git/config and set origin.fetch thus:

 [remote "origin"]
 fetch = +refs/heads/xyzzy:refs/remotes/origin/xyzzy

 (You can git config remote.origin.fetch '+refs/...' if you're
 squeamish about editing the config file.)

3.6. That's it. git pull and git push will work correctly.

(This will replace the origin remote that pointed at your local
postgresql.git clone with one that points to the real remote; but you
could also add a remote definition named something other than origin,
in which case you'd need to git push thatname etc.)
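
The steps above can be tried end to end with a sketch like the following. It assumes nothing about the real server: a local bare repository (upstream.git) stands in for the real remote, and the directory names, file contents, and commit messages are made up for the demonstration. Only the branch name xyzzy and the clone names postgresql and foobar come from the steps.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"

# Stand-in for the real remote: a bare repo with branches master and xyzzy.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo one > file
git add file
git commit -q -m "initial"
git branch xyzzy
cd ..
git clone -q --bare seed upstream.git

# Steps 1 and 2: the main clone, then a local clone of it.
git clone -q upstream.git postgresql
git clone -q postgresql foobar

# Step 3, inside the extra clone.
cd foobar
git remote set-url origin "$work/upstream.git"
git remote update >/dev/null 2>&1 && git remote prune origin
git checkout -q -t origin/xyzzy
git branch -q -d master
git config remote.origin.fetch '+refs/heads/xyzzy:refs/remotes/origin/xyzzy'

# Simulate new upstream work on xyzzy, then pull it straight from the remote.
cd ../seed
git checkout -q xyzzy
echo two >> file
git commit -q -am "update on xyzzy"
git push -q "$work/upstream.git" xyzzy
cd ../foobar
git pull -q
```

After this, foobar follows only xyzzy and talks to the remote directly, without going through the postgresql clone.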

-- ams



Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Dimitri Fontaine
Aidan Van Dyk ai...@highrise.ca writes:
 * Robert Haas robertmh...@gmail.com [100720 13:04]:
  
 3. Clone the origin once.  Apply patches to multiple branches by
 switching branches.  Playing around with it, this is probably a
 tolerable way to work when you're only going back one or two branches
 but it's certainly a big nuisance when you're going back 5-7 branches.

 This is what I do when I'm working on a project that has completely
 proper dependencies, and you don't need to always re-run configure
 between different branches.  I use ccache heavily, so configure takes
 longer than a complete build with a couple-dozen
 actually-not-previously-seen changes...

 But *all* dependencies need to be proper in the build system, or you end
 up needing a git-clean-type-cleanup between branch switches, forcing a
 new configure run too, which takes too much time...

 Maybe this will cause make dependencies to be refined in PG ;-)

Well, there's also the VPATH possibility, where all your build objects
are stored out of the way of the repo. So you could check out the branch
you're interested in, change to the associated build directory and
build there. And automate that of course.

Regards,
-- 
dim



Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Alvaro Herrera
Excerpts from Dimitri Fontaine's message of Wed Jul 21 15:00:48 -0400 2010:

 Well, there's also the VPATH possibility, where all your build objects
 are stored out of the way of the repo. So you could check out the branch
 you're interested in, change to the associated build directory and
 build there. And automate that of course.

This does not work as cleanly as you suppose, because some build
objects are stored in the source tree.  configure being one of them.
So if you switch branches, configure is rerun even in a VPATH build,
which is undesirable.



Re: [HACKERS] managing git disk space usage

2010-07-21 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@commandprompt.com writes:
 This does not work as cleanly as you suppose, because some build
 objects are stored in the source tree.  configure being one of them.
 So if you switch branches, configure is rerun even in a VPATH build,
 which is undesirable.

Ouch. Reading -hackers led me to think this had received a cleaning
effort in the Makefiles, so that any generated file appears in the build
directory. Sorry to learn that's not (yet?) the case.

Regards,
-- 
dim



Re: [HACKERS] managing git disk space usage

2010-07-20 Thread Aidan Van Dyk
* Robert Haas robertmh...@gmail.com [100720 13:04]:
 
 3. Clone the origin once.  Apply patches to multiple branches by
 switching branches.  Playing around with it, this is probably a
 tolerable way to work when you're only going back one or two branches
 but it's certainly a big nuisance when you're going back 5-7 branches.

This is what I do when I'm working on a project that has completely
proper dependencies, and you don't need to always re-run configure
between different branches.  I use ccache heavily, so configure takes
longer than a complete build with a couple-dozen
actually-not-previously-seen changes...

But *all* dependencies need to be proper in the build system, or you end
up needing a git-clean-type-cleanup between branch switches, forcing a
new configure run too, which takes too much time...

Maybe this will cause make dependencies to be refined in PG ;-)

It has the advantage, that if back patching (or in reality, forward
patching) all happens in 1 repository, the git conflict machinery is all
using the same cache of resolutions, meaning that if you apply the same
patch to 2 different branches, with identical code/conflict, you don't
need to do the whole conflict resolution by hand from scratch in the 2nd
branch.
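
The "cache of resolutions" is presumably git's rerere facility, which is not enabled in a fresh repository and is switched on with rerere.enabled. The sketch below is a toy demonstration under that assumption: the branch names back1 and back2, the file, and its contents are invented, and the same conflict is deliberately created on both branches.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config rerere.enabled true          # turn on the resolution cache

echo "original" > f.txt
git add f.txt
git commit -q -m "base"
git branch back1
git branch back2

echo "patched" > f.txt
git commit -q -am "patch on master"
patch=$(git rev-parse HEAD)

# Give both back branches the same divergent content.
for b in back1 back2; do
    git checkout -q "$b"
    echo "old variant" > f.txt
    git commit -q -am "divergence on $b"
done

# First backport: the conflict is resolved by hand and git records it.
git checkout -q back1
git cherry-pick "$patch" >/dev/null 2>&1 || true
echo "patched old variant" > f.txt
git add f.txt
git commit -q --no-edit

# Second backport: same conflict, so the recorded resolution is replayed
# into the working tree; it still has to be staged and committed.
git checkout -q back2
git cherry-pick "$patch" >/dev/null 2>&1 || true
cat f.txt
git add f.txt
git commit -q --no-edit
```

The recorded resolutions live inside the repository's .git directory, which is why this only helps when all the branches are handled in one repository.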

 5. Use git clone --shared or git clone --reference or
 git-new-workdir.  While I once thought this was the solution, I can't
 take very seriously any solution that has a warning in the manual that
 says, essentially, git gc may corrupt your repository if you do this.

This is the type of setup I often use.  I have a central set of git
repos that are straight mirror-clones of project repositories, created
automatically and kept up-to-date via cron.  And any time I clone a
work repo, I use --reference.

Since I make sure I don't remove anything from the reference repo, I
don't have to worry about losing objects other repositories might be
using from the cache repo.  In case anyone is wondering, that's:
git clone --mirror $REPO /data/src/cache/$project.git
git --git-dir=/data/src/cache/$project.git config gc.auto 0

And then in crontab:
git --git-dir=/data/src/cache/$project.git fetch --quiet --all

With gc.auto disabled, and the only commands ever run being git fetch,
no objects are removed, even if a remote rewinds and throws away
commits.
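
Cloning a work repo against such a cache might then look like the sketch below. Everything here is a local stand-in: the upstream repository, the cache.git and work directory names, and the file:// URL are assumptions chosen so the example runs without network access.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

top=$(mktemp -d)
cd "$top"

# Stand-in for an upstream project repository.
git init -q upstream
( cd upstream
  git symbolic-ref HEAD refs/heads/master
  echo hello > README
  git add README
  git commit -q -m "initial" )

# The cache: a mirror clone with automatic gc switched off.
git clone -q --mirror "file://$top/upstream" cache.git
git --git-dir=cache.git config gc.auto 0

# A work clone that borrows objects from the cache instead of copying them.
git clone -q --reference "$top/cache.git" "file://$top/upstream" work

# The borrowing is recorded here; it points at the cache's object store.
cat work/.git/objects/info/alternates
```

The work clone depends on the cache for as long as that alternates entry exists, which is why nothing may ever be pruned from the cache.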

But this way means that the separate repos only share the past, from
central repository history, which means that you have to jump through
hoops if you want to be able to use git's handy
merging/cherry-picking/conflict tools when trying to rebase/port
patches between branches.  You're pretty much limited to exporting a
patch, changing to the new branch-repository, and applying the patch.

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] managing git disk space usage

2010-07-20 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 2. Clone the origin n times.  Use more disk space.  Live with it.  :-)
 
But each copy uses almost 0.36% of the formatted space on my 150GB
drive!
 
-Kevin



Re: [HACKERS] managing git disk space usage

2010-07-20 Thread Peter Eisentraut
On Tue, 2010-07-20 at 13:28 -0400, Aidan Van Dyk wrote:
 But *all* dependencies need to be proper in the build system, or you
 end up needing a git-clean-type-cleanup between branch switches,
 forcing a new configure run too, which takes too much time...

This realization, while true, doesn't really help, because we are
talking about maintaining 5+ year old back branches, where we are not
going to fiddle with the build system at this time.  Also, the switch
from 9.0 to 9.1 the other day showed everyone who cared to watch that
the dependencies are currently not correct for major version switches,
so this method will definitely not work at the moment.




Re: [HACKERS] managing git disk space usage

2010-07-20 Thread Peter Eisentraut
On Tue, 2010-07-20 at 13:04 -0400, Robert Haas wrote:
 2. Clone the origin n times.  Use more disk space.  Live with it.  :-)

Well, I plan to use cp -a to avoid cloning over the network n times, but
other than that that was my plan.  My .git directory currently takes 283
MB, so I think I can just about live with that.




Re: [HACKERS] managing git disk space usage

2010-07-20 Thread Andrew Dunstan



Robert Haas wrote:

 Tom and, I believe, also Andrew have expressed some concerns about the
 space that will be taken up by having multiple copies of the git
 repository on their systems.  While most users can probably get by
 with a single repository, committers will likely need one for each
 back-branch that they work with, and we have quite a few of those.

 After playing around with this a bit, I've come to the conclusion that
 there are a couple of possible options but they've all got some
 drawbacks.

 1. Clone the origin.  Then, clone the clone n times locally.  This
 uses hard links, so it saves disk space.  But, every time you want to
 pull, you first have to pull to the main clone, and then to each of
 the slave clones.  And same thing when you want to push.

You can have a cron job that does the first pull fairly frequently. It 
should be a fairly cheap operation unless the git protocol is dumber 
than I think.


The second pull is the equivalent of what we do now with cvs update.

Given that, you could push commits direct to the authoritative repo and 
wait for the cron job to catch up your local base clone.


I think that's the pattern I will probably try to follow.

cheers

andrew
