Re: [HACKERS] git: uh-oh

2010-09-07 Thread Max Bowsher
On 07/09/10 16:21, Magnus Hagander wrote:
 On Tue, Sep 7, 2010 at 17:07, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Tue, Sep 7, 2010 at 16:16, Tom Lane t...@sss.pgh.pa.us wrote:
 If you want to try, and it doesn't take much time, go for it.  I was
 just saying I wouldn't complain if we decide to live with it as-is.

 Ok. Do we have a way of identifying them - e.g. is it all the commits
 with a certain commit msg?

 Look for
This commit was manufactured by cvs2svn to create branch ...
 
 Ok, found a bunch of those (78 to be exact). And the issue with them
 is we want to change the commit author on them to be whomever made the
 first commit on the branch *after* that?

I would say you emphatically don't want to do that, because they can
contain more changes that were unrelated to that author.

The logic, as I understand it from Michael's explanation of cvs2git's
guts, is to flush out any pending "add to branch because of the implicit
appearance of a branch tag" operations when some other change is about
to occur on the destination branch. So unrelated stuff can get batched
together.
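
For anyone who wants to at least list the commits in question, here is a
minimal sketch of matching them by message. It assumes the log has already
been read into (hash, message) pairs; the function name and the sample data
are illustrative only, and none of this is cvs2git code.

```python
import re

# Marker text quoted in this thread. The email wraps the line before the
# branch name, so any whitespace (including a newline) is accepted there.
MANUFACTURED = re.compile(
    r"^This commit was manufactured by cvs2svn to create branch\s+'([^']+)'"
)


def find_manufactured(commits):
    """Return (hash, branch) for every commit whose message carries the marker.

    `commits` is an iterable of (hash, message) pairs. How they are read out
    of the repository is deliberately left out of this sketch.
    """
    found = []
    for sha, message in commits:
        match = MANUFACTURED.match(message)
        if match:
            found.append((sha, match.group(1)))
    return found


if __name__ == "__main__":
    sample = [
        ("b36518c", "This commit was manufactured by cvs2svn to create branch\n"
                    "'REL8_4_STABLE'.\n\nCherrypick from master ..."),
        ("446b749", "Add FAQ_hungarian.html to 8.0.X branch."),
    ]
    print(find_manufactured(sample))
```

Note that this only finds the commits. Because of the batching described
above, it says nothing about which later commit (if any) each file in a
manufactured commit really belongs to.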

Personally, the idea of trying to use git-filter-branch to make what
cvs2git currently gives you more sensible scares me silly. I think the
approach should be to use it as is, or improve cvs2git.


Another glitch that might be worth fixing before you convert is the way
that cvs2git says "This commit was manufactured by cvs2svn to create
branch", when it actually means "manufactured to incrementally create
the branch state as it appears in CVS" - i.e. many of these commits
actually update an existing branch. Just as soon as I can figure out how
to cleanly fit that into cvs2git's structure, I want it to change the
word "create" to "update" in most of those commits.


Max.



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] git: uh-oh

2010-09-07 Thread Max Bowsher
On 07/09/10 16:47, Tom Lane wrote:
 Max Bowsher m...@f2s.com writes:
 Personally, the idea of trying to use git-filter-branch to make what
 cvs2git currently gives you more sensible scares me silly.
 
 I'm not excited about it either --- but if Magnus wants to experiment,
 no harm trying.
 
 Another glitch that might be worth fixing before you convert is the way
 that cvs2git says This commit was manufactured by cvs2svn to create
 branch, when it actually means manufactured to incrementally create
 the branch state as it appears in CVS - i.e. many of these commits
 actually update an existing branch. Just as soon as I can figure out how
 to cleanly fit that into cvs2git's structure, I want it to change the
 word create to update in most of those commits.
 
 I thought all of those message texts were taken from the configuration
 file.

Yes, but currently these two cases both reference the same entry in the
configuration file.

Max.





Re: [HACKERS] git: uh-oh

2010-09-07 Thread Max Bowsher
On 07/09/10 18:16, Tom Lane wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 Tom Lane wrote:
 What I'd like is for those commits to vanish from the git log entirely.
 
 It seems to me that in your case such commits could be grafted over:
 
 *---*---*---*
  \
   A---B---C---D
 
 E.g., if C is one of these special manufactured commits, then you
 could use git grafts to change the parent of D from C to B, then
 bake in the change with git filter-branch.  This would make C
 inaccessible and subject to garbage collection.
 
 Hmm, I see.  This depends on the fact that git commits reference
 filesystem states and not deltas, correct?  So it does actually make
 sense to just delete that commit from the history.  I was concerned
 that it'd invalidate later commits, but I guess it doesn't.

It wouldn't - except for the fact that cvs2git batches such manufactured
commits such that there is no guarantee that a single manufactured
commit pertains only to files in the commit immediately afterwards. For
example, consider the it.po file in the commit referenced in this thread
yesterday:

commit b36518cb880bb236496ec3e505ede4001ce56157
Author: PostgreSQL Daemon webmas...@postgresql.org
Date:   Sun Feb 28 21:32:02 2010 +

    This commit was manufactured by cvs2svn to create branch
    'REL8_4_STABLE'.

    Cherrypick from master 2010-02-28 21:31:57 UTC Tom Lane
    t...@sss.pgh.pa.us 'Fix up memory management problems in contrib/xml2.':
        contrib/xml2/expected/xml2.out
        contrib/xml2/sql/xml2.sql
        src/bin/pg_dump/po/it.po
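
To make the consequence concrete, here is a toy model of snapshot-based
commits. It is a sketch only: the trees, the content ids and the file
contrib/xml2/xpath.c are made up for illustration, and no git machinery is
involved. It shows that grafting the manufactured commit away leaves the
later snapshot intact, but moves the unrelated it.po addition into the diff
of the following commit.

```python
def changed_paths(parent_tree, tree):
    """Paths added or modified in `tree` relative to `parent_tree`.

    Deletions are ignored to keep the sketch short.
    """
    return sorted(p for p, blob in tree.items() if parent_tree.get(p) != blob)


# Every commit is a full snapshot (path -> content id), not a delta.
B = {"contrib/xml2/xpath.c": "old"}

# C: the manufactured commit, which also drags in the unrelated it.po.
C = {
    **B,
    "contrib/xml2/sql/xml2.sql": "new",
    "src/bin/pg_dump/po/it.po": "new",
}

# D: the real commit that follows and touches only xpath.c.
D = {**C, "contrib/xml2/xpath.c": "fixed"}

before = changed_paths(C, D)  # D's diff while its parent is still C
after = changed_paths(B, D)   # D's diff once D is grafted onto B

if __name__ == "__main__":
    print(before)
    print(after)
```

D's tree is the same in both cases, so nothing later is invalidated. What
changes is the story the log tells: after the graft, D appears to have
added it.po itself.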


Max.





Re: [HACKERS] git: uh-oh

2010-09-07 Thread Max Bowsher
On 07/09/10 21:25, Magnus Hagander wrote:
 On Tue, Sep 7, 2010 at 22:06, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Sep 7, 2010 at 10:08 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Sep 7, 2010 at 9:56 AM, Magnus Hagander mag...@hagander.net wrote:
 You're saying you don't require a fix on the latest issue here? Or
 should we spend some time trying to figure out if we can fix it with
 git-filter-branch?

 I think that the latest issue here is the issue of how files get
 added to branches, which we discussed before with pretty much the same
 set of conclusions.  I'm not wild about the way that's getting
 converted, but I'm not sure I care enough about it to argue with Tom.
 However, I want to convince myself that the deletes we've done over
 the years have been properly handled.  I need to look at Max's latest
 conversion and I'll look at yours as well.

 Magnus -

 I just looked at your latest conversion (based on what Max did) and it
 looks a lot better.  I think, though, that we should re-remove these
 branches:

  origin/unlabeled-1.44.2
  origin/unlabeled-1.51.2
  origin/unlabeled-1.59.2
  origin/unlabeled-1.87.2
  origin/unlabeled-1.90.2
 
 Oh yeah, I did the push before I ran that step of my script. Oops, sorry.
 

Speaking of which, could you update the public copy of all the
conversion documentation / machinery?

Thanks,
Max.





Re: [HACKERS] git: uh-oh

2010-09-07 Thread Max Bowsher
On 07/09/10 23:15, Robert Haas wrote:
 On Tue, Sep 7, 2010 at 5:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, why is this commit shown as being a predecessor of refs/tags/REL8_4_4
 and not refs/tags/REL8_4_3?  That's nothing to do with it.po, perhaps,
 but it sure looks wrong.  (Magnus, did you check against the 8.4.3 tarball?)
 
 I think this is another result of the same basic problem.  Since
 cvs2git thinks it.po was added to REL8_4_STABLE on 2010-02-28 rather
 than 2010-05-13, the REL8_4_STABLE version that existed on
 2010-03-12, when 8.4.3 was tagged, includes that file.  But cvs2git
 also knows that 8.4.3 does NOT include that file, so it picks the
 commit on the 8.4.3 branch that most closely matches the contents of
 the tag (namely, Marc's tag 8.4.3 commit) and then shoves a
 manufactured commit on top of that to make the contents of the 8.4.3
 tag match what actually got tagged.  But that manufactured commit is
 only there to make the tag contents match; it's not actually part of
 the branch.  If the conversion correctly made it.po get added on
 2010-05-13 rather than 2010-02-28 then Marc's tag 8.4.3 commit would
 match the tag contents exactly and no manufactured commit would be
 created.

Yes, this is the correct analysis.

 The effect of all of this is that if someone checks out a git commit
 between 2010-02-28 and 2010-05-13, it.po will be there, even though
 the file didn't exist on that CVS branch at that time.  Max's contention
 seems to be that this is a CVS problem rather than a cvs2git problem.
 Perhaps we can do something like cvs update -r REL8_4_STABLE -D
 SOME_INTERMEDIATE_DATE and see whether that file is there or not.

$ cvs co -r REL8_4_STABLE -D 2010-04-01 pgsql
...
$ ls -la pgsql/src/bin/pg_dump/po/it.po
-rw-r--r-- 1 maxb maxb 67871 2010-02-19 00:40 pgsql/src/bin/pg_dump/po/it.po

It's there.

Max.





Re: [HACKERS] git: uh-oh

2010-09-07 Thread Max Bowsher
On 07/09/10 23:20, Max Bowsher wrote:
 On 07/09/10 23:15, Robert Haas wrote:
 On Tue, Sep 7, 2010 at 5:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, why is this commit shown as being a predecessor of refs/tags/REL8_4_4
 and not refs/tags/REL8_4_3?  That's nothing to do with it.po, perhaps,
 but it sure looks wrong.  (Magnus, did you check against the 8.4.3 tarball?)

 I think this is another result of the same basic problem.  Since
 cvs2git thinks it.po was added to REL8_4_STABLE on 2010-02-28 rather
 than 2010-05-13, the REL8_4_STABLE version that existed on
 2010-03-12, when 8.4.3 was tagged, includes that file.  But cvs2git
 also knows that 8.4.3 does NOT include that file, so it picks the
 commit on the 8.4.3 branch that most closely matches the contents of
 the tag (namely, Marc's tag 8.4.3 commit) and then shoves a
 manufactured commit on top of that to make the contents of the 8.4.3
 tag match what actually got tagged.  But that manufactured commit is
 only there to make the tag contents match; it's not actually part of
 the branch.  If the conversion correctly made it.po get added on
 2010-05-13 rather than 2010-02-28 then Marc's tag 8.4.3 commit would
 match the tag contents exactly and no manufactured commit would be
 created.
 
 Yes, this is the correct analysis.
 
 The effect of all of this is that if someone checks out a git commit
 between 2010-02-28 and 2010-05-13, it.po will be there, even though
 the file didn't exist on that CVS branch at that time.  Max's contention
 seems to be that this is a CVS problem rather than a cvs2git problem.
 Perhaps we can do something like cvs update -r REL8_4_STABLE -D
 SOME_INTERMEDIATE_DATE and see whether that file is there or not.
 
 $ cvs co -r REL8_4_STABLE -D 2010-04-01 pgsql
 ...
 $ ls -la pgsql/src/bin/pg_dump/po/it.po
 -rw-r--r-- 1 maxb maxb 67871 2010-02-19 00:40 pgsql/src/bin/pg_dump/po/it.po
 
 It's there.


And, I've just tracked down that this bug was apparently fixed in CVS
1.11.18, released November 2004.

Max.





Re: [HACKERS] git: uh-oh

2010-09-07 Thread Max Bowsher
On 07/09/10 23:34, Tom Lane wrote:
 No doubt.  However, the facts on the ground are that it.po is provably
 not there in REL8_4_0, REL8_4_1, REL8_4_2, or REL8_4_3, and is there in
 REL8_4_4, and that no commit on the branch touched it before 2010-05-13
 (just before 8.4.4).  I will be interested to see the argument why
 cvs2git should consider the sanest translation of these facts to involve
 adding it.po to the branch after 8.4.2 and removing it again before
 8.4.3.

Only that cvs2git isn't quite so smart as to take tags present on a
branch as a guideline of when to introduce files that sprang into
existence on a branch at an uncertain point. It merely operates by
breaking cyclic dependencies between the various events it observes in
the CVS repository. In this case, the "create branch REL8_4_STABLE"
operation gets broken into several pieces to fit around the actual
revisions involved.

Hmm. Now I'm speculating vaguely about how the cycle breaker could be
convinced to break branch update commits into as many pieces as
possible, instead of as few.

Max.





Re: [HACKERS] git: uh-oh

2010-09-07 Thread Max Bowsher
On 08/09/10 00:47, Tom Lane wrote:
 Max Bowsher m...@f2s.com writes:
 And, I've just tracked down that this bug was apparently fixed in CVS
 1.11.18, released November 2004.
 
 Hrm, what bug exactly?  As far as I've gathered from the discussion,
 this is a fundamental design limitation of CVS, not a fixable bug.

The bug that CVS represented addition to a branch in a way which didn't
record when it occurred.

The way in which it was bludgeoned into the RCS file format was somewhat
hacky, but was a successful fix.

Max.





Re: [HACKERS] git: uh-oh

2010-09-07 Thread Max Bowsher
On 08/09/10 00:37, Robert Haas wrote:
 On Tue, Sep 7, 2010 at 7:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Well, as Max says downthread, cvs -r REL8_4_STABLE -D
 INTERMEDIATE_DATE apparently shows the file as being there, which is a
 fairly good argument for his position.

 I haven't tested, but if I understand what Max and Michael are saying
 about CVS, that operation would probably show the file as being there
 on *every* date between REL8_4_STABLE splitting off and the actual
 addition of it.po to the branch.  Because CVS isn't paying attention to
 the evidence of the intermediate tags not being there, either.

 Nonetheless, having the file pop into being and then disappear again
 between two observable points seems way too much like quantum physics
 for my taste.  I think it has to be possible for cvs2git to produce a
 less surprising translation.
 
 Well, if Max is correct that this bug is fixed in CVS 1.11.18 (I don't
 see it in the NEWS file) and that a checkout-by-date shows the file
 present during the time cvs2git claims it is present, then a less
 surprising translation wouldn't be a faithful representation of the
 contents of our CVS repository.

Correct. You'll have to decide whether you wish to represent your
current cvs repository, or attempt to doctor things to fix the insanity
CVS introduced.

 One thing I'm not quite clear on is
 how cvs2git thinks CVS should look given what we actually did vs.
 how it actually does look,

CVS from 1.11.18 kludges things to work right by inserting a file
revision on the branch in the dead (deleted) state with the same date as
the revision it branched from. This marks identifiably that it didn't
exist on the branch to start with. Then, a non-dead revision marks the
true addition of the file to the branch. I'm attaching a sample RCS file.
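
As a rough illustration of how a reader of the RCS file could use that
marker, here is a sketch that classifies a branch's revision list. The
function name and the returned labels are my own invention, and it assumes
the revisions have already been parsed into (number, date, state) tuples in
branch order; it is not taken from cvs2git.

```python
def branch_addition(revisions):
    """Classify when a file really appeared on a branch.

    `revisions` is the branch's revision list in order, as
    (number, date, state) tuples. Returns a (label, date) pair:

    - ("from-branch-point", None): no leading dead revision, so the file
      is treated as present from the moment the branch sprouted.
    - ("added", date): a leading dead marker followed by a live revision;
      `date` is that live revision's date.
    - ("never", None): only dead revisions, or no revisions at all.
    """
    if not revisions:
        return ("never", None)
    if revisions[0][2] != "dead":
        return ("from-branch-point", None)
    for _number, date, state in revisions[1:]:
        if state != "dead":
            return ("added", date)
    return ("never", None)


if __name__ == "__main__":
    # Branch b1 from the attached sample file.
    sample = [
        ("1.1.2.1", "2010.09.08.00.33.01", "dead"),
        ("1.1.2.2", "2010.09.08.00.33.17", "Exp"),
    ]
    print(branch_addition(sample))
```

Files added to a branch by an older CVS have no such leading dead revision,
which is exactly why they look as if they had been on the branch all along.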

 but if our CVS repository is busted maybe
 we should be looking to fix that rather than complaining about
 cvs2git.

A possibility. We'd need a tool which would insert an extra node into
the history graph of an RCS file. Unless we can bodge it by using
x.y.z.0 as a revision id, it would also need to renumber all the
revisions on the branch. Still, cvs2git has code to parse the RCS
format, so it's probably achievable without too much work.

Max.

head    1.1;
access;
symbols
    b1:1.1.0.2;
locks; strict;
comment @# @;


1.1
date    2010.09.08.00.33.01;    author maxb;    state Exp;
branches
    1.1.2.1;
next    ;
commitid    lO0BL09PCcYPwINu;

1.1.2.1
date    2010.09.08.00.33.01;    author maxb;    state dead;
branches;
next    1.1.2.2;
commitid    FuoVc28H18LVwINu;

1.1.2.2
date    2010.09.08.00.33.17;    author maxb;    state Exp;
branches;
next    ;
commitid    FuoVc28H18LVwINu;


desc
@@


1.1
log
@Foo2.
@
text
@@


1.1.2.1
log
@file b was added on branch b1 on 2010-09-08 00:33:17 +
@
text
@@


1.1.2.2
log
@Merge.
@
text
@@






Re: [HACKERS] git: uh-oh

2010-09-05 Thread Max Bowsher
On 05/09/10 03:55, Robert Haas wrote:
 On Sat, Sep 4, 2010 at 9:17 AM, Max Bowsher m...@f2s.com wrote:
 Can you post the repo you ended up with somewhere?

 Well, it's a Bazaar repository at the moment :-)

 But, I'll re-run it targeting git, and push it somewhere. github?
 anywhere better?
 
 No, that's fine.
 
 I think we should start a git repository somewhere containing the
 precise conversion recipe - i.e.:

  * cvs2git options file
  * cvs2git invocation command line
  * all scripts that massage the CVS repository before conversion, or the
 Git repository afterwards
 
 Yeah, that would be great.


For both, see http://github.com/maxb


Max.





Re: [HACKERS] git: uh-oh

2010-09-04 Thread Max Bowsher
On 03/09/10 03:34, Max Bowsher wrote:
 Robert Haas wrote:
 On Thu, Sep 2, 2010 at 8:13 AM, Michael Haggerty mhag...@alum.mit.edu 
 wrote:
 What weirdness, exactly, are you discussing now?  I've lost track of
 which problem(s) are still unresolved.
 Lots of commits that look like this:

 commit c50da22b6050e0bdd5e2ef97541d91aa1d2e63fb
 Author: PostgreSQL Daemon webmas...@postgresql.org
 Date:   Sat Dec 2 08:36:42 2006 +

 This commit was manufactured by cvs2svn to create branch 
 'REL8_2_STABLE'.

 Sprout from master 2006-12-02 08:36:41 UTC PostgreSQL Daemon
 webmas...@postgresql.org ''
 Delete:
 src/backend/parser/gram.c
 src/interfaces/ecpg/preproc/pgc.c
 src/interfaces/ecpg/preproc/preproc.c

 I have a test conversion running (well, a test conversion to bzr,
 because I like qbzr so much more than gitk) and will report back.

OK, so I ran a conversion, having first run the following:

for r in 2.89 2.90 2.91; do rcs -x,v -sdead:$r \
  ./cvsroot/pgsql/src/backend/parser/Attic/gram.c ; done
for r in 1.3 1.4 1.5 1.6; do rcs -x,v -sdead:$r \
  ./cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/pgc.c ; done
for r in 1.7 1.8 1.9 1.10 1.11 1.12; do rcs -x,v -sdead:$r \
  ./cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/preproc.c ; done

(in essence pretend that these revisions deleted the file instead of
changing it)

The conversion looks nicer, but I notice we have a similar issue to
those three with src/interfaces/ecpg/preproc/y.tab.h in release
tags/branches up to and including 7.4.

So, I'm going to try running another attempt additionally doing:

for r in 1.3 1.4 1.5 1.6 1.7 1.8; do rcs -x,v -sdead:$r \
  ./cvsroot/pgsql/src/interfaces/ecpg/preproc/Attic/y.tab.h ; done

... churn churn churn ...

and the result is that things are looking pretty clean :-)

You now need to decide if you can live with throwing away a little bit
of history for those four files to get a cleaner conversion.


Max.






Re: [HACKERS] git: uh-oh

2010-09-04 Thread Max Bowsher
On 04/09/10 12:24, Robert Haas wrote:
 On Sat, Sep 4, 2010 at 3:22 AM, Max Bowsher m...@f2s.com wrote:
 and the result is that things are looking pretty clean :-)
 
 Hey, that's great.  But I wonder why Magnus got a different result.

This is the first time I've posted these incantations for excising the
unwanted history, so he would not have been using them.

 Can you post the repo you ended up with somewhere?

Well, it's a Bazaar repository at the moment :-)

But, I'll re-run it targeting git, and push it somewhere. github?
anywhere better?

I think we should start a git repository somewhere containing the
precise conversion recipe - i.e.:

 * cvs2git options file
 * cvs2git invocation command line
 * all scripts that massage the CVS repository before conversion, or the
Git repository afterwards


Max.






Re: [HACKERS] git: uh-oh

2010-09-02 Thread Max Bowsher
On 02/09/10 14:40, Michael Haggerty wrote:
 Robert Haas wrote:
 On Thu, Sep 2, 2010 at 8:13 AM, Michael Haggerty mhag...@alum.mit.edu 
 wrote:
 What weirdness, exactly, are you discussing now?  I've lost track of
 which problem(s) are still unresolved.

 Lots of commits that look like this:

 commit c50da22b6050e0bdd5e2ef97541d91aa1d2e63fb
 Author: PostgreSQL Daemon webmas...@postgresql.org
 Date:   Sat Dec 2 08:36:42 2006 +

 This commit was manufactured by cvs2svn to create branch 'REL8_2_STABLE'.

 Sprout from master 2006-12-02 08:36:41 UTC PostgreSQL Daemon
 webmas...@postgresql.org ''
 Delete:
 src/backend/parser/gram.c
 src/interfaces/ecpg/preproc/pgc.c
 src/interfaces/ecpg/preproc/preproc.c
 
 I addressed that problem in this email:
 
 http://archives.postgresql.org/pgsql-hackers/2010-08/msg01819.php
 
 Summary: it is caused by a known weakness in cvs2svn's
 branch-parent-choosing code that would be difficult to solve.
 
 But it just occurred to me--the script contrib/git-move-refs.py is
 supposed to fix problems like this.  Have you run this script against
 your git repository?  (Caveat: I am not very familiar with the script,
 which was contributed by a user.  Please check the results carefully and
 let us know how it works for you.)


Moving refs can't possibly splice out branch creation commits.

Max.






Re: [HACKERS] git: uh-oh

2010-09-02 Thread Max Bowsher
On 02/09/10 16:44, Michael Haggerty wrote:
 Max Bowsher wrote:
 On 02/09/10 14:40, Michael Haggerty wrote:
 Robert Haas wrote:
 On Thu, Sep 2, 2010 at 8:13 AM, Michael Haggerty mhag...@alum.mit.edu 
 wrote:
 What weirdness, exactly, are you discussing now?  I've lost track of
 which problem(s) are still unresolved.
 Lots of commits that look like this:

 commit c50da22b6050e0bdd5e2ef97541d91aa1d2e63fb
 Author: PostgreSQL Daemon webmas...@postgresql.org
 Date:   Sat Dec 2 08:36:42 2006 +

 This commit was manufactured by cvs2svn to create branch 
 'REL8_2_STABLE'.

 Sprout from master 2006-12-02 08:36:41 UTC PostgreSQL Daemon
 webmas...@postgresql.org ''
 Delete:
 src/backend/parser/gram.c
 src/interfaces/ecpg/preproc/pgc.c
 src/interfaces/ecpg/preproc/preproc.c
 I addressed that problem in this email:

 http://archives.postgresql.org/pgsql-hackers/2010-08/msg01819.php

 Summary: it is caused by a known weakness in cvs2svn's
 branch-parent-choosing code that would be difficult to solve.

 But it just occurred to me--the script contrib/git-move-refs.py is
 supposed to fix problems like this.  Have you run this script against
 your git repository?  (Caveat: I am not very familiar with the script,
 which was contributed by a user.  Please check the results carefully and
 let us know how it works for you.)

 Moving refs can't possibly splice out branch creation commits.
 
 Max,
 
 My understanding was that the problem is not that the branches are
 created, but that they are created from a non-optimal starting point,
 making it necessary for each of them to be doctored using a fixup
 commit.  Since the tree contents following the first branch commit is
 identical to the tree contents on trunk one commit later, moving the
 branch tags will give the same branch contents without the need for
 branch fixup commits, and the old (branch-fixed) commits, no longer
 being referenced, will be garbage collected at the next git gc.  Why
 don't you think this will work?

You can't move a branchpoint after there are commits on the branch. I'm
pretty certain there will be commits on the REL8_2_STABLE branch :-)

Also, IIUC, this isn't the "one commit later" version of the problem -
it's a case where, for a period of *years*, the RCS files for these three
files claim they exist on trunk but not on any of the branches that
branched off trunk during this period.

I am exploring the option of setting the unwanted revisions of the files
to the dead state (removing them outright doesn't work, since they have
a branch from one of the revisions in question.)

I have a test conversion running (well, a test conversion to bzr,
because I like qbzr so much more than gitk) and will report back.

Max.





Re: [HACKERS] git: uh-oh

2010-08-25 Thread Max Bowsher
On 25/08/10 09:18, Magnus Hagander wrote:
 On Wed, Aug 25, 2010 at 07:11, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:

 2. Any non-ASCII characters in, for example, contributor's names show
 up differently in the two repos.  Generally, the original repo is OK
 and the new repo is garbled; although I found one very old example
 that went the other way.

 What it looks like to me is that a Latin1-UTF8 conversion has been
 applied to the log text.  Which might be a good idea if it all *was*
 Latin1, but a fair-sized percentage isn't.  Applying this conversion to
 UTF8 entries results in garbage, of course.  Even if this could be done
 reliably, I think this counts as editorializing on the historical
 record, and should be switched off if possible.
 
 I think the problem is that we have a mix of them :( git requires it to be 
 utf8.
 
 cvs2git is configured to try, in order, latin1, utf8 and ascii, and
 use whichever first returns correct result. In this case it seems it
 does return saying things are right, because the result is valid utf8
 - just not the utf8 we expected.
 
 I can give it a try the other way around - trying utf8 *before*
 latin1, to see if that makes it better - utf8 tends to be more strict.

*Every* byte sequence is valid latin1, therefore if you try latin1,
utf8, ascii in that order, latin1 will always be used.

You most likely want utf8, latin1 (no point also including ascii since
it's a strict subset of latin1).
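
A small sketch of the effect, using plain Python codecs rather than
cvs2git's own decoder (the helper name and the sample name "Jörg" are just
for illustration):

```python
def decode_with_fallbacks(raw, encodings):
    """Decode `raw` with the first encoding in `encodings` that accepts it.

    Returns (text, encoding_used). Raises ValueError if none of them works.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no encoding in %r could decode the input" % (encodings,))


if __name__ == "__main__":
    utf8_bytes = "Jörg".encode("utf-8")
    latin1_bytes = "Jörg".encode("latin-1")

    # latin1 first: it accepts any byte string, so utf8 is never tried
    # and the UTF-8 input comes out garbled.
    print(decode_with_fallbacks(utf8_bytes, ["latin1", "utf8", "ascii"]))

    # utf8 first: UTF-8 input decodes cleanly, and Latin-1 input is
    # rejected by the utf8 step and falls through to latin1.
    print(decode_with_fallbacks(utf8_bytes, ["utf8", "latin1"]))
    print(decode_with_fallbacks(latin1_bytes, ["utf8", "latin1"]))
```

Trying utf8 first is still a heuristic: a Latin-1 byte sequence that happens
to be valid UTF-8 would be misread, but that is rare in real names and log
text.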

 There are also a number of commits that differ in order between the
 two repos, and an even larger number where commits are duplicated or
 merged in one repository relative to the other.

 I suspect that this is an artifact of the converter trying to merge
 nearby commits into one commit, which it more or less *has* to do for
 sanity since CVS commits aren't atomic.  I don't have a problem with
 the concept, but I notice cases where the converted commit has a
 timestamp some minutes later than what the cvs2cl output claims.
 I suspect this is what the converter was using as a cutoff time.
 Would it be possible to make sure that the converted commit is always
 timestamped with the latest individual file update timestamp from the
 included CVS commits?
 
 I can't comment on this part - Michael or Max?

cvs2git will try to use the timestamps from the commits, but sometimes
the ordering of how revisions and tags relate to each other will
actually disagree with the timestamps. In such a case, cvs2git nudges
commit timestamps forward in time, to force the defined temporal
ordering into consistency with the topological ordering of events.

In other words, no, you can't make cvs2git *always* use the timestamp
from a CVS commit, but it should have a good reason whenever it
deviates from that.
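
Roughly, the nudging amounts to something like the sketch below. This is
not the cvs2git code: it assumes the commits are already in topological
order, that timestamps are integer seconds, and that each commit must be
strictly later than the one before it. The real rule may differ in detail.

```python
def nudge_timestamps(timestamps):
    """Return timestamps made strictly increasing along the given order.

    `timestamps` are integer seconds, listed in topological order. A value
    that is not later than its predecessor is moved to one second after it;
    values that are already in order are left untouched.
    """
    result = []
    previous = None
    for ts in timestamps:
        if previous is not None and ts <= previous:
            ts = previous + 1
        result.append(ts)
        previous = ts
    return result


if __name__ == "__main__":
    # The second and third commits claim to be older than the first.
    print(nudge_timestamps([100, 90, 95, 200]))
```

One out-of-order commit can therefore drag several later ones forward with
it, which is how a converted commit can end up minutes away from the time
cvs2cl reports.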

Max.






Re: [HACKERS] git: uh-oh

2010-08-25 Thread Max Bowsher
On 25/08/10 01:15, Robert Haas wrote:
 On Fri, Aug 20, 2010 at 1:56 PM, Max Bowsher m...@f2s.com wrote:
 My guess at this point is that there may be a (very old?) version of cvs
 which, when adding a file to a branch, actually misrecorded the file as
 having existed on the branch from the moment it was first added to trunk
 - this would explain this anomaly.
 
 I think this is what is happening, except I'm unable to account for it
 by the age of the CVS version we're running.  The machine the CVS
 repo is running on is running 1.11.17-FreeBSD (client/server).  I
 don't know how long it's been that way, but there are examples of this
 in the relatively recent past - like July 2nd of this year.  I am 100%
 positive that what I did was 'cvs add' one new file, 'cvs delete' one
 old file, modify a few other things, and commit the whole deal.  But
 in the git conversion there are two commits, one of which adds a copy
 of the file as it exists in HEAD and the other of which contains the
 balance of the changes.  Every recent manufactured commit is of this
 same form: it immediately precedes the commit of which (in my view) it
 should be considered a part.
 
 Looking back a bit further in history, there is some stranger stuff.
 
 commit ec0274633871c43da670fa90d0ac4fd7090639f2
 Author: PostgreSQL Daemon webmas...@postgresql.org
 Date:   Mon Jun 6 16:30:43 2005 +
 
 This commit was manufactured by cvs2svn to create branch 'REL8_0_STABLE'.
 
 Cherrypick from master 2005-06-06 16:30:42 UTC Bruce Momjian 
 br...@momjian.
 doc/src/FAQ/FAQ_hungarian.html
 
 And then, much later, the following completely empty commit:
 
 commit 446b749c2eaeff3c0611d33bc12b3df28e2cf8fa
 Author: Bruce Momjian br...@momjian.us
 Date:   Tue Oct 4 14:17:44 2005 +
 
 Add FAQ_hungarian.html to 8.0.X branch.
 
 What really happened is:
 
 http://archives.postgresql.org/pgsql-committers/2005-10/msg00044.php
 
 So that's pretty much the same thing, except the time lag between the
 two commits that should be married is much larger.

Yup, exact same problem, the file was added to the branch, and CVS
erroneously recorded that it *had existed on the branch* from the moment
it was created on trunk.

 The odder cases are the ones involving deletion.  There are a couple
 of branches/tags that, or so I'm guessing, are only present for a
 subset of the files in the repository: ecpg_big_bison, creation,
 Release-1-6-0, MANUAL_1_0, REL2_0B, and SUPPORT.  I'm wondering if we
 shouldn't just nuke those, or at least nuke them from the copy of the
 repository upon which we are running the conversion.

Well, I'd caution against being too revisionist with your history, but
if you're convinced you want to drop certain tags/branches, you can
configure cvs2git to ignore them (see the symbol strategy rules part of
the options file).


Max.





Re: [HACKERS] git: uh-oh

2010-08-25 Thread Max Bowsher
On 25/08/10 04:21, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:

 What seemed more likely to be artifacts were
 these:
 
   remotes/origin/unlabeled-1.44.2
   remotes/origin/unlabeled-1.51.2
   remotes/origin/unlabeled-1.59.2
   remotes/origin/unlabeled-1.87.2
   remotes/origin/unlabeled-1.90.2
 
 Any idea where those came from?

These occur when there are numbered revisions in one or more RCS files,
which lack a branch tag to identify their name. The most likely cause is
deleting a branch after having committed to it.
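
A sketch of how such branches can be spotted from one file's revision
numbers and symbol table. The helper names and the sample symbol are
hypothetical, and this is not how cvs2git itself is structured. It relies
on the usual CVS numbering: a revision such as 1.44.2.1 lives on branch
1.44.2, and a branch tag is stored with a "magic" zero as 1.44.0.2.

```python
def branch_of(revision):
    """Branch number a revision lives on, or None for trunk revisions."""
    parts = revision.split(".")
    return ".".join(parts[:-1]) if len(parts) > 2 else None


def symbol_branch(number):
    """Branch number a symbol names, or None if the symbol is a plain tag."""
    parts = number.split(".")
    if len(parts) >= 4 and len(parts) % 2 == 0 and parts[-2] == "0":
        # Magic branch number: drop the zero, e.g. 1.44.0.2 -> 1.44.2.
        return ".".join(parts[:-2] + parts[-1:])
    if len(parts) % 2 == 1:
        # Odd number of components is already a branch number (e.g. 1.1.1).
        return number
    return None


def unlabeled_branches(symbols, revisions):
    """Branches that carry revisions but have no symbol naming them."""
    labeled = {symbol_branch(number) for number in symbols.values()}
    used = {branch_of(revision) for revision in revisions}
    return sorted(b for b in used if b is not None and b not in labeled)


if __name__ == "__main__":
    symbols = {"SOME_BRANCH": "1.44.0.4"}  # made-up symbol table
    revisions = ["1.44", "1.44.2.1", "1.44.4.1"]
    print(unlabeled_branches(symbols, revisions))
```

With the made-up input above, branch 1.44.2 has a revision but no name,
which is the situation cvs2git reports as unlabeled-1.44.2.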

Indeed, all of these five correspond to a commit with the message:

   Make the world at least somewhat safe for zero-column tables, and
   remove the special case in ALTER DROP COLUMN to prohibit dropping a
   table's last column.

I have an idea you can fix this by running the following on your live
CVS repository:

cvs rtag -D "2002-09-23 20:43:41 UTC" zero-column-tables pgsql
cvs rtag -F -B -r 1.44.2 zero-column-tables \
  pgsql/src/backend/commands/tablecmds.c
cvs rtag -F -B -r 1.90.2 zero-column-tables \
  pgsql/src/backend/parser/parse_target.c
cvs rtag -F -B -r 1.90.2 zero-column-tables \
  pgsql/src/backend/access/common/tupdesc.c
cvs rtag -F -B -r 1.59.2 zero-column-tables \
  pgsql/src/backend/executor/execTuples.c
cvs rtag -F -B -r 1.87.2 zero-column-tables \
  pgsql/src/backend/executor/nodeAgg.c
cvs rtag -F -B -r 1.51.2 zero-column-tables \
  pgsql/src/test/regress/expected/alter_table.out

(Untested as yet, I have a test conversion running.)

 This series of commits also seems pretty messed up:
 http://archives.postgresql.org/pgsql-committers/2007-04/msg00222.php
 http://archives.postgresql.org/pgsql-committers/2007-04/msg00223.php
 
 You can find out about the reasons for that in this *other* discussion
 of conversion to git:
 http://archives.postgresql.org/pgsql-hackers/2007-04/msg00670.php
 particularly here:
 http://archives.postgresql.org/pgsql-hackers/2007-04/msg00685.php
 
 ... pretty crazy.  I think we should try to do something to clean this up,
 perhaps by doctoring the file on the CVS side.
 
 On the whole I feel that you're moving the goalposts.  AFAIR the agreed
 criteria for an acceptable SCM conversion were that it reproduce the
 historical states of our tree at least at all the release tags, and that
 it provide a close approximation of the CVS commit logs.  I think that
 manufactured commits that correspond to CVS's artifacts might be a bit
 ugly, but trying to get rid of them sounds way too much like putting
 lipstick on a pig.  And if it means removing real, if ugly, history,
 I'm not sure I'm in favor of it at all.

I'm mostly with Tom on this one. Basically you are now discovering what
a mess CVS has made. The mess has always existed, but only now do you
have the tools to notice this.

Your options are:

1) Accept that.

2) Retroactively modify history to say that those generated files NEVER
existed in the repository.

3) Retroactively modify history to say that those generated files are
actually included in all those release tags.


Max.





Re: [HACKERS] git: uh-oh

2010-08-25 Thread Max Bowsher
On 25/08/10 12:36, Heikki Linnakangas wrote:
 On 25/08/10 14:03, Max Bowsher wrote:
 On 25/08/10 09:18, Magnus Hagander wrote:
 On Wed, Aug 25, 2010 at 07:11, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 There are also a number of commits that differ in order between the
 two repos, and an even larger number where commits are duplicated or
 merged in one repository relative to the other.

 I suspect that this is an artifact of the converter trying to merge
 nearby commits into one commit, which it more or less *has* to do for
 sanity since CVS commits aren't atomic.  I don't have a problem with
 the concept, but I notice cases where the converted commit has a
 timestamp some minutes later than what the cvs2cl output claims.
 I suspect this is what the converter was using as a cutoff time.
 Would it be possible to make sure that the converted commit is always
 timestamped with the latest individual file update timestamp from the
 included CVS commits?

 I can't comment on this part - Michael or Max?

 cvs2git will try to use the timestamps from the commits, but sometimes
 the ordering of how revisions and tags relate to each other will
 actually disagree with the timestamps. In such a case, cvs2git nudges
 commit timestamps forward in time, to force the defined temporal
 ordering into consistency with the topological ordering of events.
 
 Hmm, why does it force that consistency? AFAIK git is happy with a
 commit with an older timestamp following a commit with a newer timestamp.

Um. Good point. Why do we enforce that?

Michael, do you think anything would break if we just removed the
"ensure monotonicity" code?
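
For readers following along, the nudging described above can be sketched
roughly as below. This is only an illustration, not cvs2git's actual code:
the function name nudge_forward and the one-second step are assumptions.

```python
def nudge_forward(timestamps, step=1):
    """Adjust timestamps (given in topological order) so they strictly increase.

    Hypothetical helper: any timestamp that is not later than its
    predecessor is moved to predecessor + step seconds.
    """
    adjusted = []
    for ts in timestamps:
        if adjusted and ts <= adjusted[-1]:
            # Topological order disagrees with the recorded time: nudge forward.
            ts = adjusted[-1] + step
        adjusted.append(ts)
    return adjusted


# Second and third commits claim to be older than the first one.
print(nudge_forward([100, 90, 95, 200]))
```

Removing the monotonicity step would amount to passing the recorded
timestamps through unchanged, which git itself appears to tolerate.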

Max.





Re: [HACKERS] git: uh-oh

2010-08-25 Thread Max Bowsher
On 25/08/10 16:43, Tom Lane wrote:
 Max Bowsher m...@f2s.com writes:
 On 25/08/10 04:21, Tom Lane wrote:
 What seemed more likely to be artifacts were these:

 remotes/origin/unlabeled-1.44.2
 remotes/origin/unlabeled-1.51.2
 remotes/origin/unlabeled-1.59.2
 remotes/origin/unlabeled-1.87.2
 remotes/origin/unlabeled-1.90.2

 Any idea where those came from?
 
 These occur when there are numbered revisions in one or more RCS files,
 which lack a branch tag to identify their name. The most likely cause is
 deleting a branch after having committed to it.
 
 Indeed, all of these five correspond to a commit with the message:
 
Make the world at least somewhat safe for zero-column tables, and
remove the special case in ALTER DROP COLUMN to prohibit dropping a
table's last column.
 
 It seems likely to me that this has something to do with the aborted
 early branch for 7.4 development:
 http://archives.postgresql.org/pgsql-hackers/2002-09/msg01733.php
 
 If you read that thread you'll find an agreement that we'd continue
 development on HEAD and then do a mega back-patch into REL7_3_STABLE,
 but there is no mega back-patch later in the CVS logs.  What actually
 happened is explained here:
 http://archives.postgresql.org/pgsql-hackers/2002-11/msg00113.php
 
 The first actual commit into REL7_3_STABLE that cvs2cl finds is
 a mass delete pursuant to my comment there.  I am not sure exactly
 what Marc did to move the REL7_3_STABLE tag up to today, but I'll
 bet that the funny state of the 2002-09-28 commit has something to
 do with that, as it was the first commit into HEAD after Marc
 originally established the REL7_3_STABLE branch.
 
 Max's proposed fix seems to involve recognizing those extra versions
 as a legitimate branch, which I think we don't really want.  It'd be
 better if we deleted them.

In that case, either employ an ExcludeRegexpStrategyRule('unlabeled-.*')
in the cvs2git options file, or drop those refs after converting to git.
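
As a rough sketch of the second option, the refs to drop could be picked
out with the same pattern. The helper name refs_to_drop is hypothetical,
and the sketch assumes the pattern is meant to match the whole symbol
name; it only selects refs, it does not delete anything.

```python
import re

# Same pattern as suggested for the exclude rule above.
UNLABELED = re.compile(r"unlabeled-.*")


def refs_to_drop(refs):
    """Return the refs whose last path component matches the pattern."""
    return [r for r in refs if UNLABELED.fullmatch(r.rsplit("/", 1)[-1])]


refs = [
    "remotes/origin/REL7_3_STABLE",
    "remotes/origin/unlabeled-1.44.2",
    "remotes/origin/unlabeled-1.51.2",
    "remotes/origin/unlabeled-1.59.2",
    "remotes/origin/unlabeled-1.87.2",
    "remotes/origin/unlabeled-1.90.2",
]
for ref in refs_to_drop(refs):
    print(ref)
```

The resulting list could then be handed to whatever git command suits the
way the refs were imported.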

Max.





Re: [HACKERS] git: uh-oh

2010-08-20 Thread Max Bowsher
On 20/08/10 12:55, Magnus Hagander wrote:
 On Fri, Aug 20, 2010 at 13:50, Max Bowsher m...@f2s.com wrote:
 I have run cvs2git on the pgsql module of your CVS locally (is that the
 right thing to convert?) if you'd like to compare notes on specific
 parts of the conversion.
 
 Correct, that's the one. Can you put your repo up somewhere so we can
 look at it? Then I don't have to wait for my process to finish :D

Placed at http://red-bean.com/~maxb/pgsql-test.git - about 230MB -
sorry, dumb transport only, but hopefully that's not an issue for this
use case.

Max.





Re: [HACKERS] git: uh-oh

2010-08-20 Thread Max Bowsher
On 19/08/10 10:35, Magnus Hagander wrote:
 On Thu, Aug 19, 2010 at 07:00, Michael Haggerty mhag...@alum.mit.edu wrote:
 Magnus Hagander wrote:
 Is there some way to make cvs2git work this way, and just not bother
 even trying to create merge commits, or is that fundamentally
 impossible and we need to look at another tool?

 The good news: (I just reminded myself/realized that) Max Bowsher has
 already implemented pretty much exactly what you want in the cvs2svn
 trunk version, including noting in the commit messages any cherry-picks
 that are not reflected in the repo ancestry.
 
 Ah, that's great.

I should mention that the way it notes this is to reference commits by
their timestamp, author and initial line of log message - it does this
because cvs2git doesn't know the commit sha ever - that doesn't appear
until the stream is fed through git fast-import. I did briefly raise the
idea of augmenting the fast-import process to support substituting
fast-import marks to shas in log messages, but didn't get time to take
it beyond an idea.
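
To make the idea concrete, here is a minimal sketch of the substitution
step on its own. It assumes a marks file in the ":<mark> <sha>" form that
git fast-import can export, and a purely hypothetical "{mark:N}"
placeholder in log messages; neither cvs2git nor fast-import does this
today. Note that rewriting a message after import would change the
commit's own sha, which is why the substitution would have to happen
inside the import itself.

```python
import re


def parse_marks(text):
    """Parse lines of the form ':<mark> <sha>' into a dict."""
    marks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        mark, sha = line.split()
        marks[mark] = sha
    return marks


def substitute_marks(message, marks):
    """Replace hypothetical '{mark:N}' placeholders with commit ids.

    Placeholders whose mark is unknown are left untouched.
    """
    def repl(match):
        return marks.get(":" + match.group(1), match.group(0))

    return re.sub(r"\{mark:(\d+)\}", repl, message)


marks = parse_marks(":1 586b324c255a4316d72a5757566ebe6e630df47e\n")
print(substitute_marks("Cherrypick from {mark:1}", marks))
```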

 The bad news: It is broken [1].  But I don't think it should be too much
 work to fix it.
 
 That's less great of course, but it gives hope!
 
 Thanks for your continued efforts!

I've just made a commit to cvs2svn trunk. I hope this should now be fixed.

Max.






Re: [HACKERS] git: uh-oh

2010-08-20 Thread Max Bowsher
On 20/08/10 12:02, Magnus Hagander wrote:
 On Fri, Aug 20, 2010 at 09:49, Max Bowsher m...@f2s.com wrote:
 On 19/08/10 10:35, Magnus Hagander wrote:
 On Thu, Aug 19, 2010 at 07:00, Michael Haggerty mhag...@alum.mit.edu wrote:
 Magnus Hagander wrote:
 Is there some way to make cvs2git work this way, and just not bother
 even trying to create merge commits, or is that fundamentally
 impossible and we need to look at another tool?

 The good news: (I just reminded myself/realized that) Max Bowsher has
 already implemented pretty much exactly what you want in the cvs2svn
 trunk version, including noting in the commit messages any cherry-picks
 that are not reflected in the repo ancestry.

 Ah, that's great.

 I should mention that the way it notes this is to reference commits by
 their timestamp, author and initial line of log message - it does this
 because cvs2git doesn't know the commit sha ever - that doesn't appear
 until the stream is fed through git fast-import. I did briefly raise the
 idea of augmenting the fast-import process to support substituting
 fast-import marks to shas in log messages, but didn't get time to take
 it beyond an idea.

 The bad news: It is broken [1].  But I don't think it should be too much
 work to fix it.

 That's less great of course, but it gives hope!

 Thanks for your continued efforts!

 I've just made a commit to cvs2svn trunk. I hope this should now be fixed.
 
 
 Great. I will download and test the trunk version soon. I'm currently
 running a test using cvs2svn and then git-svn clone from that - but
 it's insanely slow (been going for 30+ hours now, and probably has
 8-10 hours more to go)...

Uh, you are? Why do it that way?

The thing I fixed pertains to the direct use of cvs2git, and will have
no effect on executions of cvs2svn.

I have run cvs2git on the pgsql module of your CVS locally (is that the
right thing to convert?) if you'd like to compare notes on specific
parts of the conversion.

Max.





Re: [HACKERS] git: uh-oh

2010-08-20 Thread Max Bowsher
On 20/08/10 21:08, Tom Lane wrote:
 Max Bowsher m...@f2s.com writes:
 On Fri, Aug 20, 2010 at 20:52, Tom Lane t...@sss.pgh.pa.us wrote:
 If I understand Max's statements correctly, there is an observable
 problem in the actual git history, not just the commit log entries:
 it will believe that a file added on a branch had been there since
 the branch forked off, not just as of the time it got added.
 
 Not since the branch forked off, but rather it will believe the file
 was on the branch from the moment it was added to trunk - the issue is
 actually in the cvs repository too - were you to ask CVS for the state
 of the branch at the relevant time, you'd see the extra file there too.
 
 Ah.  So Magnus' tests didn't catch that because he only looked at
 release tag times, and none of these event pairs occurred across a
 release.
 
 In the specific case we've been looking at so far, the file is only
 appearing less than a minute prematurely.
 
 Hmm.  I wonder whether the anomaly is dependent on the order in which
 the cvs add's and cvs commit's are done in the two different branches.
 
 I'm still confused as to why this results in such massive weirdness in
 the generated git history, though.  If it simply caused an extra commit
 that adds the new file slightly earlier than the commit we think of as
 adding the file, I wouldn't be complaining.

Isn't this what's happening?

 It's the fact that there
 are all those unrelated HEAD commits showing up in the log for a branch
 that bugs me.

You mean in the synthetic log message? Well, they're not exactly
unrelated - the overall effect is that the file was added on trunk,
'merged' into the branch, and then modified appropriately for that branch.

Max.





Re: [HACKERS] git: uh-oh

2010-08-20 Thread Max Bowsher
On 20/08/10 18:28, Tom Lane wrote:
 Max Bowsher m...@f2s.com writes:
 The history that cvs2svn is aiming to represent here is this:
 
 1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
 did *not* exist.
 
 2) Later, it was added to trunk.
 
 3) Then, someone retroactively added the branch tag to the file, marking
 it as included in the REL8_4_STABLE branch. [This corresponds to the git
 changeset that Robert is questioning]
 
 Uh, no.  We have never retroactively added anything to any branch.
 I don't know enough about the innards of CVS to know what its internal
 representation of this sort of thing is, but all that actually happened
 here was a cvs add and a cvs commit in REL8_4_STABLE long after the
 branch occurred.  We would like the git history to look like that too.

When I try reproducing these circumstances locally, that is executing a
cvs add and a cvs commit of a file on a branch where that file
already exists on trunk, CVS writes an internal representation different
to what I see in your repository for this file.

I'm at a loss to explain how your repository came to be this way, but I
can tell you that cvs2git is faithfully rendering what your repository
says into git.

Max.





Re: [HACKERS] git: uh-oh

2010-08-20 Thread Max Bowsher
On 20/08/10 14:36, Robert Haas wrote:
 On Fri, Aug 20, 2010 at 9:28 AM, Magnus Hagander mag...@hagander.net wrote:
 I believe Robert had some comments/questions as well :-)
 
 What Magnus means is that I'm a grumpy old developer who complains
 about everything.
 
 Anyway, what I noticed was that we're getting stuff like this:
 
 http://git.postgresql.org/gitweb?p=git-migration-test.git;a=commit;h=586b324c255a4316d72a5757566ebe6e630df47e
 
 commit 586b324c255a4316d72a5757566ebe6e630df47e
 Author: cvs2git 
 Date:   Thu May 13 16:39:49 2010 +
 
 This commit was manufactured by cvs2svn to create branch 'REL8_4_STABLE'.
 
 Cherrypick from master 2010-05-13 16:39:43 UTC adunstan 'Abandon the use of
 src/pl/plperl/plperl_opmask.pl
 
 We're not getting that on EVERY back-patch, just on some of them.  I
 really just want to turn this code to detect merges and cherry-picks
 OFF altogether, so that we get the original committer and commit
 message instead of the above.  It's much easier to read if you're
 browsing the back-branch history, and it's probably easier to match up
 commits across branches, too.


The history that cvs2svn is aiming to represent here is this:

1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
did *not* exist.

2) Later, it was added to trunk.

3) Then, someone retroactively added the branch tag to the file, marking
it as included in the REL8_4_STABLE branch. [This corresponds to the git
changeset that Robert is questioning]

4) Then, adunstan committed a change to it on the branch.


cvs2svn/git/etc seeks to faithfully represent what the result would have
been of doing a CVS checkout of the REL8_4_STABLE branch, at various
points in time, which is why this changeset is introduced.


I should also say that the autogenerated commit message is rather poor -
it should say 'update' not 'create' in this case. I'm actually looking
at fixing that.

Max.





Re: [HACKERS] git: uh-oh

2010-08-20 Thread Max Bowsher
On 20/08/10 18:30, Magnus Hagander wrote:
 On Fri, Aug 20, 2010 at 19:28, Tom Lane t...@sss.pgh.pa.us wrote:
 Max Bowsher m...@f2s.com writes:
 The history that cvs2svn is aiming to represent here is this:

 1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
 did *not* exist.

 2) Later, it was added to trunk.

 3) Then, someone retroactively added the branch tag to the file, marking
 it as included in the REL8_4_STABLE branch. [This corresponds to the git
 changeset that Robert is questioning]

 Uh, no.  We have never retroactively added anything to any branch.
 I don't know enough about the innards of CVS to know what its internal
 representation of this sort of thing is, but all that actually happened
 here was a cvs add and a cvs commit in REL8_4_STABLE long after the
 branch occurred.  We would like the git history to look like that too.
 
 Yeah.
 
 In fact, is the only thing that's wrong here the commit message?
 Because it's probably trivial to just patch that away.. Hmm, but I
 guess we'd like to have the actual commit message and not just another
 fixed one..

There is no actual commit message - the entire changeset is
synthesized by cvs2git to represent the addition of a branch tag to the
file - i.e. the logical equivalent of a cvs tag -b, which has no
commit message.

Max.







Re: [HACKERS] git: uh-oh

2010-08-20 Thread Max Bowsher
On 20/08/10 19:07, Magnus Hagander wrote:
 On Fri, Aug 20, 2010 at 19:56, Max Bowsher m...@f2s.com wrote:
 On 20/08/10 18:43, Magnus Hagander wrote:
 On Fri, Aug 20, 2010 at 19:41, Max Bowsher m...@f2s.com wrote:
 On 20/08/10 18:30, Magnus Hagander wrote:
 On Fri, Aug 20, 2010 at 19:28, Tom Lane t...@sss.pgh.pa.us wrote:
 Max Bowsher m...@f2s.com writes:
 The history that cvs2svn is aiming to represent here is this:

 1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
 did *not* exist.

 2) Later, it was added to trunk.

 3) Then, someone retroactively added the branch tag to the file, marking
 it as included in the REL8_4_STABLE branch. [This corresponds to the git
 changeset that Robert is questioning]

 Uh, no.  We have never retroactively added anything to any branch.
 I don't know enough about the innards of CVS to know what its internal
 representation of this sort of thing is, but all that actually happened
 here was a cvs add and a cvs commit in REL8_4_STABLE long after the
 branch occurred.  We would like the git history to look like that too.

 Yeah.

 In fact, is the only thing that's wrong here the commit message?
 Because it's probably trivial to just patch that away.. Hmm, but I
 guess we'd like to have the actual commit message and not just another
 fixed one..

 There is no actual commit message - the entire changeset is
 synthesized by cvs2git to represent the addition of a branch tag to the
 file - i.e. the logical equivalent of a cvs tag -b, which has no
 commit message.

 There is a commit message on the trunk when the file was added there.
 Is there any chance of being able to lift that message off trunk and
 just copy it into the branch, instead of saying this is a cherry-pick
 of this commit over here?

 It wouldn't be accurate to do so, because the synthetic commit is not
 copying the entire change, only registering the addition of (in this
 case) one file which happens to be part of the changeset. Please note
 that there is a changeset on the branch immediately following the
 synthetic one under discussion which contains the 'real' commit message
 used when committing to the branch.
 
 Hmm. Good point.
 
 I wonder if we should try to locate these places and fix them with git
 filter-branch or rebase -i after the fact, with history rewriting.
 
 There seem to be 57 of them.

It sounds cumbersome.

Michael Haggerty might be better placed than me to assess whether
eliding them within cvs2git is practically achievable.

 Searching for those, I also found a bunch with the comment Sprouted
 from branch. What do those mean?

It appears as part of the description of what a synthetic branch
creation commit did, existing only to put into context the operations
that follow - i.e. the creation of the REL7_4_STABLE branch involved
sprouting from trunk, then deleting 4 files which were not included on
the branch.

The revision described in the Sprout ... line isn't particularly
interesting, since it's always the same as the parent of the commit -
it's just listed for symmetry with Cherrypick ... lines which may follow.

The presence/absence of a Sprout ... line indicates whether the
particular commit is the initial creation of a branch, versus the
grafting in of additional files to the branch. (The latter occurs when a
file is tagged as if it was part of the branch from the creation of the
branch, but only initially came into being *after* there were already
commits to the branch.)
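
The distinction above can be sketched as a small classifier over commit
messages. This is an illustration only: the function name and the labels
are made up here, and it assumes the marker lines start with the literal
words "This commit was manufactured by cvs2svn" and "Sprout", as they
appear in this thread.

```python
def classify_synthetic(message):
    """Classify a commit message as 'ordinary', 'branch creation' or 'graft'."""
    lines = [line.strip() for line in message.splitlines()]
    if not lines or not lines[0].startswith(
        "This commit was manufactured by cvs2svn"
    ):
        return "ordinary"
    # A Sprout line marks the initial creation of the branch.
    if any(line.startswith("Sprout") for line in lines[1:]):
        return "branch creation"
    # Otherwise additional files are being grafted onto an existing branch.
    return "graft"


example = """This commit was manufactured by cvs2svn to create branch 'REL8_4_STABLE'.

Cherrypick from master 2010-05-13 16:39:43 UTC adunstan 'Abandon the use of
    src/pl/plperl/plperl_opmask.pl
"""
print(classify_synthetic(example))
```

Counting the messages that fall into the last category would be one way
to locate the commits Magnus is asking about.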

 My guess at this point is that there may be a (very old?) version of cvs
 which, when adding a file to a branch, actually misrecorded the file as
 having existed on the branch from the moment it was first added to trunk
 - this would explain this anomaly.
 
 Well, the one Robert pointed to is a very recent commit. Not sure if
 it uses the client version or the server version - the version on
 cvs.postgresql.org is:
 
 [...@cvs ~]$ cvs --version
 
 Concurrent Versions System (CVS) 1.11.17-FreeBSD (client/server)

Unsure, I'm afraid. Though I might try hunting through CVS's CVS.

Max.





Re: [HACKERS] git: uh-oh

2010-08-20 Thread Max Bowsher
On 20/08/10 18:43, Magnus Hagander wrote:
 On Fri, Aug 20, 2010 at 19:41, Max Bowsher m...@f2s.com wrote:
 On 20/08/10 18:30, Magnus Hagander wrote:
 On Fri, Aug 20, 2010 at 19:28, Tom Lane t...@sss.pgh.pa.us wrote:
 Max Bowsher m...@f2s.com writes:
 The history that cvs2svn is aiming to represent here is this:

 1) At the time of creation of the REL8_4_STABLE branch, plperl_opmask.pl
 did *not* exist.

 2) Later, it was added to trunk.

 3) Then, someone retroactively added the branch tag to the file, marking
 it as included in the REL8_4_STABLE branch. [This corresponds to the git
 changeset that Robert is questioning]

 Uh, no.  We have never retroactively added anything to any branch.
 I don't know enough about the innards of CVS to know what its internal
 representation of this sort of thing is, but all that actually happened
 here was a cvs add and a cvs commit in REL8_4_STABLE long after the
 branch occurred.  We would like the git history to look like that too.

 Yeah.

 In fact, is the only thing that's wrong here the commit message?
 Because it's probably trivial to just patch that away.. Hmm, but I
 guess we'd like to have the actual commit message and not just another
 fixed one..

 There is no actual commit message - the entire changeset is
 synthesized by cvs2git to represent the addition of a branch tag to the
 file - i.e. the logical equivalent of a cvs tag -b, which has no
 commit message.
 
 There is a commit message on the trunk when the file was added there.
 Is there any chance of being able to lift that message off trunk and
 just copy it into the branch, instead of saying this is a cherry-pick
 of this commit over here?

It wouldn't be accurate to do so, because the synthetic commit is not
copying the entire change, only registering the addition of (in this
case) one file which happens to be part of the changeset. Please note
that there is a changeset on the branch immediately following the
synthetic one under discussion which contains the 'real' commit message
used when committing to the branch.

My guess at this point is that there may be a (very old?) version of cvs
which, when adding a file to a branch, actually misrecorded the file as
having existed on the branch from the moment it was first added to trunk
- this would explain this anomaly.

Max.





Re: [HACKERS] git: uh-oh

2010-08-20 Thread Max Bowsher
On 20/08/10 19:54, Magnus Hagander wrote:
 On Fri, Aug 20, 2010 at 20:52, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 In fact, is the only thing that's wrong here the commit message?
 Because it's probably trivial to just patch that away.. Hmm, but I
 guess we'd like to have the actual commit message and not just another
 fixed one..

 If I understand Max's statements correctly, there is an observable
 problem in the actual git history, not just the commit log entries:
 it will believe that a file added on a branch had been there since
 the branch forked off, not just as of the time it got added.

Not since the branch forked off, but rather it will believe the file
was on the branch from the moment it was added to trunk - the issue is
actually in the cvs repository too - were you to ask CVS for the state
of the branch at the relevant time, you'd see the extra file there too.

In the specific case we've been looking at so far, the file is only
appearing less than a minute prematurely.

 Now, I would think that your tests of file contents as of the various
 release tags should have caught extra files, so maybe I'm
 misunderstanding.
 
 I haven't been able to complete that test on the repo converted by the
 new version yet, because the repo Max prepared for us had the keyword
 problem. The other process is still running.

Would it help at all for you to send me the options file and related
file so I can produce a repository converted as you are expecting?

Max.





Re: [HACKERS] git: uh-oh

2010-08-20 Thread Max Bowsher
On 20/08/10 19:30, Tom Lane wrote:
 Max Bowsher m...@f2s.com writes:
 My guess at this point is that there may be a (very old?) version of cvs
 which, when adding a file to a branch, actually misrecorded the file as
 having existed on the branch from the moment it was first added to trunk
 - this would explain this anomaly.
 
 I have no idea what version of CVS is running on our master server.
 I have noticed that it sometimes generates its own synthetic commit
 messages for cases related to this, for example these events on HEAD:
 
 2010-05-13 12:40  adunstan
 
   * src/pl/plperl/sql/plperlu_plperl.sql: file plperlu_plperl.sql was
   initially added on branch REL8_4_STABLE.
 
 2010-05-13 12:40  adunstan
 
   * src/pl/plperl/expected/plperlu_plperl.out: file
   plperlu_plperl.out was initially added on branch REL8_4_STABLE.

This is actually what's supposed to occur, and cvs2git will elide these
synthetic entries, which exist to represent the concept of adding a file
to a branch after the initial creation of the branch, within the fairly
arcane constraints of the RCS file format.
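
As an illustration, such placeholder entries can be recognised from the
log text alone. This is a text-only sketch with a hypothetical helper
name; cvs2git may well rely on other properties of the revision rather
than on the message.

```python
import re

# Wording taken from the cvs2cl entries quoted above.
PLACEHOLDER = re.compile(r"file (\S+) was initially added on branch (\S+)\.")


def parse_placeholder(log_message):
    """Return (filename, branch) for a placeholder message, else None."""
    # Collapse whitespace first, since changelog tools may wrap the message.
    flat = " ".join(log_message.split())
    match = PLACEHOLDER.fullmatch(flat)
    return match.groups() if match else None


wrapped = """file
  plperlu_plperl.out was initially added on branch REL8_4_STABLE."""
print(parse_placeholder(wrapped))
```

Under this check, plperl_opmask.pl stands out as the anomaly because no
such entry exists for it.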

 I don't see one of these for plperl_opmask.pl in particular, so there
 may be more than one anomaly involved.

Just the one anomaly - the absence of one of those for plperl_opmask.pl
is the original anomaly.

 However, the bottom line here is that we don't want the history that
 cvs2git is preparing for these events, because it doesn't correspond to
 what we did.  Whether this is the most faithful representation of the
 CVS history is academic; it simply is not reality.  What we would like
 is for the history to look like the file got added to the branch as of
 the first commit that touched it on that branch.  That is reality, as
 it appears from our neck of the woods anyway.


Michael, what's your take on this? I have a feeling that such a thing is
*not* going to be a quick hack in cvs2svn.

Max.



