Re: [HACKERS] git: uh-oh

Michael Haggerty Wed, 08 Sep 2010 01:16:47 -0700

Tom Lane wrote:
> Well, even if the goal is to faithfully represent the bogus history
> shown by CVS, cvs2git isn't doing a good job of it.

Them's fightin' words :-)

> In the case of
> src/bin/pg_dump/po/it.po, the CVS history claims that the version
> added to REL8_4_STABLE on 2010-05-13 is a child of the mainline
> version 1.7 committed on 2010-02-19.  Therefore, according to CVS
> the file existed on the branch from 2010-02-19, not 2010-02-28
> as claimed by the cvs2git translation.

Incorrect.  The CVS history implies three user-initiated events in this
neighborhood:

2010.02.19: version 1.7 committed to trunk
unknown date: file added to branch REL8_4_STABLE (1.7.6)
2010.05.13: file modified on branch REL8_4_STABLE to create 1.7.6.1

The CVS history gives no reason to assume that the middle event happened
on 2010-02-19, or on 2010-05-13, or on any other particular date.  *If*
you trust the timestamps (which cvs2git treats sceptically because they
are often wrong), then you can say with certainty that the intermediate
event happened sometime between the two numbered commits.

It is cvs2git policy to try to group add-branch-tag-to-file events
together if such grouping is consistent with the nearby commit dates.
The files contrib/xml2/expected/xml2.out and contrib/xml2/sql/xml2.sql
have the following constraints:

contrib/xml2/expected/xml2.out:
2010.02.28: 1.1
unknown date: file added to branch REL8_4_STABLE (1.1.2)
2010.03.01: 1.1.2.1

contrib/xml2/sql/xml2.sql:
2010.02.28: 1.1
unknown date: file added to branch REL8_4_STABLE (1.1.2)
2010.03.01: 1.1.2.1

Since there is a date range (2010-02-28 - 2010-03-01) consistent with
the constraints on all three files (it.po and the two xml2 files),
cvs2git picks a date in that range for a single commit that adds all
three files to branch REL8_4_STABLE.
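
As a rough illustration of that grouping (this is not cvs2git's actual
code; the function and variable names are made up for this sketch, and
it assumes the timestamps can be trusted), the check amounts to
intersecting the per-file date windows quoted above:

```python
from datetime import date

# Hypothetical constraint table: file -> (earliest, latest) date on which the
# file could have been added to REL8_4_STABLE, using the dates quoted above.
windows = {
    "src/bin/pg_dump/po/it.po": (date(2010, 2, 19), date(2010, 5, 13)),
    "contrib/xml2/expected/xml2.out": (date(2010, 2, 28), date(2010, 3, 1)),
    "contrib/xml2/sql/xml2.sql": (date(2010, 2, 28), date(2010, 3, 1)),
}


def common_window(windows):
    """Return the (earliest, latest) range that satisfies every window,
    or None if the windows do not overlap."""
    earliest = max(lo for lo, _ in windows.values())
    latest = min(hi for _, hi in windows.values())
    return (earliest, latest) if earliest <= latest else None


print(common_window(windows))
```

Any date inside the returned range is consistent with all three files,
so one commit can add them together.  If the windows did not overlap,
the function would return None and the files could not share a single
add-to-branch commit.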

> I did some "cvs co" operations
> to check this and cvs does indeed retrieve the file between 02-19 and
> 02-28, but not before 02-19.  So I don't think you can defend the
> cvs2git behavior by claiming that it's an exact translation.

CVS is using the same incomplete data as cvs2svn and, just like cvs2git,
it has to pick a date out of its hat.  It happens to choose a different
date than cvs2git.  *Neither CVS nor cvs2git can be sure when the file
was really added to the branch, and neither is more likely to be correct
than the other.*  (Actually, cvs2git is arguably more likely to be
correct because it uses information from multiple files in its heuristic
whereas CVS considers information for only the single file.)

Robert Haas wrote:
> One thing I'm not quite clear on is
> how cvs2git thinks CVS "should" look given what we actually did vs.
> how it actually does look,

The crux of the problem is that there is a plethora of hypothetical
"true" histories that are consistent with the incomplete data recorded
by CVS.  cvs2svn/cvs2git picks a history that is

1. Correct, which I define to mean that the chosen history is not
contradicted by the CVS data (with deviations allowed only when the CVS
data is internally inconsistent).  Any problems with this criterion are
considered serious bugs.

But (1) still leaves a vast number of possible histories.  So a
secondary goal is to choose a history that is

2. Plausible, meaning that the history is believable given the way
that people typically develop software in a typical CVS project.  This
is necessarily subjective and depends a lot on project culture and
policies.  (A cvs2git written from scratch for the pgsql project would
undoubtedly be more mindful of your project's policies.)  Improvements
on this criterion are also constrained by performance requirements.

Michael

