Tom Lane wrote:
> Well, even if the goal is to faithfully represent the bogus history
> shown by CVS, cvs2git isn't doing a good job of it.

Them's fightin' words :-)

> In the case of
> src/bin/pg_dump/po/it.po, the CVS history claims that the version
> added to REL8_4_STABLE on 2010-05-13 is a child of the mainline
> version 1.7 committed on 2010-02-19.  Therefore, according to CVS
> the file existed on the branch from 2010-02-19, not 2010-02-28
> as claimed by the cvs2git translation.

Incorrect.  The CVS history implies three user-initiated events in this neighborhood:

    2010.02.19: version 1.7 committed to trunk
    unknown date: file added to branch REL8_4_STABLE (1.7.6)
    2010.05.13: file modified on branch REL8_4_STABLE to create 1.7.6.1

The CVS history gives no reason to assume that the middle event happened on 2010-02-19, or on 2010-05-13, or on any other particular date.  *If* you trust the timestamps (which cvs2git treats sceptically because they are often wrong), then you can say with certainty that the intermediate event happened sometime between the two numbered commits.

It is cvs2git policy to try to group add-branch-tag-to-file events together if such grouping is consistent with the nearby commit dates.  The files contrib/xml2/expected/xml2.out and contrib/xml2/sql/xml2.sql have the following constraints:

contrib/xml2/expected/xml2.out:

    2010.02.28: 1.1
    unknown date: file added to branch REL8_4_STABLE (1.1.2)
    2010.03.01: 1.1.2.1

contrib/xml2/sql/xml2.sql:

    2010.02.28: 1.1
    unknown date: file added to branch REL8_4_STABLE (1.1.2)
    2010.03.01: 1.1.2.1

Since there is a date range (2010-02-28 - 2010-03-01) consistent with all of the constraints, cvs2git picks a date in that range for a commit that adds all three files to branch REL8_4_STABLE.

> I did some "cvs co" operations
> to check this and cvs does indeed retrieve the file between 02-19 and
> 02-28, but not before 02-19.  So I don't think you can defend the
> cvs2git behavior by claiming that it's an exact translation.

CVS is using the same incomplete data as cvs2svn and, just like cvs2git, it has to pick a date out of its hat.
It happens to choose a different date than cvs2git.  *Neither CVS nor cvs2git can be sure when the file was really added to the branch, and neither is more likely to be correct than the other.*  (Actually, cvs2git is arguably more likely to be correct because it uses information from multiple files in its heuristic whereas CVS considers information for only the single file.)

Robert Haas wrote:
> One thing I'm not quite clear on is
> how cvs2git thinks CVS "should" look given what we actually did vs.
> how it actually does look,

The crux of the problem is that there is a plethora of hypothetical "true" histories that are consistent with the incomplete data recorded by CVS.  cvs2svn/cvs2git picks a history that is

1. Correct, which I define to mean that the chosen history is not contradicted by the CVS data (with deviations allowed only when the CVS data is internally inconsistent).  Any problems with this criterion are considered serious bugs.

But (1) still leaves a vast number of possible histories.  So a secondary goal is to choose a history that is

2. Plausible, meaning that the history is believable given the way that people typically develop software in a typical CVS project.  This is necessarily subjective and depends a lot on project culture and policies.  (A cvs2git written from scratch for the pgsql project would undoubtedly be more mindful of your project's policies.)  Improvements on this criterion are also constrained by performance requirements.

Michael
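
For illustration only, the date-range argument above can be sketched as an interval intersection.  This is not cvs2git's actual code: the `windows` mapping and the `common_window` function are hypothetical names, and the sketch assumes day-granularity dates and trustworthy timestamps.  Each file's branch-add event is bounded below by the trunk revision it sprouts from and above by the first commit on the branch; a single grouped commit is possible only if all of those windows overlap.

```python
from datetime import date

# Hypothetical data structure: per file, (date of the trunk revision the
# branch sprouts from, date of the first commit on REL8_4_STABLE).
# The dates are the ones quoted in the message above.
windows = {
    "src/bin/pg_dump/po/it.po": (date(2010, 2, 19), date(2010, 5, 13)),
    "contrib/xml2/expected/xml2.out": (date(2010, 2, 28), date(2010, 3, 1)),
    "contrib/xml2/sql/xml2.sql": (date(2010, 2, 28), date(2010, 3, 1)),
}


def common_window(windows):
    """Return the (earliest, latest) range shared by every window,
    or None if no single date satisfies all of the constraints."""
    earliest = max(lo for lo, _ in windows.values())
    latest = min(hi for _, hi in windows.values())
    return (earliest, latest) if earliest <= latest else None


shared = common_window(windows)
if shared is None:
    print("no common date; the branch-add events cannot be grouped")
else:
    print("branch-add events can share one commit dated within", shared)
```

Any date inside the returned range is equally consistent with the CVS data, which is why the choice within it is a matter of plausibility rather than correctness.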