Hi,

Quoting "Robert Haas" <robertmh...@gmail.com>:
> That's not the best news I've had today...

Sorry :-(

> To me they sound complex and inconvenient.  I guess I'm kind of
> mystified by why we can't make this work reliably.  Other than the
> "broken tags" issue we've discussed, it seems like the only real issue
> should be how to group changes to different files into a single
> commit.  Once you do that, you should be able to construct a
> well-defined, total function f : <cvs-file, cvs-revision> -> <git
> commit> which is surjective on the space of git commits.  In fact it
> might be a good idea to explicitly construct this mapping and drop it
> into a database table somewhere so that people can sanity check it as
> much as they wish.  Why is this harder than I think it is?

Well, as CVS doesn't guarantee any consistency between files, you end up with silly situations more often than you think. One of the simplest possible examples is something like:

  commit 1: fileA @ 1.1, fileB @ 1.2
  commit 2: fileA @ 1.2, fileB @ 1.1

Seen from fileA, it's obvious that commit 1 (@1.1) comes before commit 2 (@1.2), but seen from fileB it's the exact opposite. The most promising approach to solving these problems seems to be based on graph theory, where you work with a graph of dependencies between revisions, for example from fileA @ 1.1 to fileA @ 1.2.
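The contradiction above can be made visible with a small dependency graph. The following is only a minimal Python sketch, not the code used by monotone or cvs2svn; the `blobs` dictionary and the helper names (`rev_key`, `dependency_edges`, `has_cycle`) are hypothetical and exist just for this illustration. It derives an "earlier blob -> later blob" edge from each file's revision order and then checks the graph for a cycle:

```python
from collections import defaultdict

# Hypothetical blobs from the example: commit name -> {file: revision}.
blobs = {
    "commit 1": {"fileA": "1.1", "fileB": "1.2"},
    "commit 2": {"fileA": "1.2", "fileB": "1.1"},
}

def rev_key(rev):
    """Turn a CVS revision such as '1.10' into a sortable tuple (1, 10)."""
    return tuple(int(part) for part in rev.split("."))

def dependency_edges(blobs):
    """Return (earlier_blob, later_blob) edges implied by per-file revision order."""
    per_file = defaultdict(list)
    for blob, revs in blobs.items():
        for filename, rev in revs.items():
            per_file[filename].append((rev_key(rev), blob))
    edges = set()
    for entries in per_file.values():
        entries.sort()
        for (_, earlier), (_, later) in zip(entries, entries[1:]):
            if earlier != later:
                edges.add((earlier, later))
    return edges

def has_cycle(nodes, edges):
    """Depth-first search; reaching a node that is still being visited means a cycle."""
    successors = defaultdict(set)
    for earlier, later in edges:
        successors[earlier].add(later)
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in nodes}

    def visit(node):
        color[node] = GREY
        for nxt in successors[node]:
            if color[nxt] == GREY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in nodes)

edges = dependency_edges(blobs)
print(sorted(edges))
print(has_cycle(blobs, edges))
```

For the two blobs above, fileA yields an edge from commit 1 to commit 2 and fileB yields the reverse edge, so the graph has a cycle and no ordering of the unsplit blobs is valid.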

To resolve the above situation, you'd have to "split" a blob of single-file commits into two end-result commits (for monotone / git). In the above example, you'd have two options to resolve the conflict:

  commit 1a: fileA @ 1.1
  commit 2:  fileA @ 1.2, fileB @ 1.1
  commit 1b: fileB @ 1.2

Or:

  commit 2a: fileB @ 1.1
  commit 1: fileA @ 1.1, fileB @ 1.2
  commit 2b: fileA @ 1.2
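To check that splitting really removes the conflict, here is a rough Python sketch. It is an illustration under simplifying assumptions (linear revisions on one branch, no tags), and `split_blob` and `is_consistent` are hypothetical helpers, not functions from monotone or cvs2svn. A sequence of commits counts as consistent when every file's revision numbers strictly increase along it:

```python
def rev_key(rev):
    """Turn a CVS revision such as '1.10' into a sortable tuple (1, 10)."""
    return tuple(int(part) for part in rev.split("."))

def split_blob(name, blob, early_files):
    """Split one blob into an 'a' part (early_files) and a 'b' part (the rest)."""
    early = {f: r for f, r in blob.items() if f in early_files}
    late = {f: r for f, r in blob.items() if f not in early_files}
    return (name + "a", early), (name + "b", late)

def is_consistent(sequence):
    """True if each file's revisions strictly increase along the sequence."""
    last_seen = {}
    for _, revs in sequence:
        for filename, rev in revs.items():
            key = rev_key(rev)
            if filename in last_seen and key <= last_seen[filename]:
                return False
            last_seen[filename] = key
    return True

commit1 = {"fileA": "1.1", "fileB": "1.2"}
commit2 = {"fileA": "1.2", "fileB": "1.1"}

# Neither order of the unsplit blobs works.
unsplit_ok = is_consistent([("commit 1", commit1), ("commit 2", commit2)])
reversed_ok = is_consistent([("commit 2", commit2), ("commit 1", commit1)])

# First option: split commit 1 around commit 2.
part_1a, part_1b = split_blob("commit 1", commit1, {"fileA"})
option1 = [part_1a, ("commit 2", commit2), part_1b]

# Second option: split commit 2 around commit 1.
part_2a, part_2b = split_blob("commit 2", commit2, {"fileB"})
option2 = [part_2a, ("commit 1", commit1), part_2b]

print(unsplit_ok, reversed_ok, is_consistent(option1), is_consistent(option2))
```

Splitting commit 1 by file gives a part holding fileA @ 1.1 and a part holding fileB @ 1.2; splitting commit 2 gives fileB @ 1.1 and fileA @ 1.2. Either split yields a consistent sequence, which is why a heuristic is needed to choose between them.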

(Note that often enough, these were actually separate commits in CVS as well; there's just no way to represent that. And no, timestamps are simply not reliable enough.)

Now add tags, branches and cyclic dependencies involving many files and many hundreds of commits to the example above and you start to get an idea of the complexity of the problem in general.

See my description and diagrams of the steps used for cvs_import in monotone at [1] or follow descriptions of how cvs2svn works internally.

Here are a few numbers from a conversion I'm running to test my algorithm and heuristics. It's converting a pretty recent snapshot of the Postgres repository:

 * running at 100% CPU time since: April 17
 * total number of files involved: 6'847
 * total number of blobs (before splitting): 28'010
 * blobs split due to cyclic dependencies: 12'801

Admittedly, my algorithm isn't optimized at all. However, I'm focusing on good results rather than speed of conversion.

Also note that monotone uses SQLite, so it actually stores the results of this conversion in an SQL database, as you proposed. Recently, a git_export command has been added, so that's definitely worth a try for converting CVS to git. However, I fear cvs2git is more mature.
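To illustrate what such a sanity-checkable mapping table could look like, here is a small sketch using Python's sqlite3 module. The schema, the table and column names, and the `commit_for` helper are all assumptions made up for this example; this is not monotone's actual database layout. The primary key on (file, revision) makes the mapping a function, and the query at the end checks that it is surjective, i.e. that no commit is left without at least one CVS revision:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE commits (id TEXT PRIMARY KEY);
CREATE TABLE cvs_revisions (
    file      TEXT NOT NULL,
    revision  TEXT NOT NULL,
    commit_id TEXT NOT NULL REFERENCES commits(id),
    PRIMARY KEY (file, revision)   -- one commit per <file, revision>
);
""")

# Hypothetical data: the example above, with commit 1 split into 1a and 1b.
conn.executemany("INSERT INTO commits VALUES (?)", [("1a",), ("2",), ("1b",)])
conn.executemany(
    "INSERT INTO cvs_revisions VALUES (?, ?, ?)",
    [
        ("fileA", "1.1", "1a"),
        ("fileA", "1.2", "2"),
        ("fileB", "1.1", "2"),
        ("fileB", "1.2", "1b"),
    ],
)

def commit_for(conn, file, revision):
    """Look up f(<file, revision>); returns None if the pair is unmapped."""
    row = conn.execute(
        "SELECT commit_id FROM cvs_revisions WHERE file = ? AND revision = ?",
        (file, revision),
    ).fetchone()
    return row[0] if row else None

# Surjectivity check: commits that no CVS revision maps to.
unmapped_commits = conn.execute(
    "SELECT id FROM commits "
    "WHERE id NOT IN (SELECT commit_id FROM cvs_revisions)"
).fetchall()

print(commit_for(conn, "fileB", "1.2"))
print(unmapped_commits)
```

Checking that the function is total would additionally need a list of every revision present in the CVS repository to compare against, which this sketch leaves out.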

Regards

Markus Wanner

[1]: a description of the various steps in conversion from CVS to monotone:
http://www.monotone.ca/wiki/CvsImport/

