Re: [HACKERS] PostgreSQL Developer meeting minutes up

Markus Wanner Thu, 28 May 2009 23:42:09 -0700

Hi,

Quoting "Robert Haas" <[email protected]>:

That's not the best news I've had today...


Sorry :-(

To me they sound complex and inconvenient.  I guess I'm kind of
mystified by why we can't make this work reliably.  Other than the
"broken tags" issue we've discussed, it seems like the only real issue
should be how to group changes to different files into a single
commit.  Once you do that, you should be able to construct a
well-defined, total function f : <cvs-file, cvs-revision> -> <git
commit> which is surjective on the space of git commits.  In fact it
might be a good idea to explicitly construct this mapping and drop it
into a database table somewhere so that people can sanity check it as
much as they wish.  Why is this harder than I think it is?

Well, as CVS doesn't guarantee any consistency between files, you end up with silly situations more often than you think. One of the simplest possible examples is something like:


  commit 1: fileA @ 1.1, fileB @ 1.2
  commit 2: fileA @ 1.2, fileB @ 1.1

Seen from fileA, it's obvious that commit 1 (@1.1) comes before commit 2 (@1.2), but seen from fileB it's the exact opposite. The most promising approach to solving these problems seems to be based on graph theory, where you work with a graph of dependencies, e.g. from fileA @ 1.1 to fileA @ 1.2.
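As a rough illustration of the dependency-graph view (a minimal sketch only, not the code used by monotone's cvs_import or by cvs2svn), the two-commit example can be fed into a small cycle check. The helper names `blob_dependencies` and `has_cycle` are made up for this sketch, and it assumes plain trunk revisions such as "1.2" that compare numerically:

```python
from collections import defaultdict


def parse_rev(rev):
    """Turn a CVS revision string like '1.2' into a comparable tuple (1, 2)."""
    return tuple(int(part) for part in rev.split("."))


def blob_dependencies(blobs):
    """Return edges (earlier_blob, later_blob) implied by per-file revision order.

    `blobs` maps a blob name to {file name: revision string}.
    """
    per_file = defaultdict(list)
    for blob, files in blobs.items():
        for path, rev in files.items():
            per_file[path].append((parse_rev(rev), blob))
    edges = set()
    for revisions in per_file.values():
        revisions.sort()
        # Consecutive revisions of one file order the blobs that contain them.
        for (_, earlier), (_, later) in zip(revisions, revisions[1:]):
            if earlier != later:
                edges.add((earlier, later))
    return edges


def has_cycle(nodes, edges):
    """Kahn's algorithm: if some nodes can never be emitted, there is a cycle."""
    indegree = {node: 0 for node in nodes}
    successors = defaultdict(list)
    for earlier, later in edges:
        successors[earlier].append(later)
        indegree[later] += 1
    ready = [node for node, degree in indegree.items() if degree == 0]
    emitted = 0
    while ready:
        node = ready.pop()
        emitted += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return emitted != len(indegree)


blobs = {
    "commit 1": {"fileA": "1.1", "fileB": "1.2"},
    "commit 2": {"fileA": "1.2", "fileB": "1.1"},
}
edges = blob_dependencies(blobs)
print(sorted(edges))            # [('commit 1', 'commit 2'), ('commit 2', 'commit 1')]
print(has_cycle(blobs, edges))  # True
```

fileA contributes the edge commit 1 -> commit 2 and fileB contributes the reverse edge, so no ordering of the two blobs satisfies both files.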

To resolve the above situation, you'd have to "split" a blob of single-file commits into two end-result commits (for monotone / git). In the above example, you'd have two options to resolve the conflict:


  commit 1a: fileA @ 1.1
  commit 2:  fileA @ 1.2, fileB @ 1.1
  commit 1b: fileB @ 1.2

Or:

  commit 2a: fileB @ 1.1
  commit 1: fileA @ 1.1, fileB @ 1.2
  commit 2b: fileA @ 1.2

(Note that often enough, these have actually been separate commits in CVS as well; there's just no way to represent that. And no, timestamps are simply not reliable enough.)
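To make the splitting step concrete, here is a small sketch under the same simplifying assumptions as the example (two files, trunk revisions only). The functions `predecessors`, `split_blob` and `order` are hypothetical names for illustration and say nothing about how cvs_import or cvs2svn choose where to split. It uses `graphlib` from the Python standard library (3.9 or later), which raises `CycleError` when no valid order exists:

```python
from graphlib import CycleError, TopologicalSorter


def rev_key(rev):
    """Turn a revision string like '1.2' into a comparable tuple."""
    return tuple(int(part) for part in rev.split("."))


def predecessors(blobs):
    """Map each blob to the blobs holding an earlier revision of one of its files."""
    preds = {name: set() for name in blobs}
    for name, files in blobs.items():
        for other, other_files in blobs.items():
            if other == name:
                continue
            for path, rev in files.items():
                if path in other_files and rev_key(other_files[path]) < rev_key(rev):
                    preds[name].add(other)
    return preds


def split_blob(blobs, name, early_files):
    """Copy `blobs`, replacing `name` by `name + 'a'` (early_files) and `name + 'b'` (the rest)."""
    result = {key: dict(value) for key, value in blobs.items() if key != name}
    files = blobs[name]
    result[name + "a"] = {p: r for p, r in files.items() if p in early_files}
    result[name + "b"] = {p: r for p, r in files.items() if p not in early_files}
    return result


def order(blobs):
    """Return the blobs in an order that respects every per-file revision sequence."""
    return list(TopologicalSorter(predecessors(blobs)).static_order())


blobs = {
    "commit 1": {"fileA": "1.1", "fileB": "1.2"},
    "commit 2": {"fileA": "1.2", "fileB": "1.1"},
}

try:
    order(blobs)
except CycleError:
    print("cyclic: a blob has to be split")

# Option 1: split commit 1, committing its fileA change first.
print(order(split_blob(blobs, "commit 1", {"fileA"})))
# ['commit 1a', 'commit 2', 'commit 1b']

# Option 2: split commit 2, committing its fileB change first.
print(order(split_blob(blobs, "commit 2", {"fileB"})))
# ['commit 2a', 'commit 1', 'commit 2b']
```

Either split removes the cycle; which one is "right" cannot be decided from the revision numbers alone, which is where heuristics come in.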

Now add tags, branches and cyclic dependencies involving many files and many hundreds of commits to the example above and you start to get an idea of the complexity of the problem in general.

See my description and diagrams of the steps used for cvs_import in monotone at [1], or follow descriptions of how cvs2svn works internally.

A few numbers about a conversion I'm trying for testing my algorithm and heuristics. It's converting a pretty recent snapshot of the Postgres repository:


 * running at 100% CPU time since: April 17
 * total number of files involved: 6'847
 * total number of blobs (before splitting): 28'010
 * blobs split due to cyclic dependencies: 12'801

Admittedly, my algorithm isn't optimized at all. However, I'm focusing on good results rather than speed of conversion.

Also note that monotone uses SQLite, so it actually stores the results of this conversion in an SQL database, as you proposed. Recently, a git_export command has been added, so that's definitely worth a try for converting CVS to git. However, I fear cvs2git is more mature.
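For what the proposed sanity-check table could look like, here is a purely hypothetical sketch using Python's sqlite3 module. The table and column names are invented for illustration and are not monotone's actual schema, and the commit ids are placeholders. It checks that the mapping from (file, revision) to commit is total and that every commit is reached:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvs_revision (
    file TEXT NOT NULL,
    revision TEXT NOT NULL,
    PRIMARY KEY (file, revision)
);
CREATE TABLE git_commit (id TEXT PRIMARY KEY);
CREATE TABLE revision_commit (
    file TEXT NOT NULL,
    revision TEXT NOT NULL,
    git_commit TEXT NOT NULL REFERENCES git_commit (id),
    PRIMARY KEY (file, revision)  -- one commit per revision: a function
);
""")


def unmapped_revisions(conn):
    """CVS revisions with no commit; an empty result means the mapping is total."""
    return conn.execute("""
        SELECT r.file, r.revision
        FROM cvs_revision r
        LEFT JOIN revision_commit m
               ON m.file = r.file AND m.revision = r.revision
        WHERE m.git_commit IS NULL
        ORDER BY r.file, r.revision
    """).fetchall()


def unreached_commits(conn):
    """Commits that no CVS revision maps to; empty means the mapping is surjective."""
    return conn.execute("""
        SELECT c.id
        FROM git_commit c
        WHERE NOT EXISTS (
            SELECT 1 FROM revision_commit m WHERE m.git_commit = c.id
        )
        ORDER BY c.id
    """).fetchall()


# Placeholder data: the two-file example after splitting commit 1.
mapping = [
    ("fileA", "1.1", "c1a"),
    ("fileA", "1.2", "c2"),
    ("fileB", "1.1", "c2"),
    ("fileB", "1.2", "c1b"),
]
conn.executemany("INSERT INTO cvs_revision VALUES (?, ?)",
                 [(f, r) for f, r, _ in mapping])
conn.executemany("INSERT INTO git_commit VALUES (?)",
                 [(c,) for c in sorted({c for _, _, c in mapping})])
conn.executemany("INSERT INTO revision_commit VALUES (?, ?, ?)", mapping)

print(unmapped_revisions(conn))  # []
print(unreached_commits(conn))   # []
```

The primary key on (file, revision) is what makes the table a function rather than a relation; the two queries cover the "total" and "surjective" parts.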


Regards

Markus Wanner

[1]: a description of the various steps in conversion from CVS to monotone:
http://www.monotone.ca/wiki/CvsImport/


