Re: Fwd: git cvsimport implications

2013-05-17 Thread Michael Haggerty
On 05/17/2013 03:34 PM, Andreas Krey wrote:
> On Fri, 17 May 2013 15:14:58 +, Michael Haggerty wrote:
> ...
>> We both know that the CVS history omits important data, and that the
>> history is mutable, etc.  So there are lots of hypothetical histories
>> that do not contradict CVS.  But some things are recorded unambiguously
>> in the CVS history, like
>>
>> * The contents at any tag or the tip of any branch (i.e., what is in the
>> working tree when you check it out).
> 
> Except that the tags/branches may be made in a way that can't
> be mapped onto any commit/point of history otherwise exported,
> with branches that are done on parts of the trees first, or
> likewise tags.

This is true, but cvs2git nevertheless puts the required content on the
branch so that it checks out correctly.  In other words, a "CVS tag
creation" (which might not have been done at a single point in time) is
done by cvs2git roughly like this (assume it is from master):

1. Make a list of all versions of all files that have to be in the tag.

2. When one of those file versions has to be overwritten (e.g., because
a later version of that file needs to be added to master), create a Git
tag-branch containing all of the files that are currently at the correct
version for the tag.  (It has to be a Git branch, not a tag, because we
might have to change it later.)

3. As other files on master go through the revisions needed for the tag,
create new commits on the tag-branch that add those revisions of those
files to the tag-branch.

At the end of the process, the tag-branch has the same contents as the
CVS tag, though it may have had to be created via multiple commits.
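
The three steps above can be sketched roughly as follows.  This is only a
toy model under stated assumptions, not cvs2git's actual code: the function
name build_tag_branch, the dict-based representation of commits, and the
file names are all hypothetical, and the sketch ignores file deletions,
merge commits, and tags taken from a branch other than master.

```python
def build_tag_branch(master_commits, tag_contents):
    """Replay master and return (tag_commits, tag_tree).

    master_commits: list of {path: revision} dicts, oldest first (toy model).
    tag_contents:   {path: revision} that the CVS tag must contain.
    """
    master = {}       # tree currently on master
    tag_tree = {}     # tree currently on the tag-branch
    tag_commits = []  # each entry is the {path: revision} added by one tag-branch commit

    def commit_pending():
        # Copy every file that master holds at the tag's revision but that
        # the tag-branch does not have yet.
        pending = {p: r for p, r in master.items()
                   if tag_contents.get(p) == r and tag_tree.get(p) != r}
        if pending:
            tag_tree.update(pending)
            tag_commits.append(pending)

    for change in master_commits:
        # Step 2: a revision the tag needs is about to be overwritten on
        # master, so save everything that is currently correct first.
        if any(p in master and master[p] == tag_contents.get(p) and r != master[p]
               for p, r in change.items()):
            commit_pending()
        master.update(change)
        # Step 3: once the tag-branch exists, pick up newly reached revisions.
        if tag_commits:
            commit_pending()

    # Catch the case where no needed revision was ever overwritten.
    commit_pending()
    return tag_commits, tag_tree


# a.c was tagged at 1.1, b.c only later at 1.2 (a tag not made at one point in time).
history = [{"a.c": "1.1", "b.c": "1.1"}, {"a.c": "1.2"}, {"b.c": "1.2"}, {"a.c": "1.3"}]
commits, tree = build_tag_branch(history, {"a.c": "1.1", "b.c": "1.2"})
print(commits, tree)
```

In this example the tag-branch needs two commits, one per file, yet its
final tree matches the tag exactly.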

Currently, step 3 creates merge commits from master to the tag-branch.
This is sometimes what one would expect, sometimes not--a matter of
taste, really, because the CVS history is in this aspect more flexible
than what is representable in Git's history model.

> ...
>> That being said, I appreciate that cvsimport can do incremental imports.
>>  cvs2git doesn't even attempt it.  I've thought about what it would take
>> to implement correct incremental imports in cvs2svn/cvs2git, and it is
> 
> Do these two produce stable output? That is, return the same commits
> for multiple runs on the same repo?

They usually produce stable output, but not always.  I've had reports of
users using cvs2svn successfully as an "incremental importer" by simply
running the full import each time and relying on Git to match up the
overlapping part of the history simply because the SHA-1s are identical.
 But (1) the later conversions would be just as slow as the first, (2)
some of the heuristic decisions for grouping CVS file changes into Git
changesets can be affected by later commits, and (3) CVS history is
mutable; if the CVS history is changed retroactively in any way then it
won't work at all.
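
To illustrate why re-running a full import can line up with an earlier run,
and why point (2) breaks it, here is a toy hash chain.  It is only a
stand-in for Git commit IDs (real IDs hash the tree, parents, author, and
message); the function names and changeset strings are hypothetical.

```python
import hashlib


def commit_ids(changesets):
    """Toy commit IDs: each ID hashes the parent ID together with the content."""
    ids, parent = [], ""
    for content in changesets:
        parent = hashlib.sha1((parent + "\n" + content).encode()).hexdigest()
        ids.append(parent)
    return ids


def shared_prefix_length(first_run, second_run):
    """Count the leading commits whose IDs are identical in both runs."""
    count = 0
    for a, b in zip(first_run, second_run):
        if a != b:
            break
        count += 1
    return count


first = commit_ids(["add a", "fix b", "tweak c"])
# Later run, same grouping plus one new changeset: the old history lines up.
stable = commit_ids(["add a", "fix b", "tweak c", "new d"])
# Later run where a heuristic regrouped the second changeset: every ID from
# that point on changes, because each ID depends on its parent.
regrouped = commit_ids(["add a", "fix b and d", "tweak c"])

print(shared_prefix_length(first, stable))
print(shared_prefix_length(first, regrouped))
```

Because each ID depends on its parent, a single regrouped changeset early in
the history changes every commit after it, so only the commits before that
point still match.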

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: git cvsimport implications

2013-05-17 Thread Andreas Krey
On Fri, 17 May 2013 15:14:58 +, Michael Haggerty wrote:
...
> We both know that the CVS history omits important data, and that the
> history is mutable, etc.  So there are lots of hypothetical histories
> that do not contradict CVS.  But some things are recorded unambiguously
> in the CVS history, like
> 
> * The contents at any tag or the tip of any branch (i.e., what is in the
> working tree when you check it out).

Except that the tags/branches may be made in a way that can't
be mapped onto any commit/point of history otherwise exported,
with branches that are done on parts of the trees first, or
likewise tags.

...
> That being said, I appreciate that cvsimport can do incremental imports.
>  cvs2git doesn't even attempt it.  I've thought about what it would take
> to implement correct incremental imports in cvs2svn/cvs2git, and it is

Do these two produce stable output? That is, return the same commits
for multiple runs on the same repo?

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds 
Date: Fri, 22 Jan 2010 07:29:21 -0800


Re: Fwd: git cvsimport implications

2013-05-17 Thread Martin Langhoff
On Fri, May 17, 2013 at 9:34 AM, Andreas Krey  wrote:
> On Fri, 17 May 2013 15:14:58 +, Michael Haggerty wrote:
> ...
>> We both know that the CVS history omits important data, and that the
>> history is mutable, etc.  So there are lots of hypothetical histories
>> that do not contradict CVS.  But some things are recorded unambiguously
>> in the CVS history, like
>>
>> * The contents at any tag or the tip of any branch (i.e., what is in the
>> working tree when you check it out).
>
> Except that the tags/branches may be made in a way that can't
> be mapped onto any commit/point of history otherwise exported,
> with branches that are done on parts of the trees first, or
> likewise tags.

Yeah, that's what I remember too.  It is perfectly fine in CVS to add
a tag to a file at a much later date than the rest of the tree. And it
happened too ("oh, I didn't have directory support/some-os checked out
when I tagged the release yesterday! let me check it out and add the
tag, nevermind that the branch has moved forward in the interim...").

I would add the long history of "cvs repository manipulation". Bad,
ugly stuff, but it happened in every major project I've seen. Mozilla,
X.org, etc.

TBH I am very glad that Michael cares deeply about the correctness,
and it leads to a much better tool. No doubt.

When discussing it with end users, I do think we have to be honest and
say that there's a fair chance that the output will not be perfect...
because what is in CVS is rather imperfect when you look at it closely
(which users aren't usually doing).

cheers,

m
--
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: Fwd: git cvsimport implications

2013-05-17 Thread Michael Haggerty
On 05/17/2013 01:50 PM, Martin Langhoff wrote:
> On Fri, May 17, 2013 at 5:10 AM, Michael Haggerty  
> wrote:
>> For one-time imports, the fix is to use a tool that is not broken, like
>> cvs2git.
> 
> As one of the earlier maintainers of cvsimport, I do believe that
> cvs2git is less broken, for one-shot imports, than cvsimport. Users
> looking for a one-shot import should not use cvsimport as there are
> better options there. Myself, I have used parsecvs (long ago, so
> perhaps it isn't the best of the crop nowadays).
> 
> TBH, I am puzzled and amused at all the chest-thumping about cvs
> importers. Yeah, yours is a bit better or saner, but we all wade in
> the muddle of essentially broken data. So "is not broken" is rather
> misleading when talking to end users. It carries so many caveats about
> whether it'll work on the users' particular repo that it is not a
> generally truthful statement.

I disagree.  I use the following definition of "correct":

The Git history output by an importer must not contradict the
history that is recorded in CVS.

We both know that the CVS history omits important data, and that the
history is mutable, etc.  So there are lots of hypothetical histories
that do not contradict CVS.  But some things are recorded unambiguously
in the CVS history, like

* The contents at any tag or the tip of any branch (i.e., what is in the
working tree when you check it out).

* The order of modifications to a single file on a single branch and the
file contents after each of those revisions.

* Who committed a particular change, and approximately when (modulo
clock skew).

If a tool doesn't get these things correct (especially the first!) then
it should only be used with great caution.  cvsimport can make mistakes
on the first two.  As far as I know, cvs2svn/cvs2git are correct
according to this definition.
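
The first criterion can be checked mechanically after a conversion.  The
following is a minimal sketch, assuming you already have a CVS checkout and
a Git checkout of the same tag or branch tip in two directories; the
function names are hypothetical.  Note that CVS keyword expansion can make
files differ even when the conversion is right, so the CVS side may need to
be checked out with expansion disabled.

```python
import hashlib
import os

# Bookkeeping directories that exist in only one of the two checkouts.
SKIP_DIRS = {"CVS", ".git"}


def tree_digest(root):
    """Map each file path relative to root to the SHA-256 of its bytes."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune bookkeeping directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digests[os.path.relpath(path, root)] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def compare_checkouts(cvs_dir, git_dir):
    """Return the sorted paths that are missing on one side or differ in content."""
    cvs, git = tree_digest(cvs_dir), tree_digest(git_dir)
    return sorted(p for p in cvs.keys() | git.keys() if cvs.get(p) != git.get(p))
```

An empty result for every tag and branch tip is what the first criterion
demands; any path in the result is a discrepancy worth investigating.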


That being said, I appreciate that cvsimport can do incremental imports.
 cvs2git doesn't even attempt it.  I've thought about what it would take
to implement correct incremental imports in cvs2svn/cvs2git, and it is
far beyond the budget of time that I have for the project.  So I
definitely give props to cvsimport for attempting incremental imports
and apparently often doing a good enough job that it is useful to people.

> [...]
> At the time, I looked into trying to use cvs2svn (precursor to
> cvs2git) as the "CVS read" side of cvsimport, but it did not support
> incremental imports at all, and it took forever to run.

cvs2svn still doesn't support incremental imports, and it still takes a
long time to run (though less than before).  cvs2git is considerably
faster, partly because of the speed and convenience of using
git-fast-import.  But conversion time is much less of an issue for
one-time conversions.

> It was a time when git was new and people were dipping their toes in
> the pool, and some developers were pining to use git on projects that
> used CVS (like we use git-svn now). Incremental imports were a must.
> 
> One of the nice features of cvsimport is that it can do incrementals
> on a repo imported with another tool. That earns it a place under the
> sun. If it didn't have that, I'd be voting for removal (after a review
> that the replacement *is* actually better ;-) across a number of test
> repos).

Incremental imports are indeed the saving grace of cvsimport and for
that reason I don't advocate its removal.  But I think we should be
clearer about warning users against using it for one-time imports,
because it can produce output that is *objectively* incorrect in
important ways.

Regarding tests, the failing tests that I added to the cvsimport test
suite a few years ago were taken directly from the cvs2svn/cvs2git test
suite, where they pass :-)

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/


Re: Fwd: git cvsimport implications

2013-05-17 Thread Martin Langhoff
On Fri, May 17, 2013 at 5:10 AM, Michael Haggerty  wrote:
> For one-time imports, the fix is to use a tool that is not broken, like
> cvs2git.

As one of the earlier maintainers of cvsimport, I do believe that
cvs2git is less broken, for one-shot imports, than cvsimport. Users
looking for a one-shot import should not use cvsimport as there are
better options there. Myself, I have used parsecvs (long ago, so
perhaps it isn't the best of the crop nowadays).

TBH, I am puzzled and amused at all the chest-thumping about cvs
importers. Yeah, yours is a bit better or saner, but we all wade in
the muddle of essentially broken data. So "is not broken" is rather
misleading when talking to end users. It carries so many caveats about
whether it'll work on the users' particular repo that it is not a
generally truthful statement.

I am very glad to hear it is better than cvsimport, and even more glad
to hear its limitations are better understood and documented. It has
had a testsuite for the longest of times!

And very likely has the best chance of success across the available
importers :-)

Oh, and why is cvsimport so vague? Because it is just driven by cvsps.
It creates a repo based on what cvsps understands from the CVS data.

At the time, I looked into trying to use cvs2svn (precursor to
cvs2git) as the "CVS read" side of cvsimport, but it did not support
incremental imports at all, and it took forever to run.

It was a time when git was new and people were dipping their toes in
the pool, and some developers were pining to use git on projects that
used CVS (like we use git-svn now). Incremental imports were a must.

One of the nice features of cvsimport is that it can do incrementals
on a repo imported with another tool. That earns it a place under the
sun. If it didn't have that, I'd be voting for removal (after a review
that the replacement *is* actually better ;-) across a number of test
repos).

cheers,

m
--
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: git cvsimport implications

2013-05-17 Thread John Keeping
On Fri, May 17, 2013 at 11:10:03AM +0200, Michael Haggerty wrote:
> On 05/15/2013 08:03 PM, Eugene Sajine wrote:
> > My primary goal was to understand better what are the real problems
> > that we might have with the way we use git cvsimport, so I was not
> > asking about the guarantee of the cvsimport to import things
> > correctly, but if there is a guarantee the import will result in
> > completely broken history.
> 
> So what are you going to do, use cvsimport whenever you cannot *prove*
> that it is wrong?  You sure have low standards for your software.
> 
> The only *useful* guarantee is that software is *correct* under defined
> circumstances.  I don't think anybody has gone to the trouble to figure
> out when that claim can be made for cvsimport.
> 
> > If the cvsimport is that broken - is there any plan to fix it?
> 
> For one-time imports, the fix is to use a tool that is not broken, like
> cvs2git.
> 
> Alternatively, Eric Raymond claims to have developed a new version of
> cvsps that is not quite as broken as the old version.  Presumably
> cvsimport would be not quite as broken if used with the new cvsps.

cvsimport doesn't work with cvsps-3 - we decided to stick with the
version we have (using cvsps-2) because that is the only option that
supports incremental import; those using it for that are used to its
deficiencies and there is no plan to improve it.  The manpage notes that
it uses a deprecated version of cvsps and recommends alternatives for
one-shot imports.

There is a version of the git-cvsimport script in the cvsps-3 repository
that works with it, but it does not support incremental import in the
same way as git.git's git-cvsimport so it will not replace the version
in git.git.


Re: Fwd: git cvsimport implications

2013-05-17 Thread Michael Haggerty
On 05/15/2013 08:03 PM, Eugene Sajine wrote:
> My primary goal was to understand better what are the real problems
> that we might have with the way we use git cvsimport, so I was not
> asking about the guarantee of the cvsimport to import things
> correctly, but if there is a guarantee the import will result in
> completely broken history.

So what are you going to do, use cvsimport whenever you cannot *prove*
that it is wrong?  You sure have low standards for your software.

The only *useful* guarantee is that software is *correct* under defined
circumstances.  I don't think anybody has gone to the trouble to figure
out when that claim can be made for cvsimport.

> If the cvsimport is that broken - is there any plan to fix it?

For one-time imports, the fix is to use a tool that is not broken, like
cvs2git.

Alternatively, Eric Raymond claims to have developed a new version of
cvsps that is not quite as broken as the old version.  Presumably
cvsimport would be not quite as broken if used with the new cvsps.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/


Re: Fwd: git cvsimport implications

2013-05-15 Thread Eugene Sajine
Hi

My primary goal was to understand better what are the real problems
that we might have with the way we use git cvsimport, so I was not
asking about the guarantee of the cvsimport to import things
correctly, but whether there is a guarantee that the import will result in
completely broken history.  Is there a situation in which cvsimport can
do things right, and one in which it is definitely going to fail?

Anyway, thanks a lot for the info. I do know that cvs2git is an option.

If the cvsimport is that broken - is there any plan to fix it?

Thanks,
Eugene

On Wed, May 15, 2013 at 2:24 AM, Michael Haggerty  wrote:
> On 05/15/2013 12:19 AM, Junio C Hamano wrote:
>> Eugene Sajine  writes:
>>
>>> What if there are a lot of branches in the CVS repo? Is it guaranteed
>>> to be broken after import?
>>
>> Even though CVS repository can record branches in individual ,v
>> files, reconstructing per branch history and where the branch
>> happened in each "changeset" cannot be determined with any
>> certainty.  The best you can get is a heuristic result.
>>
>> I do not think anybody can give such a guarantee.  The best you can
>> do is to convert it and validate if the result matches what you
>> think has happened in the CVS history.
>
> Junio, you are correct that there is no 100% reliable way of inferring
> the changesets that were made in CVS.  CVS doesn't record which file
> revisions were committed at the same time, unambiguous branch points,
> etc.  The best a tool can do is use heuristics.
>
> But it *is* possible for a conversion tool to make some more elementary
> guarantees regarding aspects of the history that are recorded
> unambiguously in CVS, for example:
>
> * That if you check the tip of the same branch out of CVS and out of Git,
> you get the same contents.
>
> * That CVS file revisions are committed to Git in the correct order
> relative to each other; e.g., that the changes made in CVS revision
> 1.4.2.2 in a particular file precede those made in revision 1.4.2.3 of
> the same file.
>
> git-cvsimport fails to ensure even this minimal level of correctness.
> Such errors are demonstrated in its own test suite.
>
> cvs2git, on the other hand, gets the basics 100% correct (if you find a
> discrepancy, please file a bug!), in addition to having great heuristics
> for inferring the details of the history.
>
> There is no reason ever to use git-cvsimport for one-time conversions
> from CVS to Git.  The only reason ever to use it is if you absolutely
> require an incremental bridge between CVS and Git, and even then please
> use it with great caution.
>
> Michael
> the cvs2svn/cvs2git maintainer
>
> --
> Michael Haggerty
> mhag...@alum.mit.edu
> http://softwareswirl.blogspot.com/


Re: Fwd: git cvsimport implications

2013-05-14 Thread Michael Haggerty
On 05/15/2013 12:19 AM, Junio C Hamano wrote:
> Eugene Sajine  writes:
> 
>> What if there are a lot of branches in the CVS repo? Is it guaranteed
>> to be broken after import?
> 
> Even though CVS repository can record branches in individual ,v
> files, reconstructing per branch history and where the branch
> happened in each "changeset" cannot be determined with any
> certainty.  The best you can get is a heuristic result.
> 
> I do not think anybody can give such a guarantee.  The best you can
> do is to convert it and validate if the result matches what you
> think has happened in the CVS history.

Junio, you are correct that there is no 100% reliable way of inferring
the changesets that were made in CVS.  CVS doesn't record which file
revisions were committed at the same time, unambiguous branch points,
etc.  The best a tool can do is use heuristics.

But it *is* possible for a conversion tool to make some more elementary
guarantees regarding aspects of the history that are recorded
unambiguously in CVS, for example:

* That if you check the tip of the same branch out of CVS and out of Git,
you get the same contents.

* That CVS file revisions are committed to Git in the correct order
relative to each other; e.g., that the changes made in CVS revision
1.4.2.2 in a particular file precede those made in revision 1.4.2.3 of
the same file.

git-cvsimport fails to ensure even this minimal level of correctness.
Such errors are demonstrated in its own test suite.
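
The ordering guarantee can be stated as a small check.  This is a sketch
only, assuming you have extracted, by some means of your own, the list of
(path, CVS revision) pairs in the order they appear along one line of Git
history; the function names are hypothetical.  It treats two revisions as
comparable when they share everything but the last number, so 1.4.2.2 and
1.4.2.3 are on the same CVS branch.

```python
def parse_revision(rev):
    """Turn a CVS revision string such as '1.4.2.3' into a tuple of ints."""
    return tuple(int(part) for part in rev.split("."))


def find_order_violations(imported):
    """Return the (path, revision) pairs that appear after a later revision
    of the same file on the same CVS branch.

    imported: list of (path, revision) pairs, oldest Git commit first.
    """
    last_seen = {}   # (path, branch prefix) -> highest final number seen so far
    violations = []
    for path, rev in imported:
        parts = parse_revision(rev)
        key = (path, parts[:-1])
        previous = last_seen.get(key)
        if previous is not None and parts[-1] <= previous:
            violations.append((path, rev))
        last_seen[key] = parts[-1] if previous is None else max(previous, parts[-1])
    return violations


good = [("f.c", "1.4"), ("f.c", "1.4.2.2"), ("f.c", "1.4.2.3")]
bad = [("f.c", "1.4"), ("f.c", "1.4.2.3"), ("f.c", "1.4.2.2")]
print(find_order_violations(good))
print(find_order_violations(bad))
```

The first list yields no violations; the second flags 1.4.2.2 because it
arrives after 1.4.2.3 of the same file.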

cvs2git, on the other hand, gets the basics 100% correct (if you find a
discrepancy, please file a bug!), in addition to having great heuristics
for inferring the details of the history.

There is no reason ever to use git-cvsimport for one-time conversions
from CVS to Git.  The only reason ever to use it is if you absolutely
require an incremental bridge between CVS and Git, and even then please
use it with great caution.

Michael
the cvs2svn/cvs2git maintainer

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/


Re: Fwd: git cvsimport implications

2013-05-14 Thread Junio C Hamano
Eugene Sajine  writes:

> What if there are a lot of branches in the CVS repo? Is it guaranteed
> to be broken after import?

Even though a CVS repository can record branches in individual ,v
files, the per-branch history and the point where the branch
happened in each "changeset" cannot be reconstructed with any
certainty.  The best you can get is a heuristic result.

I do not think anybody can give such a guarantee.  The best you can
do is to convert it and validate if the result matches what you
think has happened in the CVS history.