Re: Fwd: git cvsimport implications

2013-05-17 Thread Michael Haggerty
On 05/15/2013 08:03 PM, Eugene Sajine wrote:
 My primary goal was to understand better what are the real problems
 that we might have with the way we use git cvsimport, so I was not
 asking about the guarantee of the cvsimport to import things
 correctly, but if there is a guarantee the import will result in
 completely broken history.

So what are you going to do, use cvsimport whenever you cannot *prove*
that it is wrong?  You sure have low standards for your software.

The only *useful* guarantee is that software is *correct* under defined
circumstances.  I don't think anybody has gone to the trouble to figure
out when that claim can be made for cvsimport.

 If the cvsimport is that broken - is there any plan to fix it?

For one-time imports, the fix is to use a tool that is not broken, like
cvs2git.

Alternatively, Eric Raymond claims to have developed a new version of
cvsps that is not quite as broken as the old version.  Presumably
cvsimport would be not quite as broken if used with the new cvsps.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: git cvsimport implications

2013-05-17 Thread John Keeping
On Fri, May 17, 2013 at 11:10:03AM +0200, Michael Haggerty wrote:
 On 05/15/2013 08:03 PM, Eugene Sajine wrote:
  My primary goal was to understand better what are the real problems
  that we might have with the way we use git cvsimport, so I was not
  asking about the guarantee of the cvsimport to import things
  correctly, but if there is a guarantee the import will result in
  completely broken history.
 
 So what are you going to do, use cvsimport whenever you cannot *prove*
 that it is wrong?  You sure have low standards for your software.
 
 The only *useful* guarantee is that software is *correct* under defined
 circumstances.  I don't think anybody has gone to the trouble to figure
 out when that claim can be made for cvsimport.
 
  If the cvsimport is that broken - is there any plan to fix it?
 
 For one-time imports, the fix is to use a tool that is not broken, like
 cvs2git.
 
 Alternatively, Eric Raymond claims to have developed a new version of
 cvsps that is not quite as broken as the old version.  Presumably
 cvsimport would be not quite as broken if used with the new cvsps.

cvsimport doesn't work with cvsps-3.  We decided to stick with the
version we have (using cvsps-2) because that is the only option that
supports incremental import; those using it for that are used to its
deficiencies, and there is no plan to improve it.  The manpage notes that
it uses a deprecated version of cvsps and recommends alternatives for
one-shot imports.

There is a version of git-cvsimport script in the cvsps-3 repository
that works with it, but it does not support incremental import in the
same way as git.git's git-cvsimport, so it will not replace the version
in git.git.


Re: Fwd: git cvsimport implications

2013-05-17 Thread Martin Langhoff
On Fri, May 17, 2013 at 5:10 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
 For one-time imports, the fix is to use a tool that is not broken, like
 cvs2git.

As one of the earlier maintainers of cvsimport, I do believe that
cvs2git is less broken, for one-shot imports, than cvsimport. Users
looking for a one-shot import should not use cvsimport as there are
better options there. Myself, I have used parsecvs (long ago, so
perhaps it isn't the best of the crop nowadays).

TBH, I am puzzled and amused at all the chest-thumping about cvs
importers. Yeah, yours is a bit better or saner, but we all wade in
the muddle of essentially broken data. So "is not broken" is rather
misleading when talking to end users. It carries so many caveats about
whether it'll work on the users' particular repo that it is not a
generally truthful statement.

I am very glad to hear it is better than cvsimport, and even more glad
to hear its limitations are better understood and documented. It has
had a testsuite for the longest of times!

And very likely has the best chance of success across the available
importers :-)

Oh, and why is cvsimport so vague? Because it is just driven by cvsps.
It creates a repo based on what cvsps understands from the CVS data.

At the time, I looked into trying to use cvs2svn (precursor to
cvs2git) as the CVS read side of cvsimport, but it did not support
incremental imports at all, and it took forever to run.

It was a time when git was new and people were dipping their toes in
the pool, and some developers were pining to use git on projects that
used CVS (like we use git-svn now). Incremental imports were a must.

One of the nice features of cvsimport is that it can do incrementals
on a repo imported with another tool. That earns it a place under the
sun. If it didn't have that, I'd be voting for removal (after a review
that the replacement *is* actually better ;-) across a number of test
repos).

cheers,

m
--
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: Fwd: git cvsimport implications

2013-05-17 Thread Michael Haggerty
On 05/17/2013 01:50 PM, Martin Langhoff wrote:
 On Fri, May 17, 2013 at 5:10 AM, Michael Haggerty mhag...@alum.mit.edu 
 wrote:
 For one-time imports, the fix is to use a tool that is not broken, like
 cvs2git.
 
 As one of the earlier maintainers of cvsimport, I do believe that
 cvs2git is less broken, for one-shot imports, than cvsimport. Users
 looking for a one-shot import should not use cvsimport as there are
 better options there. Myself, I have used parsecvs (long ago, so
 perhaps it isn't the best of the crop nowadays).
 
 TBH, I am puzzled and amused at all the chest-thumping about cvs
 importers. Yeah, yours is a bit better or saner, but we all wade in
 the muddle of essentially broken data. So "is not broken" is rather
 misleading when talking to end users. It carries so many caveats about
 whether it'll work on the users' particular repo that it is not a
 generally truthful statement.

I disagree.  I use the following definition of "correct":

The Git history output by an importer must not contradict the
history that is recorded in CVS.

We both know that the CVS history omits important data, and that the
history is mutable, etc.  So there are lots of hypothetical histories
that do not contradict CVS.  But some things are recorded unambiguously
in the CVS history, like

* The contents at any tag or the tip of any branch (i.e., what is in the
working tree when you check it out).

* The order of modifications to a single file on a single branch and the
file contents after each of those revisions.

* Who committed a particular change, and approximately when (modulo
clock skew).

If a tool doesn't get these things correct (especially the first!) then
it should only be used with great caution.  cvsimport can make mistakes
on the first two.  As far as I know, cvs2svn/cvs2git are correct
according to this definition.
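A rough way to test the first criterion on a finished conversion is to check the same tag or branch out of CVS and out of Git into two directories and compare them file by file. The sketch below does only the comparison; `tree_digest` and `trees_match` are hypothetical helpers, the two checkouts are assumed to exist already (e.g. from `cvs checkout -r TAG` and `git checkout TAG`), and RCS keyword expansion such as `$Id$` can produce differences that depend on how each side was checked out or converted.

```python
import hashlib
import os

# Administrative directories that exist in only one of the two checkouts.
SKIP_DIRS = {"CVS", ".git"}


def tree_digest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune administrative directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as f:
                digests[rel] = hashlib.sha256(f.read()).hexdigest()
    return digests


def trees_match(cvs_checkout, git_checkout):
    """Return (ok, differences) for two checkouts of the same tag or branch."""
    a, b = tree_digest(cvs_checkout), tree_digest(git_checkout)
    # A path differs if it is missing on one side or its contents differ.
    differences = sorted(
        path for path in a.keys() | b.keys() if a.get(path) != b.get(path)
    )
    return not differences, differences
```

Any path reported in `differences` is either missing on one side or has different content, which is exactly the kind of error the first criterion rules out.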


That being said, I appreciate that cvsimport can do incremental imports.
 cvs2git doesn't even attempt it.  I've thought about what it would take
to implement correct incremental imports in cvs2svn/cvs2git, and it is
far beyond the budget of time that I have for the project.  So I
definitely give props to cvsimport for attempting incremental imports
and apparently often doing a good enough job that it is useful to people.

 [...]
 At the time, I looked into trying to use cvs2svn (precursor to
 cvs2git) as the CVS read side of cvsimport, but it did not support
 incremental imports at all, and it took forever to run.

cvs2svn still doesn't support incremental imports, and it still takes a
long time to run (though less than before).  cvs2git is considerably
faster, partly because of the speed and convenience of using
git-fast-import.  But conversion time is much less of an issue for
one-time conversions.

 It was a time when git was new and people were dipping their toes in
 the pool, and some developers were pining to use git on projects that
 used CVS (like we use git-svn now). Incremental imports were a must.
 
 One of the nice features of cvsimport is that it can do incrementals
 on a repo imported with another tool. That earns it a place under the
 sun. If it didn't have that, I'd be voting for removal (after a review
 that the replacement *is* actually better ;-) across a number of test
 repos).

Incremental imports are indeed the saving grace of cvsimport and for
that reason I don't advocate its removal.  But I think we should be
clearer about warning users against using it for one-time imports,
because it can produce output that is *objectively* incorrect in
important ways.

Regarding tests, the failing tests that I added to the cvsimport test
suite a few years ago were taken directly from the cvs2svn/cvs2git test
suite, where they pass :-)

Michael



Re: Fwd: git cvsimport implications

2013-05-17 Thread Martin Langhoff
On Fri, May 17, 2013 at 9:34 AM, Andreas Krey a.k...@gmx.de wrote:
 On Fri, 17 May 2013 15:14:58 +, Michael Haggerty wrote:
 ...
 We both know that the CVS history omits important data, and that the
 history is mutable, etc.  So there are lots of hypothetical histories
 that do not contradict CVS.  But some things are recorded unambiguously
 in the CVS history, like

 * The contents at any tag or the tip of any branch (i.e., what is in the
 working tree when you check it out).

 Except that the tags/branches may be made in a way that can't
 be mapped onto any commit/point of history otherwise exported,
 with branches that are done on parts of the trees first, or
 likewise tags.

Yeah, that's what I remember too.  It is perfectly fine in CVS to add
a tag to a file at a much later date than the rest of the tree. And it
happened too (oh, I didn't have directory support/some-os checked out
when I tagged the release yesterday! Let me check it out and add the
tag, never mind that the branch has moved forward in the interim...).

I would add the long history of cvs repository manipulation. Bad,
ugly stuff, but it happened in every major project I've seen. Mozilla,
X.org, etc.

TBH I am very glad that Michael cares deeply about the correctness,
and it leads to a much better tool. No doubt.

When discussing it with end users, I do think we have to be honest and
say that there's a fair chance that the output will not be perfect...
because what is in CVS is rather imperfect when you look at it closely
(which users aren't usually doing).

cheers,

m


Re: Fwd: git cvsimport implications

2013-05-17 Thread Andreas Krey
On Fri, 17 May 2013 15:14:58 +, Michael Haggerty wrote:
...
 We both know that the CVS history omits important data, and that the
 history is mutable, etc.  So there are lots of hypothetical histories
 that do not contradict CVS.  But some things are recorded unambiguously
 in the CVS history, like
 
 * The contents at any tag or the tip of any branch (i.e., what is in the
 working tree when you check it out).

Except that the tags/branches may be made in a way that can't
be mapped onto any commit/point of history otherwise exported,
with branches that are done on parts of the trees first, or
likewise tags.

...
 That being said, I appreciate that cvsimport can do incremental imports.
  cvs2git doesn't even attempt it.  I've thought about what it would take
 to implement correct incremental imports in cvs2svn/cvs2git, and it is

Do these two produce stable output? That is, return the same commits
for multiple runs on the same repo?

Andreas

-- 
Totally trivial. Famous last words.
From: Linus Torvalds torvalds@*.org
Date: Fri, 22 Jan 2010 07:29:21 -0800


Re: Fwd: git cvsimport implications

2013-05-17 Thread Michael Haggerty
On 05/17/2013 03:34 PM, Andreas Krey wrote:
 On Fri, 17 May 2013 15:14:58 +, Michael Haggerty wrote:
 ...
 We both know that the CVS history omits important data, and that the
 history is mutable, etc.  So there are lots of hypothetical histories
 that do not contradict CVS.  But some things are recorded unambiguously
 in the CVS history, like

 * The contents at any tag or the tip of any branch (i.e., what is in the
 working tree when you check it out).
 
 Except that the tags/branches may be made in a way that can't
 be mapped onto any commit/point of history otherwise exported,
 with branches that are done on parts of the trees first, or
 likewise tags.

This is true, but cvs2git nevertheless puts the required content on the
branch so that it checks out correctly.  In other words, cvs2git handles
the creation of a CVS tag (which might not have been done at a single
point in time) roughly like this (assuming the tag is taken from master):

1. Make a list of all versions of all files that have to be in the tag.

2. When one of those file versions has to be overwritten (e.g., because
a later version of that file needs to be added to master), create a Git
tag-branch containing all of the files that are currently at the correct
version for the tag.  (It has to be a Git branch, not a tag, because we
might have to change it later.)

3. As other files on master go through the revisions needed for the tag,
create new commits on the tag-branch that add those revisions of those
files to the tag-branch.

At the end of the process, the tag-branch has the same contents as the
CVS tag, though it may have had to be created via multiple commits.
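As a toy model of the three steps (not cvs2git's actual code), the following simulation assumes the tag was taken entirely from master, represents each master commit only by the file revisions it changes, and records the tag-branch as a list of snapshots rather than creating the merge commits cvs2git currently makes. `build_tag_branch` is a hypothetical name.

```python
def build_tag_branch(master_commits, tag):
    """Simulate steps 1-3 for a tag taken from master.

    master_commits: list of {path: revision} dicts, oldest first; each dict
        holds the file revisions that one master commit adds or changes.
    tag: {path: revision} for every file the CVS tag covers (step 1).
    Returns the successive snapshots of the tag-branch, one per commit.
    """
    state = {}       # file revisions currently on master
    tag_branch = []  # snapshots of the tag-branch after each of its commits

    def at_tag_revision():
        return {p: r for p, r in state.items() if tag.get(p) == r}

    for commit in master_commits:
        # Step 2: the first time master is about to overwrite a file that is
        # at its tagged revision, fork the tag-branch from the current state.
        if not tag_branch and any(
            p in commit and state.get(p) == rev and commit[p] != rev
            for p, rev in tag.items()
        ):
            tag_branch.append(at_tag_revision())
        state.update(commit)
        # Step 3: once the tag-branch exists, add any tagged revision that
        # master reaches afterwards as a new tag-branch commit.
        if tag_branch:
            tip = tag_branch[-1]
            added = {p: r for p, r in commit.items()
                     if tag.get(p) == r and tip.get(p) != r}
            if added:
                tag_branch.append({**tip, **added})
    if not tag_branch:
        # Master never moved past the tag, so a single commit suffices.
        tag_branch.append(at_tag_revision())
    return tag_branch
```

With `tag = {"a": "1.1", "b": "1.2"}` and master commits that change `a` to 1.2 before `b` reaches 1.2, the tag-branch is forked holding only `a` 1.1 and then gets a second commit adding `b` 1.2, so its tip matches the tag even though no single master commit ever did.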

Currently, step 3 creates merge commits from master to the tag-branch.
This is sometimes what one would expect, sometimes not--a matter of
taste, really, because the CVS history is in this aspect more flexible
than what is representable in Git's history model.

 ...
 That being said, I appreciate that cvsimport can do incremental imports.
  cvs2git doesn't even attempt it.  I've thought about what it would take
 to implement correct incremental imports in cvs2svn/cvs2git, and it is
 
 Do these two produce stable output? That is, return the same commits
 for multiple runs on the same repo?

It usually produces stable output, but not always.  I've had reports of
users using cvs2svn successfully as an incremental importer by simply
running the full import each time and relying on Git to match up the
overlapping part of the history simply because the SHA-1s are identical.
 But (1) the later conversions would be just as slow as the first, (2)
some of the heuristic decisions for grouping CVS file changes into Git
changesets can be affected by later commits, and (3) CVS history is
mutable; if the CVS history is changed retroactively in any way then it
won't work at all.
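Reusing a full re-conversion this way only works if every commit of the earlier run reappears with an identical SHA-1 in the later one. A minimal check, assuming a single linear branch and using `git rev-list --reverse` to list commits oldest first, compares the two lists; the helper names are hypothetical.

```python
import subprocess


def rev_list(repo, ref):
    """Commit IDs reachable from ref, oldest first (linear history assumed)."""
    out = subprocess.run(
        ["git", "rev-list", "--reverse", ref],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def shared_prefix(old_run, new_run):
    """Number of leading commits that two conversion runs have in common."""
    n = 0
    for old, new in zip(old_run, new_run):
        if old != new:
            break
        n += 1
    return n


def is_incremental_safe(old_run, new_run):
    """True only if every commit from the old run reappears unchanged."""
    return shared_prefix(old_run, new_run) == len(old_run)
```

If `is_incremental_safe` is false, for example because a later CVS commit changed how earlier file changes were grouped into changesets, the new conversion rewrites history that may already have been published from the old one.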

Michael



Re: Fwd: Fwd: git cvsimport implications

2013-05-17 Thread Michael Haggerty
On 05/17/2013 06:10 PM, Eugene Sajine wrote:
 Michael, sorry for the dup - I didn't press reply-all for the first one.
 

 So what are you going to do, use cvsimport whenever you cannot *prove*
 that it is wrong?  You sure have low standards for your software.
 
 1. You are making assumptions and drawing conclusions that have no grounds.
 I asked for help understanding what the problems with cvsimport are.
 I never said I'm not willing to use cvs2git. I never said I'm happy to have
 problems in my git repos. So this "low standards" punch was... not necessary.

I didn't mean to be offensive.  I meant it more in the sense of "you
deserve to expect more from your software."

 2. I started to use cvsimport because it was the tool *provided with
 git* about three years ago.
 At that time I didn't find any better or simpler tool to use, and
 those implications were unknown to me;
 they were brought to my attention just recently.
 CVS is not good for branches, so most of our projects didn't have any
 CVS branches.
 So for the majority of those it seems that cvsimport did its job just fine.
 Now we are going to try to migrate some projects that use CVS
 branches heavily.
 That concerns me, so I'm looking for a better tool.

The Git test suite (tests t/t960?-*.sh) demonstrates some of the known
problems with cvsimport, and those failures are summarized in the
manpage for git-cvsimport(1).  Not all of the problems are related to
branches and tags.  There might be more problems; I simply documented a
few that I found relatively quickly, then stopped looking.

 3. Is there a way to have the whole plumbing with the
 blobfiles and dumpfiles and the subsequent git fast-import wrapped into a
 nice command like:
 
 git cvsimport -C path/to/my/new/shiny/gitrepo
 
 Or are there any particular reasons why the end user must deal with blob
 and dump files and do the fast-import afterwards?

There are benefits to the split blobfile/dumpfile approach for some
users, so I wouldn't want to get rid of that possibility.  But there's
no reason I wouldn't accept a patch that provides an option to convert
as you describe.  Alternately, it would take only a few lines of script
to automate it yourself.
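Along those lines, a minimal wrapper could look like the following. It follows the two-step recipe from the cvs2git documentation (`cvs2git --blobfile=... --dumpfile=...`, then both files, blob file first, fed to `git fast-import`); the function names, the temporary directory and the choice of a fresh non-bare repository are assumptions, and real conversions usually need further cvs2git configuration (author names, character encodings) that is omitted here.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_command(cvs_repo, blobfile, dumpfile):
    """Build the cvs2git command line that writes the fast-import input files."""
    return [
        "cvs2git",
        f"--blobfile={blobfile}",  # file contents, as fast-import blobs
        f"--dumpfile={dumpfile}",  # commits, branches and tags using those blobs
        "--username=cvs2git",      # name on commits cvs2git synthesizes itself
        str(cvs_repo),
    ]


def convert(cvs_repo, git_repo, tmpdir="cvs2git-tmp"):
    """Run cvs2git, then load its output into a freshly created Git repository."""
    tmp = Path(tmpdir)
    tmp.mkdir(parents=True, exist_ok=True)
    blobfile, dumpfile = tmp / "git-blob.dat", tmp / "git-dump.dat"
    subprocess.run(conversion_command(cvs_repo, blobfile, dumpfile), check=True)
    subprocess.run(["git", "init", str(git_repo)], check=True)
    # The dump file refers to blobs defined in the blob file, so the blob
    # file has to be streamed into fast-import first.
    importer = subprocess.Popen(["git", "fast-import"], cwd=git_repo,
                                stdin=subprocess.PIPE)
    for part in (blobfile, dumpfile):
        with open(part, "rb") as f:
            shutil.copyfileobj(f, importer.stdin)
    importer.stdin.close()
    if importer.wait() != 0:
        raise RuntimeError("git fast-import failed")
```

Calling `convert("/path/to/cvsroot/module", "path/to/my/new/shiny/gitrepo")` would then give roughly the one-step behaviour asked for. Note that fast-import only writes objects and refs; it does not touch the working tree, so a checkout is still needed afterwards.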

Michael



Re: Fwd: git cvsimport implications

2013-05-15 Thread Michael Haggerty
On 05/15/2013 12:19 AM, Junio C Hamano wrote:
 Eugene Sajine eugu...@gmail.com writes:
 
 What if there are a lot of branches in the CVS repo? Is it guaranteed
 to be broken after import?
 
 Even though CVS repository can record branches in individual ,v
 files, reconstructing per branch history and where the branch
 happened in each changeset cannot be determined with any
 certainty.  The best you can get is a heuristic result.
 
 I do not think anybody can give such a guarantee.  The best you can
 do is to convert it and validate if the result matches what you
 think has happened in the CVS history.

Junio, you are correct that there is no 100% reliable way of inferring
the changesets that were made in CVS.  CVS doesn't record which file
revisions were committed at the same time, unambiguous branch points,
etc.  The best a tool can do is use heuristics.
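To make "heuristics" concrete: a common, simplified approach treats file revisions as one changeset when they share author and log message and their timestamps are close together. The sketch below implements only that simple rule, with an arbitrary 300-second window; `FileRevision` and `group_into_changesets` are hypothetical, and cvs2git's real algorithm is considerably more careful than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRevision:
    path: str
    revision: str
    author: str
    log: str
    timestamp: int  # seconds since the epoch, as recorded for this revision


def group_into_changesets(revisions, fuzz=300):
    """Group file revisions into guessed changesets.

    Revisions with the same author and log message whose timestamps are
    within `fuzz` seconds of the previous member join the same changeset,
    unless that changeset already contains a revision of the same file.
    """
    changesets = []
    open_sets = {}  # (author, log) -> changeset currently being extended
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.log)
        current = open_sets.get(key)
        if (current is not None
                and rev.timestamp - current[-1].timestamp <= fuzz
                and all(r.path != rev.path for r in current)):
            current.append(rev)
        else:
            current = [rev]
            changesets.append(current)
            open_sets[key] = current
    return changesets
```

Even this toy version shows where the guesswork lies: the window size, and the rule that one file cannot appear twice in a changeset, both decide where commit boundaries fall, and CVS records nothing that could confirm them.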

But it *is* possible for a conversion tool to make some more elementary
guarantees regarding aspects of the history that are recorded
unambiguously in CVS, for example:

* That if you check the tip of the same branch out of CVS and out of Git,
you get the same contents.

* That CVS file revisions are committed to Git in the correct order
relative to each other; e.g., that the changes made in CVS revision
1.4.2.2 in a particular file precede those made in revision 1.4.2.3 of
the same file.

git-cvsimport fails to ensure even this minimal level of correctness.
Such errors are demonstrated in its own test suite.
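The second guarantee is mechanically checkable, assuming you can recover which CVS file revision each imported change corresponds to (how to get that depends on the tool and is not shown). Within one file, revisions on the same branch share all but their last number, so the check reduces to that number increasing; `parse_revision` and `ordering_violations` are hypothetical helpers.

```python
def parse_revision(rev):
    """Split a CVS revision number into integers: '1.4.2.3' -> (1, 4, 2, 3)."""
    return tuple(int(part) for part in rev.split("."))


def ordering_violations(imported):
    """Find file revisions imported out of order on their branch.

    imported: (path, cvs_revision) pairs in the order the converter
    committed them to Git.  Revisions of one file that share a branch
    prefix (e.g. 1.4.2 for 1.4.2.3) must appear in increasing order of
    their last number.  A trunk renumbered from 1.x to 2.x is not handled.
    """
    highest = {}  # (path, branch prefix) -> highest last number seen so far
    violations = []
    for path, rev in imported:
        numbers = parse_revision(rev)
        key = (path, numbers[:-1])
        previous = highest.get(key)
        if previous is not None and numbers[-1] <= previous:
            violations.append((path, rev))
        highest[key] = max(numbers[-1], previous or 0)
    return violations
```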

cvs2git, on the other hand, gets the basics 100% correct (if you find a
discrepancy, please file a bug!), in addition to having great heuristics
for inferring the details of the history.

There is no reason ever to use git-cvsimport for one-time conversions
from CVS to Git.  The only reason ever to use it is if you absolutely
require an incremental bridge between CVS and Git, and even then please
use it with great caution.

Michael
the cvs2svn/cvs2git maintainer



Re: Fwd: git cvsimport implications

2013-05-15 Thread Eugene Sajine
Hi

My primary goal was to understand better what are the real problems
that we might have with the way we use git cvsimport, so I was not
asking about the guarantee of the cvsimport to import things
correctly, but if there is a guarantee the import will result in
completely broken history. Is there a situation when cvsimport can
do things right, and one when it is definitely going to fail?

Anyway, thanks a lot for the info. I do know that cvs2git is an option.

If the cvsimport is that broken - is there any plan to fix it?

Thanks,
Eugene

On Wed, May 15, 2013 at 2:24 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
 On 05/15/2013 12:19 AM, Junio C Hamano wrote:
 Eugene Sajine eugu...@gmail.com writes:

 What if there are a lot of branches in the CVS repo? Is it guaranteed
 to be broken after import?

 Even though CVS repository can record branches in individual ,v
 files, reconstructing per branch history and where the branch
 happened in each changeset cannot be determined with any
 certainty.  The best you can get is a heuristic result.

 I do not think anybody can give such a guarantee.  The best you can
 do is to convert it and validate if the result matches what you
 think has happened in the CVS history.

 Junio, you are correct that there is no 100% reliable way of inferring
 the changesets that were made in CVS.  CVS doesn't record which file
 revisions were committed at the same time, unambiguous branch points,
 etc.  The best a tool can do is use heuristics.

 But it *is* possible for a conversion tool to make some more elementary
 guarantees regarding aspects of the history that are recorded
 unambiguously in CVS, for example:

 * That if you check the tip of the same branch out of CVS and out of Git,
 you get the same contents.

 * That CVS file revisions are committed to Git in the correct order
 relative to each other; e.g., that the changes made in CVS revision
 1.4.2.2 in a particular file precede those made in revision 1.4.2.3 of
 the same file.

 git-cvsimport fails to ensure even this minimal level of correctness.
 Such errors are demonstrated in its own test suite.

 cvs2git, on the other hand, gets the basics 100% correct (if you find a
 discrepancy, please file a bug!), in addition to having great heuristics
 for inferring the details of the history.

 There is no reason ever to use git-cvsimport for one-time conversions
 from CVS to Git.  The only reason ever to use it is if you absolutely
 require an incremental bridge between CVS and Git, and even then please
 use it with great caution.

 Michael
 the cvs2svn/cvs2git maintainer



Fwd: git cvsimport implications

2013-05-14 Thread Eugene Sajine
Hi,

We are using git cvsimport heavily, but mostly the projects are not
using branches that much. We are also migrating our repos only once,
so there are no commits to the CVS repo and no incremental imports
allowed after the migration. We have migrated more than a thousand
projects already.

We use the simplest way (from the CVS checkout folder):

$ git cvsimport -C /path/to/new/git/repo

Just recently it was brought to my attention that we can have problems
with that tool. So my question is whether anybody could advise which
scenarios this tool is safe to use for, and which are not recommended.

What if there are a lot of branches in the CVS repo? Is it guaranteed
to be broken after import?

Do I understand correctly that it might put some files into a branch
that were not originally on this branch in CVS? In which cases might
that happen (I'm sorry, I didn't quite get the issues described in the
man pages for cvsimport)?

Thanks,
Eugene


Re: Fwd: git cvsimport implications

2013-05-14 Thread Junio C Hamano
Eugene Sajine eugu...@gmail.com writes:

 What if there are a lot of branches in the CVS repo? Is it guaranteed
 to be broken after import?

Even though CVS repository can record branches in individual ,v
files, reconstructing per branch history and where the branch
happened in each changeset cannot be determined with any
certainty.  The best you can get is a heuristic result.

I do not think anybody can give such a guarantee.  The best you can
do is to convert it and validate if the result matches what you
think has happened in the CVS history.