Re: Fwd: git cvsimport implications
On 05/17/2013 03:34 PM, Andreas Krey wrote: > On Fri, 17 May 2013 15:14:58 +, Michael Haggerty wrote: > ... >> We both know that the CVS history omits important data, and that the >> history is mutable, etc. So there are lots of hypothetical histories >> that do not contradict CVS. But some things are recorded unambiguously >> in the CVS history, like >> >> * The contents at any tag or the tip of any branch (i.e., what is in the >> working tree when you check it out). > > Except that the tags/branches may be made in a way that can't > be mapped onto any commit/point of history otherwise exported, > with branches that are done on parts of the trees first, or > likewise tags. This is true, but cvs2git nevertheless puts the required content on the branch so that it checks out correctly. In other words, a "CVS tag creation" (which might not have been done at a single point in time) is done by cvs2git roughly like this (assume it is from master): 1. Make a list of all versions of all files that have to be in the tag. 2. When one of those file versions has to be overwritten (e.g., because a later version of that file needs to be added to master), create a Git tag-branch containing all of the files that are currently at the correct version for the tag. (It has to be a Git branch, not a tag, because we might have to change it later.) 3. As other files on master go through the revisions needed for the tag, create new commits on the tag-branch that add those revisions of those files to the tag-branch. At the end of the process, the tag-branch has the same contents as the CVS tag, though it may have had to be created via multiple commits. Currently, step 3 creates merge commits from master to the tag-branch. This is sometimes what one would expect, sometimes not--a matter of taste, really, because the CVS history is in this aspect more flexible than what is representable in Git's history model. > ... >> That being said, I appreciate that cvsimport can do incremental imports.
>> cvs2git doesn't even attempt it. I've thought about what it would take >> to implement correct incremental imports in cvs2svn/cvs2git, and it is > > Do these two produce stable output? That is, return the same commits > for multiple runs on the same repo? It usually produces stable output, but not always. I've had reports of users using cvs2svn successfully as an "incremental importer" by simply running the full import each time and relying on Git to match up the overlapping part of the history simply because the SHA-1s are identical. But (1) the later conversions would be just as slow as the first, (2) some of the heuristic decisions for grouping CVS file changes into Git changesets can be affected by later commits, and (3) CVS history is mutable; if the CVS history is changed retroactively in any way then it won't work at all. Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
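The three-step tag-branch procedure described above can be sketched in Python as a toy model. This is not cvs2git's code; it assumes each master changeset is a dict from file path to CVS revision, the tag is a dict of the revisions it requires, every tagged revision appears on master at some point, and it does not model the merge parents that cvs2git currently creates in step 3. `build_tag_branch` and its data layout are hypothetical.

```python
from typing import Dict, List

Revisions = Dict[str, str]  # file path -> CVS revision number, e.g. "1.3"


def build_tag_branch(changesets: List[Revisions], tag: Revisions) -> List[Revisions]:
    """Replay master changesets and return the commits made on the tag-branch.

    Each returned commit maps the files it adds to their tagged revisions.
    """
    master: Revisions = {}     # current contents of master
    on_branch: Revisions = {}  # current contents of the tag-branch
    commits: List[Revisions] = []
    created = False

    def flush() -> None:
        # Copy every file whose master version is the tagged one and that
        # is not on the tag-branch yet.
        pending = {f: r for f, r in master.items()
                   if tag.get(f) == r and on_branch.get(f) != r}
        if pending:
            commits.append(pending)
            on_branch.update(pending)

    for cs in changesets:
        # Step 2: before master overwrites a file that is at its tagged
        # version, create the tag-branch from whatever is currently correct.
        if not created and any(
                f in master and master[f] == tag.get(f) and rev != master[f]
                for f, rev in cs.items()):
            flush()
            created = True
        master.update(cs)
        # Step 3: once the branch exists, add files as they reach their
        # tagged revisions on master.
        if created:
            flush()

    # Pick up anything still missing (or create the branch if step 2 never fired).
    flush()
    return commits
```

With `changesets = [{"a": "1.1", "b": "1.1"}, {"a": "1.2"}, {"b": "1.2"}]` and `tag = {"a": "1.1", "b": "1.2"}`, the tag-branch needs two commits, `[{"a": "1.1"}, {"b": "1.2"}]`, because master changes `a` before `b` reaches its tagged revision.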
Re: Fwd: git cvsimport implications
On Fri, 17 May 2013 15:14:58 +, Michael Haggerty wrote: ... > We both know that the CVS history omits important data, and that the > history is mutable, etc. So there are lots of hypothetical histories > that do not contradict CVS. But some things are recorded unambiguously > in the CVS history, like > > * The contents at any tag or the tip of any branch (i.e., what is in the > working tree when you check it out). Except that the tags/branches may be made in a way that can't be mapped onto any commit/point of history otherwise exported, with branches that are done on parts of the trees first, or likewise tags. ... > That being said, I appreciate that cvsimport can do incremental imports. > cvs2git doesn't even attempt it. I've thought about what it would take > to implement correct incremental imports in cvs2svn/cvs2git, and it is Do these two produce stable output? That is, return the same commits for multiple runs on the same repo? Andreas -- "Totally trivial. Famous last words." From: Linus Torvalds Date: Fri, 22 Jan 2010 07:29:21 -0800
Re: Fwd: git cvsimport implications
On Fri, May 17, 2013 at 9:34 AM, Andreas Krey wrote: > On Fri, 17 May 2013 15:14:58 +, Michael Haggerty wrote: > ... >> We both know that the CVS history omits important data, and that the >> history is mutable, etc. So there are lots of hypothetical histories >> that do not contradict CVS. But some things are recorded unambiguously >> in the CVS history, like >> >> * The contents at any tag or the tip of any branch (i.e., what is in the >> working tree when you check it out). > > Except that the tags/branches may be made in a way that can't > be mapped onto any commit/point of history otherwise exported, > with branches that are done on parts of the trees first, or > likewise tags. Yeah, that's what I remember too. It is perfectly fine in CVS to add a tag to a file at a much later date than the rest of the tree. And it happened too ("oh, I didn't have directory support/some-os checked out when I tagged the release yesterday! let me check it out and add the tag, nevermind that the branch has moved forward in the interim..."). I would add the long history of "cvs repository manipulation". Bad, ugly stuff, but it happened in every major project I've seen. Mozilla, X.org, etc. TBH I am very glad that Michael cares deeply about the correctness, and it leads to a much better tool. No doubt. When discussing it with end users, I do think we have to be honest and say that there's a fair chance that the output will not be perfect... because what is in CVS is rather imperfect when you look at it closely (which users aren't usually doing). cheers, m -- martin.langh...@gmail.com - ask interesting questions - don't get distracted with shiny stuff - working code first ~ http://docs.moodle.org/en/User:Martin_Langhoff
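The late-tagging case described here means that no single point in the branch history may match the tag at all. A minimal Python check, assuming each changeset is a dict from file path to CVS revision, that the tag lists every file in the tree, and that file deletions can be ignored; `find_snapshot` is a hypothetical helper for illustration, not part of any importer.

```python
from typing import Dict, List, Optional

Revisions = Dict[str, str]  # file path -> CVS revision number


def find_snapshot(changesets: List[Revisions], tag: Revisions) -> Optional[int]:
    """Return how many changesets must be applied before the branch contents
    equal the tag exactly, or None if no such point exists."""
    state: Revisions = {}
    for i, cs in enumerate(changesets, start=1):
        state.update(cs)  # later revisions replace earlier ones
        if state == tag:
            return i
    return None
```

For a late tag like the one in the example above, say `main.c` tagged at 1.1 and `support/some-os/os.c` tagged at 1.2 only after `main.c` had already moved to 1.2, the function returns `None`; an importer then has to synthesize the tag contents, as in the tag-branch approach Michael describes.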
Re: Fwd: git cvsimport implications
On 05/17/2013 01:50 PM, Martin Langhoff wrote: > On Fri, May 17, 2013 at 5:10 AM, Michael Haggerty > wrote: >> For one-time imports, the fix is to use a tool that is not broken, like >> cvs2git. > > As one of the earlier maintainers of cvsimport, I do believe that > cvs2git is less broken, for one-shot imports, than cvsimport. Users > looking for a one-shot import should not use cvsimport as there are > better options there. Myself, I have used parsecvs (long ago, so > perhaps it isn't the best of the crop nowadays). > > TBH, I am puzzled and amused at all the chest-thumping about cvs > importers. Yeah, yours is a bit better or saner, but we all wade in > the muddle of essentially broken data. So "is not broken" is rather > misleading when talking to end users. It carries so many caveats about > whether it'll work on the users' particular repo that it is not a > generally truthful statement. I disagree. I use the following definition of "correct": The Git history output by an importer must not contradict the history that is recorded in CVS. We both know that the CVS history omits important data, and that the history is mutable, etc. So there are lots of hypothetical histories that do not contradict CVS. But some things are recorded unambiguously in the CVS history, like * The contents at any tag or the tip of any branch (i.e., what is in the working tree when you check it out). * The order of modifications to a single file on a single branch and the file contents after each of those revisions. * Who committed a particular change, and approximately when (modulo clock skew). If a tool doesn't get these things correct (especially the first!) then it should only be used with great caution. cvsimport can make mistakes on the first two. As far as I know, cvs2svn/cvs2git are correct according to this definition. That being said, I appreciate that cvsimport can do incremental imports. cvs2git doesn't even attempt it. 
I've thought about what it would take to implement correct incremental imports in cvs2svn/cvs2git, and it is far beyond the budget of time that I have for the project. So I definitely give props to cvsimport for attempting incremental imports and apparently often doing a good enough job that it is useful to people. > [...] > At the time, I looked into trying to use cvs2svn (precursor to > cvs2git) as the "CVS read" side of cvsimport, but it did not support > incremental imports at all, and it took forever to run. cvs2svn still doesn't support incremental imports, and it still takes a long time to run (though less than before). cvs2git is considerably faster, partly because of the speed and convenience of using git-fast-import. But conversion time is much less of an issue for one-time conversions. > It was a time when git was new and people were dipping their toes in > the pool, and some developers were pining to use git on projects that > used CVS (like we use git-svn now). Incremental imports were a must. > > One of the nice features of cvsimport is that it can do incrementals > on a repo imported with another tool. That earns it a place under the > sun. If it didn't have that, I'd be voting for removal (after a review > that the replacement *is* actually better ;-) across a number of test > repos). Incremental imports are indeed the saving grace of cvsimport and for that reason I don't advocate its removal. But I think we should be clearer about warning users against using it for one-time imports, because it can produce output that is *objectively* incorrect in important ways.
Regarding tests, the failing tests that I added to the cvsimport test suite a few years ago were taken directly from the cvs2svn/cvs2git test suite, where they pass :-) Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/
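Michael's first criterion, that a tag or branch tip has the same contents in CVS and in Git, can be spot-checked by checking out both and comparing the working trees. A minimal Python sketch, assuming both checkouts already exist on disk; skipping the `CVS/` and `.git/` metadata directories is an assumption about the layout, and RCS keyword expansion (such as `$Id$`) may show up as spurious differences. The function names are hypothetical.

```python
import os
from typing import Dict, Set, Tuple

# Version-control metadata directories to ignore (assumed layout).
SKIP_DIRS = {"CVS", ".git"}


def snapshot(root: str) -> Dict[str, bytes]:
    """Map each file's path relative to root to its contents."""
    files: Dict[str, bytes] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                files[os.path.relpath(path, root)] = fh.read()
    return files


def compare_trees(cvs_root: str, git_root: str) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (only_in_cvs, only_in_git, differing) relative paths."""
    a, b = snapshot(cvs_root), snapshot(git_root)
    only_cvs = set(a) - set(b)
    only_git = set(b) - set(a)
    differing = {p for p in set(a) & set(b) if a[p] != b[p]}
    return only_cvs, only_git, differing
```

An empty result, `(set(), set(), set())`, means the two trees agree for that tag or branch; repeating the check for every tag and branch tip covers the first criterion.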
Re: Fwd: git cvsimport implications
On Fri, May 17, 2013 at 5:10 AM, Michael Haggerty wrote: > For one-time imports, the fix is to use a tool that is not broken, like > cvs2git. As one of the earlier maintainers of cvsimport, I do believe that cvs2git is less broken, for one-shot imports, than cvsimport. Users looking for a one-shot import should not use cvsimport as there are better options there. Myself, I have used parsecvs (long ago, so perhaps it isn't the best of the crop nowadays). TBH, I am puzzled and amused at all the chest-thumping about cvs importers. Yeah, yours is a bit better or saner, but we all wade in the muddle of essentially broken data. So "is not broken" is rather misleading when talking to end users. It carries so many caveats about whether it'll work on the users' particular repo that it is not a generally truthful statement. I am very glad to hear it is better than cvsimport, and even more glad to hear its limitations are better understood and documented. It has had a testsuite for the longest of times! And very likely has the best chance of success across the available importers :-) Oh, and why is cvsimport so vague? Because it is just driven by cvsps. It creates a repo based on what cvsps understands from the CVS data. At the time, I looked into trying to use cvs2svn (precursor to cvs2git) as the "CVS read" side of cvsimport, but it did not support incremental imports at all, and it took forever to run. It was a time when git was new and people were dipping their toes in the pool, and some developers were pining to use git on projects that used CVS (like we use git-svn now). Incremental imports were a must. One of the nice features of cvsimport is that it can do incrementals on a repo imported with another tool. That earns it a place under the sun. If it didn't have that, I'd be voting for removal (after a review that the replacement *is* actually better ;-) across a number of test repos). 
cheers, m -- martin.langh...@gmail.com - ask interesting questions - don't get distracted with shiny stuff - working code first ~ http://docs.moodle.org/en/User:Martin_Langhoff
Re: Fwd: git cvsimport implications
On Fri, May 17, 2013 at 11:10:03AM +0200, Michael Haggerty wrote: > On 05/15/2013 08:03 PM, Eugene Sajine wrote: > > My primary goal was to understand better what are the real problems > > that we might have with the way we use git cvsimport, so I was not > > asking about the guarantee of the cvsimport to import things > > correctly, but if there is a guarantee the import will result in > > completely broken history. > > So what are you going to do, use cvsimport whenever you cannot *prove* > that it is wrong? You sure have low standards for your software. > > The only *useful* guarantee is that software is *correct* under defined > circumstances. I don't think anybody has gone to the trouble to figure > out when that claim can be made for cvsimport. > > > If the cvsimport is that broken - is there any plan to fix it? > > For one-time imports, the fix is to use a tool that is not broken, like > cvs2git. > > Alternatively, Eric Raymond claims to have developed a new version of > cvsps that is not quite as broken as the old version. Presumably > cvsimport would be not quite as broken if used with the new cvsps. cvsimport doesn't work with cvsps-3 - we decided to stick with the version we have (using cvsps-2) because that is the only option that supports incremental import; those using it for that are used to its deficiencies and there is no plan to improve it. The manpage notes that it uses a deprecated version of cvsps and recommends alternatives for one-shot imports. There is a version of the git-cvsimport script in the cvsps-3 repository that works with it, but it does not support incremental import in the same way as git.git's git-cvsimport, so it will not replace the version in git.git.
Re: Fwd: git cvsimport implications
On 05/15/2013 08:03 PM, Eugene Sajine wrote: > My primary goal was to understand better what are the real problems > that we might have with the way we use git cvsimport, so I was not > asking about the guarantee of the cvsimport to import things > correctly, but if there is a guarantee the import will result in > completely broken history. So what are you going to do, use cvsimport whenever you cannot *prove* that it is wrong? You sure have low standards for your software. The only *useful* guarantee is that software is *correct* under defined circumstances. I don't think anybody has gone to the trouble to figure out when that claim can be made for cvsimport. > If the cvsimport is that broken - is there any plan to fix it? For one-time imports, the fix is to use a tool that is not broken, like cvs2git. Alternatively, Eric Raymond claims to have developed a new version of cvsps that is not quite as broken as the old version. Presumably cvsimport would be not quite as broken if used with the new cvsps. Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/
Re: Fwd: git cvsimport implications
Hi My primary goal was to understand better what are the real problems that we might have with the way we use git cvsimport, so I was not asking about the guarantee of the cvsimport to import things correctly, but if there is a guarantee the import will result in completely broken history. Is there a situation when cvsimport can do things right, and one when it is definitely going to fail? Anyway, thanks a lot for the info. I do know that cvs2git is an option. If the cvsimport is that broken - is there any plan to fix it? Thanks, Eugene On Wed, May 15, 2013 at 2:24 AM, Michael Haggerty wrote: > On 05/15/2013 12:19 AM, Junio C Hamano wrote: >> Eugene Sajine writes: >> >>> What if there are a lot of branches in the CVS repo? Is it guaranteed >>> to be broken after import? >> >> Even though CVS repository can record branches in individual ,v >> files, reconstructing per branch history and where the branch >> happened in each "changeset" cannot be determined with any >> certainty. The best you can get is a heuristic result. >> >> I do not think anybody can give such a guarantee. The best you can >> do is to convert it and validate if the result matches what you >> think has happened in the CVS history. > > Junio, you are correct that there is no 100% reliable way of inferring > the changesets that were made in CVS. CVS doesn't record which file > revisions were committed at the same time, unambiguous branch points, > etc. The best a tool can do is use heuristics. > > But it *is* possible for a conversion tool to make some more elementary > guarantees regarding aspects of the history that are recorded > unambiguously in CVS, for example: > > * That if you check the tip of same branch out of CVS and out of Git, > you get the same contents. > > * That CVS file revisions are committed to Git in the correct order > relative to each other; e.g., that the changes made in CVS revision > 1.4.2.2 in a particular file precede those made in revision 1.4.2.3 of > the same file.
> > git-cvsimport fails to ensure even this minimal level of correctness. > Such errors are demonstrated in its own test suite. > > cvs2git, on the other hand, gets the basics 100% correct (if you find a > discrepancy, please file a bug!), in addition to having great heuristics > for inferring the details of the history. > > There is no reason ever to use git-cvsimport for one-time conversions > from CVS to Git. The only reason ever to use it is if you absolutely > require an incremental bridge between CVS and Git, and even then please > use it with great caution. > > Michael > the cvs2svn/cvs2git maintainer > > -- > Michael Haggerty > mhag...@alum.mit.edu > http://softwareswirl.blogspot.com/
Re: Fwd: git cvsimport implications
On 05/15/2013 12:19 AM, Junio C Hamano wrote: > Eugene Sajine writes: > >> What if there are a lot of branches in the CVS repo? Is it guaranteed >> to be broken after import? > > Even though CVS repository can record branches in individual ,v > files, reconstructing per branch history and where the branch > happened in each "changeset" cannot be determined with any > certainty. The best you can get is a heuristic result. > > I do not think anybody can give such a guarantee. The best you can > do is to convert it and validate if the result matches what you > think has happened in the CVS history. Junio, you are correct that there is no 100% reliable way of inferring the changesets that were made in CVS. CVS doesn't record which file revisions were committed at the same time, unambiguous branch points, etc. The best a tool can do is use heuristics. But it *is* possible for a conversion tool to make some more elementary guarantees regarding aspects of the history that are recorded unambiguously in CVS, for example: * That if you check the tip of the same branch out of CVS and out of Git, you get the same contents. * That CVS file revisions are committed to Git in the correct order relative to each other; e.g., that the changes made in CVS revision 1.4.2.2 in a particular file precede those made in revision 1.4.2.3 of the same file. git-cvsimport fails to ensure even this minimal level of correctness. Such errors are demonstrated in its own test suite. cvs2git, on the other hand, gets the basics 100% correct (if you find a discrepancy, please file a bug!), in addition to having great heuristics for inferring the details of the history. There is no reason ever to use git-cvsimport for one-time conversions from CVS to Git. The only reason ever to use it is if you absolutely require an incremental bridge between CVS and Git, and even then please use it with great caution.
Michael the cvs2svn/cvs2git maintainer -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/
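The second guarantee, that revisions of a file such as 1.4.2.2 and 1.4.2.3 land in Git in CVS order, can be checked mechanically once each Git commit is annotated with the CVS revisions it introduced. How that annotation is obtained is left open here; the list-of-dicts input and `out_of_order` are hypothetical. The sketch compares revision numbers as integer tuples, which orders them correctly only along one line of development (trunk revisions up to the branch point, then that branch's revisions).

```python
from typing import Dict, List, Tuple


def parse_rev(rev: str) -> Tuple[int, ...]:
    """Turn a CVS revision number like '1.4.2.3' into (1, 4, 2, 3)."""
    return tuple(int(part) for part in rev.split("."))


def out_of_order(history: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Scan Git commits on one branch, oldest first, and report file revisions
    that appear after a later revision of the same file.

    Each commit maps file path -> CVS revision it introduced.
    Returns (path, latest_revision_seen, offending_revision) triples.
    """
    last: Dict[str, str] = {}  # path -> newest revision seen so far
    problems: List[Tuple[str, str, str]] = []
    for commit in history:
        for path, rev in commit.items():
            if path in last and parse_rev(rev) <= parse_rev(last[path]):
                problems.append((path, last[path], rev))
            else:
                last[path] = rev
    return problems
```

For example, `out_of_order([{"f.c": "1.4.2.1"}, {"f.c": "1.4.2.3"}, {"f.c": "1.4.2.2"}])` reports `[("f.c", "1.4.2.3", "1.4.2.2")]`.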
Re: Fwd: git cvsimport implications
Eugene Sajine writes: > What if there are a lot of branches in the CVS repo? Is it guaranteed > to be broken after import? Even though a CVS repository can record branches in individual ,v files, the per-branch history, and where each branch was made in terms of "changesets", cannot be determined with any certainty. The best you can get is a heuristic result. I do not think anybody can give such a guarantee. The best you can do is to convert it and validate whether the result matches what you think has happened in the CVS history.
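To make the "heuristic result" concrete: since CVS does not record which file revisions were committed together, a converter has to guess. A deliberately naive Python sketch of such a guess groups revisions that share an author and log message and fall within a time window. The `FileRev` type, the 300-second default window, and the rule that a group never holds two revisions of the same file are assumptions for illustration; this is not the algorithm used by cvsps, git-cvsimport, or cvs2git.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileRev:
    path: str
    rev: str     # CVS revision number, e.g. "1.5"
    author: str
    log: str     # commit log message
    time: int    # commit timestamp, seconds since the epoch


def group_changesets(revs: List[FileRev], window: int = 300) -> List[List[FileRev]]:
    """Guess changesets from individual file revisions.

    A revision joins the latest group with the same author and log message
    if it is within `window` seconds of that group's newest member and the
    group does not already contain a revision of the same file.
    """
    groups: List[List[FileRev]] = []
    latest: Dict[Tuple[str, str], int] = {}  # (author, log) -> index into groups
    for fr in sorted(revs, key=lambda x: x.time):
        key = (fr.author, fr.log)
        idx = latest.get(key)
        if (idx is not None
                and fr.time - groups[idx][-1].time <= window
                and all(m.path != fr.path for m in groups[idx])):
            groups[idx].append(fr)
        else:
            latest[key] = len(groups)
            groups.append([fr])
    return groups
```

Clock skew between clients, or a large commit that outlasts the window, will split or merge groups incorrectly, which is why such a result can only be validated afterwards, not guaranteed.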