Re: cvsps, parsecvs, svn2git and the CVS exporter mess

2013-01-06 Thread Michael Haggerty
On 01/05/2013 04:11 PM, Eric S. Raymond wrote:
 Perhaps I was unclear.  I consider the interface design error to
 be not in the fact that all the blobs are written first or detached,
 but rather that the implementation detail of the two separate journal
 files is ever exposed.
 
 I understand why the storage of intermediate results was done this
 way, in order to decrease the tool's working set during the run, but
 finishing by automatically concatenating the results and streaming
 them to stdout would surely have been the right thing here.

cvs2svn/cvs2git is built to be able to handle very large CVS
repositories, not only those that can fit in RAM.  This goal influences
a lot of its design, including the pass-by-pass structure with
intermediate databases and the resumability of passes.

The blobfile necessarily contains every version of every file, with no
delta-encoding and no compression.  Its size can be a large multiple of
the on-disk size of the original CVS repository.  If the "save to
tempfiles then cat tempfiles at end of run" behavior were hard-coded
into cvs2git, then there would be no way to avoid requiring enough
temporary space to hold the whole blobfile.

Writing the blobfile into a separate file, on the other hand, means that
for example the blobfile could be written into a named pipe connected to
the standard input of git fast-import [1].  git fast-import could
even be run on a remote server.
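
To make the named-pipe idea concrete, here is a minimal Python sketch. It
is not cvs2git code: the function name and the in-process producer and
reader are hypothetical stand-ins for the blob writer and for a consumer
such as git fast-import, and it assumes a POSIX system.  It only shows that
data can pass through a named pipe without disk space for the whole file.

```python
import os
import tempfile
import threading

def stream_through_fifo(chunks):
    """Send chunks through a named pipe and return what the reader received.

    Hypothetical stand-in: the writer thread plays the blob producer,
    the reading loop plays the consumer on the other end of the pipe.
    """
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "blobs.fifo")
    os.mkfifo(fifo_path)  # POSIX only

    def producer():
        # Opening a FIFO for writing blocks until a reader opens it.
        with open(fifo_path, "wb") as fifo:
            for chunk in chunks:
                fifo.write(chunk)

    writer = threading.Thread(target=producer)
    writer.start()
    try:
        received = bytearray()
        with open(fifo_path, "rb") as fifo:
            # Read in bounded pieces; the stream never lands on disk.
            for piece in iter(lambda: fifo.read(65536), b""):
                received.extend(piece)
        writer.join()
    finally:
        os.remove(fifo_path)
        os.rmdir(workdir)
    return bytes(received)
```

A pipe cannot be rewound, so a producer that seeks back into its own
output (see footnote [1]) would have to be changed before this works.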

I consider these bigger advantages than the ability to pipe the output
of cvs2git directly into another command.

 The downstream cost of letting the journalling implementation be
 exposed, instead, can be seen in this snippet from the new git-cvsimport
 I've been working on:
 
     def command(self):
         "Emit the command implied by all previous options."
         return ("(cvs2git --username=git-cvsimport --quiet --quiet "
                 "--blobfile={0} --dumpfile={1} {2} {3} && cat {0} {1} && rm {0} "
                 "{1})").format(tempfile.mkstemp()[1], tempfile.mkstemp()[1],
                               self.opts, self.modulepath)
 
 According to the documentation, every caller of cvs2git must go
 through analogous contortions!  This is not the Unix way; if Unix
 design principles had been minimally applied, that second line would
 just read like this:
 
         return "cvs2git --username=git-cvsimport --quiet --quiet"

Never in my worst nightmares did I imagine that my terrible design taste
would force you to type an extra two lines of code.  Oh the humanity!

By the way, patches are welcome.  And you don't need to trumpet their
imminent arrival [2] or malign the existing code beforehand.  Moreover,
it would be adequate if you just demonstrate working code and *then* ask
for sign-in, rather than the other way around.

 If Unix design principles had been thoroughly applied, the --quiet
 --quiet part would be unnecessary too - well-behaved Unix commands
 *default* to being completely quiet unless either (a) they have an
 exceptional condition to report, or (b) their expected running time is
 so long that tasteful silence would leave users in doubt that they're
 working.

cvs2git is not a command that one uses 100 times a day.  It is a tool
for one-shot conversions of CVS repositories to git.  These conversions
can take hours or even days of processing time (not to mention the time
for configuring the conversion and changing the rest of a project's
infrastructure from CVS to git).  So yes, I think we would like to
appeal to (b) and humbly ask for your permission to give the user some
feedback during the conversion.

 (And yes, I do think violating these principles is a lapse of taste when
 git tools do it, too.)
 
 Michael Haggerty wants me to trust that cvs2git's analysis stage has
 been fixed, but I must say that is a more difficult leap of faith when
 two of the most visible things about it are still (a) a conspicuous
 instance of interface misdesign, and (b) documentation that is careless and
 incomplete.

The cvs2git documentation is lacking; I admit it (as opposed to the
cvs2svn documentation, which I think is quite complete).  And the
program itself also has a lot of rough edges, for example its inability
to convert .cvsignore files into .gitignore files.  Patches are welcome.
I haven't used cvs2svn for my own purposes in many years and I've
*never* once had a need to use cvs2git; I maintain these programs purely
as a service to the community.  Most of the community seems satisfied
with the programs as they are, and if not they usually submit courteous
and concrete bug reports or submit patches.

I request that you follow their example.  I especially ask that you
refrain from spreading public FUD about imagined problems based on
speculation.  Please do your tests and *then* report any problems that
you find.

Yours,
Michael

[1] In fact, the current implementation of generate_blobs.py sometimes
seeks back to earlier parts of the blob file when it needs the fulltext
of a revision that has already been output, but this would be easy to
change as soon as 

Re: cvsps, parsecvs, svn2git and the CVS exporter mess

2013-01-05 Thread Max Horn

On 03.01.2013, at 21:53, Eric S. Raymond wrote:

 Michael Haggerty <mhag...@alum.mit.edu>:
 There are two good reasons that the output is written to two separate files:
 
 Those are good reasons to write to a pair of tempfiles, and I was able
 to deduce in advance most of what your explanation would be from the
 bare fact that you did it that way.
 
 They are *not* good reasons for having an interface that exposes this
 implementation detail to the caller - that choice I consider a failure
 of interface-design judgment.  But I know how to fix this in a simple and
 backward-compatible way, and will do so when I have time to write you
 a patch.  Next week or the week after, most likely.
 
 Also, the cvs2git manual page is still rather half-baked and careless,
 with several fossil references to cvs2svn that shouldn't be there and
 obviously incomplete feature coverage. Fixing these bugs is also on my
 to-do list for sometime this month.
 
 I'd be willing to put in this work anyway, but it is still in the back of
 my mind that if cvs2git wins the test-suite competition I might
 officially end-of-life both cvsps and parsecvs.  One of the features
 of the new git-cvsimport is direct support for using cvs2git as a
 conversion engine.
 
 A potentially bigger problem is that if you want to handle such
 blob/dump output, you have to deal with git-fast-import format's blob
 command as opposed to only handling inline blobs.
 
 Not a problem.  All of the main potential consumers for this output,
 including reposurgeon, handle the blob command just fine.

Hm, you snipped this part of Michael's mail:

 However, if that is a
 problem, it is possible to configure cvs2git to write the blobs inline
 with the rest of the dumpfile (this mode is supported because hg
 fast-import doesn't support detached blobs).

I would call hg fast-import a main potential customer, given that
cvs2hg is another part of the cvs2svn suite. So I can't quite see how you can
come to your conclusion above...



Cheers,
Max
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cvsps, parsecvs, svn2git and the CVS exporter mess

2013-01-05 Thread Eric S. Raymond
Max Horn <post...@quendi.de>:
 Hm, you snipped this part of Michael's mail:
 
  However, if that is a
  problem, it is possible to configure cvs2git to write the blobs inline
  with the rest of the dumpfile (this mode is supported because hg
  fast-import doesn't support detached blobs).
 
 I would call hg fast-import a main potential customer, given that
 cvs2hg is another part of the cvs2svn suite. So I can't quite see how you
 can come to your conclusion above...

Perhaps I was unclear.  I consider the interface design error to
be not in the fact that all the blobs are written first or detached,
but rather that the implementation detail of the two separate journal
files is ever exposed.

I understand why the storage of intermediate results was done this
way, in order to decrease the tool's working set during the run, but
finishing by automatically concatenating the results and streaming
them to stdout would surely have been the right thing here.
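
As an illustration of that suggestion, a rough Python sketch follows.  The
function name and calling convention are hypothetical, not part of
cvs2git.  It copies each journal file to an output stream in fixed-size
chunks, so memory use stays bounded however large the files are, and
deletes each file once it has been copied.

```python
import os
import shutil
import sys

def concatenate_and_remove(paths, out=None):
    """Copy each file in order to `out` (stdout by default), then delete it."""
    if out is None:
        out = sys.stdout.buffer
    for path in paths:
        with open(path, "rb") as src:
            # Copy in 1 MiB chunks rather than reading the whole file.
            shutil.copyfileobj(src, out, 1024 * 1024)
        os.remove(path)
    out.flush()
```

Called as concatenate_and_remove([blobfile, dumpfile]), this does the
work of the "cat ... && rm ..." tail in the shell command below.  It still
needs temporary space for both files while the tool runs.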
 
The downstream cost of letting the journalling implementation be
exposed, instead, can be seen in this snippet from the new git-cvsimport
I've been working on:

    def command(self):
        "Emit the command implied by all previous options."
        return ("(cvs2git --username=git-cvsimport --quiet --quiet "
                "--blobfile={0} --dumpfile={1} {2} {3} && cat {0} {1} && rm {0} "
                "{1})").format(tempfile.mkstemp()[1], tempfile.mkstemp()[1],
                              self.opts, self.modulepath)

According to the documentation, every caller of cvs2git must go
through analogous contortions!  This is not the Unix way; if Unix
design principles had been minimally applied, that second line would
just read like this:

        return "cvs2git --username=git-cvsimport --quiet --quiet"

If Unix design principles had been thoroughly applied, the --quiet
--quiet part would be unnecessary too - well-behaved Unix commands
*default* to being completely quiet unless either (a) they have an
exceptional condition to report, or (b) their expected running time is
so long that tasteful silence would leave users in doubt that they're
working.
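
One way to get both behaviors, sketched in Python under stated
assumptions: the option names, the logger name and the function are
hypothetical and describe no existing tool.  The default level is
WARNING, so the program is silent unless something exceptional happens;
each -v adds detail and each -q removes it.  Messages go to stderr so
that stdout stays a clean data stream.

```python
import argparse
import logging
import sys

def make_logger(argv):
    """Build a logger whose level follows repeated -v / -q flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("-q", "--quiet", action="count", default=0)
    args = parser.parse_args(argv)

    # Quiet by default: WARNING and above only.
    level = logging.WARNING + 10 * (args.quiet - args.verbose)
    level = max(logging.DEBUG, min(logging.CRITICAL, level))

    logger = logging.getLogger("converter-sketch")
    logger.handlers.clear()
    # Progress and diagnostics go to stderr, never to stdout.
    logger.addHandler(logging.StreamHandler(sys.stderr))
    logger.setLevel(level)
    logger.propagate = False
    return logger
```

A long-running tool would report progress with logger.info(), which the
user sees only after asking for it with -v.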

(And yes, I do think violating these principles is a lapse of taste when
git tools do it, too.)

Michael Haggerty wants me to trust that cvs2git's analysis stage has
been fixed, but I must say that is a more difficult leap of faith when
two of the most visible things about it are still (a) a conspicuous
instance of interface misdesign, and (b) documentation that is careless and
incomplete.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: cvsps, parsecvs, svn2git and the CVS exporter mess

2013-01-05 Thread Eric S. Raymond
Bart Massey <b...@cs.pdx.edu>:
 I don't know what Eric Raymond officially end-of-life-ing parsecvs means?

You and Keith handed me the maintainer's baton.  If I were to EOL it,
that would be the successor you two designated judging in public that
the code is unsalvageable or has become pointless.  If you wanted to
exclude the possibility that a successor would make that call, you
shouldn't have handed it off in a state so broken that I can't even
test it properly.

But I don't in fact think the parsecvs code is pointless. The fact that it
only needs the ,v files is nifty and means it could be used as an RCS
exporter too.  The parsing and topo-analysis stages look like really
good work, very crisp and elegant (which is no less than I'd expect
from Keith, actually).

Alas, after wrestling with it I'm beginning to wonder whether the
codebase is salvageable by anyone but Keith himself.  The tight coupling
to the git cache mechanism is the biggest problem.  So far, I can't
figure out what tree.c is actually doing in enough detail to fix it or pry
it loose - the code is opaque and internal documentation is lacking.

More generally, interfacing to the unstable API of libgit was clearly
a serious mistake, leading directly to the current brokenness.  The
tool should have emitted an import stream to begin with.  I'm trying
to fix that, but success is looking doubtful.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: cvsps, parsecvs, svn2git and the CVS exporter mess

2013-01-05 Thread Jonathan Nieder
Eric S. Raymond wrote:

 Michael Haggerty wants me to trust that cvs2git's analysis stage has
 been fixed, but I must say that is a more difficult leap of faith when
 two of the most visible things about it are still (a) a conspicuous
 instance of interface misdesign, and (b) documentation that is careless and
 incomplete.

For what it's worth, I use cvs2git quite often.  I've found it to work
well and its code to be clear and its developers responsive.  But I
don't mind if we disagree, and multiple implementations to explore the
design space of importers doesn't seem like a terrible outcome.

Thanks for your work,
Jonathan


Re: cvsps, parsecvs, svn2git and the CVS exporter mess

2013-01-03 Thread Eric S. Raymond
Michael Haggerty <mhag...@alum.mit.edu>:
 There are two good reasons that the output is written to two separate files:

Those are good reasons to write to a pair of tempfiles, and I was able
to deduce in advance most of what your explanation would be from the
bare fact that you did it that way.

They are *not* good reasons for having an interface that exposes this
implementation detail to the caller - that choice I consider a failure
of interface-design judgment.  But I know how to fix this in a simple and
backward-compatible way, and will do so when I have time to write you
a patch.  Next week or the week after, most likely.

Also, the cvs2git manual page is still rather half-baked and careless,
with several fossil references to cvs2svn that shouldn't be there and
obviously incomplete feature coverage. Fixing these bugs is also on my
to-do list for sometime this month.

I'd be willing to put in this work anyway, but it is still in the back of
my mind that if cvs2git wins the test-suite competition I might
officially end-of-life both cvsps and parsecvs.  One of the features
of the new git-cvsimport is direct support for using cvs2git as a
conversion engine.
 
 A potentially bigger problem is that if you want to handle such
 blob/dump output, you have to deal with git-fast-import format's blob
 command as opposed to only handling inline blobs.

Not a problem.  All of the main potential consumers for this output,
including reposurgeon, handle the blob command just fine.
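
For readers unfamiliar with the two forms, here is a small Python sketch
of how the same file content appears in a git fast-import stream, either
as a detached blob referenced by mark or inline in the file-modify line.
The helper names are hypothetical.  Both "M" lines are only valid inside
a commit command, which is left out here.

```python
def data_block(payload):
    """A fast-import 'data' command: exact byte count, then the raw bytes."""
    return b"data %d\n" % len(payload) + payload + b"\n"

def detached_blob(mark, content):
    """A standalone 'blob' command that later commits refer to by mark."""
    return b"blob\nmark :%d\n" % mark + data_block(content)

def filemodify_by_mark(mark, path):
    """File-modify line pointing at a previously emitted blob."""
    return b"M 100644 :%d %s\n" % (mark, path)

def filemodify_inline(path, content):
    """File-modify line that carries the content itself."""
    return b"M 100644 inline %s\n" % path + data_block(content)
```

A consumer that handles only the inline form can ignore the blob
command; one that handles detached blobs must remember the content
behind each mark until a commit refers to it.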

 cvs2git does not currently support incremental conversions; therefore, a
 cvsps-based option (if it would actually work, that is) would have at
 least one advantage over cvs2git.

Yes. The reason I didn't ship the replacement patch Junio was
expecting yesterday is that I don't have test coverage for the
incremental case.  I'm working on that now.

 cvs2svn has an extensive test suite which includes tests derived from
 bug reports that we have received over the years.  I adapted a few of
 its test repositories to create the git test suite additions that I made
 in Feb 2009, but there are many more in our project.

I've merged those into my tree.

 I think it would be great to have a way to test across tools, though
 please realize that the inference of the most plausible true CVS
 history is partly objective but also often a matter of heuristics and
 taste.  Moreover, the choice of how to represent the inferred history in
 git, which has rather a different model than CVS/Subversion, is also
 non-obvious and somewhat controversial.  I expect that there will be a
 number of simple CVS repositories for which we can all agree about the
 correct git output, but not far away will be a vast number for which the
 correct answer is unclear.  Many of the interesting tests would fall
 into the latter category.

I'm aware of the problem.  One of the interesting questions is how much
further into the weird cases everybody can agree on what correct 
translation looks like.  We won't know until we push it.
 
 It's not clear what you want me to sign off on.

If you're not willing to use the new suite, my spending the effort 
required to genericize it gets much less interesting.  I needed 
Junio's agreement because I wanted to move the old git-cvsimport
tests from the git tree to the new test suite; they're not really
tests of the wrapper script at all but of the conversion engines.

   I guess you want to
 replace (or augment?) the cvs2svn test suite with one based on your
 framework? 

Augment, not replace - and just as importantly, commit to writing 
new tests into the new generic framework when they don't involve a 
tool-specific option.  It would be silly and duplicative for us *not*
to be sharing as many tests as we can.

 * We definitely want to continue testing the Subversion output of
 cvs2svn.  A test suite that only tests the git output could at best be
 an addition to the current test suite, not a replacement for it.  (That
 being said, the addition of good tests of the 2git output would be great.)

Agreed.

 * A test suite that tests only the easy cases wouldn't really be
 interesting, because the difficult cases are where the potential
 problems lie.

Yes, I know.  I'm arguing that we should be doing that exploration
jointly rather than separately.

 * It would be unfortunate if the cvs2svn test suite would grow another
 run-time dependency or if we would have to invest a lot of time
 synchronizing with another project, though if the gain were big enough
 we could consider it.

I know how to keep the friction cost low.  You'll see more about this when
I split off the test suite and announce it.

 * The licenses obviously have to be compatible to the extent required by
 the level of coupling.

I don't think this will be a problem.  You own the copyright on your tests and
I own it on mine, so we can relicense under whatever common license we choose.
I'm not fussy about what we use; ASL 2.0 would be fine by 

Re: cvsps, parsecvs, svn2git and the CVS exporter mess

2012-12-23 Thread Heiko Voigt
Hi,

On Sat, Dec 22, 2012 at 12:36:48PM -0500, Eric S. Raymond wrote:
 If we can agree on this, I'll start a public repo, and contribute my
 Python framework - it's more capable than any of the shell harnesses
 out there because it can easily drive interleaved operations on multiple 
 checkout directories.

Please share so we can have a look. BTW, where can I find your cvsps
code?

 Anybody who is still interested in this problem should contribute
 tests.  Heiko Voigt, I'd particularly like you in on this.

If it does not take too much effort I could port my tests to the new
framework. Since I am not currently in active need of CVS conversions,
it's not of big interest to me anymore. But if it does not take too much
time I am happy to help.

From my past cvs conversion experiences my personal guess is that
cvs2svn will win this competition.

Cheers Heiko


Re: cvsps, parsecvs, svn2git and the CVS exporter mess

2012-12-23 Thread Eric S. Raymond
Heiko Voigt <hvo...@hvoigt.net>:
 Please share so we can have a look. BTW, where can I find your cvsps
 code?

https://gitorious.org/cvsps

Developments of the last 48 hours:

1. Andreas Schwab sent me a patch that uses commitids wherever the history
   has them - this makes all the time-skew problems go away.  I added code
   to warn if commitids aren't present, so users will get a clear indication
   of when time-skew problems might bite them versus when that is happily
   impossible.

2. I've scrapped a lot of obsolete code and options.  The repo head
   version uses what used to be called cvs-direct mode all the time
   now; it works, and the effect on performance is major.  This also
   means that cvsps doesn't need to use any local CVS commands or even
   have CVS installed where it runs.
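
To illustrate the idea in item 1, here is a minimal Python sketch.  It is
not the cvsps implementation: the Revision record, the function name and
the 300-second window are all assumptions made for the example.
Revisions that carry a commitid are grouped by it regardless of their
timestamps; the rest fall back to an author/log/time-window heuristic,
and a warning reports how many needed the fallback.

```python
import warnings
from collections import namedtuple

# Hypothetical record for one file revision.
Revision = namedtuple("Revision", "path author log timestamp commitid")

def group_into_changesets(revisions, fuzz_seconds=300):
    """Group revisions into changesets, preferring commitids over timestamps."""
    by_commitid = {}
    heuristic = []      # groups formed without a commitid
    open_groups = {}    # (author, log) -> most recent heuristic group
    missing = 0
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        if rev.commitid:
            # Same commitid means same commit, however skewed the clocks were.
            by_commitid.setdefault(rev.commitid, []).append(rev)
            continue
        missing += 1
        key = (rev.author, rev.log)
        group = open_groups.get(key)
        if group is not None and rev.timestamp - group[-1].timestamp <= fuzz_seconds:
            group.append(rev)
        else:
            group = [rev]
            open_groups[key] = group
            heuristic.append(group)
    if missing:
        warnings.warn("%d revisions lack commitids; time skew may split "
                      "or merge changesets" % missing)
    groups = list(by_commitid.values()) + heuristic
    return sorted(groups, key=lambda g: g[0].timestamp)
```

With commitids present throughout, the time window is never consulted,
which is why skewed clocks stop mattering.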

 From my past cvs conversion experiences my personal guess is that
 cvs2svn will win this competition.

That could be.  But right now cvsps has one significant advantage over
cvs2git (which parsecvs might share) - it's *blazingly* fast.  So fast
that I scrapped all the local-caching logic; there seems no point to it at
today's network speeds, and that's one less layer of complications to
go wrong.

I've removed a couple hundred lines of code and the program works
better and faster than it did before.  That's having a good day!
-- 
Eric S. Raymond <http://www.catb.org/~esr/>