Re: RFC: Separate commit identification from Merkle hashing

2019-05-23 Thread Eric S. Raymond
Jonathan Nieder :
> Honestly, I do think you have missed some fundamental issues.
> https://public-inbox.org/git/ab3222ab-9121-9534-1472-fac790bf0...@gmail.com/
> discusses this further.

Have re-read.  That was a different pair of proposals.

I have abandoned the idea of forcing timestamp uniqueness entirely - that was
a hack to define a canonical commit order, and my new RFC describes a better
way to get this.

I still think finer-grained timestamps would be a good idea, but that is
much less important than the different set of properties we can guarantee
via the new RFC.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: RFC: Separate commit identification from Merkle hashing

2019-05-23 Thread Eric S. Raymond
Jonathan Nieder :
> In other words, usually the benefit of supporting multiple hash
> functions as a reader is that you want the strength of the strongest
> of those hash functions and you need a migration path to get there.
> If you don't have a way to eventually drop support for the weaker
> hashes, then what benefit do you get from supporting multiple hash
> functions?

Not losing the capability to verify old parts of histories up to the
strength of the old hash algorithm.  Not perfect, but better than nothing.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: RFC: Separate commit identification from Merkle hashing

2019-05-23 Thread Eric S. Raymond
> > Notice several important properties of this design.
> >
> > A. Git becomes absolutely future-proofed against hash-algorithm
> >changes. It can even support the use of multiple hash types over
> >the lifetime of one repo.
> >
> > B. All SHA-1 commit references will resolve forever even after git
> >stops generating them.  All future hash-based commit references will
> >also be good forever.
> 
> We might need to be able to distinguish commit IDs from hash-based
> object identifiers of commits on the command line, perhaps with something like
> 
>   ^{id}
> 
> This is similar to proposed
> 
>   git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}

Reasonable.

> > C. The id/verification split will be invisible to clients at start,
> >because initially they coincide and will continue to do so unless
> >an explicit decision changes either the verification-hash algorithm
> >or the way commit-IDs are initialized.
> 
> The problem may be with reusing command output for input (to refer to
> objects and commits).

Solvable, I think.

> > D. My wish for forward-portable unique commit IDs is granted.
> >They're not by default eyeball-friendly, but I can live with that.
> >Furthermore, because they're preserved in streams they can be
> >eternally stable even as hash algorithms and preferred ID
> >formats change.
> 
> Good.

Oh, man, you have no idea how good yet.  You won't until you've done a
few repo conversions yourself.

/me needs a cross-eyed emoji here

> > E. There is now a unique total order on the repo, modulo highly
> >    unlikely (and in principle completely avoidable) commit-ID
> >    collisions. It's commit date tie-broken by commit-ID sort order.
> >    It too survives hash-function changes.
> 
> Nice.

One thing I will commit to do if we get this far is write the fast-export
code that does canonical order.  I need this badly for reposurgeon tests.

> > F. There's no need for timestamp uniqueness any more.
> >
> > G. When a repository is imported from (say) Subversion, the Subversion
> >IDs *don't have to break*!  They can be used to initialize the
> >commit-ID fields. Many users migrating from other VCSes will be
> >deeply, deeply grateful for this feature.
> 
> There would also need to be some support to retrieve commits using their
> "commit ID" stable identifiers.  It may not need to be very fast.

Agreed.

OK, what do we do next?  Who needs to sign off on this?  Should I prepare
an edit for the hash-function-transition.txt describing the splitting off
of commit IDs?
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: RFC: Separate commit identification from Merkle hashing

2019-05-20 Thread Eric S. Raymond
Jonathan Nieder :
> > I think it's a weakness, though, that most of it is written as though it
> > assumes only one hash transition will be necessary.  (This is me thinking
> > on long timescales again.)
> 
> Hm, can you point to what part of the doc suggested that?  Best to make
> the text clearer, to avoid confusing the next person.

I will reread it with an editorial eye and try to come up with
concrete suggestions, perhaps a patch. My relative ignorance
should actually be helpful here.

> >The same technique (probably the
> > same code!) could be used to map the otherwise uninterpreted
> > commit-IDs I'm proposing to lookup keys.
> 
> No, since Git relies on commit IDs for integrity checking.  The hash
> function transition described in that document relies on
> round-tripping ability for the duration of the transition.

I do not quite understand this comment yet. But I don't think it
matters that I don't, and I will by the time I write any code.  I
expect the worst case is that the separated IDs require a different
lookup table from the hashes, but will resolve at the same speed.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: RFC: Separate commit identification from Merkle hashing

2019-05-20 Thread Eric S. Raymond
Jonathan Nieder :
> Hi!
> 
> Eric S. Raymond wrote:
> 
> > One reason I am sure of this is the SHA-1 to whatever transition.
> > We can't count on the successor hash to survive attack forever.
> > Accordingly, git's design needs to be stable against the possibility
> > of having to accommodate multiple hash algorithms in the future.
> 
> Have you read through Documentation/technical/hash-function-transition?  It
> takes the case where the new hash function is found to be weak into account.
> 
> Hope that helps,
> Jonathan

Reading now...

At first sight I think it looks pretty compatible with what I am proposing.
The goals, anyway; some of the implementation tactics would change a bit.

I think it's a weakness, though, that most of it is written as though it
assumes only one hash transition will be necessary.  (This is me thinking
on long timescales again.)

Instead of having a gpgsig-sha256 field, I would change the code so all
hash cookies have a delimited optional prefix giving the hash-algorithm
type, with an absent prefix interpreted as SHA-1.

I think the idea of mapping future hashes to SHA-1s, which are then
used as fs lookup keys, is sound.  The same technique (probably the
same code!) could be used to map the otherwise uninterpreted
commit-IDs I'm proposing to lookup keys.

I should have said in my previous mail that I'm prepared to put
my coding fingers into making all this happen. I am pretty sure my
grant manager will approve.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond




RFC: Separate commit identification from Merkle hashing

2019-05-20 Thread Eric S. Raymond
I have been thinking hard about the problems raised during my
request for unique timestamps.  I think I've found a better way
to bust the box I was trying to break out of.  I am therefore
withdrawing that proposal and replacing it with this one.

It's time to separate commit identification from Merkle hashing.

One reason I am sure of this is the SHA-1 to whatever transition.
We can't count on the successor hash to survive attack forever.
Accordingly, git's design needs to be stable against the possibility
of having to accommodate multiple hash algorithms in the future.

Here's how to do it:

1. Commit IDs and Merkle-tree hashes become separate commit
   properties in the git filesystem.

2. The data structure representing a Merkle-tree hash becomes
   a pair consisting of a value and a hash-algorithm tag. An
   empty tag is interpreted as SHA-1. I will call this entity the
   "verification hash" and avoid unqualified use of "hash" in the
   rest of this proposal.

3. The initial value of a commit's ID in a live repository is a copy
   of its verification hash, except in one important case.

4. When a repository is exported to a stream, the commit-id is dumped
   with other commit metadata.  Thus, anything that can read a stream
   can resolve commit references in its change comments.

5. When a stream is imported, if a commit has a commit-id field it
   overrides the default assignment of the generated verification hash
   to that field.

6. Commit IDs are free-format and not interpreted by git except
   as lookup keys. When git changes verification-hash functions,
   commit IDs do not change.
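
In Python terms, the data flow of points 1 through 6 looks something
like this.  A sketch only; every name in it is mine, not a proposed
git API:

    import hashlib

    class VerificationHash:
        """Point 2: a value plus a hash-algorithm tag; "" means SHA-1."""
        def __init__(self, value, algorithm=""):
            self.value = value
            self.algorithm = algorithm

    def make_commit(payload, stream_commit_id=None):
        vhash = VerificationHash(hashlib.sha1(payload).hexdigest())
        # Points 3 and 5: the commit ID defaults to a copy of the
        # verification hash unless the import stream supplied one.
        commit_id = stream_commit_id if stream_commit_id else vhash.value
        return {"id": commit_id, "vhash": vhash}

    # Point 6: IDs are uninterpreted lookup keys, so a repository
    # lifted from Subversion can carry "r2367" forever.
    c = make_commit(b"tree ...\nauthor ...\n", stream_commit_id="r2367")
    assert c["id"] == "r2367"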

Notice several important properties of this design.

A. Git becomes absolutely future-proofed against hash-algorithm
   changes. It can even support the use of multiple hash types over
   the lifetime of one repo.

B. All SHA-1 commit references will resolve forever even after git
   stops generating them.  All future hash-based commit references will
   also be good forever.

C. The id/verification split will be invisible to clients at start,
   because initially they coincide and will continue to do so unless
   an explicit decision changes either the verification-hash algorithm
   or the way commit-IDs are initialized.

D. My wish for forward-portable unique commit IDs is granted.
   They're not by default eyeball-friendly, but I can live with that.
   Furthermore, because they're preserved in streams they can be
   eternally stable even as hash algorithms and preferred ID
   formats change.

E. There is now a unique total order on the repo, modulo highly
   unlikely (and in principle completely avoidable) commit-ID
   collisions. It's commit date tie-broken by commit-ID sort order.
   It too survives hash-function changes.

F. There's no need for timestamp uniqueness any more.

G. When a repository is imported from (say) Subversion, the Subversion
   IDs *don't have to break*!  They can be used to initialize the
   commit-ID fields. Many users migrating from other VCSes will be
   deeply, deeply grateful for this feature.

I believe this solves every problem I walked in with except timestamp
truncation.
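
For concreteness, the canonical order of property E is nothing more
than a two-key sort.  A sketch, treating the commit-ID as an opaque
string:

    def canonical_order(commits):
        # commits: iterable of (commit_date, commit_id) pairs; date
        # sorts first, the opaque commit-ID breaks ties.
        return sorted(commits, key=lambda c: (c[0], c[1]))

    print(canonical_order([(100, "r2"), (100, "r1"), (99, "r3")]))
    # -> [(99, 'r3'), (100, 'r1'), (100, 'r2')]
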
-- 
Eric S. Raymond <http://www.catb.org/~esr/>

Probably fewer than 2% of handguns and well under 1% of all guns will
ever be involved in a violent crime. Thus, the problem of criminal gun
violence is concentrated within a very small subset of gun owners,
indicating that gun control aimed at the general population faces a
serious needle-in-the-haystack problem.
-- Gary Kleck, "Point Blank: Handgun Violence In America"


Re: Finer timestamps and serialization in git

2019-05-20 Thread Eric S. Raymond
Jakub Narebski :
> Errr... how did you get that the hash of a commit is not portable???

OK. You're telling me that premise was wrong.  Thank you,
accepted.

I've since had a better idea.  Expect mail soon.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Finer timestamps and serialization in git

2019-05-20 Thread Eric S. Raymond
Elijah Newren :
> Hi,
> 
> On Mon, May 20, 2019 at 11:09 AM Eric S. Raymond  wrote:
> 
> > > For cookie to be unique among all forks / clones of the same repository
> > > you need either centralized naming server, or for the cookie to be based
> > > on contents of the commit (i.e. be a hash function).
> >
> > I don't need uniqueness across all forks, only uniqueness *within the repo*.
> 
> You've lost me.  In other places you stated you didn't want to use the
> commit hash, and now you say this.  If you only care about uniqueness
> within the current copy of the repo and don't care about uniqueness
> across forks (i.e. clones or copies that exist now or in the future --
> including copies stored using SHA256), then what's wrong with using
> the commit hash?

Because it's not self-describing, can't be computed solely from visible
commit metadata, and relies on complex external assumptions about how
the hash is computed which break when your VCS changes hash algorithms.

These are dealbreakers because one of my major objectives is forward
portability of these IDs forever. And I mean *forever*.  It should be
possible for someone in the year 40,000, in between assaulting planets
for the God-Emperor, to look at an import stream and deduce how to
resolve the cookies to their commits without seeing git's code or
knowing anything about its hash algorithms.

I think maybe the reason I'm having so much trouble getting this
across is that git insiders are used to thinking of import streams as
transient things.  Because I do a lot of repo migrations, I have a
very different view of them.  I built reposurgeon on the realization
that they're a general transport format for revision histories, and
that has forward value independent of the existence of git.

If a stream contained fully forward-portable action stamps, it would be
forward-portable forever.  Hashes in commit comments are the *only*
blocker to that.  Take this from a person who has spent way too much time
patching Subversion IDs like r1234 during repository conversions.

It would take so little to make this work. Existing stream format is
*almost there*.

> A stable ordering of commits in a fast-export stream might be a cool
> feature.  But I don't know how to define one, other than perhaps sort
> first by commit-depth (maybe optionally adding a few additional
> intermediate sorting criteria), and then finally sort by commit hash
> as a tiebreaker. Without the fallback to commit hash, you fall back
> on normal traversal order which isn't stable (it depends on e.g. order
> of branches listed on the command line to fast-export, or if using
> --all, what new branch you just added that comes alphabetically before
> others).
>
> I suspect that solution might run afoul of your dislike for commit
> hashes, though, so I'm not sure it'd work for you.

It does. See above.

> > So let me back up a step.  I will cheerfully drop advocating bumping
> > timestamps if anyone can tell me a different way to define a per-commit
> > reference cookie that (a) is unique within its repo, and (b) only requires
> > metadata visible in the fast-export representation of the commit.
> 
> Does passing --show-original-ids option to fast-export and using the
> resulting original-oid field as the cookie count?

I was not aware of this option.  Looking...no wonder, it's not on my
system man page.  Must be recent.

OK. Wow.  That is *useful*, and I am going to upgrade reposurgeon to read
it.  With that I can do automatic commit-reference rewriting.

I don't consider it a complete solution. The problem is that OID is
a consistent property that can be used to resolve cookies, but there's
no guaranteed that it's a *preserved* property that survives multiple
round trips and changes in hash functions.

So the right way to use it is to pick it up, do reference-cookie
resolution, and then mung the reference cookies to a format that is
stable forever.  I don't know what that format should be yet.  I
have a message in composition about this.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Finer timestamps and serialization in git

2019-05-20 Thread Eric S. Raymond
Derrick Stolee :
> What it sounds like you are doing is piping a 'git fast-import' process into
> reposurgeon, and testing that reposurgeon does the same thing every time.
> Of course this won't be consistent if 'git fast-import' isn't consistent.

It's not actually import that fails to have consistent behavior, it's export.

That is, if I fast-import a given stream, I get indistinguishable
in-core commit DAGs every time. (It would be pretty alarming if this
weren't true!)

What I have no guarantee of is the other direction.  In a multibranch repo,
fast-export writes out branches in an order I cannot predict and which
appears from the outside to be randomly variable.

> But what you should do instead is store a fixed file from one run of
> 'git fast-import' and send that file to reposurgeon for the repeated test.
> Don't rely on fast-import being consistent and instead use fixed input for
> your test.
> 
> If reposurgeon is providing the input to _and_ consuming the output from
> 'git fast-import', then yes you will need to have at least one integration
> test that runs the full pipeline. But for regression tests covering 
> complicated
> logic in reposurgeon, you're better off splitting the test (or mocking out
> 'git fast-import' with something that provides consistent output given
> fixed input).

And I'd do that... but the problem is more fundamental than you seem to
understand.  git fast-export can't ship a consistent output order because
it doesn't retain metadata sufficient to totally order child branches.

This is why I wanted unique timestamps.  That would solve the problem,
branch child commits of any node would be ordered by their commit date.

But I had a realization just now.  A much smaller change would do it.
Suppose branch creations had creation stamps with a weak uniqueness property:
for any given parent node, the creation stamps of all branches originating
there are guaranteed to be unique.

If that were true, there would be an implied total ordering of the
repository.  The rules for writing out a totally ordered dump would go
like this:

1. At any given step there is a set of active branches and a cursor
on each such branch.  Each cursor points at a commit and caches the
creation stamp of the current branch.

2. Look at the set of commits under the cursors.  Write the oldest one.
If multiple commits have the same commit date, break ties by their
branch creation stamps.

3. Bump that cursor forward. If you're at a branch creation, it
becomes multiple cursors, one for each child branch.
If you're at a join, some cursors go away.

Here's the clever bit - you make the creation stamp nothing but a
counter that says "This was the Nth branch creation."  And it is
set by these rules:

4. If the branch creation stamp is undefined at branch creation time,
number it in any way you like as long as each stamp is unique. A
defined, documented order would be nice but is not necessary for
streams to round-trip.

5. When writing an export stream, you always utter a reset at the
point of branch creation.

6. When reading an import stream, the ordinal for a new branch is
defined as the number of resets you have seen.

Rules 5 and 6 together guarantee that branch creation ordinals round-trip
through export streams.  Thus, streams round-trip and I can have my
regression tests with no change to git's visible interface at all!
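
Here's the skeleton of that dump loop in Python, much simplified: it
assumes each branch already carries its creation ordinal from rules
4-6, represents each cursor as an index into a date-ordered commit
list, and elides the split/join handling of rule 3:

    import heapq

    def ordered_dump(branches):
        # branches: list of (creation_ordinal, commits), where commits
        # is a date-ordered list of (commit_date, commit) pairs.  Rule
        # 2's tie-break is the tuple order: date first, then ordinal.
        heap = []
        for ordinal, commits in branches:
            if commits:
                heapq.heappush(heap, (commits[0][0], ordinal, 0, commits))
        while heap:
            date, ordinal, i, commits = heapq.heappop(heap)
            yield commits[i][1]
            if i + 1 < len(commits):   # bump this cursor forward
                heapq.heappush(heap,
                               (commits[i + 1][0], ordinal, i + 1, commits))

    # Two branches with creation ordinals 0 and 1; the date-100 tie
    # breaks toward the earlier-created branch.
    print(list(ordered_dump([(0, [(100, "a"), (105, "c")]),
                             (1, [(100, "b")])])))
    # -> ['a', 'b', 'c']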

I could write this code.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Finer timestamps and serialization in git

2019-05-20 Thread Eric S. Raymond
Michal Suchánek :
> On Wed, 15 May 2019 21:25:46 -0400
> Derrick Stolee  wrote:
> 
> > On 5/15/2019 8:28 PM, Eric S. Raymond wrote:
> > > Derrick Stolee :  
> > >> What problem are you trying to solve where commit date is important?  
> 
> > > B. Unique canonical form of import-stream representation.
> > > 
> > > Reposurgeon is a very complex piece of software with subtle failure
> > > modes.  I have a strong need to be able to regression-test its
> > > operation.  Right now there are important cases in which I can't do
> > > that because (a) the order in which it writes commits and (b) how it
> > > colors branches, are both phase-of-moon dependent.  That is, the
> > > algorithms may be deterministic but they're not documented and seem to
> > > be dependent on variables that are hidden from me.
> > > 
> > > Before import streams can have a canonical output order without hidden
> > > variables (e.g. depending only on visible metadata) in practice, that
> > > needs to be possible in principle. I've thought about this a lot and
> > > not only are unique commit timestamps the most natural way to make
> > > it possible, they're the only way consistent with the reality that
> > > commit comments may be altered for various good reasons during
> > > repository translation.  
> > 
> > If you are trying to debug or test something, why don't you serialize
> > the input you are using for your test?
> 
> And that's the problem. Serialization of a git repository is not stable
> because there is no total ordering on commits. And for testing you need
> to serialize some 'before' and 'after' state and they can be totally
> different. Not because the repository state is totally different but
> because the serialization of the state is not stable.

Yes, msuchanek is right - that is exactly the problem.  Very well put.

git fast-import streams *are* the serialization; they're what reposurgeon
ingests and emits.  The concrete problem I have is that there is no stable
correspondence between a repository and one canonical fast-import
serialization of it.

That is a bigger pain in the ass than you will be able to imagine unless
and until you try writing surgical tools yourself and discover that you
can't write tests for them.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Finer timestamps and serialization in git

2019-05-20 Thread Eric S. Raymond
> ... publish this
> mapping somewhere, be it with Internet Archive or Software Heritage.
> Problem solved.

I don't see it.  How does this prevent old clients from barfing on new
repositories?

> P.S. Could you explain to me how one can use action stamp, e.g.
> , to quickly find the
> commit it refers to?  With SHA-1 id you have either filesystem pathname
> or the index file for pack to find it _fast_.

For the purposes that make action stamps important I don't really care
about performance much (though there are fairly obvious ways to
achieve it).  My goal is to ensure that revision histories (e.g. in
their import-stream format) are forward-portable to future VCSes
without requiring any data outside the stream itself.
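
One of those fairly obvious ways, sketched in Python: a single pass
over the stream that maps action stamps to marks.  The parsing here
is deliberately naive, and it assumes the usual fast-export layout of
one mark line followed by one committer line per commit:

    import re, time

    def stamp_index(stream_lines):
        index, mark = {}, None
        for line in stream_lines:
            if line.startswith("mark :"):
                mark = line.split(" ", 1)[1].strip()
            m = re.match(r"committer .*<(.+?)> (\d+) [-+]\d{4}", line)
            if m:
                email, seconds = m.group(1), int(m.group(2))
                date = time.strftime("%Y-%m-%dT%H:%M:%SZ",
                                     time.gmtime(seconds))
                index[date + "!" + email] = mark
        return index

    lines = ["mark :7",
             "committer A Committer <committer@example.com> 1557948240 -0400"]
    print(stamp_index(lines))
    # -> {'2019-05-15T19:24:00Z!committer@example.com': ':7'}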

Please remember that I'm accustomed to maintaining infrastructure on
decadal timescales - I wrote code in the 1980s that is still in wide use
and I expect some of the code I'm writing now to be still in use thirty
years from now.

This gives me a different perspective on the fragility of things like
SHA-1 hashes.  From a decadal-scale POV any particular crypto-hash
format is unstable garbage, and having them in change comments is a
maintainability disaster waiting to happen.

Action stamps are specifically designed so that they're pointers to commits
that don't require anything but the target commit's import/export-stream
metadata to resolve.  Your idea of an archived hash registry makes me
extremely nervous; I think it's too fragile to trust.

So let me back up a step.  I will cheerfully drop advocating bumping
timestamps if anyone can tell me a different way to define a per-commit
reference cookie that (a) is unique within its repo, and (b) only requires
metadata visible in the fast-export representation of the commit.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Finer timestamps and serialization in git

2019-05-19 Thread Eric S. Raymond
Jakub Narebski :
> As far as I understand it this would slow down receiving new commits
> tremendously.  Currently great care is taken to not have to parse the
> commit object during fetch or push if it is not necessary (thanks to
> things such as reachability bitmaps, see e.g. [1]).
> 
> With this restriction you would need to parse each commit to get at
> commit timestamp and committer, check if the committer+timestamp is
> unique, and bump it if it is not.

So, I'd want to measure that rather than simply assuming it's a blocker.
Clocks are very cheap these days.

> Also, bumping timestamp means that the commit changed, means that its
> contents-based ID changed, means that all commits that follow it need
> to have their contents changed...  And now you need to rewrite many
> commits.

What "commits that follow it?" By hypothesis, the incoming commit's
timestamp is bumped (if it's bumped) when it's first added to a branch
or branches, before there are following commits in the DAG.

>And you also break the assumptions that the same commits have
> the same contents (including date) and the same ID in different
> repositories (some of which may include additional branches, some of
> which may have been part of network of related repositories, etc.).

Wait...unless I completely misunderstand the hash-chain model, doesn't the
hash of a commit depend on the hashes of its parents?  If that's the case,
commits cannot have portable hashes. If it's not, please correct me.

But if it's not, how does your first objection make sense?

> > You don't need a daemon now to write commits to a repository. You can
> > just add stuff to the object store, and then later flip the SHA-1 on a
> > reference, we lock those indivdiual references, but this sort of thing
> > would require a global write lock. This would introduce huge concurrency
> > caveats that are non-issues now.
> >
> > Dumb clients matter. Now you can e.g. have two libgit2 processes writing
> > to ref A and B respectively in the same repo, and they never have to
> > know about each other or care about IPC.

How do they know they're not writing to the same ref?  What keeps
*that* operation atomic?

> You do realize that dates may not be monotonic (because of imperfections
> in clock synchronization), thus the fact that the date is different from
> the parent's does not mean that it is different from an ancestor's.

Good point. That means the O(log2 n) version of the check has to be done
all the time.  Unfortunate.
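
For illustration, here's what the presorted check looks like in
Python.  The insort on a plain list is O(n), so a production version
would want a better container; the point is that the membership check
itself is a bisect, O(log2 n) in the number of commits:

    import bisect

    def unique_stamp(seen, proposed):
        # seen: sorted list of every timestamp already in the repo.
        i = bisect.bisect_left(seen, proposed)
        while i < len(seen) and seen[i] == proposed:
            proposed += 1          # one tick, e.g. one microsecond
            i += 1
        bisect.insort(seen, proposed)
        return proposed

    seen = []
    print([unique_stamp(seen, t) for t in (100, 100, 100, 101)])
    # -> [100, 101, 102, 103]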

> >> That's the simple case. The complicated case is checking for date
> >> collisions on *other* branches. But there are ways to make that fast,
> >> too. There's a very obvious one involving a presort that is O(log2
> >> n) in the number of commits.
> 
> I don't think the performance hit you would get would be acceptable.

Again, it's bad practice to assume rather than measure. Human intuitions
about this sort of thing are notoriously unreliable.

> >> Excuse me, but your premise is incorrect.  A git DAG isn't just "any" DAG.
> >> The presence of timestamps makes a total ordering possible.
> >>
> >> (I was a theoretical mathematician in a former life. This is all very
> >> familiar ground to me.)
> 
> Maybe in theory, when all clocks are synchronized.

My assertion does not depend on synchronized clocks, because it doesn't have to.

If the timestamps in your repo are unique, there *is* a total ordering - 
by timestamp. What you don't get is guaranteed consistency with the
topo ordering - that is you get no guarantee that a child's timestamp
is greater than its parents'. That really would require a common
timebase.

But I don't need that stronger property, because the purpose of
totally ordering the repo is to guarantee the uniqueness of action
stamps.  For that, all I need is to be able to generate a unique cookie
for each commit that can be inserted in its action stamp.  For my use cases
that cookie should *not* be a hash, because hashes always break N years
down.  It should be an eternally stable product of the commit metadata.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Finer timestamps and serialization in git

2019-05-19 Thread Eric S. Raymond
Philip Oakley :
> > But I don't quite understand your claim that there's no format
> > breakage here, unless you're implying to me that timestamps are already
> > stored in the git file system as variable-length strings.  Do they
> > really never get translated into time_t?  Good news if so.
> Maybe just take some of the object ID bits as being the fractional part of
> the timestamp. They are effectively random, so should do a reasonable job of
> distinguishing commits in a repeatable manner, even with full round tripping
> via older git versions (as long as the sha1 replicates...)

Huh.  That's an interesting idea.  Doesn't absolutely guarantee uniqueness,
but even with birthday effect the probability of collisions could be pulled
arbitrarily low.
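
In code Philip's idea is nearly a one-liner.  A sketch; the 24-bit
width is an arbitrary choice of mine:

    def fractional_stamp(oid_hex, seconds):
        # First 24 bits of the object ID as a pseudo-random fraction;
        # repeatable on round trips for as long as the SHA-1 survives,
        # but only probabilistically unique.
        return seconds + int(oid_hex[:6], 16) / float(16 ** 6)

    print(fractional_stamp("abac87a", 1557948240))  # -> 1557948240.6705...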

> As I understand it the commit timestamp is actually free text within the
> commit object (try `git cat-file -p `), so the issue is
> whether the particular git version is ready to accept the additional 'dot'
> factional time notation (future versions could be extended, but I think old
> ones would reject them if I understand the test up thread - which would
> compromise backward compatibility and round tripping).

Nobody seems to want to grapple with the fact that changing hash formats is
as large or larger a problem in exactly the same way.

I'm not saying that changing the timestamp granularity justifies a format
break.  I'm saying that *since you're going to have one anyway*, the option
to increase timestamp precision at the same time should not be missed.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Finer timestamps and serialization in git

2019-05-15 Thread Eric S. Raymond
Derrick Stolee :
> On 5/15/2019 3:16 PM, Eric S. Raymond wrote:
> > The deeper problem is that I want something from Git that I cannot
> > have with 1-second granularity. That is: a unique timestamp on each
> > commit in a repository.
> 
> This is impossible in a distributed version control system like Git
> (where the commits are immutable). No matter your precision, there is
> a chance that two machines commit at the exact same moment on two different
> machines and then those commits are merged into the same branch.

It's easy to work around that problem. Each git daemon has to single-thread
its handling of incoming commits at some level, because you need a lock on the
file system to guarantee consistent updates to it.

So if a commit comes in whose date would be the same as the date of the
previous commit on the current branch, you bump the incoming commit timestamp.
That's the simple case. The complicated case is checking for date
collisions on *other* branches. But there are ways to make that fast,
too. There's a very obvious one involving a presort that is O(log2
n) in the number of commits.

I wouldn't have brought this up in the first place if I didn't have a
pretty clear idea how to do it in code!
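
A sketch of the simple per-branch case in Python; the cross-branch
check is the harder part noted above:

    def accept_commit(branch_tips, branch, stamp):
        # branch_tips maps branch name to the last accepted timestamp.
        # Bump on collision; <= rather than == so that a run of
        # identical incoming stamps stays pairwise unique.
        last = branch_tips.get(branch)
        if last is not None and stamp <= last:
            stamp = last + 1       # one tick, e.g. one microsecond
        branch_tips[branch] = stamp
        return stamp

    tips = {}
    print([accept_commit(tips, "master", s) for s in (100, 100, 100)])
    # -> [100, 101, 102]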

> Even when you specify a committer, there are many environments where a set
> of parallel machines are creating commits with the same identity.

If those commit sets become the same commit in the final graph, this is
not a problem for total ordering.

> > Why do I want this? There are number of reasons, all related to a
> > mathematical concept called "total ordering".  At present, commits in
> > a Git repository only have partial ordering. 
> 
> This is true of any directed acyclic graph. If you want a total ordering
> that is completely unambiguous, then you should think about maintaining
> a linear commit history by requiring rebasing instead of merging.

Excuse me, but your premise is incorrect.  A git DAG isn't just "any" DAG.
The presence of timestamps makes a total ordering possible.

(I was a theoretical mathematician in a former life. This is all very
familiar ground to me.)

> > One consequence is that
> > action stamps - the committer/date pairs I use as VCS-independent commit
> > identifications in reposurgeon - are not unique.  When a patch sequence
> > is applied, it can easily happen fast enough to give several successive
> > commits the same committer-ID and timestamp.
> 
> Sorting by committer/date pairs sounds like an unhelpful idea, as that
> does not take any graph topology into account. It happens that a commit
> can actually have an _earlier_ commit date than its parent.

Yes, I'm aware of that.  The uniqueness properties that make a total
ordering desirable are not actually dependent on timestamp order
coinciding with topo order.

> Changing the granularity of timestamps requires changing the commit format,
> which is probably a non-starter.

That's why I started by noting that you're going to have to break the
format anyway to move to an ECDSA hash (or whatever you end up using).

I'm saying that *since you'll need to do that anyway*, it's a good time
to think about making timestamps finer-grained and unique.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Finer timestamps and serialization in git

2019-05-15 Thread Eric S. Raymond
Jason Pyeron :
> If we take the below example:
> 
> committer Name  1557948240 -0400
> 
> and we follow the rule that:
> 
> 1. any trailing zero after the decimal point MUST be omitted
> 2. if there are no digits after the decimal point, it MUST be omitted
> 
> This would allow:
> 
> committer Name  1557948240 -0400
> committer Name  1557948240.12 -0400
> 
> but the following are never allowed:
> 
> committer Name  1557948240. -0400
> committer Name  1557948240.00 -0400
> 
> By following these rules, all previous commits' hashes are unchanged. Future
> commits made on the top of the second will look like old commit formats.
> Commits coming from "older" tools will produce valid and mergeable objects.
> The loss of precision has frustrated us several times as well.

Yes, that's almost exactly what I came up with.  I was concerned with upward
compatibility in fast-export streams, which reposurgeon ingests and emits.

But I don't quite understand your claim that there's no format
breakage here, unless you're implying to me that timestamps are already
stored in the git file system as variable-length strings.  Do they
really never get translated into time_t?  Good news if so.
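
In Python, Jason's two rules come out to something like this
(microsecond precision assumed, which is my choice, not his):

    def format_stamp(seconds, micros=0):
        # Rule 2: no decimal point at all for whole seconds, so old
        # commits serialize exactly as they always have.
        if micros == 0:
            return "%d" % seconds
        # Rule 1: no trailing zeros after the point.
        return "%d.%s" % (seconds, ("%06d" % micros).rstrip("0"))

    assert format_stamp(1557948240) == "1557948240"
    assert format_stamp(1557948240, 120000) == "1557948240.12"
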
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Finer timestamps and serialization in git

2019-05-15 Thread Eric S. Raymond
Derrick Stolee :
> What problem are you trying to solve where commit date is important?

I don't know what Jason's are.  I know what mine are.

A. Portable commit identifiers

1. When I in-migrate a repository from (say) Subversion with
reposurgeon, I want to be able to patch change comments so that (say)
r2367 becomes a unique reference to its corresponding commit. I do
not want the kludge of appending a relic SVN-ID header to be *required*,
though some customers may choose that. Requiring that is an orthogonality
violation.

2. Because I think in decadal timescales about infrastructure, I want
my commit references to be in a format that won't break when the history
is forward-migrated to the *next* VCS. That pretty much eliminates any
form of opaque hash. (Git itself will have a weaker version of this problem
when you change hash formats.)

3. Accordingly, I invented action stamps. This is an action stamp:
. One reason I want timestamp
uniqueness is for action-stamp uniqueness.

B. Unique canonical form of import-stream representation.

Reposurgeon is a very complex piece of software with subtle failure
modes.  I have a strong need to be able to regression-test its
operation.  Right now there are important cases in which I can't do
that because (a) the order in which it writes commits and (b) how it
colors branches, are both phase-of-moon dependent.  That is, the
algorithms may be deterministic but they're not documented and seem to
be dependent on variables that are hidden from me.

Before import streams can have a canonical output order without hidden
variables (e.g. depending only on visible metadata) in practice, that
needs to be possible in principle. I've thought about this a lot and
not only are unique commit timestamps the most natural way to make
it possible, they're the only way consistent with the reality that
commit comments may be altered for various good reasons during
repository translation.

> P.S. All of my (overly strong) opinions on using commit date are made
> more valid when you realize anyone can set GIT_COMMITTER_DATE to get
> an arbitrary commit date.

In the way I would write things, you can *request* that date, but in
case of a collision you might actually get one a few microseconds off
that preserves its order relationship with your other commits.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Finer timestamps and serialization in git

2019-05-15 Thread Eric S. Raymond
Ævar Arnfjörð Bjarmason :
> You put key-values in the commit message and read it back out via
> git-interpret-trailers.

Speaking as a person who has done a lot of repository migrations, this
makes me shudder.  It's fragile, kludgy, and does not maintain proper
separation of concerns.

The feature I *didn't* ask for at the next format break is a user-modifiable
key-value store per commit that is *not* in the commit comment.  Bzr
has this.  It's useful.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Finer timestamps and serialization in git

2019-05-15 Thread Eric S. Raymond
The recent increase in SHA-1's vulnerability means, I hope, that you
are planning for the day when git needs to change to something like
an elliptic-curve hash.  This means you're going to have a major
format break. Such is life.

Since this is going to have to happen anyway, let me request two
functional changes in git. Neither will be at all difficult, but the
first one is also a thing that cannot be done without a format break,
which is why I have not suggested them before.  They come from lots of
(often painful) experience with repository conversions via
reposurgeon.

1. Finer granularity on commit timestamps.

2. Timestamps unique per repository

The coarse resolution of git timestamps, and the lack of uniqueness,
are at the bottom of several problems that are persistently irritating
when I do repository conversions and surgery.

The most obvious issue, though a relatively superficial one, is that I have
to throw away information whenever I convert a repository from a system with
finer-grained time.  Notably this is the case with Subversion, which keeps
time to milliseconds. This is probably the only respect in which its data
model remains superior to git's. :-)

The deeper problem is that I want something from Git that I cannot
have with 1-second granularity. That is: a unique timestamp on each
commit in a repository. The only way to be certain of this is for git
to delay accepting integration of a patch until it can issue a unique
time mark for it - obviously impractical if the quantum is one second,
but not if it's a millisecond or microsecond.

Why do I want this? There are number of reasons, all related to a
mathematical concept called "total ordering".  At present, commits in
a Git repository only have partial ordering. One consequence is that
action stamps - the committer/date pairs I use as VCS-independent commit
identifications in reposurgeon - are not unique.  When a patch sequence
is applied, it can easily happen fast enough to give several successive
commits the same committer-ID and timestamp.

Of course the commit hash remains a unique commit ID.  But it can't
easily be parsed and followed by a human, which is a UX problem when
it's used as a commit stamp in change comments.

More deeply, the lack of total ordering means that repository graphs
don't have a single canonical serialized form.  This sounds abstract
but it means there are surgical operations I can't regression-test
properly.  My colleague Edward Cree has found cases where git fast-export
can issue a stream dump for which git fast-import won't necessarily
re-color certain interior nodes the same way when it's read back in
and I'm pretty sure the absence of total ordering on the branch tips
is at the bottom of that.

I'm willing to write patches if this direction is accepted.  I've figured
out how to make fast-import streams upward-compatible with finer-grained
timestamps.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>



Re: [PATCH 3/3] docs/cvs-migration: mention cvsimport caveats

2016-09-27 Thread Eric S. Raymond
Jeff King :
>   I am not qualified to write on the current state of
> the art in CVS importing.

I *am* qualified; cvs-fast-export has had a lot of work put into it by
myself and others over the last five years.  Nobody else is really
working this problem anymore; not much besides cvs2git is even left
standing at this point. Most other attempts on the problem have
stalled or flamed out, and were never very robust in dealing with
repository malformations to begin with.

cvs2git can probably still almost match cvs-fast-export in ability to handle
pathological cases, but is painfully slow by comparison.  (Part of that is
implementation in Python vs. C.)

cvs-fast-export has been successfully performance-tuned for very large
repositories, such as the entirety of NetBSD, and is orders of
magnitude faster than it used to be. (I parallelized the parsing
of RCS masters with a re-entrant Bison instance running per thread;
this makes a huge difference on large repositories, for which that
stage dominates running time.) Its ability to recover sense from
repository malformations was already pretty good five years ago
and is probably unmatched now.  It does .cvsignore conversion.

cvs-fast-export also now has a really good test suite collecting all
kinds of weird CVS deformations from the field, and a wrapper that can
both do a conversion and check for correctness at every tag as well as
the tip revision.

By contrast, the wrapper/cvsps combination git ships continues to be
disgracefully bad and should be scrapped - remember that I maintained
cvsps for a while and tried to EOL it because its branch-resolution
algorithms are unsound.  I have a replacement wrapper ready any time
the git maintainer decides to stop shipping broken, dangerous code.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: [PATCH 3/3] docs/cvs-migration: mention cvsimport caveats

2016-09-22 Thread Eric S. Raymond
Jeff King :
> Back when this guide was written, cvsimport was the only
> game in town. These days it is probably not the best option.

It is absolutely not.  As I have tried to point out here before, it
is *severely* broken in its processing of branchy CVS repositories.

Nobody wanted to hear that, but it's still true. Recommending it
is irresponsible.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I have end-of-lifed cvsps

2013-12-18 Thread Eric S. Raymond
Michael Haggerty :
> If you haven't tried cvs2git yet, please start it up somewhere in the
> background.  It might take a while but it should have no trouble with
> your repos, and then you can compare the tools based on experience
> rather than speculation.

That would be a good thing.

Michael, in case you're wondering why I've continued to work on
cvs-fast-export when cvs2git exists, there are exactly two reasons:
(a) it's a whole lot faster on repos that aren't large enough to
demand multipass, and (b) the single-whole-dumpfile output makes it a
better reposurgeon front end.

> But the traffic on the cvs2svn/cvs2git mailing list has trailed off
> essentially to zero, so either the software is perfect already (haha) or
> most everybody has already converted.  Therefore I don't invest any
> significant time in that project these days.

Reasonable.  I'm doing this as a temporary break from working on GPSD.
I don't expect to be investing a lot of time in it after I get it
to a 1.0 state.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I have end-of-lifed cvsps

2013-12-18 Thread Eric S. Raymond
John Keeping :
> Which I think sums up the position nicely; if you're doing a one-shot
> import then the standalone tools are going to be a better choice, but if
> you're trying to use Git for your work on top of CVS the only choice is
> cvsps with git-cvsimport.

Which will trash your history - the bugs in that are worse than the bugs
in 3.0, which are bad enough that I *terminated* it.

Lovely.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I have end-of-lifed cvsps

2013-12-18 Thread Eric S. Raymond
Jeff King :
> In git, it may happen quite a bit during "git am" or "git rebase", in
> which a large number of commits are replayed in a tight loop.

That's a good point - a repeatable real-world case in which we can
expect that behavior.

This case could be solved, though, with a slight tweak to the commit generator
in git (given subsecond timestamps).  It could keep the time of the last
commit and stall by an arbitrarily small amount, enough to show up as a
timestamp difference.
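
That tweak is only a few lines.  A sketch, with the last issued stamp
kept in a module-level cell:

    import time

    _last_stamp = [0.0]

    def next_stamp():
        # If the clock hasn't visibly advanced since the last commit,
        # stall one tick past it so no stamp is ever issued twice.
        now = time.time()
        if now <= _last_stamp[0]:
            now = _last_stamp[0] + 1e-6
        _last_stamp[0] = now
        return now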

Action stamps work pretty well inside reposurgeon because they're
mainly used to identify commits from older VCSes that can't run that
fast. Collisions are theoretically possible but I've never seen one in
the wild.

>   You can
> use the author timestamp instead, but it also collides (try "%at %ae" in
> the above command instead).

Yes, obviously for the same reason. 
 
> > And now you know why I wish git had subsecond timestamp resolution!  If it
> > did, uniqueness of these in a git stream could be guaranteed.
> 
> It's still not guaranteed. Even with sufficient resolution that no two
> operations could possibly complete in the same time unit, clocks do not
> always march forward. They get reset, they may skew from machine to
> machine, the same operation may happen on different machines, etc.

Right...but the *same person* submitting operations from *different
machines* within the time window required to be caught by these effects
is at worst fantastically unlikely.  That case is exactly why action 
stamps have an email part.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I have end-of-lifed cvsps

2013-12-18 Thread Eric S. Raymond
Jakub Narębski :
> It is a bit strange that markfile has explicitly SHA-1 (":markid "),
> instead of generic reference to commit, in the case of CVS it would be
> commitid (what to do for older repositories, though?), in case of Bazaar
> its revision id (GUID), etc.  Can we assume that SCM v1 fast-export and
> SCM v2 fast-import markfile uses compatible commit names in markfile?

For use in reposurgeon I have defined a generic cross-VCS reference to
commit I call an "action stamp"; it consists of an RFC3339 date followed by 
a committer email address. Here's an example:

 2013-02-06T09:35:10Z!e...@thyrsus.com

In any VCS with changesets (git, Subversion, bzr, Mercurial) this
almost always suffices to uniquely identify a commit. The "almost" is
because in these systems it is possible for a user to do multiple commits
in the same second.

And now you know why I wish git had subsecond timestamp resolution!  If it
did, uniqueness of these in a git stream could be guaranteed.
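
Generating one is trivial.  A sketch; the address below is a
placeholder, not anyone's real email:

    import time

    def action_stamp(seconds, email):
        # RFC3339 date in UTC, then "!", then the committer email.
        date = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(seconds))
        return date + "!" + email

    print(action_stamp(1360143310, "committer@example.com"))
    # -> 2013-02-06T09:35:10Z!committer@example.com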

The implied model completely breaks for CVS, of course.  There you have to 
use commitids and plain give up when those don't exist.
 
> I think it would be possible for remote-helper for cvs-fast-export to find
> this cutoff date automatically (perhaps with some safety margin), for
> fetching (incremental import).

Yes.
 
> > As I tried to explain previously in my response to John Herland, it's
> > incremental output only.  There is *no* CVS exporter known to me, or
> > him, that supports incremental work.  That would at best be impractically
> > difficult; given CVS's limitations it may be actually impossible. I wouldn't
> > bet against impossible.
> 
> Even with saving (or re-calculating from git import) guesses about CVS
> history made so far?

Even with that.  cvsps-2.x tried to do something like this.  It was a lose.
 
> Anyway I hope that incremental CVS import would be needed less
> and less as CVS is replaced by any more modern version control system.

I agree.  I have never understood why people on this list are attached to it.

> I was thinking about creating remote-helper for cvs-fast-export, so that
> git can use local CVS repository as "remote", using e.g. "cvsroot::"
> as repo URL, and using this mechanism for incremental import (aka fetch).
> (Or even "cvssync::" for automatic cvssync + cvs-fast-export).
> 
> But from what I understand this is not as easy as it seems, even with
> remote-helper API having support for fast-import stream.

It's a swamp I wouldn't want to walk into.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I have end-of-lifed cvsps

2013-12-17 Thread Eric S. Raymond
Andreas Schwab :
> "Eric S. Raymond"  writes:
> 
> > All versions of CVS have generated commitids since 2004.
> 
> Though older versions are still in use, eg. sourceware.org still does
> not generate commitids.

That is awful.  Alas, there is not much anyone can do about stupidity
that determined.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I have end-of-lifed cvsps

2013-12-17 Thread Eric S. Raymond
Jakub Narębski :
> > No, cvs-fast-export does not have --export-marks. It doesn't generate the
> > SHA1s that would require. Even if it did, it's not clear how that would 
> > help.
> 
> I was thinking about how the following part of git-fast-export
> `--import-marks=`
> 
>   Any commits that have already been marked will not be exported again.
>   If the backend uses a similar --import-marks file, this allows for 
> incremental
>   bidirectional exporting of the repository by keeping the marks the same
>   across runs.

I understand that. But it's not relevant - cvs-fast-export doesn't know about
git SHA1s, and cannot.
 
> How cvs-fast-export know where to start exporting from in incremental mode?

You give it a cutoff date. This is the same way cvsps-2.x and 3.x worked,
and it's what the cvsimport wrapper expects to pass down.

> BTW. does cvs-fast-export support incremental *output*, or does it
> perform also incremental *work*?

As I tried to explain previously in my response to John Herland, it's
incremental output only.  There is *no* CVS exporter known to me, or
him, that supports incremental work.  That would at best be impractically
difficult; given CVS's limitations it may be actually impossible. I wouldn't
bet against impossible.

> Anyway, that might mean that generic fast-import stream based incremental
> (i.e. supporting proper thin fetch) remote helper is out of question, perhaps
> writing one for cvs / cvs-fe would bring incremental import from CVS to
> git?

Sorry, I don't understand that.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I have end-of-lifed cvsps

2013-12-17 Thread Eric S. Raymond
Johan Herland :
> > Alan and I are going to take a good hard whack at modifying cvs-fast-export
> > to make this work. Because there really aren't any feasible alternatives.
> > The analysis code in cvsps was never good enough. cvs2git, being written
> > in Python, would hit the core limit faster than anything written in C.
> 
> Depends on how it organizes its data structures. Have you actually
> tried running cvs2git on it? I'm not saying you are wrong, but I had
> similar problems with my custom converter (also written in Python),
> and solved them by adding multiple passes/phases instead of trying to
> do too much work in fewer passes. In the end I ended up storing the
> largest inter-phase data structures outside of Python (sqlite in my
> case) to save memory. Obviously it cost a lot in runtime, but it meant
> that I could actually chew through our largest CVS modules without
> running out of memory.

You make a good point.  cvs2git is descended from cvs2svn, which has
such a multipass organization - it will only have to avoid memory
limits per pass.  Alan and I will try that as a fallback if
cvs-fast-export continues to choke.
 
> > It is certainly the case that a sufficiently large CVS repo will break
> > anything, like a star with a mass over the Chandrasekhar limit becoming a
> > black hole :-)
> 
> :) True, although it's not the sheer size of the files themselves that
> is the actual problem. Most of those bytes are (deltified) file data,
> which you can pretty much stream through and convert to a
> corresponding fast-export stream of blob objects. The code for that
> should be fairly straightforward (and should also be eminently
> parallelizable, given enough cores and available I/O), resulting in a
> table mapping CVS file:revision pairs to corresponding Git blob SHA1s,
> and an accompanying (set of) packfile(s) holding said blobs.

Allowing for the fact that cvs-fast-export isn't git and doesn't use
SHA1s or packfiles, this is in fact how a large portion of
cvs-fast-export works.  The blob files get created during the walk
through the master file list, before actual topo analysis is done.

> The hard part comes when trying to correlate the metadata for all the
> per-file revisions, and distill that into a consistent sequence/DAG of
> changesets/commits across the entire CVS repo. And then, of course,
> trying to fit all the branches and tags into that DAG of commits is
> what really drives you mad... ;-)

Well I know this...:-)

> > The question is how common such supermassive cases are. My own guess is that
> > the *BSD repos and a handful of the oldest GNU projects are pretty much the
> > whole set; everybody else converted to Subversion within the last decade.
> 
> You may be right. At least for the open-source cases. I suspect
> there's still a considerable number of huge CVS repos within
> companies' walls...

If people with money want to hire me to slay those beasts, I'm available.
I'm not proud, I'll use cvs2git if I have to.
 
> > I find the very idea of writing anything that encourages
> > non-history-correct conversions disturbing and want no part of it.
> >
> > Which matters, because right now the set of people working on CVS lifters
> > begins with me and ends with Michael Rafferty (cvs2git),
> 
> s/Rafferty/Haggerty/?

Yup, I thinkoed.
 
> > who seems even
> > less interested in incremental conversion than I am.  Unless somebody
> > comes out of nowhere and wants to own that problem, it's not going
> > to get solved.
> 
> Agreed. It would be nice to have something to point to for people that
> want something similar to git-svn for CVS, but without a motivated
> owner, it won't happen.

I think the fact that it hasn't happened already is a good clue that
it's not going to. Given the decline curve of CVS usage, writing 
git-cvs might have looked like a decent investment of time once,
but that era probably ended five to eight years ago.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I have end-of-lifed cvsps

2013-12-17 Thread Eric S. Raymond
Jakub Narębski :
> Errr... doesn't cvs-fast-export support --export-marks= to save
> progress and --import-marks= to continue incremental import?

No, cvs-fast-export does not have --export-marks. It doesn't generate the
SHA1s that would require. Even if it did, it's not clear how that would help.

> I would check it in cvs-fast-export manpage, but the page seems to
> be down:
> 
>   http://isup.me/www.catb.org
> 
> It's not just you! http://www.catb.org looks down from here.

Confirmed.  Looks like ibiblio is having a bad day.  I'll file a bug report. 

> > Fortunately, incremental dump is trivial to implement in the output
> > stage of an exporter if you have access to the exporter source code.
> > I've done it in two different exporters.  cvs-fast-export now has a
> > regression test for this case
> 
> This is, I guess, assuming that information from later commits doesn't
> change guesses about the shape of history from earlier commits...

That's the "stability" property that Martin Langhoff and I were discussing
earlier.

cvs-fast-export conversions are stable under incremental
lifting providing a commitid-generating version of CVS is in use
during each increment.  Portions of the history *before the first
lift* may lack commitids and will nevertheless remain stable through
the whole process.

All versions of CVS have generated commitids since 2004.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I have end-of-lifed cvsps

2013-12-17 Thread Eric S. Raymond
Johan Herland :
> However, I fear that you underestimate the number of users that want
> to use Git against CVS repos that are orders of magnitude larger (in
> both dimensions: #commits and #files) than your example repo.

You may be right. See below...

I'm working with Alan Barrett now on trying to convert the NetBSD
repositories. They break cvs-fast-export through sheer bulk of
metadata, by running the machine out of core.  This is exactly
the kind of huge case that you're talking about.

Alan and I are going to take a good hard whack at modifying cvs-fast-export 
to make this work. Because there really aren't any feasible alternatives.
The analysis code in cvsps was never good enough. cvs2git, being written
in Python, would hit the core limit faster than anything written in C.

> Although a full-history converter with fairly stable output can be
> made to support this second problem for repos up to a certain size,
> there will probably still be users that want to work incrementally
> against much bigger repos, and I don't think _any_
> full-history-gone-incremental importer will be able to support the
> biggest repos.
> 
> Consequently I believe that for these big repos it is _impossible_ to
> get both fast incremental workflows and a high degree of (historical)
> correctness.
> 
> cvsps tried to be all of the above, and failed badly at the
> correctness criteria. Therefore I support your decision to "shoot it
> through the head". I certainly also support any work towards making a
> full-history converter work in an incremental manner, as it will be
> immensely useful for smaller CVS repos. But at the same time we should
> realize that it won't be a solution for incrementally working against
> _large_ CVS repos.

It is certainly the case that a sufficiently large CVS repo will break
anything, like a star with a mass over the Chandrasekhar limit becoming a 
black hole :-)

The question is how common such supermassive cases are. My own guess is that
the *BSD repos and a handful of the oldest GNU projects are pretty much the
whole set; everybody else converted to Subversion within the last decade. 
 
> Although it should have been made obvious a long time ago, the removal
> of cvsps has now made it abundantly clear that Git currently provides
> no way to support the incremental workflow against large CVS repos.
> Maybe that is ok, and we can ignore that, waiting for the few
> remaining large CVS repos to die? Or maybe we need a new effort to
> fill this niche? Something that is NOT based on a full-history
> converter, and does NOT try to guarantee a history-correct conversion,
> but that DOES try to guarantee fast and relatively worry-free two-way
> synchronization against a CVS server. Unfortunately (or fortunately,
> depending on POV) I have not had to touch CVS in a long while, and I
> don't see that changing soon, so it is not my itch to scratch.

Nor mine.  I find the very idea of writing anything that encourages
non-history-correct conversions disturbing and want no part of it.

Which matters, because right now the set of people working on CVS lifters
begins with me and ends with Michael Rafferty (cvs2git), who seems even
less interested in incremental conversion than I am.  Unless somebody
comes out of nowhere and wants to own that problem, it's not going
to get solved.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: I have end-of-lifed cvsps

2013-12-17 Thread Eric S. Raymond
Johan Herland :
> HOWEVER, this only solves the "cheap" half of the problem. The reason
> people want incremental CVS import, is to avoid having to repeatedly
> convert the ENTIRE CVS history. This means that the CVS exporter must
> learn to start from a given point in the CVS history (identified by
> the above mapping) and then quickly and efficiently convert only the
> "new stuff" without having to consult/convert the rest of the CVS
> history. THIS is the hard part of incremental import. And it is much
> harder for systems like CVS - where the starting point has a broken
> concept of history...

I know of *no* importer that solves what you call the "deep" part of
the problem.  cvsps didn't, cvs-fast-export doesn't, cvs2git doesn't.
All take the easy way out: parse the entire history, and limit what
is emitted in the output stage.

Actually, given what I know about delta-file parsing I'd say a "true"
incremental CVS exporter would be so hard that it's really not worth the
bother.  The problem is the delta-based history representation.
Trying to interpret that without building a complete set of history
states in the process (which is most of the work a whole-history
exporter does) would be brutally difficult - barely possible in
principle maybe, but I wouldn't care to try it.

It's much more practical to tune up a whole-history exporter so it's
acceptably fast, then do incremental dumping by suppressing part of
the conversion in the output stage. 
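
To make that concrete, here is a minimal Python sketch of the idea - not
the actual cvs-fast-export code (which is C), and the Changeset record
and its fields are invented for illustration.  The analyzer still does
the whole-history work; the output stage merely suppresses changesets at
or before the cutoff and parents each branch root on the existing tip:

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Changeset:                # hypothetical stand-in for an analyzer record
    branch: str                 # e.g. "master"
    author: str                 # "Full Name <email>"
    date: int                   # commit time, seconds since the epoch
    log: str                    # log message
    fileops: List[str]          # pre-rendered M/D lines

def render(cs: Changeset, parent: Optional[str] = None) -> str:
    # A minimal fast-import commit; note that the optional 'from'
    # line follows the log-message data section in the stream grammar.
    lines = ["commit refs/heads/" + cs.branch,
             "committer %s %d +0000" % (cs.author, cs.date),
             "data %d" % len(cs.log.encode("utf-8")),
             cs.log.rstrip("\n")]
    if parent:
        lines.append("from " + parent)
    lines.extend(cs.fileops)
    return "\n".join(lines) + "\n"

def emit_incremental(changesets: List[Changeset], cutoff: int) -> str:
    """Suppress already-imported changesets; give the first new commit
    on each branch a 'from <branch>^0' cookie so that git fast-import
    parents it on the tip of the branch that already exists."""
    out, rooted = [], set()
    for cs in sorted(changesets, key=lambda c: c.date):
        if cs.date <= cutoff:
            continue            # already imported by a previous run
        parent = None
        if cs.branch not in rooted:
            parent = "refs/heads/%s^0" % cs.branch
            rooted.add(cs.branch)
        out.append(render(cs, parent))
    return "".join(out)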

cvs-fast-export's benchmark repo is the history of GNU troff.  That's
3057 commits in 1549 master files; when I reran it just now the
whole-history conversion took 49 seconds.  That's 3.7K commits a
minute, which is plenty fast enough for anything smaller than (say)
one of the *BSD repositories.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: I have end-of-lifed cvsps

2013-12-17 Thread Eric S. Raymond
Jakub Narębski :
> I wonder if we can add support for incremental import once, for all
> VCS supporting fast-export, in one place, namely at the remote-helper.

Something in the pipeline - either the helper or the exporter - needs to
have an equivalent of cvs-fast-export's and cvsps's -i option, which
omits all commits before a specified time and generates cookies like
"from refs/heads/master^0" before each branch root in the incremental
dump.

This could be done in the wrapper, but only if the wrapper itself
includes an import-stream parser, interprets the output from the
exporter program, and re-emits it.  Having done similar things
myself in reposurgeon, I advise against this strategy; it would
introduce a level of complexity to the wrapper that doesn't belong
there, and make the exporter+wrapper combination harder to verify.

Fortunately, incremental dump is trivial to implement in the output
stage of an exporter if you have access to the exporter source code.
I've done it in two different exporters.  cvs-fast-export now has a
regression test for this case.

> I don't know details, so I don't know if it is possible; certainly
> unstable fast-export output would be a problem, unless some tricks
> are used (like remembering mappings between versions).

About such tricks I can only say "That way lies madness".  The present
Perl wrapper is buggy because it's over-complex.  The replacement wrapper
should do *less*, not more.

Stable output and incremental dump are reasonable things to demand of
your supported exporters.  cvs-fast-export has incremental dump
unconditionally, and stability relative to every CVS implementation
since 2004.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] t9605: test for cvsps commit ordering bug

2013-12-14 Thread Eric S. Raymond
Replying to very old but newly relevant mail:

Chris Rorvick :
> Import of a trivial CVS repository fails due to a cvsps bug.

The t9605 test you sent me is now part of cvs-fast-export's 
regression-test suite, along with suitably adapted versions of
t960[1-4] from the git tree.  Here is a summary of the results:

t9601:
|----------------------------------------------|----------|-----------------|
|                                              | cvsps    | cvs-fast-export |
|----------------------------------------------|----------|-----------------|
| import a module with a vendor branch         | Succeeds | Succeeds        |
| check master out of git repository           | Succeeds | Succeeds        |
| check a file imported once                   | Fails    | Succeeds        |
| check a file imported twice                  | Succeeds | Succeeds        |
| check a file imported then modified on HEAD  | Succeeds | Succeeds        |
| ...imported, modified, then imported again   | Succeeds | Succeeds        |
| check a file added to HEAD then imported     | Succeeds | Fails           |
| a vendor branch whose tag has been removed   | Succeeds | Succeeds        |
|----------------------------------------------|----------|-----------------|

t9602:
|--------------------------------------|----------|-----------------|
|                                      | cvsps    | cvs-fast-export |
|--------------------------------------|----------|-----------------|
| import module                        | Succeeds | Succeeds        |
| test branch master                   | Succeeds | Succeeds        |
| test branch vendorbranch             | Succeeds | Fails           |
| test_branch B_FROM_INITIALS          | Fails    | Succeeds        |
| test_branch B_FROM_INITIALS_BUT_ONE  | Fails    | Fails           |
| test_branch B_MIXED                  | Fails    | Succeeds        |
| test_branch B_SPLIT                  | Succeeds | Succeeds        |
| test branch vendortag                | Fails    | Succeeds        |
| test tag T_ALL_INITIAL_FILES         | Succeeds | Succeeds        |
| test tag T_ALL_INITIAL_FILES_BUT_ONE | Fails    | Fails           |
| test_tag T_MIXED                     | Fails    | Succeeds        |
|--------------------------------------|----------|-----------------|

t9603:
cvsps fails this test; cvs-fast-export succeeds.

t9604:
cvsps and cvs-fast-export both succeed at this test.

t9605:
cvsps fails this test; cvs-fast-export succeeds.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: I have end-of-lifed cvsps

2013-12-12 Thread Eric S. Raymond
Martin Langhoff :
> On Thu, Dec 12, 2013 at 6:04 PM, Eric S. Raymond  wrote:
> > I'm not sure what counts as a nonsensical branching point. I do know that
> > Keith left this rather cryptic note in a README:
> 
> Keith names exactly what we are talking about.

Oh, yeah, I figured that much out.  What I wasn't clear on was whether
that's a complete description of "nonsensical branching point" or whether
there are other pathologies fundamentally *different* from that one.

I'm also not sure I have the end state of what cvs-fast-export does in that
case visualized correctly. When he says: "an entirely disjoint history will
be created containing the branch revisions and all parents back to the
root", I'm visualizing something like this:

    a---b---c---d---e---f---g---h
                 \
                  +---1---2---3---4

Suppose the root is a and our pathological branch point is at d; then it
sounds like he's saying cvs-fast-export will produce a changeset DAG
that looks like this:

    a---b'---c'---d'---e---f---g---h
     \
      +---b''---c''---d''---1---2---3---4

What I'm not clear on here is how b is related to b' and b'', c to c' and c'',
and d to d' and d''.  Which file changes go to which commit?  I shall have to
craft some broken RCS files to find out.

Have I explained that I'm building a test suite?  I intend to know exactly
what the tool does in these cases and document it.

> Between my earlier explanation and Keith's notes it should be clear to
> you. It is absolutely trivial in CVS to have an "inconsistent"
> checkout (for example, if you switch branch with the -l parameter
> disabling recursion, or if you accidentally switch branch in a
> subdirectory).

That last one sounds easy to fall into and nasty. 

> On that inconsistent checkout, nothing prevents you from tagging it,
> nor from creating a new branch.
> 
> An importer with a 'consistent tree mentality' will look at the
> files/revs involved in that tag (or branching point) and find no tree
> to match.
> 
> CVS repos with that crap exist. x11/xorg did (Jim Gettys challenged me
> to try importing it at an LCA, after the Bazaar NG folks passed on
> it). Mozilla did as well.
> 
> 
> IMHO it is a valid path to skip importing the tag/branch. As long as
> main dev work was in HEAD, things end up ok (which goes back to my
> flying fish notes).

The other way to handle it would be to translate the history as though every
branch of a file subset had been an attempt to branch everything.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: I have end-of-lifed cvsps

2013-12-12 Thread Eric S. Raymond
Martin Langhoff :
> On Thu, Dec 12, 2013 at 3:58 PM, Eric S. Raymond  wrote:
> >>  - regardless of commit ids, do you synthesize an artificial commit?
> >> How do you define parenthood for that artificial commit?
> >
> > Because tagging is never used to deduce changesets, the case does not arise.
> 
> So if a branch has a nonsensical branching point, or a tag is
> nonsensical, is it ignored and not imported?

I don't know what happens when identically-named tags point at changes that
resolve into two different commits.  I will figure that out and document it.

There's evidence, in the form of some code that is #ifdefed out, that
Keith considered trying to make synthetic commits from tag cliques, but
abandoned the idea because he couldn't figure out how to assign such
cliques to a branch.

I'm not sure what counts as a nonsensical branching point. I do know that
Keith left this rather cryptic note in a README:

Disjoint branch resolution. Branches occurring in a subset of the
files are not correctly resolved; instead, an entirely disjoint
history will be created containing the branch revisions and all
parents back to the root. I'm not sure how to fix this; it seems
to implicitly assume there will be only a single place to attach as
branch parent, which may not be the case. In any case, the right
revision will have a superset of the revisions present in the
original branch parent; perhaps that will suffice.

-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: I have end-of-lifed cvsps

2013-12-12 Thread Eric S. Raymond
Martin Langhoff :
> If someone creates a nonsensical tag or branch point, tagging files
> from different commits, how do you handle it?
> 
>  - without commit ids, does it affect your guesses?

No.  Tagging is never used to deduce changesets. Look:

/*
 * The heart of the merge operation; detect when two
 * commits are "the same".  The commitid, log, and author
 * fields are interned strings, so pointer comparison suffices.
 */
static bool
rev_commit_match (rev_commit *a, rev_commit *b)
{
    /*
     * Versions of GNU CVS after 1.12 (2004) place a commitid in
     * each commit to track patch sets. Use it if present.
     */
    if (a->commitid && b->commitid)
        return a->commitid == b->commitid;
    /* Never match a commitid'd change with a commitid-less one. */
    if (a->commitid || b->commitid)
        return false;
    /* Otherwise fall back to similarity: close dates, same log,
       same author. */
    if (!commit_time_close (a->date, b->date))
        return false;
    if (a->log != b->log)
        return false;
    if (a->author != b->author)
        return false;
    return true;
}

>  - regardless of commit ids, do you synthesize an artificial commit?
> How do you define parenthood for that artificial commit?

Because tagging is never used to deduce changesets, the case does not arise.

I have added an item to my to-do: document what the tool does with
inconsistent tags.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: I have end-of-lifed cvsps

2013-12-12 Thread Eric S. Raymond
Martin Langhoff :
> IIRC, making the output stable is nontrivial, specially on branches.
> Two cases are still in my mind, from when I was wrestling with cvsps.
> 
> 1 - For a history with CVS HEAD and a long-running "stable release"
> branch ("STABLE"), which branched at P1...
> 
>a - adding a file only at the tip of STABLE "retroactively changes
> history"  for P1 and perhaps CVS HEAD
> 
>b - forgetting to properly tag a subset of files with the branch
> tag, and doing it later retroactively changes history
> 
> 2 - you can create a new branch or tag with files that do not belong
> together in any "commit". Doing so changes history retroactively
> 
> ... when I say "changes history", I mean that the importers I know
> revise their guesses of what files were seen together in a 'commit'.
> This is specially true for history recorded with early cvs versions
> that did not record a 'commit id'.

Yikes!  That is a much stricter stability criterion than I thought you
were specifying.  No, cvs-fast-export probably doesn't satisfy all of these.
I think it would handle 1a in a stable way, but 1b and 2 would throw it.

I'm sure it can't be fooled in the presence of commitids, though,
because when it has those it doesn't try to do any similarity
matching.  And (this is the important point) it won't match any change
with a commit-id to any change without one.

What I think this means is that cvs-fast-export is stable if you are
using a server/client combination that generates commitids (that is,
GNU CVS of any version newer than 1.12 of 2004, or CVS-NT). It is
*not* necessary for stability that the entire history have them.

Here's how the logic works out:

1. Commits grouped by commitid are stable - nothing in CVS ever rewrites
those or assigns a duplicate.

2. No file change made with a commitid can destabilize a commit guess
made without them, because the similarity checker never tries to put both 
kinds in a single changeset.
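
To see rule 2 concretely, here is the commitid gate re-expressed in
Python - a paraphrase of the C rev_commit_match() quoted earlier, not
the shipped code, and the 300-second fuzz window is an arbitrary
stand-in for commit_time_close():

def same_changeset(a, b, fuzz=300):
    # Commitids are authoritative when both sides have one...
    if a["commitid"] and b["commitid"]:
        return a["commitid"] == b["commitid"]
    # ...and a commitid'd change never matches a commitid-less one,
    # which is what keeps pre-2004 guesses stable (rule 2).
    if a["commitid"] or b["commitid"]:
        return False
    # Otherwise fall back to similarity fuzzing on date/log/author.
    return (abs(a["date"] - b["date"]) <= fuzz
            and a["log"] == b["log"]
            and a["author"] == b["author"])

old = dict(commitid=None,   date=1000, log="fix", author="esr")
new = dict(commitid="x9y8", date=1000, log="fix", author="esr")
assert not same_changeset(old, new)  # identical metadata, still kept apart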

Can you detect any flaw in this?
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: I have end-of-lifed cvsps

2013-12-12 Thread Eric S. Raymond
Martin Langhoff :
> In my prior work, the "better" CVS importers would not have stable
> output, so were not appropriate for incremental imports.

That is disturbing.  I would consider lack of stability a severe and
unacceptable failure mode in such a tool, if only because of the
difficulties it creates for proper regression testing.

If cvs-fast-export does not already have this property I will fix it 
so it does.  And document that fact.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: I have end-of-lifed cvsps

2013-12-12 Thread Eric S. Raymond
Andreas Krey :
> But anyway, the replacement question is a) how fast the cvs-fast-export is
> and b) whether its output is stable, that is, if the cvs repo C yields
> a git repo G, will then C with a few extra commits yield G' where every
> commit in G (as identified by its SHA1) is also in G', and G' additionally
> contains the new commits that were made to the CVS repo.
> 
> If that is the case you effectively have an incremental mode, except that
> it's not quite as fast.

I am almost certain the output of cvs-fast-export is stable.  I
believe the output of cvsps-3.x was, too.  Not sure about 2.x.

I wrote the output stages for both cvsps-3.x and cvs-fast-export, and
went to some effort to verify that they write streams in the same
"most natural" way - marks sequential from :1, blobs always witten as
late as possible, fileops in the same sort order the git tools emit,
etc.

I have added writing a regression test to verify the stability
property to the TODO list. I will have this nailed down before the
next point release, in a few days.
-- 
    http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: I have end-of-lifed cvsps

2013-12-12 Thread Eric S. Raymond
Martin Langhoff :
> On Wed, Dec 11, 2013 at 11:26 PM, Eric S. Raymond  wrote:
> > You'll have to remind me what you mean by "incremental" here. Possibly
> > it's something cvs-fast-export could support.
> 
> User can
> 
>  - run a cvs to git import at time T, resulting in repo G
>  - make commits to cvs repo
>  - run cvs to git import at time T1, pointed to G, and the import tool
> will only add the new commits found in cvs between T and T1.

No, cvs-fast-export doesn't do that. However, it is fast enough that
you can probably just rebuild the whole repo each time you want to
move content. 

When I did the conversion of groff recently I was getting rates of
about 150 commits a second - and it will be faster now, because I
found an expensive operation in the output stage I could optimize
out.

Now that you have reminded me of this, I remember implementing a -i
option for cvsps-3.0 that could be combined with a time restriction 
to output incremental dumps. It's likely I could do the same
thing for cvs-fast-export.

> The above examples assume that the CVS repos have used "flying fish"
> approach in the "interesting" (i.e.: recent) parts of their history.
> 
> [ Simplifying a bit for non-CVS-geeks -- flying fish is using CVS HEAD
> for your development, plus 'feature branches' that get landed, plus
> long-lived 'stable release' branches. Most CVS projects in modern
> times use flying fish, which is a lot like what the git project uses
> in its own repo, but tuned to CVS's strengths (interesting commits
> linearized in CVS HEAD).
> 
> Other approaches ('dovetail') tend to end up with unworkable messes
> given CVS's weaknesses. ]

That terminology -- "flying fish" and "dovetail" -- is interesting, and
I have not heard it before.  It might be worth putting in the Jargon File.
Can you point me at examples of live usage?
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: I have end-of-lifed cvsps

2013-12-11 Thread Eric S. Raymond
Martin Langhoff :
> On Wed, Dec 11, 2013 at 7:17 PM, Eric S. Raymond  wrote:
> > I tried very hard to salvage this program - the ability to
> > remote-fetch CVS repos without rsync access was appealing
> 
> Is that the only thing we lose, if we abandon cusps? More to the
> point, is there today an incremental import option, outside of
> git-cvsimport+cvsps?

You'll have to remind me what you mean by "incremental" here. Possibly
it's something cvs-fast-export could support.

But what I'm trying to tell you is that, even after I've done a dozen
releases and fixed the worst problems I could find, cvsps is far too
likely to mangle anything that passes through it.  The idea that you
are preserving *anything* valuable by sticking with it is a mirage.

"That bear trap!  It's mangling your leg!"  "But it's so *shiny*..."

> [ I am a bit out of touch with the current codebase but I coded and
> maintained a good part of it back in the day. However naive/limited
> the cvsps parser was, it did help a lot of projects make the leap to
> git... ]

I fear those "lots of projects" have subtly damaged repository
histories, then.  I warned about this problem a year ago; today I
found out it is much worse than I knew then, in fact so bad that I
cannot responsibly do anything but try to get cvsps turfed out of use
*as soon as possible*.

And no, that should *not* wait on cvs-fast-export getting better 
support for "incremental" or any other legacy feature.  Every week
that cvsps remains the git project's choice is another week in which
somebody's project history is likely to get trashed.

This feels very strange and unpleasant.  I've never had to shoot one
of my own projects through the head before.

I blogged about it: http://esr.ibiblio.org/?p=5167

Ignore the malware warning. It's triggered by something else on ibiblio.org;
they're fixing it.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


I have end-of-lifed cvsps

2013-12-11 Thread Eric S. Raymond
On the git tools wiki, the first paragraph of the entry for cvsps now
reads:

  Warning: this code has been end-of-lifed by its maintainer in favor of
  cvs-fast-export. Several attempts over the space of a year to repair
  its deficient branch analysis and tag assignment have failed.  Do not
  use it unless you are converting a strictly linear repository and
  cannot get rsync/ssh read access to the repo masters. If you must use
  it, be prepared to inspect and manually correct the history using
  reposurgeon.

I tried very hard to salvage this program - the ability to
remote-fetch CVS repos without rsync access was appealing - but I
reached my limit earlier today when I actually found time to assemble
a test set of CVS repos and run head-to-head tests comparing cvsps
output to cvs-fast-export output.

I've long believed that cvs-fast-export has a better analyzer
than cvsps just from having read the code for both of them, and having
had to fix some serious bugs in cvsps that have no analogs in
cvs-fast-export.  Direct comparison of the stream outputs revealed
that the difference in quality was larger than I had previously grasped.

Alas, I'm afraid the cvsps repo analysis code turns out to be crap all
the way down on anything but the simplest linear and near-linear
cases, and it doesn't do so hot on even those (all this *after* I
fixed the most obvious bugs in the 2.x version). In retrospect, trying
to repair it was misdirected effort.

I recommend that git sever its dependency on this tool as soon as
possible. I have shipped a 3.13 release with deprecation warnings for
archival purposes, after which I will cease maintenance and redirect
anyone inquiring about cvsps to cvs-fast-export.

(I also maintain cvs-fast-export, but credit for the excellent analysis code 
goes to Keith Packard.  All I did was write the output stage, document
it, and fix a few minor bugs.)
-- 
http://www.catb.org/~esr/";>Eric S. Raymond

You [should] not examine legislation in the light of the benefits it will
convey if properly administered, but in the light of the wrongs it
would do and the harm it would cause if improperly administered
-- Lyndon Johnson, former President of the U.S.


Re: [PATCH] Remove ciabot from contrib

2013-09-26 Thread Eric S. Raymond
Stefan Beller :
> According to
> http://thread.gmane.org/gmane.comp.version-control.git/212649
> Eric, the original author of ciabot, wants the ciabot to
> no longer be included in git.git, hence the removal of the
> whole directory.

Note: I was *not* the original author of the ciabot scripts.  I was
their maintainer (baton passed to me by the original authors) when
the CIA service irrecoverably crashed, and did suggest they be
removed.  (It is however true that I had rewritten the scripts
pretty heavily, enough so to perhaps be considered a coauthor.)

Junio demurred based on some representations that a development team
not including the CIA author had plans to revive the CIA service.  I said
"Wait and see, then" - having the ciabot stuff carried in git was
doing me no harm, I was just doing what I thought was my duty by
suggesting the cleanup.

That was almost exactly a year ago now.  The CIA revival effort has since 
sunk without trace.  In part, this is because I fielded a much simpler 
and properly decentralized replacement called "irker" which is now 
widely enough deployed to have suppressed the demand for CIA.  Repository
hook scripts for irker ship with the irker distribution.

I think enough time has passed that removal would be appropriate.
-- 
    http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: State of CVS-to-git conversion tools (Was: Re: cvsps: bad usage: invalid argument --norc)

2013-04-23 Thread Eric S. Raymond
Ilya Basin :
> For new branches the 'from' command can refer the common ancestor in
> an existing branch. For example:
> 
>      /--E    thebranch
>     /
> A---B---C---D    master
> 
> Commit E is newer than D; we already imported D; thebranch is new.
> Instead of:
> from refs/heads/thebranch^0
> refer the parent as:
> from refs/heads/master^2

Understood.  Do you actually need this much generality in practice, 
or is it a theoretical case?

> OK, something's wrong with the man page: starting with '-A' the
> description is unstructured:

Interesting.  The asciidoc parser got a little confused, but inserting
some blank lines fixed it. 
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: State of CVS-to-git conversion tools (Was: Re: cvsps: bad usage: invalid argument --norc)

2013-04-23 Thread Eric S. Raymond
Apologies for the somewhat belated reply.  I've been even busier than
usual lately and am about to be traveling for a week.

Ilya Basin :
> Hi Eric.
> 
> ESR> cvs-fast-export does not have incremental-import support.
> ESR> Whether git-cvs-import has it depends on which version you have
> ESR> and what backend it is using. I don't maintain that wrapper.
> Did you mean "git-fast-import"? Or do you know any wrapper that
> already uses cvsps3 --fast-export?

No, I meant git-cvs-import.  I wrote a version of it that supports
cvsps3, but Junio chose to keep the old wrapper.  Apparently he would
rather inflict cvsps2's rather serious known bugs on users than break
backward compatibility even a little.  

> First of all, I think cvsps3 has almost everithing required for
> incremental import: one could just take the date of the last commit
> and invoke cvsps with the '-d' flag. However, to import new commits
> into existing branches the stream should contain the "from" command in
> oldest commits in each branch (now missing).
> If the branch already exists in the target git repo, it's easy to
> refer it in the stream:
> from refs/heads/branchname^0

Look at the -i option.  That may do what you need.
 
> But if the branch is new, but its parent commit is already imported,
> I guess the only way to refer to it is by its SHA-1.
> Eric, what parent information can cvsps provide for the first commit
> in a branch, when invoked with the '-d' flag?

At the moment it doesn't provide any at all.  That case wasn't on my
radar when I was fixing the code.  If you can specify a behavior you
think would be useful, I'm listening.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: State of CVS-to-git conversion tools (Was: Re: cvsps: bad usage: invalid argument --norc)

2013-04-18 Thread Eric S. Raymond
Ilya Basin :
> Hi Eric.
> 
> I tried --fast-export. It's 2 times faster.
> The first thing that differs: in cvsps2 commits with adjacent
> timestamps were joined into one (see the attached files). Do you know
> the reason?

The cvsps guy included code to do that. Keith Packard didn't.  
Sorry I can't be more helpful, but that's about all I know.

I didn't write either analysis stage; I understand cvsps's, somewhat,
because I had to fix several nasty bugs in it.  I *don't* understand
cvs-fast-export's analysis stage very well yet, because it has no
obvious bugs that have required me to dive in.  (Keith's notes
document one major bug, which may be inherent to the mismatch between
file- and changeset-orientation and not fixable in the general case,
though I will try.)

> Does this --fast-export thing support what John mentioned, the
> "incremental import support"? Does 'git fast-import' has it?

cvs-fast-export does not have incremental-import support.  Whether
git-cvs-import has it depends on which version you have and what
backend it is using. I don't maintain that wrapper.

> I need it, because full import takes too long.
> The central repo of my employer is CVS, other people commit to it and
> I use git internally to be able to tidy my commit history before
> exporting to CVS.

You are out of luck. That feature was dependent on a very fragile
coupling between the old output format and a bunch of unmaintainably 
horrible Perl in the git-cvs-import wrapper script.  It didn't
work very well; frankly, I'm amazed it worked at all.

The things I had to do to fix the serious bugs in cvsps2 and make it
output a fast-import stream had the side effect of breaking that
coupling. cvsps3 won't give you that feature. Dropping back to cvsps2
to keep that feature will expose you to the cvsps2 bugs.

I'm sorry these tools are such a mess.  I'm trying to fix that, but
it's hard, slow work.  The problems are deeply ugly and the edge cases
have poisoned spikes.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


State of CVS-to-git conversion tools (Was: Re: cvsps: bad usage: invalid argument --norc)

2013-04-14 Thread Eric S. Raymond
Ilya Basin :
> IB> Hi esr.
> IB> In cvsps 3.10 the flag --norc was removed. It broke 'git cvsimport'.
> IB> Please give the option back and write something in the man page like:
> IB> This option has no effect; it is present for compatibility
> 
> Looks like the tool is completely different. I think I'll have to
> downgrade.

Or you could just use 3.x directly rather than through the git-cvsimport
wrapper.  It works better that way - it actually ships a fast-import 
stream that git fast-import can consume directly.

Old cvsps (2.x) was very, very broken; there was a bug in it that
pretty much guaranteed incorrect conversion of branchy repos.  I've
fixed that particular bug, and several other serious ones, along with
adding the import-stream output stage, but I don't really trust the
cvsps code; I think its algorithmic core is weak and klugey and only
works semi-by-accident.

I do *not* recommend downgrading, it will pitch you from a bad
situation into a worse one.  Yes, Junio is still shipping a wrapper
for 2.x, but that was very much against my advice.

I'm also now the maintainer of cvs-fast-export, which used to be Keith 
Packard's utility for lifting the X.org CVS repo to git.  That was before
I resurrected it and added a fast-import stream output stage. 

I *think* cvs-fast-export's algorithms are more correct than cvsps's,
but I have not yet been able to pry loose the time to write the really
rigorous test suite to verify this.  That goal has been second on my
priority list for a couple of months now; I keep wanting to get to it
but having to fight fires on other projects instead.

I wish I could point you at a convertor I really trust.  I can't.
There is a third tool, cvs2git (based on the analyzer from cvs2svn)
that I don't maintain, which has problems of its own. And those three
are about it.

Yes, it's a swamp. The relatively poor capability of the tools isn't
anybody's fault; the problem domain is *nasty*.  I've been working a
closely related but easier one (Subversion stream dump analysis) for a
couple years now and understanding it doesn't make it less ugly.

I think with 4-6 weeks of concentrated attention I could clean up the
mess and deliver a really high-quality converter, and I'm motivated to
do this because I want it as a CVS front end for reposurgeon.  But it
hasn't happened yet and the incompleteness of my test suite is a
blocker in the way.

The topo analysis code in all these tools is really fragile and tends
to break on old edge cases when you try to teach it to handle new
ones, so a good set of regression tests is especially important.  And
doesn't yet exist, though I have built a decent start for cvsps based
on the tests in the git tree.

Do you have enough interest and spare cycles to help finish the test
suite?  Another pair of hands on it might speed things up a lot.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-cvsimport-3 and incremental imports

2013-01-21 Thread Eric S. Raymond
John Keeping :
> > Ah.  OK, that is yet another bug inherited from 2.x - the code doesn't
> > match the documented (and correct) behavior.  Please send me a patch
> > against the cvsps repo, I'll merge it.
> 
> Should now be in your inbox.

Received, merged, tested, and cvsps-3.10 has shipped.
 
> I think the only way to do it without needing to save local state in the
> Git repository would be to teach cvsps to read a table of refs and times
> from its stdin so that we could do something like:
> 
> git for-each-ref --format='%(refname)%09%(*authordate:raw)' refs/heads/ |
> cvsps -i --branch-times-from-stdin |
> git fast-import
> 
> Then cvsps could create a hash table from this and use that to decide
> whether a patch set is interesting or not.

Agreed.  I considered implementing something quite like this before thinking of
the ^0 hack.  But an out-of-band timestamp file is much simpler.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-cvsimport-3 and incremental imports

2013-01-21 Thread Eric S. Raymond
John Keeping :
> I also disagree that cvsps outputs commits *newer* than T since it will
> also output commits *at* T, which is what I changed with the patch in my
> previous message.

Ah.  OK, that is yet another bug inherited from 2.x - the code doesn't
match the documented (and correct) behavior.  Please send me a patch
against the cvsps repo, I'll merge it.

> Perhaps it is simplest to just save a CVS_LAST_IMPORT_TIME file in
> $GIT_DIR and not worry about it any more.

Yes, I think you're right. Trying to carry that information in-band would
probably doom us to all sorts of bug-prone complications.

Thanks for the good analysis.  I wish everybody I had to chase bugs with
could explain them with such clarity and concision.

Sigh. Now I have to figure out if cvsps's behavior can be rescued in Chris
Rorvick's recently-discovered failure case. I'm not optimistic.
-- 
            http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-cvsimport-3 and incremental imports

2013-01-21 Thread Eric S. Raymond
John Keeping :
> But this is nothing more than a sticking plaster that happens to do
> enough in this particular case

I'm beginning to think that's the best outcome we ever get in this
problem domain...

>- if the Git repository happened to be on
> a different branch, the start date would be wrong and too many or too
> few commits could be output.  Git doesn't detect that the commits are
> identical to some that we already have because we're explicitly telling
> it to make a new commit with the specified parent.

Then I don't understand the actual failure case.  Either that or you
don't understand the effect of -i. Have you actually experimented with
it?  The reason I suspect you don't understand the feature is that it
shouldn't make any difference to the way -i works which repository branch is
active at the time of the second import.

Here is how I model what is going on:

1. We make commits to multiple branches of a CVS repo up to some given time T.

2. We import it, ending up with a collection of git branches all of which 
   have tip commits dated T or earlier. And *every* commit dated T or earlier
   gets copied over.

3. We make more commits to the same set of branches in CVS.

4. We now run cvsps -d T on the repo. This generates an incremental
   fast-import stream describing all CVS commits *newer* than T (see
   the cvsps manual page).

5. That stream should consist of a set of disconnected branches, each
   (because of -i) beginning with a root commit containing "from
   refs/heads/foo^0" which says to parent the commit on the tip of
   branch foo, whatever that happens to be.  (I don't have to guess
   about this, I tested the feature before shipping; a sample stream
   fragment follows this list.)

6. Now, when git fast-import interprets that stream in the context of
   the repository produced in step 2, for each branch in the
   incremental dump the branch root commit is parented on the tip
   commit of the same branch in the repo.
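
For concreteness, a fragment of the sort of stream step 5 describes
might look like this (path, author, and message are invented; the
essential thing is the "from" cookie on the branch-root commit, and
the data counts are exact byte lengths):

blob
mark :1
data 13
hello, world
commit refs/heads/master
committer J. Random Hacker <jrh@example.com> 1358700000 +0000
data 22
post-T change on HEAD
from refs/heads/master^0
M 100644 :1 src/foo.c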
 
At step 6, it shouldn't matter at all which branch is active, because
where an incremental branch root gets attached has nothing to do with
which branch is active. 

It is sufficient, to avoid duplicate commits, that cvsps -d 0 -d T and
cvsps -d T run on the same CVS repo operate on *disjoint sets* of CVS
file commits.  I can see this technique possibly getting confused if T
falls in the middle of a changeset where the CVS timestamps for the
file commits are out of order.  But that's the same case that will
fail if we're importing at file-commit granularity, so there's no new
bug here.

Can you explain at what step my logic is incorrect?
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] fixup remaining cvsimport tests

2013-01-20 Thread Eric S. Raymond
> > I probably won't be sending any more patches on this.  My hope was to
> > get cvsimport-3 (w/ cvsps as the engine) in a state such that one
> > could transition from the previous version seamlessly.  But the break
> > in t9605 has convinced me this is not worth the effort--even in this
> > trivial case cvsps is broken.  The fuzzing logic aggregates commits
> > into patch sets that have timestamps within a specified window and
> > otherwise matching attributes.  This aggregation causes file-level
> > commit timestamps to be lost and we are left with a single timestamp
> > for the patch set: the minimum for all contained CVS commits.  When
> > all commits have been processed, the patch sets are ordered
> > chronologically and printed.
> >
> > The problem is that a CVS commit is rolled into a patch set
> > regardless of whether the patch set's timestamp falls within the
> > adjacent CVS file-level commits.  Even worse, since the patch set
> > timestamp changes as subsequent commits are added (i.e., it's always
> > picking the earliest) it is potentially indeterminate at the time a
> > commit is added.  The result is that file revisions can be reordered
> > in the resulting Git import (see t9605).  I spent some time last week
> > trying to solve this but I couldn't think of anything that wasn't a
> > substantial re-work of the code.

I've lost who was who in the comment thread, but I think it is rather likely
that the above diagnosis is correct in every respect.

I won't know for certain until I finish the test suite and apply it to
all three tools (cvsps, cvs2git, cvs-fast-export) but what I've seen
of their code indicates that cvsps has the weakest changeset analysis of
the three, even after my fixes.

> > I have never used cvs2git, but I suspect Eric's efforts in making it a
> > potential backend for cvsimport are a better use of time.

Agreed.  I didn't add multiengine support to cvsimport at random or
just because Heiko Voigt was bugging me about parsecvs.  I was
half-expecting cvsps to manifest a showstopper like this - hoping it
wouldn't, but hedging against the possibility by making alternate
engines easy to plug into git-cvsimport seemed like a *really good
idea* from the beginning of my work on it.  Sometimes being that kind
of right really sucks.

While I am going to have a try at modifying cvsps to make Chris's
t9605 case work, I'm going to strictly limit the amount of time I
spend on that effort since (as you imply) it is fairly likely this
would be throwing good money after bad.

> Fixing this seemed like it would require splitting the processing out
> into a couple phases and would be a fair amount of work, but maybe I'm
> just not looking at the problem right.

Actually I think you've called it *exactly* right.  The job has to be 
done in multiple clique-splitting phases - that's why cvs2git has 7 passes
(though a few of those, perhaps as many as 3, are artifactual).

This is why the next step in my current work plan for CVS-related stuff will
be unbundling my test suite from the cvsps tree and running it to see if
cvs-fast-export dominates cvsps.  

I'm expecting that it will, in which case my plan will be to salvage
the CVS client code out of cvsps (*that* part is quite good - fast,
clean, effective) gluing it to the better analysis stage in
cvs-fast-export, and then shooting cvsps through the head and burying
it behind the barn.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-cvsimport-3 and incremental imports

2013-01-20 Thread Eric S. Raymond
Jonathan Nieder :
> Junio proposed a transition strategy, but I don't think it's fair to
> say he has chosen it without discussion or is imposing it on you.

I have said everything I could about the bad effects of encouraging
people to continue to use cvsps-2.x, it didn't budge Junio an
inch, and I'm tired of fighting about it.  Quibbling about the 
semantics of 'impose' will neither change these facts nor make
me any less frustrated with the outcome.

I will continue to do what I can to make cvsps-3.x and cvs-fast-export as
bug-free as possible, given the innate perverseness of CVS.  They
won't be perfect; they will be *better*.
-- 
        http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-cvsimport-3 and incremental imports

2013-01-20 Thread Eric S. Raymond
John Keeping :
> I don't think there is any way to solve this without giving cvsps more
> information, probably the last commit time for all git branches, but
> perhaps I'm missing a fast-import feature that can help solve this
> problem.

Yes, you are.  The magic incantation is

from refs/heads/<branch>^0

I've just pushed a cvsps-3.9 with an -i option that generates these at
each branch root.  Combine it with -d and you get incremental
fast-export.

You get to integrate this.  I think the transition strategy Junio
has chosen is seriously mistaken, leading to unnecessary grief for users
who will be fooled into thinking it's OK to still use cvsps-2.x. Because
I do not wish to encourage or endorse this mistake and am tired of arguing
against stubborn determination to do the wrong thing, I am not going to 
sink more effort into the git project's end of the CVS-lifting problem.
There are too many better uses for my time.
-- 
        http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


cvs-fast-export release announcement

2013-01-13 Thread Eric S. Raymond
Version 0.2 of the code formerly known as parsecvs has just shipped as
cvs-fast-export.  Project page, with links to documentation and the
public repository, is at <http://www.catb.org/esr/cvs-fast-export/>.

I have some cvsps and reposurgeon patches to merge before I can get
back to the script wrapper.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond

The most foolish mistake we could possibly make would be to permit 
the conquered Eastern peoples to have arms.  History teaches that all 
conquerors who have allowed their subject races to carry arms have 
prepared their own downfall by doing so.
-- Adolph Hitler, April 11 1942.


Re: [PATCH] cvsimport: rewrite to use cvsps 3.x to fix major bugs

2013-01-12 Thread Eric S. Raymond
Michael Haggerty :
> Otherwise, how do we know that cvsps currently works with git-cvsimport?
> (OK, you claim that it does, but in the next breath you admit that
> there is a new failure in "one pathological tagging case".)  How can we
> understand its strengths/weaknesses?  How can we gain confidence that it
> works on different platforms?  How will we find out if a future versions
> of cvsps stops working (e.g., because of a breakage or a
> non-backwards-compatible change)?

You can't.  But in practice the git crew was going to lose that
capability anyway simply because the new wrapper will support three
engines rather than just one.  It's not practical for the git tests to
handle that many variant external dependencies.

However, there is a solution.

The solution is for git to test that the wrapper is *generating the
expected commands*.  So what the git tree ends up with is conditional
assurance; the wrapper will do the right thing if the engine it calls
is working correctly.  I think that's really all the git-tree tests
can hope for.
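
A minimal sketch, in Python, of what such a wrapper test can look like
(the build_engine_command() helper and its flag mapping are invented
for illustration - the only flag taken from real life is cvsps3's
--fast-export - and this is not the actual git-cvsimport code):

def build_engine_command(engine, module, cvsroot, opts):
    """Map wrapper-level options onto an engine command line."""
    if engine == "cvsps":
        cmd = ["cvsps", "--fast-export", "--root", cvsroot]
        if opts.get("since"):
            cmd += ["-d", opts["since"]]
        return cmd + [module]
    raise ValueError("unknown engine: " + engine)

# The git-tree test asserts on the generated argv only; whether the
# engine then converts correctly is the engine test suite's problem.
assert build_engine_command(
    "cvsps", "module", ":local:/var/cvs", {"since": "2013/01/01"}
) == ["cvsps", "--fast-export", "--root", ":local:/var/cvs",
      "-d", "2013/01/01", "module"]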

Michael, the engines are my problem and yours - it's *our*
responsibility to develop a (hopefully shared) test suite to verify
that they convert repos correctly.  I'm working my end as fast as I can;
I hope to have the test suite factored out of cvsps and ready to check 
multiple engines by around Wednesday.  I still need to convert t9604,
too.

I have parsecvs working since yesterday, so we really are up to three
engines.

I have two minor features I need to merge into parsecvs before 
I can start on splitting out the test suite.
-- 
    http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] cvsimport: rewrite to use cvsps 3.x to fix major bugs

2013-01-12 Thread Eric S. Raymond
Junio C Hamano :
> And here is what I got:

Hm. In my version of these tests, I only have one regression from the
old combo (in the pathological tags test, t9602).  You're seeing more
breakage than that, obviously.

> A funny thing was that without cvsps-3.7 on $PATH (which means I am
> getting distro packaged cvsps 2.1), I got identical errors.

That suggests that something in your test setup has gone bad and is
introducing spurious errors. Which would be consistent with the above.

> Looking
> at the log message, it seems that you meant to remove t960[123], so
> perhaps the patch simply forgot to remove 9601 and 9602?

Yes.
 
> As neither test runs "git cvsimport" with -o/-m/-M options, ideally
> we should be able to pass them with and without having cvsps-3.x.
> Not passing them without cvsps-3.x would mean that the fallback mode
> of rewritten cvsimport is not working as expected. Not passing them
> with cvsps-3.x may mean the tests were expecting a wrong conversion
> result, or they uncover bugs in the replacement cvsimport.

That's possible, but seems unlikely.  Because the new cvsimport is
such a thin wrapper around the conversion engine, bugs in it should
lead to obvious crashes or failure to run the engine rather than the 
sort of conversion error the t960* tests are designed to check.  Really
all it does is assemble options to pass to the conversion engines.

My test strategy is aimed at the engine, not the wrapper. I took the
repos from t960*  and wrote a small Python framework to check the same 
assertions as the git-tree tests do, but using the engine.  For example,
here's how my t9602 looks:

import os, cvspstest

cc = cvspstest.ConvertComparison("t9602", "module")
cc.cmp_branch_tree("test of branch", "master", True)
cc.cmp_branch_tree("test of branch", "vendorbranch", True)
cc.cmp_branch_tree("test of branch", "B_FROM_INITIALS", False)
cc.cmp_branch_tree("test of branch", "B_FROM_INITIALS_BUT_ONE", False)
cc.cmp_branch_tree("test of branch", "B_MIXED", False)
cc.cmp_branch_tree("test of branch", "B_SPLIT", True)
cc.cmp_branch_tree("test of tag", "vendortag", False)
# This is the only test new cvsps fails that old git-cvsimport passed.
cc.cmp_branch_tree("test of tag", "T_ALL_INITIAL_FILES", True)
cc.cmp_branch_tree("test of tag", "T_ALL_INITIAL_FILES_BUT_ONE", False)
cc.cmp_branch_tree("test of tag", "T_MIXED", False)
cc.cleanup()
 
> t9600 fails with "-a is no longer supported", even without having
> cvsps-3.x on the $PATH (i.e. attempting to use the fallback).  I
> wonder if this is an option the updated cvsimport would want to
> simply ignore?

Probably.  But I don't think you should keep these tests in the git tree.
That wasn't a great idea even when you were supporting just one engine;
with two (and soon three) it's going to be just silly.  Let sanity-checking
the engines be *my* problem, since I have to do it anyway.

(I'm working towards the generalized test suite as fast as I can.  First
results probably in four days or so.)

> It is a way to tell the old cvsps/cvsimport to disable its
> heuristics to ignore commits made within the last 10 minutes (this
> is done in the hope of waiting for the per-file nature of CVS
> commits to stabilize, IIUC); the user tells the command that he
> knows that the CVS repository is now quiescent and it is safe to
> import the whole thing.

Yes, that's just what -a is supposed to do.  But it should be
irrelevant for testing - in the test framework CVS is running locally, 
so there's no network lag.

> So... does this mean that we now set the minimum required version of
> Python to 2.7?  I dunno.

That would be bad, IMO.  I'll put backporting to 2.6 high on my to-do list.
-- 
http://www.catb.org/~esr/";>Eric S. Raymond
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] cvsimport: rewrite to use cvsps 3.x to fix major bugs

2013-01-11 Thread Eric S. Raymond
Junio C Hamano :
> Yeah, it is OK to _discourage_ its use, but to me it looks like that
> the above is a fairly subjective policy decision, not something I
> should let you impose on the users of the old cvsimport, which you
> do not seem to even treat as your users.

Er.  You still don't seem to grasp the fundamentals of this
situation. I'm not imposing any damn thing on the users.  What's
imposing is the fact that cvsps-2.x and the Perl cvsimport are both
individually and collectively *broken right now*, and within a few
months the Perl git-cvsimport is going to cease even pretending to
work.  I'm trying to *fix that problem* as best I can, fixing it
required two radical rewrites, and criticizing me for not emulating
every last detail and misfeature immediately is every bit as pointless
and annoying as arguing about the fabric on the deck chairs while the
ship is sinking.

To put it bluntly, you should be grateful to be getting back any
functionality at all - because the alternative is that the Perl
git-cvsimport will hang out in your tree as a dead piece of cruft.
Your choice is between making it easy for me replace it with minimum
disruption now and hoping for someone else to replace it months from
now after you've had a bunch of unhappy users bitching at you.

So let me be more direct.  I think the -M and -m options are
sufficiently bad ideas that I am *not willing* to put in the quite
large amount of effort that would be required to implement them in cvsps
or parsecvs.  That would be a bad use of my time.

This is not the case with -o; that might be a good idea if I
understood it. This is also not like the 2.x fallback; I thought that
was a bad idea (because it would be better for users that the
combination break in an obvious way than continue breaking in a silent
one), but it was a small enough effort that I was willing to do it
anyway to keep the git maintainer happy. The effort to fix the quoting
bugs is even easier for me to justify; they are actual bugs.

Those are my engineering judgments; go ahead and call them
"subjective" if you like, but neither the facts nor my judgment will
change on that account.

> The "major" in my sentence was from your description (the bugs you
> fixed), and not about the new ones you still have in this draft.  I
> did not mean to say that you are trading fixes to "major" bugs with
> different "major" bugs.

OK, thank you.  In the future I will try to bear in mind that English
is not your primary language when I evaluate statements that seem a bit
offensive.

So what's your next bid? Note that you can't increase my friction and
hassle costs much more before I give up and let you deal with the
consequences without me. I want to do the right thing, but I have
more other projects clamoring for my attention than you could easily
guess.  I need to get git-cvsimport *finished* and off my hands -
I may already have given it more time than I really should have.

So give me your minimum list of deliverables before you'll merge,
please, and then stick to it.  I assume fixes for the quoting bugs
will be on that list.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: parsecvs has been salvaged

2013-01-11 Thread Eric S. Raymond
Bart Massey :
> Very cool! I'm glad you got it doing what you wanted; I'll be
> interested to see how parsecvs compares in quality and performance to
> cvs2git and cvsps. --Bart

And now it has that -R option and correctly interprets the timezone field.
(I've been busy this morning.)  I'm working on the no-commitids warning now.

Oh, and it now has...actual documentation, too. :-)
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: [PATCH] cvsimport: rewrite to use cvsps 3.x to fix major bugs

2013-01-11 Thread Eric S. Raymond
 was extra work that I took on only because I wanted to
be friendly to the git project, *but*... 

...there is a limit to the amount of what I consider pointless
hoop-jumping that friendliness will buy you, and the 2.x fallback was
already pushing that limit.  Tread a little more gently, Junio; I've
put in a lot of hard, boring work on git-cvsimport over the last two
weeks when I would rather have been doing other things, and my
patience for being nit-picked without appreciation or reward has a
correspondingly low limit.  We'll both be happier if you don't reach
it.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


parsecvs has been salvaged

2013-01-11 Thread Eric S. Raymond
Since Heiko Voigt and others were concerned about this, I report that 
I have successfully salvaged the parsecvs code. I now have it emitting
a correct-looking fast-import stream for my main test repository.

I'm not ready to ship it yet because there are several features I
think it ought to have before I do.  An -R option like cvsps's;
correct interpretation of a third timezone field as in cvsps; and,
most significantly, I want to make sure it emits warnings for important
error and problem conditions like unresolvable tags and absence of
commitids.

But these are all relatively minor issues. It is likely I will be able
to ship early next week, at which point I will add support for
parsecvs as a third engine in the new cvsimport.

This next step in the larger program will be factoring out the cvsps
test suite and applying it to all three of cvsps, cvs2git, and
parsecvs so I can compare results.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>

Americans have the right and advantage of being armed - unlike the citizens
of other countries whose governments are afraid to trust the people with arms.
-- James Madison, The Federalist Papers


Re: [PATCH] Remove the suggestion to use parsecvs, which is currently broken.

2013-01-07 Thread Eric S. Raymond
Junio C Hamano :
> In the longer term, if parsecvs is revived either by Eric or
> somebody else, we will add the mention back to the documentation,
> probably with an updated URL.

I'm working on the revival right now. Repository generation is still
broken, and likely to remain so until I can make the export-stream stage
work, but just a few minutes ago I coaxed it into generating what looks 
like graphviz markup describing a commit graph on standard output.

Even though dot(1) barfs on the markup, this is encouraging. It almost
certainly means that the analysis and parsing stages aren't broken, and
by stubbing out enough functions I can figure out what is being passed
to the broken repository-maker well enough for my purposes.

Actually, I've already figured out how to generate blob and commit-header
markup.  The hard part is generating fileops; I don't quite understand
the generated data structures well enough to do that yet.  But I'm
making progress, and feeling more optimistic than I was yesterday.

In related news, I've sent Michael Haggerty patches that fix the visible
problems with cvs2git that I enumerated in previous mail.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Re: [PATCH] Remove the suggestion to use parsecvs, which is currently broken.

2013-01-06 Thread Eric S. Raymond
Heiko Voigt :
> > I'm parsecvs's maintainer now.  It's not in good shape; there is at
> > least one other known showstopper besides the build issue.  I would
> > strongly prefer to direct peoples' attention away from it until I
> > have time to fix it and cut a release.  This is not a distant 
> > prospect - two or three weeks out, maybe.
> 
> So for this short amount of time you want to change gits documentation?

Yes.  We should not direct people to a tool that plain doesn't work.  

I'll fix parsecvs as soon as I can.  Once I do, I will add support to the
new git-cvsimport to use parsecvs as a conversion engine, alongside
cvsps and cvs2git.

You may not have seen the first version of that patch, so I'll 
explain. The new git-cvsimport can use multiple conversion engines;
each one is expressed as a Python class that knows how to convert
git-cvsimport options to engine options, and how to generate a
command that ships an import stream to standard output.  There's
an -e option that selects an engine.

Currently there are two such classes, one for cvsps and one for cvs2git.
cvsps is the default.  When parsecvs is working, it will be the work of
a few minutes to add a parsecvs class.

The architectural goal here is to make it easy for users of
git-cvsimport to be able to experiment with different engines to
get the best possible conversion, without having to fuss with 
details of the engine invocation.
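
To make that concrete, here is a condensed sketch of the convention;
the method names and the --fast-export spelling are illustrative, not
the exact shipped code:

    class CvspsEngine:
        "Wrapper class for the cvsps 3.x conversion engine."
        def __init__(self):
            self.opts = []
            self.module = ""
        def add_option(self, *words):
            # translate one git-cvsimport option into engine options
            self.opts.extend(words)
        def set_module(self, module):
            self.module = module
        def command(self):
            # a command that ships a fast-import stream to stdout
            return " ".join(["cvsps", "--fast-export"]
                            + self.opts + [self.module])

    # -e selects one of these; cvs2git and (later) parsecvs plug in alongside
    ENGINES = {"cvsps": CvspsEngine}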

> Is this hint causing you trouble? Are there many people asking for
> support because of that?

No.  But as a matter of principle I am against having documentation
tell pretty lies, even temporarily. It's bad craftsmanship and bad
faith to do that.
 
> There is no README so I am not sure how the tests are supposed to be
> built in general. Due to the lack of documentation it's probably easier
> for you Eric to port my tests.

At the present state of things, I agree.  I have been so busy fighting other
aspects of this problem that I have not yet had time to separate the
test suite from the cvsps code and document it properly.

> The structure of my tests is quite simple:
> 
>   t/  - All the tests
>   t/cvsroot - A cvs module per test
>   t/t[0-9]{4}*/expect - The expected cvsps output
> 
> You can copy the cvs repository modules and convert the expected cvsps
> output to whatever output you want to test against. It's the found
> changeset ordering that is interesting.

Noted.  I have a copy and will port them.

> The fix was never clean and AFAIR the reason behind that was that the
> breakage in commit ordering is not easy to fix in cvsps.

Understood. But it's better than no fix at all.

>   That, and the fact that there are other working tools out there,
> was the reason why I stopped working on fixing cvsps.

Once I have all three tools working and can run them against a common
test suite, several interesting possibilities will open up.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: [PATCH] Alphabetize the fast-import options, following a suggestion on the list.

2013-01-05 Thread Eric S. Raymond
Jonathan Nieder :
> But in fact the current options list doesn't seem to be well organized at all.

I agree.

> What do you think would be a logical way to group these?
> 
>  Features of input syntax
> 
>   --date-format
>   --done
> 
>  Verbosity
> 
>   --quiet
>   --stats
> 
>  Marks handling (checkpoint/restore)
> 
>   --import-marks
>   --import-marks-if-exists
>   --export-marks
>   --relative-marks
> 
>  Semantics of execution
> 
>   --dry-run
>   --force
>   --cat-blob-fd
>   --export-pack-edges
> 
>  Tuning
> 
>   --active-branches
>   --max-pack-size
>   --big-file-threshold
>   --depth

That would work as well as or better than any other organization I can
think of.  And that matters: my work on surgery tools and exporters
means I've had to consult this page a *lot*.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


[PATCH] Alphabetize the fast-import options, following a suggestion on the list.

2013-01-05 Thread Eric S. Raymond
ing projects
-   whose total object set exceeds the 4 GiB packfile limit,
-   as these commits can be used as edge points during calls
-   to 'git pack-objects'.
-
 --quiet::
Disable all non-fatal output, making fast-import silent when it
is successful.  This option disables the output shown by
\--stats.
 
+--relative-marks::
+   After specifying --relative-marks the paths specified
+   with --import-marks= and --export-marks= are relative
+   to an internal directory in the current repository.
+   In git-fast-import this means that the paths are relative
+   to the .git/info/fast-import directory. However, other
+   importers may use a different location.
+
 --stats::
Display some basic statistics about the objects fast-import has
created, the packfiles they were stored into, and the
-- 
1.8.1



-- 
Eric S. Raymond <http://www.catb.org/~esr/>

"Gun control" is a job-safety program for criminals.


Re: [PATCH] git-fast-import(1): remove duplicate "--done" option

2013-01-05 Thread Eric S. Raymond
John Keeping :
> I'm guessing that the reason the option was documented again (in commit
> 3266de10) is because the options are not in an obvious order.  There
> does seem to be some grouping of the options by type, but without
> subheadings I wonder if it would make more sense to just put them all in
> alphabetical order?

+1

This duplication originated with me. I'll apologize with a 
reordering patch.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: cvsps, parsecvs, svn2git and the CVS exporter mess

2013-01-05 Thread Eric S. Raymond
Bart Massey :
> I don't know what Eric Raymond "officially end-of-life"-ing parsecvs means?

You and Keith handed me the maintainer's baton.  If I were to EOL it,
that would be the successor you two designated judging in public that
the code is unsalvageable or has become pointless.  If you wanted to
exclude the possibility that a successor would make that call, you
shouldn't have handed it off in a state so broken that I can't even
test it properly.

But I don't in fact think the parsecvs code is pointless. The fact that it
only needs the ,v files is nifty and means it could be used as an RCS
exporter too.  The parsing and topo-analysis stages look like really
good work, very crisp and elegant (which is no less than I'd expect
from Keith, actually).

Alas, after wrestling with it I'm beginning to wonder whether the
codebase is salvageable by anyone but Keith himself.  The tight coupling
to the git cache mechanism is the biggest problem.  So far, I can't
figure out what tree.c is actually doing in enough detail to fix it or pry
it loose - the code is opaque and internal documentation is lacking.

More generally, interfacing to the unstable API of libgit was clearly
a serious mistake, leading directly to the current brokenness.  The
tool should have emitted an import stream to begin with.  I'm trying
to fix that, but success is looking doubtful.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: cvsps, parsecvs, svn2git and the CVS exporter mess

2013-01-05 Thread Eric S. Raymond
Max Horn :
> Hm, you snipped this part of Michael's mail:
> 
> >> However, if that is a
> >> problem, it is possible to configure cvs2git to write the blobs inline
> >> with the rest of the dumpfile (this mode is supported because "hg
> >> fast-import" doesn't support detached blobs).
> 
> I would call "hg fast-import" a main potential customer, given that
> "cvs2hg" is another part of the cvs2svn suite. So I can't quite see how you 
> can come to your conclusion above...

Perhaps I was unclear.  I consider the interface design error to
be not in the fact that all the blobs are written first or detached,
but rather that the implementation detail of the two separate journal
files is ever exposed.

I understand why the storage of intermediate results was done this
way, in order to decrease the tool's working set during the run, but
finishing by automatically concatenating the results and streaming
them to stdout would surely have been the right thing here.
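
To make the contrast concrete, a minimal sketch (in the Python 2 of the
period; run_analysis() is a stand-in of mine, not cvs2git's real
internals) of the finishing step I mean:

    import os, shutil, sys, tempfile

    def emit_stream(run_analysis):
        "Run the analysis, then concatenate its journals to stdout and clean up."
        blobfd, blobfile = tempfile.mkstemp()
        dumpfd, dumpfile = tempfile.mkstemp()
        os.close(blobfd)
        os.close(dumpfd)
        try:
            run_analysis(blobfile, dumpfile)   # writes the two journal files
            for path in (blobfile, dumpfile):
                with open(path, "rb") as fp:
                    shutil.copyfileobj(fp, sys.stdout)
        finally:
            os.remove(blobfile)
            os.remove(dumpfile)

Do that inside the tool and the two journal files become an invisible
implementation detail; every caller gets a plain pipe.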
 
The downstream cost of letting the journalling implementation be
exposed, instead, can be seen in this snippet from the new git-cvsimport
I've been working on:

    def command(self):
        "Emit the command implied by all previous options."
        return ("(cvs2git --username=git-cvsimport --quiet --quiet "
                "--blobfile={0} --dumpfile={1} {2} {3} "
                "&& cat {0} {1} && rm {0} {1})").format(
                    tempfile.mkstemp()[1], tempfile.mkstemp()[1],
                    self.opts, self.modulepath)

According to the documentation, every caller of cvs2git must go
through analogous contortions!  This is not the Unix way; if Unix
design principles had been minimally applied, that second line would
just read like this:

        return "cvs2git --username=git-cvsimport --quiet --quiet"

If Unix design principles had been thoroughly applied, the "--quiet
--quiet" part would be unnecessary too - well-behaved Unix commands
*default* to being completely quiet unless either (a) they have an
exceptional condition to report, or (b) their expected running time is
so long that tasteful silence would leave users in doubt that they're
working.

(And yes, I do think violating these principles is a lapse of taste when
git tools do it, too.)

Michael Haggerty wants me to trust that cvs2git's analysis stage has
been fixed, but I must say that is a more difficult leap of faith when
two of the most visible things about it are still (a) a conspicuous
instance of interface misdesign, and (b) documentation that is careless and
incomplete.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


All is proceeding as I have foreseen

2013-01-04 Thread Eric S. Raymond
From the #irker channel on freenode:

[14:52] TkTech  esr: Oh, and I was talking to scanlime earlier since the 
"Ilkotech" kids messed up again. She'll be putting up a landing page on her 
server listing alternatives, I gave her irker, kgb, and mine.
[14:52] esr "The Ilkotech kids messed up again"? What'd they do? And what'd 
they do the last time?
[14:53] TkTech  They haven't paid their hosting bills so their host is down, 
not that they've done anything.
[14:53] TkTech  I talked to them and they had about 10 minutes of interest in 
keeping cia.vc alive, then moved on.

CIA is dead beyond recall. I wrote the code in contrib/ciabot/ and I
think it should be removed.  Sometime soon I'll ship another patch
deleting it.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>

The most foolish mistake we could possibly make would be to permit 
the conquered Eastern peoples to have arms.  History teaches that all 
conquerors who have allowed their subject races to carry arms have 
prepared their own downfall by doing so.
-- Adolph Hitler, April 11 1942.


Re: cvsps, parsecvs, svn2git and the CVS exporter mess

2013-01-03 Thread Eric S. Raymond
equired by
> the level of coupling.

I don't think this will be a problem.  You own the copyright on your tests and
I own it on mine, so we can relicense under whatever common license we choose.
I'm not fussy about what we use; ASL 2.0 would be fine by me.

> * I don't have a lot of time to work on the integration.  cvs2svn has
> long been at a level of maturity where it doesn't need much care and
> feeding, and I would like to keep it that way :-)  Nowadays I am far
> more interested in working on the git project with my little available
> open-sourcin' time.

I don't want to spend the rest of my life on the CVS-lifting problem either.
My present plans envision intense work on it for another three weeks or
so, after which I expect we'll be at a relatively stable and low-maintenance
state. 

FYI, here are my agenda items in roughly the order I expect to finish them:

1. Write test coverage for incremental imports.
2. Ship version 2 of the git-cvsimport replacement patch (with the fallback 
   option Junio requested) to the git list.
3. Get parsecvs to a non-broken state and ship a release
4. Ship a patch for git-cvsimport that adds the option to use parsecvs 
   as a conversion engine.
5. Break the test suite out of cvsps, give it its own public repo, document
   it, and hand you the keys.
6. Fix the interface-design bug(s) in cvs2git, and its documentation.
7. Torture-test all three tools (cvsps, parsecvs, cvs2git) against the
   new suite.
8. Make a judgment about whether I should EOL cvsps or parsecvs or both.

I have other commitments, so this will take a bit longer than it might
have.  I expect to be at step 8 in roughly a month (early February).
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: [PATCH] Replace git-cvsimport with a rewrite that fixes major bugs.

2013-01-02 Thread Eric S. Raymond
Martin Langhoff :
> I dealt with enough CVS repos to see that the branch point could be
> ambiguous, and that some cases were incurably ugly and ambiguous.

You are quite right, but you have misinterpreted the subject of my
confidence.  I am under no illusion that the new cvsimport/cvsps 
pair is a perfect solution to the CVS-lifting problem, nor even that
such a solution is possible.

> My best guess is that you haven't dealt with enough ugly CVS repos. I
> used to have the old original X.org repos, but no more. Surely
> Mozilla's fugly old CVS repos are up somewhere, and may be
> therapeutic.

Thanks, but since I wrote reposurgeon in 2010 I've done more conversions
of messy CVS and Subversion repositories than I can easily remember (the
Subversion ones being relevant because they often have truly nasty CVS
artifacts in their early history).  Just off the top of my head there's
been gpsd, the Network Utility Tools, Roundup, SSTK2000, the Hercules 
project, and robotfindskitten.  And a raft of smaller projects - I sought
them out as torture tests for reposurgeon.

I am therefore intimately, painfully familiar with how bad CVS repos
can get.  I take it as given that there are still boojums that will
perplex my tools lurking out there in the unexplored jungle.

In fact, this very kind of prior experience had been a major
motivation for reposurgeon.  I became convinced several years back
that the batchy design philosophy of conventional repo-conversion
tools was flawed, not flexible enough to deal with the real-world
messes out there.  So I wrote reposurgeon to amplify human judgment
rather than try to replace it.
 
An example of the batchiness mistake close to home is the -m and -M
options in the old version of cvsimport.  It takes human judgment
looking at the whole commit DAG in gitspace to decide what merge
points would best express the (as you say, sometimes ambiguous) CVS
history - what's needed is a scalpel and sutures in a surgeon's hand,
not a regexp hammer.

For extended discussion, see my blog post "Repositories In
Translation" at http://esr.ibiblio.org/?p=3859 in which I argue that
the process has much more in common with the ambiguity of literary
translation than is normally understood.

No, what I am very confident about is the performance and stability of
the new cvsps/cvsimport code on *the cases the old code handled* - and
a fairly well-defined larger group of many more cases.

My confidence is derived from having built a test suite that
incorporates and improves on the git-tree tests. I don't have to merely
guess or hope that the new code works better; I can exhibit tests
that demonstrate it.

Among my near-term to-do items are applying those tests to cvs2git and
parsecvs.  But I first need to get parsecvs working again; presently, as I've
inherited it, it does not correctly create a HEAD reference in the
translated git repo.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: [PATCH] Replace git-cvsimport with a rewrite that fixes major bugs.

2013-01-02 Thread Eric S. Raymond
Junio C Hamano :
> As your version already knows how to detect the case where cvsps is
> too old to operate with it, I imagine it to be straight-forward to
> ship the old cvsimport under obscure name, "git cvsimport--old" or
> something, and spawn it from your version when necessary, perhaps
> after issuing a warning "cvsps 3.0 not found; switching to an old
> and unmaintained version of cvsimport..."

This can be done.  As this may not be the last case in which it comes up,
perhaps we should have an 'obsolete' directory distinct from 'contrib'.

I'll ship another patch.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: [PATCH] Replace git-cvsimport with a rewrite that fixes major bugs.

2013-01-02 Thread Eric S. Raymond
Martin Langhoff :
> Replacement with something more solid is welcome, but until you are
> extremely confident of its handling of legacy setups... I would still
> provide the old cvsimport, perhaps in contrib.

I am extremely confident.  I built a test suite so I could be.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: [PATCH] Replace git-cvsimport with a rewrite that fixes major bugs.

2013-01-02 Thread Eric S. Raymond
Jonathan Nieder :
> The former is already loudly advertised in the package description and
> manpage, at least lets you get work done, and works fine for simple
> repositories with linear history.

Two of the three claims in this paragraph are false.  The manual page
does not tell you what is true, which is that old cvsps will fuck up
every branch by putting the root point at the wrong place.  And if you
call silently and randomly damaging imports getting work done, your
definitions of "work" and "done" are broken.

> Taking away a command that people have been using in everyday work is
> pretty much a textbook example of a regression, no?

That would be, but we are talking about replacing total breakage with
a git-cvsimport that actually works and that you invoke in pretty much the
same way as the old one.  Nothing is or will be taken away.

In any case, once the distros package cvsps 3.x, old cvsimport will terminate
with an error return: when cvsps-3.x sees the obsolete option that old
git-cvsimport passes it, it terminates after displaying a prompt telling
the user to upgrade.

The most we can accomplish by being "conservative" is to lengthen the
window during which people will falsely believe that their conversion
process is working.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: [PATCH] Replace git-cvsimport with a rewrite that fixes major bugs.

2013-01-02 Thread Eric S. Raymond
Jonathan Nieder :
> Speaking with my Debian packager hat on: the updated cvsps is not
> available in Debian.  "git cvsimport" is, and it has users that report
> bugs from time to time.  With this change, I would either have to take
> on responsibility for maintenance of the cvsps package (not going to
> happen) or drop "git cvsimport".  That's a serious regression.

How does going from "it silently damages imports" to "it fails with
an error message" constitute a regression?
 
> The moment someone takes care of packaging the updated cvsps, I'll
> stop minding, though. ;-)

I'll ping the Debian QA group.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Test failures with python versions when building git 1.8.1

2013-01-01 Thread Eric S. Raymond
Junio C Hamano :
> Dan McGee  writes:
> 
> > A test case snuck in this release that assumes /usr/bin/python is
> > python2 and causes test failures. Unlike all other tests and code
> > depending on python, this one does not respect PYTHON_PATH, which we
> > explicitly set when building git on Arch Linux due to python2 vs
> > python3 differences.
> 
> I had an impression that you are not supposed to run our scripts
> with python3 yet (no python scripts have been checked for python3
> compatibility), even though some people have expressed interests in
> doing so.
> 
> Eric?

Yeah, git's stuff is nowhere even *near* python3 ready.

I have it on my to-do list to run 2to3 on the in-tree scripts as a
diagnostic, but I haven't had time to do that yet...mainly because
cvsps/cvsimport blew up in my face when I poked at them.

Now I need to go beat parsecvs into shape and run both it and cvs2git
against the CVS torture tests I'm developing, so the 2to3 check won't
happen for a week or two at least. Sorry.

As a general thing, I wouldn't advise worrying too much about python3
compatibility.  That version is not gaining adoption very fast, mainly
due to the rat's nest around plain strings vs. UTF-8 which can make
code conversion a serious pain in the ass.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: [PATCH] Replace git-cvsimport with a rewrite that fixes major bugs.

2013-01-01 Thread Eric S. Raymond
Junio C Hamano :
> So..., is this a flag-day patch?
> 
> After this is merged, users who have been interoperating with CVS
> repositories with the older cvsps have to install the updated cvsps
> before using a new version of Git that ships with it?

Yes, they must install an updated cvsps. But this is hardly a loss, as
the old version was perilously broken.

There was an error or typo in the branch-analysis code, dating from
2006 and possibly earlier, that meant that branch root points would
almost always be attributed to parent patchsets one patchset earlier
than they should have been.  Shocked me when I found it - how was this
missed for six years?

Because of the way the analysis is done, this fundamental bug would
also cause secondary damage like file changes near the root point
getting attributed to the wrong branch.  In fact, this is how I
first spotted the problem; my test suite exhibited this symptom.
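
I can't compress the real analysis code into a mail, but a toy model of
the *shape* of the bug (invented data, not the cvsps logic itself) may
help:

    # patchset timestamps on the parent branch, in order
    parent = [100, 200, 300, 400]
    branch_cut = 310    # the branch was really rooted after patchset 300

    def root_buggy(sets, cut):
        for i, ps in enumerate(sets):
            if ps > cut:
                return sets[i - 2]   # off by one: walks back too far
        return sets[-1]

    def root_fixed(sets, cut):
        for i, ps in enumerate(sets):
            if ps > cut:
                return sets[i - 1]   # the true branch point
        return sets[-1]

    print(root_buggy(parent, branch_cut))   # 200 - one patchset too early
    print(root_fixed(parent, branch_cut))   # 300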

And mind you this is on top of ancestry-branch tracking not working -
two separate bugs that could interact in ways I'd really rather not
think about.  The bottom line is that every import of a branchy CVS
repo with a pre-3.x version of cvsps is probably wrong.

The old git-cvsimport code was doing its part to screw things up, too.
At least three of the bugs on its manual page are problems I couldn't
reproduce using a bare cvsps instance, even the old broken version.

> As long as
> they update both cvsps and cvsimport, they can continue using the
> existing repository to get updates from the same upstream CVS
> repository without losing history continuity?

Yes, but in that case I would strongly advise re-importing the entire
CVS history, as the portion analyzed with 2.2b1 and earlier versions
of cvsps will almost certainly have been somewhat garbled if it
contains any branches.

> I would have preferred an addition of "git cvsimport-new" (or rename
> of the existing one to "git cvsimport-old"), with additional tests
> that compare the results of these two implementations on simple CVS
> history that cvsimport-old did *not* screw up, to ensure that (1)
> people with existing set-up can choose to keep using the old one,
> perhaps by tweaking their process to use cvsimport-old, and (2) the
> updated one will give these people the identical conversion results,
> as long as the history they have been interacting with do not have
> the corner cases that trigger bugs in older cvsps.
> 
> Or am I being too conservative?

I think you are being too conservative.  This patch is *not* a mere
feature upgrade. The branch-analysis bug I found three days ago is not
a minor problem; it is a big ugly showstopper for any case besides the
simplest linear histories.  Only linear histories will not break.

'People with existing set-ups' should absolutely *not* 'keep using the
old one'; we should yank that choice away from them and get the old
cvsimport/cvsps pair out of use *as fast as possible*, because it
silently mangles branchy imports.

Accordingly, giving people the idea that it's OK to use old and new
versions in parallel would be an extremely bad idea.  I would go so
far as to call it irresponsible.

Here is what I have done to ease the transition:

If you try to use old git-cvsimport with new cvsps, new cvsps will detect
this and ship a message to stderr telling you to upgrade.

If you try to use new git-cvsimport with old cvsps, old cvsps will complain
of an invalid argument and git-cvsimport will quit.
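
On the wrapper side the check can be sketched like so; the --version
probe and its output format here are illustrative assumptions of mine,
not the exact interface the shipped pair uses:

    import re, subprocess, sys

    def require_cvsps3():
        "Bail out early unless a 3.x cvsps is on the PATH."
        try:
            pipe = subprocess.Popen(["cvsps", "--version"],
                                    stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT)
            out = pipe.communicate()[0]
        except OSError:
            sys.stderr.write("git cvsimport: cvsps not found.\n")
            sys.exit(1)
        m = re.search(r"(\d+)\.", out)
        if not (m and int(m.group(1)) >= 3):
            sys.stderr.write("git cvsimport: cvsps 3.x required; "
                             "please upgrade.\n")
            sys.exit(1)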

As for testing...cvsps now has several dozen self-tests on five
different CVS repositories, including improved versions of the
t960[123] tests.  I will keep developing these as I work on bringing
parsecvs up to snuff.

I don't think there is a lot of point in git-cvsimport having its own
tests any more.  If you read it I think you'll see why; it's a much
thinner wrapper around the conversion engine(s) than it used to be. In
particular, it no longer does its own protocol transactions to the
CVS server.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


cvsps import failure

2012-12-30 Thread Eric S. Raymond
Chris Rorvick :
> I tried the new version and found I'm unable to import via pserver:

And now I know why.  One of the cvsps fix patches I merged from Yann
Dirson's collection changed the --root option parsing in an
incompatible way.  As soon as I figure out what it's doing I'll
either revert it or document the new behavior.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>

The price of liberty is, always has been, and always will be blood.  The person
who is not willing to die for his liberty has already lost it to the first
scoundrel who is willing to risk dying to violate that person's liberty.  Are
you free?   -- Andrew Ford


Re: Heads up, an emergency fix for git-cvsimport is coming shortly

2012-12-30 Thread Eric S. Raymond
Chris Rorvick :
> I tried the new version and found I'm unable to import via pserver:
> 
>   $ ./cvsps --root :pserver:me@localhost:/cvsroot module
>   cvsps: connect error: Connection refused
>   cvsps: can't get CVS log data: Connection refused
> 
> Running 2.2b1 (the version packaged w/ Fedora 17) with the same
> arguments with the addition of --cvs-direct connects OK.  I haven't
> taken much time to look into this, so I might be doing something dumb.
>  Thought I'd find out if this is a known issue before delving into it.

Your problem does reproduce here. This paragraph from the output of 
'aptitude show cvs' may be relevant:

 This package contains a CVS binary which can act as both client and server,
 although there is no CVS dæmon; to access remote repositories, please use
 :extssh: not :pserver: any more.

It's therefore possible there's something slightly busted about the pserver 
method at the CVS end, and the 3.[23] code trips over it even though the old
code did not.  Note that new cvsps uses cvs-direct mode all the time; the old
support for fetching logs through local CVS commands is gone.

I use 

  cvsps --root :local:$PWD/repo module

for my testing, and that works. I'm up to my ears in finishing up the
test suite and tracking bugs in the repo-analysis code; if you want to
speed the process up, try running a :pserver: fetch with -v on under
both old and new code to see how the protocol transactions differ.

> Also, I'm curious what impact removing the caching from cvsps will
> have on incremental imports.  Is there any?

Not that I know of.  The caching was a performance hack for human viewing
of changesets.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Heads up, an emergency fix for git-cvsimport is coming shortly

2012-12-30 Thread Eric S. Raymond
Bad news: the combination of cvsps and the existing git-cvsimport
script is seriously broken in both places.  This morning I fixed a
nasty bug in cvsps's branch detection and shipped 3.3. This is a
different bug from the broken (and now removed) ancestry-branch
tracking.

Good news: I have fixed all the urgent bugs (and now you know how I
spent my holidays).  Somewhat to my surprise, half the problems listed
on the git-cvsimport manual page turned out to be problems in
git-cvsimport itself, not more cvsps lossage. Those bugs are dead.

cvsps is now much better about warning when it cannot translate a tag
or sees a dubious branch structure.  I've also enhanced git-cvsimport
to have an engine switch so it can optionally use cvs2git as its 
conversion engine. If and when I can get parsecvs back into working
shape, I will add it to the set of supported engines.

I have a test suite that proves fixes for all the urgent problems, but
that needs a bit more work before I'm willing to call it done.

In a few days I will ship a patch that replaces git-cvsimport with a
working version and removes the t960[123] tests from the git tree.
Those are not actually tests of git-cvsimport itself but of the
underlying conversion engine, and now form about half of cvsps's own
regression-test suite.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>

It is proper to take alarm at the first experiment on our
liberties. We hold this prudent jealousy to be the first duty of
citizens and one of the noblest characteristics of the late
Revolution. The freemen of America did not wait till usurped power had
strengthened itself by exercise and entangled the question in
precedents. They saw all the consequences in the principle, and they
avoided the consequences by denying the principle. We revere this
lesson too much ... to forget it-- James Madison.


Re: [PATCH] Remove the suggestion to use parsecvs, which is currently broken.

2012-12-28 Thread Eric S. Raymond
Heiko Voigt :
> Maybe you could add that information to the parsecvs compile
> instructions? I think just because it takes some effort to compile does
> not justify removing this useful pointer here. When I was converting a
> legacy cvs repository this pointer would have helped me a lot.

I'm parsecvs's maintainer now.  It's not in good shape; there is at
least one other known showstopper besides the build issue.  I would
strongly prefer to direct peoples' attention away from it until I
have time to fix it and cut a release.  This is not a distant 
prospect - two or three weeks out, maybe.

The priority that is between me and fixing parsecvs is getting (a)
cvsps and git-cvsimport to a non-broken state, and (b) having a sound
test suite in place so I *know* it's in a non-broken state. As previously
discussed, I will then apply that test suite to parsecvs.

Heiko, you can speed up the process by (a) adapting your tests for
the new cvsps test code, and (b) merging the fix you wrote so cvsps
would pass the t9603 test.  

The sooner I can get that out of the way, the sooner I will be able
to pay serious attention to parsecvs.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


[no subject]

2012-12-28 Thread Eric S. Raymond
From: "Eric S. Raymond" 
Date: Fri, 28 Dec 2012 11:40:59 -0500
Subject: [PATCH] Add checks to Python scripts for version dependencies.

---
 contrib/ciabot/ciabot.py   | 8 +++-
 contrib/fast-import/import-zips.py | 7 ++-
 contrib/hg-to-git/hg-to-git.py | 5 +
 contrib/p4import/git-p4import.py   | 5 +
 contrib/svn-fe/svnrdump_sim.py | 4 
 git-p4.py  | 8 +++-
 git-remote-testgit.py  | 5 +
 git_remote_helpers/git/__init__.py | 5 +
 8 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/contrib/ciabot/ciabot.py b/contrib/ciabot/ciabot.py
index bd24395..81c3ebd 100755
--- a/contrib/ciabot/ciabot.py
+++ b/contrib/ciabot/ciabot.py
@@ -47,7 +47,13 @@
 # we default to that.
 #
 
-import os, sys, commands, socket, urllib
+import sys
+if sys.hexversion < 0x02000000:
+    # The limiter is the xml.sax module
+    sys.stderr.write("ciabot.py: requires Python 2.0.0 or later.\n")
+    sys.exit(1)
+
+import os, commands, socket, urllib
 from xml.sax.saxutils import escape
 
 # Changeset URL prefix for your repo: when the commit ID is appended
diff --git a/contrib/fast-import/import-zips.py 
b/contrib/fast-import/import-zips.py
index 82f5ed3..b989941 100755
--- a/contrib/fast-import/import-zips.py
+++ b/contrib/fast-import/import-zips.py
@@ -9,10 +9,15 @@
 ##  git log --stat import-zips
 
 from os import popen, path
-from sys import argv, exit
+from sys import argv, exit, hexversion, stderr
 from time import mktime
 from zipfile import ZipFile
 
+if hexversion < 0x01060000:
+    # The limiter is the zipfile module
+    stderr.write("import-zips.py: requires Python 1.6.0 or later.\n")
+    exit(1)
+
 if len(argv) < 2:
print 'Usage:', argv[0], '...'
exit(1)
diff --git a/contrib/hg-to-git/hg-to-git.py b/contrib/hg-to-git/hg-to-git.py
index 046cb2b..232625a 100755
--- a/contrib/hg-to-git/hg-to-git.py
+++ b/contrib/hg-to-git/hg-to-git.py
@@ -23,6 +23,11 @@ import os, os.path, sys
 import tempfile, pickle, getopt
 import re
 
+if sys.hexversion < 0x02030000:
+    # The behavior of the pickle module changed significantly in 2.3
+    sys.stderr.write("hg-to-git.py: requires Python 2.3 or later.\n")
+    sys.exit(1)
+
 # Maps hg version -> git version
 hgvers = {}
 # List of children for each hg revision
diff --git a/contrib/p4import/git-p4import.py b/contrib/p4import/git-p4import.py
index b6e534b..593d6a0 100644
--- a/contrib/p4import/git-p4import.py
+++ b/contrib/p4import/git-p4import.py
@@ -14,6 +14,11 @@ import sys
 import time
 import getopt
 
+if sys.hexversion < 0x02020000:
+    # The behavior of the marshal module changed significantly in 2.2
+    sys.stderr.write("git-p4import.py: requires Python 2.2 or later.\n")
+    sys.exit(1)
+
 from signal import signal, \
SIGPIPE, SIGINT, SIG_DFL, \
default_int_handler
diff --git a/contrib/svn-fe/svnrdump_sim.py b/contrib/svn-fe/svnrdump_sim.py
index 1cfac4a..95a80ae 100755
--- a/contrib/svn-fe/svnrdump_sim.py
+++ b/contrib/svn-fe/svnrdump_sim.py
@@ -7,6 +7,10 @@ to the highest revision that should be available.
 """
 import sys, os
 
+if sys.hexversion < 0x02040000:
+    # The limiter is the ValueError() calls. This may be too conservative
+    sys.stderr.write("svnrdump-sim.py: requires Python 2.4 or later.\n")
+    sys.exit(1)
 
 def getrevlimit():
 var = 'SVNRMAX'
diff --git a/git-p4.py b/git-p4.py
index 551aec9..69f1452 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -8,7 +8,13 @@
 # License: MIT <http://www.opensource.org/licenses/mit-license.php>
 #
 
-import optparse, sys, os, marshal, subprocess, shelve
+import sys
+if sys.hexversion < 0x02040000:
+    # The limiter is the subprocess module
+    sys.stderr.write("git-p4: requires Python 2.4 or later.\n")
+    sys.exit(1)
+
+import optparse, os, marshal, subprocess, shelve
 import tempfile, getopt, os.path, time, platform
 import re, shutil
 
diff --git a/git-remote-testgit.py b/git-remote-testgit.py
index 5f3ebd2..91faabd 100644
--- a/git-remote-testgit.py
+++ b/git-remote-testgit.py
@@ -31,6 +31,11 @@ from git_remote_helpers.git.exporter import GitExporter
 from git_remote_helpers.git.importer import GitImporter
 from git_remote_helpers.git.non_local import NonLocalGit
 
+if sys.hexversion < 0x01050200:
+    # os.makedirs() is the limiter
+    sys.stderr.write("git-remote-testgit: requires Python 1.5.2 or later.\n")
+    sys.exit(1)
+
 def get_repo(alias, url):
 """Returns a git repository object initialized for usage.
 """
diff --git a/git_remote_helpers/git/__init__.py 
b/git_remote_helpers/git/__init__.py
index e69de29..1dbb1b0 100644
--- a/git_remote_helpers/git/__init__.py
+++ b/git_remote_helpers/git/__init__.py
@@ -0,0 +1,5 @@
+import sys
+if sys.hexversion < 0x02040000:
+   

[PATCH] Remove the suggestion to use parsecvs, which is currently broken.

2012-12-28 Thread Eric S. Raymond
The parsecvs code has been neglected for a long time, and the only
public version does not even build correctly.  I have been handed
control of the project and intend to fix this, but until I do it
cannot be recommended.

Also, the project URL given for Subversion needed to be updated
to follow their site move.
---
 Documentation/git-cvsimport.txt | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-cvsimport.txt b/Documentation/git-cvsimport.txt
index 98d9881..9d5353e 100644
--- a/Documentation/git-cvsimport.txt
+++ b/Documentation/git-cvsimport.txt
@@ -213,11 +213,9 @@ Problems related to tags:
 * Multiple tags on the same revision are not imported.
 
 If you suspect that any of these issues may apply to the repository you
-want to import consider using these alternative tools which proved to be
-more stable in practice:
+want to import, consider using cvs2git:
 
-* cvs2git (part of cvs2svn), `http://cvs2svn.tigris.org`
-* parsecvs, `http://cgit.freedesktop.org/~keithp/parsecvs`
+* cvs2git (part of cvs2svn), `http://subversion.apache.org/`
 
 GIT
 ---
-- 
1.8.1.rc2



-- 
Eric S. Raymond <http://www.catb.org/~esr/>

A ``decay in the social contract'' is detectable; there is a growing
feeling, particularly among middle-income taxpayers, that they are not
getting back, from society and government, their money's worth for
taxes paid. The tendency is for taxpayers to try to take more control
of their finances...-- IRS Strategic Plan, (May 1984)


Re: [PATCH] Python scripts audited for minimum compatible version and checks added.

2012-12-24 Thread Eric S. Raymond
Junio C Hamano :
> > Should I resubmit, or do you intend to fix these while merging?
> 
> I'd appreciate a re-roll, perhaps in a few days after the dust
> settles.

You'll get it.

It will take a little longer than it otherwise might have because I'm
in the middle of straightening out the mess around cvsps and git-cvsimport,
which is deeper and nastier than I realized.

It turns out that one of the options git-cvsimport depends on, -A, has
been broken (leading to incorrect conversions of branchy repos) since
2006 if not earlier; I'm removing it outright.

Thus, the version of git-cvsimport in the git-tree will die with an
error when calling cvsps 3.x - but since what it was doing before was
actually mangling users' repositories, this is no great loss.

I'm going to have to shoot the existing implementation of
git-cvsimport through the head and rewrite it. This won't be
difficult; I already have a proof-of-concept in 126 lines of Python,
which is a big improvement over the 1179 lines of Perl in the existing
version.  Most of the vanished bulk is CVS client code for fetching
logs and files, which is now done better and faster inside cvsps.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: [PATCH] Python scripts audited for minimum compatible version and checks added.

2012-12-24 Thread Eric S. Raymond
Pete Wyckoff :
> e...@thyrsus.com wrote on Thu, 20 Dec 2012 09:13 -0500:
> > diff --git a/git-p4.py b/git-p4.py
> > index 551aec9..ec060b4 100755
> > --- a/git-p4.py
> > +++ b/git-p4.py
> > @@ -12,6 +12,11 @@ import optparse, sys, os, marshal, subprocess, shelve
> >  import tempfile, getopt, os.path, time, platform
> >  import re, shutil
> >  
> > +if sys.hexversion < 0x0204:
> > +# The limiter is the subprocess module
> > +sys.stderr.write("git-p4.py: requires Python 2.4 or later.")
> > +sys.exit(1)
> > +
> >  verbose = False
> 
> If 2.3 does not have the subprocess module, this script will fail
> at the import, and not run your version test.

Yes, the import of subprocess should move to after the check.
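
Concretely, the re-rolled ordering will look something like this sketch:

    import sys
    if sys.hexversion < 0x02040000:
        # gate on the interpreter before touching anything 2.3 lacks
        sys.stderr.write("git p4: requires Python 2.4 or later.\n")
        sys.exit(1)

    import subprocess   # only imported once the version gate has passed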

> All the uses of sys.stderr.write() should probably include a
> newline.  Presumably you used write instead of print to avoid
> 2to3 differences.

That is correct.
 
> The name of this particular script, as users would type it, is
> "git p4"; no dash and no ".py".
> 
> Many of your changes have these three problems; I just picked on
> my favorite one.

Should I resubmit, or do you intend to fix these while merging?
 
> > diff --git a/git-remote-testgit.py b/git-remote-testgit.py
> > index 5f3ebd2..22d2eb6 100644
> > --- a/git-remote-testgit.py
> > +++ b/git-remote-testgit.py
> > @@ -31,6 +31,11 @@ from git_remote_helpers.git.exporter import GitExporter
> >  from git_remote_helpers.git.importer import GitImporter
> >  from git_remote_helpers.git.non_local import NonLocalGit
> >  
> > +if sys.hexversion < 0x01050200:
> > +# os.makedirs() is the limiter
> > +sys.stderr.write("git-remote-testgit.py: requires Python 1.5.2 or 
> > later.")
> > +sys.exit(1)
> > +
> 
> This one, though, is a bit of a lie because git_remote_helpers
> needs 2.4, and you add that version enforcement in the library.

Agreed. The goal here was simply to have the dependencies of the individual
scripts be clearly documented, and establish a practice for future
submitters to emulate.

> I assume what you're trying to do here is to make the
> version-related failures more explicit, rather than have users
> parse an ImportError traceback, e.g.

See above.  At least half the point is making our dependencies
explicit rather than implicit, so we can make better policy
decisions.

> But what about the high-end of the version range?  I'm pretty
> sure most of these scripts will throw syntax errors on >= 3.0,
> how should we catch that before users see it?

That's a problem for another day, when 3.x is more widely deployed.
I'd be willing to run 2to3 on these scripts and check forward 
compatibility.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: [PATCH] Python scripts audited for minimum compatible version and checks added.

2012-12-23 Thread Eric S. Raymond
Junio C Hamano :
> Junio C Hamano  writes:
> 
> > I needed something like this on top of it to get it pass t5800.
> >
> > diff --git a/git_remote_helpers/git/__init__.py 
> > b/git_remote_helpers/git/__init__.py
> > index 776e891..5047fd4 100644
> > --- a/git_remote_helpers/git/__init__.py
> > +++ b/git_remote_helpers/git/__init__.py
> > @@ -1,3 +1,5 @@
> > +import sys
> > +
> >  if sys.hexversion < 0x0204:
> >  # The limiter is the subprocess module
> >  sys.stderr.write("git_remote_helpers: requires Python 2.4 or later.")
> 
> Ping?  Is the above the best fix for the breakage?

Sorry, I missed this the first time around.  Yes, I think it is.
 
> If it weren't __init__, I'd silently squash it in, but the filename
> feels a bit more magic than the ordinary *.py files, so I was worried
> there may be some other rules involved what can and cannot go in to
> such a file, hence I've been waiting for an ack or alternatives.

Nope, no special rules.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: cvsps, parsecvs, svn2git and the CVS exporter mess

2012-12-23 Thread Eric S. Raymond
Heiko Voigt :
> Please share so we can have a look. BTW, where can I find your cvsps
> code?

https://gitorious.org/cvsps

Developments of the last 48 hours:

1. Andreas Schwab sent me a patch that uses commitids wherever the history
   has them - this makes all the time-skew problems go away.  I added code
   to warn if commitids aren't present, so users will get a clear indication
   of when time-skew problems might bite them versus when that is happily
   impossible.  (A sketch of the grouping idea follows after this list.)

2. I've scrapped a lot of obsolete code and options.  The repo head
   version uses what used to be called cvs-direct mode all the time
   now; it works, and the effect on performance is major.  This also
   means that cvsps doesn't need to use any local CVS commands or even
   have CVS installed where it runs.
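
The grouping idea in point 1 reduces to something like this sketch; the
field names and the 300-second fallback window are invented for
illustration, not the shipped defaults:

    def group_changesets(revisions, window=300):
        "Group per-file revisions, trusting commitids wherever they exist."
        exact = {}      # commitid -> changeset; immune to clock skew
        fuzzy = []      # changesets assembled by the old heuristic
        for rev in revisions:
            if rev["commitid"]:
                exact.setdefault(rev["commitid"], []).append(rev)
                continue
            # fallback: same author and log, close enough in time
            for cset in fuzzy:
                first = cset[0]
                if (first["author"] == rev["author"]
                        and first["log"] == rev["log"]
                        and abs(first["time"] - rev["time"]) <= window):
                    cset.append(rev)
                    break
            else:
                fuzzy.append([rev])
        return list(exact.values()) + fuzzy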

> >From my past cvs conversion experiences my personal guess is that
> cvs2svn will win this competition.

That could be.  But right now cvsps has one significant advantage over
cvs2git (which parsecvs might share) - it's *blazingly* fast.  So fast
that I scrapped all the local-caching logic; there seems no point to it at
today's network speeds, and that's one less layer of complications to
go wrong.

I've removed a couple hundred lines of code and the program works
better and faster than it did before.  That's having a good day!
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Re: Change in cvsps maintainership, and a --fast-export option

2012-12-22 Thread Eric S. Raymond
Antoine Pelisse :
> > esr@snark:~/WWW/cvsps/fixrepos$ git clone http://repo.or.cz/w/cvsps-hv.git
> > Cloning into 'cvsps-hv'...
> > fatal: http://repo.or.cz/w/cvsps-hv.git/info/refs not valid: is this a git 
> > repository?
> 
> I guess 'w' means write, and you don't have write access. You should use
> 
> http://repo.or.cz/r/cvsps-hv.git
> 
> instead. It works for me.

OK, that got it.  Looking at the tests now.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Re: Re: Change in cvsps maintainership, and a --fast-export option

2012-12-22 Thread Eric S. Raymond
Heiko Voigt :
> Hi,
> 
> On Sat, Dec 22, 2012 at 01:21:18AM -0500, Eric S. Raymond wrote:
> > Heiko Voigt :
> > > Back then when I was converting some repositories to git and I also
> > > wrote a quick testsuite for cvsps in an attempt to fix the bugs but gave
> > > up. That was the point when I wrote about cvsimports limitations in the
> > > documentation.
> > > 
> > > My commits can be found here:
> > > 
> > >   http://repo.or.cz/w/cvsps-hv.git
> > > 
> > > I just quickly checked and it seems that it does not run cleanly on a
> > > modern Linux anymore. If it is of interest to you I can try to get it
> > > running again.
> > 
> > That would be helpful.  Please give it some effort.
> 
> Here you go. I have pushed my changes on the master branch there.
> 
> You should now be able to run my tests with
> 
>   make test
> 
> from the root directory of the repository. The expected and actual
> output can be found in the t[0-9]{4}... directories underneath t/.
> 
> Cheers Heiko

esr@snark:~/WWW/cvsps/fixrepos$ git clone http://repo.or.cz/w/cvsps-hv.git
Cloning into 'cvsps-hv'...
fatal: http://repo.or.cz/w/cvsps-hv.git/info/refs not valid: is this a git 
repository?

-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Re: Change in cvsps maintainership, and a --fast-export option

2012-12-22 Thread Eric S. Raymond
Heiko Voigt :
> My commits can be found here:
> 
>   http://repo.or.cz/w/cvsps-hv.git
> 
> I just quickly checked and it seems that it does not run cleanly on a
> modern Linux anymore. If it is of interest to you I can try to get it
> running again.

esr@snark:~/WWW/cvsps/fixrepos$ git clone http://repo.or.cz/w/cvsps-hv.git
Cloning into 'cvsps-hv'...
fatal: http://repo.or.cz/w/cvsps-hv.git/info/refs not valid: is this a git 
repository?

Doesn't seem to be a good day for cloning - I can't get Yann's repo either,
something about HEAD pointing to an invalid reference.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Re: Change in cvsps maintainership, and a --fast-export option

2012-12-21 Thread Eric S. Raymond
Heiko Voigt :
> Back then when I was converting some repositories to git and I also
> wrote a quick testsuite for cvsps in an attempt to fix the bugs but gave
> up. That was the point when I wrote about cvsimports limitations in the
> documentation.
> 
> My commits can be found here:
> 
>   http://repo.or.cz/w/cvsps-hv.git
> 
> I just quickly checked and it seems that it does not run cleanly on a
> modern Linux anymore. If it is of interest to you I can try to get it
> running again.

That would be helpful.  Please give it some effort.

I'm going to work on merging the cvsps patches Yann Dirson has been 
accumulating.
-- 
Eric S. Raymond  <http://www.catb.org/~esr/>


Re: Change in cvsps maintainership, and a --fast-export option

2012-12-21 Thread Eric S. Raymond
Michael Haggerty :
> Perhaps your experience is with an older version of cvs2svn? 

Well, it has been at least four years since I ran it on anything.
Maybe that counts as old. 

I'm willing to believe it's working better now, but I've had to deal
with geological strata of nastiness older versions produced in various
Subversion repositories (cleaning up that crud was a major motivation
for reposurgeon) and I'm fairly sure I haven't seen the last such
fossils.  Every sufficiently old Subversion repository seems to have
a few.

> If not,
> please be specific rather than just making complaints that are too vague
> to be rebutted or fixed (whichever is appropriate).  I put a *lot* of
> effort into getting cvs2svn to run correctly, and I take bug reports
> very seriously.

I can't be specific now, but that may change shortly.  I'm putting
together a test suite for cvsps with the specific intention of
capturing as many odd corner cases as I can.  (I just finished writing
a small Python framework for expressing interleaved CVS command
sequences on multiple checkouts in a way that can be easily run.)
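
The core of it is only a few dozen lines.  Roughly this shape -- a
sketch from memory, with the repository path, module name, and CVS
invocations as illustrative stand-ins, not the real test load:

    import subprocess

    class Checkout:
        "One developer's working copy of a CVS module."
        def __init__(self, cvsroot, module, workdir):
            self.workdir = workdir
            subprocess.check_call(
                ["cvs", "-d", cvsroot, "checkout", "-d", workdir, module])

        def do(self, *cvscmd):
            "Run one cvs command inside this checkout."
            subprocess.check_call(["cvs"] + list(cvscmd), cwd=self.workdir)

    # An interleaved sequence on two checkouts of the same module:
    # alice = Checkout("/tmp/cvsroot", "mod", "alice-wc")
    # bob   = Checkout("/tmp/cvsroot", "mod", "bob-wc")
    # alice.do("commit", "-m", "alice touches foo")
    # bob.do("update")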

It wouldn't be difficult for me to test whether these break cvs2svn. 
You've established that someone over there is paying attention, so
I'll do that, I guess.

I'm willing to share my test suite as well.  Do you have your own zoo
of odd cases I could test on?
-- 
Eric S. Raymond  <http://www.catb.org/~esr/>


Re: Change in cvsps maintainership, and a --fast-export option

2012-12-21 Thread Eric S. Raymond
Michael Haggerty :
> In 2009 I added tests demonstrating some of the erroneous behavior of
> git-cvsimport.  The failing tests in t9601-t9603 are concrete examples
> of the problems mentioned in the manpage.

Thanks, that will be extremely useful. One of the things I'm putting effort
into is building a good test suite for the tool; I may well be able to adapt
your tests directly.
>
> If you haven't yet seen it, there is a writeup of the algorithm used by
> cvs2git to infer the history of a CVS repository [1].  If your goal is
> to make cvsps more robust, you might want to consider the ideas
> described there.

I shall do so.  Their design ideas may well be interesting, even though I
don't trust their code.  I've seen cvs2svn drop far too many weird artifacts
and just plain broken commits in the back history of Subversion repositories.
I don't know if this is due to design problems, implementation bugs, or both.
-- 
Eric S. Raymond  <http://www.catb.org/~esr/>


Change in cvsps maintainership, and a --fast-export option

2012-12-20 Thread Eric S. Raymond
Earlier today David Mansfield handed off to me the cvsps project. This
is the code used as an engine for reading CVS repositories by
git-cvsimport.

His reason (aside from general overwork and no longer having a strong
interest in the code) is that I have added a --fast-export option to
cvsps-3.0 that emits a git fast-import stream representing the CVS
history.
 
I did this so that reposurgeon could use cvsps as a front end.  But I
expect it will be of some interest to whoever is maintaining
git-cvsimport. That code can now become much, *much* smaller and
simpler.

The new --fast-export mode solves at least one bug mentioned on the
git-cvsimport man page; multiple tags pointing at the same CVS changeset
will be passed through correctly.
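
You can see in stream terms why this case is now easy: nothing stops
two tag commands from naming the same commit mark.  A made-up fragment
of the kind of thing --fast-export emits (committer, dates, filenames,
and tag names all invented for illustration):

blob
mark :1
data 10
fee fie fo

commit refs/heads/master
mark :2
committer J. Random Hacker <jrh@example.com> 1356012000 -0500
data 21
First commit from CVS
M 100644 :1 README

tag RELEASE-1-0
from :2
tagger J. Random Hacker <jrh@example.com> 1356012000 -0500
data 11
release tag

tag RELEASE-1-0-RESPIN
from :2
tagger J. Random Hacker <jrh@example.com> 1356012000 -0500
data 11
release tag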

Possibly it fixes some other problems described there as well.  
I don't understand all the bug warnings on that page and would like to
discuss them with the author, whoever that is.  Possibly cvsps can be
further enhanced to address these problems; I'm willing to work on that.
-- 
Eric S. Raymond  <http://www.catb.org/~esr/>

To stay young requires the unceasing cultivation of the ability to
unlearn old falsehoods.
-- Lazarus Long 


Re: [PATCH] Python scripts audited for minimum compatible version and checks added.

2012-12-20 Thread Eric S. Raymond
Jeff King :
> On Thu, Dec 20, 2012 at 09:13:37AM -0500, Eric S. Raymond wrote:
> 
> > diff --git a/contrib/ciabot/ciabot.py b/contrib/ciabot/ciabot.py
> > index bd24395..b55648f 100755
> > --- a/contrib/ciabot/ciabot.py
> > +++ b/contrib/ciabot/ciabot.py
> > @@ -50,6 +50,11 @@
> >  import os, sys, commands, socket, urllib
> >  from xml.sax.saxutils import escape
> >  
> > +if sys.hexversion < 0x02000000:
> > +    # The limiter is the xml.sax module
> > +    sys.stderr.write("import-zips.py: requires Python 2.0.0 or later.\n")
> > +    sys.exit(1)
> 
> Should the error message say ciabot.py?
> 
> -Peff

Gack.  Yes.  That's what I get for cut-and-pasting too quickly.
The information about xml.sax is correct, though.
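
For anyone eyeballing the magic numbers: CPython packs sys.hexversion
as 0xMMmmppLS, one byte each for major, minor, and micro version, then
release level (0xF means final) and serial in the low byte.  A
standalone sanity check, purely illustrative:

    import sys

    def hexversion(major, minor, micro=0, level=0xF, serial=0):
        # Mirror CPython's packing of sys.hexversion.
        return (major << 24) | (minor << 16) | (micro << 8) \
               | (level << 4) | serial

    assert hexversion(2, 4) == 0x020400F0
    # So "requires 2.4.0 or later" is: sys.hexversion >= 0x02040000
    print(hex(sys.hexversion))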

Want me to resubmit, or will you just patch it?

Note by the way that I still think the entire ciabot subtree (which is 
my code) should just be nuked.  CIA is not coming back, wishful thinking 
on Ilkotech's web page notwithstanding.
-- 
Eric S. Raymond  <http://www.catb.org/~esr/>


Python version auditing followup

2012-12-20 Thread Eric S. Raymond
Most of the Python scripts in the distribution are small and simple to
audit, so I am pretty sure of the results.  The only place where I
have a concern is the git_remote_helpers library; that is somewhat more
complex and I might have missed a dependency somewhere.  Whoever
owns that should check my finding that it should run under 2.4.

That was the first of three patches I have promised.  In order to do
the next one, which will be a development-guidelines patch recommending
compatibility back to some specific version X, I need a policy
decision.  How do we set X?

I don't think X can be < 2.4, nor does it need to be - 2.4 came out
in 2004 and eight years is plenty of deployment time.

The later we set it, the more convenient for developers.  But of
course by setting it late we trade away some portability to 
older systems.

In previous discussion of this issue I recommended X = 2.6.
That is still my recommendation. Thoughts, comments, objections?
-- 
Eric S. Raymond  <http://www.catb.org/~esr/>

In recent years it has been suggested that the Second Amendment
protects the "collective" right of states to maintain militias, while
it does not protect the right of "the people" to keep and bear arms.
If anyone entertained this notion in the period during which the
Constitution and the Bill of Rights were debated and ratified, it
remains one of the most closely guarded secrets of the eighteenth
century, for no known writing surviving from the period between 1787
and 1791 states such a thesis.
-- Stephen P. Halbrook, "That Every Man Be Armed", 1984


[PATCH] Python scripts audited for minimum compatible version and checks added.

2012-12-20 Thread Eric S. Raymond
Signed-off-by: Eric S. Raymond 
---
 contrib/ciabot/ciabot.py           | 5 +++++
 contrib/fast-import/import-zips.py | 6 ++++++
 contrib/hg-to-git/hg-to-git.py     | 5 +++++
 contrib/p4import/git-p4import.py   | 5 +++++
 contrib/svn-fe/svnrdump_sim.py     | 4 ++++
 git-p4.py                          | 5 +++++
 git-remote-testgit.py              | 5 +++++
 git_remote_helpers/git/__init__.py | 5 +++++
 8 files changed, 40 insertions(+)

diff --git a/contrib/ciabot/ciabot.py b/contrib/ciabot/ciabot.py
index bd24395..b55648f 100755
--- a/contrib/ciabot/ciabot.py
+++ b/contrib/ciabot/ciabot.py
@@ -50,6 +50,11 @@
 import os, sys, commands, socket, urllib
 from xml.sax.saxutils import escape
 
+if sys.hexversion < 0x02000000:
+    # The limiter is the xml.sax module
+    sys.stderr.write("import-zips.py: requires Python 2.0.0 or later.\n")
+    sys.exit(1)
+
 # Changeset URL prefix for your repo: when the commit ID is appended
 # to this, it should point at a CGI that will display the commit
 # through gitweb or something similar. The defaults will probably
diff --git a/contrib/fast-import/import-zips.py b/contrib/fast-import/import-zips.py
index 82f5ed3..d9ad71d 100755
--- a/contrib/fast-import/import-zips.py
+++ b/contrib/fast-import/import-zips.py
@@ -13,6 +13,12 @@ from sys import argv, exit
 from time import mktime
 from zipfile import ZipFile
 
+import sys
+if sys.hexversion < 0x01060000:
+    # The limiter is the zipfile module
+    sys.stderr.write("import-zips.py: requires Python 1.6.0 or later.\n")
+    sys.exit(1)
+
 if len(argv) < 2:
     print 'Usage:', argv[0], '...'
     exit(1)
diff --git a/contrib/hg-to-git/hg-to-git.py b/contrib/hg-to-git/hg-to-git.py
index 046cb2b..9f39ce5 100755
--- a/contrib/hg-to-git/hg-to-git.py
+++ b/contrib/hg-to-git/hg-to-git.py
@@ -23,6 +23,11 @@ import os, os.path, sys
 import tempfile, pickle, getopt
 import re
 
+if sys.hexversion < 0x02030000:
+    # The behavior of the pickle module changed significantly in 2.3
+    sys.stderr.write("hg-to-git.py: requires Python 2.3 or later.\n")
+    sys.exit(1)
+
 # Maps hg version -> git version
 hgvers = {}
 # List of children for each hg revision
diff --git a/contrib/p4import/git-p4import.py b/contrib/p4import/git-p4import.py
index b6e534b..fb48e2a 100644
--- a/contrib/p4import/git-p4import.py
+++ b/contrib/p4import/git-p4import.py
@@ -14,6 +14,11 @@ import sys
 import time
 import getopt
 
+if sys.hexversion < 0x02020000:
+    # The behavior of the marshal module changed significantly in 2.2
+    sys.stderr.write("git-p4import.py: requires Python 2.2 or later.\n")
+    sys.exit(1)
+
 from signal import signal, \
SIGPIPE, SIGINT, SIG_DFL, \
default_int_handler
diff --git a/contrib/svn-fe/svnrdump_sim.py b/contrib/svn-fe/svnrdump_sim.py
index 1cfac4a..ed43dbb 100755
--- a/contrib/svn-fe/svnrdump_sim.py
+++ b/contrib/svn-fe/svnrdump_sim.py
@@ -7,6 +7,10 @@ to the highest revision that should be available.
 """
 import sys, os
 
+if sys.hexversion < 0x02040000:
+    # The limiter is the ValueError() calls. This may be too conservative
+    sys.stderr.write("svnrdump_sim.py: requires Python 2.4 or later.\n")
+    sys.exit(1)
 
 def getrevlimit():
 var = 'SVNRMAX'
diff --git a/git-p4.py b/git-p4.py
index 551aec9..ec060b4 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -12,6 +12,11 @@ import optparse, sys, os, marshal, subprocess, shelve
 import tempfile, getopt, os.path, time, platform
 import re, shutil
 
+if sys.hexversion < 0x02040000:
+    # The limiter is the subprocess module
+    sys.stderr.write("git-p4.py: requires Python 2.4 or later.\n")
+    sys.exit(1)
+
 verbose = False
 
 # Only labels/tags matching this will be imported/exported
diff --git a/git-remote-testgit.py b/git-remote-testgit.py
index 5f3ebd2..22d2eb6 100644
--- a/git-remote-testgit.py
+++ b/git-remote-testgit.py
@@ -31,6 +31,11 @@ from git_remote_helpers.git.exporter import GitExporter
 from git_remote_helpers.git.importer import GitImporter
 from git_remote_helpers.git.non_local import NonLocalGit
 
+if sys.hexversion < 0x01050200:
+    # os.makedirs() is the limiter
+    sys.stderr.write("git-remote-testgit.py: requires Python 1.5.2 or later.\n")
+    sys.exit(1)
+
 def get_repo(alias, url):
 """Returns a git repository object initialized for usage.
 """
diff --git a/git_remote_helpers/git/__init__.py b/git_remote_helpers/git/__init__.py
index e69de29..776e891 100644
--- a/git_remote_helpers/git/__init__.py
+++ b/git_remote_helpers/git/__init__.py
@@ -0,0 +1,5 @@
+import sys
+if sys.hexversion < 0x02040000:
+    # The limiter is the subprocess module
+    sys.stderr.write("git_remote_helpers: requires Python 2.4 or later.\n")
+    sys.exit(1)
-- 
1.8.1.rc2



-- 
Eric S. Raymond  <http://www.catb.org/~esr/>

"The state calls its 

Re: Python extension commands in git - request for policy change

2012-12-12 Thread Eric S. Raymond
Patrick Donnelly :
> On Tue, Dec 11, 2012 at 10:30 PM, Eric S. Raymond  wrote:
> > It might be a good fit for extending git; I wouldn't be very surprised if
> > that worked. However, I do have concerns about the "Oh, we'll just
> > lash together a binding to C" attitude common among lua programmers; I
> > foresee maintainability problems and the possibility of slow death by
> > low-level details as that strategy tries to scale up.
> 
> I think this is quite a prediction? Could you give an example
> scenario?

Everything old is new again.  I'm going by experience with Tcl back in the day.

> How would another language (e.g. Python) mitigate this?

The way you mitigate this sort of problem is to have a good set of
high-level bindings for standard services (like socket I/O) built in
your extension language and using its abstractions, so you don't get a
proliferation of low-level semi-custom APIs for doing the same stuff.
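
To make "using its abstractions" concrete: in Python a whole TCP round
trip is a few lines against high-level objects, with no descriptors or
sockaddr structs in sight.  The host and request here are placeholder
values, nothing more:

    import socket

    # Connect, send a request, read the reply - all in terms of
    # Python objects rather than raw file descriptors.
    conn = socket.create_connection(("example.com", 80), timeout=10)
    conn.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")
    print(conn.recv(4096).splitlines()[0])
    conn.close()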

I have elsewhere referred to this as "the harsh lesson of Perl", which
I do not love but which was the first scripting language to get this
right.  There is a reason Tcl and a couple of earlier designs like csh
that we would now call "scripting languages" were displaced by Python
and Perl; this is it.

My favorite present-day example of getting this right is the Python bindings
for GTK.  They're lovely.  A work of art.

> I don't see how these languages are more appropriate based on your concerns.

Your previous exchange with Jeff King indicates that you don't
understand glue scripting very well.  Your puzzlement here just
confirms that.  Trust both of us on this, it's important.  And
reread my previous three paragraphs.
-- 
Eric S. Raymond  <http://www.catb.org/~esr/>


Re: Python extension commands in git - request for policy change

2012-12-12 Thread Eric S. Raymond
Jeff King :
> I think there are really two separate use cases to consider:
> 
>   1. Providing snippets of script to Git to get Turing-complete behavior
>  for existing Git features. For example, selecting commits during a
>  traversal (e.g., a better "log --grep"), formatting output (e.g., a
>  better "log --format" or "for-each-ref --format").
> 
>   2. Writing whole new git commands in a language that is quicker or
>  easier to develop in than C.

That's good analysis.  I agree with your use-case split; I guess I'm just not
very aware of the places in git where (1) is important.
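
For calibration, here is how you do (1) from the outside today; the
per-invocation fork-and-pipe cost is exactly what an embedded
interpreter would remove.  Just a sketch, with an arbitrary stand-in
for the predicate:

    import subprocess

    # Out-of-process today: dump the history, filter on the script side.
    # An in-process snippet would let git call the predicate directly.
    out = subprocess.check_output(["git", "log", "--format=%H %s"])
    for line in out.decode("utf-8", "replace").splitlines():
        sha, _, subject = line.partition(" ")
        if "fix" in subject.lower():    # the Turing-complete part
            print(sha)
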
-- 
Eric S. Raymond  <http://www.catb.org/~esr/>


Re: Python extension commands in git - request for policy change

2012-12-12 Thread Eric S. Raymond
Joshua Jensen :
> Anyway, my preference is to allow scripts to run in-process within
> Git, because it is far, far faster on Windows.  I imagine it is
> faster than forking processes on non-Windows machines, too, but I
> have no statistics to back that up.
> 
> Python, Perl, or Ruby can be embedded, too, but Lua probably embeds
> the easiest and smallest out of those other 3 languages.
> 
> And shell scripts tend to be the slowest on Windows due to the
> excessive numbers of process invocations needed to get anything
> reasonable done.

I don't think there's *any* dimension along which lua is not clearly
better than shell for this sort of thing, so no argument there.
-- 
Eric S. Raymond  <http://www.catb.org/~esr/>


Re: Python extension commands in git - request for policy change

2012-12-11 Thread Eric S. Raymond
Sitaram Chamarty :
> [snipping the rest; all valid points no doubt]

I meant to respond to Patrick's post earlier.

I haven't actually written any code in lua yet, but I've read the book;
I think I get it.  I've seen the effects of lua integration on another
large project, Battle for Wesnoth.

I'm not, despite conclusions some people here might have jumped to,
religiously attached to Python.  So I can say this: I think lua as a
language is an *excellent* design.  It is clever, economical,
minimalist, and (other than the one ugly detail of 1-origin indexing)
shows consistent good taste.

It might be a good fit for extending git; I wouldn't be very surprised if
that worked. However, I do have concerns about the "Oh, we'll just
lash together a binding to C" attitude common among lua programmers; I
foresee maintainability problems and the possibility of slow death by
low-level details as that strategy tries to scale up.

And, of course, one problem with calling back into C a lot is that
you walk back into C's resource-management issues.

My sense is that git's use cases are better served by a glue language
in the Python/Perl/Ruby class rather than an extension language. But
my mind is open on this issue.
-- 
Eric S. Raymond  <http://www.catb.org/~esr/>


Re: Millisecond precision in timestamps?

2012-11-29 Thread Eric S. Raymond
Junio C Hamano :
> That is exactly why I said it is all relative.  If it helps your
> application, you can weigh the pros-and-cons yourself and choose to
> throw "junk" extended header fields in the commit objects you
> create, using hash-object (or commit-tree).  You can read it out
> using cat-file and do whatever you want to do with it, and modern
> Git (v1.5.0 was from early 2007) and tools that are designed to work
> with Git know to ignore such "junk" field.

A good start.  But remember that reposurgeon's entire interface to the
git object level is through fast-export/fast-import.  I need import-
stream syntax for these.

bzr's syntax would do:

-----------
mark :1
committer Eric S. Raymond  1289147634 -0500
data 14
First commit.

property branch-nick 12 bzr-testrepo
M 644 inline README
data 41
This is a test file in a dummy bzr repo.
---

If we actually care about keys being full utf-8 with embedded whitespace
it should look more like this:

-----------
mark :1
committer Eric S. Raymond  1289147634 -0500
data 14
First commit.

property 11
branch-nick
propval 12 
bzr-testrepo
M 644 inline README
data 41
This is a test file in a dummy bzr repo.
---
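
The reader side of the second form is trivial, by the way.  A sketch
of the parse - not reposurgeon code, and error handling omitted:

    def read_counted(stream):
        "Read a '<keyword> <bytecount>' line, then that many payload bytes."
        count = int(stream.readline().split()[-1])
        payload = stream.read(count)
        stream.read(1)    # eat the newline terminating the payload
        return payload

    def read_property(stream):
        "Parse one property: a 'property' record, then a 'propval' record."
        key = read_counted(stream)      # e.g. 'property 11' / 'branch-nick'
        value = read_counted(stream)    # e.g. 'propval 12' / 'bzr-testrepo'
        return key, value
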
-- 
Eric S. Raymond  <http://www.catb.org/~esr/>


  1   2   >