That light at the end of the tunnel turned out to be an oncoming train.

Until recently I thought the conversion was nearly finished. I had
verified clean conversions across trunk and all branches, except for
one screwed-up branch that GCC management agreed we could discard.

I had some minor issues left with execute-permission propagation and how
to interpret mid-branch deletes.  I solved the former and was working
on the latter.  I expected to converge on a final result well before
the end of the year, probably in August or September.

Then, as I reported here, my most recent test conversion produced
incorrect content on trunk.  That's very bad, because the sheer size
of the GCC repository makes bug forensics extremely slow. Just loading
the SVN dump file for examination in reposurgeon takes 4.5 hours; full
conversions are back up to 9 hours now.  The repository is growing
about as fast as my ability to find speed optimizations.

Then it got worse. I backed up to a commit that I remembered as
producing a clean conversion, and it didn't. This can only mean that
the reposurgeon changes I've been making to handle weird branch-copy
cases have been fighting each other.

For those of you late to the party, interpreting the operation
sequences in Subversion dump files is simple and produces results that
are easy to verify - except near branch copy operations. The way those
interact with each other and with other operations is extremely murky.
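
For concreteness, a branch copy in a dump file is a node record like
this (the paths and revision numbers here are invented, but the field
names are the standard ones from the dump format):

    Node-path: branches/fortran-dev
    Node-kind: dir
    Node-action: add
    Node-copyfrom-rev: 10234
    Node-copyfrom-path: trunk

Read alone, that's clear enough - copy trunk as of r10234 to a new
branch directory.  The murk starts when copies like this mix with
deletes, replacements, and plain edits of the same paths in the same
or nearby revisions.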

There is *a* correct semantics defined by what the Subversion code
does.  But if any of the Subversion devs ever fully understood it,
they no longer do. The dump format was never documented by them. It is
only partly documented now because I reverse-engineered it.  But the
document I wrote has questions in it that the Subversion devs can't
answer.

It's not unusual for me to trip over a novel branch-copy-related
weirdness while converting a repo.  Normally I handle this by running
a bisection to pin down the bad commit.  Then (see the command-level
sketch after this list) I:

(1) Truncate the dump to the shortest leading segment that
reproduces the problem.

(2) Perform a strip operation that replaces all content blobs with
unique small cookies that identify their source commit. Verify that it still
reproduces...

(3) Perform a topological reduce that drops out all uninteresting
commits, that is, pure content changes not adjacent to any branch
copies or property changes. Verify that it still reproduces...

(4) Manually remove irrelevant branches with reposurgeon.
Verify that it still reproduces...
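
At the command level the pipeline looks roughly like this.  Treat it
as a sketch only: the repocutter subcommand spellings are from my
memory of its manual page, the revision cutoff and branch surgery are
placeholders, and step (4) varies case by case:

    repocutter -r 0:5000 select <gcc.svn >trunc.svn   # (1) truncate
    repocutter strip <trunc.svn >stripped.svn         # (2) blobs -> cookies
    repocutter reduce <stripped.svn >reduced.svn      # (3) topological reduce
    reposurgeon "read <reduced.svn" \
                "...branch deletions done by hand..." \
                "write >testcase.svn"                 # (4) manual cleanup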

At this point I normally have a fairly small test repository (never
previously more than 200 or so commits) that reproduces
the issue. I watch conversions at increasing debug levels until I
figure out what is going on. Then I fix it and the reduced dump
becomes a new regression test.

In this way I make monotonic progress towards a dumpfile analyzer
that ever more closely emulates what the Subversion code is doing.
It's not anything like easy, and gets less so as the edge cases I'm
probing get more recondite.  But until now it worked.

The size of the GCC repository defeats this strategy. By
back-of-the-envelope calculation, a single full bisection would take a
minimum of 18 days.  Realistically it would probably be closer to a month.
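
Unpacking that envelope math (with the revision count rounded to a
quarter-million, and assuming every probe needs a full conversion plus
hand verification):

    log2(250,000 revisions) ~ 18 bisection steps
    1 step = one 9-hour conversion + verification ~ a day of wall time
    18 steps ~ 18 days, as a floor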

That means that, under present assumptions, it's game over
and we've lost.  The GCC repo is just too large and weird.

My tools need to get a lot faster - like, more than an order of
magnitude faster - before digging the conversion out of the hole it is
now in will be practical.

Hardware improvements won't do that.  Nobody knows how to build a
machine that can crank a single process enough faster than 1.3GHz to
close that gap.  And the problem doesn't parallelize.

There is a software change that might do it.  I have been thinking
about translating reposurgeon from Python to Go. Preliminary
experiments with a Go version of repocutter show that it has a
40x speed advantage over the Python version.  I don't think I'll
get quite that much speedup on reposurgeon, but I'm pretty
optimistic about getting enough speedup to make debugging the GCC
conversion tractable.  At the full 40x, 9-hour test runs would
collapse to about 13 minutes; even at half that, to under half an hour.

The problem with this plan is that a full move to Go will be very
difficult.  *Very* difficult.  As in, work time in an unknown and
possibly large number of months.

GCC management will have to make a decision about how patient
it is willing to be.  I am at this point not sure it wouldn't be
better to convert your existing tree state and go from there, keeping
the Subversion history around for archival purposes.
-- 
                <a href="http://www.catb.org/~esr/";>Eric S. Raymond</a>

...every Man has a Property in his own Person. This no Body has any
Right to but himself.  The Labour of his Body, and the Work of his
Hands, we may say, are properly his. .... The great and chief end
therefore, of Mens uniting into Commonwealths, and putting themselves
under Government, is the Preservation of their Property.
        -- John Locke, "A Treatise Concerning Civil Government"
