Re: removing toxic emailers
Ian Lance Taylor via Gcc :
> This conversation has moved well off-topic for the GCC mailing lists.
>
> Some of the posts here do not follow the GNU Kind Communication
> Guidelines (https://www.gnu.org/philosophy/kind-communication.en.html).
>
> I suggest that people who want to continue this thread take it off the
> GCC mailing list.
>
> Thanks.
>
> Ian

Welcome to the consequences of abandoning "You shall judge by the code
alone."

This is what it will be like, *forever*, until you reassert that norm.

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: removing toxic emailers
Ian Lance Taylor :
> Patronizing or infantilizing anybody doesn't come into this at all.

I am not even *remotely* persuaded of this. This whole attitude that if a
woman is ever exposed to a man with less than perfect American
upper-middle-class manners it's a calamity requiring intervention and mass
shunning, that *reeks* of infantilizing women.

> We want free software to succeed. Free software is more likely to
> succeed if more people work on it. If you are a volunteer, as many
> are, you can choose to spend your time on the project where you have
> to short-stop unwelcome advances, where you are required to deal with
> "men with poor social skills." Or you can choose to spend your time
> on the project where people treat you with respect. Which one do you
> choose?

The one where your expected satisfaction is higher, with boorishness from
autistic males factored in as one of the overheads. Don't try to tell me
that's a deal-killer; I've known too many women who would laugh at you for
that assumption.

> Or perhaps you have a job that requires you to work on free software.
> Now, if you work on a project where the people act like RMS, you are
> being forced by your employer to work in a space where you face
> unwelcome advances and men who have "trouble recognizing boundaries."
> That's textbook hostile environment, and a set up for you to sue your
> employer. So your employer will never ask anyone to work on a project
> where people act like that--at least, they won't do it more than once.

Here's what happens in the real world (and I'm not speculating; I was a BoD
member of a tech startup at one time, and stuff like this came up). You say
"X is being a jerk - can I work on something else?" Your employer, rightly
terrified of the next step, is not going to "force" you to do a damn thing.
He's going to bend over backwards to accommodate you.

> (Entirely separately, I don't get the slant of your whole e-mail. You
> can put up with RMS despite the boorish behavior you describe. Great.
> You're a saint. Why do you expect everyone else to be a saint?

I'm no saint, I'm merely an adult who takes responsibility for my own
choices when dealing with people who have minimal-brain-damage syndromes.
OK, I have probably acquired a bit more tolerance for their quirks than
average from long experience, but I don't believe I'm an extreme outlier
that way.

What I am pushing for is for everyone to recognize that *women are adults* -
they have their own agency and are in general perfectly capable of treating
an RMS-class jerk as at worst a minor annoyance. Behaving as though he's
some sort of icky monster who should be shunned by all right-thinking
people and taints everything he touches is ... just unbelievably
disconnected from reality. Bizarre neo-Puritan virtue signaling of no help
to anyone.

If I needed more evidence that many Americans lead pampered, cosseted,
hyper-insulated lives that require them to make up their own drama, this
whole flap would be it.

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: removing toxic emailers
Christopher Dimech via Gcc :
> The commercial use of free software is our hope, not our fear. When people
> at IBM began to come to free software, wanting to recommend it and use it,
> and maybe distribute it themselves or encourage other people to distribute
> it for them, we did not criticise them for not being non-profit virtuous
> enough, or said "we are suspicious of you", let alone threatening them.

Actually, some of us did *exactly* those things late in the last century.
One of the challenges I faced in my early famous years was persuading the
hacker culture as a whole to treat the profit-centered parts of the economy
as allies rather than enemies.

I won't say that a *majority* of us were resistant to this, but I did have
to work hard on the problem for a while, between 1997 and about 2003.

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: removing toxic emailers
David Malcolm :
> > I will, however, point out that it is a very *different* point from
> > "RMS has upset some people and should therefore be canceled".
>
> Eric: I don't know if you're just being glib, or you're deliberately
> trying to caricature those of us who are upset by RMS's behavior.

My intent was not caricature. I was being dismissive and snarky because I
genuinely consider the personality complaints against RMS to be pretty
trivial. Not the managerial ones Joseph Myers listed; those are serious.
But they're not the cause of the current ruckus.

To make the "triviality" point in the most forceful possible way, I will
take the bull by the horns and directly address RMS's behavior towards
women. And I will reveal a few things that I haven't talked about in
public for 40 years.

I've known RMS since 1979; I'm fully aware of how obnoxious he can be
towards both men and women. There have been occasions on which I have
thought the state of the universe would have been improved if he'd gotten
a swift slap in the face.

In fact, the first or second time I met him face to face it was because he
was rather determinedly pursuing my then-girlfriend. A hostile witness
might have said he was creeping on her, though that slang for it wouldn't
be invented until much later.

I think an explanation of how I reasoned about that situation has some
value in light of the current attempt to ostracize RMS.

I paid very careful attention to whether my girlfriend appeared to need
any help dealing with him. I regarded her as an adult fully capable of
making her own decisions. One of those decisions could have been to slap
his face. If a more severe sanction had been required, and she had yelled
for help, I would cheerfully have punched his lights out.

No fisticuffs were required. She gently discouraged him, and we both
established friendly relations with him. In later years RMS and I remained
fairly close long after I broke up with that girlfriend.

He made passes at at least two of my later girlfriends that I know of,
including the woman I am still married to. In all cases, I trusted these
ladies to handle the situation like adults, and they did. It really would
not have occurred to me to do otherwise.

I hear a lot of talk about RMS's behavior towards women being some sort of
vast horrible transgression that will drive all women everywhere to flee
from ever being contributors to FSF projects. To me this seems just silly,
and very infantilizing of women in general.

My girlfriends were entirely able to (1) short-stop his advances when they
became unwelcome, (2) understand that some men have poor social skills and
trouble recognizing boundaries, and (3) *stay on friendly terms with him
anyway*. I mean I saw this not just more than once, but every single time
it came up.

I don't assume that any adult female is incapable of these things; I
respect women as fully capable of asserting and defending their interests,
I *expect* women to do that, and I thus consider a lot of the
white-knighting on their behalf to be at best empty virtue signaling and
at worst a cover for much more discreditable motives.

Of course, he offends men too. When I deal with RMS, I know that I'm going
to have to cope with a certain amount of unpleasantness because he has
autism-like deficits amplified by some unfortunate personal history.

Yes. So what? He's one of my oldest friends anyway. He has many admirable
qualities; I respect and value him even when I have to argue with him. And
I can work with him when I need to.

Why in the *hell* should I assume anyone with female genitalia is
incapable of doing the same? More to the point, why is anybody else making
such a silly, reductive assumption and then turning it into a galloping
moral panic that somehow justifies stoning RMS and driving him out of the
village?

*grumble* Get *over* yourselves. You want to be "welcoming" to women?
Don't patronize or infantilize them - respect their ability to tell off
RMS for themselves *and then keep working with him*!

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: removing toxic emailers
Adrian via Gcc :
> Eric S. Raymond :
> > there is actually a value conflict between being "welcoming" in that
> > sense and the actual purpose of this list, which is to ship code.
>
> Speaking as a "high functioning autist", I'm aware of the difficulties that
> some of us have with social interactions - and also that many of us
> construct a persona or multiple personae to interact with others, a
> phenomenon known as "masking".
>
> I understand why "Asshole" can function as a viable mask for many people,
> because there are cultures where it's tolerated, particularly in
> remote-working groups like mailing lists, where physical altercations are
> unlikely and no-one has to confront the results of their interactions with
> others if they don't want to.
>
> It doesn't necessarily follow that "smart" == "asshole" though.

I did not intend that claim. I intended the weaker observation that
driving away a large number of smart autistic assholes (and non-assholes
with poor social skills) is not necessarily a good trade for the people
the project might recruit by being "more welcoming".

Possibly that *would* be a good trade. I have decades of experience that
makes me doubt this. I think the claim needs to be examined skeptically,
not just uncritically accepted because we value being "nice".

In general, I think efforts to guilt-bomb hackers into being "more
inclusive" should be resisted unless we have a clear grasp on what we
might be throwing away by accepting them. Just because you live inside a
culture doesn't mean you can predict what mutating its assumptions will do
to it, and we have work to do that should not be casually disrupted.

Note: I am not an autist myself, so I'm not guarding my own flanks here.
I'm sort of autist-sympathetic, in that I think it is a good thing autists
can join the hacker culture and have a place where their quirks are useful
and tolerated. I would be a little sad if that were lost.

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: removing toxic emailers
Paul Koning via Gcc :
> > On Apr 14, 2021, at 4:39 PM, Ian Lance Taylor via Gcc wrote:
> >
> > So we don't get the choice between "everyone is welcome" and "some
> > people are kicked off the list." We get the choice between "some
> > people decline to participate because it is unpleasant" and "some
> > people are kicked off the list."
> >
> > Given the choice of which group of people are going to participate and
> > which group are not, which group do we want?
>
> My answer is "it depends". More precisely, in the past I would have
> favored those who decline because the environment is unpleasant --
> with the implied assumption being that their objections are
> reasonable. Given the emergence of cancel culture, that assumption
> is no longer automatically valid.

I concur on both counts.

You (the GCC project) are no longer in a situation where any random person
saying "your environment is hostile" is a reliable signal of a real
problem. Safetyism is being gamed by outsiders for purposes that are not
yours and have nothing to do with shipping good code.

Complaints need to be discounted accordingly, to a degree that would not
have been required before the development of a self-reinforcing culture of
complaint and rage-mobbing around 2014.

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: removing toxic emailers
Joseph Myers :
> On Wed, 14 Apr 2021, Eric S. Raymond wrote:
>
> > I'm not judging RMS's behavior (or anyone else's) one way or
> > another. I am simply pointing out that there is a Schelling point in
> > possible community norms that is well expressed as "you shall judge by
> > the code alone". This list is not full of contention from affirming
> > that norm, but from some peoples' attempt to repudiate it.
>
> Since RMS, FSF and GNU are not contributing code to the toolchain and
> haven't been for a very long time, the most similar basis to judge them
> would seem to be based on their interactions with toolchain development.
> I think those interactions generally show that FSF and GNU have been bad
> umbrella organizations for the toolchain since at least when the GCC 4.4
> release was delayed waiting for a slow process of developing the GCC
> Runtime Library Exception.

I do not have standing to argue this point.

I will, however, point out that it is a very *different* point from "RMS
has upset some people and should therefore be canceled".

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: removing toxic emailers
Nathan Sidwell :
> The choice to /not/ have a policy for ejecting jerks has serious costs. One
> of those costs is the kind of rancorous dispute that has been
> burning like a brushfire on this list the last few weeks.

The situation isn't that symmetrical. The brushfire didn't happen when it
was a norm here that off-list behavior was not the list's business. It
only came about when some people decided that norm should no longer apply.

I'm not judging RMS's behavior (or anyone else's) one way or another. I am
simply pointing out that there is a Schelling point in possible community
norms that is well expressed as "you shall judge by the code alone". This
list is not full of contention from affirming that norm, but from some
peoples' attempt to repudiate it.

(For those of you unfamiliar with the concept, a Schelling point is a
natural equilibrium in a two- or multi-player game, such that when you
move away from it all parties' decision costs go way up.)

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: removing toxic emailers
Nathan Sidwell :
> I'd just like to eject the jerks, because they make the place unwelcoming.

I understand the impulse. The problem is that there is actually a value
conflict between being "welcoming" in that sense and the actual purpose of
this list, which is to ship code.

It's a much more direct conflict in the hacker culture than elsewhere
because so many potential contributors are high-functioning autists. That
makes the downstream consequences of politeness enforcement a lot more
damaging to the project's ability to ship code than they would otherwise
be.

There is a hypothetical world, of course, in which jerks and assholes are
such a huge problem that they interfere measurably with shipping code. But
contemplate the amount of angry verbiage on this list recently from people
who could have been using their fingers typing code, and I think it's
clear that the amount of social friction produced by attempts to eject the
jerks will be far higher than if you simply continued to tolerate them.

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: removing toxic emailers
Nathan Sidwell :
> Do we have a policy about removing list subscribers that send abusive or
> other toxic emails? do we have a code of conduct? Searching the wiki or
> website finds nothing. The mission statement mentions nothing.

I'm not a GCC insider, but I know a few things about the social dynamics
of voluntarist subcultures. You might recall I wrote a book about that
once.

The choice to have a policy for ejecting jerks has serious costs. One of
those costs is the kind of rancorous dispute that has been burning like a
brushfire on this list the last few weeks.

Another - particularly serious for hackers - is that such a policy is
hostile to autists and others who have poor interaction skills but can
ship good code. This is a significant percentage of your current and
future potential contributors, enough that excluding them is a real
problem.

Most seriously: the rules, whatever they are, will be gamed by people
whose objectives are not "ship useful software". You will be fortunate if
the gamers' objectives are as relatively innocuous as "gain points in
monkey status competition by beating up funny-colored monkeys"; there are
much worse cases that have been known to crash even projects with nearly
as much history and social inertia as this one.

Compared to these costs, the overhead of tolerating a few jerks and
assholes is pretty much trivial. That's hard to see right now because the
jerks are visible and the costs of formal policing are hypothetical, but I
strongly advise you against going down the Code of Conduct route
regardless of how fashionable that looks right now. I have forty years of
observer-participant anthropology in intentional online communities,
beginning with the disintegration of the USENET cabal back in the 1980s,
telling me that will not end well.

You're better off with an informal system of moderator fiat and *without*
rules that beg to become a subject of dispute and manipulation. A strong
norm about off-list behavior and politics being out of bounds here is also
helpful.

You face a choice between being a community that is about shipping code
and one that is embroiled in perpetual controversy over who gets to play
here and on what terms. Choose wisely.

--
Eric S. Raymond
http://www.catb.org/~esr/
The dust seems to have settled from the repository conversion
The dust seems to have settled from the GCC repository conversion. I
haven't seen any complaints about the conversion since it was finalized in
January, so I'm gathering there have not been any significant problems
with it.

Unfortunately, it left *me* with a problem. If you're on this list, more
than likely you have a full-time job that pays you for working on
open-source code. Twenty years ago I sold the business world on the value
of open-source shared infrastructure, so you can partly thank me for the
fact that you have that option.

Ironically, I myself have benefitted very little from that successful
persuasion, because the work I do is not closely enough tied to anything a
corporation knows it can monetize. Who has a business case for developing
something like reposurgeon?

I spent most of a year - thousands of hours - focusing on the technical
issues associated with the GCC conversion. Because I'm not on salary
anywhere, paying bills and not having steady income during that time blew
a pretty large hole in my savings account. Now my house needs a new roof,
and I have medical bills, and things are looking rather grim.

This wasn't the first public infrastructure project I've worked on, and it
certainly won't be the last. Reposurgeon, GPSD, NTPsec, giflib - if you
have found my work valuable and it gives you confidence that I will
continue to do useful things, please subscribe at one of these places:

https://www.subscribestar.com/esr
https://www.patreon.com/esr

Finally, be aware that I am not the only person in this sort of situation.
If you feel motivated to tackle the more general problem of load-bearing
Internet people without salaries, please look at http://loadsharers.org,
take the pledge, and find two load-bearers to support who aren't me.

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: Help with new GCC git workflow...
Richard Biener :
> > I like to write really fine-grained commits when I'm developing, then
> > squash before pushing so the public repo commits always go from "tests
> > pass" to "tests pass". That way you can do clean bisections on the
> > public history.
>
> The question is whether one could achieve this with branches? That is,
> have master contain a merge commit from a branch that contains the
> fine-grained commits? Because for forensics those can be sometimes
> useful.

Of course you can do this. Git gives you a number of different
possibilities here. You get to choose based on how you like your history
to look.

Discussion of my choice is here:

https://blog.ntpsec.org/2017/04/09/single-head-provable-steps.html

--
Eric S. Raymond
http://www.catb.org/~esr/
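Richard's branch-plus-merge-commit alternative is easy to demonstrate with a throwaway repository. This is a hedged sketch - the repository path, file names, and commit messages are made up for illustration:

```shell
# Keep fine-grained commits on a side branch while the trunk records
# a single merge commit.  Everything happens in a temp directory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Demo Dev"
trunk=$(git symbolic-ref --short HEAD)   # "master" or "main", version-dependent

echo base > f.txt
git add f.txt
git commit -q -m "initial"

# Fine-grained development happens on a side branch...
git checkout -q -b feature
for n in 1 2; do
    echo "step $n" >> f.txt
    git add f.txt
    git commit -q -m "feature: step $n"
done

# ...and --no-ff forces a real merge commit even though a fast-forward
# would be possible, so the detailed history stays visible for forensics.
git checkout -q "$trunk"
git merge -q --no-ff -m "Merge branch 'feature'" feature
git log --oneline --graph
```

With history shaped this way, `git log --first-parent` on the trunk shows only the merge-level mainline, while the fine-grained commits remain reachable through the merge's second parent.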
Re: Help with new GCC git workflow...
Peter Bergner :
> At this point, I get a little confused. :-) I know to submit my patch
> for review, I'll want to squash my commits down into one patch, but how
> does one do that? Should I do that now or only when I'm ready to
> push this change to the upstream repo or ??? Do I need to even do that?

If you want to squash a commit series, the magic is git rebase -i. You
give that a number of commits to look back at, and you'll get a buffer
instructing you how to squash and shuffle that series. You'll also be able
to edit the commit message.

I like to write really fine-grained commits when I'm developing, then
squash before pushing so the public repo commits always go from "tests
pass" to "tests pass". That way you can do clean bisections on the public
history.

> Also, when I'm ready to push this "change" upstream to trunk, I'll need
> to move this over to my master and then push. What are the recommended
> commands for doing that?

There are a couple of ways. I usually squash as described above, then use
"git cherry-pick". But that's because I have philosophical reasons to
avoid long-lived branches.

> I assume I need to rebase my branch to
> current upstream master, since that probably has moved forward since
> I checked my code out.

Yes, in general you'll want to do that.

> Also, at what point do I write my final commit message, which is different
> than the (possibly simple) commit messages above? Is that done after I've
> pulled my local branch into my master? ...or before? ...or during the
> merge over?

I do it at rebase -i time, along with the squash of the series.

--
Eric S. Raymond
http://www.catb.org/~esr/
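The interactive buffer is hard to show in an email, but the same squash can be scripted by supplying the todo-list edit through GIT_SEQUENCE_EDITOR. A sketch against a throwaway repository (all paths and messages here are illustrative):

```shell
# Three fine-grained WIP commits, squashed into one before "pushing".
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Demo Dev"

for n in 1 2 3; do
    echo "step $n" >> work.txt
    git add work.txt
    git commit -q -m "WIP: step $n"
done

# Interactively you would run "git rebase -i HEAD~3" and change "pick"
# to "fixup" (or "squash") on all but the first line of the todo buffer.
# Here sed performs that edit non-interactively; "fixup" folds each
# commit into its predecessor and keeps the first commit's message.
GIT_SEQUENCE_EDITOR="sed -i '1!s/^pick/fixup/'" git rebase -i --root

git log --oneline    # now a single commit
```

Use `squash` instead of `fixup` when you want to be dropped into an editor to write the combined commit message, which is the "final commit message at rebase -i time" step described above.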
Re: git conversion in progress
Thomas Koenig :
> Hm... I just hope this is a one-time effect, and isn't an indication
> that git uses much more resources, server-side, so the current
> infrastructure is not up to the task. Is git that much more
> resource hungry than svn? Or is this unrelated?

Almost certainly unrelated. In normal use git is *spectacularly* faster
than SVN on equivalent operations.

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: Proposal for the transition timetable for the move to GIT
Bernd Schmidt :
> I was on the fence for a long time, since I felt that the rewritten
> reposurgeon was still somewhat unproven.

And that was a fair criticism for a short while, until the first
compare-all verification on the GCC history came back clean.

The most difficult point in the whole process for me was in late November.
That was when I faced up to the fact that, while I had a Subversion dump
reader that was 95% good, (1) that 5% could disqualify it for this complex
a history, and (2) I wasn't going to be able to solve that last 5% without
tearing down most of the reader and rebuilding it.

The problem was that I'd been patching the dump reader to fix edge cases
for too long, and the code had rigidified. Too many auxiliary data
structures with partially overlapping semantics - I couldn't change
anything without breaking everything. Which is the universe's way of
telling you it's time for a rewrite.

Of course the risk was that I wouldn't get that rewrite done in time for
deadline. But I had two assets that mitigated the risk. One was a couple
of very sharp collaborators, Julien Rivaud and Daniel Brooks (and later
another, Edward Cree). The other was having a really good test suite, and
a well-established procedure for integrating new tests that jsm and
rearnshaw were able to use.

It was (as the Duke of Wellington famously said) a damned near-run thing.
With all those advantages, if I had waited even a week longer to make the
crucial scrap-and-rebuild decision, the new reader might have landed too
late.

There's a lesson in here somewhere. When I figure out what it is, I'll put
it in my next book.

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: Rescue of prehistoric GCC versions
Joseph Myers :
> I want to consider the conversion machinery essentially frozen at this
> point and not to add any new features not present in the conversion now

Very well, I won't push the integration change for those commits.

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: Proposal for the transition timetable for the move to GIT
Richard Earnshaw (lists) :
> I want to also take this opportunity to thank Maxim for the work he has
> done. Having that fallback option has meant that we could press harder for
> a timely solution and has also driven several significant improvements to
> the overall result. I do not think we would have achieved as good a result
> overall if he hadn't developed his scripts.

Yes. Reposurgeon's ChangeLog processing, in particular, was significantly
improved using lessons learned from Maxim's scripts.

--
Eric S. Raymond
http://www.catb.org/~esr/
Rescue of prehistoric GCC versions
I have been able to rescue or reconstruct from patches the following
prehistoric GCC releases:

gcc-0.9
gcc-1.21  gcc-1.22  gcc-1.25  gcc-1.26  gcc-1.27  gcc-1.28
gcc-1.35  gcc-1.36  gcc-1.37.1  gcc-1.38  gcc-1.39  gcc-1.40
gcc-1.41  gcc-1.42
gcc-2.1  gcc-2.2.2  gcc-2.3.3  gcc-2.4.5  gcc-2.5.8  gcc-2.6.3
gcc-2.7.2  gcc-2.8.0

The gap in the sequence represents the beginning of the repository
history; r3 = gcc-1.36. The 0.9 to 1.35 tarballs can be glued to the front
of the history, one commit each, with a firewall commit containing a
deleteall to keep the content from leaking forward. This is an issue
because the early parts of the repo don't have complete trees.

I'm now testing a conversion on the Great Beast that puts these in place.
If all goes well I will push this capability to the public conversion
repository later today.

You can audit the reconstruction process by reading the script I wrote to
automate it:

https://gitlab.com/esr/gcc-conversion/blob/master/ancients

Unfortunately, I was only able to find valid patch chains to three
releases that don't have complete tarballs. If anyone else can scrounge up
materials that could help complete the fossil sequence, now would be a
really good time for that. We have only three days at most left to
integrate them.

--
Eric S. Raymond
http://www.catb.org/~esr/

The object of life is not to be on the side of the majority, but to
escape finding oneself in the ranks of the insane. -- Marcus Aurelius
Re: GIT conversion: question about tags & release branches
Martin Liška :
> > Anyway, please check Joseph's next candidate to see if this shows what you
> > expect -- I think it should be out later today.
>
> I'll check it once it's published.

Everybody: time is growing short before the final conversion, so if you
see anything that looks wrong or anomalous please send up a rocket
*immediately*. The faster you let us know, the more likely it is we'll be
able to nip in with a fix while that is still possible.

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: Proposal for the transition timetable for the move to GIT
Maxim Kuvyrkov :
> Once gcc-reparent conversion is regenerated, I'll do another round of
> comparisons between it and whatever the latest reposurgeon version is.

Thanks, Maxim. Those comparisons have been very helpful to Joseph and
Richard and to the reposurgeon devteam as well. They use your feedback to
find places where their comment-processing scripts could be improved;
we've used it to learn what additional oddities in ChangeLogs we need to
be able to handle automatically.

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: Proposal for the transition timetable for the move to GIT
Joseph Myers :
> To me, that indicates that using a conversion tool that is conservative in
> its heuristics, and then selectively applying improvements to the extent
> they can be done safely with manual review in a reasonable time, is better
> than applying a conversion tool with more aggressive heuristics.

There's a more general point here, which I'm developing in my
book-in-progress. Clean data-conversion problems can be done
algorithmically without a human in the loop. Messy data-conversion
problems need judgment amplifiers.

Maxim's scripts try to treat a messy conversion problem as though it were
a clean one. Maxim is pretty sharp, so this almost works. Almost. But the
failure mode is predictable - overinterpreting badly-formed input leads to
plausible garbage on output.

When this happens, it's the Goddess Eris's way of telling you that there
needs to be human judgment in the loop. Instead of trying to automate it
out, you should be building tools that partition the process into things a
computer does well, driven by choices a human makes well.

This is a point that needs making because programmers thrown at messy
conversion problems tend to be more fixated on achieving full automation
than they perhaps ought to be. Elsewhere I have written of Zeno tarpits:

http://esr.ibiblio.org/?p=6772

Subversion dump streams are not quite a Zeno tarpit - they actually obey
something that has the effect of a formal specification - but ChangeLog
parsing is.

> The issues with the reposurgeon conversion listed in Maxim's last comments
> were of the form "reposurgeon is being conservative in how it generates
> metadata from SVN information". I think that's a very good basis for
> adding on a limited set of safe improvements to authors and commit
> messages that can be done reasonably soon and then doing the final
> conversion with reposurgeon.

The flip side of this is that Joseph has been making intelligent and
realistic suggestions for how to improve reposurgeon. That is *invaluable*
- it captures knowledge that will make future comparisons easier and
better.

Software engineers (outside of a few AI specialists) don't ordinarily
think of themselves as being in the knowledge-capture business. But it's a
useful perspective to cultivate.

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: Git conversion: fixing email addresses from ChangeLog files
Richard Earnshaw (lists) :
> Weak in the sense that it isn't proof given that the user name is
> partially redacted. There's nothing in the gcc archives that gives a
> full name either, unfortunately.
>
> Yes, it's the most likely match, but there's still an element of doubt.
>
> R.

https://groups.google.com/forum/#!msg/comp.databases.sybase/Uz8ICef9Qr8/uPwanH6is60

If you open his message to Michael Peppler, you'll see a sig block that
says:

bjo...@planetarion.com
Bjørn Wennberg, Fifth Season AS

It's him, yep. Be sure to get the ø right when you fill in the name. :-)

--
Eric S. Raymond
http://www.catb.org/~esr/
Re: Proposal for the transition timetable for the move to GIT
Joseph Myers :
> The case you mention is one where there was a merge to a branch not from
> its immediate parent but from an indirect parent. I don't think it would
> be hard to support detecting such merges in reposurgeon.

We're working on it.

> This is an example where the originally added ChangeLog entry was
> malformed (had the date in the form "2004-0630"), so a conservatively safe
> approach was taken of using the committer rather than trying to guess what
> a malformed ChangeLog entry means and risk extracting nonsense.
>
> I expect other cases are being similarly careful in cases where there was
> a malformed ChangeLog entry or a commit edited ChangeLog entries by other
> authors so leaving its single-author nature ambiguous. Parsing
> ChangeLogs, especially where malformed entries are involved, is inherently
> a heuristic matter.

As Joseph says, one of reposurgeon's design principles is "First, do no
harm." And yes, changelogs are full of malformations and junk like this. I
saw and dealt with a lifetime's worth while converting the Emacs history
from bzr to git.

If you try to interpret any random garbage in, you will assuredly get
garbage out when you least expect it. Often the cost of this sort of
mistake is not fully realized until it is far too late for correction.
This is *why* reposurgeon is conservative.

The correct thing for reposurgeon to do is flag unparseable entry headers
for human intervention, and as of today it does that.

--
Eric S. Raymond
http://www.catb.org/~esr/
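The "flag, don't guess" policy is easy to illustrate. Below is a hedged sketch in Python of a conservative ChangeLog header parser; the header pattern and function are illustrative only, not reposurgeon's actual code (reposurgeon is written in Go):

```python
import re
from datetime import datetime

# A ChangeLog entry header conventionally looks like:
#   2004-06-30  Jane Hacker  <jane@example.com>
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+<(.+?)>\s*$")

def parse_header(line):
    """Return (date, name, email) for a well-formed header, or None.

    Malformed headers (e.g. a date like "2004-0630") are deliberately
    NOT guessed at; the caller should fall back to committer metadata
    and flag the entry for human review.
    """
    m = HEADER.match(line)
    if not m:
        return None
    date, name, email = m.groups()
    try:
        # Reject shapes that match the regex but aren't real dates,
        # e.g. "2004-13-01".
        datetime.strptime(date, "%Y-%m-%d")
    except ValueError:
        return None
    return (date, name, email)
```

The point of the `None` return is exactly the conservatism Joseph describes: a caller that gets `None` uses the SVN committer and queues the entry for a human, rather than emitting plausible garbage into the converted history.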
Re: Git conversion: fixing email addresses from ChangeLog files
Richard Earnshaw (lists) : > Also, for this one: > > # "47044": "", > > There's some (relatively weak) evidence that this is Bjørn Wennberg (eg > https://groups.google.com/forum/#!msg/comp.databases.sybase/Uz8ICef9Qr8/uPwanH6is60J), > but in the absence of stronger evidence, I'm going to just put bjornw as > the name. What's weak about that? The full email address matches. Unless you think there are two hackers named Bjørn, with a last initial of W, running around using the same email address, I think we have a winner. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: The far past of GCC
Mark Wielaard : > Apparently less complete, but there is also > https://ftp.gnu.org/old-gnu/gcc/ > Which does have some old diff files to reconstruct some missing versions. There are quite a few ancient preserved release tarballs out there. Here is the list of reconstructable pre-r3 releases as I now know it:

0.9   ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-0.9.tar.bz2
1.21  ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.21.tar.bz2
1.22  ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.22.tar.bz2
1.23  ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.23-1.24.bz2
1.24  ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.24-1.25.bz2
1.25  ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.25-1.26.bz2
1.26  ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.26-1.27.bz2
1.27  ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.27.tar.bz2
1.28  ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.28-1.29.bz2
1.29  ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.29-1.30.bz2
1.30  ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.30-1.31.bz2
1.31  ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.31.tar.bz2
1.32  ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.31-1.32.bz2
1.33  ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.32-1.33.bz2
1.34  ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.32-1.34.bz2
1.35  ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.35.tar.bz2

It looks like the relevant bits are under ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-[12] and ftp://sourceware.org/pub/gcc/old-releases/gcc-[12]. Incorporating these will be easy. What I would do is write a script that does this: (a) checks to see if each tarball is mirrored locally; (b) if not, fetches it, applying forward or back diffs from the nearest whole version as required; (c) generates a sequence of reposurgeon incorporate commands to be included in the main lift script. ssb says r3 is 1.36.
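The planning part of step (b) can be sketched as a pure function. This is a hypothetical helper, not an existing script; the release inventory mirrors the list above, simplified so that each "diff" entry is reconstructed by patching the previous release in sequence:

```python
# Hypothetical reconstruction planner for the scheme sketched above:
# given which releases survive as full tarballs and which only as
# diffs, compute the base tarball and patch chain needed to
# materialize a requested release.

# Release inventory: ordered (version, kind) pairs.  A "diff" entry is
# built by patching the previous version in this list.
RELEASES = [
    ("0.9",  "tarball"), ("1.21", "tarball"), ("1.22", "tarball"),
    ("1.23", "diff"),    ("1.24", "diff"),    ("1.25", "diff"),
    ("1.26", "diff"),    ("1.27", "tarball"), ("1.28", "diff"),
    ("1.29", "diff"),    ("1.30", "diff"),    ("1.31", "tarball"),
    ("1.32", "diff"),    ("1.33", "diff"),    ("1.34", "diff"),
    ("1.35", "tarball"),
]

def plan(version):
    """Return (base_tarball, [diff versions to apply]) for `version`."""
    idx = [v for v, _ in RELEASES].index(version)
    if RELEASES[idx][1] == "tarball":
        return version, []
    # Walk back to the nearest whole tarball, collecting the diff
    # chain, then reverse it into application order.
    chain = []
    i = idx
    while RELEASES[i][1] == "diff":
        chain.append(RELEASES[i][0])
        i -= 1
    return RELEASES[i][0], list(reversed(chain))
```

Step (a) would wrap this in a local-mirror check, and step (c) would emit one incorporate command per reconstructed release.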
I doubt r1 and r2 are anything other than Subversion directory creations, but people with easier access than me should check. After this life gets a little trickier. We have the following tarballs that might be of interest:

1.36   r3      ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.36.tar.bz2
1.37   ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.37.tar.bz2
1.38   ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.38.tar.bz2
1.39   ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.39.tar.bz2
1.40   ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.40.tar.bz2
1.41   ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.41.tar.bz2
1.42   ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.42.tar.bz2
2.0    r358    ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.8.tar.bz2
2.1    r586    ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.1.tar.bz2
2.2.2  ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.2.2.tar.bz2
2.3.3  ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.3.3.tar.bz2
2.4.5  ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.4.5.tar.bz2
2.5.8  ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.5.8.tar.bz2
2.6.3  ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.6.3.tar.bz2
2.7.2  r10608  ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.7.2.tar.bz2

Before we can do anything with these, we need to identify which Subversion revision each one with a ? belongs to. I've added three of ssb's identifications. For completeness I note those for which we have no tarballs: r1184 = 2.2, r2674 = 2.3.1, r4493 = 2.4.0 "minus two swapped commits", r5867 = 2.5.0, r7771 = 2.6.0, r9996 = 2.7.0. This reconstruction is being tracked here: https://gitlab.com/esr/gcc-conversion/issues/4 -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: The far past of GCC
Jeff Law : > I believe RCS was initially used circa 1992 on the FSF machine which > held the canonical GCC sources. That year sounds right - it's when I wrote the original vc.el for Emacs and a lot of Emacs users who hadn't been using version control started to. Doesn't give us a Subversion revision, though. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Git conversion: fixing email addresses from ChangeLog files
Richard Earnshaw (lists) : > I've just commented that one out for now; if anybody knows the correct > addresses, please let me know. Also, there's one joint list that I've > not attempted to fix at this time. > # "28488": "Jim Kingdon <http://developer.redhat.com>", That's Jim Kingdon the former CVS dev - I think he was involved in Subversion early too. He's king...@cyclic.com or king...@panix.com, according to my back mail. But since I think I remember that he did work at Red Hat in the late '90s, king...@redhat.com would be a good bet too. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Git conversion: fixing email addresses from ChangeLog files
Joseph Myers : > Concretely, what I'd suggest is: convert ISO-8859-1 entries in the > checked-in list to UTF-8, removing anything that thereby becomes a > duplicate or unnecessary; handle anything whose encoding isn't simply > ISO-8859-1 or UTF-8 via a hardcoded entry in bugdb.py using hex escapes > like the existing such entries there. Once the checked-in list is pure > UTF-8 it's easier for people to review and edit. Where the issue is only > presence of ISO-8859 NBSP, or "" or () around the names, remove that in > the checked-in list and again remove duplicates. That way the list can be > limited to non-encoding variations. Be aware that reposurgeon has a "transcode" command for moving a specified set of objects to UTF-8 from a specified encoding. -- Eric S. Raymond <http://www.catb.org/~esr/>
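The transcoding itself is a one-liner; this is a minimal illustration of what such a re-encoding does to the author names under discussion (the sample bytes here are made up, not taken from the GCC history):

```python
# Minimal illustration of legacy-encoding cleanup: take metadata bytes
# recorded in ISO-8859-1 and re-encode them as UTF-8, which is what a
# transcoding pass over commit metadata amounts to.

def to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode `raw` from `source_encoding` to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")

latin1_name = b"Bj\xf8rn Wennberg"   # ISO-8859-1: 0xF8 is U+00F8 (ø)
utf8_name = to_utf8(latin1_name)     # the ø becomes the two bytes C3 B8
```

This is also why mixed lists are painful to review: the same name has different byte representations depending on which encoding the original ChangeLog used.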
The far past of GCC
In moving the history of a project old enough to have used more than one version-control system, I think it's good practice to mark the strata. I'm even interested in pinning down the RCS-to-CVS cutover, if there's enough evidence to establish that. I've added an issue to the tracker about this: https://gitlab.com/esr/reposurgeon/issues/224 If you have knowledge of the relevant dates or SVN revisions, please leave a comment on the issue. I'm making this a public request because there was talk of gluing very old, pre-CVS tarballs to the history. Reposurgeon has primitives to do this gracefully because one of my projects, INTERCAL, was old enough to have pre-CVS tarballs and I felt there was value in preserving that ancient history. I think there is rather more value in preserving GCC's ancient history! If nothing else, there are very few data sets on codebase growth with as long a timespan. Therefore, if you know where I can retrieve pre-CVS tarballs of GCC, please leave the URLs in a comment on that issue thread. I know about the official GCC download page; the oldest tarball on it is evidently from 1997, and I assume that is well after the project was CVSed. I'm looking for older sources. -- Eric S. Raymond <http://www.catb.org/~esr/> The spirit of resistance to government is so valuable on certain occasions, that I wish it always to be kept alive. It will often be exercised when wrong, but better so than not to be exercised at all. I like a little rebellion now and then. -- Thomas Jefferson, letter to Abigail Adams, 1787
Re: Test GCC conversion with reposurgeon available
Andreas Schwab : > On Dez 25 2019, Eric S. Raymond wrote: > > > That's easily fixed by adding a timezone entry to your author-map > > entry - CET, is it? > > The time zone is not constant. Congratulations, you have broken one of reposurgeon's assumptions. It is possible to use reposurgeon's DSL to set committer TZ on a selected set of commits; if you want to work up a patch for the lift script we'll take it. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Joseph Myers : > reposurgeon results are fully reproducible (by design, the same inputs to > the same version of reposurgeon should produce the same output as a > git-fast-import stream, Designer confirms, and adds that we have a *very* stringent test suite to verify this. Much of it consists of bizarre malformations collected during past conversions. GCC has added its share. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Richard Earnshaw (lists) : > Well, personally, I'd rather we didn't throw away data we have in our > current SVN repo unless it's unpresentable in the final conversion. I agree with this philosophy. You will have noticed by now, I hope, that reposurgeon preserves as much as it can, leaving deletions to be a matter of user policy. In the normal case, reposurgeon could save its users a significant amount of work by being more aggressive about automatically deleting remnant bits that are merely *very unlikely* to be useful. I deliberately refused to go that route. > Merge info is not one of those cases. Sometimes. Some Subversion mergeinfo operations map to Git's branch-centric merging. Many do not, corresponding to cherry-picks that cannot be expressed in a Git history. Reposurgeon does a correct but not complete job of translating mergeinfos that compose into branch merges. It handles the simple, common cases and punts the tricky ones. More coverage would theoretically be possible, but I don't have the faintest clue what a general resolution rule would look like. Except I'm pretty sure the problem is bitchy-hard and the solution really easy to get subtly wrong. Frankly, I don't want to touch this mess with insulated tongs. Somebody would have to offer me serious money to compensate for the expected level of pain. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Maxim Kuvyrkov : > Removing auto-generated .gitignore files from reposurgeon conversion > would allow comparison of git trees vs gcc-pretty and gcc-reparent > beyond r195087. So, while we are evaluating the conversion > candidates, it is best to disable conversion features that cause > hard-to-workaround differences. I was going to write that feature yesterday, then Julien nipped in and did it while my back was turned. It's a read option, --no-automatic-ignores. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Test GCC conversion with reposurgeon available
Vincent Lefevre : > What matters is that the date is correct. I don't think the timezone > matters (that's why SVN doesn't store timezone information, I assume), > possibly except for the committer himself (?). For instance, Subversion doesn't store timezone because all commits are considered to have occurred at UTC time on a central repository. I think time as well as date matters because sometimes the order commits were in could be significant information, even if they were on the same day. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Test GCC conversion with reposurgeon available
Toon Moene : > On 12/26/19 10:30 PM, Eric S. Raymond wrote: > > > Me, I don't understand why version-control systems designed for distributed > > use don't ignore timezones entirely and display all times in UTC - relative > > time is surely more important than the commit time's relationship to solar > > noon wherever the keyboard happened to be. But I don't make these decisions. > > So we are going to base this world wide free software endeavor on a source > code system that doesn't keep time by UTC ? They all *do* keep time by UTC. What confuses me is why they ever try to *display* anything other than UTC. It seems pointless to me to ever display local time in clients, but they do it anyway. Without that complication, there would be no need to track user timezones. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Test GCC conversion with reposurgeon available
Vincent Lefevre : > > Here's why you want to get timezones right: there are going to be times > > when the order of commits is significant information for a developer's > > understanding of what happened. But without a timezone you only know > > the actual time of a commit to 24-hour resolution. > > I don't understand what you mean. What matters for the order of > commits is the global time, and this is what SVN stores. SVN does not > store timezone information, i.e. it has no idea of what local time of > the user had, but I don't think this is important information. UTC time plus a timezone offset is what git stores. That's not the locus of the problem. In Subversion-land there's never any doubt about the sequence of commits; the revision numbers tell you that. In Git-land you have to go by timestamps, and if a timezone entry is wrong it can skew the displayed time. Me, I don't understand why version-control systems designed for distributed use don't ignore timezones entirely and display all times in UTC - relative time is surely more important than the commit time's relationship to solar noon wherever the keyboard happened to be. But I don't make these decisions. -- Eric S. Raymond <http://www.catb.org/~esr/>
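To make the storage point concrete: a git timestamp is a UTC epoch count plus a rendering offset, so the offset never changes the instant, only how it is displayed. A small illustration (the timestamp value is arbitrary):

```python
# Git records an author/committer date as UTC seconds-since-epoch plus
# a timezone offset.  The offset affects only the rendering, never the
# actual instant - which is exactly why a wrong offset skews what a
# developer *sees* without changing when the commit happened.
from datetime import datetime, timedelta, timezone

def git_date(unix_seconds: int, offset_minutes: int) -> str:
    """Render a git-style timestamp in its recorded local zone."""
    tz = timezone(timedelta(minutes=offset_minutes))
    return datetime.fromtimestamp(unix_seconds, tz).isoformat()

stamp = 1577286000
in_utc = git_date(stamp, 0)    # rendered as +00:00
in_cet = git_date(stamp, 60)   # same instant, rendered as +01:00
```

Two renderings of the same stamp differ on the wall clock but denote one moment; sorting by the underlying epoch count is always unambiguous.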
Re: Proposal for the transition timetable for the move to GIT
Alexandre Oliva : > I don't see that it does (help). Incremental conversion of a missed > branch should include the very same parent links that the conversion of > the entire repo would, just linking to the proper commits in the adopted > conversion. git-svn can do that incrementally, after the fact; I'm not > sure whether either conversion tool we're contemplating does, but being > able to undertake such recovery seems like a desirable feature to me. It's all in what you have in the lift script. Reposurgeon can do any kind of branch surgery you want, and that can be added to the conversion pipeline and replicated every time. > From what I read, he's doing verifications against SVN. What I'm > suggesting, at this final stage, is for us to do verify one git > converted repo against the other. There are no tools for that, and probably won't be unless somebody revives repodiffer. There isn't a lot of time left in the schedule for that, and I have my hands full fixing other glitches. (Minor issues about parsing ChangeLogs and generated .gitignores; the serious problems are well behind us at this point.) > Maxim appears to be doing so and finding (easy-to-fix) problems in the > reposurgeon conversion; it would be nice for reposurgeon folks to > reciprocate and maybe even point out problems in the gcc-pretty > conversion, if they can find any, otherwise the allegations of > unsuitability of the tools would have to be taken on blind faith. Joseph has already made the call to go with a reposurgeon-based conversion for reasons he explained in detail on this list. Given that, it really doesn't make any sense for me to do any of what you're proposing with time I could use working on Joseph's RFEs instead. If you're concerned about the quality of reposurgeon's conversion, you'd be a good person to work on a comparison tool. Should I email you a copy of the repodiffer code as it last existed in my repository? -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Alexandre Oliva : > On Dec 25, 2019, "Eric S. Raymond" wrote: > > > Reposurgeon has a reparent command. If you have determined that a > > branch is detached or has an incorrect attachment point, patching the > > metadata of the root node to fix that is very easy. > > Thanks, I see how that can enable a missed branch to be converted and > added incrementally to a converted repo even after it went live, at > least as long as there aren't subsequent merges from a converted branch > to the missed one. I don't quite see how this helps if there are, > though. There's also a command for cutting parent links, if that helps. > Could make it a requirement that at least the commits associated with > head branches and published tags compare equal in both conversions, or > that differences are known, understood and accepted, before we switch > over to either one? Going over all corresponding commits might be too > much, but at least a representative random sample would be desirable to > check IMHO. repotool compare does that, and there's a production in the conversion makefile that applies it. As Joseph says in another reply, he's already doing a lot of the verifications you are suggesting. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Test GCC conversion with reposurgeon available
Joseph Myers : > On Wed, 25 Dec 2019, Andreas Schwab wrote: > > > On Dez 25 2019, Joseph Myers wrote: > > > > > Timezones for any email address can be specified in gcc.map for any > > > authors wishing to have an appropriate timezone used for their commits. > > > > But that should not be used for unrelated authors. > > It's not. > > On investigation, I think you are referring to the conversion of r269472. > That was committed for you by Jim Wilson and thus has you as author and > Jim Wilson as committer and Jim Wilson's timezone entry has been applied. > So the argument here is that the author's timezone information should be > applied to the author date, and the committer's timezone information > should be applied to the committer date. I expect that should be > straightforward (although when coming from SVN, there's also an argument > that we only have committer dates so the committer timezone is the > relevant one to apply). There's also an FSF policy about ChangeLogs that's relevant, I think. Git sometimes fills in the author field from the committer, and ChangeLog parsing is done only after translation. That's probably the source of this bug. If anybody cares enough to file a bug with a test load attached, I can probably fix this. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Test GCC conversion with reposurgeon available
Segher Boessenkool : > The goal is not to pretend we never used SVN. One of *my* goals is that the illusion of git back to the beginning of time should be as consistent as possible. > The goal is to have a Git repo that is as useful as possible for us. Exactly. I've already written about minimizing cognitive friction. Here's why you want to get timezones right: there are going to be times when the order of commits is significant information for a developer's understanding of what happened. But without a timezone you only know the actual time of a commit to 24-hour resolution. There is no way we'll get this perfect. But there is more wrong and less wrong, and reposurgeon tries hard to be less wrong. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Test GCC conversion with reposurgeon available
Andreas Schwab : > Definitely not. I have never authored or committed any revision in the > -0800 time zone. That's easily fixed by adding a timezone entry to your author-map entry - CET, is it? That will prevent reposurgeon from making any attempt to deduce your timezone. It would be interesting to know how reposurgeon got misled. Most likely it was by a ChangeLog entry. Reposurgeon watches as these are being processed to see if it can pin an email address to a single timezone by looking up its TLD in the IANA database. I don't know how that could land you in California, though. Maybe I ought to be logging timezone deductions so we can trace them back. Has anyone else seen wrong timezone attributions? -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Segher Boessenkool : > Or doing what everyone else does: put an empty .gitignore file in > otherwise empty directories. That is an ugly kludge that I will have no part of whatsoever. Conversion artifacts like this are sources of cognitive friction and confusion that take developers' attention away from the substantive part of their work. Each individual one may be minor, but the cumulative effect can be a chronic distraction that is not lessened because developers are unaware or only half-aware of it. Thus, the goal of a repository converter should be to bridge smoothly between the native idioms of the source and target systems, *minimizing* conversion artifacts. The ideal should be to produce a converted history that looks as much as possible like it has always been under the target system. Developers should have no need to know or care that the history used to be managed differently unless they need to do something that *unavoidably* crosses that boundary, like looking up a legacy ID from an old bug report. Reposurgeon was designed for this goal from the beginning. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Test GCC conversion with reposurgeon available
Joseph Myers : > These are all cases covered by the request-for-enhancement issue for > adding Co-Authored-by: when the ChangeLog header names multiple authors, > as the corresponding de facto git idiom for that case. I apologize, but I am growing doubtful I can deliver that. Even if I can, it may take longer than your conversion schedule allows given that we've only got five days on the clock. Here are the problems: 1. I don't have a reduced test case to validate parsing against. 2. The ChangeLog-parsing code is fragile and difficult to modify. This is inherent - the syntactic cues it's working with are weak and false matches are all too easy. I've got to have 1 before I can even try to deal with 2. -- Eric S. Raymond <http://www.catb.org/~esr/>
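For readers unfamiliar with the idiom Joseph refers to, here is a hedged sketch of the requested transformation. It assumes the author list has already been extracted and is well formed (the names and emails are illustrative); the hard part reposurgeon faces is precisely that real ChangeLog headers are rarely this clean:

```python
import re

# Hypothetical, simplified version of the requested transformation:
# given a multi-author list from a ChangeLog header, keep the first
# person as the git author and emit the rest as Co-Authored-By
# trailers, the de facto git idiom for joint commits.

PERSON = re.compile(r"([^<>,]+?)\s*<([^<>]+)>")

def coauthor_trailers(authors: str):
    """Split 'Name <email>, Name <email>' into (author, trailers)."""
    people = [(name.strip(), email) for name, email in PERSON.findall(authors)]
    author = "%s <%s>" % people[0]
    trailers = ["Co-Authored-By: %s <%s>" % p for p in people[1:]]
    return author, trailers
```

A real implementation has to first decide which lines even constitute the header, which is where the weak syntactic cues bite.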
Re: Proposal for the transition timetable for the move to GIT
Alexandre Oliva : > I know very little about reposurgeon, but I'm concerned that, should we > make the conversion with it, and later identify e.g. missed branches, we > might be unable to make such an incremental recovery. Can anyone > alleviate my concerns and let me know we could indeed make such an > incremental recovery of a branch missed in the initial conversion, in > such a way that its commit history would be shared with that of the > already-converted branch it branched from? Reposurgeon has a reparent command. If you have determined that a branch is detached or has an incorrect attachment point, patching the metadata of the root node to fix that is very easy. > Now, would it be too much of a burden to insist that the commit graphs > out of both conversions be isomorphic, and maybe mappings between the > commit ids (if they can't be made identical to begin with, that is) be > generated and shared, so that the results of both conversions can be > efficiently and mechanically compared (disregarding expected > differences) not only in terms of branch and tag names and commit > graphs, but also tree contents, commit messages and any other metadata? > Has anything like this been done yet? On the GCC repository, no. There are very serious practical problems with full verification of git against SVN stemming mainly from the fact that Subversion checkout on a repository of this size is extremely slow. IIRC Joseph at one point estimated a check time on the order of months due to that overhead alone. If you're talking about a commit-by-commit comparison between two conversions that assumes one or the other is correct, that is theoretically possible and - because git retrieval is much faster - could theoretically be done in a reasonable amount of time. But there is a lot of devil in the practical details. The reposurgeon suite once included a tool for such comparisons. Last year this happened:

commit b8a609925ba70a6b68f9eda1d748eb667ad2fa59
Author: Eric S. Raymond
Date:   Fri Aug 24 12:40:46 2018 -0400

    Retire repodiffer. Its only use case was checks against git-svn...
    ...which we now know to make such bad conversions that on larger
    than trivial repos the differ would be prohibitively noisy.

Maxim's scripts probably make a better conversion than bare git-svn, because he uses git-svn only for linear basic blocks and thereby avoids its worst failure modes. In theory I could dust off repodiffer and apply it. That's in theory. In practice, on a repository this size I am not greatly optimistic about getting a result that could be interpreted by a Mark I brain. The reasons go beyond git-svn's brain damage to the same ontological-mismatch problems that make SVN-to-git conversion a headache in general. You might think at least there'd be a 1:1 correspondence between commits in the two conversions, but that's not going to be true for a couple of different reasons. 1. Split commits. Reposurgeon decomposes these into pieces, one per git branch. I don't know what Maxim's scripts do. I think Joseph turned up that there are over a thousand of these in the GCC history. 2. There are three classes of commits in Subversion that don't really fit the git data model: (1) directory creation/deletion commits, (2) directory copy commits, (3) property changes with no associated blob. For each of these exceptional commits a converter to Git has a choice of dropping the commit, turning it into some sort of annotated tag, or leaving it in place as a zero-op commit (anomalous but not forbidden in the git model). It is pretty much guaranteed that different converters will make different choices about these, which will make for huge amounts of noise in your attempt at a diff. Checking for DAG isomorphism: again, theoretically possible, practically pretty daunting.
It could be worse - general graph isomorphism is not even known to be polynomial-time - but in this case we can label corresponding commits with matching legacy IDs, which should make possible an isomorphism check in linear time with a trivial algorithm. Well, except for split commits. That one would be solvable, albeit painful. The real problem here would be mergeinfo links. It's not even obvious what a "correct" mapping of mergeinfo links is, in general, due to the mismatch between Subversion's cherry-pick-based merge model and git's branch merging. Again, different converters will make different choices. Reconciling them would be not fun. There is another world of hurt lurking in "(disregarding expected differences)". How do you know what differences to expect? How are you going to specify them? What will interpret that spec? There are months more of work here - nasty, wearing toil - with no guarantee of a result with a decent signal-to-noise ratio. Even though I'm quite literally the best-qualified person on earth to do it, I flinch at the thought. -- Eric S. Raymond <http://www.catb.org/~esr/>
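The legacy-ID labeling makes the graph comparison itself nearly trivial in principle. A minimal sketch (a hypothetical helper, not part of reposurgeon), modeling each conversion as a map from legacy ID to the set of parent legacy IDs:

```python
# Linear-time DAG comparison once commits in both conversions carry
# matching Subversion legacy IDs.  Each graph is modeled as
# {legacy_id: set of parent legacy_ids}.  Real conversions would first
# have to normalize split commits and dropped no-op commits, which is
# where all the actual pain lives.

def isomorphic_by_legacy_id(a: dict, b: dict) -> bool:
    """True if both DAGs have the same labeled nodes and parent edges."""
    if a.keys() != b.keys():
        return False
    return all(a[rev] == b[rev] for rev in a)
```

One pass over the nodes and edges suffices because the labels fix the correspondence; no graph-matching search is needed.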
The new Subversion reader in reposurgeon is complete for GCC purposes
This morning Julian Rivaud and I fully qualified the new Subversion dump stream reader against reposurgeon's test suite. Joseph Myers has been using recent versions of this same code to make test conversions of the GCC history that appear correct. We believe reposurgeon is now feature-complete for a full and correct GCC conversion. Caveat: The repository is too large for verification on every single revision to be practical. We have five remaining minor issues, mostly related to user-generated .gitignore files (as opposed to files generated from svn:ignore properties) that should not affect the GCC conversion. We expect to fix these over the next few days, anyway. We have one remaining RFE from Richard Earnshaw that would be nice to have, but is not essential. I'll be working on that. -- Eric S. Raymond <http://www.catb.org/~esr/> If I were to select a jack-booted group of fascists who are perhaps as large a danger to American society as I could pick today, I would pick BATF [the Bureau of Alcohol, Tobacco, and Firearms]. -- U.S. Representative John Dingell, 1980
Re: Test GCC conversions (publicly) available
Richard Earnshaw (lists) : > > No, I was thinking more of rearnsha bailing out to handle a family emergency > > and muttering something about not being back for a couple of weeks. If > > that's > > been resolved I haven't heard about it. > > I don't think that should affect things, as I think Joseph has a good handle > on what needs to be done and I think I've handed over everything that's > needed w.r.t. the commit summary reprocessing script. OK, that's good to know. I wish you good fortune with the emergency. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Test GCC conversions (publicly) available
Joseph Myers : > On Thu, 19 Dec 2019, Eric S. Raymond wrote: > > > There are other problems that might cause a delay beyond the > > 31st, however. Best if I let Joseph and Richard explain those. > > I presume that's referring to the checkme: bug annotations where the PR > numbers in commit messages seem suspicious. I don't think that's > something to delay the conversion unless we're clearly close to having a > complete review of all those cases done; at the point of the final > conversion we should simply change the script not to modify commit > messages in any remaining unresolved suspicious cases. No, I was thinking more of rearnsha bailing out to handle a family emergency and muttering something about not being back for a couple of weeks. If that's been resolved I haven't heard about it. The only conversion blocker that I know is still live is the wrong attributions in some ChangeLog cases. I'm sure we'll get that fixed soon; at this point I'm more worried about getting the test suite to run clean again. The scenario I want to avoid is the one where you get a conversion that looks production-ready before I get my tests cleaned up, you deploy it - and then I find something during the remainder of my cleanup that implies a problem with your conversion. A complicating factor is that I'm getting stale. I've been going hammer and tongs at this for nearly three months now, and that's not counting all the previous time on the Go translation. My defect rate is going up. I need a vacation or to work on something else for a while and I can't have that yet. Never mind. We'll get this done. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Test GCC conversions (publicly) available
Mark Wielaard : > Do we already have a new date for when we are making that decision? I believe Joseph was planning on Dec 31st. My team's part will be ready - the enabling reposurgeon changes should be done in a week or so, with most of that being RFEs that could be dropped if there were real time pressure. There are other problems that might cause a delay beyond the 31st, however. Best if I let Joseph and Richard explain those. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Unix philosophy vs. poor semantic locality
Joseph Myers : > On Wed, 18 Dec 2019, Eric S. Raymond wrote: > > > And that, ladies and gentlemen, is why reposurgeon has to be as > > large and complex as it is. > > And, in the end, it *is* complex software on which you build simple > scripts. gcc.lift is a simple script, written in the domain-specific > reposurgeon language. The Patterns crowd speaks of "alternating hard and soft layers". The design of reposurgeon was driven by two insights: 1. Previous VCS-conversion tools sucked in part because they tried to be too automatic, eliminating human judgment. Reposurgeon is designed and intended to be a *judgment amplifier*, doing mechanics and freeing the human operator to think about conversion policy. Hence the DSL. 2. git fast-import streams are a pretty capable format for interchanging version-control histories. Not perfect, but good enough that you can gain a lot by co-opting existing importers and exporters. Mate the idea of a judgment-amplifying DSL to a structure editor for git fast-import streams and reposurgeon is what you get. -- Eric S. Raymond <http://www.catb.org/~esr/>
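For readers who have not met the format: a fast-import stream serializes a history as a flat sequence of blob and commit records, which is what makes it a workable interchange medium. A minimal hand-written sample (the name and email are made up; the `data <n>` headers give exact byte counts for the content that follows):

```
blob
mark :1
data 12
hello, world
commit refs/heads/master
mark :2
committer Example Hacker <hacker@example.invalid> 1545739200 +0000
data 16
initial revision
M 100644 :1 README
```

Because the format is a plain byte stream, any tool that can read and write it - reposurgeon among them - can borrow every VCS's existing exporter and importer instead of speaking each system's native repository format.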
Unix philosophy vs. poor semantic locality
[New thread] Segher Boessenkool : > > And the "simple scripts" argument dismisses the fact that those scripts > > are built on top of complex software. It just doesn't hold water IMHO. > > This is the Unix philosophy though! I'm now finishing a book in which I have a lot to say about this, inspired in part by experience with reposurgeon. One of the major concepts I introduce in the book is "semantic locality". This is a property of data representations and structures. A representation has good semantic locality when the context you need to interpret any individual part of it is reliably nearby. A classic example of a representation with good semantic locality is a Unix password file. All the information associated with a username is on one line. It is accordingly easy to parse and extract individual records. Databases have very poor semantic locality. So do version-control systems. You need a lot of context to understand any individual data element, and that context can be arbitrarily far away in terms of retrieval complexity and time. The Unix philosophy of small loosely-coupled tools has few more fervent advocates than me. But I have come to understand that it almost necessarily fails in the presence of data representations with poor semantic locality. This constraint can be inverted and used as a guide to good design: to enable loose coupling, design your representations to have good semantic locality. If the Unix password file were an SQL database, could you grep it? No. You'd have to use an SQL-specific query method rather than a generic utility like grep that is uncoupled from the specifics of the database's schema. The ideal data representation for enabling the Unix ecology of tools is textual, self-describing, and has good semantic locality. Historically, Unix programmers have understood the importance of textuality and self-description. But we've lacked the concept of and a term for semantic locality.
Having that allows one to talk about some things that were hard to even notice before. Here's one: the effort required to parallelize an operation on a data structure is inversely proportional to its semantic locality. If it has good semantic locality, you can carve it into pieces that are easy units of work for parallelization. If it doesn't...you can't. Best case is you'll need locking for shared parts. Worst case is that the referential structure of the representation is so tangled that you can't parallelize at all. Version-control systems rely on data structures with very poor semantic locality. It is therefore predictable that attacking them with small unspecialized tools and scripting is...difficult. It can be done, sometimes, with sufficient cleverness, but the results are too often like making a pig fly by strapping JATO units to it. That is to say: a brief and glorious ascent followed by entirely predictable catastrophe. Having trouble believing me? OK, here's a challenge: rewrite GCC's code-generation stage in awk/sed/m4. The attempt, if you actually made it, would teach you that poor semantic locality forces complexity on the tools that have to deal with it. And that, ladies and gentlemen, is why reposurgeon has to be as large and complex as it is. -- Eric S. Raymond <http://www.catb.org/~esr/>
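The password-file point above lends itself to a few lines of code. This is an editor's illustration (not code from the thread or from reposurgeon): because a passwd-style record carries all of its context on one line, a parser needs nothing but that line, and the file can be grepped or chunked for parallel work with no shared state.

```python
# Illustration of good semantic locality: every field needed to interpret
# a record sits on that record's own line, so one-line-at-a-time parsing
# (and line-granular parallelism) is trivial.

PASSWD = """\
root:x:0:0:root:/root:/bin/bash
esr:x:1000:1000:Eric S. Raymond:/home/esr:/bin/bash
"""

def parse_passwd(text):
    """Each line is a self-contained record: split on ':' and you are done."""
    records = {}
    for line in text.splitlines():
        name, _pw, uid, gid, gecos, home, shell = line.split(":")
        records[name] = {"uid": int(uid), "gid": int(gid),
                         "home": home, "shell": shell}
    return records

users = parse_passwd(PASSWD)
print(users["esr"]["home"])  # /home/esr
```

An SQL table holding the same data would need a schema-aware query tool to answer the same question; the flat text file needs only `split`.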
Re: Proposal for the transition timetable for the move to GIT
Joseph Myers : > Nor do I think reposurgeon (or at least the SVN reader, which is the main > part engaged here) is significantly more complicated than implied by the > task it's performing of translating between the different conceptual > models of SVN and git. I've found it straightforward to produce reduced > testcases for issues found, and fixed several of them myself despite not > actually knowing Go. The issues remaining are generally conceptually > straightforward to understand the issue and how to fix it. Let me note for the record that I found Joseph's ability to find and fix bugs in the reader quite impressive. Maybe not as impressive as it would have been before the recent rewrite. That code used to be a pretty nasty hairball. It's a lot cleaner and easier to understand now. But impedance-matching the two data models is tricky, subtler than it looks, and has rebarbative edge cases. Even given the cleanest possible implementation, troubleshooting it is no mean feat. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Jeff Law : > But it's not that freshly constructed, at least not in my mind. All > the experience ESR has from the python implementation carries to the Go > implementation. Not only do you have reposurgeon, you have me. I wish this mattered less than it does. I have *far* more experience doing big, messy repository moves than anybody else. I try to exteriorize that knowledge into the reposurgeon code and documents as much as I can, but as with other kinds of expertise a lot of it is implicit knowledge that is only elicited by practice and answering questions. On small conversions of clean repositories such implicit expertise doesn't matter too much. You may be able to pull off a once-and-done with the tools, especially if they're my tools and you've read all my stuff on good practice. As an example, the CVS-to-git conversion of groff didn't really need me. Lifts from CVS are normally horrible, but the groff devs were the best I've ever seen at not leaving debris from operator errors in the history. Any of them could have read my docs and done a clean conversion in two hours. Only...there was no way to know that in advance. The odds were heavily against it. Emacs was, and GCC is, the messy opposite case. You guys needed a seasoned "I know these things so you don't have to" expert more than you will probably ever really understand. And, sadly, there aren't any others but me yet. Nobody else has been interested enough in the problem to invest the time. > Where I think we could have done better would have been to get more > concrete detail from ESR about the problems with git-svn. That was > never forthcoming and it's a disappointment. Maybe some of the recent > discussions are in fact related to these issues and I simply missed > that point. I posted this link before: http://esr.ibiblio.org/?p=6778 I can't actually tell you much more than that. Actually, if I understood git-svn's failure modes in enough detail to tell you more I might be less frightened of it.
Mostly what I know is that during several other conversions I have stumbled across trails of metadata damage for which use of git-svn seems to have been to blame. Though, admittedly, I'm not certain of that in any individual case; the ways git-svn screws up are not necessarily distinguishable from the aftereffects of cvs2svn conversion damage, or from normal kinds of operator error. Overall, though, defect rates seemed noticeably higher when git-svn had been used as a front end. I learned to flinch when people wanting me to do a full conversion of an SVN repo admitted git-svn had been deployed, even though I was hard-put to explain why I was flinching. > I do think we've gotten some details about the "scar tissue" from the > cvs->svn transition as well as some of our branch problems. It's my > understanding reposurgeon cleans this up significantly whereas Maxim's > scripts don't touch this stuff IIUC. That's correct. And again, no blame to Maxim for this; he took a conventional approach that does as little analysis as it can get away with, which can be a good tradeoff on smaller, cleaner repositories without a CVS back-history. > There's still > work going on, but I'd consider the outstanding issues nits and well > within the scope of what can reasonably still be changing. Issue list here: https://gitlab.com/esr/reposurgeon/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=GCC Presently 6 items including 2 bugs. One of those bugs may already be fixed; we're waiting on Joseph's current conversion to see. Counting time to do all the RFEs requested, polishing, and final review I think we're looking at another week, maybe a bit less if things go well. You guys could get a final conversion under your Yule tree. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Test GCC conversion with reposurgeon available
Bernd Schmidt : > I vote for including .cvsignore files. Their absence makes diff comparisons > of "git ls-tree" on specific revisions needlessly noisy. A few minutes ago I implemented and pushed a --cvsignores read option for Subversion dumps. That should do what you want. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Joseph Myers : > I expect the next conversion run, started after that > one finishes, to include both parts of Richard's commit message > improvements, as well as an improvement to commit attribution extraction > from ChangeLog files (to include attributions from ChangeLog. > files, not just plain ChangeLog). There is also a known but minor bug in ChangeLog mining at branch roots. I'm working on that and expect to have a fix shortly. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Jeff Law : > So unless there's something Maxim's scripts are getting right that > aren't by reposurgeon, then reposurgeon is the right choice. It is still possible that the scripts could get things right that reposurgeon doesn't. But the reverse question is also valid. Can Maxim's scripts get everything right that reposurgeon does? If anyone wants to audit for that, my test suite is open source. May the best program win! -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Segher Boessenkool : > There is absolutely no reason to trust a system that supposedly was > already very mature, but that required lots of complex modifications, > and even a complete rewrite in a different language, that even has its > own bug tracker, to work without problems (although we all have *seen* > some of its many problems over the last years), and at the same time > bad-mouthing simple scripts that simply work, and have simple problems. Some factual corrections: I didn't port to Go to fix bugs, I ported for better performance. Python is a wonderful language for prototyping a tool like this, but it's too slow and memory-hungry for use at the GCC conversion's scale. Also doesn't parallelize worth a damn. I very carefully *didn't* bad-mouth Maxim's scripts - in fact I have said on-list that I think his approach is on the whole pretty intelligent. To anyone who didn't have some of the experiences I have had, even using git-svn to analyze basic blocks would appear reasonable and I don't actually fault Maxim for it. I *did* bad-mouth git-svn - and I will continue to do so until it no longer troubles the world with botched conversions. Relying on it is, in my subject-matter-expert opinion, unacceptably risky. While I don't blame Maxim for not being aware of this, it remains a serious vulnerability in his pipeline. I don't know how it is on your planet, but here on Earth having a bug tracker - and keeping it reasonably clean - is generally considered a sign of responsible maintainership. In conclusion, I'm happy that you're so concerned about bugs in reposurgeon. I am too. You're welcome to file issues and help us improve our already-extensive test suite by shipping us dumps that produce errors. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Joseph Myers : > * As we're part of the free software community as a whole rather than > something in isolation, choosing to make a general-purpose tool work for > our conversion is somewhat preferable to choosing an ad hoc approach > because it contributes something of value for other repository conversions > by other projects in future. That's not just theory or sentiment. Reposurgeon is the best any-VCS-to-any-VCS converter there is because every time I do a conversion, I learn things, and that knowledge gets incorporated in the code and the documentation around it. Yes, in theory someone else could build a tool as good that incorporates as much domain knowledge. So far, nobody has tried. It's unlikely anyone will, at this point, when they can join my dev team and get the results they want with much less effort by improving reposurgeon or one of its auxiliary tools. Every time that happens, everybody - into the indefinite future - wins. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Jeff Law : > > It may not be my place to say, but...I think the stakes are pretty > > high here. If I were a GCC developer, I think I'd want the best > > possible conversion even if that takes a little longer. > Well, I'm not sure that's entirely true. OK, that's a policy choice the GCC project is going to have to make. I'm just the mechanic here. Joseph Myers has made his choice. He has said repeatedly that he wants to follow through with the reposurgeon conversion, and he's putting his effort behind that by writing tests and even contributing code to reposurgeon. We'll get this done faster if nobody is joggling his elbow. Or mine. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Joseph Myers : > When we're talking about something to be used > for the next 20 years we should make sure to get it right. Segher and others should note that I'm not in the habit of sinking most of a year of my time into problems that I don't think are extremely important. This conversion *is* that important. > conversions with an ad hoc script need much more thorough, trickier > validation because you don't benefit from knowing the tool has worked for > other conversions). Nor, as far as I am aware, do the scripts have anything resembling reposurgeon's test suite. Segher Boessenkool: > > If the reposurgeon conversion is not ready now, then it is too late > > to be selected. Maxim's conversion pipeline isn't ready either -- there are known bugs with its result. Does that mean it's too late to select Maxim's conversion? If so, what do you propose be done? Please stop bellyaching and pitch in. Whether it's by fixing up Maxim's conversion, helping improve the reposurgeon one, or writing a conversion method of your own - I don't much care and it's not my job to tell you what to do, anyway. Any of those choices might be helpful; sniping from the sidelines is not. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Segher Boessenkool : > > Do people really want to keep tweaking the conversions and postpone the > > git switchover? > > No. It may not be my place to say, but...I think the stakes are pretty high here. If I were a GCC developer, I think I'd want the best possible conversion even if that takes a little longer. jsm28, rearnsha, and my reposurgeon crew are pretty close to a final deliverable now. We know what the remaining issues are, they're not major, and we have a strategy for fixing them. Have a little patience, please. Better yet, come over to #reposurgeon on freenode and help out. Anyone who can run tests on a machine with >128GB RAM would be especially welcome. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Jonathan Wakely : > That's good news and I'm relieved to hear it. Thanks. Defect resolution has sped up noticeably since jsm28 and rearnsha showed up on #reposurgeon and started working directly with my crew. Relax. As Joseph reported, we've got this well in hand now. We might even have a final conversion on the original 16 Dec deadline, though I'm personally guessing it will take a bit longer than that. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Jonathan Wakely : > My concern is that there is no conversion done using reposurgeon that > *can* be used to do correctness checks. We can in fact verify revisions of a GCC conversion in place using repotool compare. Joseph Myers has been using this with reposurgeon's readlimit to run tests. Unfortunately, on a repository this large, it's not practical to run a verification on every single revision. The blocker is the slowness of svn checkout. In practice, you have to sample key revisions, with particular attention to those at and just after known metadata defects. The conversion crew - which now includes Joseph Myers and Richard Earnshaw, in addition to my co-developers Daniel Brooks and Julien Rivaud - is diligently testing as it refines the last bits of the conversion. I believe everybody on the crew is now satisfied that we're converging on a good result. It helps that we now have a detailed characterization of the pathological trunk deletion at r184996; most of the conversion problems radiated from that. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Bernd Schmidt : > On 12/9/19 7:19 PM, Joseph Myers wrote: > > > > For any conversion we're clearly going to need to run various validation > > (comparing properties of the converted repository, such as contents at > > branch tips, with expected values of those properties based on the SVN > > repository) and fix issues shown up by that validation. reposurgeon has > > its own tools for such validation; I also intend to write some validation > > scripts myself. > > Would it be feasible to require that both conversions produce the same > output repository to some degree? Can we just look at release tags and > require that they have the same hash in both conversions, or are there good > reasons why the two would produce different outputs? There are a couple of areas that could produce divergences. One is the part of the history before SVN was adopted. There's a lot of weird junk back there, artifacts from the cvs2svn conversion, that can produce issues like fundamental uncertainty about where a child branch should actually be rooted on its parent. Reposurgeon makes choices that are a priori reasonable in cases of doubt, but there are edge cases where a different conversion pipeline could make different ones. Another is how to translate tags. I don't know what Maxim's scripts do, but under reposurgeon a copy commit can have one of two dispositions: (1) Become a lightweight tag (git reference) if the tag comment looks like it was autogenerated and carries no real information. (2) Become a git annotated tag if we want to preserve the tag metadata (comment, date stamp) There's room for a certain amount of artistic license here. Most conversions have few enough disputable cases that the differences between renderings can be reviewed by eyeball. I'm not going to bet that will be true of this one. At the scale of this conversion, any form of comparative auditing is pretty hopeless. You get your assurance, if you get it, from believing the correctness of the conversion tool.
Which is a major reason that reposurgeon has a *large* test suite. 98 general operations tests, 55 Subversion test dumps including a rogue's gallery of metadata perversions gathered from previous conversions, and a cloud of surrounding auxiliary checks. -- Eric S. Raymond <http://www.catb.org/~esr/>
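The two tag dispositions described above can be sketched in a few lines. This is an editor's sketch, not reposurgeon's actual heuristic; the autogenerated-comment patterns here (the stock cvs2svn message and an empty comment) are assumptions chosen for illustration.

```python
import re

# Hypothetical classifier for the two dispositions: a copy commit whose
# comment carries no real information becomes a lightweight tag; one with
# metadata worth preserving becomes an annotated tag.
AUTOGEN = re.compile(r"This commit was manufactured by cvs2svn|^\s*$")

def tag_disposition(comment):
    """Return 'lightweight' for content-free comments, else 'annotated'."""
    if AUTOGEN.search(comment or ""):
        return "lightweight"
    return "annotated"

print(tag_disposition(""))                           # lightweight
print(tag_disposition("Tagging the 10.1 release."))  # annotated
```

The real decision in a conversion recipe is a policy call, which is exactly the "artistic license" the email points at.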
Re: Proposal for the transition timetable for the move to GIT
Joseph Myers : > I think we should fix whatever the remaining relevant bugs are in > reposurgeon and do the conversion with reposurgeon being used to read and > convert the SVN history and do any desired surgical operations on it. On behalf of the reposurgeon crew - Julien Rivaud, Daniel Brooks, and myself - we thank you for that expression of confidence. We'll do our damnedest to deliver rapidly. We welcome oversight and discussion at #reposurgeon on freenode, because we're just the mechanics. You guys have to make the policy decisions. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Richard Biener : > To me, looking from the outside, the talks about reposurgeon doing damage and > a rewrite (in the last minute) would fix it doesn't make a trustworthy > appearance either ;) *shrug* Hard problems are hard. Every time I do a conversion that is at a record size I have to rebuild parts of the analyzer, because the problem domain is seriously gnarly. I'm having to rebuild more than usual this time because the GCC repo is a monster that stresses the analyzer in particularly unusual ways. Reposurgeon has been used for several major conversions, including groff and Emacs. I don't mean to be nasty to Maxim, but I have not yet seen *anybody* who thought they could get the job done with ad-hoc scripts turn out to be correct. Unfortunately, the costs of failure are often well-hidden problems in the converted history that people trip over months and years later. Experience matters at this. So does staying away from tools like git-svn that are known to be bad. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Proposal for the transition timetable for the move to GIT
Maxim Kuvyrkov : > The general conversion workflow is (this really is a poor-man's translator of > one DAG into another): > > 1. Parse SVN history of entire SVN root (svn log -qv file:///svnrepo/) and > build a list of branch points. > 2. From the branch points build a DAG of "basic blocks" of revision history. > Each basic block is a consecutive set of commits where only the last commit > can be a branchpoint. > 3. Walk the DAG and ... > 4. ... use git-svn to individually convert these basic blocks. > 4a. Optionally, post-process git result of basic block conversion using "git > filter-branch" and similar tools. > > Git-svn is used in a limited role, and it does its job very well in this role. Your approach sounds pretty reasonable except for that part. I don't trust git-svn at *all* - I've collided with it too often during past conversions. It has a nasty habit of leaving damage in places that are difficult to audit. I agree that you've made a best possible effort to avoid being bitten by using it only for basic blocks. That was clever and the right thing to do, and I *still* don't trust it. -- Eric S. Raymond <http://www.catb.org/~esr/>
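Step 2 of the workflow quoted above - carving a linear revision sequence into "basic blocks" where only the last commit of each block may be a branch point - can be sketched as follows. This is an editor's reconstruction of the idea, not Maxim's actual script.

```python
# Split a revision sequence into "basic blocks": consecutive runs of
# commits in which only the final commit may be a branch point.

def basic_blocks(revisions, branch_points):
    """revisions: ordered revision numbers; branch_points: set of revisions
    that some branch or tag was copied from. A branch point closes a block."""
    blocks, current = [], []
    for rev in revisions:
        current.append(rev)
        if rev in branch_points:
            blocks.append(current)
            current = []
    if current:  # trailing run with no branch point at its end
        blocks.append(current)
    return blocks

# Revisions 3 and 5 are branch points, so they terminate their blocks.
print(basic_blocks([1, 2, 3, 4, 5, 6], {3, 5}))
# [[1, 2, 3], [4, 5], [6]]
```

Each such block is then handed to git-svn as an independent unit of conversion, which is the "limited role" the email argues about.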
Re: Commit messages and the move to git
Joseph Myers : > On Thu, 5 Dec 2019, Joseph Myers wrote: > > > On Thu, 5 Dec 2019, Eric S. Raymond wrote: > > > > > Joseph Myers : > > > > I just tried a leading-segment load up to r14877, but it didn't > > > > reproduce > > > > the problems I see with r14877 in a full repository conversion - it > > > > seems > > > > the combination with something later in the history may be necessary to > > > > reproduce the issue. > > > > > > Great :-( > > > > > > Well, there's a bisection-like strategy for finding the minimum > > > leading segment that produces misbehavior. My conversion crew will > > > apply it as hard as we need to to get the job done. > > > > I've now provided a reduced synthetic test (7 commits) for the issue > > observed at r14877, in issue 172. It wouldn't surprise me if a fix for > > this synthetic test fixes both issues 171 and 172 (and it wouldn't > > surprise me if it's fixed in the new SVN dump reader). > > And given the synthetic test I've added to issue 178, I suspect the same > problem is behind at least some of the missing file/directory deletions as > well. Likely, yes. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Commit messages and the move to git
Joseph Myers : > On Thu, 5 Dec 2019, Eric S. Raymond wrote: > > > Joseph Myers : > > > I just tried a leading-segment load up to r14877, but it didn't reproduce > > > the problems I see with r14877 in a full repository conversion - it seems > > > the combination with something later in the history may be necessary to > > > reproduce the issue. > > > > Great :-( > > > > Well, there's a bisection-like strategy for finding the minimum > > leading segment that produces misbehavior. My conversion crew will > > apply it as hard as we need to to get the job done. > > I've now provided a reduced synthetic test (7 commits) for the issue > observed at r14877, in issue 172. It wouldn't surprise me if a fix for > this synthetic test fixes both issues 171 and 172 (and it wouldn't > surprise me if it's fixed in the new SVN dump reader). If not, I think it soon will be. I expect that little synthetic test to help a lot. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Commit messages and the move to git
Joseph Myers : > I just tried a leading-segment load up to r14877, but it didn't reproduce > the problems I see with r14877 in a full repository conversion - it seems > the combination with something later in the history may be necessary to > reproduce the issue. Great :-( Well, there's a bisection-like strategy for finding the minimum leading segment that produces misbehavior. My conversion crew will apply it as hard as we need to to get the job done. -- Eric S. Raymond <http://www.catb.org/~esr/>
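The bisection-like strategy mentioned above amounts to a binary search over the length of the leading segment. A sketch (editor's illustration; `reproduces(n)` is a hypothetical stand-in for running a read-limited conversion of the first n revisions and checking for the defect, and the search assumes the defect is monotone - once present at some prefix length, present at all longer ones, which this very thread shows is not always true):

```python
# Binary search for the smallest leading segment that reproduces a defect,
# assuming the defect behaves monotonically in prefix length.

def smallest_failing_prefix(max_rev, reproduces):
    """Return the minimal n in [1, max_rev] with reproduces(n) true,
    given that reproduces is false below some threshold and true above it."""
    lo, hi = 1, max_rev
    while lo < hi:
        mid = (lo + hi) // 2
        if reproduces(mid):
            hi = mid        # defect present: answer is at mid or earlier
        else:
            lo = mid + 1    # defect absent: answer is strictly later
    return lo

# Toy stand-in: pretend the defect first appears at revision 14877.
print(smallest_failing_prefix(100000, lambda n: n >= 14877))  # 14877
```

When the defect is not monotone - as with r14877, which needed later history to trigger - the search degrades to the slower "grow the segment until it fails" scan.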
Re: Commit messages and the move to git
Joseph Myers : > I think we currently have the following reposurgeon issues open for cases > where the present code results in incorrect tree contents and we're hoping > the new code will fix that (or make it much easier to find and fix the > bugs). These are the issues that are most critical for being able to use > reposurgeon for the conversion. > > https://gitlab.com/esr/reposurgeon/issues/167 > https://gitlab.com/esr/reposurgeon/issues/171 > https://gitlab.com/esr/reposurgeon/issues/172 > https://gitlab.com/esr/reposurgeon/issues/178 I'm aware these are the real blockers. I was much more worried about the conversion before we figured out that most of the remaining content mismatches seem to radiate out from something weird that happened at r14877. That's early enough that a leading-segment load including it doesn't take forever. Which means it's practical to do detailed forensics on the defect even if you don't have handy an EC12 instance with ridiculo-humongous amounts of RAM. Now I'm pretty certain we can finish this. A matter of when, not if. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Commit messages and the move to git
Richard Earnshaw (lists) : > Ok, this is one to keep an eye on. There are a number of anomalous commits > at present, which Eric is working on with a new approach to replaying the > SVN data into reposurgeon. Once that is done we're hoping that this sort of > problem will go away. Best case is it just goes away. Worst case is we'll need to figure out what surgical commands need to be patched into the recipe to deal with the remaining anomalies. I suspect the latter, in particular that we're going to end up needing to do something manually around r14877. It might be a trivial tweak to the splice command I commented out. -- Eric S. Raymond <http://www.catb.org/~esr/>
GCC conversion work in progress
Those of you with a direct interest in the conversion might want to watch #reposurgeon on freenode. This is where Daniel Brooks, Julien Rivaud and I are working on it. Here's where the code lives: reposurgeon: https://gitlab.com/esr/reposurgeon The conversion recipe: https://gitlab.com/esr/gcc-conversion In the next few days I expect the remaining problems to move from mechanism to policy choices. At that point, broader review of the recipe and the conversion progress starts to become desirable. -- Eric S. Raymond <http://www.catb.org/~esr/> "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -- Benjamin Franklin, Historical Review of Pennsylvania, 1759.
Re: Branch and tag deletions
Joseph Myers : > The avoidance of '.' in branch and tag names is, I'm pretty sure, a legacy > of CVS restrictions on valid names for branches and tags. Those > restrictions are not relevant to git or SVN; if picking any new convention > it seems appropriate for the tag for GCC 10.1 to say "10.1" somewhere in > its name rather than "10_1". That is correct. I recommend mapping tags from using "_" to using "."; they're just plain more readable that way. I have done this in previous conversions. -- Eric S. Raymond <http://www.catb.org/~esr/>
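The "_" to "." mapping recommended above is a one-line rename rule. A minimal sketch (editor's illustration; the tag names are made up, and restricting the rewrite to underscores between digits is my assumption, so that non-version underscores survive):

```python
import re

# Rewrite CVS-legacy underscore-separated version digits as dotted
# versions, leaving underscores that are not between digits alone.

def modernize_tag(name):
    return re.sub(r"(?<=\d)_(?=\d)", ".", name)

print(modernize_tag("gcc-10_1-release"))  # gcc-10.1-release
```

Applied across a converted repository's tag namespace, this gives tags like "10.1" the readability the email argues for without touching unrelated names.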
Re: Branch and tag deletions
Joseph Myers : > Eric, can Richard and I get direct write access to the gcc-conversion > repository? Waiting for merge requests to be merged is getting in the way > of fast iteration on incremental improvements to the conversion machinery, > it should be possible to get multiple incremental improvements a day into > the repository. Sure. I only found one "Richard Earnshaw" and one "Joseph Myers" on Gitlab, so I have given both Developer access. I changed the branch protection rules so Developers can push. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Commit messages and the move to git
Segher Boessenkool : > Do we postpone the transition another few months because we have to check > all commits for mistakes the conversion tool made because it tried to be > "smart"? > > Or will we rush in these changes, unnecessary errors and all, because > people have invested time in doing this? > > It is not a decision that can be made late. It is a *design decision*. Bear in mind that the tool is continuing to improve. There are now three people working on it effectively full-time in response to this conversion. We will fix the attribution bug. Compared to dealing with dumpfile malformations that sort of thing is a pretty easy problem once we have a way to reproduce it. At this point my only serious worry is what kinds of contortions we'll need to go through to get around the effects of the GCC/EGCS merge. I'll be concentrating on that once I finish debugging the analyzer rewrite. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Branch and tag deletions
Joseph Myers : > I did a comparison of git and SVN checkouts to look at missing file > problems. I've now filed reposurgeon issues 171 and 172 for the problems > I noted. Issue 171 relates to handling of trunk deletion / recreation. > Issue 172 relates to the first point where missing file problems appear > (unless some appeared and then disappeared in the history before then). > As it's at a very early point in the GCC history (r14877), hopefully it > shouldn't be too hard to track down if your rewrite doesn't fix it, since > it shouldn't require loading much of the history to reproduce. (Roughly, > it's at the start of EGCS, i.e. around the point where we spliced together > the gcc2 and EGCS CVS histories when converting from CVS to SVN. So some > bits of the history around then may well look weird, but I don't see > anything particularly odd about that particular SVN commit.) Thank you, that is very valuable information to have. There is probably some odd artifact at the merge point that confuses my old code. If we are fortunate, the new code won't be confused. The old code was brittle and had failures in weird places because I started on branch analysis and handling of mixed-branch commits too early. The new code essentially replays the dump operations into commits without trying to do branch analysis or mixed-branch resolution, then does those latter things in separate passes. We'll know in a day or two, I think. The rewrite is done; I'm troubleshooting some problems that I *think* are minor but which are blocking merging to HEAD. Once I get the new analyzer passing regressions I'll do a read-limited conversion up to r14900 and see what's up. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Branch and tag deletions
Joseph Myers : > One more observation on that: in my last test conversion, deleting the > emptycommit-* tags took over 7 hours (i.e. the bulk of the time for the > conversion was spent just deleting those tags). Deleting tags matching > /-root$/ took about half an hour. So I think there is a performance issue > somewhere with (some cases of) tag deletion by regexp, at least when the > regexp matches a large number of tags (but some other bulk deletions seem > to run much quicker per tag). Taking a few seconds per tag is fine for an > individual deletion, but a problem when you want to delete 4070 tags at > once. File that as an issue, please. Go has very good profiling tools; finding the hotspot(s) in situations like this is easy, and thus we should be able to fix this quickly when it reaches the top of the priority list. -- Eric S. Raymond <http://www.catb.org/~esr/>
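The performance symptom described above - a few seconds per tag being fine for one deletion but ruinous for 4070 - is the classic shape of per-item work that rescans shared state, making bulk deletion quadratic. A sketch of the linear alternative (editor's illustration of the general pattern, not reposurgeon's internals): collect all matches and rebuild the tag list in one pass.

```python
import re

# One-pass bulk deletion: O(number of tags) regardless of how many match,
# instead of one full scan-and-rebuild per deleted tag.

def delete_tags_one_pass(tags, pattern):
    """Keep only the tags that do not match the deletion regexp."""
    rx = re.compile(pattern)
    return [t for t in tags if not rx.search(t)]

tags = ["emptycommit-1", "release-10.1", "emptycommit-2", "gcc-root"]
print(delete_tags_one_pass(tags, r"^emptycommit-"))
# ['release-10.1', 'gcc-root']
```

Whether that is the actual hotspot here is exactly what the email proposes Go's profiler should decide.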
Re: Branch and tag deletions
Joseph Myers : > > I'm more worried about missing files. I saw a bunch of those on my > > last test. This could be spurious - the elaborate set of branch > > mappings you specified confuses my validation test, because there is > > no longer a 1-1 correspondence between Subversion and git branches. > > I'm hoping any such missing file problems come from bugs in the old SVN > dump reader with complicated commits mixing copies / deletions / > replacements with copies from other locations and that your rewrite will > fix the semantics in such cases. Also possible. The old code was a hairball. The new code is a bunch of relatively simple sequential passes - 10 so far, final version likely to have 12 or 13 - with well-defined preconditions and exit contracts. If nothing else this is going to make troubleshooting any remaining defects much easier. > All the current gcc-conversion merge requests, both mine and Richard's, > should now be set to allow rebasing. They were, and are all merged now, except for one that Richard just landed. -- http://www.catb.org/~esr/ Eric S. Raymond
Re: Branch and tag deletions
Joseph Myers : > My current test conversion run is testing two changes: deleting > emptycommit tags, and using --user-ignores to prefer the .gitignore file > in SVN over one auto-generated from svn:ignore properties. For the next > one after that I'll try eliminating all branch/tag removals that shouldn't > be doing anything, based on the current sets of branches and tags in SVN, > and report bugs if I see anything appearing in the converted repository > that shouldn't be. I'm more worried about missing files. I saw a bunch of those on my last test. This could be spurious - the elaborate set of branch mappings you specified confuses my validation test, because there is no longer a 1-1 correspondence between Subversion and git branches. The next test I run I'm going to comment out your branch mappings. If I get a validated conversion out of that I think it's all over but the cleanup and policy tinkering. -- http://www.catb.org/~esr/ Eric S. Raymond
Re: Branch and tag deletions
Joseph Myers : > On Wed, 27 Nov 2019, Maxim Kuvyrkov wrote: > > > IMO, we should aim to convert complete SVN history frozen at a specific > > point. So that if we don't want to convert some of the branches or tags > > to git, then we should delete them from SVN repository before > > conversion. > > Sure, we could do that. Eric, can you confirm that, with current > reposurgeon, if a branch or tag was deleted in SVN and does not appear in > the final revision of /branches or /tags, it should not appear in the > resulting converted repository, so that any cases where reposurgeon fails > to reflect such a deletion-in-SVN should be reported as a reposurgeon bug? Confirmed. The ontological mismatch between the Subversion and Git data models actually *forces* us to pick a preferred view and discard tags and branches that are not visible from that view. For obvious reasons reposurgeon chooses the view backwards from the end of the history, so it will be the most recent incarnation of each tag and branch that you see. There is an alternative, the --nobranch conversion. This would preserve the entire historical structure, including deleted tags and branches, but the cost is that the conversion doesn't have git tags and branches itself - it's just one big directory history on /refs/heads/master. While this is useful for forensics, it is not a conversion you'd want to use for production. > And that the same applies where a branch or tag was renamed - that only > the new name, not the old one, should appear in the converted repository? Confirmed, see above. > There are quite a few deletions in gcc.lift for tags that do not actually > appear in /tags in the current SVN repository, but I'm not sure how many > are actually relevant with current reposurgeon. Many will not be. The recipe file predates the point at which I came to fully understand the ramifications of tag delete/recreate sequences. 
I haven't cleaned it up yet because chasing down the last few bugs in the analyzer is more important. I'll leave it to you guys to discuss the policy issues. In general I think you can safely throw out branchpoint tags and emptycommits; reposurgeon only preserves those on the theoretical chance that there might be something interesting in the change comments. -- http://www.catb.org/~esr/ Eric S. Raymond
Re: Branch and tag deletions
Joseph Myers : > Thanks. We've accumulated a lot of merge requests on the gcc-conversion > repository, once those are merged I'll test a further change to remove > those tags. I just checked; a rebase button appeared on your MRs and I merged all three, but no rebase option appears on Richard Earnshaw's requests. The GitLab interface seems fickle and arbitrary at times. -- http://www.catb.org/~esr/ Eric S. Raymond
Re: Branch and tag deletions
Joseph Myers : > A further note: in a previous run of the conversion I didn't see any > emptycommit-* tags. In my most recent conversion run, I see 4070 such > tags. How do I tell reposurgeon never to create such tags? Or should I > add a tag deletion command for them in gcc.lift, once tag deletion is > working reliably? That's what tag deletion by regexp is for. One of reposurgeon's design rules is "never add a special-purpose switch or flag when an application of the selection-set minilanguage will do". -- http://www.catb.org/~esr/ Eric S. Raymond
Re: Branch and tag deletions
Joseph Myers : > I'm looking at the sets of branches and tags resulting from a GCC > repository conversion with reposurgeon. > > 1. I see 227 branches (and one tag) with names like > cxx0x-concepts-branch-deleted-r131428-1 (this is out of 780 branches in > total in a conversion of GCC history as of a few days ago). Can we tell > reposurgeon not to create such branches (and tags)? I can't simply do > "branch /-deleted-r/ delete" because that command doesn't take a regular > expression. Those dead branches were never supposed to be visible in the final conversion. They arise when a tag is created, then deleted, then recreated under the same name. The dumpfile operations for the old tag can't simply be ignored, as part of its content could get copied forward from before the delete to a branch that remains live. So I recolor them, then have logic to skip generating commits and tags from them. You're seeing some leak through those guards, which is a bug. I'm using a different and much simpler strategy in the analyzer rewrite; this bug should be squashed when it lands. > 2. gcc.lift has a series of "tag delete" commands, generally > deleting tags that aren't official GCC releases or prereleases (many of > which were artifacts of how creating such tags was necessary to track > merges in the CVS and older SVN era). But some such commands are > mysteriously failing to work. For example I see > > tag /ZLIB_/ delete > reposurgeon: no tags matching /ZLIB_/ > > but there are tags ZLIB_1_1_3, ZLIB_1_1_4, ZLIB_1_2_1, ZLIB_1_2_3 left > after the conversion. This isn't just an issue with regular expressions; > I also see e.g. > > tag apple/ppc-import-20040330 delete > reposurgeon: no tags matching apple/ppc-import-20040330 > > and again that tag exists after the conversion. I knew there was a problem with those, but I have not diagnosed it yet. I know generally where it has to be and think it will be relatively easy to clean up once I've dealt with the more pressing issues. 
Please file issues about these bugs so I can track them. On the first one, it would be helpful if you could list some tags that those match expressions fail to pick up, from as early in the history as possible. Shortening the leading segment I need to load speeds up my test cycle significantly. -- http://www.catb.org/~esr/ Eric S. Raymond
Re: Split commit naming
Joseph Myers : > My question is: is it a stable interface to reposurgeon that the portions > of such a split commit will always be numbered in lexicographical order by > branch name (or some other such well-defined stable ordering), so I can > write <80870.2> in gcc.lift and know that some reposurgeon change won't > accidentally make that refer to the portion of the commit on > gcc-3_3-branch instead? Your timing is fortuitous, as I just finished rewriting the code for mixed-commit handling and it is fresh in my mind. The old behavior was indeed that cliques were lexicographically ordered by branch. This was not documented. The master branch still uses the old code. Current behavior on my development branch is that fileops are not sorted before splitting; you get whatever order they had in the dump. I will change this so they are sorted by pathname and document that. And...it's done. You won't see the new code for a few days, until I finish the analyzer rewrite. The old code had become overgrown and brittle; I spent a week trying to find a strategy to get around a particular pathological-tag defect only to discover that I could no longer modify the analyzer without cascade bugs. I'll describe the problem, since I think the GCC repository has some of these and they may explain some of your earlier bug reports. Suppose you create a tag, then later on modify the tag copy by copying to one of its subdirectories. When translating to git you want to attach the tag reference to the revision the *second* copy came from. Simple in concept but the obvious implementation of root-finding prefers the earliest copy. When it proved impossible to change this without producing a cascade of breakage, I faced up to the necessity of a scrap-and-rebuild. It's not done yet, but it's pretty well advanced. -- http://www.catb.org/~esr/ Eric S. Raymond
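The ordering change described above - sort a commit's fileops by pathname before splitting, and document that as the stable interface - can be sketched as follows. `FileOp` and `sortFileops` are illustrative names, not reposurgeon's actual types.

```go
package main

import (
	"fmt"
	"sort"
)

// FileOp is a hypothetical stand-in for reposurgeon's fileop type;
// only the Path field matters for split-commit ordering.
type FileOp struct {
	Op   string // "M", "D", "R", ...
	Path string
}

// sortFileops imposes the documented, stable order: ascending by
// pathname, so split-commit numbering like <80870.2> no longer depends
// on the order operations happened to have in the SVN dump.
func sortFileops(ops []FileOp) {
	sort.SliceStable(ops, func(i, j int) bool {
		return ops[i].Path < ops[j].Path
	})
}

func main() {
	ops := []FileOp{
		{"M", "branches/gcc-3_3-branch/ChangeLog"},
		{"M", "branches/gcc-3_1-branch/ChangeLog"},
	}
	sortFileops(ops)
	for _, op := range ops {
		fmt.Println(op.Path)
	}
}
```

With this in place, the gcc-3_1-branch portion of a mixed commit always sorts before the gcc-3_3-branch portion, whatever the dump order was.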
Re: Commit messages and the move to git
Richard Earnshaw (lists) : > > But then I get errors: > > > > *** Unknown syntax: relax > > > > Change that to > > set relax Oops. He's right. It used to be a command, but that changed recently as part of a redesign of log levels and options. -- http://www.catb.org/~esr/ Eric S. Raymond
Re: Commit messages and the move to git
Joseph Myers : > I see the changelogs issue is fixed (I can run a conversion past that > point on a system with 128GB memory, with mergeinfo processing being very > slow as described by Richard). But then I get errors: > > *** Unknown syntax: relax Missing "relax" command probably means your reposurgeon is very old. What does "version" say? -- http://www.catb.org/~esr/";>Eric S. Raymond
Re: Commit messages and the move to git
Richard Earnshaw (lists) : > Nope, that was from running the go version from yesterday. This one, to > be precise: 1ab3c514c6cd5e1a5d6b68a8224df299751ca637 > > This pass used to be very fast a couple of weeks back, but something > went in recently that's caused a major slowdown. > > Oh, and I've been having problems with the ChangeLogs command as well. > It used to run fine on my machine (128G), but now it's started blowing > memory and taking my X server down. That sucks. Those were stretches of code the two guys working with me have been trying to speed up. Looks like that backfired. Please file issues at https://gitlab.com/esr/reposurgeon/issues and include timing reports if you can. -- http://www.catb.org/~esr/ Eric S. Raymond
Re: Commit messages and the move to git
3,278016-278017,278019-278028,278032-278035,278038,278041,278044-278049,278051,278053-278058,278062,278064-278066,278068-278070,278074-278091,278093-278096,278098-278107,278111-278129,278131-278142,278144-278153,278156-278157,278159,278179,278184-278185,278189-278196,278199-278200 > > and in the conversion we get about 35 links back to different revisions > in trunk. > > I don't know if the SHA codes are stable, but in my conversion, done > last night, it comes out at 44b84e63a8b00b9881fbb93d3af1536c2338aa72 > > There's another example at r20 on the same branch, which has even > more links. > > R. File an issue here, please. https://gitlab.com/esr/reposurgeon/issues -- http://www.catb.org/~esr/";>Eric S. Raymond
Re: Commit messages and the move to git
Jason Merrill : > Well, I was thinking of also giving some clue of what the commit was > about. One possibly cut-off line accomplishes that, a simple revision > number not so much. It's conventional under Git to have comments lead with a summary sentence. I think you're going to find that the value of Subversion revision references fades pretty fast after the conversion. That has been my experience with other conversions. -- http://www.catb.org/~esr/";>Eric S. Raymond
Re: Commit messages and the move to git
Richard Earnshaw (lists) : > I was looking at the reposurgeon code last night, and I think I can see what > the problem *might* be, but I haven't had time to produce a testcase. > > Some of our commits have mergeinfo that looks a bit like this: > > 202022-202023,202026,202028-202029,202036,202039-202041,202043-202044,202048-202049,202051-202056,202058-202061,202064-202065,202068-202071,202077,202079-202082,202084,202086-202088,202092-202104,202106-202113,202115-202119,202121,202124-202134,202139,202142-202146,202148-202150,202153-202154,202158-202159,202163-202165,202168,202172,202174,202179-202180,202184-202192,202195,202197,202202-202208,202225-202230,202232-202233,202237-202239,202242,202244-202245,202247,202250-202251,202258-202264,202266,202269,202271-202275,202279,202281-202282,202284,202286,202289-202292,202296-202299,202301-202302,202305,202309,202311-202323,202327-202335,202337,202339,202343-202346,202350,202352,202356-202357,202359-202360,202363-202371,202373-202374,202377,202379-202382,202384,202389,202391-202395,202398-202407,202409,202411,202416-202418,202421 > > which is a massive long list with a number of holes in it. > > But I suspect the holes are really commits to other branches and that the > above describes a linear chain along one branch. If so, rather than > producing links to each subgroup (and perhaps dropping single non-list > elements), the description can be mapped back to a contiguous sequence of > commits down a branch and thus should really resolve to a single child being > used for the merge source. At present, I think for the above we're seeing a > child reference created for each subrange in that list. I have no doubt you are correct. Detecting such interrupted ranges is going to be... interesting. > Incidentally, the mergeinfo pass on the gcc repo is currently taking about 8 > hours on my machine, that's 80-90% of the entire conversion time. But it > might be related to the above. 
You must be running the old Python code; there was an O(n**2) hotspot in that phase that has since been fixed. Try the Go code from https://gitlab.com/esr/reposurgeon; it is *much* faster. -- http://www.catb.org/~esr/ Eric S. Raymond
Re: Commit messages and the move to git
Joseph Myers : > I think the main thing to make sure of in the conversion regarding that > issue is that cherry-picks do *not* turn into merge commits I confirm that this is how it now works. -- http://www.catb.org/~esr/";>Eric S. Raymond
Re: Commit messages and the move to git
Richard Earnshaw (lists) : > Well a lot of that is a property of the conversion tool. git svn does a > relatively poor job of anything other than straight history (I believe it > just ignores the non-linear information). Yes, git-svn does a *terrible* job on anything other than linear history. That is a major reason I'm busting my hump to get the conversion done. It would be very sad if you guys fell into using that. It does a tolerable job of live gatewaying on simple histories, but read this: http://esr.ibiblio.org/?p=6778 > I don't believe any tool can > recreate information for cherry-picking unless it's recorded in the SVN > meta-data. Eric would be better placed to comment here. You are correct, there is nothing practical that can be done in the absence of svn:mergeinfo and svnmerge-integrated properties. > My own observation is that when the SVN commits have merge meta-data, > reposurgeon will pick this up and create links across to the relevant > branches. It does, however, seem to create far more links than a traditional > git merge would do, especially when a sequence of commits are referenced. I > don't know if that's essentially unfixable, or if it's something Eric > intends to work on; but I've seen some cases where there are dozens of links > back to a simple sequence of svn commits and where, I suspect, a single link > back to the most recent of that sequence would be all that's really wanted. First I have heard of this. The intent of the present mergeinfo handling is that it looks for mergeinfo declarations that are topologically equivalent to branch merges (that is, they merge all revisions on a source branch rather than cherry-picking isolated revisions) and renders those as gitspace merge links. There is no attempt to create links corresponding to Subversion cherry picks, as this does not fit the Git DAG model. I have cases that demonstrate this feature working in my test suite, but they are relatively small and artificial. 
I would not describe my mergeinfo handling as well-tested compared to the rest of the analyzer, and I can thus easily believe your bug report. What I need to troubleshoot this is a test case that is not trivial but of a manageable size - over a couple hundred commits the volume of diagnostics just overwhelms a Mark One Eyeball. Many of my test cases were trimmed to that size by doing stripping and topological reduction on real repositories; I have a tool for this. Do you have a real repository in mind I can start with? The whole gcc history is too huge, but if you were able to tell me that the bug is exhibited within a few thousand commits of origin and point at where, that I could work with. An issue filed on the reposurgeon tracker would be appreciated. -- http://www.catb.org/~esr/";>Eric S. Raymond
Re: Commit messages and the move to git
Richard Earnshaw (lists) : > Which makes me wonder if, given a commit log of the form: > > > 2019-10-30 Richard Biener > > PR tree-optimization/92275 > * tree-vect-loop-manip.c (slpeel_update_phi_nodes_for_loops): > Copy all loop-closed PHIs. > > * gcc.dg/torture/pr92275.c: New testcase. > > Where the first line is a ChangeLog style date and author, we could spot the > PR line below that and hoist it up as a more useful summary (perhaps by > copying it rather than moving it). > > It wouldn't fix all commits, but even just doing this for those that have > PRs would be a help. Speaking from lots of experience with converting old repositories that exhibited similar comment conventions, I would be nervous about trying to do this entirely mechanically. I think the risk of mangling text that is not formatted as you expect - and not noticing that until the friction cost of fixing it has escalated - is rather high. On the other hand, reposurgeon allows a semi-mechanized attack on the problem that I think would work well, because I've done similar things in other conversions. There's a set of commands that allow you to (a) extract comments from a range of commits into a message list that looks like an RFC822 mailbox file, (b) modify those comments, and (c) weave the message list reliably back into the repository. If it were me doing this job, I'd write a reposurgeon command that extracts all the comments containing PR strings into a message box. Then I'd write an Emacs macro that moves to the next message and hoists its PR line. Then I'd walk through the comments applying the macro, keeping an eye on them for cases where the macro doesn't do quite the right thing, and using undo and hand-editing to recover. Human eyes are very good at spotting anomalies in an expected flow of text, and once you've gotten into the rhythm of a task like this it is easily possible to filter approximately a message per second. 
In round numbers, providing the anomaly rate isn't high, that's upwards of 3000 messages per hour. The point is that for this kind of task a human being who understands what he's reading is likely to have a lower rate of mangling errors than a program that doesn't. -- http://www.catb.org/~esr/ Eric S. Raymond
Re: Fixing cvs2svn branchpoints
Joseph Myers : > Which mid-branch deletes? For the ones by accident (e.g. the deletions of > trunk), where the branch was recreated by copying from the pre-deletion > version of the same branch, nuking the deletes is clearly right. For the > ones where a branch was deleted then recreated as a copy not from the > deleted version - essentially, rebasing done in SVN - maybe we need > community discussion of the right approach. (There are two plausible > approaches there - either just discard all the deleted versions that > aren't part of the SVN history of the most recent creation of the branch, > which makes the list of commits in the branch's history in git look > similar to what it looks like in SVN, or treat deletion + recreation in > that case as some kind of merge.) To get content right, reposurgeon has to run through all nodes looking for branches with more than one creation. For each such clique, it has to change all instances but the last so that the branch has a unique nonce name, then run forward and patch all copy references to each branch to use the nonce name. Only the last branch in each clique will be visible (and not renamed) in the git conversion. But the earlier branches can't simply be nuked, as they might be (and typically are) referenced by branch copies done before the final branch in the clique was created. This might sound like it will get the special case of a trunk delete/recreate wrong. But when git imports a stream it does its own branch recoloring based on tip resets and parent-child relationships; we can expect trunk to be (effectively) re-colored back to the root commit. (This whole mess around branch re-creation is something other conversion tools don't even try to get right.) The other case - where you delete a target branch and copy a different source branch over it - is simpler. 
Because branch names in the git conversion are controlled by the SVN repository pathname (root becomes master, branches/foo becomes branch foo, etc), this looks exactly like an ordinary modification of the target branch. Presently, the fact of the copy is not recorded in the DAG. I could express it as a git merge link; that wouldn't be difficult. > > Also please open reposurgeon issues about the svnmerge properties > > As I understand it, support for that has now been implemented. It has, yes. > > and the missing documentation. > > https://gitlab.com/esr/reposurgeon/issues/151 filed - it's a lot more than > just reparent for which documentation appears to have disappeared. A large chunk of the section on surgical commands vanished, probably due to a finger error while I was editing the translation. I have restored it. -- http://www.catb.org/~esr/ Eric S. Raymond
Re: Commit messages and the move to git
Jeff Law : > On 11/4/19 3:29 AM, Richard Earnshaw (lists) wrote: > > With the move to git fairly imminent now it would be nice if we could > > agree on a more git-friendly style of commit messages; and, ideally, > > start using them now so that the converted repository can benefit from > > this. > > > > Some tools, particularly gitk or git log --oneline, can use one-line > > summaries from a commit's log message when listing commits. It would be > > nice if we could start adopting a style that is compatible with this, so > > that in future commits are summarized in a useful way. Unfortunately, > > some of our existing commits show no useful information with tools like > > this. > I'd suggest we sync policy with glibc. They're further along on the > ChangeLog issues. Whatever they do in this space we should follow -- > aren't we going to be using some of their hooks/scripts? Note that my reposurgeon conversion recipe runs gitify on the repository. From the documentation: Attempt to massage comments into a git-friendly form with a blank separator line after a summary line. This code assumes it can insert a blank line if the first line of the comment ends with '.', ',', ':', ';', '?', or '!'. If the separator line is already present, the comment won't be touched. Takes a selection set, defaulting to all commits and tags. -- http://www.catb.org/~esr/ Eric S. Raymond
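The documented gitify heuristic amounts to a few lines. This sketch is not reposurgeon's actual implementation, but it inserts the separator exactly under the stated conditions: first line ends in one of the listed punctuation characters, and no separator is already present.

```go
package main

import (
	"fmt"
	"strings"
)

// gitify inserts a blank separator line after the first line of a
// comment, per the documented heuristic: only when the first line ends
// in '.', ',', ':', ';', '?' or '!', and only when no blank separator
// line is already present.
func gitify(comment string) string {
	parts := strings.SplitN(comment, "\n", 2)
	if len(parts) < 2 || parts[1] == "" || strings.HasPrefix(parts[1], "\n") {
		return comment // one-liner, or separator already present
	}
	first := parts[0]
	if first != "" && strings.ContainsRune(".,:;?!", rune(first[len(first)-1])) {
		return first + "\n\n" + parts[1]
	}
	return comment
}

func main() {
	fmt.Print(gitify("Fix the frobnicator.\nIt was broken by r12345.\n"))
}
```

Note the conservatism: a first line that does not end in terminal punctuation is left alone, since it may simply be the start of a run-on paragraph rather than a summary.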
Re: Fixing cvs2svn branchpoints
Joseph Myers : > And here are corresponding lists of tags where the commit cvs2svn > generated for the tag should be reparented. Make that issue 2, please. Also, open an issue 3 about how you want those mid-branch deletes handled. I agree that the right thing is just to nuke them, but I have a lot of plates in the air right now... Also please open reposurgeon issues about the svnmerge properties and the missing documentation. I might get to the svnmerge thing today, it should be a trivial tweak. The repository comparison is still grinding. It has turned up some content mismatches, fewer than last time, most in trunk/libgo. The reason for the "fewer" is that the Go version has learned how to correctly handle a corner case the Python did not - tag/branch delete followed by a recreation at a different root point. That's why this is commented out in the lift script: # Squash accidental trunk deletion and recreation. # Should no longer be needed due to branch recoloring. #<130803.1>,<138077>,<184996.1> squash I used to have to find defects like that by hand and patch them. Now there's a recoloring phase where branches and tags with multiple creations are handled by renaming all but the last such branch in each clique to a unique nonce name. This makes all the results from branch copies come out right, and none of the nonce names are ever visible in the final conversion. I'll go dive into the defect analysis now. -- http://www.catb.org/~esr/";>Eric S. Raymond
Re: Fixing cvs2svn branchpoints
Joseph Myers : > Here are complete lists of reparentings I think should be done on the > commits that start branches, along with my notes on branches with messy > initial commits but where I don't think any reparenting should be done. > The REPARENT: lines have the meaning I described in > <https://gcc.gnu.org/ml/gcc/2019-10/msg00127.html>. Please leave this as an issue on the gcc-conversion bugtracker. Your timing is interesting. Happens I got my first full conversion with the Go port of reposurgeon earlier today. I'm trying to verify the conversion against the Subversion repository, but a full checkout filled a filesystem on the EC2 instance I'm using. Recovery is underway. I'll do real benchmarks when I'm not staring at a deadline, but the Go port is at least 20x faster than the Python was. That makes the conversion practical, though it turns out the 128GB on my desktop machine isn't enough to support it - hence the EC2 instance. The first full conversion took eight hours. Turns out the single most computationally expensive part of the surgery is data-mining ChangeLog files for commit attributions. Today I threw massive parallelism at the problem, that being something far easier to do in Go than in Python - I think that might cut as much as two hours from the next run. By going to the cloud I've gotten a larger working-set capacity at the cost of some memory-access speed. Didn't want to do that, but your repo is just too damn big for it to be otherwise, unless somebody wants to drop cash on me to double the RAM in the Great Beast. Your pile of requests is tricky but should be doable. You had previously written: >There are also cases where cvs2svn found a good branchpoint, but >represented the branch-creation commit in a superfluously complicated >way, replacing lots of files and subdirectories by copies of different >revisions. Yes, reposurgeon has logic to detect and deal with this automatically. 
The assumption it makes is that the branch should root to the most recent revision that CVS did a copy from. This is simple and seems to give satisfactory results. Which reminds me. I found a bunch of "svnmerge-integrated" properties in the history. Should I treat those as though they were mergeinfo properties and make branch merges from them? -- http://www.catb.org/~esr/ Eric S. Raymond
Go reposurgeon is production ready
Today I retired the original Python version of the reposurgeon code. I plan to spend the next couple of days fixing minor bugs that I was deferring until the Go port was finished. Then I'll dive back into the gcc conversion. Barring an emergency on the NTPsec project, I should be able to concentrate on the conversion until it's done. -- http://www.catb.org/~esr/";>Eric S. Raymond Our society won't be truly free until "None of the Above" is always an option.
Re: Reposurgeon status
Joseph Myers : > On Thu, 26 Sep 2019, Eric S. Raymond wrote: > > > > You might want to update the state of reposurgeon on that page. > > > > I will do so. > > Note that once you've created an account, someone will need to add it to > the EditorGroup page before you can edit. I'm having trouble with basic account creation, actually. It's to all appearances not accepting the password I set up. I have sent a reset request. -- http://www.catb.org/~esr/";>Eric S. Raymond
Re: Reposurgeon status
Jeff Law : > Probably the most important thing to know is the project will make a > decision on Dec 16 to choose the conversion tool. The evaluation is > based on the state of tool's conversion on that date. More details: > > > https://gcc.gnu.org/wiki/GitConversion > > You should consider the dates in there as firm. I think it is extremely likely that I will have a final conversion ready by then. The only known problem that is in any way serious is the x-bit propagation bug, and I may already have fixed that. I'd think I'd have to get blindsided by something much larger to miss that deadline. > You might want to update the state of reposurgeon on that page. I will do so. -- http://www.catb.org/~esr/";>Eric S. Raymond