Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Julien "FrnchFrgg" Rivaud

On 30/12/2019 at 01:18, Joseph Myers wrote:

> Do "git log egcs_1_1_2_prerelease_2" in gcc-reparent, for example.  The
> history ends up containing two different versions of SVN r5 and of many
> other commits.


When trying to migrate Blender from svn to git, we actually tried 
git-svn first, and it produced that kind of strangeness. Sometimes, 
something it didn't like in a commit made it duplicate, or even copy 
several times over, the whole history predating that commit, with slight 
differences (which explain the differing sha1 and thus the multiple versions).


That's actually the reason I got involved with reposurgeon in the first 
place, trying to make the then Python version able to cope with the 
Blender repository with less than 64GB of RAM.


I thought that working around git-svn to only feed it linear commits 
would sidestep that bug, but it looks like it still can be triggered.


(At the time the bug was so common that we ended up with maybe 20 or 30 
copies of the first 1500 commits in the repository, and of course with the 
speed of git-svn, doing the same work 30 times is horrendous.)


Julien



Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Julien "FrnchFrgg" Rivaud

On 29/12/2019 at 18:30, Ian Lance Taylor via gcc wrote:

> On Sun, Dec 29, 2019 at 5:49 AM Julien '_FrnchFrgg_' RIVAUD wrote:
>
>> Which brings me to something I find strange in your policy: to me,
>> merges from trunk to branches should be rare if not nonexistent. And you
>> are deciding to banish merges the other way around.
>
> Out of curiosity, why do you say that merges from trunk to branches
> should be rare?  It seems to me that any long-lived development branch
> will require merges from trunk to the branch.  Are you saying that
> those kinds of branches are rare?


Because in most cases, the development branch should be periodically 
rebased on top of master, not use a merge from master to the branch.


Maybe that's easier to do while developing, but in the end a real 
rebase should be made (dropping the merge commits), because what you 
will send to the ML for review should be a logical stream of changes and 
"update" merge commits are not that.


Thankfully, if you have git rerere enabled, most conflict resolutions 
you did while merging will be reused when rebasing so it should not be 
too painful.




> In GCC we have historically had a pattern in which people use
> long-lived parallel branches that maintain specific patches on top of
> GCC trunk.  These branches provide a simple way to get a variant of
> GCC with specific patches of interest to some people.  These branches
> too require regular merges from trunk.


In that case, sure. But I expect these branches to never be merged in 
trunk. So the real rule would be « branches that merge from trunk should 
not be merged into trunk » (rather than « forbid merges into trunk » or 
even « pretend nobody ever merged anything into trunk, these aren't the 
droids you are looking for »)




> Ian






Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Joseph Myers
On Sun, 29 Dec 2019, Richard Earnshaw (lists) wrote:

> gcc-reparent is better, but many (most?) of the release tags are shown
> as merge commits with a fake parent back to the gcc-3 branch point,
> which is certainly not what happened when the tagging was done at that
> time.

And looking at the history of gcc-reparent as part of preparing to compare 
authors to identify commits needing manual attention to author 
identification, I see other oddities.

Do "git log egcs_1_1_2_prerelease_2" in gcc-reparent, for example.  The 
history ends up containing two different versions of SVN r5 and of many 
other commits.  One of them looks normal:

commit c01d37f1690de9ea83b341780fad458f506b80c7
Author: Charles Hannum 
Date:   Mon Nov 27 21:22:14 1989 +

entered into RCS


git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 
138bc75d-0d04-0410-961f-82ee72b054a4

The other looks strange:

commit 09c5a0fa5ed76e58cc67f3d72bf397277fdd
Author: Charles Hannum 
Date:   Mon Nov 27 21:22:14 1989 +

entered into RCS


git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 
138bc75d-0d04-0410-961f-82ee72b054a4
Updated tag 'egcs_1_1_2_prerelease_2@279090' (was bc80be265a0)
Updated tag 'egcs_1_1_2_prerelease_2@279154' (was f7cee65b219)
Updated tag 'egcs_1_1_2_prerelease_2@279213' (was 74dcba9b414)
Updated tag 'egcs_1_1_2_prerelease_2@279270' (was 7e63c9b344d)
Updated tag 'egcs_1_1_2_prerelease_2@279336' (was 47894371e3c)
Updated tag 'egcs_1_1_2_prerelease_2@279392' (was 3c3f6932316)
Updated tag 'egcs_1_1_2_prerelease_2@279402' (was 29d9998f523b)

(and in fact it seems there are *four* commits corresponding to SVN r5 and 
reachable from refs in the gcc-reparent repository).  So we don't just 
have stray merge commits, they actually end up leading back to strange 
alternative versions of history (which I think is clearly worse than 
conservatively not having a merge commit in some case where a commit might 
or might not be unambiguously a merge - if a merge was missed on an active 
branch, the branch maintainer can easily correct that afterwards with "git 
merge -s ours" to avoid problems with future merges).

My expectation is that there are only multiple git commits corresponding 
to an SVN commit when the SVN commit touched more than one SVN branch or 
tag and so has to be split to represent it in git (there are about 1500 
such SVN commits, most of them automatic datestamp updates in the CVS era 
that cvs2svn turned into mixed-branch commits).
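
One way to check that expectation mechanically is sketched below. It is only an illustration, not part of either conversion: it assumes the text of a "git log" run over all refs of a git-svn based conversion is available as a string, and it groups commits by the (SVN path, revision) pair found in each git-svn-id trailer. An SVN commit that was split because it touched several branches yields different paths, so only same-path duplicates such as the two versions of trunk@5 above are reported. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Trailer written by git-svn, e.g.
#   git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@5 138bc75d-...
# \s+ also tolerates the trailer being wrapped over two lines.
SVN_ID = re.compile(r"git-svn-id:\s+(\S+)@(\d+)\s+([0-9a-f-]+)")
# "commit <hash>" at column 0 starts each entry in default git log output.
COMMIT = re.compile(r"^commit ([0-9a-f]+)", re.MULTILINE)


def commits_by_svn_rev(log_text):
    """Map (svn_path, revision) -> list of git commit hashes in log_text."""
    seen = defaultdict(list)
    # re.split with one capturing group yields
    # [preamble, hash1, body1, hash2, body2, ...].
    parts = COMMIT.split(log_text)
    for sha, body in zip(parts[1::2], parts[2::2]):
        match = SVN_ID.search(body)
        if match:
            seen[(match.group(1), int(match.group(2)))].append(sha)
    return seen


def duplicated_revs(log_text):
    """Keep only the (path, revision) keys that map to several commits."""
    return {key: shas
            for key, shas in commits_by_svn_rev(log_text).items()
            if len(shas) > 1}
```

Keying on the path as well as the revision is a deliberate choice here: it keeps the legitimate mixed-branch splits out of the report.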

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Jeff Law
On Sun, 2019-12-29 at 22:30 +0100, Thomas Koenig wrote:
> On 29.12.19 at 14:26, Segher Boessenkool wrote:
> > We cannot waste a year on a social experiment.  We can slowly and carefully
> > adopt new procedures, certainly.  But anything drastic isn't advisable imo.
> > 
> > Also, many GCC developers aren't familiar with Git at all.  It takes time
> > to learn it, and to learn new ways of working.  Small steps are needed.
> 
> Amen to that.
> 
> My uses of git can be counted in a single digit (in decimal).  I am
> just hoping you guys know what you are doing, and I am a bit
> apprehensive about the change and my continued ability to contribute.
> 
> Talk of a radical new development model does not raise my confidence.
I was fairly anti-Git for a while, but there are simple workflows you
can use which will be close enough to SVN that you're really just
changing the commands you're using, not your entire workflow.

You can add in "git specific" workflows later as you've become familiar
with the basics.  That's what I did, and boy once you wrap your head
around git rebase for dealing with work in progress it's a game
changer.

jeff



Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Segher Boessenkool
On Sun, Dec 29, 2019 at 11:00:08PM +, Joseph Myers wrote:
> fixups in bugdb.py - and that way benefit both from reposurgeon making 
> choices that are as conservatively safe as possible, which seems a 
> desirable property for problem cases that haven't been manually reviewed, 

Problem cases that haven't been manually reviewed should *be* manually
reviewed, or the heuristics improved so there are fewer problem cases.

As I've said many many times now, we only have *one* repository to
convert here.  Taking shortcuts is *good*, making problems for ourselves
by pretending we do things more generically is *bad*.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Joseph Myers
On Sun, 29 Dec 2019, Eric S. Raymond wrote:

> Joseph Myers :
> > The case you mention is one where there was a merge to a branch not from 
> > its immediate parent but from an indirect parent.  I don't think it would 
> > be hard to support detecting such merges in reposurgeon.
> 
> We're working on it.

And the other example branch mentioned (redhat/gcc-9-branch) is a 
different case: if the merge from gcc-9-branch to redhat/gcc-9-branch had 
been done in the idiomatic way with modern SVN (i.e. naming the branch to 
merge from and letting SVN deal with identifying the revisions involved), 
I think reposurgeon would have handled it just fine.  But the commit 
messages indicate the merge was done in an old-fashioned way (naming 
individual ranges of revisions to merge manually), which resulted in merge 
properties very slightly different from what SVN creates automatically.  
Now that I understand what the difference is, I expect we'll be able to fix 
that case as well.

> As Joseph says, one of reposurgeon's design principles is "First, do no harm."
> 
> And yes, changelogs are full of malformations and junk like this. I
> saw and dealt with a lifetime's worth while converting the Emacs
> history from bzr to git.
> 
> If you try to interpret any random garbage in, you will assuredly
> get garbage out when you least expect it. Often the cost of this 
> sort of mistake is not fully realized until it is far too late
> for correction.  This is *why* reposurgeon is conservative.
> 
> The correct thing for reposurgeon to do is flag unparseable entry
> headers for human intervention, and as of today it does that.

Furthermore, we can compare authors in the different conversions to 
identify cases where, based on a manual review, Maxim's heuristics produce 
better results for a particular commit, and add those to the list of 
fixups in bugdb.py - and that way benefit both from reposurgeon making 
choices that are as conservatively safe as possible, which seems a 
desirable property for problem cases that haven't been manually reviewed, 
and from different heuristics helping suggest improvements in particular 
cases.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Eric S. Raymond
Richard Earnshaw (lists) :
> Weak in the sense that it isn't proof given that the user name is
> partially redacted.  There's nothing in the gcc archives that gives a
> full name either, unfortunately.
> 
> Yes, it's the most likely match, but there's still an element of doubt.
> 
> R.

https://groups.google.com/forum/#!msg/comp.databases.sybase/Uz8ICef9Qr8/uPwanH6is60

If you open his message to Michel Peppler, you'll see a sig block that
says:

 bjo...@planetarion.com  Bjørn Wennberg, Fifth Season AS

It's him, yep.  Be sure to get the ø right when you fill in the name. :-)
-- 
Eric S. Raymond  http://www.catb.org/~esr/




Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Joseph Myers
On Sun, 29 Dec 2019, Joseph Myers wrote:

> I've now made those changes to the checked-in list so it's pure UTF-8, and 
> thus easier to review and edit.  We still need to implement code in 
> bugdb.py to use that list to pick the preferred form from each list of 
> variants (and people may wish to change the preferred forms in some 
> cases).

I've now implemented that code in bugdb.py.

Given those fixes, I'm planning to compare author names from the 
reposurgeon conversion and Maxim's conversion, as I think cases where they 
find different authors (not just different email addresses) are good cases 
for manual review (we already have various such manual author fixups for 
individual commits in bugdb.py).  In fact that manual review may show up 
*other* commits that should be reattributed.  One example Maxim gave of a 
missing author was Aymeric Vincent.  That was a commit on 
premerge-fsf-branch where the reposurgeon heuristic "don't use 
attributions from ChangeLog for a ChangeLog-only commit" applied.  But 
whether or not the commit just adding the ChangeLog entry should be 
reattributed to the person named in that ChangeLog entry, the real changes 
that ChangeLog entry relates to are two previous commits (each file 
committed separately), so it shows up that those two previous commits 
ought to be reattributed.
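
A rough sketch of that comparison follows, under stated assumptions: each conversion has already been reduced to a mapping from SVN revision number to a "Name <email>" string (how that mapping is extracted is left out), and the function names are hypothetical rather than anything in bugdb.py. Only differences in the name part are reported, since differing email addresses alone are not the cases of interest here.

```python
def split_author(ident):
    """Split 'Name <email>' into (name, email); the email may be absent."""
    name, _, rest = ident.partition("<")
    return name.strip(), rest.rstrip(">").strip()


def author_name_mismatches(conv_a, conv_b):
    """Return {svn_rev: (name_a, name_b)} where the author names differ.

    conv_a and conv_b map an SVN revision number to a 'Name <email>'
    string.  Revisions present in only one conversion are skipped, and
    names are compared case-insensitively.
    """
    mismatches = {}
    for rev in sorted(conv_a.keys() & conv_b.keys()):
        name_a, _ = split_author(conv_a[rev])
        name_b, _ = split_author(conv_b[rev])
        if name_a.casefold() != name_b.casefold():
            mismatches[rev] = (name_a, name_b)
    return mismatches
```

Each reported revision would then be a candidate for manual review, not an automatic fixup.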

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Eric S. Raymond
Joseph Myers :
> The case you mention is one where there was a merge to a branch not from 
> its immediate parent but from an indirect parent.  I don't think it would 
> be hard to support detecting such merges in reposurgeon.

We're working on it.

> This is an example where the originally added ChangeLog entry was 
> malformed (had the date in the form "2004-0630"), so a conservatively safe 
> approach was taken of using the committer rather than trying to guess what 
> a malformed ChangeLog entry means and risk extracting nonsense.
> 
> I expect other cases are being similarly careful in cases where there was 
> a malformed ChangeLog entry or a commit edited ChangeLog entries by other 
> authors so leaving its single-author nature ambiguous.  Parsing 
> ChangeLogs, especially where malformed entries are involved, is inherently 
> a heuristic matter.

As Joseph says, one of reposurgeon's design principles is "First, do no harm."

And yes, changelogs are full of malformations and junk like this. I
saw and dealt with a lifetime's worth while converting the Emacs
history from bzr to git.

If you try to interpret any random garbage in, you will assuredly
get garbage out when you least expect it. Often the cost of this 
sort of mistake is not fully realized until it is far too late
for correction.  This is *why* reposurgeon is conservative.

The correct thing for reposurgeon to do is flag unparseable entry
headers for human intervention, and as of today it does that.
-- 
Eric S. Raymond  http://www.catb.org/~esr/




Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Joseph Myers
On Sun, 29 Dec 2019, Mark Wielaard wrote:

> Maybe we should have a separate historical git repo which contains
> everything that we were able to salvage and that people could git
> remote add if they are really, really interested.

I'm not convinced that's very different from having one repo with 
everything but some pieces in refs that aren't fetched by default.  Maybe 
separate repos make fetching a bit more efficient if it allows packs to be 
reused on the server, but they also mean extra administrative overhead 
ensuring the correct configuration for each repo (for public access, not 
allowing pushes to the historical repo, etc.).

-- 
Joseph S. Myers
jos...@codesourcery.com


gcc-10-20191229 is now available

2019-12-29 Thread gccadmin
Snapshot gcc-10-20191229 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/10-20191229/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 10 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 279757

You'll find:

 gcc-10-20191229.tar.xz   Complete GCC

  SHA256=fd484a5502441537ab4298a4d7c24e884fd46c70bfbc1c2cf769d6d48386cea2
  SHA1=0e3c685ae609bec4c722a9ad531b7105ce7ba618

Diffs from 10-20191222 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-10
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: The far past of GCC

2019-12-29 Thread Joseph Myers
On Sat, 28 Dec 2019, Jeff Law wrote:

> I believe RCS was initially used circa 1992 on the FSF machine which
> held the canonical GCC sources.  But I'm not aware of anyone still
> having a copy of the old RCS ,v files.

See ftp://gcc.gnu.org/pub/gcc/old-releases/old-cvs/ for the old repository 
(that started out as a set of RCS ,v files).  (Or rsync the GCC CVS 
repository from sourceware, and the old-gcc subdirectory is a copy of that 
repository.)

The key issue with integrating tarballs into a git repository is that many 
files (in particular documentation and ChangeLogs) were, for a long time, 
not version-controlled in gcc2.  So you have the version of the history in 
SVN (detailed at the revision level but not covering all files) and the 
version from tarballs / diffs (having all files but not detailed at the 
revision level).

Trying to integrate tarballs into the middle of the sequence not covering 
all files leads either to the documentation files appearing or disappearing, 
or to intermediate revisions having non-matching versions of those files.  
To avoid that, I think a natural representation in git might be: we have 
master, with the history as it is in SVN, and a separate branch whose tree 
contents and first-parent ancestry come from the tarballs (from 0.9 
through to 2.7.2.3), leading back to an orphan commit for gcc-0.9.  Then, 
releases in that sequence that we can identify a corresponding master 
commit for can have the commit adding tarball contents set up as a merge 
commit, with the second parent being the corresponding master commit.  
(Actually there might be more than one such branch, reflecting the time 
GCC 1 releases were maintained while GCC 2 development was underway.)

A key feature of doing things like that is that it does *not* need to be 
done at the same time as the main git conversion, because the tarballs 
don't become part of the git ancestry of any commit now in SVN; their 
contents (and corresponding release tags) can be added to git later once 
ready.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Richard Earnshaw (lists)
On 29/12/2019 22:24, Eric S. Raymond wrote:
> Richard Earnshaw (lists) :
>> Also, for this one:
>>
>> #  "47044": "",
>>
>> There's some (relatively weak) evidence that this is Bjørn Wennberg (eg
>> https://groups.google.com/forum/#!msg/comp.databases.sybase/Uz8ICef9Qr8/uPwanH6is60J),
>> but in the absence of stronger evidence, I'm going to just put bjornw as
>> the name.
> 
> What's weak about that?  The full email address matches.  Unless you
> think there are two hackers named Bjorn, with a last initial of W,
> running around using the same email address, I think we have a winner.
> 

Weak in the sense that it isn't proof given that the user name is
partially redacted.  There's nothing in the gcc archives that gives a
full name either, unfortunately.

Yes, it's the most likely match, but there's still an element of doubt.

R.


Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Eric S. Raymond
Richard Earnshaw (lists) :
> Also, for this one:
> 
> #  "47044": "",
> 
> There's some (relatively weak) evidence that this is Bjørn Wennberg (eg
> https://groups.google.com/forum/#!msg/comp.databases.sybase/Uz8ICef9Qr8/uPwanH6is60J),
> but in the absence of stronger evidence, I'm going to just put bjornw as
> the name.

What's weak about that?  The full email address matches.  Unless you
think there are two hackers named Bjorn, with a last initial of W,
running around using the same email address, I think we have a winner.
-- 
Eric S. Raymond  http://www.catb.org/~esr/




Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Richard Earnshaw (lists)
On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
> Below are several more issues I found in reposurgeon-6a conversion comparing 
> it against gcc-reparent conversion.
> 
> I am sure these and whatever other problems I may find in the reposurgeon 
> conversion can be fixed in time.  However, I don't see why we should bother.  My 
> conversion has been available since summer 2019, I made it ready in time for 
> GCC Cauldron 2019, and it didn't change in any significant way since then.
> 
> With the "Missed merges" problem (see below) I don't see how reposurgeon 
> conversion can be considered "ready".  Also, I expected a diligent developer 
> to compare new conversion (aka reposurgeon's) against existing conversion 
> (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" 
> or even "ready".  The data I'm seeing in differences between my and 
> reposurgeon conversions shows that gcc-reparent conversion is /better/.
> 
> I suggest that GCC community adopts either gcc-pretty or gcc-reparent 
> conversion.  I welcome Richard E. to modify his summary scripts to work with 
> svn-git scripts, which should be straightforward, and I'm ready to help.
> 

I don't think either of these conversions is any more ready to use than
the reposurgeon one, possibly less so.  In fact, there are still some
major issues to resolve first before they can be considered.

gcc-pretty has completely wrong parent information for the gcc-3 era
release tags, showing the tags as being made directly from trunk with
massive deltas representing the roll-up of all the commits that were
made on the gcc-3 release branch.

gcc-reparent is better, but many (most?) of the release tags are shown
as merge commits with a fake parent back to the gcc-3 branch point,
which is certainly not what happened when the tagging was done at that
time.

Both of these factually misrepresent the history at the time of the
release tag being made.

As for converting my script to work with your tools, I'm afraid I don't
have time to work on that right now.  I'm still bogged down validating
the incorrect bug ids that the script has identified for some commits.
I'm making good progress (we're down to 160 unreviewed commits now), but
it is still going to take what time I have over the next week to
complete that task.

Furthermore, there is no documentation on how your conversion scripts
work, so it is not possible for me to test any work I might do in order
to validate such changes.  Not being able to run the script locally to
test changes would be a non-starter.

You are welcome, of course, to clone the script I have and attempt to
modify it yourself, it's reasonably well documented.  The sources can be
found in esr's gcc-conversion repository here:
https://gitlab.com/esr/gcc-conversion.git


> Meanwhile, I'm going to add additional root commits to my gcc-reparent 
> conversion to bring in "missing" branches (the ones, which don't share 
> history with trunk@1) and restart daily updates of gcc-reparent conversion.
> 
> Finally, with the comparison data I have, I consider statements about 
> git-svn's poor quality to be very misleading.  Git-svn may have had serious 
> bugs years ago when Eric R. evaluated it and started his work on reposurgeon. 
>  But a lot of development has happened and many problems have been fixed 
> since them.  At the moment it is reposurgeon that is producing conversions 
> with obscure mistakes in repository metadata.
> 
> 
> === Missed merges ===
> 
> Reposurgeon misses merges from trunk on 130+ branches.  I've spot-checked 
> ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane 
> merges were omitted.  Below is analysis for ARM/hard_vfp_branch.
> 
> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
> 
> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
> Author: Richard Earnshaw 
> Date:   Mon Jul 20 08:15:51 2009 +
> 
> Merge trunk through to r149768
> 
> Legacy-ID: 149804
> 
>  COPYING.RUNTIME |73 +
>  ChangeLog   |   270 +-
>  MAINTAINERS |19 +-
> 
> 
> 
> at the same time for svn-git scripts we have:
> 
> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
> 
> commit ce7d5c8df673a7a561c29f095869f20567a7c598
> Merge: 4970119c20da 3a69b1e566a7
> Author: Richard Earnshaw 
> Date:   Mon Jul 20 08:15:51 2009 +
> 
> Merge trunk through to r149768
> 
> git-svn-id: 
> https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 
> 138bc75d-0d04-0410-961f-82ee72b054a4
> 
> 
> ... which agrees with
> $ svn propget svn:mergeinfo 
> file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
> /trunk:142588-149768
> 
> === Bad author entries ===
> 
> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and 
> "2005-03-18 Kazu Hirata".  It is rather obvious that person's name is 
>

Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Thomas Koenig

On 29.12.19 at 14:26, Segher Boessenkool wrote:

> We cannot waste a year on a social experiment.  We can slowly and carefully
> adopt new procedures, certainly.  But anything drastic isn't advisable imo.
>
> Also, many GCC developers aren't familiar with Git at all.  It takes time
> to learn it, and to learn new ways of working.  Small steps are needed.


Amen to that.

My uses of git can be counted in a single digit (in decimal).  I am
just hoping you guys know what you are doing, and I am a bit
apprehensive about the change and my continued ability to contribute.

Talk of a radical new development model does not raise my confidence.


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Joseph Myers
On Sun, 29 Dec 2019, Maxim Kuvyrkov wrote:

> With the "Missed merges" problem (see below) I don't see how reposurgeon 
> conversion can be considered "ready".

It aims to be conservatively safe regarding merges, erring on the side of 
not adding incorrect merges if in doubt.  Because of the difficulty in 
matching SVN and git merge semantics, it's inherently hard to define 
unambiguously exactly which merges are correct and which are cherry-picks 
or erroneous.  I think extra merges are something nice-to-have rather than 
critical.

The case you mention is one where there was a merge to a branch not from 
its immediate parent but from an indirect parent.  I don't think it would 
be hard to support detecting such merges in reposurgeon.

> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and 
> "2005-03-18 Kazu Hirata".  It is rather obvious that person's name is 
> unlikely to start with a digit.

These are already fixed in bugdb.py since that conversion, as part of the 
general review of authors to fix typos and make them more uniform.

> Reposurgeon-6a conversion misses many authors, below is a list of people 
> with names starting with "A".
> 
> Akos Kiss

This is an example where the originally added ChangeLog entry was 
malformed (had the date in the form "2004-0630"), so a conservatively safe 
approach was taken of using the committer rather than trying to guess what 
a malformed ChangeLog entry means and risk extracting nonsense.

I expect other cases are being similarly careful in cases where there was 
a malformed ChangeLog entry or a commit edited ChangeLog entries by other 
authors so leaving its single-author nature ambiguous.  Parsing 
ChangeLogs, especially where malformed entries are involved, is inherently 
a heuristic matter.
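
To make the conservative approach concrete, here is a minimal sketch of a strict header parser. It is not reposurgeon's code: it assumes the modern "YYYY-MM-DD  Name  <email>" ChangeLog header form only, and it returns None for anything else, so the caller falls back to the committer and can flag the entry for review. The function name is hypothetical.

```python
import re
from datetime import date

# Modern ChangeLog entry header: ISO date, author name, <email>.
HEADER = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})\s+(\S.*?)\s+<([^<>\s]+@[^<>\s]+)>\s*$"
)


def parse_changelog_header(line):
    """Return (name, email) for a well-formed header, else None.

    None means "do not guess": a malformed date such as "2004-0630"
    or an impossible calendar date is rejected, not repaired.
    """
    match = HEADER.match(line)
    if not match:
        return None
    year, month, day, name, email = match.groups()
    try:
        date(int(year), int(month), int(day))  # reject e.g. month 13
    except ValueError:
        return None
    return name, email
```

Older header styles (with a weekday and time of day) are deliberately not handled in this sketch; they would also come back as None.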

-- 
Joseph S. Myers
j...@polyomino.org.uk


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Maxim Kuvyrkov
Below are several more issues I found in reposurgeon-6a conversion comparing it 
against gcc-reparent conversion.

I am sure these and whatever other problems I may find in the reposurgeon 
conversion can be fixed in time.  However, I don't see why we should bother.  My 
conversion has been available since summer 2019, I made it ready in time for 
GCC Cauldron 2019, and it didn't change in any significant way since then.

With the "Missed merges" problem (see below) I don't see how reposurgeon 
conversion can be considered "ready".  Also, I expected a diligent developer to 
compare new conversion (aka reposurgeon's) against existing conversion (aka 
gcc-pretty / gcc-reparent) before declaring the new conversion "better" or even 
"ready".  The data I'm seeing in differences between my and reposurgeon 
conversions shows that gcc-reparent conversion is /better/.

I suggest that GCC community adopts either gcc-pretty or gcc-reparent 
conversion.  I welcome Richard E. to modify his summary scripts to work with 
svn-git scripts, which should be straightforward, and I'm ready to help.

Meanwhile, I'm going to add additional root commits to my gcc-reparent 
conversion to bring in "missing" branches (the ones, which don't share history 
with trunk@1) and restart daily updates of gcc-reparent conversion.

Finally, with the comparison data I have, I consider statements about git-svn's 
poor quality to be very misleading.  Git-svn may have had serious bugs years 
ago when Eric R. evaluated it and started his work on reposurgeon.  But a lot 
of development has happened and many problems have been fixed since then.  At 
the moment it is reposurgeon that is producing conversions with obscure 
mistakes in repository metadata.


=== Missed merges ===

Reposurgeon misses merges from trunk on 130+ branches.  I've spot-checked 
ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane merges 
were omitted.  Below is analysis for ARM/hard_vfp_branch.

$ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4

commit ef92c24b042965dfef982349cd5994a2e0ff5fde
Author: Richard Earnshaw 
Date:   Mon Jul 20 08:15:51 2009 +

Merge trunk through to r149768

Legacy-ID: 149804

 COPYING.RUNTIME |73 +
 ChangeLog   |   270 +-
 MAINTAINERS |19 +-



at the same time for svn-git scripts we have:

$ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4

commit ce7d5c8df673a7a561c29f095869f20567a7c598
Merge: 4970119c20da 3a69b1e566a7
Author: Richard Earnshaw 
Date:   Mon Jul 20 08:15:51 2009 +

Merge trunk through to r149768

git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 
138bc75d-0d04-0410-961f-82ee72b054a4


... which agrees with
$ svn propget svn:mergeinfo 
file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
/trunk:142588-149768
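
For readers who want to run the same check over many branches, the property value shown above can be parsed with a few lines of code. This is a sketch, not part of either conversion: it assumes the usual svn:mergeinfo text form of one "path:ranges" entry per line, with comma-separated revisions or ranges and an optional trailing "*" on non-inheritable ranges. The function names are hypothetical.

```python
def parse_mergeinfo(text):
    """Parse svn:mergeinfo text into {source_path: [(first, last), ...]}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The path may itself contain ':', so split on the last one.
        path, _, ranges = line.rpartition(":")
        spans = []
        for item in ranges.split(","):
            item = item.rstrip("*")  # '*' marks a non-inheritable range
            first, _, last = item.partition("-")
            spans.append((int(first), int(last or first)))
        result[path] = spans
    return result


def merged_through(text, source="/trunk"):
    """Highest revision of `source` recorded as merged, or None."""
    spans = parse_mergeinfo(text).get(source, [])
    return max((last for _, last in spans), default=None)
```

For the value above, merged_through gives 149768, which matches the "Merge trunk through to r149768" commit message.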

=== Bad author entries ===

Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and 
"2005-03-18 Kazu Hirata".  It is rather obvious that person's name is unlikely 
to start with a digit.
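
A check of this kind is easy to automate. The sketch below assumes the author names of a conversion have already been collected into a list (for example from git log output); it flags names that start with a digit or contain a clock time. The function name is hypothetical.

```python
import re

# A clock time such as 12:46:56 leaking into the name field.
TIME_LIKE = re.compile(r"\b\d{1,2}:\d{2}:\d{2}\b")


def suspicious_authors(names):
    """Return the sorted, de-duplicated names that look like parse errors."""
    flagged = set()
    for name in names:
        stripped = name.strip()
        if stripped[:1].isdigit() or TIME_LIKE.search(stripped):
            flagged.add(stripped)
    return sorted(flagged)
```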

=== Missed authors ===

Reposurgeon-6a conversion misses many authors, below is a list of people with 
names starting with "A".

Akos Kiss
Anders Bertelrud
Andrew Pochinsky
Anton Hartl
Arthur Norman
Aymeric Vincent

=== Conservative author entries ===

Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many commits 
where svn-git conversion manages to extract valid email from commit data.  This 
happens for hundreds of author entries.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org


> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov  wrote:
> 
> 
>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek  wrote:
>> 
>> On Thu, Dec 26, 2019 at 11:04:29AM +, Joseph Myers wrote:
>> Is there some easy way (e.g. file in the conversion scripts) to correct
>> spelling and other mistakes in the commit authors?
>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>> Jakub Jakub Jelinek (1):
>> Jakub Jeilnek (1):
>> Jelinek (1):
>> entries next to the expected one with most of the commits.
>> For the misspellings, wonder if e.g. we couldn't compute edit distances from
>> other names and if we have one with many commits and then one with very few
>> with small edit distance from those, flag it for human review.
> 
> This is close to what svn-git-author.sh script is doing in gcc-pretty and 
> gcc-reparent conversions.  It ignores 1-3 character differences in 
> author/committer names and email addresses.  I've audited results for all 
> branches and didn't spot any mistakes.
> 
> In other news, I'm working on comparison of gcc-pretty, gcc-reparent and 
> gcc-reposurgeon-5a repos among themselves.  Below are current notes for 
> comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
> 
> == Merges on trunk ==
> 
> Reposurgeon creates

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Joseph Myers
On Sat, 28 Dec 2019, Joseph Myers wrote:

> Concretely, what I'd suggest is: convert ISO-8859-1 entries in the 
> checked-in list to UTF-8, removing anything that thereby becomes a 
> duplicate or unnecessary; handle anything whose encoding isn't simply 
> ISO-8859-1 or UTF-8 via a hardcoded entry in bugdb.py using hex escapes 
> like the existing such entries there.  Once the checked-in list is pure 
> UTF-8 it's easier for people to review and edit.  Where the issue is only 
> presence of ISO-8859 NBSP, or "" or () around the names, remove that in 
> the checked-in list and again remove duplicates.  That way the list can be 
> limited to non-encoding variations.

I've now made those changes to the checked-in list so it's pure UTF-8, and 
thus easier to review and edit.  We still need to implement code in 
bugdb.py to use that list to pick the preferred form from each list of 
variants (and people may wish to change the preferred forms in some 
cases).
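
A minimal sketch of the cleanup described in the quoted proposal, assuming
entries arrive as raw bytes. It is not the code in bugdb.py; decode_entry,
clean_name and normalize are hypothetical helper names. Entries whose encoding
is neither UTF-8 nor ISO-8859-1 are out of scope here, since the proposal
handles those with hardcoded hex-escaped entries.

```python
def decode_entry(raw: bytes) -> str:
    """Decode as UTF-8 when valid, otherwise fall back to ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


def clean_name(name: str) -> str:
    """Replace NBSP, collapse whitespace, strip wrapping "" or ()."""
    name = " ".join(name.replace("\u00a0", " ").split())
    for opener, closer in (('"', '"'), ("(", ")")):
        if len(name) >= 2 and name.startswith(opener) and name.endswith(closer):
            name = name[1:-1].strip()
    return name


def normalize(entries):
    """Return cleaned names, de-duplicated, in first-seen order."""
    seen = {}
    for raw in entries:
        seen.setdefault(clean_name(decode_entry(raw)), None)
    return list(seen)
```

One caveat of the UTF-8-first fallback: an ISO-8859-1 byte sequence that
happens to be valid UTF-8 would be decoded as UTF-8, so the result still
deserves a human look.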

-- 
Joseph S. Myers
j...@polyomino.org.uk


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Ian Lance Taylor via gcc
On Sun, Dec 29, 2019 at 5:49 AM Julien '_FrnchFrgg_' RIVAUD wrote:
>
> Which brings me to something I find strange in your policy: to me,
> merges from trunk to branches should be rare if not nonexistent. And you
> are deciding to banish merges the other way around.

Out of curiosity, why do you say that merges from trunk to branches
should be rare?  It seems to me that any long-lived development branch
will require merges from trunk to the branch.  Are you saying that
those kinds of branches are rare?

In GCC we have historically had a pattern in which people use
long-lived parallel branches that maintain specific patches on top of
GCC trunk.  These branches provide a simple way to get a variant of
GCC with specific patches of interest to some people.  These branches
too require regular merges from trunk.

Ian


Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Jeff Law
On Sun, 2019-12-29 at 07:32 -0500, Eric S. Raymond wrote:
> Richard Earnshaw (lists) :
> > I've just commented that one out for now; if anybody knows the correct
> > addresses, please let me know.  Also, there's one joint list that I've
> > not attempted to fix at this time.
> > #  "28488": "Jim Kingdon ",
> 
> That's Jim Kingdon the former CVS dev - I think he was involved in Subversion 
> early too.
And gdb eons ago.

> 
> He's king...@cyclic.com or king...@panix.com, according to my back
> mail, but since I think I remember that he did work at RedHat in the
> late '90s king...@redhat.com would be a good bet too.
Yea and @cygnus.com before that.  I haven't seen much, if anything,
from him in 15+ years.  He's not with Red Hat anymore.

jeff



Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Mark Wielaard
Hi,

On Wed, 2019-12-25 at 06:10 -0600, Segher Boessenkool wrote:
> git-svn did not miss any branches.  Finding branches is not done by
> git-svn at all, for this.  These branches were skipped because they
> have nothing to do with GCC, have no history in common (they are not
> descendants of revision 1).  They can easily be added -- Maxim might
> already have done that, not sure, imo it's better to just drop the
> garbage, it's in svn if anyone cares.

I just looked at one of these "missed" branches, CLASSPATH.
That was created when GNU Classpath and gcc/libgcj were both in
cvs. The idea was that it was a kind of cvs vendor branch of the
upstream GNU Classpath releases (and some random checkouts) which would
make merging imports of new code into the main trunk easier. libgcj was
merged and then based on GNU Classpath in the past/when it was
officially imported into gcc. The CLASSPATH branch only contains files
under libjava/classpath.

Some of the commits look a little odd, probably because it was
converted from cvs2svn and then again to git. GNU Classpath moved to
git a long time ago and never was in subversion. And of course these
days gcj and libgcj aren't part of the main gcc trunk anymore.

There is also a classpath-generics branch, which has a couple of
snapshots of the GNU Classpath generics branch (some pre-releases of
classpath before 0.95 which had generics separately).

There are also some other branches containing classpath:
gcj/classpath-095-import-branch
gcj/classpath-095-merge-branch
gcj/classpath-0961-import-branch
gcj/classpath-098-merge-branch
gcj/classpath-20070727-import-branch

These branches contain all of gcc, not just the files under
libjava/classpath.
I am not sure why these were separate from the CLASSPATH vendor branch.

Even though I have a (historical) interest in the gcj frontend and GNU
Classpath class library I am not sure these branches would really help
me. Also I think the branches aren't very interesting without the actual
GNU Classpath (git) tree history from which they were cherry-picked.
The classpath git tree does contain tags for each import already, so
you can get the real history there.

Seeing how big the git tree/conversion already is I would suggest
leaving these out of the main git repo if at all possible.

Maybe we should have a separate historical git repo which contains
everything that we were able to salvage and that people could git
remote add if they are really, really interested.

Cheers,

Mark


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Julien '_FrnchFrgg_' RIVAUD

On 29/12/2019 at 17:32, Richard Earnshaw wrote:

> We agreed that for changes in our current workflow practices we'd defer
> that until *after* we'd switched to git; so this is getting off topic.
>
> On the other hand, we do need to sort out what we do with existing merge
> history, as that forms part of the conversion.  Can we stick to what's
> relevant, please, at least in this thread?


I never wanted to make the GCC project choose new rules now. What I 
advise (and you are more than able to choose to follow or not) is only 
to avoid taking decisions right now, as part of the migration, that 
would impair establishing better rules later, especially if those 
decisions come from (bad?) habits that were taken during the SVN era, 
due to the idiosyncrasies of SVN itself.


Julien




Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Richard Earnshaw
On 29/12/2019 12:15, Segher Boessenkool wrote:
> On Sun, Dec 29, 2019 at 12:02:51PM +0100, Richard Biener wrote:
>> For bisecting trunk a merge would be a single commit, right? So I could see 
>> value in preserving a patch series where individual steps might introduce 
>> temporary issues as a branch merge (after rebasing) so the series is visible 
>> but not when bisecting (by default). It would also make the series 
>> relatedness obvious and avoids splitting it with a commit race (if that is 
>> possible with git). 
> 
> "git bisect" actually goes all the way down the rabbit hole, it tries to
> find the first bad commit in the range you marked as starting as "good",
> ending as "bad".
> 
> It is pretty confusing to do if there are many merges, especially if many
> commits end up not building at all.  But you can always "git bisect skip"
> stuff (it just eats time, and it hampers automated bisecting).
> 
> The really nasty cases are when the code does build, but fails for
> unrelated reasons.
> 
> We require every commit to be individually tested, and if we *do* allow
> merges, that should still be done imo.  Which again makes merging less
> useful: if you are going to rebase your branch anyway (to fix simple
> stuff), why not rebase it onto trunk!
> 
>> IMHO exact workflow for merging a patch series as opposed to a single patch 
>> should be documented. 
> 
> Yes.  It isn't actually documented in so many words for what we do now,
> either, but it would be good to have.
> 
> 
> Segher
> 

We agreed that for changes in our current workflow practices we'd defer
that until *after* we'd switched to git; so this is getting off topic.

On the other hand, we do need to sort out what we do with existing merge
history, as that forms part of the conversion.  Can we stick to what's
relevant, please, at least in this thread?

R.


Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Richard Earnshaw (lists)
On 29/12/2019 12:32, Eric S. Raymond wrote:
> Richard Earnshaw (lists) :
>> I've just commented that one out for now; if anybody knows the correct
>> addresses, please let me know.  Also, there's one joint list that I've
>> not attempted to fix at this time.
> 
>> #  "28488": "Jim Kingdon ",
> 
> That's Jim Kingdon the former CVS dev - I think he was involved in
> Subversion early too.
> 
> He's king...@cyclic.com or king...@panix.com, according to my back
> mail, but since I think I remember that he did work at RedHat in the
> late '90s king...@redhat.com would be a good bet too.
> -- 
> Eric S. Raymond (http://www.catb.org/~esr/)
> 
> 

Based on https://gcc.gnu.org/ml/gcc-patches/2000-02/msg00179.html and
some other patches from his redhat address, I'm going to go with that.

Also, for this one:

#  "47044": "",

There's some (relatively weak) evidence that this is Bjørn Wennberg (eg
https://groups.google.com/forum/#!msg/comp.databases.sybase/Uz8ICef9Qr8/uPwanH6is60J),
but in the absence of stronger evidence, I'm going to just put bjornw as
the name.

For the final one:

#  "49710": "Naveen Sharma,Nitin Gupta",

The list emails show Naveen as the driver of the contribution during the
submission phase, so I'll use that name for the primary author, again
barring any stronger evidence that this is incorrect.

R.


Re: The far past of GCC

2019-12-29 Thread Eric S. Raymond
Mark Wielaard :
> Apparently less complete, but there is also
> https://ftp.gnu.org/old-gnu/gcc/
> Which does have some old diff files to reconstruct some missing versions.

There are quite a few ancient preserved release tarballs out there.
Here is the list of reconstructable pre-r3 releases as I now know it:

0.9     ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-0.9.tar.bz2
1.21    ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.21.tar.bz2
1.22    ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.22.tar.bz2
1.23    ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.23-1.24.bz2
1.24    ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.24-1.25.bz2
1.25    ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.25-1.26.bz2
1.26    ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.26-1.27.bz2
1.27    ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.27.tar.bz2
1.28    ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.28-1.29.bz2
1.29    ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.29-1.30.bz2
1.30    ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.30-1.31.bz2
1.31    ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.31.tar.bz2
1.32    ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.31-1.32.bz2
1.33    ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.32-1.33.bz2
1.34    ftp://gcc.gnu.org/pub/gcc/old-releases/patches/gcc.diff-1.32-1.34.bz2
1.35    ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.35.tar.bz2

It looks like the relevant bits of 
ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-[12]
and ftp://sourceware.org/pub/gcc/old-releases/gcc-[12]

Incorporating these will be easy. What I would do is write a script that does
this:

(a) checks to see if each tarball is mirrored locally

(b) if not, fetches it, applying forward or back diffs from the nearest whole
version as required.

(c) generates a sequence of reposurgeon incorporate commands to be included
in the main lift script
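
Steps (a) to (c) could be prototyped along these lines. This is a sketch under
assumptions, not the script that was eventually written: plan and lift_lines
are hypothetical names, actual downloading and patching is left out, and the
lift-script line is produced from a caller-supplied template because the exact
syntax of reposurgeon's incorporate command is not given in this message and
should be taken from the reposurgeon manual.

```python
import os


def plan(releases, mirror_dir, exists=os.path.exists):
    """Classify each (version, url) pair.

    "have"          -> already mirrored locally (step a)
    "fetch-tarball" -> whole tarball still to download (step b)
    "fetch-diff"    -> diff to download and apply forward or backward
                       from the nearest whole version (step b)
    """
    steps = []
    for version, url in releases:
        name = url.rsplit("/", 1)[-1]
        local = os.path.join(mirror_dir, name)
        if exists(local):
            action = "have"
        elif name.startswith("gcc.diff-"):
            action = "fetch-diff"
        else:
            action = "fetch-tarball"
        steps.append((version, local, action))
    return steps


def lift_lines(steps, template):
    """Render one lift-script line per release (step c).

    `template` is a placeholder such as "incorporate {path}"; the real
    command syntax is an assumption to be checked against the manual.
    """
    return [template.format(version=v, path=p) for v, p, _ in steps]
```

Passing the `exists` predicate in makes the planning step testable without
touching the filesystem or the network.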

sbb says r3 is 1.36.  I doubt r1 and r2 are anything other than
Subversion directory creations, but people with easier access than me
should check.

After this life gets a little trickier. We have the following tarballs
that might be of interest:

1.36    r3      ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.36.tar.bz2
1.37    ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.37.tar.bz2
1.38    ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.38.tar.bz2
1.39    ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.39.tar.bz2
1.40    ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.40.tar.bz2
1.41    ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.41.tar.bz2
1.42    ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-1.42.tar.bz2
2.0     r358    ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.0.tar.bz2
2.1     r586    ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.1.tar.bz2
2.2.2   ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.2.2.tar.bz2
2.3.3   ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.3.3.tar.bz2
2.4.5   ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.4.5.tar.bz2
2.5.8   ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.5.8.tar.bz2
2.6.3   ?       ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.6.3.tar.bz2
2.7.2   r10608  ftp://gcc.gnu.org/pub/gcc/old-releases/gcc-1/gcc-2.7.2.tar.bz2

Before we can do anything with these, we need to identify which Subversion
revision each one with a ? belongs to.  I've added three of ssb's
identifications.  For completeness I note those for which we have no tarballs:

r1184 = 2.2, r2674 = 2.3.1, r4493 = 2.4.0 "minus two swapped commits",
r5867 = 2.5.0, r7771 = 2.6.0, r9996 = 2.7.0.

This reconstruction is being tracked here:
https://gitlab.com/esr/gcc-conversion/issues/4
-- 
Eric S. Raymond (http://www.catb.org/~esr/)




Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Segher Boessenkool
On Sun, Dec 29, 2019 at 02:48:31PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:
> >Merges aren't scary.  Merges are inconvenient.
> 
> No they are not. You are unaccustomed to them, which is different. 

Lol.  Okay, end of discussion.  You are assuming all the wrong things.


Segher


Re: The far past of GCC

2019-12-29 Thread Mark Wielaard
On Sat, Dec 28, 2019 at 09:15:53PM -0700, Jeff Law wrote:
> I don't have a gitlab account, so I'm commenting here.
> 
> I believe RCS was initially used circa 1992 on the FSF machine which
> held the canonical GCC sources.  But I'm not aware of anyone still
> having a copy of the old RCS ,v files.
> 
> There's a slight chance we've got the old gcc2 snapshots in the Cygnus
> CVS tree (assuming I could still find it) -- we may have imported the
> snapshots onto CVS branches -- I can't really remember anymore.
> 
> For old releases, the best resource I know of is:
> 
> ftp://gcc.gnu.org/pub/gcc/old-releases
> 
> That has stuff all the way back to gcc-0.9, circa 1987.  It's nowhere
> near complete.  You'll also find that in that era things were split up.
> ie, the C++ compiler & runtime were separate distributions from the C
> compiler & code generator, similarly for the old g77 compiler, gnat,
> etc.
> 
> You may find other nuggets in there.  

Apparently less complete, but there is also
https://ftp.gnu.org/old-gnu/gcc/
Which does have some old diff files to reconstruct some missing versions.

Cheers,

Mark


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Julien '_FrnchFrgg_' RIVAUD

On 29/12/2019 at 14:31, Segher Boessenkool wrote:

> On Sun, Dec 29, 2019 at 12:46:50PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:
>> At worst, no commit is testable in the
>> branch except the last, and git will say that the bug was introduced in
>> the branch, which is not worse than what you'd get without a merge commit.
>
> We normally require every commit to be tested, so it is a lot worse, yes.


That's very good, and should not change. I test every commit of every 
merge request I submit, even on projects that use real merges. It is 
easy to create CI/CD configurations and/or hooks that enforce that when 
trying to push a patch set, with or without a merge commit.
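
As one possible shape for such a check, here is a small sketch that walks
every commit in a pushed range and runs a test command on each. It is an
illustration only, not an existing GCC hook; the function names and the
base/head arguments are assumptions. The git invocations used (rev-list
--reverse and checkout --detach) are standard.

```python
import subprocess


def commits_in_range(base, head):
    """List commits reachable from head but not from base, oldest first.

    With a merge commit in the range, the commits of the merged branch
    are included as well, so each of them gets tested.
    """
    out = subprocess.run(
        ["git", "rev-list", "--reverse", f"{base}..{head}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def checkout_and_test(test_cmd):
    """Build a predicate that checks out a commit and runs test_cmd on it."""
    def passes(sha):
        subprocess.run(["git", "checkout", "--quiet", "--detach", sha],
                       check=True)
        return subprocess.run(test_cmd).returncode == 0
    return passes


def first_failing(commits, passes):
    """Return the first commit for which passes(commit) is false, else None."""
    for sha in commits:
        if not passes(sha):
            return sha
    return None
```

A hook or CI job would reject the push when
first_failing(commits_in_range(base, head), checkout_and_test(cmd)) returns
a commit rather than None.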


Merge commits have the great effect of separating the history into 
related chunks. Without them, you don't really know if a single bugfix 
is logically part of a set (because it fixes something important to pave 
the way) or not, and you have to think harder to detect the end of a set 
and the start of another (with maybe single commits in between).





> Segher





Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Julien '_FrnchFrgg_' RIVAUD

On 29/12/2019 at 14:26, Segher Boessenkool wrote:

> Hi!
>
> On Sun, Dec 29, 2019 at 12:42:07PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:
>> I'm not arguing that you should go that route, it seems a bit extreme to
>> me. But outright refusing merges on the basis they are painful is (if
>> you can accept the strong word) ludicrous.
>
> They are painful for everyone working with the history later.

I don't think merges make looking at history more or less painful,
unless you consider projects like git where there is an inordinate
number of merges. And even then, I think they have solutions.

> Something that we do in GCC more often than in most other projects.

I would have expected many if not all projects to look at history often,
at least for projects with significant complexity.

> Which is almost *never* the case for GCC, in my opinion.  Almost all
> commits are smallish improvements / bugfixes.

Which are independent, clearly.

> Every patch should normally be posted to the mailing lists for review.
> Such patches should be against trunk.  And *that* patch will be approved,
> so *that* is the one you will commit and push upstream eventually.

Indeed, the rebased series would be what is reviewed and pushed
upstream. Which can be done with a merge commit anyway. I think you
really should look at the workflow of the git project (and they have
their share of interdependent strange things that happen too; of course
less than GCC due to the complexity of the project, but the techniques
to ensure you don't get bitten by that are the same).

They use merges extensively, and have a very very good track record of
non-broken master (or at least had last time I looked).

> We cannot waste a year on a social experiment.  We can slowly and carefully
> adopt new procedures, certainly.  But anything drastic isn't advisable imo.
>
> Also, many GCC developers aren't familiar with Git at all.  It takes time
> to learn it, and to learn new ways of working.  Small steps are needed.

Of course! I am not suggesting you change everything. But setting in
stone hard rules that force the SVN mindset is harmful too.

> Merges aren't scary.  Merges are inconvenient.

No, they are not. You are unaccustomed to them, which is different.
People that only ever used DVCS feel merges are much more natural and
even productivity increasing. Some even do "bad merges", like "sync from
trunk" every other commit, which I very much frown upon.

Which brings me to something I find strange in your policy: to me,
merges from trunk to branches should be rare if not nonexistent. And you
are deciding to banish merges the other way around.


Julien


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Segher Boessenkool
On Sun, Dec 29, 2019 at 12:46:50PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:
> At worst, no commit is testable in the 
> branch except the last, and git will say that the bug was introduced in 
> the branch, which is not worse than what you'd get without a merge commit.

We normally require every commit to be tested, so it is a lot worse, yes.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Segher Boessenkool
Hi!

On Sun, Dec 29, 2019 at 12:42:07PM +0100, Julien '_FrnchFrgg_' RIVAUD wrote:
> I'm not arguing that you should go that route, it seems a bit extreme to 
> me. But outright refusing merges on the basis they are painful is (if 
> you can accept the strong word) ludicrous.

They are painful for everyone working with the history later.  Something
that we do in GCC more often than in most other projects.

> >Merging is appropriate if there is parallel development of (mostly) 
> >independent things.
> 
> Which is almost always the case.

Which is almost *never* the case for GCC, in my opinion.  Almost all
commits are smallish improvements / bugfixes.  And most bigger things are
not independent enough -- we require the resulting thing to be (regression)
tested before pushing it upstream, and that is because often that *does*
find problems!

> >Features aren't that, usually: they can be rebased easily, and they should 
> >be posted
> >for review anyway.
> How often are successive features checked into GCC dependent on each
> other?

Almost always, one way or the other.  It's not just the GCC code itself
you have to consider here, there things are easily independent enough,
but looking at the code generated by GCC often shows unexpected
interactions.

> The fact that they can be rebased either way and easily is 
> almost a testimony of that. And the fact that they need review has 
> nothing to do with anything.

Every patch should normally be posted to the mailing lists for review.
Such patches should be against trunk.  And *that* patch will be approved,
so *that* is the one you will commit and push upstream eventually.

Those are the procedures we currently have, and it is necessary to keep
the tree even somewhat working most of the time.  Too often the tree is
broken for days on end :-(

> >It is very easy to use merges more often than is useful, and it hurts.
> 
> And it is very easy to use SVN-like workflows, and it hurts far more. 
> SVN, due to its centrality and inherent impossibility to encode logical 
> relationships between changes (as opposed to time-based evolution), 
> slowly impaired most developers' open-mindedness about what can be done in
> a worthwhile VCS. Moving to git is an opportunity to at last free 
> yourselves, not continue that narrow treading on SVN paths.

We cannot waste a year on a social experiment.  We can slowly and carefully
adopt new procedures, certainly.  But anything drastic isn't advisable imo.

Also, many GCC developers aren't familiar with Git at all.  It takes time
to learn it, and to learn new ways of working.  Small steps are needed.

> SVN was like an almanac listing successive events without any analysis. 
> That's not History (as in the field of study). Git at least can let you 
> express and use to your common benefit logical links between 
> modifications. Don't miss that train.

I think you seriously overestimate how much information content is in a
merge (esp. as applied to the GCC context).  Let's start with using good
commit messages (or actual commit messages *at all*), that has a much
better pain/gain ratio.

> Merges are not scary when the tools are good. Even the logs are totally 
> usable with a lot of merges, with suitable tools. The tool has to adapt, 
> not you.

Merges aren't scary.  Merges are inconvenient.  And yes, there is no way
that all of us will change on a non-geological time scale.


Segher


Re: The far past of GCC

2019-12-29 Thread Richard Kenner
> I believe RCS was initially used circa 1992 on the FSF machine which
> held the canonical GCC sources. 

Your memory agrees with mine.


Re: The far past of GCC

2019-12-29 Thread Eric S. Raymond
Jeff Law :
> I believe RCS was initially used circa 1992 on the FSF machine which
> held the canonical GCC sources.

That year sounds right - it's when I wrote the original vc.el for Emacs
and a lot of Emacs users who hadn't been using version control started to.

Doesn't give us a Subversion revision, though.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)




Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Eric S. Raymond
Richard Earnshaw (lists) :
> I've just commented that one out for now; if anybody knows the correct
> addresses, please let me know.  Also, there's one joint list that I've
> not attempted to fix at this time.

> #  "28488": "Jim Kingdon ",

That's Jim Kingdon the former CVS dev - I think he was involved in Subversion 
early too.

He's king...@cyclic.com or king...@panix.com, according to my back
mail, but since I think I remember that he did work at RedHat in the
late '90s king...@redhat.com would be a good bet too.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)




Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Segher Boessenkool
On Sun, Dec 29, 2019 at 12:02:51PM +0100, Richard Biener wrote:
> For bisecting trunk a merge would be a single commit, right? So I could see 
> value in preserving a patch series where individual steps might introduce 
> temporary issues as a branch merge (after rebasing) so the series is visible 
> but not when bisecting (by default). It would also make the series 
> relatedness obvious and avoids splitting it with a commit race (if that is 
> possible with git). 

"git bisect" actually goes all the way down the rabbit hole, it tries to
find the first bad commit in the range you marked as starting as "good",
ending as "bad".

It is pretty confusing to do if there are many merges, especially if many
commits end up not building at all.  But you can always "git bisect skip"
stuff (it just eats time, and it hampers automated bisecting).
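
For automated bisecting, "git bisect run" accepts a script whose exit status
decides the verdict for each commit: 0 means good, 125 means "cannot be
tested, skip", and other codes from 1 to 127 mean bad. A minimal sketch of
such a script follows; the build and test commands are placeholders to be
replaced, and bisect_step.py is a hypothetical file name.

```python
import subprocess

SKIP = 125  # exit status that "git bisect run" interprets as "skip this commit"


def verdict(build_rc, test_rc):
    """Map build and test return codes to a git-bisect-run exit status."""
    if build_rc != 0:
        return SKIP          # commit does not build: neither good nor bad
    return 0 if test_rc == 0 else 1


def bisect_step(build_cmd, test_cmd):
    """Build the current checkout, run the test only if the build worked."""
    build_rc = subprocess.run(build_cmd).returncode
    test_rc = subprocess.run(test_cmd).returncode if build_rc == 0 else 1
    return verdict(build_rc, test_rc)

# In bisect_step.py one would end with something like:
#     import sys
#     sys.exit(bisect_step(["make"], ["./my-regression-test.sh"]))
# and drive it with:
#     git bisect start BAD GOOD
#     git bisect run python3 bisect_step.py
```

This only covers commits that fail to build. A commit that builds but fails
for an unrelated reason is still reported as bad, so that case needs a more
specific test command or manual inspection.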

The really nasty cases are when the code does build, but fails for
unrelated reasons.

We require every commit to be individually tested, and if we *do* allow
merges, that should still be done imo.  Which again makes merging less
useful: if you are going to rebase your branch anyway (to fix simple
stuff), why not rebase it onto trunk!

> IMHO exact workflow for merging a patch series as opposed to a single patch 
> should be documented. 

Yes.  It isn't actually documented in so many words for what we do now,
either, but it would be good to have.


Segher


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Julien '_FrnchFrgg_' RIVAUD

On 29/12/2019 at 12:02, Richard Biener wrote:

> On December 29, 2019 11:41:00 AM GMT+01:00, Segher Boessenkool wrote:
>> On Sun, Dec 29, 2019 at 02:40:45AM +0100, Julien FrnchFrgg Rivaud wrote:
>>>> Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
>>>> future merges, where we can help that.  It disagrees with our documented
>>>> "submitting patches" protocol.
>>>
>>> I don't see how that can be correct. Linux is heavily "submitting
>>> patches" based, with stringent reviews on LKML, yet heavily uses merges.
>>
>> Linux has most development done in separate trees, one for each maintainer.
>> That is not how GCC works.
>>
>> I was talking about https://gcc.gnu.org/contribute.html , see heading
>> "submitting patches" :-)
>>
>>>> Nothing should ever be flattened to a single commit.  But before patches
>>>> hit trunk, the patch series can be made nicer than it was at the start
>>>> of its development.
>>>
>>> I quite agree with that, and it resonates with my TL;DR chunk of text above.
>>
>> Yup.  Rebasing is superior to merging in many ways.  Merging is appropriate
>> if there is parallel development of (mostly) independent things.  Features
>> aren't that, usually: they can be rebased easily, and they should be posted
>> for review anyway.
>>
>> It is very easy to use merges more often than is useful, and it hurts.
>
> For bisecting trunk a merge would be a single commit, right?

Not exactly. It will if the bug was not introduced by the merge, but if
it was, then "git bisect" will start looking at individual commits in the
branch, which is IMHO very good. It is far easier to have a bug pinned
to a single change (or say 5-6 commits, if all were not buildable or
testable), than a whole branch. At worst, no commit is testable in the
branch except the last, and git will say that the bug was introduced in
the branch, which is not worse than what you'd get without a merge commit.


Julien



Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Julien '_FrnchFrgg_' RIVAUD

On 29/12/2019 at 11:41, Segher Boessenkool wrote:

> On Sun, Dec 29, 2019 at 02:40:45AM +0100, Julien FrnchFrgg Rivaud wrote:
>>> Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
>>> future merges, where we can help that.  It disagrees with our documented
>>> "submitting patches" protocol.
>>
>> I don't see how that can be correct. Linux is heavily "submitting
>> patches" based, with stringent reviews on LKML, yet heavily uses merges.
>
> Linux has most development done in separate trees, one for each maintainer.
> That is not how GCC works.


I mentioned the git development for a reason. They use merges for 
*everything*, including patchsets by people who never contributed before 
and might never contribute afterwards. The very *concept* of a DVCS is 
that each developer has a separate tree, not each maintainer.


I'm not arguing that you should go that route, it seems a bit extreme to 
me. But outright refusing merges on the basis they are painful is (if 
you can accept the strong word) ludicrous.



>>> Nothing should ever be flattened to a single commit. But before patches
>>> hit trunk, the patch series can be made nicer than it was at the start
>>> of its development.
>>
>> I quite agree with that, and it resonates with my TL;DR chunk of text above.
>
> Yup.  Rebasing is superior to merging in many ways.


That's not what I agreed with. I agreed with « the patch series can be 
made nicer », which I took to be the contrary of « append patches at the 
end ». Rebasing is *one* of the ways to do that, especially interactive 
rebasing to shuffle patches around, check that each step compiles and 
passes the full test suite (updating it if needed and correct), reword 
messages, and think carefully about the best progression. But I
never opposed rebasing to merging. In particular, I clearly wrote that 
*even if you rebased*, there are very strong arguments out there about 
refusing fast-forward merges, that is *always* generate a real merge 
commit, with a cover letter message roughly corresponding to the mail 
people send on the ML to convince people their patch series are worth 
including in GCC.
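
The "always generate a real merge commit" policy described above corresponds
to git's --no-ff merge option. A small sketch, assuming the series lives on a
local branch and the cover letter is available as a string; the function names
are illustrative, not part of any GCC tooling.

```python
import subprocess


def no_ff_merge_command(branch, cover_letter):
    """Build a git command that records a merge commit even when a
    fast-forward would be possible, using the cover letter as message."""
    return ["git", "merge", "--no-ff", "-m", cover_letter, branch]


def merge_series(branch, cover_letter):
    """Merge an (already rebased) series into the current branch."""
    subprocess.run(no_ff_merge_command(branch, cover_letter), check=True)
```

After such merges, "git log --first-parent" on the target branch lists one
entry per merged series (plus any direct commits), while a plain "git log"
still shows every individual commit of each series.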


That leaves individual commit messages to explain the local rationale 
behind each discrete change (not the how, as it is readily apparent from 
the code, unless the code is very clever and then an in-code comment is 
warranted).




> Merging is appropriate if there is parallel development of (mostly)
> independent things.

Which is almost always the case.

> Features aren't that, usually: they can be rebased easily, and they should
> be posted for review anyway.

How often are successive features checked into GCC dependent on each
other? The fact that they can be rebased either way and easily is
almost a testimony of that. And the fact that they need review has
nothing to do with anything.

> It is very easy to use merges more often than is useful, and it hurts.


And it is very easy to use SVN-like workflows, and it hurts far more. 
SVN, due to its centrality and inherent impossibility to encode logical 
relationships between changes (as opposed to time-based evolution), 
slowly impaired most developers' open-mindedness about what can be done in
a worthwhile VCS. Moving to git is an opportunity to at last free 
yourselves, not continue that narrow treading on SVN paths.


SVN was like an almanac listing successive events without any analysis. 
That's not History (as in the field of study). Git at least can let you 
express and use to your common benefit logical links between 
modifications. Don't miss that train.


Merges are not scary when the tools are good. Even the logs are totally 
usable with a lot of merges, with suitable tools. The tool has to adapt, 
not you.


Julien


Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Richard Biener
On December 29, 2019 11:41:00 AM GMT+01:00, Segher Boessenkool wrote:
>On Sun, Dec 29, 2019 at 02:40:45AM +0100, Julien FrnchFrgg Rivaud wrote:
>> >Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
>> >future merges, where we can help that.  It disagrees with our documented
>> >"submitting patches" protocol.
>>
>> I don't see how that can be correct. Linux is heavily "submitting
>> patches" based, with stringent reviews on LKML, yet heavily uses merges.
>
>Linux has most development done in separate trees, one for each maintainer.
>That is not how GCC works.
>
>I was talking about https://gcc.gnu.org/contribute.html , see heading
>"submitting patches" :-)
>
>> >Nothing should ever be flattened to a single commit.  But before patches
>> >hit trunk, the patch series can be made nicer than it was at the start
>> >of its development.
>>
>> I quite agree with that, and it resonates with my TL;DR chunk of text above.
>
>Yup.  Rebasing is superior to merging in many ways.  Merging is appropriate
>if there is parallel development of (mostly) independent things.  Features
>aren't that, usually: they can be rebased easily, and they should be posted
>for review anyway.
>
>It is very easy to use merges more often than is useful, and it hurts.

For bisecting trunk a merge would be a single commit, right? So I could see
value in preserving a patch series whose individual steps might introduce
temporary issues as a branch merge (after rebasing), so that the series is
visible in the history but skipped when bisecting (by default). It would also
make the relatedness of the series obvious and avoid splitting it with a
commit race (if that is possible with git).
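
To make the bisecting point concrete, here is a minimal sketch in Python. It
is a toy model of a commit graph, not Git itself and not something posted in
this thread; the commit names (A, B, s1..s3, M, C) and both helper functions
are made up for illustration. It assumes a series s1..s3 rebased onto trunk
commit B and then brought in by a merge commit M whose first parent is B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str
    parents: tuple = ()  # parents[0] is the first parent (the branch merged into)


def first_parent_chain(tip):
    """Follow only first parents: the trunk-only view of history."""
    chain = []
    node = tip
    while node is not None:
        chain.append(node.name)
        node = node.parents[0] if node.parents else None
    return chain


def all_reachable(tip):
    """Names of every commit reachable from tip, merged series included."""
    seen = set()
    stack = [tip]
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue
        seen.add(node.name)
        stack.extend(node.parents)
    return seen


# Hypothetical history: trunk is A <- B <- M <- C, series is B <- s1 <- s2 <- s3.
a = Commit("A")
b = Commit("B", (a,))
s1 = Commit("s1", (b,))
s2 = Commit("s2", (s1,))
s3 = Commit("s3", (s2,))
m = Commit("M", (b, s3))  # merge commit: first parent is trunk, second is the series
c = Commit("C", (m,))

mainline = first_parent_chain(c)
everything = all_reachable(c)

print(mainline)                             # trunk-only view
print(sorted(everything - set(mainline)))   # commits visible only via the merge
```

On the first-parent chain the whole series shows up as the single step M,
while s1..s3 stay reachable for anyone who wants to look inside the merge.
In Git terms this corresponds roughly to merging with "git merge --no-ff" and
reading history with "git log --first-parent"; newer Git versions also accept
--first-parent for "git bisect start", though whether that is the default
behaviour is exactly the kind of thing the documented workflow should spell out.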

IMHO the exact workflow for merging a patch series, as opposed to a single
patch, should be documented.

Richard. 

>
>Segher



Re: Proposal for the transition timetable for the move to GIT

2019-12-29 Thread Segher Boessenkool
On Sun, Dec 29, 2019 at 02:40:45AM +0100, Julien FrnchFrgg Rivaud wrote:
> >Oh, I'm not talking about historical merges.  I'm saying we shouldn't do
> >future merges, where we can help that.  It disagrees with our documented
> >"submitting patches" protocol.
> 
> I don't see how that can be correct. Linux is heavily "submitting 
> patches" based, with stringent reviews on LKML, yet heavily uses merges. 

Linux has most development done in separate trees, one for each maintainer.
That is not how GCC works.

I was talking about https://gcc.gnu.org/contribute.html , see heading
"submitting patches" :-)

> >Nothing should ever be flattened to a single commit.  But before patches
> >hit trunk, the patch series can be made nicer than it was at the start
> >of its development.
> 
> I quite agree with that, and it resonates with my TL;DR chunk of text above.

Yup.  Rebasing is superior to merging in many ways.  Merging is appropriate
if there is parallel development of (mostly) independent things.  Features
aren't that, usually: they can be rebased easily, and they should be posted
for review anyway.

It is very easy to use merges more often than is useful, and it hurts.


Segher