[Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-14 Thread Maxim Kuvyrkov
This patch adds scripts to contrib/ to migrate the full history of GCC's 
subversion repository to git.  My hope is that these scripts will finally 
allow the GCC project to migrate to Git.

The result of the conversion is at 
https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with "@rev" 
suffixes represent branch points.  The conversion is still running, so not all 
branches may appear right away.

The scripts are not specific to GCC repo and are usable for other projects.  In 
particular, they should be able to convert downstream GCC svn repos.

The scripts convert svn history branch by branch.  They rely on git-svn to 
convert individual branches.  Git-svn is a good tool for converting individual 
branches.  On the entire GCC repo, however, it is either very slow or goes 
into an infinite loop.

There are 3 scripts:

- svn-git-repo.sh: top level script to convert entire repo or a part of it 
(e.g., branches/),
- svn-list-branches.sh: helper script to output branches and their parents in 
bottom-up order,
- svn-git-branch.sh: helper script to convert a single branch.

Whenever possible, svn-git-branch.sh uses existing git branches as caches.
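
As a rough illustration of the ordering that svn-list-branches.sh is described 
as producing, here is a minimal Python sketch.  It is not the code from the 
patch.  It assumes "bottom-up" means that a parent branch is listed before the 
branches created from it, so that already-converted branches can serve as 
caches.  The function name conversion_order and the branch layout are 
hypothetical.

```python
def conversion_order(parents):
    """Return branches ordered so that every parent precedes its children.

    `parents` maps a branch name to its parent branch, or to None for a root.
    """
    order = []
    seen = set()

    def visit(branch):
        if branch in seen:
            return
        seen.add(branch)
        parent = parents.get(branch)
        if parent is not None:
            visit(parent)          # convert the parent first
        order.append(branch)

    for branch in sorted(parents):
        visit(branch)
    return order


# Hypothetical branch layout, for illustration only.
example = {
    "trunk": None,
    "branches/gcc-6-branch": "trunk",
    "branches/ibm/gcc-6-branch": "branches/gcc-6-branch",
    "branches/gccgo": "trunk",
}
print(conversion_order(example))
```

With an order like this, each branch can be converted starting from the 
already-converted history of its parent instead of from scratch.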

What are your questions and comments?

The attached is a cleaned-up version, which hasn't been fully tested yet; typos 
and other silly mistakes are likely.  OK to commit after testing?

--
Maxim Kuvyrkov
www.linaro.org




0001-Contrib-SVN-Git-conversion-scripts.patch
Description: Binary data


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-05 Thread Maxim Kuvyrkov


> On Aug 2, 2019, at 11:41 AM, Maxim Kuvyrkov  wrote:
> 
>> On Aug 1, 2019, at 11:43 PM, Jason Merrill  wrote:
>> 
...
>>> Unfortunately, current mirror does not and could not account for rewrites 
>>> of SVN commit log messages.  For trunk the histories diverge in 2008 due 
>>> to commit message change of r138154.  This is not a single occurrence; I've 
>>> compared histories only of trunk and gcc-6-branch, and both had commit 
>>> message change (for gcc-6-branch see r259978).
>>> 
>>> It's up to the community to weigh pros and cons of re-using existing GCC 
>>> mirror as conversion base vs regenerating history from scratch:
>>> 
>>> Pros of using GCC mirror:
>>> + No need to rebase public git-only branches
>>> + No need to rebase private branches
>>> + No need to rebase current clones, checkouts, work-in-progress trees
>>> 
>>> Cons of using GCC mirror:
>>> - Poor author / committer IDs (this breaks patch statistics software)
>>> - Several commit messages will not be the current "fixed" version
>>> 
>>> Thoughts?
>> 
>> I'm still inclined to stick with the mirror.  I would expect patch
>> statistics software to be able to be taught about multiple addresses
>> for the same person.
> 
> Patch tracking software breaks on emails like 
>  , where 
> 38bc75d-0d04-0410-961f-82ee72b054a4 is not a reasonable domain name.
> 
> For completeness, I'll generate and upload a repo based on current mirror 
> with all branches and tags converted.

Yeah, this didn't work as well as I hoped.  Current gcc git mirror has wrong 
history for branches that followed this scenario:
1. create $branch from $base at revision N
2. commit WORK on $branch
3. delete $branch
4. create $branch from $base at revision N+M
5. rebase WORK on current $branch

Current mirror connects histories of two versions of $branch, and we get wrong 
history.  In step (4) instead of plain history of $base we get a commit merging 
histories of $branch just before deletion and $base at revision N+M.

There are many branches like this, e.g., branches/gccgo.
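
To make the delete-and-recreate scenario concrete, here is a minimal Python 
sketch that splits the SVN events for a branch into separate incarnations, so 
that a re-created branch starts a fresh history rather than continuing the 
deleted one.  This only illustrates the idea; it is not how the mirror or the 
conversion scripts are implemented.  The event tuples and revision numbers are 
hypothetical.

```python
def branch_incarnations(events):
    """Group SVN branch events into separate incarnations per branch.

    `events` is a list of (revision, action, branch) tuples sorted by
    revision, where action is "create", "commit", or "delete".
    Returns {branch: [[revisions of incarnation 1], [incarnation 2], ...]}.
    """
    incarnations = {}
    live = {}  # branch -> revision list of the currently existing incarnation
    for revision, action, branch in events:
        if action == "create":
            live[branch] = [revision]
            incarnations.setdefault(branch, []).append(live[branch])
        elif action == "commit":
            live[branch].append(revision)   # assumes the branch currently exists
        elif action == "delete":
            del live[branch]                # next "create" starts a new history
    return incarnations


# Hypothetical revisions following steps 1-5 above.
events = [
    (100, "create", "gccgo"),
    (105, "commit", "gccgo"),
    (110, "delete", "gccgo"),
    (120, "create", "gccgo"),
    (125, "commit", "gccgo"),
]
print(branch_incarnations(events))
```

In this model the second incarnation has no link to revisions 100 and 105, 
which is the "plain history of $base" behaviour described above.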

--
Maxim Kuvyrkov
www.linaro.org




Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-05 Thread Martin Liška
On 8/3/19 12:31 AM, Jason Merrill wrote:
> On Fri, Aug 2, 2019 at 7:35 AM Martin Liška  wrote:
>>
>> On 8/2/19 1:06 PM, Richard Biener wrote:
>>> On Fri, Aug 2, 2019 at 1:01 PM Martin Liška  wrote:

 On 8/2/19 12:54 PM, Maxim Kuvyrkov wrote:
>> On Aug 2, 2019, at 1:26 PM, Martin Liška  wrote:
>>
>> On 8/2/19 10:41 AM, Maxim Kuvyrkov wrote:
>>> In the end, I don't care much to which version of the repo we switch, 
>>> as long as we switch.
>>
>> Hi Maxim.
>>
>> I really appreciate that you've been working on that. Same as you I 
>> would like to see
>> any change that will lead to a git repository.
>>
>> I have couple of questions about the upcoming Cauldron:
>>
>> - Are you planning to attend?
>
> Unfortunately, I won't attend this time.

 I see.

>
>> - Would it be possible to prepare a voting during e.g. Steering 
>> Committee where
>>  we'll vote about transition options?
>> - Would it make sense to do an online questionnaire in advance in order
>>  to guess what's prevailing opinion?
>>
>> If you are interested, I can help you?
>
> Let's organize an online survey now.  While most active GCC developers 
> will attend Cauldron, many others will not, so we shouldn't rely on 
> Cauldron to make any final decisions.

 Sure, online is the best option as all active community members can vote.

>
> Martin, would you please organize the survey?

 Yes, but I haven't followed the discussion in recent weeks. Is the only 
 question
 whether we want the current GIT mirror or your rebased git repository?
 Is Eric Raymond's transition still in play or not?
>>>
>>> 1) Stay with SVN
>>> 2) Switch to the existing GIT mirror
>>> 3) Wait for ERS to complete his conversion to GIT
>>> 4) Use the existing new conversion to GIT fixing authors and commit messages
>>> 5) I don't care
>>> 6) I don't care as long as we switch to GIT
>>>
 Are there any other sub-question regarding commit message format, git 
 hooks, etc.
 that will deserve a place in the questionnaire?
>>>
>>> No, please do not make it unnecessarily complicated.  Maybe the questionnaire
>>> can include a free-form text field for more comments.
>>>
>>> Btw, I do not believe we should do this kind of voting.  Who's eligible to 
>>> vote?
>>> Is the vote anonymous?  What happens when the majority (what is the 
>>> majority?)
>>> votes for option N?
>>>
>>> IMHO voting is bike-shedding.
>>>
>>> Those who do the work decide.  _They_ may ask questions _and_ decide whether
>>> to listen to the answer.
>>>
>>> Richard.
>>>
 Thank,
 Martin

>
> Regards,
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>

>>
>> So Richi is suggesting to finish all necessary for transition before we'll 
>> vote.
>> That should include bugzilla reporting script and maybe other git hooks?
>> Do we have a checklist of these? Jason?
> 
> As far as I can see, the SVN hooks only send email to the *cvs and
> gcc-bugzilla lists, that shouldn't be hard to mimic.
> 
> I think we also want to decide on policies for creating branches/tags,
> deleting refs, or pushing non-fast-forward updates.  In the current
> mirror you can delete branches in your own subdirectory, but not other
> branches.
> 
> Jason
> 

Hello.

Based on the IRC discussion with Jakub, a key element of the transition is 
missing.  Jakub requests that monotonically increasing revision numbers (aka 
rXXX) be assigned to future git revisions.  These would be linked from 
bugzilla and http://gcc.gnu.org/rN

I don't like the suggested requirement and I would prefer to use git hashes for 
both bugzilla links and general references to revisions.  That's what all 
projects using git do.

As it's still unresolved, I'm not planning to organize any GIT transition 
survey.

Martin



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-05 Thread Mike Stump
On Aug 2, 2019, at 4:06 AM, Richard Biener  wrote:
> 
> IMHO voting is bike-shedding.
> 
> Those who do the work decide.  _They_ may ask questions _and_ decide whether
> to listen to the answer.

I'd tend to agree.  I also think the recent conversion work is a fine solution, 
and that my preference for that might influence my agreement here.

I don't think we should maintain a requirement that we have monotonic numbers 
going forward.  That's just not the git way.  I've been known to do git log, 
and then manually pick start and end, and then bisect based not upon date, but 
upon a large hash list.  The concern that some dates have a ton of commits and 
other dates have few doesn't come into play, as each hash is 1 unit of work, so 
a list of 10235 hashes can be trivially split into 2, 3, 4 or 1042, if you have 
the machines for it.
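
The splitting described here can be sketched in a few lines of Python.  This 
is only an illustration under stated assumptions: the list stands in for the 
output of git log between a chosen start and end, the "commitN" strings are 
placeholders for real hashes, and split_points is a hypothetical helper name.

```python
def split_points(hashes, parts):
    """Pick the commits to test so that `hashes` is cut into `parts` ranges.

    Returns parts - 1 entries of `hashes`, evenly spaced by position in the
    list, regardless of commit dates.
    """
    if parts < 2 or len(hashes) < parts:
        raise ValueError("need at least 2 parts and at least `parts` hashes")
    return [hashes[len(hashes) * i // parts] for i in range(1, parts)]


# Placeholder identifiers standing in for 10235 real commit hashes.
hashes = [f"commit{i}" for i in range(10235)]
print(split_points(hashes, 2))
print(split_points(hashes, 4))
```

Each returned commit can be built and tested on a separate machine; the 
results narrow the search to one of the ranges, and the step is repeated on 
that range.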

Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-06 Thread Maxim Kuvyrkov
> On Aug 5, 2019, at 11:24 AM, Maxim Kuvyrkov  wrote:
> 
> 
>> On Aug 2, 2019, at 11:41 AM, Maxim Kuvyrkov  
>> wrote:
>> 
>>> On Aug 1, 2019, at 11:43 PM, Jason Merrill  wrote:
>>> 
> ...
>>>> Unfortunately, current mirror does not and could not account for rewrites 
>>>> of SVN commit log messages.  For trunk the histories diverge in 2008 
>>>> due to commit message change of r138154.  This is not a single occurrence; 
>>>> I've compared histories only of trunk and gcc-6-branch, and both had 
>>>> commit message change (for gcc-6-branch see r259978).
>>>> 
>>>> It's up to the community to weigh pros and cons of re-using existing 
>>>> GCC mirror as conversion base vs regenerating history from scratch:
>>>> 
>>>> Pros of using GCC mirror:
>>>> + No need to rebase public git-only branches
>>>> + No need to rebase private branches
>>>> + No need to rebase current clones, checkouts, work-in-progress trees
>>>> 
>>>> Cons of using GCC mirror:
>>>> - Poor author / committer IDs (this breaks patch statistics software)
>>>> - Several commit messages will not be the current "fixed" version
>>>> 
>>>> Thoughts?
>>> 
>>> I'm still inclined to stick with the mirror.  I would expect patch
>>> statistics software to be able to be taught about multiple addresses
>>> for the same person.
>> 
>> Patch tracking software breaks on emails like 
>>  , where 
>> 38bc75d-0d04-0410-961f-82ee72b054a4 is not a reasonable domain name.
>> 
>> For completeness, I'll generate and upload a repo based on current mirror 
>> with all branches and tags converted.
> 
> Yeah, this didn't work as well as I hoped.  Current gcc git mirror has 
> wrong history for branches that followed this scenario:
> 1. create $branch from $base at revision N
> 2. commit WORK on $branch
> 3. delete $branch
> 4. create $branch from $base at revision N+M
> 5. rebase WORK on current $branch
> 
> Current mirror connects histories of two versions of $branch, and we get 
> wrong history.  In step (4) instead of plain history of $base we get a commit 
> merging histories of $branch just before deletion and $base at revision N+M.
> 
> There are many branches like this, e.g., branches/gccgo.

I've set up uploads and updates of fully converted GCC history (all branches and 
all tags) in 3 flavors.  These will be updated roughly hourly.

1. https://git-us.linaro.org/people/maxim-kuvyrkov/gcc-pretty.git/
This is a fresh conversion from scratch with "pretty" authors.

2. https://git.linaro.org/people/maxim-kuvyrkov/gcc-mirror.git/
This is a close match to current GCC mirror.  Trunk and gcc-*-branch branches 
are imported from the mirror, and the rest is reconstructed starting from the 
imported branches.

3. https://git-us.linaro.org/people/maxim-kuvyrkov/gcc-raw.git/
This is a fresh conversion from scratch with no author rewrites.

--
Maxim Kuvyrkov
www.linaro.org



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-22 Thread Maxim Kuvyrkov
> On Aug 6, 2019, at 12:32 PM, Maxim Kuvyrkov  wrote:
> 
...
> I've set up uploads and updates of fully converted GCC history (all branches 
> and all tags) in 3 flavors.  These will be updated roughly hourly.
> 
> 1. https://git-us.linaro.org/people/maxim-kuvyrkov/gcc-pretty.git/
> This is a fresh conversion from scratch with "pretty" authors.
> 
> 2. https://git.linaro.org/people/maxim-kuvyrkov/gcc-mirror.git/
> This is a close match to current GCC mirror.  Trunk and gcc-*-branch branches 
> are imported from the mirror, and the rest is reconstructed starting from the 
> imported branches.
> 
> 3. https://git-us.linaro.org/people/maxim-kuvyrkov/gcc-raw.git/
> This is a fresh conversion from scratch with no author rewrites.
> 

The conversion is now fully complete.  The above 3 repositories all have 
complete and accurate [1] history of all branches and tags.  SVN's /branches/* 
are converted to Git's refs/heads/*, and SVN's /tags/* are converted to Git's 
annotated tags refs/tags/*.  SVN's /trunk is Git's refs/heads/master.
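
The path-to-ref mapping stated above can be written down as a small Python 
sketch.  It only encodes the three rules from this paragraph; the function 
name svn_path_to_git_ref and the example tag name are hypothetical, and the 
sketch does not create annotated tag objects, it only computes ref names.

```python
def svn_path_to_git_ref(path):
    """Map an SVN repository path to the Git ref name described above."""
    path = path.strip("/")
    if path == "trunk":
        return "refs/heads/master"
    for svn_prefix, ref_prefix in (("branches/", "refs/heads/"),
                                   ("tags/", "refs/tags/")):
        if path.startswith(svn_prefix) and len(path) > len(svn_prefix):
            return ref_prefix + path[len(svn_prefix):]
    raise ValueError(f"not a trunk, branch, or tag path: {path!r}")


print(svn_path_to_git_ref("/trunk"))
print(svn_path_to_git_ref("/branches/gcc-6-branch"))
print(svn_path_to_git_ref("/tags/example-tag"))   # hypothetical tag name
```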

I propose that we switch to gcc-pretty.git repository, because it has accurate 
Committer and Author fields.  Developer names and email addresses are extracted 
from source history, and accurately track people changing companies, email 
addresses, and names.  IMO, it is more important for people to get credit for 
open-source contributions on github, ohloh, etc., than the inconvenience of 
rebasing local git branches.  It's also an important marketing tool for 
open-source companies to show stats of their corporate email addresses 
appearing in git commit logs.

I also suggest that we don't wait for Cauldron to make a plan on when and how 
to switch.  I believe the big decisions should be made on the mailing list, and 
at Cauldron we can discuss finer points.  [Also, unfortunately, I won't attend 
this year.]


[1] Gcc-mirror.git has artifacts in several commit messages due to edits of SVN 
commit messages after the fact.

Regards,

--
Maxim Kuvyrkov
www.linaro.org





Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-23 Thread Joseph Myers
On Fri, 23 Aug 2019, Maxim Kuvyrkov wrote:

> I propose that we switch to gcc-pretty.git repository, because it has 
> accurate Committer and Author fields.  Developer names and email 
> addresses are extracted from source history, and accurately track people 
> changing companies, email addresses, and names.  IMO, it is more 
> important for people to get credit for open-source contributions on 
> github, ohloh, etc., than the inconvenience of rebasing local git 
> branches.  It's also an important marketing tool for open-source 
> companies to show stats of their corporate email addresses appearing in 
> git commit logs.

I concur that accurately crediting contributors is important and means we 
should not start from the existing mirror (though we should keep its 
branches available, so references to them and to their commit hashes 
continue to work - either keeping the existing repository available under 
a different name, or renaming the branches to put them in the new 
repository - which should not enlarge the repository much because blob and 
tree objects will generally be shared between the two versions of the 
history).

I note that the Go conversion of reposurgeon is now just five test 
failures away from passing the whole reposurgeon testsuite (at which point 
it should be ready for an attempt on the GCC conversion).  Given the good 
progress being made there at present, I thus suggest we plan to compare 
this conversion with one from reposurgeon (paying special attention to the 
messiest parts of the repository, such as artifacts from cvs2svn 
attempting to locate branchpoints), unless those last five goreposurgeon 
test failures prove unexpectedly time-consuming to get resolved.

There are of course plenty of things to do relating to a git conversion 
that do not depend on the particular choice of a converted repository - 
such as writing git hooks and git versions of the maintainer-scripts 
scripts that currently work with SVN, or working out a specific choice of 
how to arrange annotated tags to allow "git describe" to give the sort of 
monotonic version number some contributors want.

A reasonable starting point for hooks would be that they closely 
approximate what the current SVN hooks do for commit mails to gcc-cvs and 
for Bugzilla updates, as what the current hooks do is clearly OK at 
present and we shouldn't need to entangle substantive changes to what the 
hooks do with the actual conversion to git; we can always discuss changes 
later.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-23 Thread Joseph Myers
On Tue, 21 May 2019, Segher Boessenkool wrote:

> > I think having author names and email addresses is a basic requirement of 
> > any reasonable repository conversion
> 
> Yes, and they should be the same as they were in the original repository.

That's what the "changelogs" feature in reposurgeon does, when the commit 
that made the change also added a ChangeLog entry.

In the case where the commit didn't add a ChangeLog entry, a name and 
email address from an author map is the best we can practically do (and I 
think it's much better than having something that never was a valid name 
and email address for author or committer at all).  In particular, that 
applies to changes from the gcc2 tree (where ChangeLog wasn't version 
controlled at all until 1998, and after that didn't generally have log 
messages that could be matched up with those of the corresponding commits 
to other files).  Many of the names and addresses in the author map for 
the gcc2 repository *are* taken directly from the ChangeLogs (for some 
commit for each committer) as that was the most practical way of 
identifying who all the committers from that period were.
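
A minimal Python sketch of that two-step lookup follows.  It is not 
reposurgeon's implementation, only an illustration of the rule: take the name 
and address from a ChangeLog header added by the commit when there is one, and 
otherwise fall back to the author map.  It assumes the GNU ChangeLog header 
form "YYYY-MM-DD  Name  <email>" and does not handle older date formats.  The 
function name, username, and addresses are hypothetical.

```python
import re

# GNU ChangeLog entries start with a header line: "YYYY-MM-DD  Name  <email>".
CHANGELOG_HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2})\s+(.+?)\s+<([^<>\s]+)>\s*$")


def resolve_author(added_changelog_lines, svn_username, author_map):
    """Return (name, email) for a commit.

    `added_changelog_lines` are the ChangeLog lines the commit added (possibly
    empty); `author_map` maps an SVN username to a (name, email) pair.
    """
    for line in added_changelog_lines:
        match = CHANGELOG_HEADER.match(line)
        if match:
            return match.group(2), match.group(3)
    return author_map[svn_username]


# Hypothetical data, for illustration only.
author_map = {"jhacker": ("Jane Hacker", "jane@new-employer.example")}
added = ["2019-05-14  Jane Hacker  <jane@old-employer.example>",
         "",
         "\t* contrib/example.sh: New file."]
print(resolve_author(added, "jhacker", author_map))
print(resolve_author([], "jhacker", author_map))
```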

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-24 Thread Segher Boessenkool
On Thu, May 23, 2019 at 10:33:28PM +, Joseph Myers wrote:
> On Tue, 21 May 2019, Segher Boessenkool wrote:
> 
> > > I think having author names and email addresses is a basic requirement of 
> > > any reasonable repository conversion
> > 
> > Yes, and they should be the same as they were in the original repository.
> 
> That's what the "changelogs" feature in reposurgeon does, when the commit 
> that made the change also added a ChangeLog entry.
> 
> In the case where the commit didn't add a ChangeLog entry, a name and 
> email address from an author map is the best we can practically do (and I 

IMO the best we can do is use what we already have: what CVS or SVN used
as the committer identity.  *That* info is *correct* at least.

In many cases we can glean someone's real name from the changelog, sure.
People looking up things can trivially do that, and with much better
accuracy than any script can.  In some other cases you cannot, no matter
how hard you try.

> think it's much better than having something that never was a valid name 
> and email address for author or committer at all).

The fields in Git are just called "Author" and "Commit".  Not "real name"
or "email address" or anything like that.  It is just text.  Git does not
require anything specific about what you put here.


Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-24 Thread Florian Weimer
* Segher Boessenkool:

> On Thu, May 23, 2019 at 10:33:28PM +, Joseph Myers wrote:
>> On Tue, 21 May 2019, Segher Boessenkool wrote:
>> 
>> > > I think having author names and email addresses is a basic requirement 
>> > > of 
>> > > any reasonable repository conversion
>> > 
>> > Yes, and they should be the same as they were in the original repository.
>> 
>> That's what the "changelogs" feature in reposurgeon does, when the commit 
>> that made the change also added a ChangeLog entry.
>> 
>> In the case where the commit didn't add a ChangeLog entry, a name and 
>> email address from an author map is the best we can practically do (and I 
>
> IMO the best we can do is use what we already have: what CVS or SVN used
> as the committer identity.  *That* info is *correct* at least.
>
> In many cases we can glean someone's real name from the changelog, sure.
> People looking up things can trivially do that, and with much better
> accuracy than any script can.  In some other cases you cannot, no matter
> how hard you try.

Looking at the git fsck sources, I think you have to fake an email address:

    if (*p == '<')
        return report(options, obj, FSCK_MSG_MISSING_NAME_BEFORE_EMAIL,
                      "invalid author/committer line - missing space before email");
    p += strcspn(p, "<>\n");
    if (*p == '>')
        return report(options, obj, FSCK_MSG_BAD_NAME,
                      "invalid author/committer line - bad name");
    if (*p != '<')
        return report(options, obj, FSCK_MSG_MISSING_EMAIL,
                      "invalid author/committer line - missing email");
    if (p[-1] != ' ')
        return report(options, obj, FSCK_MSG_MISSING_SPACE_BEFORE_EMAIL,
                      "invalid author/committer line - missing space before email");
    p++;
    p += strcspn(p, "<>\n");
    if (*p != '>')
        return report(options, obj, FSCK_MSG_BAD_EMAIL,
                      "invalid author/committer line - bad email");
    p++;

But something like “fw ” would probably be acceptable.
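
For experimenting with candidate identities, the checks in the excerpt can be 
mirrored in a short Python sketch.  It covers only the lines quoted above, not 
whatever git fsck checks afterwards, and the returned strings are my own 
labels rather than git's messages.  The addresses are hypothetical 
placeholders.

```python
def check_ident(ident):
    """Return None if `ident` passes the quoted checks, else a short reason."""
    def span(start):
        # Like p += strcspn(p, "<>\n"): advance to the next '<', '>' or newline.
        i = start
        while i < len(ident) and ident[i] not in "<>\n":
            i += 1
        return i

    def at(i):
        # Like *p, with "" standing in for the terminating NUL.
        return ident[i] if i < len(ident) else ""

    if at(0) == "<":
        return "missing name before email"
    p = span(0)
    if at(p) == ">":
        return "bad name"
    if at(p) != "<":
        return "missing email"
    if ident[p - 1] != " ":
        return "missing space before email"
    p = span(p + 1)
    if at(p) != ">":
        return "bad email"
    return None


print(check_ident("fw <fw@example.invalid>"))   # hypothetical address
print(check_ident("fw"))
print(check_ident("fw <>"))
```

Note that these checks only look at the angle brackets and the space before 
them; they do not validate the address itself, so even an empty "<>" passes 
this part.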

Thanks,
Florian


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-28 Thread Maxim Kuvyrkov
Hi Everyone,

What can I say, I was too optimistic about how easy it would be to convert 
GCC's svn repo to git one branch at a time.  After 2 more weeks and several 
re-writes of the scripts I now know more about GCC's svn history than I ever 
wanted to.

The prize for most complicated branch history goes to /branches/ibm/* .  It has 
merges, it has re-creation of branches from /trunk and even an accidental 
deletion of all of IBM's branches.

The version of the scripts I'm testing right now seems to deal with all of that.

Also, to avoid controversy -- I'm working on these scripts to satisfy my own 
curiosity, and to give GCC community another option to choose from for the 
final migration.  If by end of Summer 2019 we have 2-3 git repos to choose 
from, then we are likely to push GCC [kicking and screaming] into 2010's by the 
end of this decade.

--
Maxim Kuvyrkov
www.linaro.org



> On May 14, 2019, at 7:11 PM, Maxim Kuvyrkov  wrote:
> 
> This patch adds scripts to contrib/ to migrate the full history of GCC's 
> subversion repository to git.  My hope is that these scripts will finally 
> allow the GCC project to migrate to Git.
> 
> The result of the conversion is at 
> https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with "@rev" 
> suffixes represent branch points.  The conversion is still running, so not 
> all branches may appear right away.
> 
> The scripts are not specific to GCC repo and are usable for other projects.  
> In particular, they should be able to convert downstream GCC svn repos.
> 
> The scripts convert svn history branch by branch.  They rely on git-svn to 
> convert individual branches.  Git-svn is a good tool for converting 
> individual branches.  On the entire GCC repo, however, it is either very 
> slow or goes into an infinite loop.
> 
> There are 3 scripts:
> 
> - svn-git-repo.sh: top level script to convert entire repo or a part of it 
> (e.g., branches/),
> - svn-list-branches.sh: helper script to output branches and their parents in 
> bottom-up order,
> - svn-git-branch.sh: helper script to convert a single branch.
> 
> Whenever possible, svn-git-branch.sh uses existing git branches as caches.
> 
> What are your questions and comments?
> 
> The attached is a cleaned-up version, which hasn't been fully tested yet; typos 
> and other silly mistakes are likely.  OK to commit after testing?
> 
> --
> Maxim Kuvyrkov
> www.linaro.org
> 
> 
> <0001-Contrib-SVN-Git-conversion-scripts.patch>



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-28 Thread Joseph Myers
On Fri, 24 May 2019, Segher Boessenkool wrote:

> IMO the best we can do is use what we already have: what CVS or SVN used
> as the committer identity.  *That* info is *correct* at least.

CVS and SVN have a local identity.  git has a global identity.  I consider 
it simply *incorrect* to put a local identity in a git committer or author 
- I think this is just like any other data format conversion necessarily 
involved in a repository conversion.  There's an argument that (real name 
from /etc/passwd on the repository system at the time of commit) plus 
(username @ hostname of repository system at the time of commit) 
corresponds most accurately to the old local identities, but we don't have 
the contemporaneous /etc/passwd and I don't see doing something like that 
as an improvement on using whatever each person's preferred identity for 
git commits is (or some name and email address we've found, in the absence 
of an expressed preference from a given committer), if there wasn't a 
ChangeLog entry added in that commit.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-29 Thread Segher Boessenkool
On Wed, May 29, 2019 at 12:53:30AM +, Joseph Myers wrote:
> On Fri, 24 May 2019, Segher Boessenkool wrote:
> 
> > IMO the best we can do is use what we already have: what CVS or SVN used
> > as the committer identity.  *That* info is *correct* at least.
> 
> CVS and SVN have a local identity.  git has a global identity.  I consider 

Git has an identity (well, two) _per commit_, and there is no way you can
reconstruct people's preferred name and email address (at any point in time,
for every commit separately) correctly.  IMO it is much better to not even
try.  We already *have* enough info for anyone to trivially look up who wrote
what, and what might be that person's email address at the time.  But
pretending that is more than a guess is just wrong.

> it simply *incorrect* to put a local identity in a git committer or author 

On the contrary, the identity on the CVS or SVN server is 100% correct, and
it is the best we can do as far as I can see.


Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-30 Thread Joseph Myers
On Wed, 29 May 2019, Segher Boessenkool wrote:

> On Wed, May 29, 2019 at 12:53:30AM +, Joseph Myers wrote:
> > On Fri, 24 May 2019, Segher Boessenkool wrote:
> > 
> > > IMO the best we can do is use what we already have: what CVS or SVN used
> > > as the committer identity.  *That* info is *correct* at least.
> > 
> > CVS and SVN have a local identity.  git has a global identity.  I consider 
> 
> Git has an identity (well, two) _per commit_, and there is no way you can
> reconstruct people's preferred name and email address (at any point in time,
> for every commit separately) correctly.  IMO it is much better to not even
> try.  We already *have* enough info for anyone to trivially look up who wrote
> what, and what might be that person's email address at the time.  But
> pretending that is more than a guess is just wrong.

I think not doing a best-effort identification (name+email) is just as 
wrong as converting a CVS repository to a changeset-based system without 
doing a best-effort unification of commits to different files around the 
same time with the same log message into changesets.  Both are the same 
sort of heuristic conversion of data to the form idiomatic for a different 
version control system based around different concepts.  Neither is 
perfect, but the most useful conversion tries to combine CVS commits to 
different files into changesets, and the most useful conversion tries to 
identify authors in the way idiomatic for git using the information we 
have about what person (globally) a given username on a given system 
corresponds to.
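
For readers unfamiliar with the changeset unification mentioned above, here is 
a minimal Python sketch of the heuristic: per-file CVS commits by the same 
committer with the same log message, made close together in time, are combined 
into one changeset.  It is not the algorithm of cvs2svn or any other 
converter.  The 300-second window, the rule that a repeated path starts a new 
changeset, and the sample data are all assumptions made for illustration.

```python
def group_into_changesets(file_commits, window=300):
    """Combine per-file commits into changesets.

    `file_commits` is a list of (timestamp, author, log_message, path) tuples.
    Commits with the same author and log message join the same changeset while
    each is within `window` seconds of the previous one and the path has not
    already been seen in that changeset.
    """
    changesets = []
    latest = {}  # (author, log) -> most recent changeset with that key
    for timestamp, author, log, path in sorted(file_commits):
        key = (author, log)
        current = latest.get(key)
        if (current is None
                or timestamp - current["last"] > window
                or path in current["paths"]):
            current = {"author": author, "log": log, "paths": [], "last": timestamp}
            changesets.append(current)
            latest[key] = current
        current["paths"].append(path)
        current["last"] = timestamp
    return changesets


# Hypothetical per-file commits (timestamps in seconds).
commits = [
    (1000, "alice", "Fix PR 1", "a.c"),
    (1002, "alice", "Fix PR 1", "b.c"),
    (1003, "bob", "Update docs", "c.texi"),
    (5000, "alice", "Fix PR 1", "a.c"),
]
for changeset in group_into_changesets(commits):
    print(changeset["author"], changeset["log"], changeset["paths"])
```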

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-06-02 Thread Segher Boessenkool
On Fri, May 31, 2019 at 12:05:41AM +, Joseph Myers wrote:
> On Wed, 29 May 2019, Segher Boessenkool wrote:
> 
> > On Wed, May 29, 2019 at 12:53:30AM +, Joseph Myers wrote:
> > > On Fri, 24 May 2019, Segher Boessenkool wrote:
> > > 
> > > > IMO the best we can do is use what we already have: what CVS or SVN used
> > > > as the committer identity.  *That* info is *correct* at least.
> > > 
> > > CVS and SVN have a local identity.  git has a global identity.  I 
> > > consider 
> > 
> > Git has an identity (well, two) _per commit_, and there is no way you can
> > reconstruct people's preferred name and email address (at any point in time,
> > for every commit separately) correctly.  IMO it is much better to not even
> > try.  We already *have* enough info for anyone to trivially look up who 
> > wrote
> > what, and what might be that person's email address at the time.  But
> > pretending that is more than a guess is just wrong.
> 
> I think not doing a best-effort identification (name+email) is just as 

And I think guessing is not a "best effort", but just wrong.

> wrong as converting a CVS repository to a changeset-based system without 
> doing a best-effort unification of commits to different files around the 
> same time with the same log message into changesets.  Both are the same 

These are not similar situations at all.  Converting something to an SVN-
like data model is necessary for the resulting repo to work acceptably;
guessing people's names and email addresses is just nice-to-have in the
best case, and misleading in other cases.

> sort of heuristic conversion of data to the form idiomatic for a different 
> version control system based around different concepts.  Neither is 

It's a single short line of text in SVN.  It is a single short line of text
in Git.  Both just show who wrote a patch, or who committed it.

Good luck finding out who was the primary author of every commit, btw.

> perfect, but the most useful conversion tries to combine CVS commits to 
> different files into changesets, and the most useful conversion tries to 
> identify authors in the way idiomatic for git using the information we 
> have about what person (globally) a given username on a given system 
> corresponds to.

We don't have that information.  This information can change over time,
and we never did track people's email addresses properly either.


Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-06-03 Thread Joseph Myers
On Sun, 2 Jun 2019, Segher Boessenkool wrote:

> > > Git has an identity (well, two) _per commit_, and there is no way you can
> > > reconstruct people's preferred name and email address (at any point in 
> > > time,
> > > for every commit separately) correctly.  IMO it is much better to not even
> > > try.  We already *have* enough info for anyone to trivially look up who 
> > > wrote
> > > what, and what might be that person's email address at the time.  But
> > > pretending that is more than a guess is just wrong.
> > 
> > I think not doing a best-effort identification (name+email) is just as 
> 
> And I think guessing is not a "best effort", but just wrong.

It's 100% accurate about the identity of the person who was the committer 
(modulo the one username from the gcc2 period where it was clear who the 
author of the commits by that username was, and so that went in the author 
map, but not clear that was the same as the committer, who did not commit 
patches for any other author).  So it's as accurate as any case where 
someone committing natively in git for someone else failed to use --author 
(and if the CVS/SVN commit included a ChangeLog entry, we have credit 
given from there via the "changelogs" feature).

I think failing to credit (by name and email address) the person implied 
by the commit metadata, in the absence of positive evidence (such as a 
ChangeLog entry) for the change being authored by someone else, is just 
wrong, in the same way it's wrong not to use --author when committing for 
someone else in git.

Where a person used different names over time, there's no generally 
applicable rule for whether they'd prefer the latest version or the 
version used at the time to be used in reference to past commits, and I 
think using the most current version known is most appropriate, in the 
absence of a ChangeLog entry added in the commit, unless they've specified 
a preference for some other rule for which commits get what name.  
Likewise for email addresses.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-06-03 Thread Segher Boessenkool
On Mon, Jun 03, 2019 at 10:33:17PM +, Joseph Myers wrote:
> Where a person used different names over time, there's no generally 
> applicable rule for whether they'd prefer the latest version or the 
> version used at the time to be used in reference to past commits, and I 
> think using the most current version known is most appropriate, in the 
> absence of a ChangeLog entry added in the commit, unless they've specified 
> a preference for some other rule for which commits get what name.  
> Likewise for email addresses.

A common case is people changed employer.  Using someone's current email
address is just wrong.


Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-06-05 Thread Jason Merrill

On 6/3/19 6:33 PM, Joseph Myers wrote:
> On Sun, 2 Jun 2019, Segher Boessenkool wrote:
>
>>>> Git has an identity (well, two) _per commit_, and there is no way you can
>>>> reconstruct people's prefered name and email address (at any point in time,
>>>> for every commit separately) correctly.  IMO it is much better to not even
>>>> try.  We already *have* enough info for anyone to trivially look up who wrote
>>>> what, and what might be that person's email address at the time.  But
>>>> pretending that is more than a guess is just wrong.
>>>
>>> I think not doing a best-effort identification (name+email) is just as
>>
>> And I think guessing is not a "best effort", but just wrong.
>
> It's 100% accurate about the identity of the person who was the committer
> (modulo the one username from the gcc2 period where it was clear who the
> author of the commits by that username was, and so that went in the author
> map, but not clear that was the same as the committer, who did not commit
> patches for any other author).  So it's as accurate as any case where
> someone committing natively in git for someone else failed to use --author
> (and if the CVS/SVN commit included a ChangeLog entry, we have credit
> given from there via the "changelogs" feature).
>
> I think failing to credit (by name and email address) the person implied
> by the commit metadata, in the absence of positive evidence (such as a
> ChangeLog entry) for the change being authored by someone else, is just
> wrong, in the same way it's wrong not to use --author when committing for
> someone else in git.

It's wrong, but it's not importantly wrong.  If we're doing a 
reposurgeon conversion, this adjustment makes sense.  If we're starting 
from the git-svn mirror, it doesn't justify breaking everyone's copies 
by rewriting branches.  And the bird in the hand looks more and more 
appealing as time goes by.



> Where a person used different names over time, there's no generally
> applicable rule for whether they'd prefer the latest version or the
> version used at the time to be used in reference to past commits, and I
> think using the most current version known is most appropriate, in the
> absence of a ChangeLog entry added in the commit, unless they've specified
> a preference for some other rule for which commits get what name.
> Likewise for email addresses.


For email addresses, I think that using @gcc.gnu.org would be the best 
approach for people that have such accounts, rather than an employer 
address from an arbitrary point in time.


Jason


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-06-06 Thread Richard Earnshaw (lists)
On 05/06/2019 19:04, Jason Merrill wrote:
> On 6/3/19 6:33 PM, Joseph Myers wrote:
>> On Sun, 2 Jun 2019, Segher Boessenkool wrote:
>>
>>>>> Git has an identity (well, two) _per commit_, and there is no way
>>>>> you can
>>>>> reconstruct people's prefered name and email address (at any point
>>>>> in time,
>>>>> for every commit separately) correctly.  IMO it is much better to
>>>>> not even
>>>>> try.  We already *have* enough info for anyone to trivially look up
>>>>> who wrote
>>>>> what, and what might be that person's email address at the time.  But
>>>>> pretending that is more than a guess is just wrong.
>>>>
>>>> I think not doing a best-effort identification (name+email) is just as
>>>
>>> And I think guessing is not a "best effort", but just wrong.
>>
>> It's 100% accurate about the identity of the person who was the committer
>> (modulo the one username from the gcc2 period where it was clear who the
>> author of the commits by that username was, and so that went in the
>> author
>> map, but not clear that was the same as the committer, who did not commit
>> patches for any other author).  So it's as accurate as any case where
>> someone committing natively in git for someone else failed to use
>> --author
>> (and if the CVS/SVN commit included a ChangeLog entry, we have credit
>> given from there via the "changelogs" feature).
>>
>> I think failing to credit (by name and email address) the person implied
>> by the commit metadata, in the absence of positive evidence (such as a
>> ChangeLog entry) for the change being authored by someone else, is just
>> wrong, in the same way it's wrong not to use --author when committing for
>> someone else in git.
> 
> It's wrong, but it's not importantly wrong.  If we're doing a
> reposurgeon conversion, this adjustment makes sense.  If we're starting
> from the git-svn mirror, it doesn't justify breaking everyone's copies
> by rewriting branches.  And the bird in the hand looks more and more
> appealing as time goes by.
> 
>> Where a person used different names over time, there's no generally
>> applicable rule for whether they'd prefer the latest version or the
>> version used at the time to be used in reference to past commits, and I
>> think using the most current version known is most appropriate, in the
>> absence of a ChangeLog entry added in the commit, unless they've
>> specified
>> a preference for some other rule for which commits get what name.
>> Likewise for email addresses.
> 
> For email addresses, I think that using @gcc.gnu.org would be the best
> approach for people that have such accounts, rather than an employer
> address from an arbitrary point in time.

Or @gnu.org for accounts that pre-date the switch to EGCS and CVS.

> 
> Jason



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-06-06 Thread Joseph Myers
On Wed, 5 Jun 2019, Jason Merrill wrote:

> > I think failing to credit (by name and email address) the person implied
> > by the commit metadata, in the absence of positive evidence (such as a
> > ChangeLog entry) for the change being authored by someone else, is just
> > wrong, in the same way it's wrong not to use --author when committing for
> > someone else in git.
> 
> It's wrong, but it's not importantly wrong.

I think it's importantly wrong not to have a name and email address for 
the committer in the absence of using such information for the author.  
(Whereas if the name or email address refer to the right person but are 
anachronistic for that commit, that's what I'd consider not importantly 
wrong.)

> For email addresses, I think that using @gcc.gnu.org would be the best
> approach for people that have such accounts, rather than an employer address
> from an arbitrary point in time.

I'm fine with use of @gcc.gnu.org (used together with a name for the 
person in question that is or was valid, at or after the time of some 
commit they made) for committers who in fact do have or did have such an 
address (as opposed to inventing such addresses for committers from the 
gcc2 era who never had such addresses, or anyone who only committed in the 
egcs.cygnus.com era and who no longer had an account by the time of the 
move to gcc.gnu.org).

When the commit adds a ChangeLog entry and thus contains contemporaneous 
information about the preferred name and email address for the author at 
that time, I think using that information (via the reposurgeon 
"changelogs" feature) is preferable to a generic author map entry (thus, 
the author map entries should be considered a fallback for those commits 
that didn't add a ChangeLog entry (or added one with bad syntax for which 
parsing fails, etc.)).
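
The precedence described above (ChangeLog entry first, author map as the fallback) could be sketched roughly as follows.  This is only an illustration in Python, not how reposurgeon's "changelogs" feature is implemented; the map entry, the regular expression and the function names are all made up for the example.

```python
import re

# Hypothetical author map: SVN username -> (name, fallback email).
AUTHOR_MAP = {
    "jdoe": ("Jane Doe", "jdoe@example.org"),
}

# A GNU-style ChangeLog header as it shows up on an added diff line, e.g.
# "+2010-05-14  Jane Doe  <jane@employer.example>"
CHANGELOG_HEADER = re.compile(
    r"^\+(\d{4}-\d{2}-\d{2})\s+(.+?)\s+<([^<>\s]+@[^<>\s]+)>\s*$"
)


def changelog_author(diff_text):
    """Return (name, email) from the first ChangeLog header the diff adds, or None."""
    for line in diff_text.splitlines():
        match = CHANGELOG_HEADER.match(line)
        if match:
            return match.group(2), match.group(3)
    return None


def resolve_author(svn_user, diff_text):
    """Prefer contemporaneous ChangeLog data; fall back to the author map."""
    found = changelog_author(diff_text)
    if found is not None:
        return found
    # No parsable ChangeLog header: use the generic map entry, if any.
    return AUTHOR_MAP.get(svn_user)
```

A commit whose ChangeLog header fails to parse simply takes the fallback path, which matches the "bad syntax" case mentioned above.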

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-06-06 Thread Joseph Myers
On Thu, 6 Jun 2019, Richard Earnshaw (lists) wrote:

> > For email addresses, I think that using @gcc.gnu.org would be the best
> > approach for people that have such accounts, rather than an employer
> > address from an arbitrary point in time.
> 
> Or @gnu.org for accounts that pre-date the switch to EGCS and CVS.

When were such addresses introduced?  I'm not sure if all the gcc2 
committers would have had them, or only @.ai.mit.edu if 
that's where the repository was (certainly many early ChangeLog entries 
tend to use the .ai.mit.edu form, if not just 
).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-06-06 Thread Ian Lance Taylor
On Thu, Jun 6, 2019 at 4:41 PM Joseph Myers  wrote:
>
> On Thu, 6 Jun 2019, Richard Earnshaw (lists) wrote:
>
> > > For email addresses, I think that using @gcc.gnu.org would be the best
> > > approach for people that have such accounts, rather than an employer
> > > address from an arbitrary point in time.
> >
> > Or @gnu.org for accounts that pre-date the switch to EGCS and CVS.
>
> When were such addresses introduced?  I'm not sure if all the gcc2
> committers would have had them, or only @.ai.mit.edu if
> that's where the repository was (certainly many early ChangeLog entries
> tend to use the .ai.mit.edu form, if not just
> ).

I got a @gnu.org account around 1990 or 1991, and I was hardly the
first, so they were introduced some time before then.

Ian


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-06-07 Thread Richard Earnshaw (lists)
On 07/06/2019 00:50, Ian Lance Taylor wrote:
> On Thu, Jun 6, 2019 at 4:41 PM Joseph Myers  wrote:
>>
>> On Thu, 6 Jun 2019, Richard Earnshaw (lists) wrote:
>>
>>>> For email addresses, I think that using @gcc.gnu.org would be the best
>>>> approach for people that have such accounts, rather than an employer
>>>> address from an arbitrary point in time.
>>>
>>> Or @gnu.org for accounts that pre-date the switch to EGCS and CVS.
>>
>> When were such addresses introduced?  I'm not sure if all the gcc2
>> committers would have had them, or only @.ai.mit.edu if
>> that's where the repository was (certainly many early ChangeLog entries
>> tend to use the .ai.mit.edu form, if not just
>> ).
> 
> I got a @gnu.org account around 1990 or 1991, and I was hardly the
> first, so they were introduced some time before then.
> 
> Ian
> 

Well, according to CVS, the only accounts to make commits to GCC before
the end of 1990 are

 mycroft
 rms
 roland

And if you go to the end of 1991, it only adds

 meissner
 kenner
 dennisg
 wood
 tiemann
 wilson
 jrv

so it wouldn't be a major job to special case those if really necessary.
I would imagine that gnu.org inherited the majority of user names from
prep when the domain was split off, so the precise dates probably don't
matter too much.
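
As a rough sketch of what such special-casing might look like (Python, purely illustrative): the account names are the ones listed above, but the function name is invented, and whether each of these users really had an address in the chosen domain is exactly the open question in this thread.

```python
# Accounts that committed before the end of 1991, per the lists above.
EARLY_ACCOUNTS = {
    "mycroft", "rms", "roland",
    "meissner", "kenner", "dennisg", "wood", "tiemann", "wilson", "jrv",
}


def fallback_address(svn_user):
    """Pick a fallback domain for a committer (assumption: username is the local part)."""
    domain = "gnu.org" if svn_user in EARLY_ACCOUNTS else "gcc.gnu.org"
    return f"{svn_user}@{domain}"
```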

R.


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-07-16 Thread Maxim Kuvyrkov
Hi Everyone,

I've been swamped with other projects for most of June, which gave me time to 
digest all the feedback I've got on GCC's conversion from SVN to Git.

The scripts have heavily evolved from the initial version posted here.  They 
have become fairly generic in that they have no implied knowledge about GCC's 
repo structure.  Due to this I no longer plan to merge them into GCC tree, but 
rather publish as a separate project on github.  For now, you can track the 
current [hairy] version at https://review.linaro.org/c/toolchain/gcc/+/31416 .

The initial version of the scripts used heuristics to construct the branch tree, which 
turned out to be error-prone.  The current version parses the entire history of the SVN 
repo to detect all trees that start at /trunk@1.  Therefore all branches in the 
converted repo converge to the same parent at the beginning of their histories.
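
To illustrate the idea (this is not the actual script, just a minimal Python sketch): assume the SVN log has already been reduced to a table of "copied from" records, one per branch path.  The table layout, the branch names and the revision numbers below are hypothetical, and a real repository needs more care because a path can be deleted and re-created.

```python
def trace_to_root(branch, copy_from):
    """Follow copy-from records from a branch back to the path it derives from.

    copy_from maps a path to (parent_path, revision), or to None for a path
    that was created from scratch rather than copied.
    """
    chain = [branch]
    seen = {branch}
    while copy_from.get(chain[-1]) is not None:
        parent, _revision = copy_from[chain[-1]]
        if parent in seen:
            raise ValueError(f"copy cycle at {parent}")
        seen.add(parent)
        chain.append(parent)
    return chain


def rooted_at_trunk(branch, copy_from):
    """True if the branch's ancestry ends at /trunk."""
    return trace_to_root(branch, copy_from)[-1] == "/trunk"


# Toy data: /trunk is created directly, the branches are copies.
COPIES = {
    "/trunk": None,
    "/branches/release-a": ("/trunk", 100),
    "/branches/feature-b": ("/branches/release-a", 200),
    "/branches/orphan": None,
}
```

With the toy data, feature-b traces back through release-a to /trunk, while orphan does not and would need a closer look.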

As far as GCC conversion goes, below is what I plan to do and what not to do.  
This is based on comments from everyone in this thread:

1. Construct GCC's git repo from SVN using same settings as current git mirror.
2. Compare the resulting git repo with current GCC mirror -- they should match 
on the commit hash level for trunk, branches/gcc-*-branch, and other "normal" 
branches.
3. Investigate any differences between converted GCC repo and current GCC 
mirror.  These can be due to bugs in git-svn or other misconfigurations.
4. Import git-only branches from current GCC mirror.
5. Publish this "raw" repo for community to sanity-check its contents.
6. Re-write history of all branches -- converted from svn and git-only -- see 
note below [*].
7. Publish this "pretty" repo for community to sanity-check its contents.
8. Update both "raw" and "pretty" repos daily with new commits
9. Fix problems in the "raw" and "pretty" repos as they are reported by the 
community.
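
For steps 2 and 3, the comparison on the commit hash level could be as simple as the Python sketch below.  It assumes each branch has already been dumped as a list of commit hashes, oldest first (for instance from the output of "git rev-list --reverse"); the function name is made up.

```python
def first_divergence(mirror, converted):
    """Return the index of the first differing commit, or None if the lists match.

    Both arguments are lists of commit hashes for the same branch, oldest first.
    """
    for index, (ours, theirs) in enumerate(zip(mirror, converted)):
        if ours != theirs:
            return index
    if len(mirror) != len(converted):
        # One history is a strict prefix of the other.
        return min(len(mirror), len(converted))
    return None
```

Since every commit hash covers its parents, everything after the first differing index differs too, so that one index is the only place that needs investigating.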

Once these steps are done, the community could switch from SVN to git by 
disabling commits to SVN, waiting for final history to be absorbed by the 
"pretty" repo, and deploying the git repo as the official repo.

[*] Note on branch re-writing:
During svn->git conversion we have an opportunity to correct some of the 
artifacts of current git mirror:

a. Author and committer entries.  These are difficult to get right during 
git-svn import process because the tool gives only SVN committer ID without 
much else.  We could do much better by matching SVN committer ID with person's 
name in the map file, and then searching for person's current-at-the-time email 
address in the commit diff.  I.e., mkuvyrkov -> Maxim Kuvyrkov -> [changelog 
from 2010's commit] -> ma...@codesourcery.com .

b. Re-write tags/ branches into annotated tags.  Note that tags/* are included 
into history of several branches via merge or copy commits, so we would need to 
re-write history to have proper references to annotated tag commits in the 
histories of such branches.

c. Since we are re-writing history anyway, it would be nice to convert 
"svn-git: svn+ssh://" tags to "svn-git: https://".  We are sure to retain 
a publicly-visible svn repo accessible via https://, but not as likely to retain 
the svn+ssh:// interface.
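
Item (c) amounts to a small text substitution on each commit message.  A minimal Python sketch, assuming the tag in question is the "git-svn-id:" trailer line that git-svn appends, and that only the URL scheme changes (the host and path under https:// may well differ in practice); the host name in the usage note is a placeholder.

```python
import re

# Match the scheme only at the start of a git-svn-id trailer line, so that
# svn+ssh:// URLs mentioned elsewhere in the log text are left alone.
TRAILER = re.compile(r"^(git-svn-id: )svn\+ssh://", re.MULTILINE)


def rewrite_trailer(message):
    """Switch the git-svn-id trailer from svn+ssh:// to https://."""
    return TRAILER.sub(r"\1https://", message)
```

For example, a message ending in "git-svn-id: svn+ssh://svn.example.org/repo/trunk@42 <uuid>" comes back with "git-svn-id: https://svn.example.org/repo/trunk@42 <uuid>" and is otherwise unchanged.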

Which of these will make it into the final repo is for the community to decide.

Regards,

--
Maxim Kuvyrkov
www.linaro.org




Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-07-16 Thread Jason Merrill
On Tue, Jul 16, 2019 at 12:18 PM Maxim Kuvyrkov
 wrote:
>
> Hi Everyone,
>
> I've been swamped with other projects for most of June, which gave me time to 
> digest all the feedback I've got on GCC's conversion from SVN to Git.
>
> The scripts have heavily evolved from the initial version posted here.  They 
> have become fairly generic in that they have no implied knowledge about GCC's 
> repo structure.  Due to this I no longer plan to merge them into GCC tree, 
> but rather publish as a separate project on github.  For now, you can track 
> the current [hairy] version at 
> https://review.linaro.org/c/toolchain/gcc/+/31416 .
>
> The initial version of scripts used heuristics to construct branch tree, 
> which turned out to be error-prone.  The current version parse entire history 
> of SVN repo to detect all trees that start at /trunk@1.  Therefore all 
> branches in the converted repo converge to the same parent at the beginning 
> of their histories.
>
> As far as GCC conversion goes, below is what I plan to do and what not to do. 
>  This is based on comments from everyone in this thread:
>
> 1. Construct GCC's git repo from SVN using same settings as current git 
> mirror.
> 2. Compare the resulting git repo with current GCC mirror -- they should 
> match on the commit hash level for trunk, branches/gcc-*-branch, and other 
> "normal" branches.
> 3. Investigate any differences between converted GCC repo and current GCC 
> mirror.  These can be due to bugs in git-svn or other misconfigurations.
> 4. Import git-only branches from current GCC mirror.
> 5. Publish this "raw" repo for community to sanity-check its contents.

Why not start from the current mirror?  Perhaps a mirror of the mirror?

> 6. Re-write history of all branches -- converted from svn and git-only -- see 
> note below [*].
> 7. Publish this "pretty" repo for community to sanity-check its contents.
> 8. Update both "raw" and "pretty" repos daily with new commits
> 9. Fix problems in the "raw" and "pretty" repos as they reported by the 
> community.
>
> Once these steps are done, the community could switch from SVN to git by 
> disabling commits to SVN, waiting for final history to be absorbed by the 
> "pretty" repo, and deploying the git repo as the official repo.
>
> [*] Note on branch re-writing:
> During svn->git conversion we have an opportunity to correct some of the 
> artifacts of current git mirror:
>
> a. Author and committer entries.  These are difficult to get right during 
> git-svn import process because the tool gives only SVN committer ID without 
> much else.  We could do much better by matching SVN committer ID with 
> person's name in the map file, and then searching for person's 
> current-at-the-time email address in the commit diff.  I.e., mkuvyrkov -> 
> Maxim Kuvyrkov -> [changelog from 2010's commit] -> ma...@codesourcery.com .

> c. Since we are re-writing history anyway, it would be nice to convert 
> "svn-git: svn+ssh://" tags to "svn-git: https://";.  We are sure to retain 
> publicly-visible svn repo accessible via https://, but not as likely to 
> retain svn+ssh:// interface.

I am moderately opposed to rewriting trunk and release branch history;
if we're using git-svn anyway, the benefit would have to be large to
outweigh the significant inconvenience to all current users of needing
to switch their local trees over to a new history.

> b. Re-write tags/ branches into annotated tags.  Note that tags/* are 
> included into history of several branches via merge or copy commits, so we 
> would need to re-write history to have proper references to annotated tag 
> commits in the histories of such branches.

Missing tags is definitely something to fix about the current mirror.
I don't think we need to worry about inserting them into branch
history.

We should definitely also rewrite vendor/subdirectory branches into
multiple branches.

Jason


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-07-16 Thread Maxim Kuvyrkov
> On Jul 16, 2019, at 3:34 PM, Jason Merrill  wrote:
> 
> On Tue, Jul 16, 2019 at 12:18 PM Maxim Kuvyrkov
>  wrote:
>> 
>> Hi Everyone,
>> 
>> I've been swamped with other projects for most of June, which gave me time 
>> to digest all the feedback I've got on GCC's conversion from SVN to Git.
>> 
>> The scripts have heavily evolved from the initial version posted here.  They 
>> have become fairly generic in that they have no implied knowledge about 
>> GCC's repo structure.  Due to this I no longer plan to merge them into GCC 
>> tree, but rather publish as a separate project on github.  For now, you can 
>> track the current [hairy] version at 
>> https://review.linaro.org/c/toolchain/gcc/+/31416 .
>> 
>> The initial version of scripts used heuristics to construct branch tree, 
>> which turned out to be error-prone.  The current version parse entire 
>> history of SVN repo to detect all trees that start at /trunk@1.  Therefore 
>> all branches in the converted repo converge to the same parent at the 
>> beginning of their histories.
>> 
>> As far as GCC conversion goes, below is what I plan to do and what not to 
>> do.  This is based on comments from everyone in this thread:
>> 
>> 1. Construct GCC's git repo from SVN using same settings as current git 
>> mirror.
>> 2. Compare the resulting git repo with current GCC mirror -- they should 
>> match on the commit hash level for trunk, branches/gcc-*-branch, and other 
>> "normal" branches.
>> 3. Investigate any differences between converted GCC repo and current GCC 
>> mirror.  These can be due to bugs in git-svn or other misconfigurations.
>> 4. Import git-only branches from current GCC mirror.
>> 5. Publish this "raw" repo for community to sanity-check its contents.
> 
> Why not start from the current mirror?  Perhaps a mirror of the mirror?

To check that git-svn is self-consistent and generates the same commits now as it 
did several years ago when you set up the current mirror.  

> 
>> 6. Re-write history of all branches -- converted from svn and git-only -- 
>> see note below [*].
>> 7. Publish this "pretty" repo for community to sanity-check its contents.
>> 8. Update both "raw" and "pretty" repos daily with new commits
>> 9. Fix problems in the "raw" and "pretty" repos as they reported by the 
>> community.
>> 
>> Once these steps are done, the community could switch from SVN to git by 
>> disabling commits to SVN, waiting for final history to be absorbed by the 
>> "pretty" repo, and deploying the git repo as the official repo.
>> 
>> [*] Note on branch re-writing:
>> During svn->git conversion we have an opportunity to correct some of the 
>> artifacts of current git mirror:
>> 
>> a. Author and committer entries.  These are difficult to get right during 
>> git-svn import process because the tool gives only SVN committer ID without 
>> much else.  We could do much better by matching SVN committer ID with 
>> person's name in the map file, and then searching for person's 
>> current-at-the-time email address in the commit diff.  I.e., mkuvyrkov -> 
>> Maxim Kuvyrkov -> [changelog from 2010's commit] -> ma...@codesourcery.com .
> 
>> c. Since we are re-writing history anyway, it would be nice to convert 
>> "svn-git: svn+ssh://" tags to "svn-git: https://";.  We are sure to retain 
>> publicly-visible svn repo accessible via https://, but not as likely to 
>> retain svn+ssh:// interface.
> 
> I am moderately opposed to rewriting trunk and release branch history;
> if we're using git-svn anyway, the benefit would have to be large to
> outweigh the significant inconvenience to all current users of needing
> to switch their local trees over to a new history.

I mostly agree with your point.  My thinking is that the git mirror was never 
official canonical GCC repo, and if we ever want to get better author/committer 
identities -- this is our chance.

> 
>> b. Re-write tags/ branches into annotated tags.  Note that tags/* are 
>> included into history of several branches via merge or copy commits, so we 
>> would need to re-write history to have proper references to annotated tag 
>> commits in the histories of such branches.
> 
> Missing tags is definitely something to fix about the current mirror.
> I don't think we need to worry about inserting them into branch
> history.

If we don't do this then "git branch -a --contains some/tag" will not work 
correctly.

> 
> We should definitely also rewrite vendor/subdirectory branches into
> multiple branches.

Vendor and subdirectory branches are properly handled by the scripts.  I wonder 
whether re-writing them using tree-filters would produce same result as git-svn 
conversions I'm doing.

--
Maxim Kuvyrkov
www.linaro.org



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-07-19 Thread Maxim Kuvyrkov
> On Jul 16, 2019, at 5:14 PM, Maxim Kuvyrkov  wrote:
> 
>> On Jul 16, 2019, at 3:34 PM, Jason Merrill  wrote:
>> 
...
>> 
>>> b. Re-write tags/ branches into annotated tags.  Note that tags/* are 
>>> included into history of several branches via merge or copy commits, so we 
>>> would need to re-write history to have proper references to annotated tag 
>>> commits in the histories of such branches.
>> 
>> Missing tags is definitely something to fix about the current mirror.
>> I don't think we need to worry about inserting them into branch
>> history.
> 
> If we don't do this then "git branch -a --contains some/tag" will not work 
> correctly.

I was wrong here.  Git tag objects (annotated tags) cannot appear in branch 
history because they are resolved to the commits they are pointing to.  Only 
commit objects can appear in branch history.

This makes conversion of tags much simpler, since [annotated] tags cannot 
affect branch histories.
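
A toy model of the point, in Python (a plain dictionary standing in for the object database; the object names are invented): a commit's parents are always commits, so walking history from a branch tip never passes through a tag object, and a tag is peeled to its commit before any walk starts.

```python
# Two commits and one annotated tag pointing at the second commit.
OBJECTS = {
    "c1": {"type": "commit", "parents": []},
    "c2": {"type": "commit", "parents": ["c1"]},
    "t1": {"type": "tag", "target": "c2"},
}


def peel(object_id, objects):
    """Resolve a (possibly nested) tag object to the commit it points at."""
    while objects[object_id]["type"] == "tag":
        object_id = objects[object_id]["target"]
    return object_id


def history(object_id, objects):
    """Set of commits reachable from the given commit or tag, inclusive."""
    seen = set()
    stack = [peel(object_id, objects)]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(objects[commit]["parents"])
    return seen
```

Walking from the tag and walking from the commit it points at give the same set, and the tag object itself is never a member of it.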

Regards,

--
Maxim Kuvyrkov
www.linaro.org



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-07-22 Thread Maxim Kuvyrkov
> On Jul 16, 2019, at 5:14 PM, Maxim Kuvyrkov  wrote:
> 
>> On Jul 16, 2019, at 3:34 PM, Jason Merrill  wrote:
>> 
>> On Tue, Jul 16, 2019 at 12:18 PM Maxim Kuvyrkov
>>  wrote:
>>> 
>>> Hi Everyone,
>>> 
>>> I've been swamped with other projects for most of June, which gave me time 
>>> to digest all the feedback I've got on GCC's conversion from SVN to Git.
>>> 
>>> The scripts have heavily evolved from the initial version posted here.  
>>> They have become fairly generic in that they have no implied knowledge 
>>> about GCC's repo structure.  Due to this I no longer plan to merge them 
>>> into GCC tree, but rather publish as a separate project on github.  For 
>>> now, you can track the current [hairy] version at 
>>> https://review.linaro.org/c/toolchain/gcc/+/31416 .
>>> 
>>> The initial version of scripts used heuristics to construct branch tree, 
>>> which turned out to be error-prone.  The current version parse entire 
>>> history of SVN repo to detect all trees that start at /trunk@1.  Therefore 
>>> all branches in the converted repo converge to the same parent at the 
>>> beginning of their histories.
>>> 
>>> As far as GCC conversion goes, below is what I plan to do and what not to 
>>> do.  This is based on comments from everyone in this thread:
>>> 
>>> 1. Construct GCC's git repo from SVN using same settings as current git 
>>> mirror.
>>> 2. Compare the resulting git repo with current GCC mirror -- they should 
>>> match on the commit hash level for trunk, branches/gcc-*-branch, and other 
>>> "normal" branches.
>>> 3. Investigate any differences between converted GCC repo and current GCC 
>>> mirror.  These can be due to bugs in git-svn or other misconfigurations.
>>> 4. Import git-only branches from current GCC mirror.
>>> 5. Publish this "raw" repo for community to sanity-check its contents.
>> 
>> Why not start from the current mirror?  Perhaps a mirror of the mirror?
> 
> To check that git-svn is self-consistent and generates same commits now as it 
> was several years ago when you setup the current mirror.  

Unfortunately, the current mirror does not and could not account for rewrites of 
SVN commit log messages.  For trunk the histories diverge in 2008 due to the 
commit message change of r138154.  This is not a single occurrence; I've 
compared histories only of trunk and gcc-6-branch, and both had a commit message 
change (for gcc-6-branch see r259978).
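
The reason one edited log message changes everything after it: a git commit ID is the SHA-1 of the commit object, which contains the message and the IDs of the parents.  A small Python sketch of that computation follows; the identity string and the messages are made up, and the tree ID used is git's well-known empty tree.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree
WHO = "A U Thor <author@example.com> 1200000000 +0000"   # placeholder identity


def commit_id(tree, parents, author, committer, message):
    """Compute the ID git assigns to a commit object with these fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = "\n".join(lines).encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


# Original history: two commits.
first = commit_id(EMPTY_TREE, [], WHO, WHO, "first\n")
second = commit_id(EMPTY_TREE, [first], WHO, WHO, "second\n")

# Same history after the first log message was edited in SVN.
first_fixed = commit_id(EMPTY_TREE, [], WHO, WHO, "first (fixed log)\n")
second_fixed = commit_id(EMPTY_TREE, [first_fixed], WHO, WHO, "second\n")
```

Here second and second_fixed have identical trees and messages but different IDs, because their parent IDs differ.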

It's up to the community to weigh the pros and cons of re-using the existing GCC 
mirror as the conversion base vs regenerating history from scratch:

Pros of using GCC mirror:
+ No need to rebase public git-only branches
+ No need to rebase private branches
+ No need to rebase current clones, checkouts, work-in-progress trees

Cons of using GCC mirror:
- Poor author / committer IDs (this breaks patch statistics software)
- Several commit messages will not be the current "fixed" version

Thoughts?

--
Maxim Kuvyrkov
www.linaro.org




Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-01 Thread Jason Merrill
On Mon, Jul 22, 2019 at 5:05 AM Maxim Kuvyrkov
 wrote:
>
> > On Jul 16, 2019, at 5:14 PM, Maxim Kuvyrkov  
> > wrote:
> >
> >> On Jul 16, 2019, at 3:34 PM, Jason Merrill  wrote:
> >>
> >> On Tue, Jul 16, 2019 at 12:18 PM Maxim Kuvyrkov
> >>  wrote:
> >>>
> >>> Hi Everyone,
> >>>
> >>> I've been swamped with other projects for most of June, which gave me 
> >>> time to digest all the feedback I've got on GCC's conversion from SVN to 
> >>> Git.
> >>>
> >>> The scripts have heavily evolved from the initial version posted here.  
> >>> They have become fairly generic in that they have no implied knowledge 
> >>> about GCC's repo structure.  Due to this I no longer plan to merge them 
> >>> into GCC tree, but rather publish as a separate project on github.  For 
> >>> now, you can track the current [hairy] version at 
> >>> https://review.linaro.org/c/toolchain/gcc/+/31416 .
> >>>
> >>> The initial version of scripts used heuristics to construct branch tree, 
> >>> which turned out to be error-prone.  The current version parse entire 
> >>> history of SVN repo to detect all trees that start at /trunk@1.  
> >>> Therefore all branches in the converted repo converge to the same parent 
> >>> at the beginning of their histories.
> >>>
> >>> As far as GCC conversion goes, below is what I plan to do and what not to 
> >>> do.  This is based on comments from everyone in this thread:
> >>>
> >>> 1. Construct GCC's git repo from SVN using same settings as current git 
> >>> mirror.
> >>> 2. Compare the resulting git repo with current GCC mirror -- they should 
> >>> match on the commit hash level for trunk, branches/gcc-*-branch, and 
> >>> other "normal" branches.
> >>> 3. Investigate any differences between converted GCC repo and current GCC 
> >>> mirror.  These can be due to bugs in git-svn or other misconfigurations.
> >>> 4. Import git-only branches from current GCC mirror.
> >>> 5. Publish this "raw" repo for community to sanity-check its contents.
> >>
> >> Why not start from the current mirror?  Perhaps a mirror of the mirror?
> >
> > To check that git-svn is self-consistent and generates same commits now as 
> > it was several years ago when you setup the current mirror.
>
> Unfortunately, current mirror does not and could not account for rewrites of 
> SVN commit log messages.  For trunk the histories of diverge in 2008 due to 
> commit message change of r138154.  This is not a single occurrence; I've 
> compared histories only of trunk and gcc-6-branch, and both had commit 
> message change (for gcc-6-branch see r259978).
>
> It's up to the community is to weigh pros and cons of re-using existing GCC 
> mirror as conversion base vs regenerating history from scratch:
>
> Pros of using GCC mirror:
> + No need to rebase public git-only branches
> + No need to rebase private branches
> + No need to rebase current clones, checkouts, work-in-progress trees
>
> Cons of using GCC mirror:
> - Poor author / committer IDs (this breaks patch statistics software)
> - Several commit messages will not be the current "fixed" version
>
> Thoughts?

I'm still inclined to stick with the mirror.  I would expect patch
statistics software to be able to be taught about multiple addresses
for the same person.

Jason
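
One way to teach git-based statistics about multiple addresses for the same person is a .mailmap file.  The following is a minimal sketch, not part of the conversion scripts; the helper name mailmap_entry and the identities are hypothetical.  It emits entries in the "Proper Name <proper@email> <commit@email>" form that git understands.

```shell
#!/bin/sh
# Print one .mailmap entry that maps an address found in commits to a
# canonical identity.
#   $1 = proper name   $2 = proper email   $3 = address as recorded in commits
mailmap_entry() {
    printf '%s <%s> <%s>\n' "$1" "$2" "$3"
}

# Hypothetical identities, for illustration only.
mailmap_entry 'Jane Doe' 'jane@example.org' 'jdoe@01234567-89ab-cdef-0123-456789abcdef'
mailmap_entry 'Jane Doe' 'jane@example.org' 'jane.doe@example.com'
```

Redirecting the output to a .mailmap file at the top of the work tree should make commands such as "git shortlog -sne" fold both addresses into one identity.  Whether a given statistics tool honours .mailmap is a separate question.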


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-02 Thread Maxim Kuvyrkov
> On Jul 22, 2019, at 12:05 PM, Maxim Kuvyrkov  
> wrote:
> 
...
>>>> As far as GCC conversion goes, below is what I plan to do and what not to 
>>>> do.  This is based on comments from everyone in this thread:
>>>> 
>>>> 1. Construct GCC's git repo from SVN using same settings as current git 
>>>> mirror.
>>>> 2. Compare the resulting git repo with current GCC mirror -- they should 
>>>> match on the commit hash level for trunk, branches/gcc-*-branch, and other 
>>>> "normal" branches.
>>>> 3. Investigate any differences between converted GCC repo and current GCC 
>>>> mirror. These can be due to bugs in git-svn or other misconfigurations.
>>>> 4. Import git-only branches from current GCC mirror.
>>>> 5. Publish this "raw" repo for community to sanity-check its contents.
>>> 
>>> Why not start from the current mirror?  Perhaps a mirror of the mirror?
>> 
>> To check that git-svn is self-consistent and generates same commits now as 
>> it was several years ago when you setup the current mirror.  
> 
> Unfortunately, current mirror does not and could not account for rewrites of 
> SVN commit log messages.  For trunk the histories of diverge in 2008 due to 
> commit message change of r138154.  This is not a single occurrence; I've 
> compared histories only of trunk and gcc-6-branch, and both had commit 
> message change (for gcc-6-branch see r259978).
> 
> It's up to the community is to weigh pros and cons of re-using existing GCC 
> mirror as conversion base vs regenerating history from scratch:
> 
> Pros of using GCC mirror:
> + No need to rebase public git-only branches
> + No need to rebase private branches
> + No need to rebase current clones, checkouts, work-in-progress trees
> 
> Cons of using GCC mirror:
> - Poor author / committer IDs (this breaks patch statistics software)
> - Several commit messages will not be the current "fixed" version
> 
> Thoughts?

Ping?

Meanwhile, status update:

1. GitHub blocked https://github.com/maxim-kuvyrkov/gcc/ due to excessive 
resource usage.  I've asked them to unblock and explained why pushes are as big 
as they are.

2. I'm uploading to https://git.linaro.org/people/maxim.kuvyrkov/gcc.git/ for 
now.

3. Conversion is now feature-complete.  The scripts re-write author and 
committer fields, as well as create proper git tags.

4. "Raw" version of repository is available under refs/raw/*.

5. refs/raw/* are not expected to change.

6. refs/heads/* and refs/tags/* might change due to author/committer name fixes 
and improvements.

Please scrutinize the repo and let me know of any artifacts.
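
A default clone fetches only refs/heads/* and tags, so refs/raw/* has to be requested explicitly.  A minimal sketch, assuming the remote is called "origin"; the helper name raw_refspec and the local namespace refs/remotes/<remote>/raw/ are illustrative choices, not something the scripts prescribe.

```shell
#!/bin/sh
# Build a refspec that copies the remote's refs/raw/* into a local
# remote-tracking namespace.  $1 = remote name (e.g. origin).
raw_refspec() {
    printf '+refs/raw/*:refs/remotes/%s/raw/*\n' "$1"
}

# Print the fetch command instead of running it, so it can be reviewed first.
printf "git fetch origin '%s'\n" "$(raw_refspec origin)"
```

After running the printed command, the raw history can be compared against the rewritten branches with ordinary tools such as "git log".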

--
Maxim Kuvyrkov
www.linaro.org





Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-02 Thread Maxim Kuvyrkov
> On Aug 1, 2019, at 11:43 PM, Jason Merrill  wrote:
> 
> On Mon, Jul 22, 2019 at 5:05 AM Maxim Kuvyrkov
>  wrote:
>> 
>>> On Jul 16, 2019, at 5:14 PM, Maxim Kuvyrkov  
>>> wrote:
>>> 
>>>> On Jul 16, 2019, at 3:34 PM, Jason Merrill  wrote:
>>>> 
>>>> On Tue, Jul 16, 2019 at 12:18 PM Maxim Kuvyrkov
>>>>  wrote:
>>>>> 
>>>>> Hi Everyone,
>>>>> 
>>>>> I've been swamped with other projects for most of June, which gave me 
>>>>> time to digest all the feedback I've got on GCC's conversion from SVN to 
>>>>> Git.
>>>>> 
>>>>> The scripts have heavily evolved from the initial version posted here.  
>>>>> They have become fairly generic in that they have no implied knowledge 
>>>>> about GCC's repo structure.  Due to this I no longer plan to merge them 
>>>>> into GCC tree, but rather publish as a separate project on github.  For 
>>>>> now, you can track the current [hairy] version at 
>>>>> https://review.linaro.org/c/toolchain/gcc/+/31416 .
>>>>> 
>>>>> The initial version of scripts used heuristics to construct branch tree, 
>>>>> which turned out to be error-prone.  The current version parse entire 
>>>>> history of SVN repo to detect all trees that start at /trunk@1.  
>>>>> Therefore all branches in the converted repo converge to the same parent 
>>>>> at the beginning of their histories.
>>>>> 
>>>>> As far as GCC conversion goes, below is what I plan to do and what not to 
>>>>> do.  This is based on comments from everyone in this thread:
>>>>> 
>>>>> 1. Construct GCC's git repo from SVN using same settings as current git 
>>>>> mirror.
>>>>> 2. Compare the resulting git repo with current GCC mirror -- they should 
>>>>> match on the commit hash level for trunk, branches/gcc-*-branch, and 
>>>>> other "normal" branches.
>>>>> 3. Investigate any differences between converted GCC repo and current GCC 
>>>>> mirror. These can be due to bugs in git-svn or other misconfigurations.
>>>>> 4. Import git-only branches from current GCC mirror.
>>>>> 5. Publish this "raw" repo for community to sanity-check its contents.
>>>> 
>>>> Why not start from the current mirror?  Perhaps a mirror of the mirror?
>>> 
>>> To check that git-svn is self-consistent and generates same commits now as 
>>> it was several years ago when you setup the current mirror.
>> 
>> Unfortunately, current mirror does not and could not account for rewrites of 
>> SVN commit log messages.  For trunk the histories of diverge in 2008 due to 
>> commit message change of r138154.  This is not a single occurrence; I've 
>> compared histories only of trunk and gcc-6-branch, and both had commit 
>> message change (for gcc-6-branch see r259978).
>> 
>> It's up to the community is to weigh pros and cons of re-using existing GCC 
>> mirror as conversion base vs regenerating history from scratch:
>> 
>> Pros of using GCC mirror:
>> + No need to rebase public git-only branches
>> + No need to rebase private branches
>> + No need to rebase current clones, checkouts, work-in-progress trees
>> 
>> Cons of using GCC mirror:
>> - Poor author / committer IDs (this breaks patch statistics software)
>> - Several commit messages will not be the current "fixed" version
>> 
>> Thoughts?
> 
> I'm still inclined to stick with the mirror.  I would expect patch
> statistics software to be able to be taught about multiple addresses
> for the same person.

Patch tracking software breaks on emails like 
 , where 
38bc75d-0d04-0410-961f-82ee72b054a4 is not a reasonable domain name.
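
Addresses of this shape, a user name followed by "@" and the SVN repository UUID, are what git-svn records when it is run without an authors file.  As a rough sketch, they can be picked out of a list of addresses with a pattern match; the helper names and the sample UUID used in testing are hypothetical.

```shell
#!/bin/sh
# Succeed if the address has a bare UUID where the domain should be,
# i.e. the user@<repository-UUID> form.
is_svn_uuid_email() {
    printf '%s\n' "$1" | grep -Eq \
        '^[^@[:space:]]+@[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Read addresses from stdin, one per line, and print only the UUID-style ones.
list_svn_uuid_emails() {
    while IFS= read -r addr; do
        if is_svn_uuid_email "$addr"; then
            printf '%s\n' "$addr"
        fi
    done
}
```

With these functions defined, something like "git log --format='%ae' | sort -u | list_svn_uuid_emails" run inside a converted repository would list the addresses that still need rewriting.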

For completeness, I'll generate and upload a repo based on current mirror with 
all branches and tags converted.

In the end, I don't care much to which version of the repo we switch, as long 
as we switch.

--
Maxim Kuvyrkov
www.linaro.org






Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-02 Thread Richard Biener
On Fri, Aug 2, 2019 at 10:41 AM Maxim Kuvyrkov
 wrote:
>
> > On Aug 1, 2019, at 11:43 PM, Jason Merrill  wrote:
> >
> > On Mon, Jul 22, 2019 at 5:05 AM Maxim Kuvyrkov
> >  wrote:
> >>
> >>> On Jul 16, 2019, at 5:14 PM, Maxim Kuvyrkov  
> >>> wrote:
> >>>
> >>>> On Jul 16, 2019, at 3:34 PM, Jason Merrill  wrote:
> >>>>
> >>>> On Tue, Jul 16, 2019 at 12:18 PM Maxim Kuvyrkov
> >>>>  wrote:
> >>>>>
> >>>>> Hi Everyone,
> >>>>>
> >>>>> I've been swamped with other projects for most of June, which gave me 
> >>>>> time to digest all the feedback I've got on GCC's conversion from SVN 
> >>>>> to Git.
> >>>>>
> >>>>> The scripts have heavily evolved from the initial version posted here.  
> >>>>> They have become fairly generic in that they have no implied knowledge 
> >>>>> about GCC's repo structure.  Due to this I no longer plan to merge them 
> >>>>> into GCC tree, but rather publish as a separate project on github.  For 
> >>>>> now, you can track the current [hairy] version at 
> >>>>> https://review.linaro.org/c/toolchain/gcc/+/31416 .
> >>>>>
> >>>>> The initial version of scripts used heuristics to construct branch 
> >>>>> tree, which turned out to be error-prone.  The current version parse 
> >>>>> entire history of SVN repo to detect all trees that start at /trunk@1.  
> >>>>> Therefore all branches in the converted repo converge to the same 
> >>>>> parent at the beginning of their histories.
> >>>>>
> >>>>> As far as GCC conversion goes, below is what I plan to do and what not 
> >>>>> to do.  This is based on comments from everyone in this thread:
> >>>>>
> >>>>> 1. Construct GCC's git repo from SVN using same settings as current git 
> >>>>> mirror.
> >>>>> 2. Compare the resulting git repo with current GCC mirror -- they 
> >>>>> should match on the commit hash level for trunk, branches/gcc-*-branch, 
> >>>>> and other "normal" branches.
> >>>>> 3. Investigate any differences between converted GCC repo and current 
> >>>>> GCC mirror. These can be due to bugs in git-svn or other 
> >>>>> misconfigurations.
> >>>>> 4. Import git-only branches from current GCC mirror.
> >>>>> 5. Publish this "raw" repo for community to sanity-check its contents.
> >>>>
> >>>> Why not start from the current mirror?  Perhaps a mirror of the mirror?
> >>>
> >>> To check that git-svn is self-consistent and generates same commits now 
> >>> as it was several years ago when you setup the current mirror.
> >>
> >> Unfortunately, current mirror does not and could not account for rewrites 
> >> of SVN commit log messages.  For trunk the histories of diverge in 2008 
> >> due to commit message change of r138154.  This is not a single occurrence; 
> >> I've compared histories only of trunk and gcc-6-branch, and both had 
> >> commit message change (for gcc-6-branch see r259978).
> >>
> >> It's up to the community is to weigh pros and cons of re-using existing 
> >> GCC mirror as conversion base vs regenerating history from scratch:
> >>
> >> Pros of using GCC mirror:
> >> + No need to rebase public git-only branches
> >> + No need to rebase private branches
> >> + No need to rebase current clones, checkouts, work-in-progress trees
> >>
> >> Cons of using GCC mirror:
> >> - Poor author / committer IDs (this breaks patch statistics software)
> >> - Several commit messages will not be the current "fixed" version
> >>
> >> Thoughts?
> >
> > I'm still inclined to stick with the mirror.  I would expect patch
> > statistics software to be able to be taught about multiple addresses
> > for the same person.
>
> Patch tracking software breaks on emails like 
>  , where 
> 38bc75d-0d04-0410-961f-82ee72b054a4 is not a reasonable domain name.
>
> For completeness, I'll generate and upload a repo based on current mirror 
> with all branches and tags converted.
>
> In the end, I don't care much to which version of the repo we switch, as long 
> as we switch.

I think that if we have something clearly better than the current
mirror, we should use that.  Rebasing might be a hassle, but git should
make that reasonably easy; proper instructions on how to do that are of
course welcome (just add a new remote, adjust the remote of all local
branches, and then rebase?).
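
The steps suggested in the parenthesis could look roughly like the sketch below.  It only prints the git commands so they can be reviewed before being run.  The function name, the remote name and the URL are hypothetical, and finding NEW_BASE (the commit in the converted repository that corresponds to the old upstream commit OLD_BASE) is left to the reader.

```shell
#!/bin/sh
# Print (do not run) the commands that move one local branch onto a
# re-converted repository.
#   $1 = name for the new remote      $2 = URL of the converted repository
#   $3 = local branch to move         $4 = old upstream commit it was based on
#   $5 = equivalent commit in the converted repository
#   $6 = new upstream branch to track, e.g. converted/trunk
print_rebase_plan() {
    printf 'git remote add %s %s\n' "$1" "$2"
    printf 'git fetch %s\n' "$1"
    # Replay only the commits after the old base on top of the new base.
    printf 'git rebase --onto %s %s %s\n' "$5" "$4" "$3"
    printf 'git branch --set-upstream-to=%s %s\n' "$6" "$3"
}

print_rebase_plan converted https://example.org/gcc.git my-topic \
    OLD_BASE NEW_BASE converted/trunk
```

Using "git rebase --onto" with an explicit old base avoids replaying the entire upstream history, which would otherwise happen because the old and new repositories share no commit hashes.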

Richard.

> --
> Maxim Kuvyrkov
> www.linaro.org
>
>
>
>


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-02 Thread Martin Liška
On 8/2/19 10:41 AM, Maxim Kuvyrkov wrote:
> In the end, I don't care much to which version of the repo we switch, as long 
> as we switch.

Hi Maxim.

I really appreciate that you've been working on that.  Like you, I would like
to see any change that will lead to a git repository.

I have a couple of questions about the upcoming Cauldron:

- Are you planning to attend?
- Would it be possible to prepare a vote, e.g. during the Steering Committee,
  where we'll vote on the transition options?
- Would it make sense to do an online questionnaire in advance in order
  to gauge the prevailing opinion?

If you are interested, I can help.

Martin


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-02 Thread Maxim Kuvyrkov
> On Aug 2, 2019, at 1:26 PM, Martin Liška  wrote:
> 
> On 8/2/19 10:41 AM, Maxim Kuvyrkov wrote:
>> In the end, I don't care much to which version of the repo we switch, as 
>> long as we switch.
> 
> Hi Maxim.
> 
> I really appreciate that you've been working on that. Same as you I would 
> like to see
> any change that will lead to a git repository.
> 
> I have couple of questions about the upcoming Cauldron:
> 
> - Are you planning to attend?

Unfortunately, I won't attend this time.

> - Would it be possible to prepare a voting during e.g. Steering Committee 
> where
>  we'll vote about transition options?
> - Would it make sense to do an online questionnaire in advance in order
>  to guess what's prevailing opinion?
> 
> If you are interested, I can help you?

Let's organize an online survey now.  While most active GCC developers will 
attend Cauldron, many others will not, so we shouldn't rely on Cauldron to make 
any final decisions.

Martin, would you please organize the survey?

Regards,

--
Maxim Kuvyrkov
www.linaro.org



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-02 Thread Martin Liška
On 8/2/19 12:54 PM, Maxim Kuvyrkov wrote:
>> On Aug 2, 2019, at 1:26 PM, Martin Liška  wrote:
>>
>> On 8/2/19 10:41 AM, Maxim Kuvyrkov wrote:
>>> In the end, I don't care much to which version of the repo we switch, as 
>>> long as we switch.
>>
>> Hi Maxim.
>>
>> I really appreciate that you've been working on that. Same as you I would 
>> like to see
>> any change that will lead to a git repository.
>>
>> I have couple of questions about the upcoming Cauldron:
>>
>> - Are you planning to attend?
> 
> Unfortunately, I won't attend this time.

I see.

> 
>> - Would it be possible to prepare a voting during e.g. Steering Committee 
>> where
>>  we'll vote about transition options?
>> - Would it make sense to do an online questionnaire in advance in order
>>  to guess what's prevailing opinion?
>>
>> If you are interested, I can help you?
> 
> Let's organize an online survey now.  While most active GCC developers will 
> attend Cauldron, many others will not, so we shouldn't rely on Cauldron to 
> make any final decisions.

Sure, online is the best option as all active community members can vote.

> 
> Martin, would you please organize the survey?

Yes, but I haven't followed the discussion in recent weeks.  Is the only
question whether we want the current GIT mirror or your rebased git repository?
Is Eric Raymond's transition still in play or not?
Are there any other sub-questions regarding commit message format, git hooks,
etc. that deserve a place in the questionnaire?

Thanks,
Martin

> 
> Regards,
> 
> --
> Maxim Kuvyrkov
> www.linaro.org
> 



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-02 Thread Richard Biener
On Fri, Aug 2, 2019 at 1:01 PM Martin Liška  wrote:
>
> On 8/2/19 12:54 PM, Maxim Kuvyrkov wrote:
> >> On Aug 2, 2019, at 1:26 PM, Martin Liška  wrote:
> >>
> >> On 8/2/19 10:41 AM, Maxim Kuvyrkov wrote:
> >>> In the end, I don't care much to which version of the repo we switch, as 
> >>> long as we switch.
> >>
> >> Hi Maxim.
> >>
> >> I really appreciate that you've been working on that. Same as you I would 
> >> like to see
> >> any change that will lead to a git repository.
> >>
> >> I have couple of questions about the upcoming Cauldron:
> >>
> >> - Are you planning to attend?
> >
> > Unfortunately, I won't attend this time.
>
> I see.
>
> >
> >> - Would it be possible to prepare a voting during e.g. Steering Committee 
> >> where
> >>  we'll vote about transition options?
> >> - Would it make sense to do an online questionnaire in advance in order
> >>  to guess what's prevailing opinion?
> >>
> >> If you are interested, I can help you?
> >
> > Let's organize an online survey now.  While most active GCC developers will 
> > attend Cauldron, many others will not, so we shouldn't rely on Cauldron to 
> > make any final decisions.
>
> Sure, online is the best option as all active community members can vote.
>
> >
> > Martin, would you please organize the survey?
>
> Yes, but I haven't followed the discussion in recent weeks. Is the only 
> question
> whether we want the current GIT mirror or your rebased git repository?
> Is Eric Raymond's transition still in play or not?

1) Stay with SVN
2) Switch to the existing GIT mirror
3) Wait for ESR to complete his conversion to GIT
4) Use the existing new conversion to GIT fixing authors and commit messages
5) I don't care
6) I don't care as long as we switch to GIT

> Are there any other sub-question regarding commit message format, git hooks, 
> etc.
> that will deserve a place in the questionnaire?

No, please do not make it unnecessarily complicated.  Maybe the questionnaire
can include a free-form text field for more comments.

Btw, I do not believe we should do this kind of voting.  Who's eligible to vote?
Is the vote anonymous?  What happens when the majority (what is the majority?)
votes for option N?

IMHO voting is bike-shedding.

Those who do the work decide.  _They_ may ask questions _and_ decide whether
to listen to the answer.

Richard.

> Thank,
> Martin
>
> >
> > Regards,
> >
> > --
> > Maxim Kuvyrkov
> > www.linaro.org
> >
>


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-02 Thread Martin Liška
On 8/2/19 1:06 PM, Richard Biener wrote:
> On Fri, Aug 2, 2019 at 1:01 PM Martin Liška  wrote:
>>
>> On 8/2/19 12:54 PM, Maxim Kuvyrkov wrote:
>>>> On Aug 2, 2019, at 1:26 PM, Martin Liška  wrote:
>>>>
>>>> On 8/2/19 10:41 AM, Maxim Kuvyrkov wrote:
>>>>> In the end, I don't care much to which version of the repo we switch, as 
>>>>> long as we switch.
>>>>
>>>> Hi Maxim.
>>>>
>>>> I really appreciate that you've been working on that. Same as you I would 
>>>> like to see
>>>> any change that will lead to a git repository.
>>>>
>>>> I have couple of questions about the upcoming Cauldron:
>>>>
>>>> - Are you planning to attend?
>>>
>>> Unfortunately, I won't attend this time.
>>
>> I see.
>>
>>>
>>>> - Would it be possible to prepare a voting during e.g. Steering Committee 
>>>> where
>>>>  we'll vote about transition options?
>>>> - Would it make sense to do an online questionnaire in advance in order
>>>>  to guess what's prevailing opinion?
>>>>
>>>> If you are interested, I can help you?
>>>
>>> Let's organize an online survey now.  While most active GCC developers will 
>>> attend Cauldron, many others will not, so we shouldn't rely on Cauldron to 
>>> make any final decisions.
>>
>> Sure, online is the best option as all active community members can vote.
>>
>>>
>>> Martin, would you please organize the survey?
>>
>> Yes, but I haven't followed the discussion in recent weeks. Is the only 
>> question
>> whether we want the current GIT mirror or your rebased git repository?
>> Is Eric Raymond's transition still in play or not?
> 
> 1) Stay with SVN
> 2) Switch to the existing GIT mirror
> 3) Wait for ERS to complete his conversion to GIT
> 4) Use the existing new conversion to GIT fixing authors and commit messages
> 5) I don't care
> 6) I don't care as long as we switch to GIT
> 
>> Are there any other sub-question regarding commit message format, git hooks, 
>> etc.
>> that will deserve a place in the questionnaire?
> 
> No, please do not make it unnecessarily complicated.  Maybe the questionaire
> can include a free-form text field for more comments.
> 
> Btw, I do not believe we should do this kind of voting.  Who's eligible to 
> vote?
> Is the vote anonymous?  What happens when the majority (what is the majority?)
> votes for option N?
> 
> IMHO voting is bike-shedding.
> 
> Those who do the work decide.  _They_ may ask questions _and_ decide whether
> to listen to the answer.
> 
> Richard.
> 
>> Thank,
>> Martin
>>
>>>
>>> Regards,
>>>
>>> --
>>> Maxim Kuvyrkov
>>> www.linaro.org
>>>
>>

So Richi is suggesting that we finish everything necessary for the transition
before we vote.  That should include the Bugzilla reporting script and maybe
other git hooks?  Do we have a checklist of these?  Jason?

Thanks,
Martin


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-02 Thread Maxim Kuvyrkov
> On Aug 2, 2019, at 11:35 AM, Maxim Kuvyrkov  wrote:
> 
>> On Jul 22, 2019, at 12:05 PM, Maxim Kuvyrkov  
>> wrote:
>> 
> ...
>>>>> As far as GCC conversion goes, below is what I plan to do and what not to 
>>>>> do.  This is based on comments from everyone in this thread:
>>>>> 
>>>>> 1. Construct GCC's git repo from SVN using same settings as current git 
>>>>> mirror.
>>>>> 2. Compare the resulting git repo with current GCC mirror -- they should 
>>>>> match on the commit hash level for trunk, branches/gcc-*-branch, and 
>>>>> other "normal" branches.
>>>>> 3. Investigate any differences between converted GCC repo and current GCC 
>>>>> mirror. These can be due to bugs in git-svn or other misconfigurations.
>>>>> 4. Import git-only branches from current GCC mirror.
>>>>> 5. Publish this "raw" repo for community to sanity-check its contents.
>>>> 
>>>> Why not start from the current mirror?  Perhaps a mirror of the mirror?
>>> 
>>> To check that git-svn is self-consistent and generates same commits now as 
>>> it was several years ago when you setup the current mirror.  
>> 
>> Unfortunately, current mirror does not and could not account for rewrites of 
>> SVN commit log messages.  For trunk the histories of diverge in 2008 due to 
>> commit message change of r138154.  This is not a single occurrence; I've 
>> compared histories only of trunk and gcc-6-branch, and both had commit 
>> message change (for gcc-6-branch see r259978).
>> 
>> It's up to the community is to weigh pros and cons of re-using existing GCC 
>> mirror as conversion base vs regenerating history from scratch:
>> 
>> Pros of using GCC mirror:
>> + No need to rebase public git-only branches
>> + No need to rebase private branches
>> + No need to rebase current clones, checkouts, work-in-progress trees
>> 
>> Cons of using GCC mirror:
>> - Poor author / committer IDs (this breaks patch statistics software)
>> - Several commit messages will not be the current "fixed" version
>> 
>> Thoughts?
> 
> Ping?
> 
> Meanwhile, status update:
> 
> 1. GitHub blocked https://github.com/maxim-kuvyrkov/gcc/ due to excessive 
> resource usage. I've asked them to unblock and explained why pushes are as 
> big as they are.
> 
> 2. I'm uploading to https://git.linaro.org/people/maxim.kuvyrkov/gcc.git/ for 
> now.

The correct link is https://git.linaro.org/people/maxim-kuvyrkov/gcc.git/ .  
Thanks to Segher for pointing this out.

> 
> 3. Conversion is now feature-complete.  The scripts re-write author and 
> committer fields, as well as create proper git tags.
> 
> 4. "Raw" version of repository is available under refs/raw/*.
> 
> 5. refs/raw/* are not expected to change.
> 
> 6. refs/heads/* and refs/tags/* might change due to author/committer name 
> fixes and improvements.
> 
> Please scrutinize the repo and let me know of any artifacts.


--
Maxim Kuvyrkov
www.linaro.org



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-02 Thread Segher Boessenkool
On Fri, Aug 02, 2019 at 01:06:12PM +0200, Richard Biener wrote:
> 1) Stay with SVN
> 2) Switch to the existing GIT mirror
> 3) Wait for ERS to complete his conversion to GIT
> 4) Use the existing new conversion to GIT fixing authors and commit messages
> 5) I don't care
> 6) I don't care as long as we switch to GIT

7) I don't care as long as we do either 2) or 4).

> IMHO voting is bike-shedding.

Yes, that is the definition of voting, pretty much.

> Those who do the work decide.  _They_ may ask questions _and_ decide whether
> to listen to the answer.

But unfortunately we have been deadlocked for years now.


Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-02 Thread Maxim Kuvyrkov
> On Aug 2, 2019, at 2:06 PM, Richard Biener  wrote:
> 
> On Fri, Aug 2, 2019 at 1:01 PM Martin Liška  wrote:
>> 
>> On 8/2/19 12:54 PM, Maxim Kuvyrkov wrote:
>>>> On Aug 2, 2019, at 1:26 PM, Martin Liška  wrote:
>>>> 
>>>> On 8/2/19 10:41 AM, Maxim Kuvyrkov wrote:
>>>>> In the end, I don't care much to which version of the repo we switch, as 
>>>>> long as we switch.
>>>> 
>>>> Hi Maxim.
>>>> 
>>>> I really appreciate that you've been working on that. Same as you I would 
>>>> like to see
>>>> any change that will lead to a git repository.
>>>> 
>>>> I have couple of questions about the upcoming Cauldron:
>>>> 
>>>> - Are you planning to attend?
>>> 
>>> Unfortunately, I won't attend this time.
>> 
>> I see.
>> 
>>> 
>>>> - Would it be possible to prepare a voting during e.g. Steering Committee 
>>>> where
>>>> we'll vote about transition options?
>>>> - Would it make sense to do an online questionnaire in advance in order
>>>> to guess what's prevailing opinion?
>>>> 
>>>> If you are interested, I can help you?
>>> 
>>> Let's organize an online survey now.  While most active GCC developers will 
>>> attend Cauldron, many others will not, so we shouldn't rely on Cauldron to 
>>> make any final decisions.
>> 
>> Sure, online is the best option as all active community members can vote.
>> 
>>> 
>>> Martin, would you please organize the survey?
>> 
>> Yes, but I haven't followed the discussion in recent weeks. Is the only 
>> question
>> whether we want the current GIT mirror or your rebased git repository?
>> Is Eric Raymond's transition still in play or not?
> 
> 1) Stay with SVN
> 2) Switch to the existing GIT mirror
> 3) Wait for ERS to complete his conversion to GIT
> 4) Use the existing new conversion to GIT fixing authors and commit messages
> 5) I don't care
> 6) I don't care as long as we switch to GIT
> 
>> Are there any other sub-question regarding commit message format, git hooks, 
>> etc.
>> that will deserve a place in the questionnaire?
> 
> No, please do not make it unnecessarily complicated.  Maybe the questionaire
> can include a free-form text field for more comments.
> 
> Btw, I do not believe we should do this kind of voting.  Who's eligible to 
> vote?
> Is the vote anonymous?  What happens when the majority (what is the majority?)
> votes for option N?
> 
> IMHO voting is bike-shedding.
> 
> Those who do the work decide.  _They_ may ask questions _and_ decide whether
> to listen to the answer.

I think we should do a /survey/, not organize a vote.  From reading this thread, 
an independent observer would get the impression that GCC developers are split 
into 3 roughly equal parts:
1. those who prefer to switch to the existing mirror,
2. those who prefer to wait for the reposurgeon conversion,
3. those who prefer to switch to the svn-git conversion with authors fixed.

The survey might show that we have a clear majority for one of the options, and 
that we can conclude the discussion.  If the survey shows that we don't have a 
clear winner, then let's continue the discussion.

IMO, anyone who considers himself or herself a GCC developer should participate, 
and the survey should not be anonymous, to avoid abuse.

--
Maxim Kuvyrkov
www.linaro.org







Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-02 Thread Segher Boessenkool
On Fri, Aug 02, 2019 at 05:14:20PM +0300, Maxim Kuvyrkov wrote:
> > On Aug 2, 2019, at 11:35 AM, Maxim Kuvyrkov  
> > wrote:
> >> On Jul 22, 2019, at 12:05 PM, Maxim Kuvyrkov  
> >> wrote:
> > 3. Conversion is now feature-complete.  The scripts re-write author and 
> > committer fields, as well as create proper git tags.
> > 
> > 4. "Raw" version of repository is available under refs/raw/*.
> > 
> > 5. refs/raw/* are not expected to change.
> > 
> > 6. refs/heads/* and refs/tags/* might change due to author/committer name 
> > fixes and improvements.
> > 
> > Please scrutinize the repo and let me know of any artifacts.

When cloning it says
  warning: remote HEAD refers to nonexistent ref, unable to checkout.

$ git checkout origin/trunk -b master
seems to fix the local checkout, but of course

$ git remote show origin
still says
  HEAD branch: (unknown)

Other than that, it looks fine.  (I checked all my own entries, and all
got a reasonable email address).


Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-08-02 Thread Jason Merrill
On Fri, Aug 2, 2019 at 7:35 AM Martin Liška  wrote:
>
> On 8/2/19 1:06 PM, Richard Biener wrote:
> > On Fri, Aug 2, 2019 at 1:01 PM Martin Liška  wrote:
> >>
> >> On 8/2/19 12:54 PM, Maxim Kuvyrkov wrote:
> >>>> On Aug 2, 2019, at 1:26 PM, Martin Liška  wrote:
> >>>>
> >>>> On 8/2/19 10:41 AM, Maxim Kuvyrkov wrote:
> >>>>> In the end, I don't care much to which version of the repo we switch, 
> >>>>> as long as we switch.
> >>>>
> >>>> Hi Maxim.
> >>>>
> >>>> I really appreciate that you've been working on that. Same as you I 
> >>>> would like to see
> >>>> any change that will lead to a git repository.
> >>>>
> >>>> I have couple of questions about the upcoming Cauldron:
> >>>>
> >>>> - Are you planning to attend?
> >>>
> >>> Unfortunately, I won't attend this time.
> >>
> >> I see.
> >>
> >>>
> >>>> - Would it be possible to prepare a voting during e.g. Steering 
> >>>> Committee where
> >>>>  we'll vote about transition options?
> >>>> - Would it make sense to do an online questionnaire in advance in order
> >>>>  to guess what's prevailing opinion?
> >>>>
> >>>> If you are interested, I can help you?
> >>>
> >>> Let's organize an online survey now.  While most active GCC developers 
> >>> will attend Cauldron, many others will not, so we shouldn't rely on 
> >>> Cauldron to make any final decisions.
> >>
> >> Sure, online is the best option as all active community members can vote.
> >>
> >>>
> >>> Martin, would you please organize the survey?
> >>
> >> Yes, but I haven't followed the discussion in recent weeks. Is the only 
> >> question
> >> whether we want the current GIT mirror or your rebased git repository?
> >> Is Eric Raymond's transition still in play or not?
> >
> > 1) Stay with SVN
> > 2) Switch to the existing GIT mirror
> > 3) Wait for ERS to complete his conversion to GIT
> > 4) Use the existing new conversion to GIT fixing authors and commit messages
> > 5) I don't care
> > 6) I don't care as long as we switch to GIT
> >
> >> Are there any other sub-question regarding commit message format, git 
> >> hooks, etc.
> >> that will deserve a place in the questionnaire?
> >
> > No, please do not make it unnecessarily complicated.  Maybe the questionaire
> > can include a free-form text field for more comments.
> >
> > Btw, I do not believe we should do this kind of voting.  Who's eligible to 
> > vote?
> > Is the vote anonymous?  What happens when the majority (what is the 
> > majority?)
> > votes for option N?
> >
> > IMHO voting is bike-shedding.
> >
> > Those who do the work decide.  _They_ may ask questions _and_ decide whether
> > to listen to the answer.
> >
> > Richard.
> >
> >> Thank,
> >> Martin
> >>
> >>>
> >>> Regards,
> >>>
> >>> --
> >>> Maxim Kuvyrkov
> >>> www.linaro.org
> >>>
> >>
>
> So Richi is suggesting that we finish everything necessary for the transition 
> before we vote.
> That should include the Bugzilla reporting script and maybe other git hooks?
> Do we have a checklist of these? Jason?

As far as I can see, the SVN hooks only send email to the *cvs and
gcc-bugzilla lists; that shouldn't be hard to mimic.

I think we also want to decide on policies for creating branches/tags,
deleting refs, or pushing non-fast-forward updates.  In the current
mirror you can delete branches in your own subdirectory, but not other
branches.

Jason


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-14 Thread Segher Boessenkool
On Tue, May 14, 2019 at 07:11:18PM +0300, Maxim Kuvyrkov wrote:
> This patch adds scripts to contrib/ to migrate full history of GCC's
> subversion repository to git.  My hope is that these scripts will
> finally allow GCC project to migrate to Git.

Thank you for doing this.

> The result of the conversion is at
> https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with
> "@rev" suffixes represent branch points.  The conversion is still
> running, so not all branches may appear right away.

What exactly is a branch point here?  Why is it useful to have tags
at branch points?  Why did you make branches instead of tags?


Only very lightly tested so far, but it looks promising.


Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-15 Thread Maxim Kuvyrkov
> On May 15, 2019, at 12:20 AM, Segher Boessenkool  
> wrote:
> 
> On Tue, May 14, 2019 at 07:11:18PM +0300, Maxim Kuvyrkov wrote:
>> This patch adds scripts to contrib/ to migrate full history of GCC's
>> subversion repository to git.  My hope is that these scripts will
>> finally allow GCC project to migrate to Git.
> 
> Thank you for doing this.
> 
>> The result of the conversion is at
>> https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with
>> "@rev" suffixes represent branch points.  The conversion is still
>> running, so not all branches may appear right away.
> 
> What exactly is a branch point here?

A branch point corresponds to the parent branch's revision at the fork.

>  Why is it useful to have tags
> at branch points?

This is to speed up git-svn, which creates and uses such entries internally.  We 
need them for the conversion's internals; I deleted them from the github copy to 
avoid clutter.

>  Why did you make branches instead of tags?

For simplicity; these are internals, after all.

> 
> Only very lightly tested so far, but it looks promising.
> 
> 
> Segher

I've fixed several cleanup bugs.  Updated patch attached.

--
Maxim Kuvyrkov
www.linaro.org




0001-Contrib-SVN-Git-conversion-scripts.patch
Description: Binary data


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-15 Thread Richard Biener
On Tue, May 14, 2019 at 6:11 PM Maxim Kuvyrkov
 wrote:
>
> This patch adds scripts to contrib/ to migrate full history of GCC's 
> subversion repository to git.  My hope is that these scripts will finally 
> allow GCC project to migrate to Git.
>
> The result of the conversion is at 
> https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with "@rev" 
> suffixes represent branch points.  The conversion is still running, so not 
> all branches may appear right away.
>
> The scripts are not specific to GCC repo and are usable for other projects.  
> In particular, they should be able to convert downstream GCC svn repos.
>
> The scripts convert svn history branch by branch.  They rely on git-svn on 
> convert individual branches.  Git-svn is a good tool for converting 
> individual branches.  It is, however, either very slow at converting the 
> entire GCC repo, or goes into infinite loop.
>
> There are 3 scripts:
>
> - svn-git-repo.sh: top level script to convert entire repo or a part of it 
> (e.g., branches/),
> - svn-list-branches.sh: helper script to output branches and their parents in 
> bottom-up order,
> - svn-git-branch.sh: helper script to convert a single branch.
>
> Whenever possible, svn-git-branch.sh uses existing git branches as caches.
>
> What are your questions and comments?

Any comments on how it deals with "errors" like removing trunk which
happened a few times?
(not sure what other "errors" Eric refers to reposurgeon "deals" with...)

I suppose it converts only history of not deleted branches?

For the official converted repo do we really want all (old)
development branches to be in the
main git repo?  I suppose we could create a readonly git from the
state of the whole repository
at the point of conversion (and also keep the SVN in readonly mode),
just to make migration
of content we want easy in the future?

> The attached is cleaned up version, which hasn't been fully tested yet; typos 
> and other silly mistakes are likely.  OK to commit after testing?

Thanks for taking up this ball!

Richard.

> --
> Maxim Kuvyrkov
> www.linaro.org
>
>


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-15 Thread Maxim Kuvyrkov
> On May 15, 2019, at 2:19 PM, Richard Biener  
> wrote:
> 
> On Tue, May 14, 2019 at 6:11 PM Maxim Kuvyrkov
>  wrote:
>> 
>> This patch adds scripts to contrib/ to migrate full history of GCC's 
>> subversion repository to git.  My hope is that these scripts will finally 
>> allow GCC project to migrate to Git.
>> 
>> The result of the conversion is at 
>> https://github.com/maxim-kuvyrkov/gcc/branches/all . Branches with "@rev" 
>> suffixes represent branch points.  The conversion is still running, so not 
>> all branches may appear right away.
>> 
>> The scripts are not specific to GCC repo and are usable for other projects.  
>> In particular, they should be able to convert downstream GCC svn repos.
>> 
>> The scripts convert svn history branch by branch.  They rely on git-svn on 
>> convert individual branches.  Git-svn is a good tool for converting 
>> individual branches.  It is, however, either very slow at converting the 
>> entire GCC repo, or goes into infinite loop.
>> 
>> There are 3 scripts:
>> 
>> - svn-git-repo.sh: top level script to convert entire repo or a part of it 
>> (e.g., branches/),
>> - svn-list-branches.sh: helper script to output branches and their parents 
>> in bottom-up order,
>> - svn-git-branch.sh: helper script to convert a single branch.
>> 
>> Whenever possible, svn-git-branch.sh uses existing git branches as caches.
>> 
>> What are your questions and comments?
> 
> Any comments on how it deals with "errors" like removing trunk which
> happened a few times?
> (not sure what other "errors" Eric refers to reposurgeon "deals" with...)

Stock git-svn can deal with deleted parents; e.g., for the first deletion of 
trunk, git-svn treats trunk@180802 as a /generic/ parent path for trunk, and 
happily follows its history.

> 
> I suppose it converts only history of not deleted branches?

The scripts can convert history of deleted and moved branches.  E.g., 
branches/gcc-3_2-rhl8-branch was moved (which is copy and delete for svn) to 
branches/redhat/gcc-3_2-rhl8-branch around revision 95470, so one would need to 
point the scripts to branches/gcc-3_2-rhl8-branch@95470 to convert its history.  
Something like:

./svn-git-repo.sh --repo $HOME/gcc-branches --svnpath \
  branches/gcc-3_2-rhl8-branch@95470
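
For readers unfamiliar with the path@rev notation used above: it is Subversion's peg-revision syntax, which names a path as it existed at that revision.  Below is a minimal sketch of splitting such an argument in shell.  The helper names svnpath_path and svnpath_rev are hypothetical and are not taken from the posted scripts, which may handle this differently.

```shell
# Hypothetical helpers (not from the posted patch): split an "path@rev"
# argument into the svn path and the peg revision.

# Print the path part; a value without "@" is returned unchanged.
svnpath_path() {
    printf '%s\n' "${1%@*}"
}

# Print the peg revision, or an empty line if none was given.
svnpath_rev() {
    case "$1" in
        *@*) printf '%s\n' "${1##*@}" ;;
        *)   printf '\n' ;;
    esac
}

svnpath_path "branches/gcc-3_2-rhl8-branch@95470"
svnpath_rev  "branches/gcc-3_2-rhl8-branch@95470"
```

With the moved branch from the example, the first call prints the branch path and the second prints the revision at which that path still existed.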

> 
> For the official converted repo do we really want all (old)
> development branches to be in the
> main git repo?  I suppose we could create a readonly git from the
> state of the whole repository
> at the point of conversion (and also keep the SVN in readonly mode),
> just to make migration
> of content we want easy in the future?

Having a single full repo is simpler than having the main repo and the full one 
with all the history.  So, unless full repo is twice the size of the main one, 
let's keep all the branches.

We can also give a shout to representatives of RedHat, Google, and others to 
voluntarily remove their old maintenance branches from the repo, and, possibly, 
stash them somewhere on github.

> 
>> The attached is cleaned up version, which hasn't been fully tested yet; 
>> typos and other silly mistakes are likely.  OK to commit after testing?
> 
> Thanks for taking up this ball!

--
Maxim Kuvyrkov
www.linaro.org






Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-15 Thread Eric Gallager
On 5/15/19, Maxim Kuvyrkov  wrote:
>> On May 15, 2019, at 2:19 PM, Richard Biener 
>> wrote:
>>
>> On Tue, May 14, 2019 at 6:11 PM Maxim Kuvyrkov
>>  wrote:
>>>
>>> This patch adds scripts to contrib/ to migrate full history of GCC's
>>> subversion repository to git.  My hope is that these scripts will finally
>>> allow GCC project to migrate to Git.
>>>
>>> The result of the conversion is at
>>> https://github.com/maxim-kuvyrkov/gcc/branches/all . Branches with "@rev"
>>> suffixes represent branch points.  The conversion is still running, so
>>> not all branches may appear right away.
>>>
>>> The scripts are not specific to GCC repo and are usable for other
>>> projects.  In particular, they should be able to convert downstream GCC
>>> svn repos.
>>>
>>> The scripts convert svn history branch by branch.  They rely on git-svn
>>> on convert individual branches.  Git-svn is a good tool for converting
>>> individual branches.  It is, however, either very slow at converting the
>>> entire GCC repo, or goes into infinite loop.
>>>
>>> There are 3 scripts:
>>>
>>> - svn-git-repo.sh: top level script to convert entire repo or a part of
>>> it (e.g., branches/),
>>> - svn-list-branches.sh: helper script to output branches and their
>>> parents in bottom-up order,
>>> - svn-git-branch.sh: helper script to convert a single branch.
>>>
>>> Whenever possible, svn-git-branch.sh uses existing git branches as
>>> caches.
>>>
>>> What are your questions and comments?
>>
>> Any comments on how it deals with "errors" like removing trunk which
>> happened a few times?
>> (not sure what other "errors" Eric refers to reposurgeon "deals" with...)
>
> Stock git-svn can deal with deleted parents; e.g., for the first deletion of
> trunk, git-svn treats trunk@180802 as a /generic/ parent path for trunk, and
> happily follows its history.
>
>>
>> I suppose it converts only history of not deleted branches?
>
> The scripts can convert history of deleted and moved branches.  E.g.,
> branches/gcc-3_2-rhl8-branch was moved (which is copy and delete for svn) to
> branches/redhat/gcc-3_2-rhl8-branch around revision 95470, so one would need
> to point the scripts to branches/gcc-3_2-rhl8-branch@95470 to convert its
> history.  Something like:
>
> ./svn-git-repo.sh --repo $HOME/gcc-branches --svnpath
> branches/gcc-3_2-rhl8-branch@95470
>
>>
>> For the official converted repo do we really want all (old)
>> development branches to be in the
>> main git repo?  I suppose we could create a readonly git from the
>> state of the whole repository
>> at the point of conversion (and also keep the SVN in readonly mode),
>> just to make migration
>> of content we want easy in the future?
>
> Having a single full repo is simpler than having the main repo and the full
> one with all the history.  So, unless full repo is twice the size of the
> main one, let's keep all the branches.
>
> We can also give a shout to representatives of RedHat, Google, and others to
> voluntarily remove their old maintenance branches from the repo, and,
> possibly, stash them somewhere on github.
>
>>
>>> The attached is cleaned up version, which hasn't been fully tested yet;
>>> typos and other silly mistakes are likely.  OK to commit after testing?
>>
>> Thanks for taking up this ball!
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>

Wasn't Eric S. Raymond working on his own conversion of the GCC repo
from SVN to Git? Whatever happened to his?


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-15 Thread Segher Boessenkool
On Wed, May 15, 2019 at 11:34:34AM +0300, Maxim Kuvyrkov wrote:
> > On May 15, 2019, at 12:20 AM, Segher Boessenkool 
> >  wrote:
> > On Tue, May 14, 2019 at 07:11:18PM +0300, Maxim Kuvyrkov wrote:
> >> This patch adds scripts to contrib/ to migrate full history of GCC's
> >> subversion repository to git.  My hope is that these scripts will
> >> finally allow GCC project to migrate to Git.
> > 
> > Thank you for doing this.
> > 
> >> The result of the conversion is at
> >> https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with
> >> "@rev" suffixes represent branch points.  The conversion is still
> >> running, so not all branches may appear right away.
> > 
> > What exactly is a branch point here?
> 
> Branch point corresponds to parent branch's revision at fork.
> 
> >  Why is it useful to have tags
> > at branch points?
> 
> This is to speedup git-svn, which creates uses such entries internally.  We 
> need them for conversion's internals; I deleted them from github copy to 
> avoid clutter.

Ah!  Great.  Looks better now :-)

Has it finished conversion yet?  I don't see all branches.


Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-15 Thread Paul Koning



> On May 15, 2019, at 2:42 PM, Eric Gallager  wrote:
> 
>> ...
> 
> Wasn't Eric S. Raymond working on his own conversion of the GCC repo
> from SVN to Git? Whatever happened to his?

Yes, and from what I recall he found that doing it fully correctly is an 
extremely hard task.  It might be a good idea to ask him to comment.

paul



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Maxim Kuvyrkov
> On May 15, 2019, at 9:47 PM, Segher Boessenkool  
> wrote:
> 
> On Wed, May 15, 2019 at 11:34:34AM +0300, Maxim Kuvyrkov wrote:
>>> On May 15, 2019, at 12:20 AM, Segher Boessenkool 
>>>  wrote:
>>> On Tue, May 14, 2019 at 07:11:18PM +0300, Maxim Kuvyrkov wrote:
>>>> This patch adds scripts to contrib/ to migrate full history of GCC's
>>>> subversion repository to git.  My hope is that these scripts will
>>>> finally allow GCC project to migrate to Git.
>>> 
>>> Thank you for doing this.
>>> 
>>>> The result of the conversion is at
>>>> https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with
>>>> "@rev" suffixes represent branch points.  The conversion is still
>>>> running, so not all branches may appear right away.
>>> 
>>> What exactly is a branch point here?
>> 
>> Branch point corresponds to parent branch's revision at fork.
>> 
>>> Why is it useful to have tags
>>> at branch points?
>> 
>> This is to speedup git-svn, which creates uses such entries internally.  We 
>> need them for conversion's internals; I deleted them from github copy to 
>> avoid clutter.
> 
> Ah!  Great.  Looks better now :-)
> 
> Has it finished conversion yet?  I don't see all branches.

Still running.  I had to restart it a few times to fix bugs in the corner cases 
and to speed it up.  Luckily, the scripts seem to be able to pick up where they 
left off, so restarts are relatively cheap.

For those interested in fixes and changes between scripts versions, I'm 
uploading updated patches to 
https://review.linaro.org/#/c/toolchain/gcc/+/31416/ .

--
Maxim Kuvyrkov
www.linaro.org



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Maxim Kuvyrkov
> On May 16, 2019, at 3:33 AM, Paul Koning  wrote:
> 
> 
> 
>> On May 15, 2019, at 2:42 PM, Eric Gallager  wrote:
>> 
>>> ...
>> 
>> Wasn't Eric S. Raymond working on his own conversion of the GCC repo
>> from SVN to Git? Whatever happened to his?
> 
> Yes, and from what I recall he found that doing it fully correctly is an 
> extremely hard task.  It might be a good idea to ask him to comment.

That's a good suggestion; thanks, Paul.

Hi Eric,

The svn->git conversion scripts I'm testing work on individual svn branches, 
and I would appreciate a list of svn branches in GCC's repo that caused 
problems.  It would be best to double-check conversion of these branches for 
any artifacts.

Regards,

--
Maxim Kuvyrkov
www.linaro.org



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Jeff Law
On 5/15/19 5:19 AM, Richard Biener wrote:
> 
> For the official converted repo do we really want all (old)
> development branches to be in the
> main git repo?  I suppose we could create a readonly git from the
> state of the whole repository
> at the point of conversion (and also keep the SVN in readonly mode),
> just to make migration
> of content we want easy in the future?
I've always assumed we'd keep the old SVN tree read-only for historical
purposes.  I strongly suspect that, ignoring release branches,
non-active branches just aren't terribly interesting.


Jeff


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Maxim Kuvyrkov
> On May 16, 2019, at 7:22 PM, Jeff Law  wrote:
> 
> On 5/15/19 5:19 AM, Richard Biener wrote:
>> 
>> For the official converted repo do we really want all (old)
>> development branches to be in the
>> main git repo?  I suppose we could create a readonly git from the
>> state of the whole repository
>> at the point of conversion (and also keep the SVN in readonly mode),
>> just to make migration
>> of content we want easy in the future?
> I've always assumed we'd keep the old SVN tree read-only for historical
> purposes.  I strongly suspect that, ignoring release branches, that
> non-active branches just aren't terribly interesting.

Let's avoid mixing the two discussions: (1) converting svn repo to git (and 
getting community consensus to switch to git) and (2) deciding on which 
branches to keep in the new repo.

With git, we can always split away unneeded history by removing unnecessary 
branches and tags and re-packing the repo.  We can equally easily bring that 
history back if we change our minds.
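
As a rough illustration of the split-and-restore idea above, here is a minimal sketch on a throwaway repository.  It assumes the full history is first stashed in a separate bare mirror; the function name, the branch names and the archive location are all made up for the example and do not describe any agreed GCC layout.

```shell
# Sketch only: repository paths and branch names are illustrative.
demo_split_and_restore() (
    set -eu
    work=$(mktemp -d)
    git init -q "$work/main"
    cd "$work/main"
    git symbolic-ref HEAD refs/heads/trunk
    commit() {
        git -c user.name=demo -c user.email=demo@example.com \
            commit -q --allow-empty -m "$1"
    }
    commit "trunk commit"
    git checkout -q -b old-dev-branch
    commit "old branch commit"
    git checkout -q trunk

    # 1. Stash the complete history elsewhere (a bare mirror).
    git clone -q --mirror "$work/main" "$work/archive.git"

    # 2. Split the unneeded branch away and repack.
    git branch -q -D old-dev-branch
    git reflog expire --expire=now --all
    git gc -q --prune=now
    if git rev-parse --verify -q refs/heads/old-dev-branch >/dev/null; then
        exit 1   # the branch should be gone at this point
    fi

    # 3. Bring the history back later from the archive.
    git fetch -q "$work/archive.git" \
        refs/heads/old-dev-branch:refs/heads/old-dev-branch
    git rev-parse --verify -q refs/heads/old-dev-branch >/dev/null
    echo restored
)

demo_split_and_restore
```

The restore step only works because a copy of the removed refs was kept somewhere; without the archive, pruned objects cannot be recovered from the main repository.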

--
Maxim Kuvyrkov
www.linaro.org



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Ramana Radhakrishnan
On Thu, May 16, 2019 at 5:41 PM Maxim Kuvyrkov
 wrote:
>
> > On May 16, 2019, at 7:22 PM, Jeff Law  wrote:
> >
> > On 5/15/19 5:19 AM, Richard Biener wrote:
> >>
> >> For the official converted repo do we really want all (old)
> >> development branches to be in the
> >> main git repo?  I suppose we could create a readonly git from the
> >> state of the whole repository
> >> at the point of conversion (and also keep the SVN in readonly mode),
> >> just to make migration
> >> of content we want easy in the future?
> > I've always assumed we'd keep the old SVN tree read-only for historical
> > purposes.  I strongly suspect that, ignoring release branches, that
> > non-active branches just aren't terribly interesting.
>
> Let's avoid mixing the two discussions: (1) converting svn repo to git (and 
> getting community consensus to switch to git) and (2) deciding on which 
> branches to keep in the new repo.
>

I'm hoping that there is still community consensus to switch to git.

Personally speaking, a +1 to switch to git.

regards
Ramana

> With git, we can always split away unneeded history by removing unnecessary 
> branches and tags and re-packing the repo.  We can equally easily bring that 
> history back if we change our minds.
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Jeff Law
On 5/16/19 12:36 PM, Ramana Radhakrishnan wrote:
> On Thu, May 16, 2019 at 5:41 PM Maxim Kuvyrkov
>  wrote:
>>
>>> On May 16, 2019, at 7:22 PM, Jeff Law  wrote:
>>>
>>> On 5/15/19 5:19 AM, Richard Biener wrote:

>>>> For the official converted repo do we really want all (old)
>>>> development branches to be in the
>>>> main git repo?  I suppose we could create a readonly git from the
>>>> state of the whole repository
>>>> at the point of conversion (and also keep the SVN in readonly mode),
>>>> just to make migration
>>>> of content we want easy in the future?
>>> I've always assumed we'd keep the old SVN tree read-only for historical
>>> purposes.  I strongly suspect that, ignoring release branches, that
>>> non-active branches just aren't terribly interesting.
>>
>> Let's avoid mixing the two discussions: (1) converting svn repo to git (and 
>> getting community consensus to switch to git) and (2) deciding on which 
>> branches to keep in the new repo.
>>
> 
> I'm hoping that there is still community consensus to switch to git.
> 
> Personally speaking, a +1 to switch to git.
Absolutely +1 for converting as well.

jeff


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Jonathan Wakely

On 16/05/19 13:07 -0600, Jeff Law wrote:
> On 5/16/19 12:36 PM, Ramana Radhakrishnan wrote:
>> On Thu, May 16, 2019 at 5:41 PM Maxim Kuvyrkov
>>  wrote:
>>>
>>>> On May 16, 2019, at 7:22 PM, Jeff Law  wrote:
>>>>
>>>> On 5/15/19 5:19 AM, Richard Biener wrote:
>>>>>
>>>>> For the official converted repo do we really want all (old)
>>>>> development branches to be in the
>>>>> main git repo?  I suppose we could create a readonly git from the
>>>>> state of the whole repository
>>>>> at the point of conversion (and also keep the SVN in readonly mode),
>>>>> just to make migration
>>>>> of content we want easy in the future?
>>>> I've always assumed we'd keep the old SVN tree read-only for historical
>>>> purposes.  I strongly suspect that, ignoring release branches, that
>>>> non-active branches just aren't terribly interesting.
>>>
>>> Let's avoid mixing the two discussions: (1) converting svn repo to git (and 
>>> getting community consensus to switch to git) and (2) deciding on which 
>>> branches to keep in the new repo.
>>>
>>
>> I'm hoping that there is still community consensus to switch to git.
>>
>> Personally speaking, a +1 to switch to git.
> Absolutely +1 for converting as well.

Yes please!

Thanks for working on this, Maxim.




Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Joseph Myers
On Tue, 14 May 2019, Maxim Kuvyrkov wrote:

> The scripts convert svn history branch by branch.  They rely on git-svn 
> on convert individual branches.  Git-svn is a good tool for converting 
> individual branches.  It is, however, either very slow at converting the 
> entire GCC repo, or goes into infinite loop.

I think git-svn is in fact a bad tool for repository conversion when the 
history is nontrivial (for the reasons that have been discussed at length 
in the past), and we should convert with reposurgeon.

ESR, can you give an update on the status of the conversion with 
reposurgeon?  You said "another serious attack on the repository 
conversion is probably about two months out" in 
.  Is it on target to be 
done by the time of the GNU Tools Cauldron in Montreal in September?

And, could you bring git://thyrsus.com/repositories/gcc-conversion.git up 
to date with changes since Jan 2018, or push the latest version of that 
repository to some other public hosting location?  That repository 
represents what I consider the collaboratively built consensus on such 
things as the desired author map (including handling of the ambiguous 
author name), which directories represent branches and tags, and what tags 
should be kept or removed - but building up such a consensus and keeping 
it up to date over time (for new committers etc.) requires that the public 
repository actually reflects the latest version of the conversion 
machinery, day by day as the consensus develops.  Review of that 
repository will be important for reviewing the details of whether the 
conversion is being done as desired - the details of the machinery will 
help suggest things to spot-check in a converted repository.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-16 Thread Joseph Myers
On Thu, 16 May 2019, Maxim Kuvyrkov wrote:

> Let's avoid mixing the two discussions: (1) converting svn repo to git 
> (and getting community consensus to switch to git) and (2) deciding on 
> which branches to keep in the new repo.
> 
> With git, we can always split away unneeded history by removing 
> unnecessary branches and tags and re-packing the repo.  We can equally 
> easily bring that history back if we change our minds.

A prerequisite of a move to git is to have policies on branch deletion / 
force-pushes, and hook implementations that ensure those policies are 
followed (as well as implementing what's agreed on commit messages, 
Bugzilla updates, etc.).  There has of course been a lot of past 
discussion of those that someone will need to find, read and describe the 
issues and conclusions from.  I think there was a view that branch 
deletion and force-pushes should be limited to a particular namespace for 
user branches.
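
To make the kind of policy described above concrete, here is a minimal sketch of the decision an "update" hook could take.  Everything in it is an assumption for illustration: the refs/heads/users/ namespace, the function name update_allowed, and the choice to allow ref creation everywhere are not agreed policy.  A real hook would derive the fast-forward status itself, for example with `git merge-base --is-ancestor "$old" "$new"`.

```shell
# Sketch of a ref-update policy; namespace and names are hypothetical.
ZERO=0000000000000000000000000000000000000000   # all-zero id: "no object"

# update_allowed REF OLD NEW KIND
#   KIND is "ff" for a fast-forward update and "nonff" otherwise.
# Returns 0 if the update is allowed, non-zero if it should be rejected.
update_allowed() {
    ref=$1 old=$2 new=$3 kind=$4
    case "$ref" in
        refs/heads/users/*) return 0 ;;   # user namespace: no restrictions
    esac
    [ "$new" = "$ZERO" ] && return 1      # deletion outside the user namespace
    [ "$old" = "$ZERO" ] && return 0      # creation (assumed allowed here)
    [ "$kind" = "ff" ]                    # otherwise only fast-forwards
}

if update_allowed refs/heads/trunk abc123 def456 nonff; then
    echo "allowed"
else
    echo "rejected"
fi
```

Keeping the decision in one small function makes it easy to review the policy separately from the plumbing that a server-side hook needs.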

(I support a move to git, but not one using git-svn, and only one that 
properly takes into account the large amount of work previously done on 
author maps, understanding the repository peculiarities and how to 
correctly identify exactly which directories are branches or tags, fixing 
cases where there are both a branch and tag of the same name, identifying 
which tags to remove and which to keep, etc.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-17 Thread Richard Sandiford
Joseph Myers  writes:
> On Thu, 16 May 2019, Maxim Kuvyrkov wrote:
>
>> Let's avoid mixing the two discussions: (1) converting svn repo to git 
>> (and getting community consensus to switch to git) and (2) deciding on 
>> which branches to keep in the new repo.
>> 
>> With git, we can always split away unneeded history by removing 
>> unnecessary branches and tags and re-packing the repo.  We can equally 
>> easily bring that history back if we change our minds.
>
> A prerequisite of a move to git is to have policies on branch deletion / 
> force-pushes, and hook implementations that ensure those policies are 
> followed (as well as implementing what's agreed on commit messages, 
> Bugzilla updates, etc.).  There has of course been a lot of past 
> discussion of those that someone will need to find, read and describe the 
> issues and conclusions from.  I think there was a view that branch 
> deletion and force-pushes should be limited to a particular namespace for 
> user branches.

We're not starting from scratch on that though.  The public git
(semi-)mirror has been going for a long time, so IMO we should just
inherit the policies for that.  (Like you say, forced pushes are
restricted to the user namespace.)  Policies can evolve over time :-)

Agreeing on a format for commit messages would be good, but IMO it's
a separate improvement to the repo discussion.  We don't have an agreed
format for SVN commit messages either, and although it's not ideal,
it doesn't make SVN unworkable.  The same would be true for git.
Whatever policy we come up with can't apply retrospectively anyway,
so the full git history is always going to have a mixture of styles.

And I think that's the major downside of putting so many barriers
in the way of the conversion.  Switching to git without new commit
message guidelines might not be perfect, but if we'd done it two years
ago anyway, people would have been committing (mostly) git-friendly
commits since then, even if the messages weren't very consistent.
Whereas at the moment, many commit messages are neither git-friendly
nor consistent.  And that's going to continue to be the case until
we switch.

So although the intention of these requirements seems to be to make the
final git history as good as it can be, I think in practice it's having
the opposite effect.

> (I support a move to git, but not one using git-svn, and only one that 
> properly takes into account the large amount of work previously done on 
> author maps, understanding the repository peculiarities and how to 
> correctly identify exactly which directories are branches or tags, fixing 
> cases where there are both a branch and tag of the same name, identifying 
> which tags to remove and which to keep, etc.)

But the discussion upthread seemed to be that having the very old stuff
in git wasn't necessarily that important anyway.

FWIW, I've been using the "official" git-svn based mirror for at least
the last five years, only using SVN to actually commit.  And I've never
needed any of the above during that time.

E.g. having proper author names seems like a nice-to-have rather than
a requirement.  A lot of the effort spent on compiling that list seemed
to be getting names and email addresses for people who haven't contributed
to gcc for a long time (in some cases 20 years or more).  It's interesting
historical data, but in almost all cases, the email addresses used are
going to be defunct anyway.

It would be a really neat project to create a GCC git repo that goes
far back in time and gives the closest illusion possible that git had
been used all that time.  And personally I'd be very interested in
seeing that.  But its main use would be as a historical artefact,
to show how a long-running software project evolved over time.

I think the focus for the development git repo should be on what's
needed for day-to-day work, and like I say, the git-svn mirror we
have now is in practice a good enough conversion for that.  If we can
do better then great.  But I think we're in serious danger of making the
best the enemy of the good here.

The big advantage of Maxim's approach is that it's a public script in
our own repo that anyone can contribute to.  So if there are specific
tweaks people want to make, there's now the opportunity to do that.

So FWIW, my vote would be for having a window to allow people to tweak
the script if they want to, then make the switch.

Thanks,
Richard


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-17 Thread Martin Liška
On 5/17/19 12:04 AM, Jonathan Wakely wrote:
> On 16/05/19 13:07 -0600, Jeff Law wrote:
>> On 5/16/19 12:36 PM, Ramana Radhakrishnan wrote:
>>> On Thu, May 16, 2019 at 5:41 PM Maxim Kuvyrkov
>>>  wrote:

>>>>> On May 16, 2019, at 7:22 PM, Jeff Law  wrote:
>>>>>
>>>>> On 5/15/19 5:19 AM, Richard Biener wrote:
>>>>>>
>>>>>> For the official converted repo do we really want all (old)
>>>>>> development branches to be in the
>>>>>> main git repo?  I suppose we could create a readonly git from the
>>>>>> state of the whole repository
>>>>>> at the point of conversion (and also keep the SVN in readonly mode),
>>>>>> just to make migration
>>>>>> of content we want easy in the future?
>>>>> I've always assumed we'd keep the old SVN tree read-only for historical
>>>>> purposes.  I strongly suspect that, ignoring release branches, that
>>>>> non-active branches just aren't terribly interesting.
>>>>
>>>> Let's avoid mixing the two discussions: (1) converting svn repo to git 
>>>> (and getting community consensus to switch to git) and (2) deciding on 
>>>> which branches to keep in the new repo.

>>>
>>> I'm hoping that there is still community consensus to switch to git.
>>>
>>> Personally speaking, a +1 to switch to git.
>> Absolutely +1 for converting as well.
> 
> Yes please!
> 
> Thanks for working on this, Maxim.
> 
> 

I fully support that and thank you Maxim for working on that!

Martin


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-17 Thread Martin Liška
On 5/17/19 1:06 AM, Joseph Myers wrote:
> That repository 
> represents what I consider the collaboratively built consensus on such 
> things as the desired author map (including handling of the ambiguous 
> author name), which directories represent branches and tags, and what tags 
> should be kept or removed - but building up such a consensus and keeping 

About the map. I agree with Richard that we should take a best-effort approach
and not try to fully reconstruct the history of people who have switched email
addresses multiple times. I cloned
git://thyrsus.com/repositories/gcc-conversion.git and made a cleanup:

- for logins with duplicate emails I chose the latest one used on the gcc-patches 
mailing list
- comments were removed
- a few entries contained timezone and I stripped that

Final version of the map can be seen here:
https://github.com/marxin/gcc-git-conversion/blob/cleanup/gcc.map

@Maxim: would it be possible to update your script so that it will use:
--authors-file=gcc.map ?

Is it desired for the transition to use the author map? Do we want it?
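
A minimal sketch of the mechanical part of the cleanup listed above (dropping
comments and stripping time zones), assuming the map uses lines of the form
"login = Full Name <email> [Timezone]" with "#" comment lines.  The function
name clean_author_map and both sample entries are made up for illustration;
choosing between duplicate addresses for one login still has to be done by
hand.

```shell
#!/bin/sh
# Sketch: reduce an author map to the plain "login = Full Name <email>"
# form that git-svn reads via --authors-file.
# Assumptions: comment lines start with '#', and anything after the
# closing '>' (for example a time zone) can be dropped.

clean_author_map() {
    sed -e '/^[[:space:]]*#/d' \
        -e '/^[[:space:]]*$/d' \
        -e 's/^\([^=]*=[^<]*<[^>]*>\).*$/\1/'
}

# Hypothetical sample input; real entries would come from gcc.map.
clean_author_map <<'EOF'
# sample entries, not real committers
jdoe = Jane Doe <jdoe@example.org> Europe/Prague

rroe = Richard Roe <rroe@example.com>
EOF
```

The cleaned output could then be saved as gcc.map and passed to git-svn
with --authors-file=gcc.map.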

Martin



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-17 Thread Jakub Jelinek
On Fri, May 17, 2019 at 02:22:47PM +0200, Martin Liška wrote:
> On 5/17/19 1:06 AM, Joseph Myers wrote:
> > That repository 
> > represents what I consider the collaboratively built consensus on such 
> > things as the desired author map (including handling of the ambiguous 
> > author name), which directories represent branches and tags, and what tags 
> > should be kept or removed - but building up such a consensus and keeping 
> 
> About the map. I agree with Richard that we should do best approach and not
> to fully reconstruct history of people who has switched email address multi
> times. I cloned git://thyrsus.com/repositories/gcc-conversion.git and made
> a clean up:
> 
> - for logins with duplicite emails I chose the latest one used on gcc-patches 
> mailing list
> - comments were removed
> - a few entries contained timezone and I stripped that
> 
> Final version of the map can be seen here:
> https://github.com/marxin/gcc-git-conversion/blob/cleanup/gcc.map
> 
> @Maxim: would it be possible to update your script so that it will use:
> --authors-file=gcc.map ?
> 
> Is it desired for the transition to use the author map? Do we want it?

Can people proposing the conversion also come up with the precommit hooks
etc. scripts we'll need?
I'd think we want to enforce linear history (and stress that every commit
should be bootstrappable, with git it is much easier to screw that up by
pushing many git commits at once, even with rebase actually not testing each
of them).
And something to keep the numeric commit numbers working for
http://gcc.gnu.org/rNN (I believe a roughly working scheme has been
identified, but not implemented).

Jakub


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-17 Thread Jason Merrill
On Tue, May 14, 2019 at 12:11 PM Maxim Kuvyrkov
 wrote:
>
> This patch adds scripts to contrib/ to migrate full history of GCC's 
> subversion repository to git.  My hope is that these scripts will finally 
> allow GCC project to migrate to Git.

Thanks for looking into this.  My feeling has been that, if we give up
on reposurgeon, there's no need to start a new conversion at all: we
can just switch the current mirror over to being the primary
repository with some minor surgery (e.g. using git filter-branch to
fix subdirectory branches).  That approach will produce the least
disruption in the workflows of people already using it.  Do you see a
problem with this?

Jason


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-17 Thread Maxim Kuvyrkov
> On May 17, 2019, at 2:06 AM, Joseph Myers  wrote:
> 
> On Tue, 14 May 2019, Maxim Kuvyrkov wrote:
> 
>> The scripts convert svn history branch by branch.  They rely on git-svn 
>> on convert individual branches.  Git-svn is a good tool for converting 
>> individual branches.  It is, however, either very slow at converting the 
>> entire GCC repo, or goes into infinite loop.
> 
> I think git-svn is in fact a bad tool for repository conversion when the 
> history is nontrivial (for the reasons that have been discussed at length 
> in the past),

I agree with this.  However, with git we don't need to force ourselves to
convert the whole history in one go; git-svn seems to do a good job of
converting one branch at a time.

> and we should convert with reposurgeon.

If reposurgeon works -- great, let's convert with it.  It's good to have two
independent tools so that we can compare and sanity check the results; off the
top of my head:
1. number of merges should match on all branches,
2. changed files should match for all revisions.

What I want to avoid is delaying the switch to git.

> 
> ESR, can you give an update on the status of the conversion with 
> reposurgeon?  You said "another serious attack on the repository 
> conversion is probably about two months out" in 
> .  Is it on target to be 
> done by the time of the GNU Tools Cauldron in Montreal in September?
> 
> And, could you bring git://thyrsus.com/repositories/gcc-conversion.git up 
> to date with changes since Jan 2018, or push the latest version of that 
> repository to some other public hosting location?  That repository 
> represents what I consider the collaboratively built consensus on such 
> things as the desired author map (including handling of the ambiguous 
> author name), which directories represent branches and tags, and what tags 
> should be kept or removed - but building up such a consensus and keeping 
> it up to date over time (for new committers etc.) requires that the public 
> repository actually reflects the latest version of the conversion 
> machinery, day by day as the consensus develops.  Review of that 
> repository will be important for reviewing the details of whether the 
> conversion is being done as desired - the details of the machinery will 
> help suggest things to spot-check in a converted repository.



--
Maxim Kuvyrkov
www.linaro.org



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-17 Thread Maxim Kuvyrkov
> On May 17, 2019, at 3:22 PM, Martin Liška  wrote:
> 
> On 5/17/19 1:06 AM, Joseph Myers wrote:
>> That repository 
>> represents what I consider the collaboratively built consensus on such 
>> things as the desired author map (including handling of the ambiguous 
>> author name), which directories represent branches and tags, and what tags 
>> should be kept or removed - but building up such a consensus and keeping 
> 
> About the map. I agree with Richard that we should do best approach and not
> to fully reconstruct history of people who has switched email address multi
> times. I cloned git://thyrsus.com/repositories/gcc-conversion.git and made
> a clean up:
> 
> - for logins with duplicite emails I chose the latest one used on gcc-patches 
> mailing list
> - comments were removed
> - a few entries contained timezone and I stripped that
> 
> Final version of the map can be seen here:
> https://github.com/marxin/gcc-git-conversion/blob/cleanup/gcc.map
> 
> @Maxim: would it be possible to update your script so that it will use:
> --authors-file=gcc.map ?

Should not be a problem.  I'll try that.

> 
> Is it desired for the transition to use the author map? Do we want it?

IIUC, the downside is that converted repo will not match current git mirror 
unless we do log re-writing, which would add extra info on the side.

--
Maxim Kuvyrkov
www.linaro.org



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-17 Thread Maxim Kuvyrkov
> On May 17, 2019, at 4:07 PM, Jason Merrill  wrote:
> 
> On Tue, May 14, 2019 at 12:11 PM Maxim Kuvyrkov
>  wrote:
>> 
>> This patch adds scripts to contrib/ to migrate full history of GCC's 
>> subversion repository to git.  My hope is that these scripts will finally 
>> allow GCC project to migrate to Git.
> 
> Thanks for looking into this.  My feeling has been that, if we give up
> on reposurgeon, there's no need to start a new conversion at all: we
> can just switch the current mirror over to being the primary
> repository with some minor surgery (e.g. using git filter-branch to
> fix subdirectory branches).  That approach will produce the least
> disruption in the workflows of people already using it.  Do you see a
> problem with this?

No technical problem.  The scripts start with the existing git mirror and only
convert the parts that are not present in it.  FWIW, the scripts can start with
a bare repo, but that would take longer.

It is a good idea to run a test conversion without using existing mirror as 
cache to confirm that there are no discrepancies in repos produced by old and 
new versions of git-svn.

However, if the community decides that we want to use author maps, then, IIUC,
the new repo would not be compatible with the existing mirror.

--
Maxim Kuvyrkov
www.linaro.org



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-17 Thread Segher Boessenkool
On Fri, May 17, 2019 at 09:18:53AM +0100, Richard Sandiford wrote:
> Joseph Myers  writes:
> > On Thu, 16 May 2019, Maxim Kuvyrkov wrote:
> >> With git, we can always split away unneeded history by removing 
> >> unnecessary branches and tags and re-packing the repo.  We can equally 
> >> easily bring that history back if we change our minds.

Only if we have stored it somewhere!  But we all agree we will never
delete any (non-user) branches I think.

> > (I support a move to git, but not one using git-svn, and only one that 
> > properly takes into account the large amount of work previously done on 
> > author maps, understanding the repository peculiarities and how to 

I am very much **against** most of that author map stuff.  It is falsifying
history.  Replacing the sourceware account name of people by one of their
current email addresses is just foolish anyway.

In many cases anyone can trivially glean the correct info from the
changelogs.  In some other cases it is hard or impossible to find the
correct info.

The information we have now in SVN is what we have there now.  We should
convert *that* information to Git, nothing more, nothing less.  And that
is very much possible to do, not a gargantuan task.

> > correctly identify exactly which directories are branches or tags, fixing 
> > cases where there are both a branch and tag of the same name, identifying 
> > which tags to remove and which to keep, etc.)

Yes -- one of the problems I have with the current git-svn mirror is that
it doesn't have any of the SVN branches under ibm/ as separate Git branches.
It looks like Maxim's scripts will handle this; the conversion hasn't
reached those branches yet though.  Soon :-)

> FWIW, I've been using the "official" git-svn based mirror for at least
> the last five years, only using SVN to actually commit.  And I've never
> needed any of the above during that time.

I do look through all history pretty often.  The current mirror is good
enough for most of that, and there is no way to get the rest back AFAIK.

> It would be a really neat project to create a GCC git repo that goes
> far back in time and gives the closest illusion possible that git had
> been used all that time.  And personally I'd be very interested in
> seeing that.  But its main use would be as a historical artefact,
> to show how a long-running software project evolved over time.

Agreed.

> I think the focus for the development git repo should be on what's
> needed for day-to-day work, and like I say, the git-svn mirror we
> have now is in practice a good enough conversion for that.  If we can
> do better then great.  But I think we're in serious danger of making the
> best the enemy of the good here.

Yes.  There are a few things that should be fixed though, like that branches
vs. tags thing, and the subdir branches problem.

> The big advantage of Maxim's approach is that it's a public script in
> our own repo that anyone can contribute to.  So if there are specific
> tweaks people want to make, there's now the opportunity to do that.
> 
> So FWIW, my vote would be for having a window to allow people to tweak
> the script if they want to, then make the switch.

I agree.


Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-17 Thread Steve Ellcey
I hope this isn't too much of a thread drift, but I was wondering whether,
after the Git conversion, the GCC repo will look like a 'normal' Git
repo with the main line sources on the master branch?

I.e. right now instead of a simple clone, the GCC Wiki says to use a
sequence of git init/config/fetch/checkout commands.  After the
conversion will we be able to just use 'git clone'?  And will the
default master branch be the usable GCC top-of-tree sources (vs the
trunk branch) that we can do checkins to?

Steve Ellcey
sell...@marvell.com


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-17 Thread Jason Merrill
On Fri, May 17, 2019 at 3:51 PM Segher Boessenkool
 wrote:
> Yes -- one of the problems I have with the current git-svn mirror is that
> it doesn't have any of the SVN branches under ibm/ as separate Git branches.
> It looks like Maxim's scripts will handle this; the conversion hasn't
> reached those branches yet though.  Soon :-)

I talk about how to rewrite subdirectory branches without doing the
slow git-svn checkout in

https://gcc.gnu.org/wiki/GitMirror#Subdirectory_branches

Jason


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-19 Thread Martin Liška

On 5/17/19 4:59 PM, Maxim Kuvyrkov wrote:
>> On May 17, 2019, at 3:22 PM, Martin Liška  wrote:
>>
>> On 5/17/19 1:06 AM, Joseph Myers wrote:
>>> That repository
>>> represents what I consider the collaboratively built consensus on such
>>> things as the desired author map (including handling of the ambiguous
>>> author name), which directories represent branches and tags, and what tags
>>> should be kept or removed - but building up such a consensus and keeping
>>
>> About the map. I agree with Richard that we should do best approach and not
>> to fully reconstruct history of people who has switched email address multi
>> times. I cloned git://thyrsus.com/repositories/gcc-conversion.git and made
>> a clean up:
>>
>> - for logins with duplicite emails I chose the latest one used on gcc-patches
>> mailing list
>> - comments were removed
>> - a few entries contained timezone and I stripped that
>>
>> Final version of the map can be seen here:
>> https://github.com/marxin/gcc-git-conversion/blob/cleanup/gcc.map
>>
>> @Maxim: would it be possible to update your script so that it will use:
>> --authors-file=gcc.map ?
>
> Should not be a problem.  I'll try that.
>
>> Is it desired for the transition to use the author map? Do we want it?
>
> IIUC, the downside is that converted repo will not match current git mirror
> unless we do log re-writing, which would add extra info on the side.

Just to be clear: I don't insist on the authors map and I see @Segher is
strongly against (@Richard probably as well).
I'm just saying that we have a pretty complete authors map and we can freely
decide whether to use it or not (with all pros and cons).

Martin

> --
> Maxim Kuvyrkov
> www.linaro.org


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-19 Thread Martin Liška

On 5/17/19 2:39 PM, Jakub Jelinek wrote:
> On Fri, May 17, 2019 at 02:22:47PM +0200, Martin Liška wrote:
>> On 5/17/19 1:06 AM, Joseph Myers wrote:
>>> That repository
>>> represents what I consider the collaboratively built consensus on such
>>> things as the desired author map (including handling of the ambiguous
>>> author name), which directories represent branches and tags, and what tags
>>> should be kept or removed - but building up such a consensus and keeping
>>
>> About the map. I agree with Richard that we should do best approach and not
>> to fully reconstruct history of people who has switched email address multi
>> times. I cloned git://thyrsus.com/repositories/gcc-conversion.git and made
>> a clean up:
>>
>> - for logins with duplicite emails I chose the latest one used on gcc-patches
>> mailing list
>> - comments were removed
>> - a few entries contained timezone and I stripped that
>>
>> Final version of the map can be seen here:
>> https://github.com/marxin/gcc-git-conversion/blob/cleanup/gcc.map
>>
>> @Maxim: would it be possible to update your script so that it will use:
>> --authors-file=gcc.map ?
>>
>> Is it desired for the transition to use the author map? Do we want it?
>
> Can people proposing the conversion also come up with the precommit hooks
> etc. scripts we'll need?

Can you please point to a discussion where these were mentioned?
I'm aware of a 'no-merge-commits' hook and a hook that will paste the commit
message into Bugzilla entries.

> I'd think we want to enforce linear history (and stress that every commit
> should be bootstrappable, with git it is much easier to screw that up by
> pushing many git commits at once, even with rebase actually not testing each
> of them).
> And something to keep the numeric commit numbers working for
> http://gcc.gnu.org/rNN (I believe a roughly working scheme has been
> identified, but not implemented).

Do we really need integer commit numbers after the transition? I know we're
used to them.
But the git commit hash provides the same.

Martin

>
> Jakub


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-19 Thread Segher Boessenkool
On Sun, May 19, 2019 at 09:35:45AM +0200, Martin Liška wrote:
> Do we really need a commit integer numbers after the transition? I know 
> we're used to it.
> But git commit hash provides that same.

Revision numbers are nice short text strings, and from a revision number
you can see approximately when it happened, and from two revision numbers
on the same branch you can trivially tell which one is older.  Those are
nice features.  But we can live without it, IMO.


Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-19 Thread Marek Polacek
On Sun, May 19, 2019 at 03:11:08AM -0500, Segher Boessenkool wrote:
> On Sun, May 19, 2019 at 09:35:45AM +0200, Martin Liška wrote:
> > Do we really need a commit integer numbers after the transition? I know 
> > we're used to it.
> > But git commit hash provides that same.
> 
> Revision numbers are nice short text strings, and from a revision number
> you can see approximately when it happened, and from two revision numbers
> on the same branch you can trivially tell which one is older.  Those are
> nice features.  But we can live without it, IMO.

Since I do many bisections a day, losing this capability would be Very Bad.
Without it, there's no range, and without a range, there's nothing to _bisect_.

I bisect by hand, so if I have cc1plus.25 (good) and cc1plus.26 (bad),
I know the commit I'm looking for is within that range, and I can easily split
the range, and it's at most log n steps.  Whereas if we had e.g. cc1plus.de28b0
and cc1plus.a9bd4d, I couldn't do it anymore.
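
The manual procedure described above amounts to a binary search over an
ordered range of revision numbers.  A minimal sketch, assuming integer
revision numbers and a hypothetical predicate is_bad that would normally run
the prebuilt binary for a given revision on the failing test case (here it is
a stand-in that pretends the regression appeared in revision 1234):

```shell
#!/bin/sh
# Sketch: bisect an ordered range of integer revision numbers.
# Assumptions: revision $1 is known good, revision $2 is known bad, and
# is_bad is a made-up stand-in for "run cc1plus.<rev> and check the result".

is_bad() {
    # Pretend the regression was introduced in revision 1234.
    [ "$1" -ge 1234 ]
}

bisect_range() {
    good=$1
    bad=$2
    # Invariant: $good passes and $bad fails; stop once they are adjacent.
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if is_bad "$mid"; then
            bad=$mid
        else
            good=$mid
        fi
    done
    # $bad is now the first failing revision.
    echo "$bad"
}

bisect_range 1000 2000   # prints 1234
```

The search only needs a total order on the names of the stored binaries,
which is what plain commit hashes lack unless they are mapped back to a
position in the history.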

--
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-19 Thread Andreas Schwab
On May 19 2019, Marek Polacek  wrote:

> On Sun, May 19, 2019 at 03:11:08AM -0500, Segher Boessenkool wrote:
>> On Sun, May 19, 2019 at 09:35:45AM +0200, Martin Liška wrote:
>> > Do we really need a commit integer numbers after the transition? I know 
>> > we're used to it.
>> > But git commit hash provides that same.
>> 
>> Revision numbers are nice short text strings, and from a revision number
>> you can see approximately when it happened, and from two revision numbers
>> on the same branch you can trivially tell which one is older.  Those are
>> nice features.  But we can live without it, IMO.
>
> Since I do many bisections a day, losing this capability would be Very Bad.
> Without it, there's no range, and without a range, there's nothing to 
> _bisect_.

What's wrong with git bisect?  It does everything you need.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-19 Thread Segher Boessenkool
On Sun, May 19, 2019 at 03:21:01PM -0400, Marek Polacek wrote:
> On Sun, May 19, 2019 at 03:11:08AM -0500, Segher Boessenkool wrote:
> > On Sun, May 19, 2019 at 09:35:45AM +0200, Martin Liška wrote:
> > > Do we really need a commit integer numbers after the transition? I know 
> > > we're used to it.
> > > But git commit hash provides that same.
> > 
> > Revision numbers are nice short text strings, and from a revision number
> > you can see approximately when it happened, and from two revision numbers
> > on the same branch you can trivially tell which one is older.  Those are
> > nice features.  But we can live without it, IMO.
> 
> Since I do many bisections a day, losing this capability would be Very Bad.
> Without it, there's no range, and without a range, there's nothing to 
> _bisect_.
> 
> I bisect by hand, so if I have cc1plus.25 (good) and cc1plus.26 (bad),
> I know the commit I'm looking for is within that range, and I can easily split
> the range, and it's at most log n steps.  Whereas if we had e.g. 
> cc1plus.de28b0
> and cc1plus.a9bd4d, I couldn't do it anymore.

Git can bisect automatically just fine; there is no upside to doing things
manually.  In git there are various handy ways of referring to commits; you
can say  master@{3 days ago}  for example, or zut~31  to get the 31st
commit back on branch "zut", etc.  See "man gitrevisions".


Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-19 Thread Andrew Pinski
On Sun, May 19, 2019 at 12:54 PM Segher Boessenkool
 wrote:
>
> On Sun, May 19, 2019 at 03:21:01PM -0400, Marek Polacek wrote:
> > On Sun, May 19, 2019 at 03:11:08AM -0500, Segher Boessenkool wrote:
> > > On Sun, May 19, 2019 at 09:35:45AM +0200, Martin Liška wrote:
> > > > Do we really need a commit integer numbers after the transition? I know
> > > > we're used to it.
> > > > But git commit hash provides that same.
> > >
> > > Revision numbers are nice short text strings, and from a revision number
> > > you can see approximately when it happened, and from two revision numbers
> > > on the same branch you can trivially tell which one is older.  Those are
> > > nice features.  But we can live without it, IMO.
> >
> > Since I do many bisections a day, losing this capability would be Very Bad.
> > Without it, there's no range, and without a range, there's nothing to 
> > _bisect_.
> >
> > I bisect by hand, so if I have cc1plus.25 (good) and cc1plus.26 
> > (bad),
> > I know the commit I'm looking for is within that range, and I can easily 
> > split
> > the range, and it's at most log n steps.  Whereas if we had e.g. 
> > cc1plus.de28b0
> > and cc1plus.a9bd4d, I couldn't do it anymore.
>
> Git can bisect automatically just fine, there is no upside to doing things
> manually.  In git there are various handy ways of referring to commits; you
> can say  master@{3 days ago}  for example, or zut@{31}  to get the 31st
> commit back on branch "zut", etc.  See "man gitrevisions".

Well, one thing is if you have prebuilt cc1/cc1plus binaries.  So it is not
really doing a manual bisect per se, but rather doing a manual
bisect using prebuilt binaries and knowing which one comes before
which one.
One way is to store the binaries based on the date that the commit happened
instead.  This is a bit more complex but still doable.

Thanks,
Andrew Pinski

>
>
> Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-19 Thread Marek Polacek
On Sun, May 19, 2019 at 01:00:45PM -0700, Andrew Pinski wrote:
> On Sun, May 19, 2019 at 12:54 PM Segher Boessenkool
>  wrote:
> >
> > On Sun, May 19, 2019 at 03:21:01PM -0400, Marek Polacek wrote:
> > > On Sun, May 19, 2019 at 03:11:08AM -0500, Segher Boessenkool wrote:
> > > > On Sun, May 19, 2019 at 09:35:45AM +0200, Martin Liška wrote:
> > > > > Do we really need a commit integer numbers after the transition? I 
> > > > > know
> > > > > we're used to it.
> > > > > But git commit hash provides that same.
> > > >
> > > > Revision numbers are nice short text strings, and from a revision number
> > > > you can see approximately when it happened, and from two revision 
> > > > numbers
> > > > on the same branch you can trivially tell which one is older.  Those are
> > > > nice features.  But we can live without it, IMO.
> > >
> > > Since I do many bisections a day, losing this capability would be Very 
> > > Bad.
> > > Without it, there's no range, and without a range, there's nothing to 
> > > _bisect_.
> > >
> > > I bisect by hand, so if I have cc1plus.25 (good) and cc1plus.26 
> > > (bad),
> > > I know the commit I'm looking for is within that range, and I can easily 
> > > split
> > > the range, and it's at most log n steps.  Whereas if we had e.g. 
> > > cc1plus.de28b0
> > > and cc1plus.a9bd4d, I couldn't do it anymore.
> >
> > Git can bisect automatically just fine, there is no upside to doing things
> > manually.  In git there are various handy ways of referring to commits; you
> > can say  master@{3 days ago}  for example, or zut@{31}  to get the 31st
> > commit back on branch "zut", etc.  See "man gitrevisions".
> 
> Well one thing is if you have prebuilt cc1/cc1plus.  So it is not
> really doing a manual bisect per-say but rather it is doing a manual
> bisect using prebuilt binaries and knowing which one comes before
> which one.

Exactly, we have many TBs of prebuilt binaries.

> One way is store the binaries based on the date that commit happened
> instead.  This is a bit more complex but still doable.

Yeah, I guess we'll have to do something like that, then.  :/

--
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-20 Thread Martin Liška
On 5/19/19 10:06 PM, Marek Polacek wrote:
> On Sun, May 19, 2019 at 01:00:45PM -0700, Andrew Pinski wrote:
>> On Sun, May 19, 2019 at 12:54 PM Segher Boessenkool
>>  wrote:
>>>
>>> On Sun, May 19, 2019 at 03:21:01PM -0400, Marek Polacek wrote:
>>>> On Sun, May 19, 2019 at 03:11:08AM -0500, Segher Boessenkool wrote:
>>>>> On Sun, May 19, 2019 at 09:35:45AM +0200, Martin Liška wrote:
>>>>>> Do we really need a commit integer numbers after the transition? I know
>>>>>> we're used to it.
>>>>>> But git commit hash provides that same.
>>>>>
>>>>> Revision numbers are nice short text strings, and from a revision number
>>>>> you can see approximately when it happened, and from two revision numbers
>>>>> on the same branch you can trivially tell which one is older.  Those are
>>>>> nice features.  But we can live without it, IMO.
>>>>
>>>> Since I do many bisections a day, losing this capability would be Very Bad.
>>>> Without it, there's no range, and without a range, there's nothing to
>>>> _bisect_.
>>>>
>>>> I bisect by hand, so if I have cc1plus.25 (good) and cc1plus.26 (bad),
>>>> I know the commit I'm looking for is within that range, and I can easily
>>>> split the range, and it's at most log n steps.  Whereas if we had e.g.
>>>> cc1plus.de28b0 and cc1plus.a9bd4d, I couldn't do it anymore.
>>>
>>> Git can bisect automatically just fine, there is no upside to doing things
>>> manually.  In git there are various handy ways of referring to commits; you
>>> can say  master@{3 days ago}  for example, or zut@{31}  to get the 31st
>>> commit back on branch "zut", etc.  See "man gitrevisions".
>>
>> Well one thing is if you have prebuilt cc1/cc1plus.  So it is not
>> really doing a manual bisect per-say but rather it is doing a manual
>> bisect using prebuilt binaries and knowing which one comes before
>> which one.
> 
> Exactly, we have many TBs of prebuilt binaries.

I combine both; feel free to use my script:
https://github.com/marxin/script-misc/blob/master/gcc-bisect.py

It uses the git repository for navigation, information about branches, release
tags and so on. I also have a folder with pre-built binaries, which are
identified by commit hash. That all works fine.

Martin

> 
>> One way is store the binaries based on the date that commit happened
>> instead.  This is a bit more complex but still doable.
> 
> Yeah, I guess we'll have to do something like that, then.  :/
> 
> --
> Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA
> 



Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-20 Thread Florian Weimer
* Andrew Pinski:

> On Sun, May 19, 2019 at 12:54 PM Segher Boessenkool
>  wrote:

>> Git can bisect automatically just fine, there is no upside to doing things
>> manually.  In git there are various handy ways of referring to commits; you
>> can say  master@{3 days ago}  for example, or zut@{31}  to get the 31st
>> commit back on branch "zut", etc.  See "man gitrevisions".
>
> Well one thing is if you have prebuilt cc1/cc1plus.  So it is not
> really doing a manual bisect per-say but rather it is doing a manual
> bisect using prebuilt binaries and knowing which one comes before
> which one.

If GCC policy is to reject merge commits, a command similar to
“git log --pretty=oneline | wc -l” gives something that is very
much like a Subversion revision number, in the sense that it matches
the commit ordering and that the assigned numbers remain stable
over time.

Thanks,
Florian


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-20 Thread Segher Boessenkool
On Mon, May 20, 2019 at 03:56:35PM +0200, Florian Weimer wrote:
> If GCC policy is to reject merge commits, a command similar to
> “git log --pretty=oneline | wc -l” gives something that is very
> much like a Subversion revision number, in the sense that it matches
> the commit ordering and that the assigned numbers remain stable
> over time.

Yup.  About twice as fast:

git rev-list --count 


Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-20 Thread Jakub Jelinek
On Mon, May 20, 2019 at 03:56:35PM +0200, Florian Weimer wrote:
> * Andrew Pinski:
> 
> > On Sun, May 19, 2019 at 12:54 PM Segher Boessenkool
> >  wrote:
> 
> >> Git can bisect automatically just fine, there is no upside to doing things
> >> manually.  In git there are various handy ways of referring to commits; you
> >> can say  master@{3 days ago}  for example, or zut@{31}  to get the 31st
> >> commit back on branch "zut", etc.  See "man gitrevisions".
> >
> > Well one thing is if you have prebuilt cc1/cc1plus.  So it is not
> > really doing a manual bisect per-say but rather it is doing a manual
> > bisect using prebuilt binaries and knowing which one comes before
> > which one.
> 
> If GCC policy is to reject merge commits, a command similar to
> “git log --pretty=oneline | wc -l” gives something that is very
> much like a Subversion revision number, in the sense that it matches
> the commit ordering and that the assigned numbers remain stable
> over time.

That is way too slow for our purposes; note we have at least 15 trunk
commits, and git log --pretty=oneline | wc -l takes more than
1.4 seconds on my box.

So far the best suggestion I was given for this was:
> so, if we have tags like r163 for 163000'th commit then git describe --tags
> --match 'r[0-9]*' whatever | sed 's/^r\([0-9]*\)-\([0-9]*\)-.*$/\1\2/'
> would give us the r163147 (except it doesn't handle the r163000 commit or <
> 100 commits after it, to be fixed)
matz> yeah.  With the other direction then being as discussed above (limiting 
the git log revision to just the couple thousands in range).  One thing to 
consider: having a zillion tags locally can make other operations slow (as the 
mapping from SHA1 to tag is slow), so if really using rXXX tags it should 
probably be a larger range than just 1000 revisions.

Thus perhaps tag every 5000th commit in a post-commit hook and handle the
rest in some git alias command or script.  We need a quick way to map
between these revisions and hashes bidirectionally.
Unfortunately, the above is just a per-branch number rather than a
number common to trunk and official release branches.
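
A minimal sketch of the tag-plus-offset idea, covering the two cases the
quoted suggestion leaves open (the tagged commit itself and offsets below
100).  It assumes a tag rN marks commit number N*1000 and that the input
looks like describe-style output, "rN-K-g<hash>" or just "rN"; the function
name describe_to_rev, the tag spacing and the sample strings are illustrative
only, and a 5000-commit spacing would need arithmetic rather than string
concatenation.

```shell
#!/bin/sh
# Sketch: turn describe-style output into an r<number> revision name.
# Assumptions: tag rN marks commit number N*1000; input is either
# "rN-K-g<hash>" (K commits after the tag) or "rN" (the tagged commit).

describe_to_rev() {
    case $1 in
        r*-*-g*)
            base=${1%%-*}         # e.g. "r163"
            rest=${1#*-}          # e.g. "147-gabc1234"
            offset=${rest%%-*}    # e.g. "147"
            ;;
        *)
            base=$1
            offset=0
            ;;
    esac
    # Zero-pad the offset so 7 commits after r163 becomes r163007.
    printf '%s%03d\n' "$base" "$offset"
}

describe_to_rev r163-147-gabc1234   # prints r163147
describe_to_rev r163-7-gabc1234     # prints r163007
describe_to_rev r163                # prints r163000
```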

Jakub


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-20 Thread Andreas Schwab
On May 20 2019, Florian Weimer  wrote:

> If GCC policy is to reject merge commits, a command similar to
> “git log --pretty=oneline | wc -l” gives something that is very

git rev-list HEAD | wc -l

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-20 Thread Jakub Jelinek
On Mon, May 20, 2019 at 04:26:45PM +0200, Andreas Schwab wrote:
> On Mai 20 2019, Florian Weimer  wrote:
> 
> > If GCC policy is to reject merge commits, a command similar to
> > “git log --pretty=oneline | wc -l” gives something that is very
> 
> git rev-list HEAD | wc -l

That is still in the 1.3 seconds range, git rev-list --count HEAD | wc -l
is in the 1 seconds range user time.

Jakub


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-20 Thread Andreas Schwab
On Mai 20 2019, Jakub Jelinek  wrote:

> On Mon, May 20, 2019 at 04:26:45PM +0200, Andreas Schwab wrote:
>> On Mai 20 2019, Florian Weimer  wrote:
>> 
>> > If GCC policy is to reject merge commits, a command similar to
>> > “git log --pretty=oneline | wc -l” gives something that is very
>> 
>> git rev-list HEAD | wc -l
>
> That is still in the 1.3 seconds range, git rev-list --count HEAD | wc -l
> is in the 1 seconds range user time.

You don't want the wc -l any more, though. :-)

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-20 Thread Segher Boessenkool
On Mon, May 20, 2019 at 04:29:10PM +0200, Jakub Jelinek wrote:
> On Mon, May 20, 2019 at 04:26:45PM +0200, Andreas Schwab wrote:
> > On Mai 20 2019, Florian Weimer  wrote:
> > 
> > > If GCC policy is to reject merge commits, a command similar to
> > > “git log --pretty=oneline | wc -l” gives something that is very
> > 
> > git rev-list HEAD | wc -l
> 
> That is still in the 1.3 seconds range, git rev-list --count HEAD | wc -l
> is in the 1 seconds range user time.

You can store the output of

  git rev-list 

somewhere, this should be fully static (on official branches).  Use
--reverse if you want extra convenience ;-)


$ git rev-list --reverse $BRANCH | grep -n $HASH | sed 's/:.*//'


Segher
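
The cached-list approach above can be made bidirectional with two small
helpers.  A minimal sketch, assuming the list was saved oldest-first with
git rev-list --reverse "$BRANCH" > revlist.txt; the file name and the
function names rev_of_hash and hash_of_rev are hypothetical.

```shell
#!/bin/sh
# Line N of the cached file is the Nth commit on the branch.

# Print the 1-based position of a full hash in the cached list.
rev_of_hash() {
    # -x matches whole lines only, -F treats the hash as a literal string
    grep -n -x -F "$2" "$1" | cut -d: -f1
}

# Print the hash stored on line N of the cached list.
hash_of_rev() {
    sed -n "${2}p" "$1"
}

# Usage:
#   rev_of_hash revlist.txt "$(git rev-parse HEAD)"
#   hash_of_rev revlist.txt 12345
```

Unlike the plain grep -n $HASH pipeline, the -x flag requires a full hash,
so an abbreviated hash would first need to be expanded, for example with
git rev-parse.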


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-20 Thread Joseph Myers
On Fri, 17 May 2019, Richard Sandiford wrote:

> We're not starting from scratch on that though.  The public git
> (semi-)mirror has been going for a long time, so IMO we should just
> inherit the policies for that.  (Like you say, forced pushes are
> restricted to the user namespace.)  Policies can evolve over time :-)

It doesn't send anything to gcc-cvs or to Bugzilla, so we need to define 
what goes there, for example, and implement it (presumably as far as 
possible by configuring one of the sets of git hooks already in use on 
sourceware, e.g. the AdaCore ones used for binutils/gdb and glibc, rather 
than writing our own from scratch).  (When referring to commit messages I 
was thinking about the messages on gcc-cvs rather than the messages 
written by committers; I agree that the format of the latter is 
independent of a move to git, and have been using git-style messages for 
commits to GCC for some time.)

> But the discussion upthread seemed to be that having the very old stuff
> in git wasn't necessarily that important anyway.

I think git should have all the branches that haven't been deleted in SVN, 
minus any where there is a specific decision to remove in the conversion 
(messed up history, branch was an artefact of conversion from CVS rather 
than a real branch, etc.).  If a branch or tag has been deleted in SVN it 
should not be brought across to the git repository (SVN will remain 
readonly, just as the old CVS repository remains available readonly).

> FWIW, I've been using the "official" git-svn based mirror for at least
> the last five years, only using SVN to actually commit.  And I've never
> needed any of the above during that time.

That the git-svn mirror is useful for many purposes for which people want 
to use git also provides a clear argument against needing to do the final 
conversion in a hurry; people can use it when convenient while we take the 
time to get the conversion right (in particular, seeing what the Go 
conversion of reposurgeon comes up with), and then rebase their git 
branches on the final converted history.

(As previously noted I expect the objects from the git-svn mirror should 
go in the new repository with the refs appropriately renamed, so that old 
commit hash references remain valid and people don't need to check out a 
separate repository to access old git branches, which should be doable 
with a single "git fetch" command; the two versions of the history would 
be disconnected, but most blob and tree objects would have the same hashes 
so this shouldn't enlarge the repository much.  Rebasing on top of the 
final conversion, for active branches currently git-only, would be 
preferred to anything that connects the two versions of the history.)

> E.g. having proper author names seems like a nice-to-have rather than
> a requirement.  A lot of the effort spent on compiling that list seemed
> to be getting names and email addresses for people who haven't contributed
> to gcc for a long time (in some cases 20 years or more).  It's interesting
> historical data, but in almost all cases, the email addresses used are
> going to be defunct anyway.

I think having author names and email addresses is a basic requirement of 
any reasonable repository conversion - it's simply how git identifies 
authors; having something that is not a name and email for the author / 
committer there is not a proper use of git datastructures.  For me, that 
means that, when the author and committer are the same, some name and 
email address for the author that are or were valid at some point should 
be listed for both those fields in git.

I'm not particularly concerned with distinguishing between different names 
and email addresses for an author depending on when or in what capacity 
they contributed a change, or with the cases where a patch was committed 
for someone else and SVN simply doesn't provide a way to distinguish that 
information.  However, since some people were concerned with that, and 
since the feature needed for that was implemented (the "changelogs" 
feature in reposurgeon, which will do it as long as a proper ChangeLog 
entry was included in the commit), we may as well use that feature.  (The 
author map is still needed for commits without ChangeLog entries.)

> The big advantage of Maxim's approach is that it's a public script in
> our own repo that anyone can contribute to.  So if there are specific
> tweaks people want to make, there's now the opportunity to do that.

reposurgeon is public code in its own repository.  So now is the 
conversion machinery using it.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-20 Thread Joseph Myers
On Fri, 17 May 2019, Jason Merrill wrote:

> Thanks for looking into this.  My feeling has been that, if we give up
> on reposurgeon, there's no need to start a new conversion at all: we
> can just switch the current mirror over to being the primary
> repository with some minor surgery (e.g. using git filter-branch to
> fix subdirectory branches).  That approach will produce the least
> disruption in the workflows of people already using it.  Do you see a
> problem with this?

I'd expect more major surgery (in particular remapping authors) if 
reposurgeon fails.

I don't expect reposurgeon to fail.  I expect even the python version of 
reposurgeon would work, with a few bug fixes - it's just that with 
day-long test cycles and hours to load a GCC repository dump into the 
python version, it's very hard actually to debug subtle issues with 
handling Subversion dumps to find out what those bugs are and fix them, 
which is where the Go version should help once finished, by being 
substantially faster.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-21 Thread Richard Earnshaw (lists)
On 20/05/2019 23:42, Joseph Myers wrote:

> I'm not particularly concerned with distinguishing between different names 
> and email addresses for an author depending on when or in what capacity 
> they contributed a change, or with the cases where a patch was committed 
> for someone else and SVN simply doesn't provide a way to distinguish that 
> information.  However, since some people were concerned with that, and 
> since the feature needed for that was implemented (the "changelogs" 
> feature in reposurgeon, which will do it as long as a proper ChangeLog 
> entry was included in the commit), we may as well use that feature.  (The 
> author map is still needed for commits without ChangeLog entries.)
> 

For very old commits, back in the GCC 2 days, even the ChangeLogs don't
always show the author.  At that time only the committers' name was
used.  I'm pretty sure that some of my earliest patches to GCC were
committed by tege and kenner under their names.  So we'll never really
be able to fully reconstruct the early history.

R.


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-21 Thread Jeff Law
On 5/21/19 8:24 AM, Richard Earnshaw (lists) wrote:
> On 20/05/2019 23:42, Joseph Myers wrote:
> 
>> I'm not particularly concerned with distinguishing between different names 
>> and email addresses for an author depending on when or in what capacity 
>> they contributed a change, or with the cases where a patch was committed 
>> for someone else and SVN simply doesn't provide a way to distinguish that 
>> information.  However, since some people were concerned with that, and 
>> since the feature needed for that was implemented (the "changelogs" 
>> feature in reposurgeon, which will do it as long as a proper ChangeLog 
>> entry was included in the commit), we may as well use that feature.  (The 
>> author map is still needed for commits without ChangeLog entries.)
>>
> 
> For very old commits, back in the GCC 2 days, even the ChangeLogs don't
> always show the author.  At that time only the committers' name was
> used.  I'm pretty sure that some of my earliest patches to GCC were
> committed by tege and kenner under their names.  So we'll never really
> be able to fully reconstruct the early history.
I'd say we make a reasonable effort here, but the importance of
authorship decays rapidly the further back we go.  Even when the author
(or committer) is still around, they often can't remember the details
around commits from that era.

jeff


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-21 Thread Richard Earnshaw (lists)
On 21/05/2019 15:44, Jeff Law wrote:
> On 5/21/19 8:24 AM, Richard Earnshaw (lists) wrote:
>> On 20/05/2019 23:42, Joseph Myers wrote:
>>
>>> I'm not particularly concerned with distinguishing between different names 
>>> and email addresses for an author depending on when or in what capacity 
>>> they contributed a change, or with the cases where a patch was committed 
>>> for someone else and SVN simply doesn't provide a way to distinguish that 
>>> information.  However, since some people were concerned with that, and 
>>> since the feature needed for that was implemented (the "changelogs" 
>>> feature in reposurgeon, which will do it as long as a proper ChangeLog 
>>> entry was included in the commit), we may as well use that feature.  (The 
>>> author map is still needed for commits without ChangeLog entries.)
>>>
>>
>> For very old commits, back in the GCC 2 days, even the ChangeLogs don't
>> always show the author.  At that time only the committers' name was
>> used.  I'm pretty sure that some of my earliest patches to GCC were
>> committed by tege and kenner under their names.  So we'll never really
>> be able to fully reconstruct the early history.
> I'd say we make a reasonable effort here, but the importance of
> authorship decays rapidly the further back we go.  Even when the author
> (or committer) is still around, they often can't remember the details
> around commits from that era.
> 
> jeff
> 


Agreed, and I'm well aware of my limitation on remembering which of
those early patches were mine.  I was just pointing out that the
ChangeLogs from that period cannot be taken as an indication of authorship.

There's a fair chance that, if it was Arm related and dated from mid
1992 onwards, I had a hand in it.  But that's by no means a claim on all
such patches from that era.

R.


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-05-21 Thread Segher Boessenkool
Hi Joseph,

On Mon, May 20, 2019 at 10:42:45PM +, Joseph Myers wrote:
> (SVN will remain 
> readonly, just as the old CVS repository remains available readonly).

> That the git-svn mirror is useful for many purposes for which people want 
> to use git also provides a clear argument against needing to do the final 
> conversion in a hurry;

Right.  But trying to correct the ancient history in the repo isn't useful
*anyway*.  One much bigger problem is that very often very unrelated things
are committed at the same time, in big omnibus patches.  Another much
bigger problem is that when you are doing the kind of archeology where this
matters, you need to have the old email archives anyway, which aren't
available.

> I think having author names and email addresses is a basic requirement of 
> any reasonable repository conversion

Yes, and they should be the same as they were in the original repository.


Segher


Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git

2019-09-13 Thread Maxim Kuvyrkov
> On Aug 24, 2019, at 12:30 AM, Joseph Myers  wrote:
> 
> On Fri, 23 Aug 2019, Maxim Kuvyrkov wrote:
> 
>> I propose that we switch to gcc-pretty.git repository, because it has 
>> accurate Committer and Author fields.  Developer names and email 
>> addresses are extracted from source history, and accurately track people 
>> changing companies, email addresses, and names.  IMO, it is more 
>> important for people to get credit for open-source contributions on 
>> github, ohloh, etc., than the inconvenience of rebasing local git 
>> branches.  It's also an important marketing tool for open-source 
>> companies to show stats of their corporate email addresses appearing in 
>> git commit logs.
> 
> I concur that accurately crediting contributors is important and means we 
> should not start from the existing mirror (though we should keep its 
> branches available, so references to them and to their commit hashes 
> continue to work - either keeping the existing repository available under 
> a different name, or renaming the branches to put them in the new 
> repository - which should not enlarge the repository much because blob and 
> tree objects will generally be shared between the two versions of the 
> history).
> 
> I note that the Go conversion of reposurgeon is now just five test 
> failures away from passing the whole reposurgeon testsuite (at which point 
> it should be ready for an attempt on the GCC conversion).  Given the good 
> progress being made there at present, I thus suggest we plan to compare 
> this conversion with one from reposurgeon (paying special attention to the 
> messiest parts of the repository, such as artifacts from cvs2svn 
> attempting to locate branchpoints), unless those last five goreposurgeon 
> test failures prove unexpectedly time-consuming to get resolved.

Could you upload GCC repo converted with reposurgeon somewhere public?  And 
also list expected artifacts in its current version?

From my side, the machine running the conversion ran out of disk space
about 3 weeks ago.  I'll clean it up and restart the conversion updates.

I'll also improve author entries a bit, so gcc-pretty.git's history will change 
ever so slightly.

> 
> There are of course plenty of things to do relating to a git conversion 
> that do not depend on the particular choice of a converted repository - 
> such as writing git hooks and git versions of the maintainer-scripts 
> scripts that currently work with SVN, or working out a specific choice of 
> how to arrange annotated tags to allow "git describe" to give the sort of 
> monotonic version number some contributors want.
> 
> A reasonable starting point for hooks would be that they closely 
> approximate what the current SVN hooks do for commit mails to gcc-cvs and 
> for Bugzilla updates, as what the current hooks do is clearly OK at 
> present and we shouldn't need to entangle substantive changes to what the 
> hooks do with the actual conversion to git; we can always discuss changes 
> later.

Would the community please assign a volunteer for this at Cauldron? :-P

Thank you,

--
Maxim Kuvyrkov
www.linaro.org





Monotonically increasing counter (was Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git)

2019-08-05 Thread Jason Merrill
On Mon, Aug 5, 2019 at 9:20 AM Martin Liška  wrote:

> Based on the IRC discussion with Jakub, there's a key element of the
> transition missing.  Jakub requests monotonically increasing revisions
> (aka rXXX) to be assigned for the future git revisions.  These will be
> linked from bugzilla and http://gcc.gnu.org/rN
>
> I don't like the suggested requirement and I would prefer to use git
> hashes for both bugzilla links and general references to revisions.
> That's what all projects using git do.

I agree.  But for those who want a monotonically increasing
identifier, there's already one in git: CommitDate.  In the discussion
of this issue four years ago,

https://gcc.gnu.org/ml/gcc/2015-09/threads.html#00028

I provided a set of git aliases to generate and use reposurgeon-style
action stamps for naming commits.  For Jakub's use-case, the committer
part of the action stamp is probably unnecessary, just the date/time
part should be enough.

Looking at it again, I notice that the different timezones in the
committer date would interfere with sorting, so this update to the
stamp alias uses UTC unconditionally:

stamp = "!f(){ TZ=UTC git show -s --date='format-local:%Y-%m-%dT%H:%M:%SZ' --format='%cd!%ce' ${1:+\"$@\"}; }; f"

To drop the committer from the stamp, remove "!%ce" from the format argument.

Jakub, it seems to me that this should do the trick for you; binaries
would be named by date/time rather than by revision.  What do you
think?

Jason


Re: Monotonically increasing counter (was Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git)

2019-08-05 Thread Jakub Jelinek
On Mon, Aug 05, 2019 at 11:20:09AM -0400, Jason Merrill wrote:
> I agree.  But for those who want a monotonically increasing
> identifier, there's already one in git: CommitDate.  In the discussion
> of this issue four years ago,

While commit date is monotonically increasing, it has the problem that at
certain dates there are very few commits, at others many.  When doing
bisection by hand, one does the revision computation (min+max)/2 in head
(it doesn't have to be precise of course, just roughly, and isn't perfect
either, because in svn all of trunk and branches contribute to the revision
numbers), with dates it would be division into further uneven chunks.

Could we tag the branchpoints, say when we branch off gcc 10, we tag the
branchpoint as tags/r11 and then we could use r11-123 as 123th commit on the
trunk since the branchpoint, and let bugzilla and web redirection handle
those rNN- style identifiers?
git describe --all --match 'r[0-9]*' ... | sed ...
to map hashes etc. to these rNN- identifiers and something to map them
back to hashes say for git web?

Jakub


Re: Monotonically increasing counter (was Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git)

2019-08-05 Thread Richard Earnshaw (lists)

On 05/08/2019 16:34, Jakub Jelinek wrote:
> On Mon, Aug 05, 2019 at 11:20:09AM -0400, Jason Merrill wrote:
>> I agree.  But for those who want a monotonically increasing
>> identifier, there's already one in git: CommitDate.  In the discussion
>> of this issue four years ago,
>
> While commit date is monotonically increasing, it has the problem that at
> certain dates there are very few commits, at others many.  When doing
> bisection by hand, one does the revision computation (min+max)/2 in head
> (it doesn't have to be precise of course, just roughly, and isn't perfect
> either, because in svn all of trunk and branches contribute to the revision
> numbers), with dates it would be division into further uneven chunks.
>
> Could we tag the branchpoints, say when we branch off gcc 10, we tag the
> branchpoint as tags/r11 and then we could use r11-123 as 123th commit on the
> trunk since the branchpoint, and let bugzilla and web redirection handle
> those rNN- style identifiers?
> git describe --all --match 'r[0-9]*' ... | sed ...
> to map hashes etc. to these rNN- identifiers and something to map them
> back to hashes say for git web?
>
> Jakub


git rev-list --reverse branchtag..branchname

Will list all the revs on that branch from branchtag through to the head 
of the branch.  I guess you could then count the individual revs on that 
list to index them.


R


Re: Monotonically increasing counter (was Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git)

2019-08-05 Thread Jason Merrill

On 8/5/19 11:34 AM, Jakub Jelinek wrote:
> On Mon, Aug 05, 2019 at 11:20:09AM -0400, Jason Merrill wrote:
>> I agree.  But for those who want a monotonically increasing
>> identifier, there's already one in git: CommitDate.  In the discussion
>> of this issue four years ago,
>
> While commit date is monotonically increasing, it has the problem that at
> certain dates there are very few commits, at others many.  When doing
> bisection by hand, one does the revision computation (min+max)/2 in head
> (it doesn't have to be precise of course, just roughly, and isn't perfect
> either, because in svn all of trunk and branches contribute to the revision
> numbers), with dates it would be division into further uneven chunks.


That's true, but is it a major problem?  If you have multiple commits on 
one day, you (can) have multiple binaries with the same date and 
different times, and you can adjust your heuristic for choosing the next 
bisection point accordingly.  Over longer periods, the number of commits 
per day averages out.



> Could we tag the branchpoints, say when we branch off gcc 10, we tag the
> branchpoint as tags/r11 and then we could use r11-123 as 123th commit on the
> trunk since the branchpoint, and let bugzilla and web redirection handle
> those rNN- style identifiers?
> git describe --all --match 'r[0-9]*' ... | sed ...
> to map hashes etc. to these rNN- identifiers and something to map them
> back to hashes say for git web?


Well, having such tags would allow git describe to produce identifiers 
that you might find more readable.  For instance, if I do


git tag -a -m 'GCC 9 branchpoint' b9 $(git merge-base trunk gcc-9-branch)
git describe trunk

I get

b9-2260-gdb868bacf6a

i.e. commit #2260 since tag b9, with abbreviated hash db868bacf6a (the
leading "g" stands for git).

or if I do

git tag -a -m'Beginning of Time' r1 3cf0d8938a953ef13e57239613d42686f152b4fe
git describe --match r1 trunk

r1-170718-gdb868bacf6a

These tags don't need to be shared, this works fine locally.

Note that when you feed such an identifier to other git commands, they 
ignore the first two parts and just use the hash.


This might be a good alternative to dates for you, and people could 
refer to them interchangeably with raw hashes in the web interface.


Jason
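
The three parts of a git describe identifier discussed above can be pulled
apart with plain shell parameter expansion.  This is a sketch only; the
function names are hypothetical, and the input is assumed to have the full
TAG-COUNT-gHASH form.  Splitting is done from the right because a tag name
may itself contain "-".

```shell
#!/bin/sh
# Abbreviated hash: everything after the last "-g".
describe_hash() {
    echo "${1##*-g}"
}

# Commit count since the tag.
describe_count() {
    tmp=${1%-g*}         # drop "-gHASH"
    echo "${tmp##*-}"    # keep what follows the last remaining "-"
}

# Tag name, which may contain "-" itself.
describe_tag() {
    tmp=${1%-g*}         # drop "-gHASH"
    echo "${tmp%-*}"     # drop "-COUNT"
}

describe_hash r1-170718-gdb868bacf6a     # prints db868bacf6a
describe_count r1-170718-gdb868bacf6a    # prints 170718
```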


Re: Monotonically increasing counter (was Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git)

2019-08-14 Thread Jason Merrill
On Mon, Aug 5, 2019 at 2:22 PM Jason Merrill  wrote:
> On 8/5/19 11:34 AM, Jakub Jelinek wrote:
> > On Mon, Aug 05, 2019 at 11:20:09AM -0400, Jason Merrill wrote:
> >> I agree.  But for those who want a monotonically increasing
> >> identifier, there's already one in git: CommitDate.  In the discussion
> >> of this issue four years ago,
> >
> > While commit date is monotonically increasing, it has the problem that at
> > certain dates there are very few commits, at others many.  When doing
> > bisection by hand, one does the revision computation (min+max)/2 in head
> > (it doesn't have to be precise of course, just roughly, and isn't perfect
> > either, because in svn all of trunk and branches contribute to the revision
> > numbers), with dates it would be division into further uneven chunks.
>
> That's true, but is it a major problem?  If you have multiple commits on
> one day, you (can) have multiple binaries with the same date and
> different times, and you can adjust your heuristic for choosing the next
> bisection point accordingly.  Over longer periods, the number of commits
> per day averages out.
>
> > Could we tag the branchpoints, say when we branch off gcc 10, we tag the
> > branchpoint as tags/r11 and then we could use r11-123 as 123th commit on the
> > trunk since the branchpoint, and let bugzilla and web redirection handle
> > those rNN- style identifiers?
> > git describe --all --match 'r[0-9]*' ... | sed ...
> > to map hashes etc. to these rNN- identifiers and something to map them
> > back to hashes say for git web?
>
> Well, having such tags would allow git describe to produce identifiers
> that you might find more readable.  For instance, if I do
>
> git tag -a -m 'GCC 9 branchpoint' b9 $(git merge-base trunk gcc-9-branch)

Though I guess what you were suggesting is slightly different: this
will put the tag in the history of both trunk and branch, and it would
be better for "r11" to be only in the history of GCC 11.  So probably
best to tag the commit that bumps BASE-VER instead, i.e.

$ git tag -a -m 'GCC 10 stage 1 open' gcc10
70f448fa5347ba24e0916201dd8549bc16783ff0
$ git tag -a -m 'GCC 9 stage 1 open' gcc9
949bc65ce4d0d7dd036ccfb279bffe63d02feee6
$ git tag -a -m 'GCC 8 stage 1 open' gcc8
498621e8159c1f494a9b8a5f2c3e5225c74ed242
...
$ git describe trunk
gcc10-2527-gac18cc031cd
$ git describe gcc-9-branch
gcc9-7633-g28a024c36af

Does this sound good to you?  Anyone have thoughts about naming for the tags?

Since alphabetical sorting won't do well with gcc9 and gcc10, you may
want to use the beginning of time tag for naming your binaries.  Also
because the stage 1 boundary isn't that interesting for bisection.

Jason

> git tag -a -m'Beginning of Time' r1 3cf0d8938a953ef13e57239613d42686f152b4fe
> git describe --match r1 trunk
>
> r1-170718-gdb868bacf6a
>
> These tags don't need to be shared, this works fine locally.
>
> Note that when you feed such an identifier to other git commands, they
> ignore the first two parts and just use the hash.
>
> This might be a good alternative to dates for you, and people could
> refer to them interchangeably with raw hashes in the web interface.
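
The sorting problem with gcc9 and gcc10 mentioned above could also be worked
around by zero-padding when naming binaries, as an alternative to the
beginning-of-time tag.  A sketch under assumptions: the gccN tag naming is
the one proposed in this message, while the function name sortable_name and
the padding widths are hypothetical choices.

```shell
#!/bin/sh
# Turn "gccN-COUNT-gHASH" into a name that sorts correctly as plain text.
sortable_name() {
    tag=${1%%-*}         # "gcc9"
    rest=${1#*-}         # "7633-g28a024c36af"
    count=${rest%%-*}    # "7633"
    printf 'gcc%02d-%06d\n' "${tag#gcc}" "$count"
}

sortable_name gcc9-7633-g28a024c36af     # prints gcc09-007633
sortable_name gcc10-2527-gac18cc031cd    # prints gcc10-002527
```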


Re: Monotonically increasing counter (was Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git)

2019-09-19 Thread Jason Merrill
On Wed, Aug 14, 2019 at 2:14 PM Jason Merrill  wrote:
> On Mon, Aug 5, 2019 at 2:22 PM Jason Merrill  wrote:
> > On 8/5/19 11:34 AM, Jakub Jelinek wrote:
> > > On Mon, Aug 05, 2019 at 11:20:09AM -0400, Jason Merrill wrote:
> > >> I agree.  But for those who want a monotonically increasing
> > >> identifier, there's already one in git: CommitDate.  In the discussion
> > >> of this issue four years ago,
> > >
> > > While commit date is monotonically increasing, it has the problem that at
> > > certain dates there are very few commits, at others many.  When doing
> > > bisection by hand, one does the revision computation (min+max)/2 in head
> > > (it doesn't have to be precise of course, just roughly, and isn't perfect
> > > either, because in svn all of trunk and branches contribute to the 
> > > revision
> > > numbers), with dates it would be division into further uneven chunks.
> >
> > That's true, but is it a major problem?  If you have multiple commits on
> > one day, you (can) have multiple binaries with the same date and
> > different times, and you can adjust your heuristic for choosing the next
> > bisection point accordingly.  Over longer periods, the number of commits
> > per day averages out.
> >
> > > Could we tag the branchpoints, say when we branch off gcc 10, we tag the
> > > branchpoint as tags/r11 and then we could use r11-123 as 123th commit on 
> > > the
> > > trunk since the branchpoint, and let bugzilla and web redirection handle
> > > those rNN- style identifiers?
> > > git describe --all --match 'r[0-9]*' ... | sed ...
> > > to map hashes etc. to these rNN- identifiers and something to map them
> > > back to hashes say for git web?
> >
> > Well, having such tags would allow git describe to produce identifiers
> > that you might find more readable.  For instance, if I do
> >
> > git tag -a -m 'GCC 9 branchpoint' b9 $(git merge-base trunk gcc-9-branch)
>
> Though I guess what you were suggesting is slightly different: this
> will put the tag in the history of both trunk and branch, and it would
> be better for "r11" to be only in the history of GCC 11.  So probably
> best to tag the commit that bumps BASE-VER instead, i.e.
>
> $ git tag -a -m 'GCC 10 stage 1 open' gcc10
> 70f448fa5347ba24e0916201dd8549bc16783ff0
> $ git tag -a -m 'GCC 9 stage 1 open' gcc9
> 949bc65ce4d0d7dd036ccfb279bffe63d02feee6
> $ git tag -a -m 'GCC 8 stage 1 open' gcc8
> 498621e8159c1f494a9b8a5f2c3e5225c74ed242
> ...
> $ git describe trunk
> gcc10-2527-gac18cc031cd
> $ git describe gcc-9-branch
> gcc9-7633-g28a024c36af
>
> Does this sound good to you?  Anyone have thoughts about naming for the tags?
>
> Since alphabetical sorting won't do well with gcc9 and gcc10, you may
> want to use the beginning of time tag for naming your binaries.  Also
> because the stage 1 boundary isn't that interesting for bisection.
>
> > git tag -a -m'Beginning of Time' r1 3cf0d8938a953ef13e57239613d42686f152b4fe
> > git describe --match r1 trunk
> >
> > r1-170718-gdb868bacf6a
> >
> > These tags don't need to be shared, this works fine locally.
> >
> > Note that when you feed such an identifier to other git commands, they
> > ignore the first two parts and just use the hash.
> >
> > This might be a good alternative to dates for you, and people could
> > refer to them interchangeably with raw hashes in the web interface.

I suppose we also need to decide what form we want to use for the
equivalent of gcc.gnu.org/rN.  I figure it needs to be some prefix
followed by a "commit-ish" (hash, tag, etc.).  Perhaps "g:" as the
prefix, so

gcc.gnu.org/g:HEAD
gcc.gnu.org/g:gcc-9-branch
gcc.gnu.org/g:3cf0d8938a953e

?

Jason


Re: Monotonically increasing counter (was Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git)

2019-09-21 Thread Segher Boessenkool
On Thu, Sep 19, 2019 at 03:29:27PM -0400, Jason Merrill wrote:
> I suppose we also need to decide what form we want to use for the
> equivalent of gcc.gnu.org/rN.  I figure it needs to be some prefix
> followed by a "commit-ish" (hash, tag, etc.).  Perhaps "g:" as the
> prefix, so
> 
> gcc.gnu.org/g:HEAD
> gcc.gnu.org/g:gcc-9-branch
> gcc.gnu.org/g:3cf0d8938a953e

Hrm, but we should probably not allow everything here, some things can
be expensive to compute (HEAD~123456 for example), and we don't want to
expose the reflog on the server (if there even is one), etc.  OTOH
allowing pretty much everything here is a quite neat idea.

What do we use for gitweb thing?  That might have a solution for this
already.  Currently we seem to use plain gitweb, maybe cgit or similar
would be nicer?  It looks more modern, anyway :-P


Segher
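
One way to avoid allowing everything after the g: prefix is a character
whitelist applied before the string reaches git.  The following is a sketch
of such a filter, not a decided policy: the function name
is_allowed_commitish and the exact set of accepted characters are
assumptions.

```shell
#!/bin/sh
# Accept only names made of letters, digits, ".", "_", "/" and "-".
# Anything containing "~", "^", "@{", ":" and so on is rejected, so
# HEAD~123456 or reflog syntax is never handed to git.
is_allowed_commitish() {
    case $1 in
        ''|-*|*..*) return 1 ;;            # empty, option-like, or a range
        *[!A-Za-z0-9._/-]*) return 1 ;;    # character outside the whitelist
    esac
    return 0
}

# Usage:
#   is_allowed_commitish "$request" && git rev-parse --verify "$request^{commit}"
```

With this filter, HEAD, gcc-9-branch and 3cf0d8938a953e pass, while
HEAD~123456 and HEAD@{1} are refused.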


Re: Monotonically increasing counter (was Re: [Contrib PATCH] Add scripts to convert GCC repo from SVN to Git)

2019-09-21 Thread Nicholas Krause



On 9/21/19 2:18 PM, Segher Boessenkool wrote:
> On Thu, Sep 19, 2019 at 03:29:27PM -0400, Jason Merrill wrote:
>> I suppose we also need to decide what form we want to use for the
>> equivalent of gcc.gnu.org/rN.  I figure it needs to be some prefix
>> followed by a "commit-ish" (hash, tag, etc.).  Perhaps "g:" as the
>> prefix, so
>>
>> gcc.gnu.org/g:HEAD
>> gcc.gnu.org/g:gcc-9-branch
>> gcc.gnu.org/g:3cf0d8938a953e
>
> Hrm, but we should probably not allow everything here, some things can
> be expensive to compute (HEAD~123456 for example), and we don't want to
> expose the reflog on the server (if there even is one), etc.  OTOH
> allowing pretty much everything here is a quite neat idea.
>
> What do we use for gitweb thing?  That might have a solution for this
> already.  Currently we seem to use plain gitweb, maybe cgit or similar
> would be nicer?  It looks more modern, anyway :-P
>
> Segher


If I recall correctly, using git branches based off tags is the preferred
way.  And to Segher's point about the server: there is none after pulling
down in git.  Everything is offline, unless he means something else.  The
biggest issues, as I pointed out at Cauldron, are:

a) How much history you need, in terms of how far back, for git bisect or
rebasing, and

b) Branching after a major release or for other non-trunk branches: how
to allow other branches, or how to set them up using tags etc. in git.

Mostly the problem with git is getting it right for these two, based on
the project requirements.

Nick


