> On Dec 30, 2019, at 6:31 PM, Richard Earnshaw (lists) 
> <richard.earns...@arm.com> wrote:
> 
> On 30/12/2019 13:00, Maxim Kuvyrkov wrote:
>>> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) 
>>> <richard.earns...@arm.com> wrote:
>>> 
>>> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>>>> Below are several more issues I found in reposurgeon-6a conversion 
>>>> comparing it against gcc-reparent conversion.
>>>> 
>>>> I am sure these and whatever other problems I may find in the reposurgeon 
>>>> conversion can be fixed in time.  However, I don't see why we should bother.  
>>>> My conversion has been available since summer 2019, I made it ready in 
>>>> time for GCC Cauldron 2019, and it didn't change in any significant way 
>>>> since then.
>>>> 
>>>> With the "Missed merges" problem (see below) I don't see how reposurgeon 
>>>> conversion can be considered "ready".  Also, I expected a diligent 
>>>> developer to compare new conversion (aka reposurgeon's) against existing 
>>>> conversion (aka gcc-pretty / gcc-reparent) before declaring the new 
>>>> conversion "better" or even "ready".  The data I'm seeing in differences 
>>>> between my and reposurgeon conversions shows that gcc-reparent conversion 
>>>> is /better/.
>>>> 
>>>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent 
>>>> conversion.  I welcome Richard E. to modify his summary scripts to work 
>>>> with svn-git scripts, which should be straightforward, and I'm ready to 
>>>> help.
>>>> 
>>> 
>>> I don't think either of these conversions is any more ready to use than
>>> the reposurgeon one, possibly less so.  In fact, there are still some
>>> major issues to resolve first before they can be considered.
>>> 
>>> gcc-pretty has completely wrong parent information for the gcc-3 era
>>> release tags, showing the tags as being made directly from trunk with
>>> massive deltas representing the roll-up of all the commits that were
>>> made on the gcc-3 release branch.
>> 
>> I will clarify the above statement, and please correct me where you think 
>> I'm wrong.  Gcc-pretty conversion has the exact right parent information for 
>> the gcc-3 era release tags as recorded in SVN version history.  Gcc-pretty 
>> conversion aims to produce an exact copy of SVN history in git.  IMO, it 
>> manages to do so just fine.
>> 
>> It is a different thing that SVN history has a screwed up record of gcc-3 
>> era tags.
> 
> It's not screwed up in svn.  Svn shows the correct history information for 
> the gcc-3 era release tags, but the git-svn conversion in gcc-pretty does not.
> 
> For example, looking at gcc_3_0_release in expr.c with git blame and svn 
> blame shows

In SVN history, tags/gcc_3_0_release was copied from /trunk:39596, and in the 
same commit a bunch of files were replaced from /branches/gcc-3_0-branch/ (and 
from different revisions of this branch!).

$ svn log -qv --stop-on-copy file://$(pwd)/tags/gcc_3_0_release | grep "/tags/gcc_3_0_release \|/tags/gcc_3_0_release/gcc/expr.c \|/tags/gcc_3_0_release/gcc/reload.c "
   A /tags/gcc_3_0_release (from /trunk:39596)
   R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)
   R /tags/gcc_3_0_release/gcc/reload.c (from /branches/gcc-3_0-branch/gcc/reload.c:42007)

IMO, from such history (absent external knowledge about better reparenting 
options) the best choice for parent branch is /trunk@39596, not 
/branches/gcc-3_0-branch at a random revision from the replaced files.
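
To make the parent-selection question concrete, here is a minimal Python sketch (not taken from the svn-git scripts or from reposurgeon) that parses changed-path lines like the ones above and separates the copy source of the tag directory from the sources of the individually replaced files.  The function names and SAMPLE_LOG are hypothetical; the sample only repeats the three lines of output shown above.

```python
import re
from collections import Counter

# Matches changed-path lines printed by `svn log -v`, for example:
#    A /tags/gcc_3_0_release (from /trunk:39596)
COPY_RE = re.compile(r"^\s*([AR])\s+(\S+)\s+\(from\s+(\S+):(\d+)\)\s*$")

SAMPLE_LOG = """\
   A /tags/gcc_3_0_release (from /trunk:39596)
   R /tags/gcc_3_0_release/gcc/expr.c (from /branches/gcc-3_0-branch/gcc/expr.c:43255)
   R /tags/gcc_3_0_release/gcc/reload.c (from /branches/gcc-3_0-branch/gcc/reload.c:42007)
"""


def parse_copies(log_text):
    """Return (action, path, source_path, source_rev) for every copy line."""
    copies = []
    for line in log_text.splitlines():
        m = COPY_RE.match(line)
        if m:
            action, path, src, rev = m.groups()
            copies.append((action, path, src, int(rev)))
    return copies


def directory_copy_source(copies, tag_path):
    """Copy source recorded for the tag directory itself, or None."""
    for _action, path, src, rev in copies:
        if path == tag_path:
            return src, rev
    return None


def replaced_file_sources(copies, tag_path):
    """Count the branch roots that replaced files under the tag came from."""
    counts = Counter()
    prefix = tag_path + "/"
    for action, path, src, _rev in copies:
        if action == "R" and path.startswith(prefix):
            rel = path[len(prefix):]
            # Strip the tag-relative path to recover the source branch root.
            if src.endswith("/" + rel):
                counts[src[: -len(rel) - 1]] += 1
    return counts


copies = parse_copies(SAMPLE_LOG)
print(directory_copy_source(copies, "/tags/gcc_3_0_release"))
print(replaced_file_sources(copies, "/tags/gcc_3_0_release"))
```

With only this information available, the directory copy source names one unambiguous revision, while the replaced files point at one branch but at several different revisions of it.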

Still, I see your point, and I will fix reparenting support.  Whether GCC 
community opts to reparent or not reparent is a different topic.

--
Maxim Kuvyrkov
https://www.linaro.org


> git blame expr.c:
> 
> ba0a9cb85431 (Richard Kenner         1992-03-03 23:34:57 +0000   396)         return temp;
> ba0a9cb85431 (Richard Kenner         1992-03-03 23:34:57 +0000   397)       }
> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   398)     /* Copy the address into a pseudo, so that the returned value
> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   399)        remains correct across calls to emit_queue.  */
> 5fbf0b0d5828 (no-author              2001-06-17 19:44:25 +0000   400)     XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
> 59f26b7caad9 (Richard Kenner         1994-01-11 00:23:47 +0000   401)     return new;
> 
> git log 5fbf0b0d5828
> commit 5fbf0b0d5828687914c1c18a83ff12c8627d5a70 (HEAD, tag: gcc_3_0_release)
> Author: no-author <no-aut...@gcc.gnu.org>
> Date:   Sun Jun 17 19:44:25 2001 +0000
> 
>    This commit was manufactured by cvs2svn to create tag
>    'gcc_3_0_release'.
> 
> while svn blame expr.c correctly shows:
> 
>   386     kenner             return temp;
>   386     kenner           }
> 42209     bernds         /* Copy the address into a pseudo, so that the returned value
> 42209     bernds            remains correct across calls to emit_queue.  */
> 42209     bernds         XEXP (new, 0) = copy_to_reg (XEXP (new, 0));
>  6375     kenner         return new;
> 
> svn log -r42209 ^/
> ------------------------------------------------------------------------
> r42209 | bernds | 2001-05-17 18:07:08 +0100 (Thu, 17 May 2001) | 2 lines
> 
> Fix queueing-related bugs
> 
> In other words, svn can correctly track the files that were modified on the 
> release branch, while the git conversion loses that information, rolling up 
> all the diffs on the release branch into a single unattributed commit.
> 
> As I said, gcc-reparent is better in this regard, but there are still 
> artefacts from conversion, such as incorrect merge records, that show up.
> 
> R.
> 
>> 
>>> 
>>> gcc-reparent is better, but many (most?) of the release tags are shown
>>> as merge commits with a fake parent back to the gcc-3 branch point,
>>> which is certainly not what happened when the tagging was done at that
>>> time.
>> 
>> I agree with you here.
>> 
>>> 
>>> Both of these factually misrepresent the history at the time of the
>>> release tag being made.
>> 
>> Yes and no.  Gcc-pretty repository mirrors SVN history.  And regarding the 
>> need for reparenting -- we lived with current history for gcc-3 release tags 
>> for a long time.  I would argue their continued brokenness is not a 
>> show-stopper.
>> 
>> Looking at this from a different perspective, when I posted the initial 
>> svn-git scripts back in Summer, the community roughly agreed on a plan to
>> 1. Convert entire SVN history to git.
>> 2. Use the stock git history rewrite tools (git filter-branch) to fixup what 
>> we want, e.g., reparent tags and branches or set better author/committer 
>> entries.
>> 
>> Gcc-pretty does (1) in entirety.
>> 
>> For reparenting, I tried a 15min fix to my scripts to enable reparenting, 
>> which worked, but with artifacts like the merge commit from old and new 
>> parents.  I will drop this and instead use tried-and-true "git 
>> filter-branch" to reparent those tags and branches, thus producing 
>> gcc-reparent from gcc-pretty.
>> 
>>> 
>>> As for converting my script to work with your tools, I'm afraid I don't
>>> have time to work on that right now.  I'm still bogged down validating
>>> the incorrect bug ids that the script has identified for some commits.
>>> I'm making good progress (we're down to 160 unreviewed commits now), but
>>> it is still going to take what time I have over the next week to
>>> complete that task.
>>> 
>>> Furthermore, there is no documentation on how your conversion scripts
>>> work, so it is not possible for me to test any work I might do in order
>>> to validate such changes.  Not being able to run the script locally to
>>> test change would be a non-starter.
>>> 
>>> You are welcome, of course, to clone the script I have and attempt to
>>> modify it yourself, it's reasonably well documented.  The sources can be
>>> found in esr's gcc-conversion repository here:
>>> https://gitlab.com/esr/gcc-conversion.git
>> 
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>> 
>>> 
>>> 
>>>> Meanwhile, I'm going to add additional root commits to my gcc-reparent 
>>>> conversion to bring in "missing" branches (the ones, which don't share 
>>>> history with trunk@1) and restart daily updates of gcc-reparent conversion.
>>>> 
>>>> Finally, with the comparison data I have, I consider statements about 
>>>> git-svn's poor quality to be very misleading.  Git-svn may have had 
>>>> serious bugs years ago when Eric R. evaluated it and started his work on 
>>>> reposurgeon.  But a lot of development has happened and many problems have 
>>>> been fixed since then.  At the moment it is reposurgeon that is producing 
>>>> conversions with obscure mistakes in repository metadata.
>>>> 
>>>> 
>>>> === Missed merges ===
>>>> 
>>>> Reposurgeon misses merges from trunk on 130+ branches.  I've spot-checked 
>>>> ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane 
>>>> merges were omitted.  Below is analysis for ARM/hard_vfp_branch.
>>>> 
>>>> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
>>>> ----
>>>> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
>>>> Author: Richard Earnshaw <rearn...@gcc.gnu.org>
>>>> Date:   Mon Jul 20 08:15:51 2009 +0000
>>>> 
>>>>   Merge trunk through to r149768
>>>> 
>>>>   Legacy-ID: 149804
>>>> 
>>>> COPYING.RUNTIME                                     |    73 +
>>>> ChangeLog                                           |   270 +-
>>>> MAINTAINERS                                         |    19 +-
>>>> <MANY OTHER FILES>
>>>> ----
>>>> 
>>>> at the same time for svn-git scripts we have:
>>>> 
>>>> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
>>>> ----
>>>> commit ce7d5c8df673a7a561c29f095869f20567a7c598
>>>> Merge: 4970119c20da 3a69b1e566a7
>>>> Author: Richard Earnshaw <rearn...@arm.com>
>>>> Date:   Mon Jul 20 08:15:51 2009 +0000
>>>> 
>>>>   Merge trunk through to r149768
>>>> 
>>>>   git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4
>>>> ----
>>>> 
>>>> ... which agrees with
>>>> $ svn propget svn:mergeinfo 
>>>> file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
>>>> /trunk:142588-149768
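
As an illustration of how a property value like the one above can be read mechanically, here is a minimal Python sketch (an assumption for this discussion, not code from either conversion) that parses the basic svn:mergeinfo syntax: one "path:revision-list" entry per line, with comma-separated single revisions or "start-end" ranges, and an optional trailing "*" marking a non-inheritable range.  The function names are hypothetical.

```python
def parse_mergeinfo(value):
    """Parse an svn:mergeinfo value into {source_path: [(start, end), ...]}."""
    merged = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # The revision list follows the last ':' on the line.
        path, _, revs = line.rpartition(":")
        ranges = []
        for item in revs.split(","):
            item = item.rstrip("*")  # '*' marks a non-inheritable range
            start, _, end = item.partition("-")
            ranges.append((int(start), int(end or start)))
        merged[path] = ranges
    return merged


def merged_through(mergeinfo, path):
    """Highest revision of `path` recorded as merged, or None."""
    ranges = mergeinfo.get(path)
    return max(end for _start, end in ranges) if ranges else None


info = parse_mergeinfo("/trunk:142588-149768")
print(info)
print(merged_through(info, "/trunk"))
```

For the value quoted above, the highest merged trunk revision is 149768, the same revision named in the commit message "Merge trunk through to r149768".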
>>>> 
>>>> === Bad author entries ===
>>>> 
>>>> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and 
>>>> "2005-03-18 Kazu Hirata".  It is rather obvious that a person's name is 
>>>> unlikely to start with a digit.
>>>> 
>>>> === Missed authors ===
>>>> 
>>>> Reposurgeon-6a conversion misses many authors; below is a list of people 
>>>> with names starting with "A".
>>>> 
>>>> Akos Kiss
>>>> Anders Bertelrud
>>>> Andrew Pochinsky
>>>> Anton Hartl
>>>> Arthur Norman
>>>> Aymeric Vincent
>>>> 
>>>> === Conservative author entries ===
>>>> 
>>>> Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many 
>>>> commits where svn-git conversion manages to extract valid email from 
>>>> commit data.  This happens for hundreds of author entries.
>>>> 
>>>> Regards,
>>>> 
>>>> --
>>>> Maxim Kuvyrkov
>>>> https://www.linaro.org
>>>> 
>>>> 
>>>>> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> 
>>>>> wrote:
>>>>> 
>>>>> 
>>>>>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <ja...@redhat.com> wrote:
>>>>>> 
>>>>>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>>>>>> Is there some easy way (e.g. file in the conversion scripts) to correct
>>>>>> spelling and other mistakes in the commit authors?
>>>>>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>>>>>> Jakub Jakub Jelinek (1):
>>>>>> Jakub Jeilnek (1):
>>>>>> Jelinek (1):
>>>>>> entries next to the expected one with most of the commits.
>>>>>> For the misspellings, wonder if e.g. we couldn't compute edit distances
>>>>>> from other names and if we have one with many commits and then one with
>>>>>> very few with small edit distance from those, flag it for human review.
>>>>> 
>>>>> This is close to what svn-git-author.sh script is doing in gcc-pretty and 
>>>>> gcc-reparent conversions.  It ignores 1-3 character differences in 
>>>>> author/committer names and email addresses.  I've audited results for all 
>>>>> branches and didn't spot any mistakes.
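
The edit-distance idea can be sketched in a few lines of Python.  This is only an illustration of the general technique (plain Levenshtein distance plus commit-count thresholds); it is not the logic of svn-git-author.sh, and the function names, thresholds and commit counts below are made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def flag_suspects(counts, max_distance=3, min_common=10, max_rare=2):
    """Return (rare_name, common_name, distance) triples for human review.

    A name is flagged when it has at most `max_rare` commits and lies within
    `max_distance` edits of a name with at least `min_common` commits.
    """
    common = [name for name, n in counts.items() if n >= min_common]
    suspects = []
    for name, n in counts.items():
        if n > max_rare:
            continue
        for target in common:
            d = edit_distance(name, target)
            if 0 < d <= max_distance:
                suspects.append((name, target, d))
    return suspects


# Illustrative counts only; the real numbers come from `git shortlog -s`.
counts = {
    "Jakub Jelinek": 5000,
    "Jakub Jeilnek": 1,
    "Jelinek": 1,
    "Jakub Jakub Jelinek": 1,
}
print(flag_suspects(counts))
```

Note that a small distance threshold catches transposed letters such as "Jeilnek" but not a missing or doubled first name, which differs by a whole word; those cases would need a separate check.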
>>>>> 
>>>>> In other news, I'm working on comparison of gcc-pretty, gcc-reparent and 
>>>>> gcc-reposurgeon-5a repos among themselves.  Below are current notes for 
>>>>> comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
>>>>> 
>>>>> == Merges on trunk ==
>>>>> 
>>>>> Reposurgeon creates merge entries on trunk when changes from a branch are 
>>>>> merged into trunk.  This brings entire development history from the 
>>>>> branch to trunk, which is both good and bad.  The good part is that we 
>>>>> get more visibility into how the code evolved.  The bad part is that we 
>>>>> get many "noisy" commits from merged branch (e.g., "Merge in trunk" every 
>>>>> few revisions) and that our SVN branches are work-in-progress quality, 
>>>>> not ready for review/commit quality.  It's common for files to be 
>>>>> re-written in large chunks on branches.
>>>>> 
>>>>> Also, reposurgeon's commit logs don't have information on SVN path from 
>>>>> which the change came, so there is no easy way to determine that a given 
>>>>> commit is from a merged branch, not an original trunk commit.  Git-svn, 
>>>>> on the other hand, provides "git-svn-id: <path>@<revision>" tags in its 
>>>>> commit logs.
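
For reference, a small Python sketch of how the "git-svn-id:" trailer can be read back out of a commit message to recover the SVN path and revision.  The regular expression and the function name are assumptions for illustration; the layout (URL, "@", revision, then the repository UUID) is the one visible in the commit logs quoted in this thread.

```python
import re

# Trailer layout as it appears in git-svn commit messages:
#   git-svn-id: <url>@<revision> <repository-uuid>
GIT_SVN_ID_RE = re.compile(
    r"^\s*git-svn-id:\s+(\S+)@(\d+)\s+([0-9a-fA-F-]+)\s*$", re.MULTILINE
)


def parse_git_svn_id(message):
    """Return (url, revision, uuid) from a commit message, or None."""
    m = GIT_SVN_ID_RE.search(message)
    if m is None:
        return None
    url, rev, uuid = m.groups()
    return url, int(rev), uuid


message = """Readd trunk

git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
"""
print(parse_git_svn_id(message))
```

The path component of the URL is what distinguishes an original trunk commit from one that arrived through a merged branch.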
>>>>> 
>>>>> My conversion follows current GCC development policy that trunk history 
>>>>> should be linear.  Branch merges to trunk are squashed.  Merges between 
>>>>> non-trunk branches are handled as specified by svn:mergeinfo SVN 
>>>>> properties.
>>>>> 
>>>>> == Differences in trees ==
>>>>> 
>>>>> Git trees (aka filesystem content) match between pretty/trunk and 
>>>>> reposurgeon-5a/trunk from current tip and up to SVN's r130805.
>>>>> Here is SVN log of that revision (restoration of deleted trunk):
>>>>> ------------------------------------------------------------------------
>>>>> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
>>>>> Changed paths:
>>>>> A /trunk (from /trunk:130802)
>>>>> ------------------------------------------------------------------------
>>>>> 
>>>>> Reposurgeon conversion has:
>>>>> -------------
>>>>> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
>>>>> Author: Daniel Berlin <dber...@gcc.gnu.org>
>>>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>>>> 
>>>>>  Readd trunk
>>>>> 
>>>>>  Legacy-ID: 130805
>>>>> 
>>>>> .gitignore | 17 -----------------
>>>>> 1 file changed, 17 deletions(-)
>>>>> -------------
>>>>> and my conversion has:
>>>>> -------------
>>>>> commit fb128f3970789ce094c798945b4fa20eceb84cc7
>>>>> Author: Daniel Berlin <dber...@dbrelin.org>
>>>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>>>> 
>>>>>  Readd trunk
>>>>> 
>>>>> 
>>>>>  git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
>>>>> -------------
>>>>> 
>>>>> It appears that .gitignore has been added in r1 by reposurgeon and then 
>>>>> deleted at r130805.  In SVN repository .gitignore was added in r195087.  
>>>>> I speculate that addition of .gitignore at r1 is expected, but its 
>>>>> deletion at r130805 is highly suspicious.
>>>>> 
>>>>> == Committer entries ==
>>>>> 
>>>>> Reposurgeon uses $u...@gcc.gnu.org for committer email addresses even 
>>>>> when it correctly detects author name from ChangeLog.
>>>>> 
>>>>> reposurgeon-5a:
>>>>> r278995 Martin Liska <mli...@suse.cz> Martin Liska <mar...@gcc.gnu.org>
>>>>> r278994 Jozef Lawrynowicz <joze...@mittosystems.com> Jozef Lawrynowicz <joz...@gcc.gnu.org>
>>>>> r278993 Frederik Harwath <frede...@codesourcery.com> Frederik Harwath <frede...@gcc.gnu.org>
>>>>> r278992 Georg-Johann Lay <a...@gjlay.de> Georg-Johann Lay <g...@gcc.gnu.org>
>>>>> r278991 Richard Biener <rguent...@suse.de> Richard Biener <rgue...@gcc.gnu.org>
>>>>> 
>>>>> pretty:
>>>>> r278995 Martin Liska <mli...@suse.cz> Martin Liska <mli...@suse.cz>
>>>>> r278994 Jozef Lawrynowicz <joze...@mittosystems.com> Jozef Lawrynowicz <joze...@mittosystems.com>
>>>>> r278993 Frederik Harwath <frede...@codesourcery.com> Frederik Harwath <frede...@codesourcery.com>
>>>>> r278992 Georg-Johann Lay <a...@gjlay.de> Georg-Johann Lay <a...@gjlay.de>
>>>>> r278991 Richard Biener <rguent...@suse.de> Richard Biener <rguent...@suse.de>
>>>>> 
>>>>> == Bad summary line ==
>>>>> 
>>>>> While looking around r138087, the commit below caught my eye.  Are the 
>>>>> contents of the summary line as expected?
>>>>> 
>>>>> commit cc2726884d56995c514d8171cc4a03657851657e
>>>>> Author: Chris Fairles <chris.fair...@gmail.com>
>>>>> Date:   Wed Jul 23 14:49:00 2008 +0000
>>>>> 
>>>>>  acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>>> 
>>>>>  2008-07-23  Chris Fairles <chris.fair...@gmail.com>
>>>>> 
>>>>>          * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>>>          Holds the lib that defines clock_gettime (-lrt or -lposix4).
>>>>>          * src/Makefile.am: Use it.
>>>>>          * configure: Regenerate.
>>>>>          * configure.in: Likewise.
>>>>>          * Makefile.in: Likewise.
>>>>>          * src/Makefile.in: Likewise.
>>>>>          * libsup++/Makefile.in: Likewise.
>>>>>          * po/Makefile.in: Likewise.
>>>>>          * doc/Makefile.in: Likewise.
>>>>> 
>>>>>  Legacy-ID: 138087
>>>>> 
>>>>> 
>>>>> --
>>>>> Maxim Kuvyrkov
>>>>> https://www.linaro.org
