Re: error printing in reversed order ?

2016-10-10 Thread Jonathan Wakely
On 7 October 2016 at 21:41, nicolas bouillot wrote:
> oops this works better:
> alias reversed_make='make 2>&1 >/dev/null | tac | egrep --color
> "\b(error|cpp|hpp)\b|$"'

Or just make 2>&1 | less


Re: Repository for the conversion machinery

2016-10-10 Thread Jonathan Wakely
On 7 October 2016 at 22:26, Joseph Myers wrote:
> On Fri, 7 Oct 2016, Frank Ch. Eigler wrote:
>> FWIW, I thought at one point the consensus was that the mailmap would
>> expand only to $use...@gcc.gnu.org rather than $userid@$organization,
>> esp. considering the case where there is no single $organization that
>> accurately covers the whole contribution timespan of the given $userid.
>
> I don't think there was any such consensus (older ids weren't from
> gcc.gnu.org anyway so @gcc.gnu.org would be nonsense for that part of the
> history).
>
> My view is: contributors are free to specify what name and email address
> they want used, but if they want something other than a single name and
> email address for the whole commit history with a given username, it's the
> contributor's responsibility to come up with lists of commits that use
> each mapping rather than a hypothetical recipe based on examining
> ChangeLogs.

We'd only need to look at the actual ChangeLogs if the commit message
doesn't include a name and email address. And if we just use the
committer, how do we record the author of a change?

As Richi said a year ago (and my reply was drafted a year ago but not sent) ...

On 17 September 2015 at 11:44, Richard Biener wrote:
> Maybe I'm missing sth but apart from the CVS imported revisions each
> SVN revision should contain the actual change plus the changes to the
> ChangeLog files (you can't count on the commit message itself I guess
> as not all people replicate the ChangeLog entries there).

It's probably a good start though. If the commit message does have:

YYYY-MM-DD  John Doe  <email address>

then it's probably reliable. If the commit message doesn't have that
(when I'm committing my own work I don't include that line in the
commit message) then look for ChangeLog entries in the commit.
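
As a rough illustration of that fallback order, here is a minimal Python sketch. It assumes the usual GNU ChangeLog header layout (date, name, address in angle brackets); the function names and the idea of passing in the ChangeLog lines added by the commit are hypothetical, not taken from any existing conversion script.

```python
import re

# Assumed header shape: "YYYY-MM-DD  Name  <address>" on a line of its own.
HEADER = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})[ \t]+(?P<name>[^<\n]+?)[ \t]+<(?P<email>[^>\s]+)>[ \t]*$",
    re.MULTILINE,
)

def author_from_text(text):
    """Return (date, name, email) from the first ChangeLog-style header, or None."""
    m = HEADER.search(text)
    if m is None:
        return None
    return m.group("date"), m.group("name"), m.group("email")

def author_for_commit(message, changelog_added_lines):
    """Prefer a header in the commit message; otherwise fall back to the
    ChangeLog lines that the same commit added (hypothetical input)."""
    found = author_from_text(message)
    if found is not None:
        return found
    return author_from_text("\n".join(changelog_added_lines))
```

A commit for which both lookups return None would need to be handled some other way, for example by falling back to the committer.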

> There may be cases we can't handle and then doing some commit ID
> mapping might be ok, but I expect 95% of the cases to work out nicely
> so we should preserve what is in the ChangeLog entry (note that we have
> very strict formatting requirement for the authors there).

Particularly since the ChangeLog entry gives the Author, which is
often not the same as the Committer.




>
> [reposurgeon aside from observations with other conversions where
> different author maps were needed for different revisions: the revision
> range for commits from the gcc2 repository works in the GCC case because
> that revision range came from CVS and so there are no tags with valid
> commit authors in that range.  But if you have a repository with different
> ranges of commits having different author maps *and* those ranges contain
> SVN tags, simply specifying a range .. doesn't
> work as expected, since ranges are interpreted in reposurgeon's ordering
> of events, not SVN's ordering, and the tag events are out of sequence with
> the commit events.]
>
> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: Repository for the conversion machinery

2016-10-10 Thread Jason Merrill
On Mon, Oct 10, 2016 at 6:38 AM, Jonathan Wakely wrote:
> On 7 October 2016 at 22:26, Joseph Myers wrote:
>> On Fri, 7 Oct 2016, Frank Ch. Eigler wrote:
>>> FWIW, I thought at one point the consensus was that the mailmap would
>>> expand only to $use...@gcc.gnu.org rather than $userid@$organization,
>>> esp. considering the case where there is no single $organization that
>>> accurately covers the whole contribution timespan of the given $userid.
>>
>> I don't think there was any such consensus (older ids weren't from
>> gcc.gnu.org anyway so @gcc.gnu.org would be nonsense for that part of the
>> history).
>>
>> My view is: contributors are free to specify what name and email address
>> they want used, but if they want something other than a single name and
>> email address for the whole commit history with a given username, it's the
>> contributor's responsibility to come up with lists of commits that use
>> each mapping rather than a hypothetical recipe based on examining
>> ChangeLogs.
>
> We'd only need to look at the actual ChangeLogs if the commit message
> doesn't include a name and email address. And if we just use the
> committer, how do we record the author of a change?
>
> As Richi said a year ago (and my reply was drafted a year ago but not sent) ...
>
> On 17 September 2015 at 11:44, Richard Biener wrote:
>> Maybe I'm missing sth but apart from the CVS imported revisions each
>> SVN revision should contain the actual change plus the changes to the
>> ChangeLog files (you can't count on the commit message itself I guess
>> as not all people replicate the ChangeLog entries there).
>
> It's probably a good start though. If the commit message does have:
>
> YYYY-MM-DD  John Doe  <email address>
>
> then it's probably reliable. If the commit message doesn't have that
> (when I'm committing my own work I don't include that line in the
> commit message) then look for ChangeLog entries in the commit.
>
>> There may be cases we can't handle and then doing some commit ID
>> mapping might be ok, but I expect 95% of the cases to work out nicely
>> so we should preserve what is in the ChangeLog entry (note that we have
>> very strict formatting requirement for the authors there).
>
> Particularly since the ChangeLog entry gives the Author, which is
> often not the same as the Committer.

Yes, very often they will be different.  This processing can, and
probably should, be done with git filter-branch after the initial
conversion.

Jason


Re: Repository for the conversion machinery

2016-10-10 Thread Joseph Myers
On Mon, 10 Oct 2016, Jonathan Wakely wrote:

> > My view is: contributors are free to specify what name and email address
> > they want used, but if they want something other than a single name and
> > email address for the whole commit history with a given username, it's the
> > contributor's responsibility to come up with lists of commits that use
> > each mapping rather than a hypothetical recipe based on examining
> > ChangeLogs.
> 
> We'd only need to look at the actual ChangeLogs if the commit message
> doesn't include a name and email address. And if we just use the
> committer, how do we record the author of a change?

This is still hypothetical, since I haven't seen any scripts posted that 
would actually implement this, or any resulting mappings of commits, and one 
wouldn't normally expect a repository conversion to attempt to distinguish 
committer from author when the source version control system has no such 
distinction.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Repository for the conversion machinery

2016-10-10 Thread Eric S. Raymond
Joseph Myers:
> This is still hypothetical, since I haven't seen any scripts posted that 
> would actually implement this, or any resulting mappings of commits, and one 
> wouldn't normally expect a repository conversion to attempt to distinguish 
> committer from author when the source version control system has no such 
> distinction.

Wow.  Even attempting this would be a huge, ugly job.

I strongly recommend that if you want to try this, you separate it from the
initial repo conversion.  That is, get the project to git first.  Then
see if you can data-mine author information out of the history. If,
and only if, you get results that look reasonable, then you patch the repo
and force-push it, warning everyone there'll be a flag day.

The reason I recommend this is that I think you're going to have serious
trouble getting clean authorship data with good coverage.  The data
mining will be messy and take longer than you expect.

Here's how I'd do it:

1. Write an analyzer for commit logs.  Its goal should be to parse
   logs and produce a list of records each consisting of an author, a
   commit date, and a list of modified-file paths - one record
   per commit-log entry.

2. Run this once on each terminal commit log - that is, at each branch
   head on both the main commit log and all its archival versions.
   Aggregate all the records, dropping duplicates.

3. Write a custom Python extension to reposurgeon that generates the
   same report, only this time per-commit and thus yielding a committer ID.

4. Set a recognition time window.  It must be more than 24 hours or you're
   going to have spurious negatives due to time-zone skew.

5. Write a program that fuzzy-matches the commit-log file-modification
   cliques to the per-commit cliques.  One aspect of "fuzzy" is the
   time window; you need to include as potential matches any commits back
   from the date of the commit-log entry *and those up to 24 hours forward*
   (time-zone skew again).  Also, you can't only look at the most recent
   matching commit if it's within the 24-hour window - time zone skew might
   mean that another one that looks older also matches, and might actually
   be more recent.

6. Try the naive implementation using a 24-hour time window.  Now look
   at the percentages of unmatched commits and commit-log entries.  If
   it's too high, how does it vary as the time window rises?
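
The time-window matching in the steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the record layout (dicts with "when" and "files" keys), the function names, and the default window sizes are all hypothetical, and it uses exact file-set equality rather than any fuzzier comparison.

```python
from datetime import date, datetime, timedelta, timezone

def candidates(entry_date, entry_files, commits, back_hours=24, forward_hours=24):
    """Return every commit whose file set equals the commit-log entry's and
    whose timestamp falls inside the window around the entry's calendar day.

    All candidates are returned, not just the most recent one, because
    time-zone skew can make the true match look older than another commit.
    """
    day_start = datetime(entry_date.year, entry_date.month, entry_date.day,
                         tzinfo=timezone.utc)
    lo = day_start - timedelta(hours=back_hours)
    hi = day_start + timedelta(days=1, hours=forward_hours)
    wanted = frozenset(entry_files)
    return [c for c in commits
            if lo <= c["when"] <= hi and frozenset(c["files"]) == wanted]

def unmatched_fraction(entries, commits, **window):
    """Fraction of commit-log entries with no candidate commit at all."""
    if not entries:
        return 0.0
    missing = sum(1 for e in entries
                  if not candidates(e["date"], e["files"], commits, **window))
    return missing / len(entries)
```

Calling unmatched_fraction with increasing back_hours is one way to see how the residual varies as the window widens.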

Alas, there are other dimensions of 'fuzzy'. Here are a couple:

1. Typos or omissions in the commit-log file cliques and/or author
   names.  To get good coverage you might find you need to do
   something like a Ratcliff-Obershelp fuzzy match.  Set a high
   similarity percentage, then back off it if you have lots of
   unmatched commits.

2. What if someone did two or more commits on different filesets, but
   described them in one commit-log entry?  Ideally you'd like to propagate
   the commit-log author info correctly to both, but testing for this case
   mechanically would be combinatorially explosive.  Your only hope is that
   you end up with few enough unmatched commits and commit-log entries
   that the problem can be solved manually.
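
For the first of these, Python's standard difflib module may be a convenient starting point: its documentation describes the SequenceMatcher algorithm as closely related to Ratcliff-Obershelp "gestalt pattern matching". The sketch below is an assumption-laden illustration, not a tested matcher; the function names and the 0.9 default threshold are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1] as computed by difflib.SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()

def best_match(name, known_names, threshold=0.9):
    """Return the known name most similar to `name` (case-insensitively),
    or None if nothing reaches the threshold."""
    scored = [(similarity(name.lower(), k.lower()), k) for k in known_names]
    if not scored:
        return None
    score, best = max(scored)
    return best if score >= threshold else None
```

Starting with a high threshold and lowering it only while many commits remain unmatched follows the back-off approach described above. The same ratio could be applied to file paths as well as author names.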

Maybe you'll get lucky and the residuals (the sets of commits and commit-log
entries that don't have a match in the other set) will be tiny.  I wouldn't
count on it - I'd expect that you will trip over other noise sources and
have to figure out ways to fuzzy-match around them.

Once you have the residuals down to an acceptably low number, make your
matcher grind out a set of reposurgeon commands that patches the attributions
appropriately.  Apply.  Be careful to add a predicate check that prevents
each transformation from applying if the date matches more than one commit;
those two will have to be treated as residuals and hand-patched.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Repository for the conversion machinery

2016-10-10 Thread Joseph Myers
On Mon, 10 Oct 2016, Eric S. Raymond wrote:

> I strongly recommend that if you want to try this, you separate it from the
> initial repo conversion.  That is, get the project to git first.  Then
> see if you can data-mine author information out of the history. If,
> and only if, you get results that look reasonable, then you patch the repo
> and force-push it, warning everyone there'll be a flag day.
> 
> The reason I recommend this is that I think you're going to have serious
> trouble getting clean authorship data with good coverage.  The data
> mining will be messy and take longer than you expect.

I also think it would be too messy, and don't think having such a flag day 
would be a good idea - once we've done the conversion we should keep 
commit ids stable (while having the commit objects from the existing git 
mirror in a disjoint set of branches not connected to the cleanly 
converted history, whether in a separate repository or not, so existing 
references to those commit ids continue to work as well - but I don't want 
to add a third set of commit ids for the same history as well).

In practice there are a lot of ways people have messed up ChangeLog 
commits or commit messages that I would expect to confuse such author 
extraction, even before you get to the parts of the history converted from 
CVS.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Converting to LRA (calling all maintainers)

2016-10-10 Thread Eric Botcazou
> > Do we have a Wiki page for the cc0 conversion?  If no, I can start one
> > based on my fresh experience with the Visium port.
> 
> There is none so far as far as I know.  Thanks for volunteering,

OK, the page is at https://gcc.gnu.org/wiki/CC0Transition and linked to from 
the "Current Projects" list on the HomePage.  Probably a bit too verbose...

-- 
Eric Botcazou


Re: Repository for the conversion machinery

2016-10-10 Thread Eric S. Raymond
Joseph Myers:
> In practice there are a lot of ways people have messed up ChangeLog 
> commits or commit messages that I would expect to confuse such author 
> extraction, even before you get to the parts of the history converted from 
> CVS.

This is also true.

I looked seriously at what it would take to recover this information
from the Emacs history, which is why I had the steps worked out in
such detail.  Having researched the matter, I did *not* offer to
actually add this wrinkle when I did their conversion...
-- 
Eric S. Raymond <http://www.catb.org/~esr/>