Attached is an updated version of the script with inline unified diffs
and an updated regex.
If you run the script and find other problems, send me the changeset id
and I'll look into it before the conversion.

On Fri, Feb 10, 2017 at 7:41 AM, Berker Peksağ <berker.pek...@gmail.com> wrote:
> On Fri, Feb 10, 2017 at 4:36 AM, Senthil Kumaran <sent...@uthcode.com> wrote:
>> On Thu, Feb 9, 2017 at 4:28 PM, Ezio Melotti <ezio.melo...@gmail.com> wrote:
>>> No need to wait, I put together a script that shows the result of the
>>> rewriting :)
>>
>> Thank you, Ezio!
>>
>> Ezio and I were working on this this afternoon and agreed that if we
>> rewrite the various "issue NNNN" formats to bpo-NNNN, doing it
>> over the entire commit message gives a much better experience than
>> doing it only over the first line.
>>
>>
>> We can judge this by looking at the actual output (from the script
>> that Ezio shared).
>>
>> http://orsenthil.github.io/cpython-hg-to-git/
>>
>> This picked 1000 random revisions, added some tricky corner
>> cases that we identified, and did the rewrite.
>>
>> Please view the output of commit logs converted here
>> http://orsenthil.github.io/cpython-hg-to-git/ and see if it looks
>> better than the status quo.
>
> Thanks, Senthil and Ezio! Looks pretty good to me. I noticed some edge cases:
>
> * -46692:46918 merged from branch aimacintyre-sf1454481
>   +46692:46918 merged from branch aimacintyre-bpo-1454481
>

It's a branch name, so it shouldn't be changed, but it actually refers
to a valid issue id.
This is now fixed.
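The branch-name fix comes from the `(?<!-)` lookbehind in front of `SF`. The sketch below runs the regex from the attached script against the message outside Mercurial, side by side with a variant that drops the lookbehind. It rests on a few assumptions: `valid_ids` is a small hardcoded stand-in for the list fetched from bugs.python.org, and `rewrite()` is a hypothetical helper that emits plain `bpo-NNNN`, without the `<b>` tags the script adds for the HTML page.

```python
import re

# Pattern copied from the attached script.
PATTERN = re.compile(
    r'(?:SF\s*)?(?:(?:(?<!org/)issues?|bugs?|(?<!-)SF)(?:\s+id:?)?\s*#?|#)\s*(\d+)',
    flags=re.IGNORECASE)

# Hypothetical variant without the (?<!-) lookbehind, for comparison.
NO_LOOKBEHIND = re.compile(
    r'(?:SF\s*)?(?:(?:(?<!org/)issues?|bugs?|SF)(?:\s+id:?)?\s*#?|#)\s*(\d+)',
    flags=re.IGNORECASE)


def rewrite(regex, text, valid_ids):
    """Replace references to known issue ids with bpo-NNNN (hypothetical helper)."""
    def repl(match):
        issue_id = match.group(1)
        if issue_id in valid_ids:
            return 'bpo-%s' % issue_id
        return match.group(0)  # unknown id: keep the original text
    return regex.sub(repl, text)


# Hypothetical stand-in for the ids fetched from bugs.python.org.
valid_ids = {'1454481', '798269'}

branch = '46692:46918 merged from branch aimacintyre-sf1454481'
print(rewrite(NO_LOOKBEHIND, branch, valid_ids))  # branch name gets mangled
print(rewrite(PATTERN, branch, valid_ids))        # branch name is left alone
print(rewrite(PATTERN, 'SF 798269:  bug fix for doctest', valid_ids))
```

A standalone "SF 798269" is still converted. The lookbehind only rejects an "SF" that comes right after a hyphen, as in the branch name.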

> * -SF bug #1012315: weakref.WeakValueDictionary should override .has_key()
>   +SF bpo-1012315:  weakref.WeakValueDictionary should override .has_key()
>
>   -Backport checkin: Fix typo (from SF bug #1086127).
>   +Backport checkin: Fix typo (from SF bpo-1086127).
>
>   -   used on BTree databases.  [SF bug id 788421]
>   +   used on BTree databases.  [SF bpo-788421]
>
>   Is it possible to replace 'SF bug #NNNN' with 'bpo-NNNN'?
>

I had intentionally left the SF prefix in these cases, but on reflection
removing it is probably a good idea, since it might be confusing and SF
issues can already be told apart by their id.
This is now fixed.
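The optional leading `(?:SF\s*)?` in the regex is what drops the prefix: it is consumed as part of the match, so "SF bug #NNNN" is replaced as a whole. The sketch below checks Berker's three examples. It assumes a hardcoded stand-in for `valid_ids` and a hypothetical `rewrite()` helper that emits plain `bpo-NNNN` without the script's `<b>` tags.

```python
import re

# Pattern copied from the attached script; (?:SF\s*)? swallows an "SF " prefix.
PATTERN = re.compile(
    r'(?:SF\s*)?(?:(?:(?<!org/)issues?|bugs?|(?<!-)SF)(?:\s+id:?)?\s*#?|#)\s*(\d+)',
    flags=re.IGNORECASE)


def rewrite(text, valid_ids):
    """Turn references to known issue ids into bpo-NNNN (hypothetical helper)."""
    def repl(match):
        issue_id = match.group(1)
        if issue_id in valid_ids:
            return 'bpo-%s' % issue_id
        return match.group(0)  # unknown id: leave the text untouched
    return PATTERN.sub(repl, text)


# Hypothetical stand-in for the ids fetched from bugs.python.org.
valid_ids = {'1012315', '1086127', '788421'}

examples = [
    'SF bug #1012315: weakref.WeakValueDictionary should override .has_key()',
    'Backport checkin: Fix typo (from SF bug #1086127).',
    '   used on BTree databases.  [SF bug id 788421]',
]
for line in examples:
    print(rewrite(line, valid_ids))
```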

> * -SF 798269:  bug fix for doctest (sf bug id: 798254
>   +bpo-798269:  bug fix for doctest (sf bug id: 798254
>
>   'bug id NNNN' matches but not 'bug id: NNNN'.
>

There are a few commit messages with this structure.
I thought the id was the same but at least in this case they are
different (and both valid).
Regardless, it was an easy fix, so I updated the regex.
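The relevant part of the regex is `(?:\s+id:?)?`. Its optional colon lets "bug id: NNNN" match as well as "bug id NNNN". The sketch below compares the current pattern with a hypothetical variant that lacks the `:?`. That variant is an assumption meant to reproduce the miss Berker reported, not necessarily the exact earlier regex. As in the other sketches, `valid_ids` is a hardcoded stand-in and `rewrite()` is a hypothetical helper that omits the `<b>` tags.

```python
import re

# Pattern from the attached script: "id" may be followed by an optional colon.
PATTERN = re.compile(
    r'(?:SF\s*)?(?:(?:(?<!org/)issues?|bugs?|(?<!-)SF)(?:\s+id:?)?\s*#?|#)\s*(\d+)',
    flags=re.IGNORECASE)
# Hypothetical variant without ":?", which cannot get past "id:".
NO_COLON = re.compile(
    r'(?:SF\s*)?(?:(?:(?<!org/)issues?|bugs?|(?<!-)SF)(?:\s+id)?\s*#?|#)\s*(\d+)',
    flags=re.IGNORECASE)


def rewrite(regex, text, valid_ids):
    """Replace references to known issue ids with bpo-NNNN (hypothetical helper)."""
    def repl(match):
        issue_id = match.group(1)
        return 'bpo-%s' % issue_id if issue_id in valid_ids else match.group(0)
    return regex.sub(repl, text)


# Both ids are valid, as noted above; this set is a stand-in for the real list.
valid_ids = {'798269', '798254'}
msg = 'SF 798269:  bug fix for doctest (sf bug id: 798254'
print(rewrite(NO_COLON, msg, valid_ids))  # second id is missed
print(rewrite(PATTERN, msg, valid_ids))   # both ids are converted
```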

> * -Resolves SF bugs 697989, 697988, 697986.
>   +Resolves SF bpo-697989, 697988, 697986.
>
>   This doesn't look like a common case and we can just ignore it :)
>

Yes, I didn't bother fixing this, also because it's not as trivial as
the other fixes.
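A quick check of this limitation: the regex needs a keyword ("issue", "bug", "SF" or "#") right before each number, so it converts only the first id of a list. The sketch assumes a hardcoded stand-in for `valid_ids` and a hypothetical `rewrite()` helper that leaves out the `<b>` tags.

```python
import re

# Pattern copied from the attached script.
PATTERN = re.compile(
    r'(?:SF\s*)?(?:(?:(?<!org/)issues?|bugs?|(?<!-)SF)(?:\s+id:?)?\s*#?|#)\s*(\d+)',
    flags=re.IGNORECASE)


def rewrite(text, valid_ids):
    """Replace references to known issue ids with bpo-NNNN (hypothetical helper)."""
    def repl(match):
        issue_id = match.group(1)
        return 'bpo-%s' % issue_id if issue_id in valid_ids else match.group(0)
    return PATTERN.sub(repl, text)


# Hypothetical stand-in: all three ids are treated as valid here.
valid_ids = {'697989', '697988', '697986'}
print(rewrite('Resolves SF bugs 697989, 697988, 697986.', valid_ids))
```

The output is "Resolves bpo-697989, 697988, 697986.". The bare numbers after the first id have no keyword in front of them, so the pattern leaves them alone.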


> --Berker

from __future__ import print_function

import re
import sys
import json
import random
import difflib
import xmlrpclib

from mercurial import ui, hg

u = ui.ui()
repo = hg.repository(u, '.')


print('Retrieving bpo ids    ...', end=' ')
try:
    with open('valid_ids.json') as f:
        valid_ids = json.load(f)
except IOError:
    url = 'http://bugs.python.org/xmlrpc'
    roundup = xmlrpclib.ServerProxy(url)
    valid_ids = roundup.list('issue', 'id')
    with open('valid_ids.json', 'w') as f:
        json.dump(valid_ids, f)
valid_ids = set(valid_ids)
print('[done]')

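# Match "issue NNNN", "bug NNNN", "bug id: NNNN", "SF NNNN", "#NNNN" and
# similar, optionally preceded by "SF", capturing the numeric id in group 1.
# (?<!org/) leaves URLs such as bugs.python.org/issue1234 alone, and
# (?<!-)SF leaves branch names such as aimacintyre-sf1454481 alone.
r = r'(?:SF\s*)?(?:(?:(?<!org/)issues?|bugs?|(?<!-)SF)(?:\s+id:?)?\s*#?|#)\s*(\d+)'
regex = re.compile(r, flags=re.MULTILINE | re.IGNORECASE)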


# Number of random revisions to sample when none are given on the command line.
N = 100

if len(sys.argv) == 1:
    print('Generating revs sample...', end=' ')
    revs = random.sample(repo, N)
    print('[done]')
else:
    revs = [repo[rev] for rev in sys.argv[1:]]

# uncomment to run on all the revs
#revs = [repo[rev] for rev in repo]

def re_cb(match):
    """Replace a known issue reference with a bold bpo-NNNN; keep others as-is."""
    issue_id = match.group(1)
    if issue_id not in valid_ids:
        return match.group(0)
    return '<b>bpo-%s</b>' % issue_id


print('Generating output     ...', end=' ')
with open('output.html', 'w') as f:
    for rev in revs:
        desc = repo[rev].description() + '\n'
        newdesc = regex.sub(re_cb, desc)
        # n=1000 context lines show the whole message around each change;
        # [3:] drops the '---', '+++' and '@@' header lines of the diff.
        diff = list(difflib.unified_diff(desc.splitlines(True),
                                         newdesc.splitlines(True),
                                         n=1000))[3:]
        unch = '' if diff else ' (unchanged)'
        f.write('<pre>[<b>%s</b>]%s:\n' % (rev, unch))
        f.writelines(diff if diff else [desc])
        f.write('</pre>\n<hr>\n')
print('[done]')
print(len(revs), 'revisions converted')


unusual = """
794dad4b849f af811172717d f7d23bca599f 76a9a5131aae 31342913fb1e c4dd30b5d07e 3094843e7b92 e8940d4cd8ca 6d1e8162e855 27b698395d35 077d29384399 81ce9d412a4c c6df85e1d42e fedd6ccc5e5b 4ca32e4f7839 6db0a62b6aa6 69ac672b49b3 a6bcf4df1a85 a74b463bf76d 756c27efe193 df4943d24cb6 04f2801d9977 d9d69060f5e4
"""
ambiguous = """
0e8077cb3dd5 04f2801d9977 76a9a5131aae 3094843e7b92 e8940d4cd8ca fedd6ccc5e5b a4d869ecef33 bd2aa0247ada
"""
invalid = """
d2ae5affde14 329b28a85947
"""
broken = """
cfcc45f18515
"""
_______________________________________________
core-workflow mailing list
core-workflow@python.org
https://mail.python.org/mailman/listinfo/core-workflow
This list is governed by the PSF Code of Conduct: 
https://www.python.org/psf/codeofconduct