Re: [PATCH 2 of 2 v8] releasenotes: add similarity check function to compare incoming notes

2017-08-06 Thread Yuya Nishihara
On Sun, 6 Aug 2017 14:34:19 +0530, Rishabh Madan wrote:
> I checked for test-check-* and test-releasenotes-*. I guess I'll run the
> complete test suite next time.

The problem won't be seen if you have fuzzywuzzy installed. Our build scripts
load extensions, for example to collect docstrings for translation, and that
fails if fuzzywuzzy can't be imported.
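
For illustration, a minimal sketch of the deferred-import workaround. The
threshold value, the use of fuzz.ratio, and the difflib fallback are
assumptions made here so the sketch runs anywhere; they are not taken from the
patch.

```python
import difflib


def similaritycheck(incoming_str, existingnotes, threshold=75):
    """Return True when no existing note scores above the threshold."""
    try:
        # Imported lazily: merely loading this module (as the build scripts
        # do when collecting docstrings) never requires fuzzywuzzy.
        import fuzzywuzzy.fuzz as fuzz
        ratio = fuzz.ratio
    except ImportError:
        # Illustration-only fallback (assumption, not part of the patch).
        def ratio(a, b):
            matcher = difflib.SequenceMatcher(None, a, b)
            return int(round(100 * matcher.ratio()))
    return all(ratio(incoming_str, note) <= threshold
               for note in existingnotes)
```

With the import inside the function, only an actual call to
similaritycheck() touches the optional dependency.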
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 2 v8] releasenotes: add similarity check function to compare incoming notes

2017-08-06 Thread Rishabh Madan
I checked for test-check-* and test-releasenotes-*. I guess I'll run the
complete test suite next time.

On Sat, Aug 5, 2017 at 8:09 PM, Yuya Nishihara  wrote:

> On Sat, 05 Aug 2017 08:53:50 +0530, Rishabh Madan wrote:
> > # HG changeset patch
> > # User Rishabh Madan 
> > # Date 1501890936 -19800
> > #  Sat Aug 05 05:25:36 2017 +0530
> > # Node ID 4121eab826799f3a257eb1fe26015583b36bbb66
> > # Parent  1d79b04c402f3f431ca052b677b1021ddd93a10e
> > releasenotes: add similarity check function to compare incoming notes
>
> > +import fuzzywuzzy.fuzz as fuzz
>
> I've moved this into similaritycheck() function to work around several
> test failures.
>



-- 
Rishabh Madan
Second Year Undergraduate student
Indian Institute of Technology, Kharagpur


Re: [PATCH 2 of 2 v8] releasenotes: add similarity check function to compare incoming notes

2017-08-05 Thread Yuya Nishihara
On Sat, 05 Aug 2017 08:53:50 +0530, Rishabh Madan wrote:
> # HG changeset patch
> # User Rishabh Madan 
> # Date 1501890936 -19800
> #  Sat Aug 05 05:25:36 2017 +0530
> # Node ID 4121eab826799f3a257eb1fe26015583b36bbb66
> # Parent  1d79b04c402f3f431ca052b677b1021ddd93a10e
> releasenotes: add similarity check function to compare incoming notes

> +import fuzzywuzzy.fuzz as fuzz

I've moved this into similaritycheck() function to work around several
test failures.


Re: [PATCH 2 of 2 v8] releasenotes: add similarity check function to compare incoming notes

2017-08-05 Thread Yuya Nishihara
On Sat, 05 Aug 2017 08:53:50 +0530, Rishabh Madan wrote:
> # HG changeset patch
> # User Rishabh Madan 
> # Date 1501890936 -19800
> #  Sat Aug 05 05:25:36 2017 +0530
> # Node ID 4121eab826799f3a257eb1fe26015583b36bbb66
> # Parent  1d79b04c402f3f431ca052b677b1021ddd93a10e
> releasenotes: add similarity check function to compare incoming notes

Queued these, thanks.


[PATCH 2 of 2 v8] releasenotes: add similarity check function to compare incoming notes

2017-08-04 Thread Rishabh Madan
# HG changeset patch
# User Rishabh Madan 
# Date 1501890936 -19800
#  Sat Aug 05 05:25:36 2017 +0530
# Node ID 4121eab826799f3a257eb1fe26015583b36bbb66
# Parent  1d79b04c402f3f431ca052b677b1021ddd93a10e
releasenotes: add similarity check function to compare incoming notes

Incoming note fragments may duplicate content already present in the existing
release notes. For a bug fix, we look for the issue number in the existing
notes. For other general cases, we use the fuzzywuzzy library to compute a
similarity score. If the score is above a certain threshold, the fragment is
ignored; otherwise it is added. Because the score can be misleading for short
commit messages, the similarity function is used only when the string is
longer than a certain number of words. This patch adds tests for this
behavior. Combining incoming notes still needs improvement: we could use
interactive mode for adding notes, perhaps when the similarity falls under a
certain range.
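
The decision flow described above can be sketched roughly as follows. This is
not the patch itself: shouldskip() and score() are hypothetical names,
THRESHOLD is an assumed value, and difflib stands in for fuzzywuzzy so the
sketch has no third-party dependency. RE_ISSUE and the 10-word cutoff are
taken from the diff.

```python
import difflib
import re

RE_ISSUE = r'\bissue ?[0-9]{4,6}(?![0-9])\b'  # as in the patch
MIN_WORDS = 10   # the patch only scores strings longer than 10 words
THRESHOLD = 75   # assumed value, for illustration only


def score(a, b):
    """Stand-in similarity score in the range 0..100 (not fuzzywuzzy)."""
    return int(round(100 * difflib.SequenceMatcher(None, a, b).ratio()))


def shouldskip(section, incoming, existing):
    """Return True if the incoming note looks like a duplicate."""
    if section == 'fix':
        # Bug fixes: match on the issue number first.
        m = re.search(RE_ISSUE, incoming, re.IGNORECASE)
        if m and any(m.group() in s for s in existing):
            return True
    # Short notes give misleading scores, so only longer ones are scored.
    if len(incoming.split()) > MIN_WORDS:
        return any(score(incoming, s) > THRESHOLD for s in existing)
    return False
```

A fix note that mentions an issue number already present in the notes is
skipped regardless of its length; any other note is scored only when it has
more than MIN_WORDS words.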

diff -r 1d79b04c402f -r 4121eab82679 hgext/releasenotes.py
--- a/hgext/releasenotes.py Sat Jul 29 14:06:26 2017 +0530
+++ b/hgext/releasenotes.py Sat Aug 05 05:25:36 2017 +0530
@@ -14,6 +14,7 @@
 from __future__ import absolute_import
 
 import errno
+import fuzzywuzzy.fuzz as fuzz
 import re
 import sys
 import textwrap
@@ -46,6 +47,7 @@
 ]
 
 RE_DIRECTIVE = re.compile('^\.\. ([a-zA-Z0-9_]+)::\s*([^$]+)?$')
+RE_ISSUE = r'\bissue ?[0-9]{4,6}(?![0-9])\b'
 
 BULLET_SECTION = _('Other Changes')
 
@@ -92,6 +94,8 @@
         This is used to combine multiple sources of release notes together.
         """
         for section in other:
+            existingnotes = converttitled(self.titledforsection(section)) + \
+                convertnontitled(self.nontitledforsection(section))
             for title, paragraphs in other.titledforsection(section):
                 if self.hastitledinsection(section, title):
                     # TODO prompt for resolution if different and running in
@@ -100,16 +104,32 @@
                              (title, section))
                     continue
 
-                # TODO perform similarity comparison and try to match against
-                # existing.
+                incoming_str = converttitled([(title, paragraphs)])[0]
+                if section == 'fix':
+                    issue = getissuenum(incoming_str)
+                    if issue:
+                        if findissue(ui, existingnotes, issue):
+                            continue
+
+                if similar(ui, existingnotes, incoming_str):
+                    continue
+
                 self.addtitleditem(section, title, paragraphs)
 
             for paragraphs in other.nontitledforsection(section):
                 if paragraphs in self.nontitledforsection(section):
                     continue
 
-                # TODO perform similarily comparison and try to match against
-                # existing.
+                incoming_str = convertnontitled([paragraphs])[0]
+                if section == 'fix':
+                    issue = getissuenum(incoming_str)
+                    if issue:
+                        if findissue(ui, existingnotes, issue):
+                            continue
+
+                if similar(ui, existingnotes, incoming_str):
+                    continue
+
                 self.addnontitleditem(section, paragraphs)
 
 class releasenotessections(object):
@@ -136,6 +156,78 @@
 
 return None
 
+def converttitled(titledparagraphs):
+    """
+    Convert titled paragraphs to strings
+    """
+    string_list = []
+    for title, paragraphs in titledparagraphs:
+        lines = []
+        for para in paragraphs:
+            lines.extend(para)
+        string_list.append(' '.join(lines))
+    return string_list
+
+def convertnontitled(nontitledparagraphs):
+    """
+    Convert non-titled bullets to strings
+    """
+    string_list = []
+    for paragraphs in nontitledparagraphs:
+        lines = []
+        for para in paragraphs:
+            lines.extend(para)
+        string_list.append(' '.join(lines))
+    return string_list
+
+def getissuenum(incoming_str):
+    """
+    Returns issue number from the incoming string if it exists
+    """
+    issue = re.search(RE_ISSUE, incoming_str, re.IGNORECASE)
+    if issue:
+        issue = issue.group()
+    return issue
+
+
+def findissue(ui, existing, issue):
+    """
+    Returns true if issue number already exists in notes.
+    """
+    if any(issue in s for s in existing):
+        ui.write(_("\"%s\" already exists in notes; "
+                 "ignoring\n") % issue)
+        return True
+    else:
+        return False
+
+def similar(ui, existing, incoming_str):
+    """
+    Returns true if similar note found in existing notes.
+    """
+    if len(incoming_str.split()) > 10:
+        merge = similaritycheck(incoming_str, existing)
+        if not merge:
+            ui.write(_("\"%s\" already