[issue24384] difflib.SequenceMatcher faster quick_ratio with lower bound specification

2018-06-11 Thread floyd


floyd  added the comment:

Yes, I agree this should be closed. Especially because my proposed code is so 
incredibly bad (e.g. regarding performance) that it should be rejected. Back 
then I was horribly wrong and didn't understand the problem well enough yet.

If somebody would like to have such a function, this is all it needs:

def quick_ratio_ge(self, a, b, threshold):
    return threshold <= 2.0*(len(a))/(len(a)+len(b))

Here is how I actually use it in code: 
https://github.com/modzero/burp-ResponseClusterer/blob/master/ResponseClusterer.py#L343
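A minimal sketch of the same length-only check as a standalone function, for readers who want to try it. The name `length_bound_ge` is hypothetical. It assumes the bound should use the shorter of the two lengths, which is what `SequenceMatcher.real_quick_ratio()` computes, and it treats two empty sequences as having a ratio of 1.0:

```python
def length_bound_ge(a, b, threshold):
    """Return True if the length-based upper bound on the ratio reaches threshold.

    Hypothetical helper, not part of difflib. A False result means the real
    ratio cannot reach the threshold; True means it might.
    """
    total = len(a) + len(b)
    if total == 0:
        # Two empty sequences: difflib reports a ratio of 1.0 in this case.
        return threshold <= 1.0
    # At most min(len(a), len(b)) elements can match, so this bounds the ratio.
    return threshold <= 2.0 * min(len(a), len(b)) / total
```

With `a = "abcdef"` and `b = "abc"` the bound is 2*3/9, so a threshold of 0.7 is ruled out without comparing any elements.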

Sorry for the fuss.

--
resolution:  -> rejected
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24384] difflib.SequenceMatcher faster quick_ratio with lower bound specification

2018-06-11 Thread Tal Einat


Tal Einat  added the comment:

Since this is a small enhancement proposal that is not sure to be approved, and 
there has been no followup for years, I vote to close this.




[issue24384] difflib.SequenceMatcher faster quick_ratio with lower bound specification

2015-06-08 Thread Tal Einat

Tal Einat added the comment:

You should post this on the python-ideas mailing list if you think this should 
be added to the stdlib. Make sure to reference this issue if you do so.




[issue24384] difflib.SequenceMatcher faster quick_ratio with lower bound specification

2015-06-08 Thread floyd

floyd added the comment:

Agree with the separate function (especially as the return value would change 
from float to bool).

In my experience this is one of the most frequently occurring use cases for 
difflib in practice.

Another reason is that it is not obvious to users that they can optimize this 
case with the attached version.

Some more opinions would be nice.

If this suggestion is rejected we could include a performance warning for this 
use case in the docs and/or I'll write an online code recipe which can be 
linked to.




[issue24384] difflib.SequenceMatcher faster quick_ratio with lower bound specification

2015-06-07 Thread Tal Einat

Tal Einat added the comment:

Your second suggestion of adding a 'threshold' parameter to quick_ratio() is a 
bad idea in my opinion, since it would then be two significantly different 
functions crammed into one.

The separate function would be possible. However, is there a compelling reason 
for this to be in the stdlib, rather than an online code recipe?

--
nosy: +taleinat




[issue24384] difflib.SequenceMatcher faster quick_ratio with lower bound specification

2015-06-04 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
nosy: +tim.peters




[issue24384] difflib.SequenceMatcher faster quick_ratio with lower bound specification

2015-06-04 Thread floyd

floyd added the comment:

Now that I have given it another thought, I think it would be better if we 
simply add threshold as a named parameter of quick_ratio.




[issue24384] difflib.SequenceMatcher faster quick_ratio with lower bound specification

2015-06-04 Thread floyd

New submission from floyd:

I guess a lot of users of difflib call the SequenceMatcher in the following way 
(where a and b often have different lengths):

if difflib.SequenceMatcher(None, a, b).quick_ratio() >= threshold:

However, for this use case the current quick_ratio is quite a performance loss. 
Therefore I propose to add an additional, optimized version quick_ratio_ge 
which would be called like this:

if difflib.SequenceMatcher.quick_ratio_ge(None, a, b, threshold):

Since we can compute an upper bound on the ratio from the lengths of a and b 
alone, this function would return much faster in many cases.

An example of how quick_ratio_ge could be implemented is attached.
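For comparison, the upper-bound idea can be sketched using only methods that difflib already provides. This is not the attached file; `ratio_ge` is a hypothetical helper name. It relies on the documented ordering of the three methods: `real_quick_ratio()` is an upper bound on `quick_ratio()`, which is an upper bound on `ratio()`.

```python
import difflib


def ratio_ge(a, b, threshold):
    """Return True if SequenceMatcher(None, a, b).ratio() >= threshold.

    Hypothetical helper: the cheap upper bounds are checked first so the
    expensive ratio() is computed only when they do not rule the pair out.
    """
    matcher = difflib.SequenceMatcher(None, a, b)
    # `and` short-circuits, so a failed cheap bound skips the costlier calls.
    return (matcher.real_quick_ratio() >= threshold
            and matcher.quick_ratio() >= threshold
            and matcher.ratio() >= threshold)
```

When a and b differ greatly in length, the first check fails on lengths alone and neither `quick_ratio()` nor `ratio()` is evaluated.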

--
components: Library (Lib)
files: difflib_SequenceMatcher_quick_ratio_ge.py
messages: 244840
nosy: floyd
priority: normal
severity: normal
status: open
title: difflib.SequenceMatcher faster quick_ratio with lower bound specification
type: enhancement
Added file: 
http://bugs.python.org/file39625/difflib_SequenceMatcher_quick_ratio_ge.py
