Robert Collins added the comment:

A few thoughts.

Adding a new public symbol seems inappropriate here: this is a performance 
issue that is entirely predictable, and we should cater for it (given difflib's 
current performance).

I'll note in passing that both bzr and hg have much faster diff algorithms 
that we could pick up and include as a replacement SequenceMatcher, which 
might significantly change the threshold at which we need to default-cap 
things - but such a threshold will still exist.

I totally agree that _diffThreshold should apply to non-string sequences - 
anything where we're going to hit high-order complexity outputting the 
difference. That said, I suspect we might be better off outputting both 
objects in some structured fashion and letting a later process render them 
(for things like CI systems and test databases, where fidelity of reproduction 
is more important than having the output fit on one screen). This is a 
different issue though, and something we should revisit later.

That suggests to me though that the largest diff we output should be chosen 
based on the textual representation of the diff, since we're doing it for 
human readability, whereas the threshold for calculating a diff at all should 
be based on performance. It can be very expensive to calculate a diff on large 
sequences, but the diff might also be much larger than the sequence length 
indicates (because each item in the sequence may be very large). Perhaps 
that's overthinking it?
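
To make the two-threshold idea concrete, here is a rough sketch rather than a 
proposed patch. The names describe_difference, MAX_COMPARE_CHARS and 
MAX_DIFF_CHARS are hypothetical, and the limits are arbitrary; only 
difflib.ndiff and pprint.pformat are real stdlib calls.

```python
import difflib
import pprint

# Hypothetical limits, chosen only for illustration.
MAX_COMPARE_CHARS = 2**16  # performance cap: skip diffing above this input size
MAX_DIFF_CHARS = 8 * 1024  # readability cap: truncate rendered diffs above this


def describe_difference(first, second,
                        max_compare=MAX_COMPARE_CHARS,
                        max_diff=MAX_DIFF_CHARS):
    """Return a human-readable description of how two sequences differ."""
    first_text = pprint.pformat(first)
    second_text = pprint.pformat(second)

    # Threshold 1 (performance): judged on the formatted size rather than
    # len(sequence), because a short sequence can hold very large items.
    if max(len(first_text), len(second_text)) > max_compare:
        return "Sequences differ; inputs too large to diff."

    diff = "\n".join(difflib.ndiff(first_text.splitlines(),
                                   second_text.splitlines()))

    # Threshold 2 (readability): judged on the rendered diff itself.
    if len(diff) > max_diff:
        return diff[:max_diff] + "\n[diff truncated]"
    return diff
```

The first check bounds the work difflib is asked to do; the second only 
bounds what a human has to read.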

Anyhow, in the short term, surely just making the threshold apply to any 
sequence type is sufficient to fix the bug?
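
Something along these lines is what I have in mind for the short term, written 
as a subclass so it can be tried without touching unittest. CappedDiffTestCase 
and sequence_diff_threshold are hypothetical names (the latter stands in for 
the private _diffThreshold), and capping on element count rather than on the 
formatted size is an assumption made to keep the sketch small.

```python
import unittest


class CappedDiffTestCase(unittest.TestCase):
    """Hypothetical base class; not part of unittest."""

    # Hypothetical attribute standing in for the private _diffThreshold.
    sequence_diff_threshold = 2**16

    def assertSequenceEqual(self, seq1, seq2, msg=None, seq_type=None):
        try:
            too_long = (max(len(seq1), len(seq2))
                        > self.sequence_diff_threshold)
        except TypeError:
            # No len(); let unittest report that in its usual way.
            too_long = False
        if too_long and seq1 != seq2:
            # Fail without ever asking difflib for a diff.
            self.fail(msg or "Sequences differ; diff skipped because the "
                             "inputs exceed the threshold.")
        super().assertSequenceEqual(seq1, seq2, msg=msg, seq_type=seq_type)
```

Sequences at or below the threshold still get the normal diff output.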

----------
nosy: +rbcollins

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19217>
_______________________________________