[issue19217] Calling assertEquals for moderately long list takes too long

Ezio Melotti Sun, 10 Aug 2014 02:30:17 -0700

Ezio Melotti added the comment:

I thought some more about this, and I think we can do better.  
Since _diffThreshold only affects strings, the goal of this issue is to extend 
the check to other types.  Instead of doing this by adding more attributes and 
behaviors, I would like to keep things as simple and consistent as possible 
(since they are already quite complicated), and possibly making them even 
simpler.


I'll try to summarize the current situation and propose some possible solutions 
to achieve this goal.

Currently we have:
* _diffThreshold (used to avoid computing the diff, only affects strings);
* maxDiff (used to "hide" diffs that are too big, after they have been 
computed);

These two attributes deal with two different but related problems:
* _diffThreshold tries to avoid /calculating/ expensive diffs;
* maxDiff tries to avoid /printing/ big and unreadable diffs;

These are different because one acts before and the other after, but they are 
also related because the threshold directly affects the size of the diff.

By picking some ideas from Serhiy comments and patch, I think we should:
1) try to have a single threshold for all types, and use line-based counting 
for strings (so if the threshold is 32, this means 32 elements in a list, 32 
items in a dict, 32 lines in a string);
2) make this threshold public, so that users can control this behavior (even 
though we should be careful about what we add -- we can always keep it private 
and expose it later);
3) factor out the diff computation/truncation function (like Serhiy did), and 
use it consistently, so that we can get rid of _truncateMessage and simplify 
the formatting.
4) if possible, try to unify the threshold and the maxDiff -- if the threshold 
is low the diff will be small, if it's high/None the diff will be big;

Now, doing 1) shouldn't be an issue, 2) is also doable if we agree on it, and 
3) is also not a problem since these are internal functions.  If we decide to 
go for 4) as well there are at least two options:
a) repurpose maxDiff to indicate the number of items/lines;
b) introduce a new attribute (e.g. maxDiffItems) and deprecate maxDiff;

Option a) might be doable, and even if it introduces a change in behavior it 
might be acceptable since it affects the output of the messages in case of 
failure, and I don't think anyone is relying on an exact output (also because 
tests shouldn't be failing).  Moreover, the most common usage of maxDiff is 
setting it to None, and having the threshold to None means that the full diff 
will be computed and printed, leaving the behavior unchanged.

Thoughts?


> Second, we can extend difflib by functions which allow to limit a number
> of reported differences.

I was thinking about having a lazy, generator-based diffing, so we could 
generate as many diff lines we need and stop once we reach the threshold.  I'm 
not sure if difflib already has something similar but anyway this is a separate 
issue.  Also, if we unify maxDiff and the threshold we won't need this, since 
all that is computer is also printed.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19217>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue19217] Calling assertEquals for moderately long list takes too long

Reply via email to