[issue43198] Operations on sets more than hundred times less efficient with python3.9 than with previous versions

Raymond Hettinger Thu, 11 Feb 2021 18:46:13 -0800


Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:


Addenda.  Top use cases for sets (roughly in order of commonality):

1. Eliminate duplicates from an iterable input and loop over the result.
2. Store elements in a set just once but do many membership tests.
3. Perform data analysis on multiple sets using union, intersection, 
difference, and isdisjoint.
4. Remove elements from a set, one at a time, until it is empty.
5. Algorithms and that alternately add and remove different elements over time 
(toposort, NFA, DFA, etc).  

The first three are all penalized by an extra inner loop test and the loss of 
the register to track the freeslot.  Those use cases get no offsetting benefit.

Case 4 doesn't exercise the dummy reuse at all.  So, it is unaffected.

The last is approximately a net wash.  It pays the inner loop price, suffers 
the loss of a register, and incurs an extra test outside the loop, but it 
occasionally gets lucky and reuses a freeslot. The benefit for reuse is that it 
is delays the inevitable resize which would have reclaimed the dummy entries 
earlier. (The comparable use case for dicts is LRU/LFU caches where new entries 
are added at the same rate as old entries are removed.  That use case also 
showed a net wash when freeslot tracking was removed from dicts).

Not on the list at all is the case of a large set, where exactly the same 
element is added and removed in a tight loop.  The payoff for this case is that 
the O(n) resize step never occurs; however, this case is so rare that there is 
no reason to preference it over the common cases.  It now has the same 
performance as case 5.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43198>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue43198] Operations on sets more than hundred times less efficient with python3.9 than with previous versions

Reply via email to