the proof is in the pudding: In [1]: a = range(10000)
In [2]: s = set(a) In [3]: s2 = set(a) In [5]: b = range(10000) In [6]: a == b Out[6]: True In [7]: s == s2 Out[7]: True In [8]: %timeit a == b 1000 loops, best of 3: 204 us per loop In [9]: %timeit s == s2 10000 loops, best of 3: 124 us per loop On Tue, Apr 6, 2010 at 2:11 PM, Gustavo Narea <m...@gustavonarea.net> wrote: > Hello! > > Could you please confirm whether my understanding of equality > operations in sets and lists is correct? This is how I think things > work, partially based on experimentation and the online documentation > for Python: > > When you compare two lists, *every* element of one of the lists is > compared against the element at the same position in the other list; > that comparison is done by the __eq__() method (or the equivalent for > builtin types). This is interrupted when a result is False. > > When you compare two sets, there's a loop over all the elements of the > first set, where the hash for that element is looked up in the second > set: > - If this hash matches the hash for one or more elements in the > second set, the element in the first set is compared (with __eq__ or > equivalent) against the elements in the second set which have the same > hash. When a result is True, nothing else is done on that element and > the loop takes the next element in the first set; when all the results > are False, the loop ends and the two sets are not equivalent. > - If the hash doesn't match that of an element in the second set, > then the loop ends and the two sets are not equivalent. > > So this means that: > 1.- When you have two collections which have the same elements, the > equality operation will *always* be faster with lists. > 2.- When you have two collections with different elements, the > equality operation *may* be faster with sets. > > For example, if you have two collections of 1,000 elements each and > 998 of them are equivalent, comparing both collections as sets will be > slower than doing it with lists. But if you have two collections of > 1,000 elements each and 998 of them are not equivalent, then comparing > both collections as lists will be slower than doing it with sets. > > The performance of equality operations on sets is directly > proportional to the amount of different elements in both sets, while > the performance of equality operations on lists is simply proportional > to the cardinality of the collection. > > In other words: The more different elements two collections have, the > faster it is to compare them as sets. And as a consequence, the more > equivalent elements two collections have, the faster it is to compare > them as lists. > > Is this correct? > > This is why so many people advocate the use of sets instead of lists/ > tuples in similar situations, right? > > Cheers, > > - Gustavo. > -- > http://mail.python.org/mailman/listinfo/python-list > -- http://mail.python.org/mailman/listinfo/python-list