Steve Jorgensen wrote:
> What if, instead, there was an OrderedBag class that acts like a list and
> supports set operations? The advantage of this is that any hidden indexing
> used to optimize the operations could be created efficiently in the result
> object during its construction, while the operation is being performed.
> Using lists, any such indexes would have to be produced and discarded for
> each operation (or could perhaps be kept in weak references or something
> like that, but that's messy).
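For concreteness, here is roughly the shape I picture that taking (a minimal
sketch; the class and method names are mine, purely illustrative):

from bisect import bisect_left

class OrderedBag:
    """List-like container that keeps a sorted index alongside the
    items, so set-style operations can use binary search."""

    def __init__(self, items=()):
        self._items = list(items)          # preserves insertion order
        self._index = sorted(self._items)  # hidden index, built once

    def __contains__(self, value):
        i = bisect_left(self._index, value)
        return i < len(self._index) and self._index[i] == value

    def difference(self, other):
        # Each membership test against other's index is O(log n), and
        # the result's own index is built during construction.
        return OrderedBag(x for x in self._items if x not in other)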
It seems like it would depend on the memory layout. If the items weren't
stored in order, it would most likely suffer from cache misses the same way
set operations do, and I'm not sure it would be slower than set operations in
that case. Most likely there would be a crossover in run time: sets faster up
to some smallish number of items and slower for large collections, similar to
the list implementation. If you added binary searches it might be faster in
the majority of cases, but that's a lot of variables.
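By "binary searches" I mean something along these lines (a sketch over
pre-sorted input; the function names are just illustrative):

from bisect import bisect_left

def sorted_contains(sorted_items, value):
    """Binary-search membership test on an already-sorted list:
    O(log n) per lookup instead of a list's O(n) scan."""
    i = bisect_left(sorted_items, value)
    return i < len(sorted_items) and sorted_items[i] == value

def difference_sorted(a, b_sorted):
    """Items of a that are not in b_sorted (b must be pre-sorted)."""
    return [x for x in a if not sorted_contains(b_sorted, x)]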
The big question then becomes: what criteria does it need to meet, and are
they quantified? That kind of quantification doesn't seem to be a
consideration on this list. Unless you have a particular itch to scratch,
that is a lot of work, and adding enough general testing scenarios to satisfy
a consensus is probably much, much more.
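Quantifying it would mean timing each candidate over a grid of sizes and
overlap ratios, something like this skeleton (every parameter here is a
placeholder, not a recommendation):

import random
import timeit

def bench(op, sizes=(10**4, 10**5, 10**6), overlap=0.5, repeat=3):
    """Time op(a, b) across input sizes; overlap controls how much
    of a also appears in b."""
    for n in sizes:
        shared = random.sample(range(3 * n), int(n * overlap))
        a = shared + random.sample(range(3 * n, 6 * n), n - len(shared))
        b = shared + random.sample(range(6 * n, 9 * n), n - len(shared))
        best = min(timeit.repeat(lambda: op(a, b), number=1, repeat=repeat))
        print(f"n={n}: best time (msec): {best * 1000:.3f}")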

Let's look at the test results from earlier for the 5 million and 10 million
item lists.
For A not in B:
trying A 1000000, B 5000000
best relcomp_set test time (msec): 82.131
best relcomp_set with set init test time (msec): 880.753
best relcomp_set with set init intersect test time (msec): 886.761
best list_relcomp test time (msec): 7.902
best sort test time (msec): 5.826

For A not in B:
trying A 5000000, B 10000000
best relcomp_set test time (msec): 557.368
best relcomp_set with set init test time (msec): 3228.475
best relcomp_set with set init intersect test time (msec): 3725.047
best list_relcomp test time (msec): 25.626
best sort test time (msec): 13.632

Changing the order of the set operations based on which set is largest yields
almost a 10x speed improvement (see the sketch below). Should the built-in
set operations be changed? Past roughly 10 million items it's probably
faster, no matter the order, to convert the sets to lists and use my
algorithm. In the end, though, we are still talking about a few seconds at
most. If that improvement is worth it to you, that's one thing, but you
should consider what you really need. Without clear performance/use-case
criteria I think you'll never leave the bike-shed here.
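For what it's worth, the ordering trick amounts to something like this for
the "A not in B" case (a sketch of the idea, not the actual benchmark code
from the results above):

def relcomp_ordered(a, b):
    """A not in B, building a set only from b. Compared with
    set(a) - set(b), this skips hashing every element of a, which
    matters most when a is the larger operand."""
    b_set = set(b)
    return [x for x in a if x not in b_set]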