>> For example are the file names in A in B and if so return a new list with 
>> those items. Long story short, I wrote some functions to do that. They are 
>> quite simple and fast (due to timsort in no small part).
>> Even the plain python code is faster than the built in set functions (afaik).
>You’re testing pretty small values in a very narrow use case, and testing them 
>inaccurately by using datetime instead of timeit (the fact that some of your 
>results are 0 vs. 0 should be a clue…), and you haven’t even given us the 
>results, just claimed that
they’re faster.

I'm sorry I haven't used the mail list before and didn't send my comments to 
the list. Please see my response to Chris for some example times. I'm reluctant 
to use anything less than about a ms for timing purposes. There's too many 
interactions at the us level that can skew results. I'd prefer to expand the 
sample size until it's more clear. Here is an example

For A not in B:
trying A 10000, B 50000
naive test  0:00:10.319032
set test  0:00:00.009001
listex test  0:00:00.005000

For A subset of B:
trying A 100000, B 500000
set test  0:00:00.180018
listex test  0:00:00.054006

If I increase a and b by 10 times the execution time goes up by about 100x on 
the set. This is because its related to a * b. the listex only goes up by about 
10x which is about 10a + 10b. 

>When I run my own test on lists of 10000 strings made up 5 random lowercase 
>characters,
>I get 485us for set(a).intersection(b), and 6609us for list_intersection(a, 
>b), so it’s
>actually more than 10x slower.
I can't reproduce this here is what I get
For A not in B:
trying A 10000, B 10000
naive test  0:00:02.661266
set test  0:00:00.002000
listex test  0:00:00.005001

>>The complexity ends up being about the longer of list a or b.
>How can that be true? Sorting a takes nlogn steps, sorting b takes mlogm steps,
>step-comparing then takes n+m steps, so that’s O(nlogn + mlogm * n + m) = 
>O(nlogn), which
i>s bigger than O(n).
I should have been more specific. The readme.md in the project explains a 
little more as does my response to Chris. You are right the sort + alg 
operation takes n log n and m log m just to sort the lists. The 
intersection,union, other functions though are m + n. The total is 
O(m+n+nlogn+mlogm). There is no multiplication of the sort time and the 
relation operation time because you only have to do it once no matter how many 
items are in a or b. 

>Why would you want this as a method of list rather than a function that works
>just as well on any sortable iterables? 
It doesn't have to be. I just thought that was the purpose of this list so I 
should start with that. A stand alone module works for me at the moment. If it 
were included in the list functions or as some "extra" function that worked on 
lists I figured the final result would be done in C like the sort() function. 
That would make it a significant change so my idea was just to present the 
algorithms and see if there was interest. If it is in a addon package that 
would be ok with me, but I think think people would use the set method and 
probably never know there was a better way.

>c = [val for val in a if val in filterset]
This is the crux I think. The complexity is the number of elements in 
filterset. Since filterset has to be check for every value in a its O(len(a) * 
len(b) * filter search time). You wrote O(n+m) but I think that is incorrect.

The module (listex) I linked to implements the other operations as well. I 
didn't mean just intersection I was just using that as an example. If you run 
the module it will generate some time comparison for that, but its just for 
comparison not meant to be exhaustive. It's the set of functions I have found 
useful in the past. It's not all of the ones in the module I regularly use for 
purely academic purposes actually.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PSIALU4ITFPYUMSRU45AQS64XXISZ2V5/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to