Re: filter list fast

2006-03-18 Thread Ben Cartwright
lars_woetmann wrote:
> I have a list I filter using another list and I would like this to be
> as fast as possible
> right now I do like this:
>
> [x for x in list1 if x not in list2]
>
> i tried using the method filter:
>
> filter(lambda x: x not in list2, list1)
>
> but it didn't make much difference, because of lambda I guess
> is there any way I can speed this up

Both of these techniques are O(n^2).  You can reduce it to O(n log n)
by using sets:

>>> set2 = set(list2)
>>> [x for x in list1 if x not in set2]

Checking to see if an item is in a set is much more efficient than a
list.

--Ben

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: filter list fast

2006-03-18 Thread Fredrik Lundh
lars_woetmann wrote:

> I have a list I filter using another list and I would like this to be
> as fast as possible right now I do like this:
>
> [x for x in list1 if x not in list2]
>
> i tried using the method filter:
>
> filter(lambda x: x not in list2, list1)
>
> but it didn't make much difference, because of lambda I guess
> is there any way I can speed this up

if list2 is a list object, "not in list2" is an O(N) operation.

maybe you should use sets instead ?  does the following work
better ?

set2 = set(list2)
result = [x for x in list1 if x not in set2]

?





-- 
http://mail.python.org/mailman/listinfo/python-list


Re: filter list fast

2006-03-18 Thread Klaus Alexander Seistrup
Lars Woetmann wrote:

> I have a list I filter using another list and I would like 
> this to be as fast as possible
> right now I do like this:
>
> [x for x in list1 if x not in list2]
>
> i tried using the method filter:
>
> filter(lambda x: x not in list2, list1)
>
> but it didn't make much difference, because of lambda I guess
> is there any way I can speed this up

If you use a reasonably new python version, you could use sets:

#v+

>>> a = set(range(10))
>>> a
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = set(range(5, 15))
>>> b
set([5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
>>> a.difference(b)
set([0, 1, 2, 3, 4])
>>> a-b
set([0, 1, 2, 3, 4])
>>> list(a-b)
[0, 1, 2, 3, 4]
>>> 

#v-

Cheers, 

-- 
Klaus Alexander Seistrup
SubZeroNet, Copenhagen, Denmark
http://magnetic-ink.dk/
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: filter list fast

2006-03-18 Thread Diez B. Roggisch
> Both of these techniques are O(n^2).  You can reduce it to O(n log n)
> by using sets:
> 
 set2 = set(list2)
 [x for x in list1 if x not in set2]
> 
> Checking to see if an item is in a set is much more efficient than a
> list.

Is the set-lookup reliably O(log n)? I was under the impression that it is
hash-based, and this should be O(1) usually, but couldbve O(n) worst-case
(hash the same for _all_ entries).

Regards,

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: filter list fast

2006-03-18 Thread Peter Hansen
Diez B. Roggisch wrote:
>>Both of these techniques are O(n^2).  You can reduce it to O(n log n)
>>by using sets:
>>
>set2 = set(list2)
>[x for x in list1 if x not in set2]
>>
>>Checking to see if an item is in a set is much more efficient than a
>>list.
> 
> Is the set-lookup reliably O(log n)? I was under the impression that it is
> hash-based, and this should be O(1) usually, but couldbve O(n) worst-case
> (hash the same for _all_ entries).

That's largely a theoretical concern.  Google for something like

   '''dict worst-case performance "tim peters"'''

to learn more.  (The third article there (no doubt obsolete in some 
ways, given that it was in 2000) says that Python "keeps at least 1/3 of 
the internal hash table entries unused, making collisions very rarely a 
problem...  It's possible to contrive keys that will cause collisions 
systematically ... but unlikely to happen by accident in 2.0")

-Peter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: filter list fast

2006-03-18 Thread lars_woetmann
Thanks all, sets work like i charm

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: filter list fast

2006-03-18 Thread Paddy
What was the speed-up ?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: filter list fast

2006-03-18 Thread lars_woetmann
comparing
[x for x in list1 if x not in list2]
with
set1, set2 = set(list1), set(list2)
list(set1-set2)

gives something like
len(list2) speedup
--
10010
1000  100
11000

the speedup is constant for different len(list1)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: filter list fast

2006-03-18 Thread Paddy
Thanks.

-- 
http://mail.python.org/mailman/listinfo/python-list