Re: What is the most efficient way to compare similar contents in two lists?

Dan Stromberg Mon, 13 Jun 2011 11:06:09 -0700

On Mon, Jun 13, 2011 at 8:09 AM, Chris Angelico <[email protected]> wrote:

> On Tue, Jun 14, 2011 at 12:58 AM, Zachary Dziura <[email protected]>
> wrote:
> > if set(source_headers) == set(target_headers):
> >    similar_headers = len(source_headers)
>
> Since you're making sets already, I'd recommend using set operations -
> same_headers is the (length of the) intersection of those two sets,
> and different_headers is the XOR.
>
> # If you need the lists afterwards, use different variable names
> source_headers = set(source_headers)
> target_headers = set(target_headers)
> similar_headers = len(source_headers & target_headers)
> different_headers = len(source_headers ^ target_headers)
>

This is a beautiful solution, and yet I feel compelled to mention that it
disregards duplicates within a given list.  If you need duplicate
detection/differencing, it's better to sort each list and then use an
algorithm similar to the merge step of mergesort.
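
As a rough sketch of that sort-and-merge idea (an illustration only, not
code from the thread; the function name compare_with_duplicates is made up,
and it assumes the items can be ordered against one another):

```python
def compare_with_duplicates(source, target):
    """Return (similar, different) counts, treating each duplicate as its own item."""
    # Sorting copies leaves the caller's lists untouched.
    a = sorted(source)
    b = sorted(target)
    i = j = 0
    similar = different = 0
    # Walk both sorted lists in step, as in the merge step of mergesort.
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Present in both lists: pair the two occurrences off.
            similar += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            # a[i] has no remaining partner in b.
            different += 1
            i += 1
        else:
            # b[j] has no remaining partner in a.
            different += 1
            j += 1
    # Whatever is left over in either list is unmatched.
    different += (len(a) - i) + (len(b) - j)
    return similar, different
```

For example, with ["a", "b", "b", "c"] and ["b", "c", "c", "d"], the set
version reports 2 similar and 2 different headers, while this sketch reports
2 similar and 4 different, because the extra "b" and the extra "c" are
counted as unmatched.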

Using sets as above is O(n) on average, while the sorting version is usually
O(n log n), so the set version scales better if duplicates don't matter.

And of course, the version based on sorting assumes order doesn't matter.

-- 
http://mail.python.org/mailman/listinfo/python-list
