On Mon, Jun 13, 2011 at 8:09 AM, Chris Angelico <ros...@gmail.com> wrote:

> On Tue, Jun 14, 2011 at 12:58 AM, Zachary Dziura <zcdzi...@gmail.com>
> wrote:
> > if set(source_headers) == set(target_headers):
> >    similar_headers = len(source_headers)
>
> Since you're making sets already, I'd recommend using set operations -
> same_headers is the (length of the) intersection of those two sets,
> and different_headers is the XOR.
>
> # If you need the lists afterwards, use different variable names
> source_headers = set(source_headers)
> target_headers = set(target_headers)
> similar_headers = len(source_headers & target_headers)
> different_headers = len(source_headers ^ target_headers)
>

This is a beautiful solution, and yet I feel compelled to mention that it
disregards duplicates within a given list.  If you need duplicate
detection/differencing, it's better to sort each list and then use an
algorithm similar to the merge step of mergesort.
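Here is a rough sketch of that sort-then-merge idea. It assumes the headers are mutually comparable strings, and `compare_headers` is just a name made up for illustration. Each list is treated as a multiset, so a header that appears twice in one list and once in the other counts as one match and one difference:

```python
def compare_headers(source_headers, target_headers):
    """Return (similar, different) counts, respecting duplicates.

    Hypothetical helper: sorts both lists, then walks them in step
    the way the merge step of mergesort does.
    """
    a = sorted(source_headers)
    b = sorted(target_headers)
    i = j = 0
    similar = different = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Same header at both positions: one match, advance both.
            similar += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            # a[i] can no longer be matched by anything later in b.
            different += 1
            i += 1
        else:
            # b[j] can no longer be matched by anything later in a.
            different += 1
            j += 1
    # Anything left over in either list has no partner.
    different += (len(a) - i) + (len(b) - j)
    return similar, different
```

With `["id", "name", "name", "date"]` against `["id", "name", "email"]` this gives (2, 3). The set version gives 2 and 2, because the second "name" collapses into the first.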

Using sets as above is O(n) on average, because building the sets and computing & and ^ rely on hashing. The sorting version is usually O(n log n). O(n) is better than O(n log n).

And of course, the sorting-based version assumes that order within each list doesn't matter, just as the set-based one does.
-- 
http://mail.python.org/mailman/listinfo/python-list
