James Stroud wrote: > Prateek wrote: > >> I have 3 variable length lists of sets. I need to find the common >> elements in each list (across sets) really really quickly. >> >> Here is some sample code: >> >> # Doesn't make sense to union the sets - we're going to do >> intersections later anyway >> l1 = reduce(operator.add, list(x) for x in l1) >> l2 = reduce(operator.add, list(x) for x in l2) >> l3 = reduce(operator.add, list(x) for x in l3) >> >> # Should I do this in two steps? Maybe by intersecting the two >> shortest lists first? >> s = frozenset(l1) & frozenset(l2) & frozenset(l3) >> >> I'm assuming frozensets are (somehow) quicker than sets because >> they're immutable. >> >> Any code suggestions? Maybe using something in the new fancy-schmancy >> itertools module? >> >> Thanks, >> Prateek >> > > I don't understand why you cast to list. I would propose: > > lists_of_sets = [l1, l2, l3] > > reduce(set.intersection, (reduce(set.union, x) for x in lists_of_sets)) > > Since this is simplest, I'm guessing it should be fastest because I'm > also guessing that set impelmentation is as optimized as list--I think > this would hold true especially for large sets with sparse overlap > between lists. So it might be reasonable consider the content of your > sets and lists and write your code based on the content or on assuming a > particular composition.
Python sets are hashes, like dictionaries, not trees. Intersection is implemented by iterating over the smallest set and trying all its keys on the other set. The Python implementation compares the length of two sets being intersected. This is OK (it's about O(N log N), maybe better). For the above example, it's worth sorting lists_of_sets by the length of the sets, and doing the short ones first. How big are the sets? If they're small, but you have a lot of them, you may be better off with a bit-set representation, then using AND operations for intersection. If they're huge (tens of millions of entries), you might be better off doing sorts and merges on the sets. When you ask questions like this, it's helpful to give some background. We don't know whether this is a homework assignment, or some massive application that's slow and you need to fix it, even if it requires heavy implementation effort. John Nagle -- http://mail.python.org/mailman/listinfo/python-list