On 21/02/15 19:46, TommyVee wrote: > Start off with sets of elements as follows: > > 1. A,B,E,F > 2. G,H,L,P,Q > 3. C,D,E,F > 4. E,X,Z > 5. L,M,R > 6. O,M,Y > > Note that sets 1, 3 and 4 all have the element 'E' in common, therefore > they are "related" and form the following superset: > > A,B,C,D,E,F,X,Z > > Likewise, sets 2 and 5 have the element 'L' in common, then set 5 and 6 > have element 'M' in common, therefore they form the following superset: > > G,H,L,M,O,P,Q,R,Y > > I think you get the point. As long as sets have at least 1 common > element, they combine to form a superset. Also "links" (common > elements) between sets may go down multiple levels, as described in the > second case above (2->5->6). Cycles thankfully, are not possible. > > BTW, the number of individual sets (and resultant supersets) will be > very large. > > I don't know where to start with this. I thought about some type of > recursive algorithm, but I'm not sure. I could figure out the Python > implementation easy enough, I'm just stumped on the algorithm itself. > > Anybody have an idea? > > Thanks, Tom
Tom, You could construct a graph G such that there exists an edge {v,w} for each pair of items that appear in a common set. Each connected component will contain the nodes of one of the supersets you're looking for. That's unnecessarily expensive, so you can adopt a similar approach using trees. http://en.wikipedia.org/wiki/Disjoint-set_data_structure Construct a forest of trees (initially one tree for each item) and join pairs of trees until you have one tree for each of your supersets. For each set of size n you only need to consider n-1 joins. That will ensure that any pair of items that are in one of the sets are contained in a single tree. The find and union operations are amortized constant, so this should be efficient for your large numbers of sets. The union_find module can be found at https://github.com/DuncanSmith147/KVMS. Cheers. Duncan >>> import union_find >>> sets = (['A','B','E','F'], ['G','H','L','P','Q'], ['C','D','E','F'], ['E','X','Z'], ['L','M','R'], ['O','M','Y']) >>> uf = union_find.UnionFindTree() >>> for a_set in sets: for item in a_set: try: uf.add(item) except ValueError: pass n = a_set[0] for item in a_set[1:]: is_merged = uf.union(n, item) >>> from collections import defaultdict >>> res = defaultdict(list) >>> for item in uf: res[uf.find(item)].append(item) >>> res.values() [['G', 'H', 'M', 'L', 'O', 'Q', 'P', 'R', 'Y'], ['A', 'C', 'B', 'E', 'D', 'F', 'X', 'Z']] >>> -- https://mail.python.org/mailman/listinfo/python-list