On Sep 21, 7:19 am, "Alf P. Steinbach /Usenet" <alf.p.steinbach +use...@gmail.com> wrote: > * Alf P. Steinbach /Usenet, on 21.09.2010 01:09: > > > > > > > * Astley Le Jasper, on 20.09.2010 23:42: > >> I have a list of tuples that indicate a relationship, ie a is related > >> to b, b is related to c etc etc. What I want to do is cluster these > >> relationships into groups. An item will only be associated with a > >> single cluster. > > >> Before I started, I wondered if there was any particular tool within > >> Python I should be looking at. I don't expect anyone to code this for > >> me, just say ... "you need to look at using x". I was going to use > >> populate a dictionary and > > >> Sorry for being so vague. > > >> Example Data: > >> [(a,b) > >> (a,c) > >> (a,d) > >> (b,c) > >> (b,d) > >> (c,d) > >> (e,f) > >> (e,g) > >> (f,g) > >> (h,i)] > > >> Output (grouping id, item ref) > >> (1,a), > >> (1,b), > >> (1,c), > >> (1,d), > >> (2,e), > >> (2,f), > >> (2,g), > >> (3,h), > >> (3,i) > > > It seems to be the same problem as "equivalence sets". > > > This problem was solved early on because e.g. Fortran compilers had to > > construct > > such sets (equivalence partitions of a set). > > > I though I'd just say "Google it!", because I know there's a standard > > algorithm > > but I can't recall the name. However, googling it I found no mention of that > > algorithm. Not even in the Wikipedia articles on equivalence sets. > > > A number of approaches spring to mind: > > > Approach 1: > > Multi-pass. Originally you assign a distinct set number to each symbol. > > In each pass through the symbols you replace one number with another as > > per one of the equivalence specs. Stop when no replacements are done. > > > Approach 2: > > Merging. For each new equivalence A=B you check whether A has been assigned > > to a set, if not you create a new one, call that S1, and ditto for B, S2. > > Merge S1 and S2, e.g. move all elements of S2 to S1. > > > Approach 3: > > In a first pass convert the data to more explicit form, linking each symbol > > to the symbols it's directly equivalent to. Then in second pass simply drag > > out each equivalence group (recurse on each symbol's list of equivalences). > > > Approaches 1 and 2 seem to be pretty inefficient for a large number of > > symbols, > > but I think approach 1 may be a practical option for a small number of > > symbols. > > Uhm, thinking about it (it must have been my unconscious mind doing the work, > it > just popped into my head), if you first sort each individual equivalence > relation so that you never have e.g. C=A but only A=C and so on, and then sort > the list of equivalences, then it should reduce to walking the list and just > starting a new set whenever a symbol is encountered that isn't yet in a set. > > <code language="Py3"> > class Attributes: pass > > sym = Attributes() > for name in ("a", "b", "c", "d", "e", "f", "g", "h", "i"): > setattr( sym, name, name ) > > eq_specs = [ > (sym.a, sym.d), > (sym.b, sym.a), > (sym.b, sym.c), > (sym.b, sym.d), > (sym.c, sym.d), > (sym.c, sym.a), > (sym.e, sym.f), > (sym.e, sym.g), > (sym.f, sym.g), > (sym.h, sym.i), > ] > > equalities = [] > for eq in eq_specs: > sorted_eq = eq if eq[0] <= eq[1] else (eq[1], eq[0]) > equalities.append( sorted_eq ) > equalities.sort() > > eq_sets = {} > eq_set_ids = {} > current_eq_set_id = 0 > for eq in equalities:
This would make the body of the loop easier to read: for x, y in equalities: > if eq_set_ids.get( eq[0] ) is None: Why not simply: if eq[0] in eq_set_ids: > current_eq_set_id += 1 > eq_set_ids[eq[0]] = current_eq_set_id > eq_sets[current_eq_set_id] = set( eq[0] ) > eq_set_id = eq_set_ids[eq[0]] > eq_set_ids[eq[1]] = eq_set_id > eq_sets[eq_set_id].add( eq[1] ) > > for eq_set_id in eq_sets.keys(): > for sym_name in eq_sets[eq_set_id]: > print( "{}, {}".format( eq_set_id, sym_name ) ) > </code> > > <output> > 1, a > 1, c > 1, b > 1, d > 2, e > 2, g > 2, f > 3, i > 3, h > </output> > > Disclaimer: for me it's pretty late in day/night. > > Cheers & hth., > > - Alf > > -- > blog at <url:http://alfps.wordpress.com> I think this won't work with the following graph: eq_specs = [('a', 'c'), ('b', 'd'), ('c', 'd')] Note that it is already sorted according to your sorting method. I don't have a Python machine to check this but I believe it will output: 1, a 1, c 1, d 2, b The flaw is that you fail to consider that the two vertices in the current pair may already be part of a (partial) connected component. -- Arnaud -- http://mail.python.org/mailman/listinfo/python-list