Florian Brucker <[EMAIL PROTECTED]> wrote: > Given a dictionary, I want to create a clustered version of it, > collecting keys that have the same value: > > >>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3} > >>> cluster(d) > {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']} > > That is, generate a new dict which holds for each value of the old dict > a list of the keys of the old dict that have that very value.
>>> from collections import defaultdict >>> def cluster(d): ... if not hasattr(d, 'iteritems'): ... d = dict((x,y) for x,y in enumerate(d)) ... cluster = defaultdict(list) ... for key, value in d.iteritems(): ... cluster[value].append(key) ... return cluster ... >>> d1 = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3} >>> d2 = ['a','a','b','c','b','d','d'] >>> cluster(d1) defaultdict(<type 'list'>, {1: ['a', 'c', 'd'], 2: ['b', 'e'], 3: ['f']}) >>> cluster(d2) defaultdict(<type 'list'>, {'a': [0, 1], 'c': [3], 'b': [2, 4], 'd': [5, 6]}) defaultdicts act pretty much exactly like dicts, except for the default-value-for-everything behaviour, but if you'd rather have pure dicts, just coerce with a dict(cluster) at the end of the function. My version converts lists to dicts so the same core behaviour will work. At worst it'll step through a sequence twice, but if it's passed a dict it'll only do it once, unlike yours' which steps through each twice no matter what (in order to obtain the unique values you require). A side note: if you're using Python2.4+, you can replace your unique() function with the built-in set(). -- http://mail.python.org/mailman/listinfo/python-list