Re: unique-ifying a list
On Aug 7, 4:53 pm, kj wrote:
> Suppose that x is some list. To produce a version of the list with
> duplicate elements removed one could, I suppose, do this:
>
> x = list(set(x))
>
> but I expect that this will not preserve the original order of
> elements.
>
> I suppose that I could write something like
>
> def uniquify(items):
>     seen = set()
>     ret = []
>     for i in items:
>         if not i in seen:
>             ret.append(i)
>             seen.add(i)
>     return ret
>
> But this seems to me like such a commonly needed operation that I
> find it hard to believe one would need to resort to such self-rolled
> solutions. Isn't there some more standard (and hopefully more
> efficient, as in "C-coded"/built-in) approach?
>
> TIA!
>
> kynn

Unique items in a list, in the same order as in the list:

x = sorted(set(x), key=x.index)

;]
--
http://mail.python.org/mailman/listinfo/python-list
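[Editor's note: a quick demonstration of the one-liner above. Note that each `x.index` call is a linear scan, so while this is compact, it is O(n^2) overall rather than linear like the hand-rolled version.]

```python
# Order-preserving de-duplication by sorting the unique elements
# by the index of their first occurrence in the original list.
x = ["b", "a", "b", "c", "a", "d"]

# set(x) discards duplicates; key=x.index sorts them back into
# first-appearance order.  x.index is a linear scan, so this is
# O(n^2) for a list of length n.
result = sorted(set(x), key=x.index)
print(result)  # ['b', 'a', 'c', 'd']
```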
Re: unique-ifying a list
Dennis Lee Bieber writes:
> Why bother with seen?

The version with seen runs in linear time because of the O(1) set
lookup. Your version runs in quadratic time.
--
http://mail.python.org/mailman/listinfo/python-list
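[Editor's note: the linear-vs-quadratic claim can be sketched with `timeit`. The variant without `seen` is not quoted here, so the sketch assumes it tested membership against the result list itself, which is the obvious way to drop the set.]

```python
import timeit

def uniquify_set(items):
    # O(n) on average: membership test against a set is O(1)
    seen = set()
    ret = []
    for i in items:
        if i not in seen:
            ret.append(i)
            seen.add(i)
    return ret

def uniquify_list(items):
    # O(n^2): membership test against a list is a linear scan
    ret = []
    for i in items:
        if i not in ret:
            ret.append(i)
    return ret

data = list(range(2000)) * 2
t_set = timeit.timeit(lambda: uniquify_set(data), number=10)
t_list = timeit.timeit(lambda: uniquify_list(data), number=10)
# Both produce the same result; the set version is far faster
# once the number of distinct elements grows.
print(uniquify_set(data) == uniquify_list(data))  # True
```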
Re: unique-ifying a list
On Aug 7, 4:53 pm, kj wrote:
> Suppose that x is some list. To produce a version of the list with
> duplicate elements removed one could, I suppose, do this:
>
> x = list(set(x))
>
> but I expect that this will not preserve the original order of
> elements.

OrderedSet is most likely on the way, but for now Python 3.1 and 2.7
have OrderedDict. For 3.0 and 2.6 the recipe is here:

http://code.activestate.com/recipes/576669

With OrderedDict you can do:

OrderedDict.fromkeys(x).keys()  # Returns a view in 3.x, a list in 2.x.
--
http://mail.python.org/mailman/listinfo/python-list
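[Editor's note: the OrderedDict idiom spelled out as a complete snippet, wrapping the keys in `list()` so it behaves the same under 2.x and 3.x.]

```python
from collections import OrderedDict

x = ["b", "a", "b", "c", "a", "d"]

# fromkeys keeps only the first occurrence of each key, and an
# OrderedDict remembers insertion order, so this de-duplicates
# while preserving the original order of first appearance.
unique = list(OrderedDict.fromkeys(x))
print(unique)  # ['b', 'a', 'c', 'd']
```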
Re: unique-ifying a list
"kj" wrote:
> I suppose that I could write something like
>
> def uniquify(items):
>     seen = set()
>     ret = []
>     for i in items:
>         if not i in seen:
>             ret.append(i)
>             seen.add(i)
>     return ret
>
> But this seems to me like such a commonly needed operation that I
> find it hard to believe one would need to resort to such self-rolled
> solutions. Isn't there some more standard (and hopefully more
> efficient, as in "C-coded"/built-in) approach?

The most "standard" way is a recipe from the itertools docs (I'd give
a link, but python.org is down at the moment):

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in iterable:
            if element not in seen:
                seen_add(element)
                yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

All the recipes mentioned there are pretty handy, so I've made a
module (iterutil.py) out of them.
--
http://mail.python.org/mailman/listinfo/python-list
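[Editor's note: since unique_everseen is a generator, you materialize the result with `list()`. A condensed but equivalent copy of the recipe, exercised on the examples from its docstring:]

```python
def unique_everseen(iterable, key=None):
    # Condensed version of the itertools-docs recipe quoted above:
    # yield elements whose (keyed) value has not been seen before.
    seen = set()
    seen_add = seen.add
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen_add(k)
            yield element

print(list(unique_everseen('AAAABBBCCDAABBB')))     # ['A', 'B', 'C', 'D']
print(list(unique_everseen('ABBCcAD', str.lower)))  # ['A', 'B', 'C', 'D']
```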
Re: unique-ifying a list
On Fri, 07 Aug 2009 17:53:10 -0300, kj wrote:
> Suppose that x is some list. To produce a version of the list with
> duplicate elements removed one could, I suppose, do this:
>
> x = list(set(x))
>
> but I expect that this will not preserve the original order of
> elements.
>
> I suppose that I could write something like
>
> def uniquify(items):
>     seen = set()
>     ret = []
>     for i in items:
>         if not i in seen:
>             ret.append(i)
>             seen.add(i)
>     return ret

Assuming the elements are hashable, yes, that's the fastest way
(minus some microoptimizations, like using local names for ret.append
and seen.add, or using the 'not in' operator). See bearophile's
recipe [1], another one [2] by Tim Peters (quite old, but the comment
section is worth reading), and this thread [3]

[1] http://code.activestate.com/recipes/438599/
[2] http://code.activestate.com/recipes/52560/
[3] http://groups.google.com/group/comp.lang.python/t/40c6c455f4fd5154/

--
Gabriel Genellina
--
http://mail.python.org/mailman/listinfo/python-list
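[Editor's note: the microoptimizations mentioned here, sketched out. Binding the bound methods to local names once, outside the loop, skips two attribute lookups per iteration.]

```python
def uniquify(items):
    # Local names for the bound methods avoid repeated attribute
    # lookups inside the loop (a classic CPython microoptimization).
    seen = set()
    seen_add = seen.add
    ret = []
    ret_append = ret.append
    for i in items:
        if i not in seen:      # 'i not in seen' rather than 'not i in seen'
            ret_append(i)
            seen_add(i)
    return ret

print(uniquify([3, 1, 3, 2, 1]))  # [3, 1, 2]
```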
Re: unique-ifying a list
On Aug 7, 1:53 pm, kj wrote:
> Suppose that x is some list. To produce a version of the list with
> duplicate elements removed one could, I suppose, do this:
>
> x = list(set(x))
>
> but I expect that this will not preserve the original order of
> elements.
>
> I suppose that I could write something like
>
> def uniquify(items):
>     seen = set()
>     ret = []
>     for i in items:
>         if not i in seen:
>             ret.append(i)
>             seen.add(i)
>     return ret
>
> But this seems to me like such a commonly needed operation that I
> find it hard to believe one would need to resort to such self-rolled
> solutions. Isn't there some more standard (and hopefully more
> efficient, as in "C-coded"/built-in) approach?

Honestly, doing unique operations is pretty rare at the application
level. Unless you're writing some kind of database, I don't see why
you'd do it. (My recommendation is not to write databases.)

If you want unique elements, use a set. If you want to order them,
sort a list of the items in the set. If you want to preserve the
original order, then using a dict may be even better:

orig_order = dict(reversed([tuple(reversed(i)) for i in enumerate(items)]))
unique_ordered = sorted(orig_order.keys(), key=lambda k: orig_order[k])

Hints to understanding:

* enumerate generates (index, item) pairs.
* We reverse each pair so that we get an item -> index mapping.
* We reverse the list so that the first pairs appear last. Later
  pairs override earlier ones in dict(), so each item ends up mapped
  to the index of its first occurrence.
--
http://mail.python.org/mailman/listinfo/python-list
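[Editor's note: the dict-based approach above, written out and run on a small example. Tuple unpacking in the comprehension replaces the inner `reversed` call, but the logic is the same.]

```python
items = ["b", "a", "b", "c", "a", "d"]

# Map each item to the index of its FIRST occurrence: feeding the
# enumerated pairs to dict() in reverse order means the earliest
# occurrence is seen last and wins when keys collide.
orig_order = dict(reversed([(item, idx) for idx, item in enumerate(items)]))

# Sorting the unique items by that first-occurrence index restores
# the original order.
unique_ordered = sorted(orig_order, key=lambda k: orig_order[k])
print(unique_ordered)  # ['b', 'a', 'c', 'd']
```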
unique-ifying a list
Suppose that x is some list. To produce a version of the list with
duplicate elements removed one could, I suppose, do this:

x = list(set(x))

but I expect that this will not preserve the original order of
elements.

I suppose that I could write something like

def uniquify(items):
    seen = set()
    ret = []
    for i in items:
        if not i in seen:
            ret.append(i)
            seen.add(i)
    return ret

But this seems to me like such a commonly needed operation that I
find it hard to believe one would need to resort to such self-rolled
solutions. Isn't there some more standard (and hopefully more
efficient, as in "C-coded"/built-in) approach?

TIA!

kynn
--
http://mail.python.org/mailman/listinfo/python-list
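[Editor's note: a quick check of the behavior described in the original post. `list(set(x))` keeps the right elements but in arbitrary order, while the hand-rolled uniquify keeps the order of first appearance.]

```python
def uniquify(items):
    # Keep the first occurrence of each element, using a set
    # for O(1) membership tests.
    seen = set()
    ret = []
    for i in items:
        if i not in seen:
            ret.append(i)
            seen.add(i)
    return ret

x = ["b", "a", "b", "c", "a", "d"]
print(uniquify(x))  # ['b', 'a', 'c', 'd'] -- original order preserved
# list(set(x)) contains the same elements, but in no guaranteed order:
print(sorted(list(set(x))) == sorted(uniquify(x)))  # True
```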