Re: dict generator question
Steven D'Aprano:

>I'm sorry, I don't recognise leniter(). Did I miss something?<

I have removed the docstring/doctests:

def leniter(iterator):
    if hasattr(iterator, "__len__"):
        return len(iterator)
    nelements = 0
    for _ in iterator:
        nelements += 1
    return nelements

>it doesn't work for arbitrary iterables, only sequences (lazy or otherwise)<

I don't quite understand this.

>Since you're generating the entire length anyway, len(list(iterable)) is more
>readable and almost as efficient for most practical cases.<

I don't agree: len(list()) creates an actual list, with a lot of GC activity.

>But the expected semantics of __len__ is that it is expected to return an int,
>and do it quickly with minimal effort. Methods that do something else are an
>abuse of __len__ and should be treated as a bug.<

I see. In the past I have read similar positions in discussions about the API of data structures in D, so this may be right, and this fault may be enough to kill my proposal. But I'll keep using leniter().

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
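Below is the same leniter() run under Python 3. The docstring and demo lines are added for illustration and are not bearophile's removed doctests. The demo makes the trade-off in this sub-thread concrete. A sized object is answered by len() without iterating. A generator is counted by consuming it, so a second call on the same generator returns 0.

```python
def leniter(iterable):
    """Return how many items iterable holds or produces.

    Objects with __len__ are answered directly by len(); anything else is
    counted by iterating over it, which consumes one-shot iterators.
    """
    if hasattr(iterable, "__len__"):
        return len(iterable)
    nelements = 0
    for _ in iterable:
        nelements += 1
    return nelements


squares = (n * n for n in range(10))   # a one-shot generator
print(leniter([1, 2, 3]))   # 3, answered by list.__len__
print(leniter(squares))     # 10, counted by iterating
print(leniter(squares))     # 0, the generator is already exhausted
```

Unlike len(list(it)), the counting loop never materialises a list, which is the memory argument bearophile makes above.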
Re: dict generator question
On Mon, 22 Sep 2008 04:21:12 -0700, bearophileHUGS wrote:

> Steven D'Aprano:
>
>>Extending len() to support iterables sounds like a good idea, except
>>that it's not.<
>
> The Python language has lately shifted toward more and more usage of lazy
> iterables (see range being lazy by default, etc.). So they are now quite
> common. So extending len() to make it act like leniter() too is a way to
> adapt a basic Python construct to the changes of the other parts of the
> language.

I'm sorry, I don't recognise leniter(). Did I miss something?

> In languages like Haskell you can count how many items a lazy sequence
> has. But those sequences are generally immutable, so they can be
> accessed many times, so len(iterable) doesn't exhaust them like in
> Python. So in Python it's less useful.

In Python, xrange() is a lazy sequence that isn't exhausted, but that's a special case: it actually has a __len__ method, and presumably the length is calculated from the xrange arguments, not by generating all the items and counting them.

How would you count the number of items in a generic lazy sequence without actually generating the items first?

> This is a common situation where I only care about the len of the g
> group:
>
> [leniter(g) for h,g in groupby(iterable)]
>
> There are other situations where I may be interested only in how many
> items there are:
>
> leniter(ifilter(predicate, iterable))
> leniter(el for el in iterable if predicate(el))
>
> For my usage I have written a version of the itertools module in D (a
> lot of work, but the result is quite useful and flexible, even if I miss
> the generator/iterator syntax a lot), and later I have written a len()
> able to count the length of lazy iterables too (if the given variable
> has a length attribute/property then it returns that value),

I'm not saying that no iterables can accurately predict how many items they will produce. If they can, then len() should support iterables with a __len__ attribute.
But in general there's no way of predicting how many items the iterable will produce without iterating over it, and len() shouldn't do that.

> and I have
> found that it's useful often enough (almost as often as string.xsplit()).
> But in Python there is less need for a len() that counts lazy iterables
> too because you can use the following syntax that isn't bad (and isn't
> available in D):
>
> [sum(1 for x in g) for h,g in groupby(iterable)]
> sum(1 for x in ifilter(predicate, iterable))
> sum(1 for el in iterable if predicate(el))

I think the idiom sum(1 for item in iterable) is, in general, a mistake. For starters, it doesn't work for arbitrary iterables, only sequences (lazy or otherwise) and your choice of variable name may fool people into thinking they can pass a use-once iterator to your code and have it work. Secondly, it's not clear what sum(1 for item in iterable) does without reading over it carefully. Since you're generating the entire length anyway, len(list(iterable)) is more readable and almost as efficient for most practical cases.

As things stand now, list(iterable) is a "dangerous" operation, as it may consume arbitrarily huge resources. But len() isn't[1], because len() doesn't operate on arbitrary iterables. This is a good thing.

> So you and Python designers may choose to not extend the semantics of
> len() for various good reasons, but you will have a hard time convincing
> me it's a useless capability :-)

I didn't say that knowing the length of iterators up front was useless. Sometimes it may be useful, but it is rarely (never?) essential.

[1] len(x) may call x.__len__() which might do anything. But the expected semantics of __len__ is that it is expected to return an int, and do it quickly with minimal effort. Methods that do something else are an abuse of __len__ and should be treated as a bug.

--
Steven
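Steven's xrange remark describes a lazy sequence whose length is computed from its arguments. A toy class can illustrate it. LazyRange is a hypothetical name, the sketch handles positive steps only, and it is written for Python 3, where range has taken over xrange's role. len() is pure arithmetic, so nothing is generated to answer it. Because the object is a sequence rather than an iterator, iterating over it does not exhaust it.

```python
class LazyRange:
    """Toy xrange-like sequence: items are computed on demand, never stored."""

    def __init__(self, start, stop, step=1):
        if step <= 0:
            raise ValueError("this sketch only supports positive steps")
        self.start, self.stop, self.step = start, stop, step

    def __len__(self):
        # Ceiling division: how many of start, start+step, ... are < stop.
        return max(0, (self.stop - self.start + self.step - 1) // self.step)

    def __getitem__(self, index):
        # Non-negative indices only; iter() stops at the first IndexError.
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self.start + index * self.step


r = LazyRange(0, 10, 3)
print(len(r))    # 4, computed without producing any item
print(list(r))   # [0, 3, 6, 9]
print(len(r))    # still 4: iterating a sequence does not use it up
```

A generator such as `(x for x in data if pred(x))` has no such formula, which is Steven's point: its length is only known after running it.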
Re: dict generator question
Steven D'Aprano:

>Extending len() to support iterables sounds like a good idea, except that it's
>not.<

The Python language has lately shifted toward more and more usage of lazy iterables (see range being lazy by default, etc.). So they are now quite common. So extending len() to make it act like leniter() too is a way to adapt a basic Python construct to the changes of the other parts of the language.

In languages like Haskell you can count how many items a lazy sequence has. But those sequences are generally immutable, so they can be accessed many times, so len(iterable) doesn't exhaust them like in Python. So in Python it's less useful.

This is a common situation where I only care about the len of the g group:

[leniter(g) for h,g in groupby(iterable)]

There are other situations where I may be interested only in how many items there are:

leniter(ifilter(predicate, iterable))
leniter(el for el in iterable if predicate(el))

For my usage I have written a version of the itertools module in D (a lot of work, but the result is quite useful and flexible, even if I miss the generator/iterator syntax a lot), and later I have written a len() able to count the length of lazy iterables too (if the given variable has a length attribute/property then it returns that value), and I have found that it's useful often enough (almost as often as string.xsplit()). But in Python there is less need for a len() that counts lazy iterables too because you can use the following syntax that isn't bad (and isn't available in D):

[sum(1 for x in g) for h,g in groupby(iterable)]
sum(1 for x in ifilter(predicate, iterable))
sum(1 for el in iterable if predicate(el))

So you and Python designers may choose to not extend the semantics of len() for various good reasons, but you will have a hard time convincing me it's a useless capability :-)

Bye,
bearophile
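Here are bearophile's three sum(1 for ...) idioms written for Python 3. In Python 3, itertools.ifilter became the built-in filter(). The sample data and the is_even predicate are made up for illustration. The last line is a common variant that relies on booleans being ints.

```python
from itertools import groupby

def is_even(n):
    return n % 2 == 0

data = [1, 1, 2, 2, 2, 3, 1]

# Length of each run of equal adjacent items (groupby only merges *adjacent* keys).
run_lengths = [sum(1 for _ in g) for _, g in groupby(data)]
print(run_lengths)                                  # [2, 3, 1, 1]

# Number of items satisfying a predicate; filter() is Python 3's lazy ifilter().
print(sum(1 for _ in filter(is_even, data)))        # 3
print(sum(1 for el in data if is_even(el)))         # 3
# True counts as 1 and False as 0, so the predicate results can be summed directly.
print(sum(is_even(el) for el in data))              # 3
```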
Re: dict generator question
On Fri, Sep 19, 2008 at 9:51 PM, Steven D'Aprano <[EMAIL PROTECTED]> wrote:
> Extending len() to support iterables sounds like a good idea, except that
> it's not.
>
> Here are two iterables:
>
>
> def yes():  # like the Unix yes command
>     while True:
>         yield "y"
>
> def rand(total):
>     "Return random numbers up to a given total."
>     from random import random
>     tot = 0.0
>     while tot < total:
>         x = random()
>         yield x
>         tot += x
>
>
> What should len(yes()) and len(rand(100)) return?

Clearly, len(yes()) would never return, and len(rand(100)) would return a random integer not less than 101.

-Miles
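Miles's answer can be checked directly in Python 3. yes() is infinite, so any count of it has to be capped first, here with itertools.islice. random() returns values in [0.0, 1.0), so k values always sum to less than k. The running total can therefore only reach 100 after more than 100 values, that is, at least 101. The exact count changes from run to run.

```python
from itertools import islice
from random import random

def yes():  # like the Unix yes command
    while True:
        yield "y"

def rand(total):
    "Return random numbers up to a given total."
    tot = 0.0
    while tot < total:
        x = random()
        yield x
        tot += x

# Counting an infinite generator only terminates if you cap it first.
print(sum(1 for _ in islice(yes(), 1000)))   # 1000

# Each value is < 1.0, so reaching a total of 100 needs at least 101 of them.
n = sum(1 for _ in rand(100))
print(n >= 101)                               # True
```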
Re: dict generator question
MRAB:

> except that it could be misleading when:
> len(file(path))
> returns the number of lines and /not/ the length in bytes as you might
> first think! :-)

Well, file(...) returns an iterable of lines, so its len is the number of lines :-) I think I can always remember this fact.

> Anyway, here's another possible implementation using bags (multisets):

This function looks safer/faster:

def major_version(version_string):
    "convert '1.2.3.2' to '1.2'"
    return '.'.join(version_string.strip().split('.', 2)[:2])

Another version:

import re
from collections import defaultdict

patt = re.compile(r"^(\d+\.\d+)")
dict_of_counts = defaultdict(int)
for ver in versions:
    dict_of_counts[patt.match(ver).group(1)] += 1
print dict_of_counts

Bye,
bearophile
Re: dict generator question
On Fri, 19 Sep 2008 17:00:56 -0700, MRAB wrote:

> Extending len() to support iterables sounds like a good idea, except
> that it could be misleading when:
>
>     len(file(path))
>
> returns the number of lines and /not/ the length in bytes as you might
> first think!

Extending len() to support iterables sounds like a good idea, except that it's not.

Here are two iterables:

def yes():  # like the Unix yes command
    while True:
        yield "y"

def rand(total):
    "Return random numbers up to a given total."
    from random import random
    tot = 0.0
    while tot < total:
        x = random()
        yield x
        tot += x

What should len(yes()) and len(rand(100)) return?

--
Steven
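For contrast, here is what Python 3 actually does with Steven's yes(). len() rejects the generator at once with a TypeError instead of trying to count it. In CPython the message reads roughly "object of type 'generator' has no len()", though the exact wording is an implementation detail.

```python
def yes():  # like the Unix yes command
    while True:
        yield "y"

try:
    len(yes())
except TypeError as exc:
    # Generators define no __len__, so len() fails at once instead of looping forever.
    print(exc)
```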
Re: dict generator question
On Sep 19, 2:01 pm, [EMAIL PROTECTED] wrote:
> Gerard flanagan:
>
> > data.sort()
> > datadict = \
> > dict((k, len(list(g))) for k,g in groupby(data, lambda s:
> >     '.'.join(s.split('.',2)[:2])))
>
> That code may run correctly, but it's quite unreadable, while good
> Python programmers value high readability. So the right thing to do is
> to split that line into parts, giving meaningful names, and maybe even
> add comments.
>
> len(list(g)) looks like a good job for my little leniter() function
> (or better just an extension to the semantics of len) that some time
> ago some people here judged useless, while I use it often in both
> Python and D ;-)

Extending len() to support iterables sounds like a good idea, except that it could be misleading when:

    len(file(path))

returns the number of lines and /not/ the length in bytes as you might first think! :-)

Anyway, here's another possible implementation using bags (multisets):

def major_version(version_string):
    "convert '1.2.3.2' to '1.2'"
    return '.'.join(version_string.split('.')[:2])

versions = ["1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]
bag_of_versions = bag(major_version(x) for x in versions)
dict_of_counts = dict(bag_of_versions.items())

Here's my implementation of the bag class in Python (sorry about the length):

class bag(object):
    def __init__(self, iterable=None):
        self._counts = {}
        if isinstance(iterable, dict):
            for x, n in iterable.items():
                if not isinstance(n, int):
                    raise TypeError()
                if n < 0:
                    raise ValueError()
                self._counts[x] = n
        elif iterable:
            for x in iterable:
                try:
                    self._counts[x] += 1
                except KeyError:
                    self._counts[x] = 1

    def __and__(self, other):
        new_counts = {}
        for x, n in other._counts.items():
            try:
                new_counts[x] = min(self._counts[x], n)
            except KeyError:
                pass
        result = bag()
        result._counts = new_counts
        return result

    def __iand__(self, other):
        new_counts = {}
        for x, n in other._counts.items():
            try:
                new_counts[x] = min(self._counts[x], n)
            except KeyError:
                pass
        self._counts = new_counts
        return self  # in-place operators must return the updated object

    def __or__(self, other):
        new_counts = self._counts.copy()
        for x, n in other._counts.items():
            try:
                new_counts[x] = max(new_counts[x], n)
            except KeyError:
                new_counts[x] = n
        result = bag()
        result._counts = new_counts
        return result

    def __ior__(self, other):
        for x, n in other._counts.items():
            try:
                self._counts[x] = max(self._counts[x], n)
            except KeyError:
                self._counts[x] = n
        return self

    def __len__(self):
        return sum(self._counts.values())

    def __list__(self):
        result = []
        for x, n in self._counts.items():
            result.extend([x] * n)
        return result

    def __repr__(self):
        return "bag([%s])" % ", ".join(", ".join([repr(x)] * n)
                                       for x, n in self._counts.items())

    def __iter__(self):
        for x, n in self._counts.items():
            for i in range(n):
                yield x

    def keys(self):
        return self._counts.keys()

    def values(self):
        return self._counts.values()

    def items(self):
        return self._counts.items()

    def __contains__(self, x):
        return x in self._counts

    def add(self, x):
        try:
            self._counts[x] += 1
        except KeyError:
            self._counts[x] = 1

    def __add__(self, other):
        new_counts = self._counts.copy()
        for x, n in other.items():
            try:
                new_counts[x] += n
            except KeyError:
                new_counts[x] = n
        result = bag()
        result._counts = new_counts
        return result

    def __sub__(self, other):
        new_counts = self._counts.copy()
        for x, n in other.items():
            try:
                new_counts[x] -= n
                if new_counts[x] < 1:
                    del new_counts[x]
            except KeyError:
                pass
        result = bag()
        result._counts = new_counts
        return result

    def __iadd__(self, other):
        for x, n in other.items():
            try:
                self._counts[x] += n
            except KeyError:
                self._counts[x] = n
        return self

    def __isub__(self, other):
        for x, n in other.items():
            try:
                self._counts[x] -= n
                if self._counts[x] < 1:
                    del self._counts[x]
            except KeyError:
                pass
        return self

    def clear(
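Since Python 2.7, which postdates this thread, the standard library's collections.Counter covers most of what this bag class implements. The sketch below substitutes that tool and is not MRAB's code. It shows the operator semantics that line up with the methods above: & keeps the minimum count, | the maximum, + adds counts, and - subtracts and drops anything that falls below one. The sample strings are arbitrary.

```python
from collections import Counter

a = Counter("aabbbc")   # a: 2, b: 3, c: 1
b = Counter("abbd")     # a: 1, b: 2, d: 1

print(a & b)   # minimum of each count, like bag.__and__
print(a | b)   # maximum of each count, like bag.__or__
print(a + b)   # counts added, like bag.__add__
print(a - b)   # counts subtracted, dropping any below 1, like bag.__sub__

# bag.__len__ summed the counts; elements() repeats each item count times.
print(sum(a.values()), sorted(a.elements()))
```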
Re: dict generator question
Gerard flanagan:

> data.sort()
> datadict = \
> dict((k, len(list(g))) for k,g in groupby(data, lambda s:
>     '.'.join(s.split('.',2)[:2])))

That code may run correctly, but it's quite unreadable, while good Python programmers value high readability. So the right thing to do is to split that line into parts, giving meaningful names, and maybe even add comments.

len(list(g)) looks like a good job for my little leniter() function (or better just an extension to the semantics of len) that some time ago some people here judged useless, while I use it often in both Python and D ;-)

Bye,
bearophile
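Following bearophile's advice, here is Gerard's one-liner split into named steps, written for Python 3. The function names and sample data are illustrative and do not come from the thread. Sorting by the grouping key itself is enough for groupby(), whatever order the key strings happen to sort in.

```python
from itertools import groupby

def major_version(version_string):
    """Return the first two components: '1.2.3.2' -> '1.2'."""
    return '.'.join(version_string.split('.', 2)[:2])

def count_major_versions(versions):
    """Map each major version to how many entries of versions share it."""
    # groupby() only merges adjacent items, so sort by the grouping key first.
    ordered = sorted(versions, key=major_version)
    counts = {}
    for major, group in groupby(ordered, key=major_version):
        counts[major] = sum(1 for _ in group)
    return counts

data = ["1.2.2.2", "1.3.1.2", "1.1.1.1", "1.2.2.3", "1.10.0.1", "1.3.4.5"]
print(count_major_versions(data))
# {'1.1': 1, '1.10': 1, '1.2': 2, '1.3': 2}
```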
Re: dict generator question
Boris Borcic wrote:
> Gerard flanagan wrote:
>> George Sakkis wrote:
>>> ..
>>> Note that this works correctly only if the versions are already
>>> sorted by major version.
>>
>> Yes, I should have mentioned it. Here's a fuller example below. There
>> may be better ways of sorting version numbers, but this is what I do.
>
> Indeed, your sort takes George's objection too literally. What's needed
> for a correct end result is only that major versions be grouped
> together, and this is most simply obtained by sorting the input data in
> (default) string order, is it not?

Yes, I see what you mean - the fact that a default sort orders "1.10" before "1.9" doesn't actually matter for the required result.

> datadict = \
> dict((k, len(list(g))) for k,g in groupby(data, lambda s: s[:3]))

And, s[:3] is wrong. So:

data.sort()
datadict = \
    dict((k, len(list(g))) for k,g in groupby(data, lambda s:
        '.'.join(s.split('.',2)[:2])))

should work, I hope.

Cheers,

Gerard
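A small made-up list checks both Boris's point and Gerard's correction in Python 3. In plain string order "1.10" sorts before "1.9", yet each major version still forms one contiguous run, because strings sharing a prefix are always adjacent in lexicographic order. The three-character slice, by contrast, lumps "1.1" and "1.10" under one wrong key.

```python
from itertools import groupby

def major_version(s):
    return '.'.join(s.split('.', 2)[:2])

data = ["1.9.0.1", "1.10.2.3", "1.1.5.5", "1.10.0.0"]
data.sort()    # plain string order: "1.10..." lands before "1.9..."
print(data)    # ['1.1.5.5', '1.10.0.0', '1.10.2.3', '1.9.0.1']

# Each major version is still one contiguous run, which is all groupby() needs.
by_major = {k: sum(1 for _ in g) for k, g in groupby(data, major_version)}
print(by_major)   # {'1.1': 1, '1.10': 2, '1.9': 1}

# A three-character slice merges "1.1" and "1.10" into one wrong key.
by_slice = {k: sum(1 for _ in g) for k, g in groupby(data, lambda s: s[:3])}
print(by_slice)   # {'1.1': 3, '1.9': 1}
```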
Re: dict generator question
Gerard flanagan wrote:
> George Sakkis wrote:
>> ..
>> Note that this works correctly only if the versions are already sorted
>> by major version.
>
> Yes, I should have mentioned it. Here's a fuller example below. There
> may be better ways of sorting version numbers, but this is what I do.

Indeed, your sort takes George's objection too literally. What's needed for a correct end result is only that major versions be grouped together, and this is most simply obtained by sorting the input data in (default) string order, is it not?

> data = [ "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.1.1.1", "1.3.14.5", "1.3.21.6" ]
>
> from itertools import groupby
> import re
>
> RXBUILDSORT = re.compile(r'\d+|[a-zA-Z]')
>
> def versionsort(s):
>     key = []
>     for part in RXBUILDSORT.findall(s.lower()):
>         try:
>             key.append(int(part))
>         except ValueError:
>             key.append(ord(part))
>     return tuple(key)
>
> data.sort(key=versionsort)
> print data
>
> datadict = \
>     dict((k, len(list(g))) for k,g in groupby(data, lambda s: s[:3]))
> print datadict
Re: dict generator question
"Simon Mullis" <[EMAIL PROTECTED]> writes:
> l = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]
> ...
> dict_of_counts = dict([(v[0:3], "count") for v in l])

Untested:

from collections import defaultdict

def major_version(version_string):
    "convert '1.2.3.2' to '1.2'"
    return '.'.join(version_string.split('.')[:2])

dict_of_counts = defaultdict(int)
for x in l:
    dict_of_counts[major_version(x)] += 1
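Simon asked for a single expression. On Python 2.7 and later, which postdate this thread, collections.Counter comes close to that. It is a swapped-in standard-library tool, not what the posters used. Counter is a dict subclass whose missing keys count as 0, so the loop collapses into one generator expression over the same major_version() helper.

```python
from collections import Counter

def major_version(version_string):
    "convert '1.2.3.2' to '1.2'"
    return '.'.join(version_string.split('.')[:2])

l = ["1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]

dict_of_counts = Counter(major_version(v) for v in l)
print(dict(dict_of_counts))   # {'1.1': 1, '1.2': 2, '1.3': 2}
print(dict_of_counts["9.9"])  # 0: missing keys read as zero without being inserted
```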
Re: dict generator question
George Sakkis wrote:
> On Sep 18, 11:43 am, Gerard flanagan <[EMAIL PROTECTED]> wrote:
>> Simon Mullis wrote:
>>> Hi,
>>>
>>> Let's say I have an arbitrary list of minor software versions of an
>>> imaginary software product:
>>>
>>> l = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]
>>>
>>> I'd like to create a dict with major_version : count.
>>>
>>> (So, in this case:
>>>
>>> dict_of_counts = { "1.1" : "1",
>>>                    "1.2" : "2",
>>>                    "1.3" : "2" }
>>>
>> [...]
>> data = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]
>>
>> from itertools import groupby
>>
>> datadict = \
>>     dict((k, len(list(g))) for k,g in groupby(data, lambda s: s[:3]))
>> print datadict
>
> Note that this works correctly only if the versions are already sorted
> by major version.

Yes, I should have mentioned it. Here's a fuller example below. There may be better ways of sorting version numbers, but this is what I do.

data = [ "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.1.1.1", "1.3.14.5", "1.3.21.6" ]

from itertools import groupby
import re

RXBUILDSORT = re.compile(r'\d+|[a-zA-Z]')

def versionsort(s):
    key = []
    for part in RXBUILDSORT.findall(s.lower()):
        try:
            key.append(int(part))
        except ValueError:
            key.append(ord(part))
    return tuple(key)

data.sort(key=versionsort)
print data

datadict = \
    dict((k, len(list(g))) for k,g in groupby(data, lambda s: s[:3]))
print datadict
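Gerard's versionsort key can be tried on a few made-up version strings in Python 3 to show what it buys over the default sort. Numeric runs compare as integers, so "1.9.10" follows "1.9.2" and "1.10.0" comes last. Plain string order puts "1.10.0" first and "1.9.10" before "1.9.2".

```python
import re

RXBUILDSORT = re.compile(r'\d+|[a-zA-Z]')

def versionsort(s):
    """Gerard's key: numeric runs compare as ints, letters by code point."""
    key = []
    for part in RXBUILDSORT.findall(s.lower()):
        try:
            key.append(int(part))
        except ValueError:
            key.append(ord(part))
    return tuple(key)

versions = ["1.10.0", "1.9.2", "1.9.10", "1.2b"]
print(sorted(versions))                   # ['1.10.0', '1.2b', '1.9.10', '1.9.2']
print(sorted(versions, key=versionsort))  # ['1.2b', '1.9.2', '1.9.10', '1.10.0']
```

As Boris notes elsewhere in the thread, this ordering is not required just to count major versions. It matters only when the output should be in true version order.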
Re: dict generator question
Haha! Thanks for all of the suggestions... (I love this list!)

SM

2008/9/18 <[EMAIL PROTECTED]>:
> On Sep 18, 10:54 am, "Simon Mullis" <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> Let's say I have an arbitrary list of minor software versions of an
>> imaginary software product:
>>
>> l = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]
>>
>> I'd like to create a dict with major_version : count.
>>
>> (So, in this case:
>>
>> dict_of_counts = { "1.1" : "1",
>>                    "1.2" : "2",
>>                    "1.3" : "2" }
>>
>> Something like:
>>
>> dict_of_counts = dict([(v[0:3], "count") for v in l])
>>
>> I can't seem to figure out how to get "count", as I cannot do x += 1
>> or x++ as x may or may not yet exist, and I haven't found a way to
>> create default values.
>>
>> I'm most probably not thinking pythonically enough... (I know I could
>> do this pretty easily with a couple more lines, but I'd like to
>> understand if there's a way to use a dict generator for this).
>>
>> Thanks in advance
>>
>> SM
>>
>> --
>> Simon Mullis
>
> Considering 3 identical "simultpost" solutions I'd say:
> "one obvious way to do it" FTW :-)

--
Simon Mullis
_
[EMAIL PROTECTED]
Re: dict generator question
On Sep 18, 11:43 am, Gerard flanagan <[EMAIL PROTECTED]> wrote:
> Simon Mullis wrote:
> > Hi,
>
> > Let's say I have an arbitrary list of minor software versions of an
> > imaginary software product:
>
> > l = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]
>
> > I'd like to create a dict with major_version : count.
>
> > (So, in this case:
>
> > dict_of_counts = { "1.1" : "1",
> >                    "1.2" : "2",
> >                    "1.3" : "2" }
>
> [...]
> data = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]
>
> from itertools import groupby
>
> datadict = \
>     dict((k, len(list(g))) for k,g in groupby(data, lambda s: s[:3]))
> print datadict

Note that this works correctly only if the versions are already sorted by major version.

George
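George's caveat is easy to reproduce in Python 3 with made-up data. groupby() starts a new group whenever the key changes between neighbours, so unsorted input produces two separate "1.2" groups. dict() then silently keeps only the last of them.

```python
from itertools import groupby

data = ["1.2.2.2", "1.3.1.2", "1.2.2.3"]   # the two "1.2" entries are not adjacent

# groupby() starts a new group whenever the key changes between neighbours.
pairs = [(k, sum(1 for _ in g)) for k, g in groupby(data, lambda s: s[:3])]
print(pairs)        # [('1.2', 1), ('1.3', 1), ('1.2', 1)]
print(dict(pairs))  # {'1.2': 1, '1.3': 1}: the later '1.2' entry overwrote the first

# Sorting first makes equal keys adjacent, giving the intended counts.
fixed = dict((k, sum(1 for _ in g)) for k, g in groupby(sorted(data), lambda s: s[:3]))
print(fixed)        # {'1.2': 2, '1.3': 1}
```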
Re: dict generator question
On Sep 18, 10:54 am, "Simon Mullis" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Let's say I have an arbitrary list of minor software versions of an
> imaginary software product:
>
> l = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]
>
> I'd like to create a dict with major_version : count.
>
> (So, in this case:
>
> dict_of_counts = { "1.1" : "1",
>                    "1.2" : "2",
>                    "1.3" : "2" }
>
> Something like:
>
> dict_of_counts = dict([(v[0:3], "count") for v in l])
>
> I can't seem to figure out how to get "count", as I cannot do x += 1
> or x++ as x may or may not yet exist, and I haven't found a way to
> create default values.
>
> I'm most probably not thinking pythonically enough... (I know I could
> do this pretty easily with a couple more lines, but I'd like to
> understand if there's a way to use a dict generator for this).
>
> Thanks in advance
>
> SM
>
> --
> Simon Mullis

Considering 3 identical "simultpost" solutions I'd say:
"one obvious way to do it" FTW :-)
Re: dict generator question
Simon Mullis wrote:
> Hi,
>
> Let's say I have an arbitrary list of minor software versions of an
> imaginary software product:
>
> l = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]
>
> I'd like to create a dict with major_version : count.
>
> (So, in this case:
>
> dict_of_counts = { "1.1" : "1",
>                    "1.2" : "2",
>                    "1.3" : "2" }
>
[...]

data = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]

from itertools import groupby

datadict = \
    dict((k, len(list(g))) for k,g in groupby(data, lambda s: s[:3]))
print datadict
Re: dict generator question
On Sep 18, 10:54 am, "Simon Mullis" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Let's say I have an arbitrary list of minor software versions of an
> imaginary software product:
>
> l = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]
>
> I'd like to create a dict with major_version : count.
>
> (So, in this case:
>
> dict_of_counts = { "1.1" : "1",
>                    "1.2" : "2",
>                    "1.3" : "2" }
>
> Something like:
>
> dict_of_counts = dict([(v[0:3], "count") for v in l])
>
> I can't seem to figure out how to get "count", as I cannot do x += 1
> or x++ as x may or may not yet exist, and I haven't found a way to
> create default values.
>
> I'm most probably not thinking pythonically enough... (I know I could
> do this pretty easily with a couple more lines, but I'd like to
> understand if there's a way to use a dict generator for this).

Not everything has to be a one-liner; also v[0:3] is wrong if the major or minor number has more than one digit. Here's a standard idiom (in 2.5+ at least):

from collections import defaultdict

versions = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]
major2count = defaultdict(int)
for v in versions:
    major2count['.'.join(v.split('.',2)[:2])] += 1
print major2count

HTH,
George
Re: dict generator question
On Sep 18, 10:54 am, "Simon Mullis" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Let's say I have an arbitrary list of minor software versions of an
> imaginary software product:
>
> l = [ "1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]
>
> I'd like to create a dict with major_version : count.
>
> (So, in this case:
>
> dict_of_counts = { "1.1" : "1",
>                    "1.2" : "2",
>                    "1.3" : "2" }
>
> Something like:
>
> dict_of_counts = dict([(v[0:3], "count") for v in l])
>
> I can't seem to figure out how to get "count", as I cannot do x += 1
> or x++ as x may or may not yet exist, and I haven't found a way to
> create default values.
>
> I'm most probably not thinking pythonically enough... (I know I could
> do this pretty easily with a couple more lines, but I'd like to
> understand if there's a way to use a dict generator for this).
>
> Thanks in advance
>
> SM
>
> --
> Simon Mullis

3 lines:

from collections import defaultdict
dd = defaultdict(int)
for x in l: dd[x[0:3]] += 1
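Simon's underlying obstacle was supplying a default for a key that does not exist yet. Besides defaultdict (Python 2.5+), a plain dict can do this with dict.get(key, default), which returns the default without inserting anything. The sketch is written for Python 3. The split-based key replaces x[0:3] to avoid the multi-digit problem George points out.

```python
l = ["1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]

# dict.get() supplies 0 for a missing key, so no special "first time" case is needed.
counts = {}
for x in l:
    major = '.'.join(x.split('.', 2)[:2])
    counts[major] = counts.get(major, 0) + 1
print(counts)   # {'1.1': 1, '1.2': 2, '1.3': 2}
```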
Re: dict generator question
Simon Mullis wrote:
> Something like:
>
> dict_of_counts = dict([(v[0:3], "count") for v in l])
>
> I can't seem to figure out how to get "count", as I cannot do x += 1
> or x++ as x may or may not yet exist, and I haven't found a way to
> create default values.

It seems to me that the "count" you're looking for is the number of elements from l whose first 3 characters are the same as the v[0:3] thing. So you may try:

>>> dict_of_counts = dict((v[0:3], sum(1 for x in l if x[:3] == v[:3])) for v in l)

But this isn't particularly efficient: it rescans the whole list for every element, so the work grows quadratically with len(l). The 'canonical way' to construct such histograms/frequency counts in Python is probably by using defaultdict:

>>> import collections
>>> dict_of_counts = collections.defaultdict(int)
>>> for x in l:
...     dict_of_counts[x[:3]] += 1

Regards,
Marek
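To make the efficiency remark concrete, this Python 3 sketch counts key-function calls. major() is a hypothetical helper added only for the measurement, and it stands in for the [:3] slice. Both approaches build the same dict. The one-liner evaluates the key 2n² + n times, 55 for these 5 items, while the defaultdict loop evaluates it n times.

```python
from collections import defaultdict

l = ["1.1.1.1", "1.2.2.2", "1.2.2.3", "1.3.1.2", "1.3.4.5"]
key_calls = 0

def major(s):
    """Key function that also counts how often it is called."""
    global key_calls
    key_calls += 1
    return '.'.join(s.split('.', 2)[:2])

# Marek's one-liner shape: every element triggers a full rescan of l.
quadratic = dict((major(v), sum(1 for x in l if major(x) == major(v))) for v in l)
quadratic_calls, key_calls = key_calls, 0

# The defaultdict loop touches each element once; int() supplies the starting 0.
counts = defaultdict(int)
for x in l:
    counts[major(x)] += 1
linear = dict(counts)

print(quadratic == linear)          # True
print(quadratic_calls, key_calls)   # 55 5
```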