Re: count items in generator
Alex Martelli wrote:
> Cameron Laird wrote:
>> Alex Martelli wrote:
>>> . . .
>>> My preference would be (with the original definition for words_of_file) to code
>>>
>>>     numwords = sum(1 for w in words_of_file(thefilepath))
>>> . . .
>>
>> There are times when
>>
>>     numwords = len(list(words_of_file(thefilepath)))
>>
>> will be advantageous.
>
> Can you please give some examples? None comes readily to mind...
> . . .

Maybe in an alternative universe where Python style emphasizes functional expressions. This thread--or at least the follow-ups to my rather frivolous observation--illustrates how distinct Python's direction is. If we could neglect memory impact, and procedural side-effects, then, sure, I'd argue for my len(list(...)) formulation, on the expressive grounds that it doesn't require the two magic tokens '1' and 'w'.

Does category theory have a term for formulas of the sort that introduce a free variable only to ignore (discard, ...) it? There certainly are times when that's apt ...

-- http://mail.python.org/mailman/listinfo/python-list
Re: count items in generator
Alex Martelli wrote:
> . . .
> I'd be a bit worried about having len(x) change x's state into an unusable one. Yes, it happens in other cases (if y in x:), but adding more such problematic cases doesn't seem advisable to me anyway -- I'd evaluate this proposal as a -0, even taking into account the potential optimizations to be garnered by having some iterables expose __len__ (e.g., a genexp such as (f(x) for x in foo), without an if-clause, might be optimized to delegate __len__ to foo -- again, there may be semantic alterations lurking that make this optimization a bit iffy).
>
> Alex

Quite so. My proposal isn't at all serious; I'm doing this largely for practice in thinking about functionalism and its complement in Python.

However, maybe I should take this one step farther: while I think your caution about attractive nuisance is perfect, what is the precise nuisance here? Is there ever a time when a developer would be tempted to evaluate len() on an iterable even though there's another approach that does NOT impact the iterable's state?

On the other hand, maybe all we're doing is observing that expanding the domain of len() means we give up guarantees on its finiteness, and that's simply not worth doing.
Re: count items in generator
George Sakkis wrote:
> (snip)
> def length(iterable):
>     try: return len(iterable)
>     except:

      except TypeError:

>         i = 0
>         for x in iterable: i += 1
>         return i
> (snip)
Re: count items in generator
Paul Rubin wrote:
> George Sakkis writes:
>> As clunky as it seems, I don't think you can beat it in terms of brevity; if you care about memory efficiency though, here's what I use:
>>
>> def length(iterable):
>>     try: return len(iterable)
>>     except:
>>         i = 0
>>         for x in iterable: i += 1
>>         return i
>
> Alex's example amounted to something like that, for the generator case. Notice that the argument to sum() was a generator expression. The sum function then iterated through it.

True. Changing the except clause here to

    except: return sum(1 for x in iterable)

keeps George's optimization (O(1), not O(N), for containers) and is a bit faster (while still O(N)) for non-container iterables.

Alex
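For readers following along, here is a minimal sketch that puts the two suggestions in this thread together: George's length() helper with Alex's sum() fallback. Catching TypeError rather than using a bare except is an assumption on my part (it is what len() raises for objects without __len__), and the sketch is written for Python 3, not the Python 2 of the original posts.

```python
def length(iterable):
    """Count the items in any iterable (illustrative helper, not a builtin)."""
    try:
        # Containers report their size in O(1) without being iterated.
        return len(iterable)
    except TypeError:
        # No __len__ (for example a generator): count by consuming it, O(N).
        return sum(1 for _ in iterable)


list_count = length([10, 20, 30])                  # uses len() directly
generator_count = length(x for x in range(5))      # counted by iterating
print(list_count, generator_count)
```

Note that the fallback branch exhausts the iterator it is given, so a generator passed to length() cannot be reused afterwards.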
Re: count items in generator
Alex Martelli wrote:
> . . .
> My preference would be (with the original definition for words_of_file) to code
>
>     numwords = sum(1 for w in words_of_file(thefilepath))
> . . .

There are times when

    numwords = len(list(words_of_file(thefilepath)))

will be advantageous.

For that matter, would it be an advantage for len() to operate on iterables? It could be faster and thriftier on memory than either of the above, and my first impression is that it's sufficiently natural not to offend those of us suspicious of language bloat.
Re: count items in generator
Cameron Laird writes:
> For that matter, would it be an advantage for len() to operate on iterables?

    print len(itertools.count())

Ouch!!
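As a hedged illustration of the point above (Python 3 syntax, not the Python 2 of the original post): len() as it stands refuses an endless iterator immediately instead of looping. The use of itertools.islice to take a bounded slice is my own addition for contrast, not something proposed in the thread.

```python
import itertools

endless = itertools.count()

try:
    len(endless)              # len() does not iterate, so it raises at once
    len_supported = True
except TypeError:
    len_supported = False

# A bounded slice is safe to materialize even though the source never ends.
first_five = list(itertools.islice(endless, 5))
print(len_supported, first_five)
```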
Re: count items in generator
> True. Changing the except clause here to
>
>     except: return sum(1 for x in iterable)
>
> keeps George's optimization (O(1), not O(N), for containers) and is a bit faster (while still O(N)) for non-container iterables.

Everything was going just great. Now I have to think again. Thank you all.

rick
Re: count items in generator
Paul Rubin wrote:
> Cameron Laird writes:
>> For that matter, would it be an advantage for len() to operate on iterables?
>
>     print len(itertools.count())
>
> Ouch!!

How is this worse than list(itertools.count())?
Re: count items in generator
Cameron Laird wrote:
> Alex Martelli wrote:
>> . . .
>> My preference would be (with the original definition for words_of_file) to code
>>
>>     numwords = sum(1 for w in words_of_file(thefilepath))
>> . . .
>
> There are times when
>
>     numwords = len(list(words_of_file(thefilepath)))
>
> will be advantageous.

Can you please give some examples? None comes readily to mind...

> For that matter, would it be an advantage for len() to operate on iterables? It could be faster and thriftier on memory than either of the above, and my first impression is that it's sufficiently natural not to offend those of us suspicious of language bloat.

I'd be a bit worried about having len(x) change x's state into an unusable one. Yes, it happens in other cases (if y in x:), but adding more such problematic cases doesn't seem advisable to me anyway -- I'd evaluate this proposal as a -0, even taking into account the potential optimizations to be garnered by having some iterables expose __len__ (e.g., a genexp such as (f(x) for x in foo), without an if-clause, might be optimized to delegate __len__ to foo -- again, there may be semantic alterations lurking that make this optimization a bit iffy).

Alex
Re: count items in generator
George Sakkis wrote:
> Paul Rubin wrote:
>> Cameron Laird writes:
>>> For that matter, would it be an advantage for len() to operate on iterables?
>>
>>     print len(itertools.count())
>>
>> Ouch!!
>
> How is this worse than list(itertools.count())?

It's a slightly worse trap because list(x) ALWAYS iterates on x (just like for y in x:), while len(x) MAY OR MAY NOT iterate on x (under Cameron's proposal; it currently never does).

Yes, there are other subtle traps of this ilk already in Python, such as if y in x: -- this, too, may or may not iterate. But the fact that a potential problem exists in some corner cases need not be a good reason to extend the problem to higher frequency ;-).

Alex
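The "if y in x:" trap mentioned above can be seen in a short sketch. The generator function numbers() is a hypothetical example of mine, not code from the thread; the behavior shown is standard for membership tests on iterators.

```python
def numbers():
    # Hypothetical generator used only for this illustration.
    for n in range(5):
        yield n


gen = numbers()
found = 2 in gen           # iterates until 2 is found, consuming 0, 1 and 2
remaining = list(gen)      # only the items after the match are left

items = [0, 1, 2, 3, 4]
also_found = 2 in items    # a list is searched but not consumed
print(found, remaining, also_found, len(items))
```

The same expression, y in x, leaves a list untouched but silently advances a generator, which is the kind of state change being warned about for a len() that iterates.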
RE: count items in generator
George Sakkis wrote:
> Paul Rubin wrote:
>> Cameron Laird writes:
>>> For that matter, would it be an advantage for len() to operate on iterables?
>>
>>     print len(itertools.count())
>>
>> Ouch!!
>
> How is this worse than list(itertools.count())?

list(itertools.count()) will eventually fail with a MemoryError.

Actually len(itertools.count()) would as well - when a couple of long instances used up everything available - but it would take a *lot* longer.

Tim Delaney
RE: count items in generator
Delaney, Timothy (Tim) wrote:
> Actually len(itertools.count()) would as well - when a couple of long instances used up everything available - but it would take a *lot* longer.

Actually, this would depend on whether len(iterable) used a C integral variable to accumulate the length (which would roll over and never end) or a Python long (which would eventually use up all memory).

Tim Delaney
Re: count items in generator
Delaney, Timothy (Tim) writes:
>> Actually len(itertools.count()) would as well - when a couple of long instances used up everything available - but it would take a *lot* longer.
>
> Actually, this would depend on whether len(iterable) used a C integral variable to accumulate the length (which would roll over and never end) or a Python long (which would eventually use up all memory).

That's only because itertools.count itself uses a C int instead of a long. IMO, that's a bug (maybe fixed in 2.5):

    Python 2.3.4 (#1, Feb  2 2005, 12:11:53)
    [GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys,itertools
    >>> a=sys.maxint - 3
    >>> a
    2147483644
    >>> b = itertools.count(a)
    >>> [b.next() for i in range(8)]
    [2147483644, 2147483645, 2147483646, 2147483647, -2147483648, -2147483647, -2147483646, -2147483645]
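For comparison, a small sketch of the same experiment in Python 3, where integers are arbitrary precision and itertools.count keeps counting past the machine word size. This is my own check, not part of the original post; sys.maxsize stands in for sys.maxint (which does not exist in Python 3), and next(counter) replaces b.next().

```python
import itertools
import sys

start = sys.maxsize - 1
counter = itertools.count(start)

# Take four values that straddle sys.maxsize.
values = [next(counter) for _ in range(4)]

# Each value is exactly one more than the previous one; nothing wraps around.
print(values)
print(values[-1] > sys.maxsize)
```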
RE: count items in generator
Paul Rubin wrote:
> That's only because itertools.count itself uses a C int instead of a long.

True. In either case, the effect is the same in terms of whether len(itertools.count()) will ever terminate.

Tim Delaney
Re: count items in generator
Delaney, Timothy (Tim) wrote:
> George Sakkis wrote:
>> Paul Rubin wrote:
>>> Cameron Laird writes:
>>>> For that matter, would it be an advantage for len() to operate on iterables?
>>>
>>>     print len(itertools.count())
>>>
>>> Ouch!!
>>
>> How is this worse than list(itertools.count())?
>
> list(itertools.count()) will eventually fail with a MemoryError.
>
> Actually len(itertools.count()) would as well - when a couple of long instances used up everything available - but it would take a *lot* longer.
>
> Tim Delaney

That's more of a theoretical argument on why the latter is worse. How many real-world programs are prepared for MemoryError every time they call list(), catch it and handle it gracefully? I'd say that the only reason an exception would be preferable in such a case would be debugging; it's nice to have an informative traceback instead of a program that entered an infinite loop.

George
RE: count items in generator
George Sakkis wrote:
> Delaney, Timothy (Tim) wrote:
>> list(itertools.count()) will eventually fail with a MemoryError.
>
> That's more of a theoretical argument on why the latter is worse. How many real-world programs are prepared for MemoryError every time they call list(), catch it and handle it gracefully? I'd say that the only reason an exception would be preferable in such a case would be debugging; it's nice to have an informative traceback instead of a program that entered an infinite loop.

That's exactly my point. Assuming your test coverage is good, such an error would be caught by the MemoryError. An infinite loop should also be caught by timing out the tests, but that's much more dependent on the test harness.

Tim Delaney
Re: count items in generator
BartlebyScrivener wrote:
> Still new. I am trying to make a simple word count script.
>
> I found this in the great Python Cookbook, which allows me to process every word in a file. But how do I use it to count the items generated?
>
> def words_of_file(thefilepath, line_to_words=str.split):
>     the_file = open(thefilepath)
>     for line in the_file:
>         for word in line_to_words(line):
>             yield word
>     the_file.close()
>
> for word in words_of_file(thefilepath):
>     dosomethingwith(word)
>
> The best I could come up with:
>
> def words_of_file(thefilepath, line_to_words=str.split):
>     the_file = open(thefilepath)
>     for line in the_file:
>         for word in line_to_words(line):
>             yield word
>     the_file.close()
>
> len(list(words_of_file(thefilepath)))
>
> But that seems clunky.

My preference would be (with the original definition for words_of_file) to code

    numwords = sum(1 for w in words_of_file(thefilepath))

Alex
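A self-contained sketch of the counting idiom suggested above, for anyone who wants to try it. The sample text and the temporary file are made up for the illustration, and the with statement is my substitution for the explicit open()/close() pair in the Cookbook recipe; the counting line itself is the one from the thread. Written for Python 3.

```python
import os
import tempfile


def words_of_file(thefilepath, line_to_words=str.split):
    # 'with' closes the file once the generator finishes or is closed.
    with open(thefilepath) as the_file:
        for line in the_file:
            for word in line_to_words(line):
                yield word


# Hypothetical sample file, created only for this illustration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the quick brown fox\njumps over\n\nthe lazy dog\n")
    sample_path = tmp.name

# Count the words without building an intermediate list.
numwords = sum(1 for w in words_of_file(sample_path))
os.remove(sample_path)
print(numwords)
```

The generator yields one word at a time, so memory use stays flat however large the file is, which is the advantage over len(list(...)).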
Re: count items in generator
BartlebyScrivener wrote:
> Still new. I am trying to make a simple word count script.
>
> I found this in the great Python Cookbook, which allows me to process every word in a file. But how do I use it to count the items generated?
>
> def words_of_file(thefilepath, line_to_words=str.split):
>     the_file = open(thefilepath)
>     for line in the_file:
>         for word in line_to_words(line):
>             yield word
>     the_file.close()
>
> for word in words_of_file(thefilepath):
>     dosomethingwith(word)
>
> The best I could come up with:
>
> def words_of_file(thefilepath, line_to_words=str.split):
>     the_file = open(thefilepath)
>     for line in the_file:
>         for word in line_to_words(line):
>             yield word
>     the_file.close()
>
> len(list(words_of_file(thefilepath)))
>
> But that seems clunky.

As clunky as it seems, I don't think you can beat it in terms of brevity; if you care about memory efficiency though, here's what I use:

    def length(iterable):
        try: return len(iterable)
        except:
            i = 0
            for x in iterable: i += 1
            return i

You can even shadow the builtin len() if you prefer:

    import __builtin__

    def len(iterable):
        try: return __builtin__.len(iterable)
        except:
            i = 0
            for x in iterable: i += 1
            return i

HTH,
George
Re: count items in generator
Thanks! And thanks for the Cookbook.

rd

There is no abstract art. You must always start with something. Afterward you can remove all traces of reality. --Pablo Picasso
Re: count items in generator
George Sakkis writes:
> As clunky as it seems, I don't think you can beat it in terms of brevity; if you care about memory efficiency though, here's what I use:
>
> def length(iterable):
>     try: return len(iterable)
>     except:
>         i = 0
>         for x in iterable: i += 1
>         return i

Alex's example amounted to something like that, for the generator case. Notice that the argument to sum() was a generator expression. The sum function then iterated through it.