Re: count items in generator

2006-05-15 Thread Cameron Laird
In article [EMAIL PROTECTED],
Alex Martelli [EMAIL PROTECTED] wrote:
Cameron Laird [EMAIL PROTECTED] wrote:

 In article [EMAIL PROTECTED],
 Alex Martelli [EMAIL PROTECTED] wrote:
   .
   .
   .
 My preference would be (with the original definition for
 words_of_the_file) to code
 
numwords = sum(1 for w in words_of_the_file(thefilepath))
   .
   .
   .
 There are times when 
 
 numwords = len(list(words_of_the_file(thefilepath)))
 
 will be advantageous.

Can you please give some examples?  None comes readily to mind...
.
.
.
Maybe in an alternative universe where Python style emphasizes
functional expressions.  This thread--or at least the follow-ups
to my rather frivolous observation--illustrates how distinct
Python's direction is.

If we could neglect memory impact, and procedural side-effects,
then, sure, I'd argue for my len(list(...)) formulation, on the
expressive grounds that it doesn't require the two magic tokens
'1' and 'w'.  Does category theory have a term for formulas of
the sort that introduce a free variable only to ignore (discard,
...) it?  There certainly are times when that's apt ...
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: count items in generator

2006-05-15 Thread Cameron Laird
In article [EMAIL PROTECTED],
Alex Martelli [EMAIL PROTECTED] wrote:
.
.
.
I'd be a bit worried about having len(x) change x's state into an
unusable one. Yes, it happens in other cases (if y in x:), but adding
more such problematic cases doesn't seem advisable to me anyway -- I'd
evaluate this proposal as a -0, even taking into account the potential
optimizations to be garnered by having some iterables expose __len__
(e.g., a genexp such as (f(x) for x in foo), without an if-clause, might
be optimized to delegate __len__ to foo -- again, there may be semantic
alterations lurking that make this optimization a bit iffy).


Alex

Quite so.  My proposal isn't at all serious; I'm doing this largely
for practice in thinking about functionalism and its complement in
Python.  However, maybe I should take this one step farther:  while
I think your caution about attractive nuisance is perfect, what is
the precise nuisance here?  Is there ever a time when a developer
would be tempted to evaluate len() on an iterable even though there's
another approach that does NOT impact the iterable's state?  On the
other hand, maybe all we're doing is observing that expanding the
domain of len() means we give up guarantees on its finiteness, and
that's simply not worth doing.


Re: count items in generator

2006-05-15 Thread Bruno Desthuilliers
George Sakkis a écrit :
(snip)
 def length(iterable):
     try: return len(iterable)
     except:

except TypeError:

         i = 0
         for x in iterable: i += 1
         return i
 
(snip)


Re: count items in generator

2006-05-14 Thread Alex Martelli
Paul Rubin http://[EMAIL PROTECTED] wrote:

 George Sakkis [EMAIL PROTECTED] writes:
  As clunky as it seems, I don't think you can beat it in terms of
  brevity; if you care about memory efficiency though, here's what I use:
  
  def length(iterable):
      try: return len(iterable)
      except:
          i = 0
          for x in iterable: i += 1
          return i
 
 Alex's example amounted to something like that, for the generator
 case.  Notice that the argument to sum() was a generator
 comprehension.  The sum function then iterated through it.

True.  Changing the except clause here to

except: return sum(1 for x in iterable)

keeps George's optimization (O(1), not O(N), for containers) and is a
bit faster (while still O(N)) for non-container iterables.
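
Combining this with the narrower `except TypeError` suggested elsewhere in this thread, the helper might look roughly like the sketch below. It is written for current Python 3 and is only one possible reading of the suggestion, not anyone's definitive version.

```python
def length(iterable):
    """Return the number of items, using len() when the object supports it."""
    try:
        # Containers such as list, dict, set and str answer directly.
        return len(iterable)
    except TypeError:
        # No __len__ available: count by iterating, which consumes the iterable.
        return sum(1 for _ in iterable)


print(length([10, 20, 30]))
print(length(x * x for x in range(5)))
```

Catching only TypeError means unrelated errors (for example, an exception raised inside the generator itself) still propagate instead of being silently swallowed.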


Alex


Re: count items in generator

2006-05-14 Thread Cameron Laird
In article [EMAIL PROTECTED],
Alex Martelli [EMAIL PROTECTED] wrote:
.
.
.
My preference would be (with the original definition for
words_of_the_file) to code

   numwords = sum(1 for w in words_of_the_file(thefilepath))
.
.
.
There are times when 

numwords = len(list(words_of_the_file(thefilepath)))

will be advantageous.

For that matter, would it be an advantage for len() to operate
on iterables?  It could be faster and thriftier on memory than
either of the above, and my first impression is that it's 
sufficiently natural not to offend those of us suspicious of
language bloat.


Re: count items in generator

2006-05-14 Thread Paul Rubin
[EMAIL PROTECTED] (Cameron Laird) writes:
 For that matter, would it be an advantage for len() to operate
 on iterables?

   print len(itertools.count())

Ouch!!


Re: count items in generator

2006-05-14 Thread BartlebyScrivener
 True.  Changing the except clause here to

 except: return sum(1 for x in iterable)

 keeps George's optimization (O(1), not O(N), for containers) and is a
 bit faster (while still O(N)) for non-container iterables.

Everything was going just great. Now I have to think again.

Thank you all.

rick



Re: count items in generator

2006-05-14 Thread George Sakkis
Paul Rubin wrote:

 [EMAIL PROTECTED] (Cameron Laird) writes:
  For that matter, would it be an advantage for len() to operate
  on iterables?

print len(itertools.count())
 
 Ouch!!

How is this worse than list(itertools.count()) ?



Re: count items in generator

2006-05-14 Thread Alex Martelli
Cameron Laird [EMAIL PROTECTED] wrote:

 In article [EMAIL PROTECTED],
 Alex Martelli [EMAIL PROTECTED] wrote:
   .
   .
   .
 My preference would be (with the original definition for
 words_of_the_file) to code
 
numwords = sum(1 for w in words_of_the_file(thefilepath))
   .
   .
   .
 There are times when 
 
 numwords = len(list(words_of_the_file(thefilepath)))
 
 will be advantageous.

Can you please give some examples?  None comes readily to mind...


 For that matter, would it be an advantage for len() to operate
 on iterables?  It could be faster and thriftier on memory than
 either of the above, and my first impression is that it's 
 sufficiently natural not to offend those of us suspicious of
 language bloat.

I'd be a bit worried about having len(x) change x's state into an
unusable one. Yes, it happens in other cases (if y in x:), but adding
more such problematic cases doesn't seem advisable to me anyway -- I'd
evaluate this proposal as a -0, even taking into account the potential
optimizations to be garnered by having some iterables expose __len__
(e.g., a genexp such as (f(x) for x in foo), without an if-clause, might
be optimized to delegate __len__ to foo -- again, there may be semantic
alterations lurking that make this optimization a bit iffy).
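
The state change in question can be seen in a small sketch (Python 3; the `squares` generator is made up for illustration). Counting a generator by iterating over it leaves nothing behind for later use:

```python
squares = (n * n for n in range(4))

# Counting by iteration walks the generator to its end.
count = sum(1 for _ in squares)

# The generator is now exhausted, so a second pass yields nothing.
leftover = list(squares)

print(count, leftover)
```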


Alex


Re: count items in generator

2006-05-14 Thread Alex Martelli
George Sakkis [EMAIL PROTECTED] wrote:

 Paul Rubin wrote:
 
  [EMAIL PROTECTED] (Cameron Laird) writes:
   For that matter, would it be an advantage for len() to operate
   on iterables?
 
 print len(itertools.count())
  
  Ouch!!
 
 How is this worse than list(itertools.count()) ?

It's a slightly worse trap because list(x) ALWAYS iterates on x (just
like for y in x:), while len(x) MAY OR MAY NOT iterate on x (under
Cameron's proposal; it currently never does).

Yes, there are other subtle traps of this ilk already in Python, such as
if y in x: -- this, too, may or may not iterate.  But the fact that a
potential problem exists in some corner cases need not be a good reason
to extend the problem to higher frequency;-).
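
A short sketch of the existing trap with `in`, written for Python 3 with made-up sample data: on a list the test leaves the object alone, while on a generator it advances past the first match.

```python
numbers = [1, 2, 3, 4]
gen = (n for n in numbers)

# Membership test on a list: the list is unchanged afterwards.
found_in_list = 3 in numbers

# Membership test on a generator: items up to and including the match are consumed.
found_in_gen = 3 in gen
remaining = list(gen)

print(found_in_list, numbers)
print(found_in_gen, remaining)
```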


Alex


RE: count items in generator

2006-05-14 Thread Delaney, Timothy (Tim)
George Sakkis wrote:

 Paul Rubin wrote:
 
 [EMAIL PROTECTED] (Cameron Laird) writes:
 For that matter, would it be an advantage for len() to operate
 on iterables?
 
print len(itertools.count())
 
 Ouch!!
 
 How is this worse than list(itertools.count()) ?

list(itertools.count()) will eventually fail with a MemoryError.

Actually len(itertools.count()) would as well - when a couple of long
instances used up everything available - but it would take a *lot*
longer.

Tim Delaney


RE: count items in generator

2006-05-14 Thread Delaney, Timothy (Tim)
Delaney, Timothy (Tim) wrote:

 Actually len(itertools.count()) would as well - when a couple of long
 instances used up everything available - but it would take a *lot*
 longer.

Actually, this would depend on whether len(iterable) used a C integral
variable to accumulate the length (which would roll over and never end)
or a Python long (which would eventually use up all memory).

Tim Delaney


Re: count items in generator

2006-05-14 Thread Paul Rubin
Delaney, Timothy (Tim) [EMAIL PROTECTED] writes:
  Actually len(itertools.count()) would as well - when a couple of long
  instances used up everything available - but it would take a *lot*
  longer.
 
 Actually, this would depend on whether len(iterable) used a C integral
 variable to accumulate the length (which would roll over and never end)
 or a Python long (which would eventually use up all memory).

That's only because itertools.count itself uses a C int instead of a long.
IMO, that's a bug (maybe fixed in 2.5):

Python 2.3.4 (#1, Feb  2 2005, 12:11:53)
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys,itertools
>>> a=sys.maxint - 3
>>> a
2147483644
>>> b = itertools.count(a)
>>> [b.next() for i in range(8)]
[2147483644, 2147483645, 2147483646, 2147483647, -2147483648,
-2147483647, -2147483646, -2147483645]
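
For comparison, here is a sketch of a similar experiment on current Python 3. `sys.maxint` no longer exists there, so `sys.maxsize` is used as the nearest stand-in (an assumption about what the original test was probing), and `next(counter)` replaces `b.next()`. The counter is expected to keep growing past the machine-word boundary rather than wrapping around:

```python
import itertools
import sys

start = sys.maxsize - 1
counter = itertools.count(start)

# Step across the sys.maxsize boundary.
values = [next(counter) for _ in range(4)]

print(values)
print(values[-1] > sys.maxsize)
```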
 


RE: count items in generator

2006-05-14 Thread Delaney, Timothy (Tim)
Paul Rubin wrote:

 That's only because itertools.count itself uses a C int instead of a
 long.

True. In either case, the effect is the same in terms of whether
len(itertools.count()) will ever terminate.
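
One way to guard against a count that never ends is to cap the number of items examined. The sketch below assumes Python 3, and `length_up_to` is a hypothetical helper invented for illustration, not a standard function:

```python
import itertools


def length_up_to(iterable, limit):
    """Count items, but stop after `limit` of them (hypothetical helper)."""
    # islice stops pulling items once `limit` have been produced.
    return sum(1 for _ in itertools.islice(iterable, limit))


print(length_up_to(itertools.count(), 1000))
print(length_up_to(iter("abc"), 1000))
```

A result equal to the limit only says the iterable has at least that many items; it does not give the true length.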

Tim Delaney


Re: count items in generator

2006-05-14 Thread George Sakkis
Delaney, Timothy (Tim) wrote:

 George Sakkis wrote:

  Paul Rubin wrote:
 
  [EMAIL PROTECTED] (Cameron Laird) writes:
  For that matter, would it be an advantage for len() to operate
  on iterables?
 
 print len(itertools.count())
 
  Ouch!!
 
  How is this worse than list(itertools.count()) ?

 list(itertools.count()) will eventually fail with a MemoryError.

 Actually len(itertools.count()) would as well - when a couple of long
 instances used up everything available - but it would take a *lot*
 longer.

 Tim Delaney

That's more of a theoretical argument on why the latter is worse. How
many real-world programs are prepared for MemoryError every time they
call list(), catch it and handle it gracefully?  I'd say that the only
reason an exception would be preferable in such a case would be
debugging; it's nice to have an informative traceback instead of a
program that entered an infinite loop.

George



RE: count items in generator

2006-05-14 Thread Delaney, Timothy (Tim)
George Sakkis wrote:

 Delaney, Timothy (Tim) wrote:
 
 list(itertools.count()) will eventually fail with a MemoryError.
 
 That's more of a theoretical argument on why the latter is worse. How
 many real-world programs are prepared for MemoryError every time they
 call list(), catch it and handle it gracefully?  I'd say that the only
 reason an exception would be preferable in such a case would be
 debugging; it's nice to have an informative traceback instead of a
 program that entered an infinite loop.

That's exactly my point. Assuming your test coverage is good, such an
error would be caught by the MemoryError. An infinite loop should also
be caught by timing out the tests, but that's much more dependent on the
test harness.

Tim Delaney


Re: count items in generator

2006-05-13 Thread Alex Martelli
BartlebyScrivener [EMAIL PROTECTED] wrote:

 Still new. I am trying to make a simple word count script.
 
 I found this in the great Python Cookbook, which allows me to process
 every word in a file. But how do I use it to count the items generated?
 
 def words_of_file(thefilepath, line_to_words=str.split):
     the_file = open(thefilepath)
     for line in the_file:
         for word in line_to_words(line):
             yield word
     the_file.close()
 for word in words_of_file(thefilepath):
     dosomethingwith(word)
 
 The best I could come up with:
 
 def words_of_file(thefilepath, line_to_words=str.split):
     the_file = open(thefilepath)
     for line in the_file:
         for word in line_to_words(line):
             yield word
     the_file.close()
 len(list(words_of_file(thefilepath)))
 
 But that seems clunky.

My preference would be (with the original definition for
words_of_the_file) to code

   numwords = sum(1 for w in words_of_the_file(thefilepath))
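
A small sketch of the two approaches side by side, written for current Python 3. It uses an in-memory list of lines in place of a real file; the `words_of_lines` helper and the sample text are made up for illustration.

```python
def words_of_lines(lines, line_to_words=str.split):
    """Yield each word from an iterable of lines (stand-in for the file-based recipe)."""
    for line in lines:
        for word in line_to_words(line):
            yield word


sample = ["the quick brown fox", "jumps over", "the lazy dog"]

# Builds a temporary list holding every word, then measures it.
count_via_list = len(list(words_of_lines(sample)))

# Adds 1 per word as the generator runs; no intermediate list is kept.
count_via_sum = sum(1 for w in words_of_lines(sample))

print(count_via_list, count_via_sum)
```

Both give the same count; the difference is only in how much is held in memory while counting.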


Alex


Re: count items in generator

2006-05-13 Thread George Sakkis
BartlebyScrivener wrote:

 Still new. I am trying to make a simple word count script.

 I found this in the great Python Cookbook, which allows me to process
 every word in a file. But how do I use it to count the items generated?

 def words_of_file(thefilepath, line_to_words=str.split):
     the_file = open(thefilepath)
     for line in the_file:
         for word in line_to_words(line):
             yield word
     the_file.close()
 for word in words_of_file(thefilepath):
     dosomethingwith(word)

 The best I could come up with:

 def words_of_file(thefilepath, line_to_words=str.split):
     the_file = open(thefilepath)
     for line in the_file:
         for word in line_to_words(line):
             yield word
     the_file.close()
 len(list(words_of_file(thefilepath)))

 But that seems clunky.

As clunky as it seems, I don't think you can beat it in terms of
brevity; if you care about memory efficiency though, here's what I use:

def length(iterable):
    try: return len(iterable)
    except:
        i = 0
        for x in iterable: i += 1
        return i

You can even shadow the builtin len() if you prefer:

import __builtin__

def len(iterable):
    try: return __builtin__.len(iterable)
    except:
        i = 0
        for x in iterable: i += 1
        return i
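
On Python 3 the `__builtin__` module is named `builtins`, so a rough equivalent of the shadowing version might look like the sketch below. Narrowing the bare `except` to `TypeError` is an adjustment made for this sketch, not part of the code above.

```python
import builtins


def len(iterable):
    """Module-level replacement for len() that also counts plain iterators."""
    try:
        # Delegate to the real builtin first.
        return builtins.len(iterable)
    except TypeError:
        # No __len__: count by iterating, which consumes the iterable.
        return sum(1 for _ in iterable)


print(len([1, 2]))
print(len(iter([1, 2, 3])))
```

The shadowing only affects the module where this `len` is defined; other modules keep using the builtin.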


HTH,
George



Re: count items in generator

2006-05-13 Thread BartlebyScrivener
Thanks! And thanks for the Cookbook.

rd

There is no abstract art. You must always start with something.
Afterward you can remove all traces of reality.--Pablo Picasso



Re: count items in generator

2006-05-13 Thread Paul Rubin
George Sakkis [EMAIL PROTECTED] writes:
 As clunky as it seems, I don't think you can beat it in terms of
 brevity; if you care about memory efficiency though, here's what I use:
 
 def length(iterable):
     try: return len(iterable)
     except:
         i = 0
         for x in iterable: i += 1
         return i

Alex's example amounted to something like that, for the generator
case.  Notice that the argument to sum() was a generator
comprehension.  The sum function then iterated through it.