Re: maximum() efficency

2006-03-27 Thread Arne Ludwig
Just for completeness: The functions in Steve's original post named
maximum calculate the minimum.

Also, timing-wise, on my machine with a random list of 20 integers
Steve's iteration version and Mitja's version are about equal, the
system built-in is equal or slightly slower, and Paul's version about
3-4x slower. If the comparison function is very complex, the mileage
may vary of course.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: maximum() efficency

2006-03-27 Thread Paul McGuire
Arne Ludwig [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 Just for completeness: The functions in Steve's original post named
 maximum calculate the minimum.

 Also, timing-wise, on my machine with a random list of 20 integers
 Steve's iteration version and Mitja's version are about equal, the
 system built-in is equal or slightly slower, and Paul's version about
 3-4x slower. If the comparison function is very complex, the mileage
 may vary of course.


Arne -

The version I posted (which is not mine by the way, but I've lost the
original citation) becomes much more competitive when run with psyco.  With
your list of 200,000 integers, I believe it will outpace even the C library
built-ins.

-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: maximum() efficency

2006-03-27 Thread Steve R. Hastings
On Mon, 27 Mar 2006 07:50:01 -0800, Arne Ludwig wrote:
 Just for completeness: The functions in Steve's original post named
 maximum calculate the minimum.

Er, yes.  I benchmarked them but clearly I should have function-tested
them as well.



Here is, I hope, my last word on maximum().  It figures out whether it got
an iterator or a list, and does the right thing.  I believe it will
compile and run correctly whether your version of Python has iterators or
not.  And, it doesn't compute the minimum.



def maximum(cmp, seq):
  
  Return maximum value of seq, as chosen by provided cmp() function.

  If seq contains multiple maximal values, return the last one.

  Arguments:
cmp -- function reference; function should return values with the
  same semantics as Python's cmp() function.
seq -- any sequence (a list or an iterator).  If seq is empty, this
  function will raise ValueError.
  
  # is it a list?
  if hasattr(seq, __getitem__):
if len(seq) == 0:
  raise ValueError, seq must contain at least 1 element

maxval = seq[0]
for v in seq:
if cmp(maxval, v) = 0:
  maxval = v
return maxval


  # is it an iterator?
  if hasattr(seq, __iter__) and hasattr(seq, next):
try:
  maxval = seq.next()
except StopIteration:
  raise ValueError, seq must contain at least 1 element

for v in seq:
if cmp(maxval, v) = 0:
  maxval = v
return maxval


  raise TypeError, seq must be a list or an iterator



-- 
Steve R. HastingsVita est
[EMAIL PROTECTED]http://www.blarg.net/~steveha

-- 
http://mail.python.org/mailman/listinfo/python-list


maximum() efficency

2006-03-26 Thread Steve R. Hastings
I was looking at a Python function to find the maximum from a list. 
The original was more complicated; I simplified it.  The built-in max()
function can replace the simplified example, but not the original.


def maximum(lst):
  maxval = lst[0]

  for i in xrange(1, len(lst)):
v = lst[i]
if maxval  v:
  maxval = v

  return maxval



Immediately I started thinking about ways to make this more efficient.  My
first thought was to use iterators:


def maximum(lst):
  itr = iter(lst)
  maxval = itr.next()

  for v in itr:
if maxval  v:
  maxval = v

  return maxval



Then I thought, I wonder if there is a faster solution that *doesn't* use
iterators, for use with older versions of Python.  I came up with:


def maximum(lst):
  a = []
  for v in lst:
try:
  if a[0]  v:
a[0] = v
except IndexError:
  a.append(v)
  return a[0]



Of course, it's a little ugly to use a[0] instead of maxval.  And I'm
trying not to index a list, and here I am indexing another list.  So:


def maximum(lst):
  for v in lst:
try:
  if maxval  v:
maxval = v
except NameError:
  maxval = v
  return maxval



And we come at last to my actual question:  Is this guaranteed to work on
all versions of Python?

In my testing on my computer, the above function works perfectly.  Even
if you do this:

 maxval = 100
 print maximum(cmp, [2, 0, 3, 1])
3

In other words, you get a NameError in the function even if there is a
global variable with that name.  That's with Python 2.4, however... is
this function guaranteed to work in all Python versions?  I am very
comfortable with the a[0] version, since I explicitly set a to an empty
list, I know a[0] will always raise that IndexError.


P.S. I went ahead and benchmarked the above functions, plus one more
that is similar to the a[0] solution but used an empty dictionary instead
of a zero-length list.  I created a list of a million zeroes, and then
stored a 1 at the beginning of the list.  Thus the maximum functions would
spend all their time iterating through values and comparing them, and very
little time updating maxval or its equivalent.  Times to run each function
100 times on the large list:

36.8 seconds  xrange
22.3 seconds  iterator
39.2 seconds  IndexError on a[0]
31.5 seconds  NameError with maxval
43.4 seconds  KeyError on empty dictionary


The conclusions I draw from these numbers:

The sneaky tricks with exceptions don't bring enough speed to be worth
using; two of them were actually slower than the xrange solution.  Even
the NameError one was only a little bit faster, and simple is better than
complex, so I don't think I'd ever actually use it.

The clear winner was the iterator version. It was much faster than the
others, and in my opinion it is simpler and easier to understand than any
of the others.
-- 
Steve R. HastingsVita est
[EMAIL PROTECTED]http://www.blarg.net/~steveha

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: maximum() efficency

2006-03-26 Thread Steven Bethard
Steve R. Hastings wrote:
 I was looking at a Python function to find the maximum from a list. 
 The original was more complicated; I simplified it.  The built-in max()
 function can replace the simplified example, but not the original.

What's the original?  Are you sure max can't solve it with an 
appropriate decorate-sort-undecorate (DSU) or the key= argument coming 
in Python 2.5?

STeVe
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: maximum() efficency

2006-03-26 Thread Mitja Trampus
Steve R. Hastings wrote:
 I was looking at a Python function to find the maximum from a list. 
 The original was more complicated; I simplified it.  The built-in max()
 function can replace the simplified example, but not the original.

But you forgot to shuw us the original...

[snip several implementations]

 Of course, it's a little ugly to use a[0] instead of maxval.  And I'm
 trying not to index a list, and here I am indexing another list.  So:
 
 def maximum(lst):
   for v in lst:
 try:
   if maxval  v:
 maxval = v
 except NameError:
   maxval = v
   return maxval
 
 In my testing on my computer, the above function works perfectly.  Even
 if you do this:
 
 maxval = 100
 print maximum(cmp, [2, 0, 3, 1])
 3
 
 In other words, you get a NameError in the function even if there is a
 global variable with that name.  That's with Python 2.4, however... is
 this function guaranteed to work in all Python versions?  

Not 100% sure, but I think yes. People with longer experience in python can 
answer 
definitely. Or you can try it out yourself.
The reason it works is that this:
def f():
print a
a=7; f()
works and prints 7 (global a is used), while this
def f():
print a
a = 12
a=7; f()
does NOT work. Python notices that a is going to get assigned to inside f and 
treats it as 
a local variable. This is also the case with your code. Try out the above 
examples without 
a=7 and notice the different error messages...

 The clear winner was the iterator version. It was much faster than the
 others, and in my opinion it is simpler and easier to understand than any
 of the others.

I would have done it in the same way, but probably without the iterators. I.e., 
like this:

def maximum(lst):
try: m = lst[0]
except (TypeError, IndexError): raise Exception Non-sequence or empty 
sequence given to 
maximum)

# (you forgot the above sanity checks in your code.
# not strictly necessary, but nice to have.)

for x in lst:
if xm: m=x
return m
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: maximum() efficency

2006-03-26 Thread Paul McGuire
Steve R. Hastings [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 I was looking at a Python function to find the maximum from a list.
 The original was more complicated; I simplified it.  The built-in max()
 function can replace the simplified example, but not the original.


If you are interested in both the min and max values, here is an algorithm
that performs only 3 comparisons for every 2 list items, instead of the
brute force method's 4 comparisons.  This walks the input list in pairs,
compares the two items to each other, then compares the lesser with the
current min and the greater with the current max.

def minMax(L):
lenL = len(L)
if lenL == 0:
return None,None
if lenL == 1:
return L[0],L[0]

min_ = max_ = L[0]

if lenL % 2:
i = 1
else:
i = 0
while i  lenL:
a,b = L[i],L[i+1]
if a  b:
a,b = b,a
if a  min_: min_ = a
if b  max_: max_ = b
i += 2

return min_,max_

Of course, this much Python bytecode is not near as fast as simply calling
the builtins min() and max().  But, if you add psyco to the mix, things
aren't so cut-and-dried.  I tested this method using a list of
randomly-generated strings, and after the list length exceeds 100-500 or so,
minMax starts to outperform the compiled min() and max().  The following
table shows the timing for the brute force min() and max() calls, followed
by minMax():

List length  1: 0.229 0.612
List length  2: 0.056 0.0001081
List length 10: 0.059 0.707
List length100: 0.154 0.087
List length500: 0.589 0.534
List length   1000: 0.0001176 0.670
List length  1: 0.0011485 0.0008954
List length 10: 0.0126720 0.0077379

Using the OP's test of 1 million zeros with the first zero changed to a 1,
here is the performance of minMax vs. min() and max():

List length 100:  0.1235953 0.0126896

minMax is 10X faster!  Ironically, it's faster calling minMax() than either
min() or max().

If your list contains objects that are expensive to compare, the crossover
list length may be much shorter.

-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: maximum() efficency

2006-03-26 Thread Steve R. Hastings
On Sun, 26 Mar 2006 20:34:28 +0200, Mitja Trampus wrote:
 I would have done it in the same way, but probably without the iterators. 
 I.e., like this:
 
 def maximum(lst):
   try: m = lst[0]
   except (TypeError, IndexError): raise Exception Non-sequence or empty 
 sequence given to 
 maximum)
 
   # (you forgot the above sanity checks in your code.
   # not strictly necessary, but nice to have.)
 
   for x in lst:
   if xm: m=x
   return m

I left out the original sanity check, because I just wanted a streamlined
simple example.


I had a nagging feeling that I was missing something simple, and you have
put your finger on it.  That's perfect!  It's simple, it's clear, and it
will work on any version of Python.  Thanks!
-- 
Steve R. HastingsVita est
[EMAIL PROTECTED]http://www.blarg.net/~steveha

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: maximum() efficency

2006-03-26 Thread Steve R. Hastings
On Sun, 26 Mar 2006 10:34:16 -0700, Steven Bethard wrote:
 What's the original?



def minimum(cmp, lst):
  minimum(cmp, lst)

  Returns the minimal element in non-empty list LST with elements
  compared via CMP() which should return values with the same semantics
  as Python's cmp().  If there are several minimal elements, the last
  one is returned.
  

  if not lst:
raise ValueError, 'empty list'

  minval = lst[0]

  for i in xrange(1, len(lst)):
v = lst[i]
if cmp(minval, v)  0:
  minval = v

  return minval



This is from Google's goopy package.

http://goog-goopy.sourceforge.net/

-- 
Steve R. HastingsVita est
[EMAIL PROTECTED]http://www.blarg.net/~steveha

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: maximum() efficency

2006-03-26 Thread Steve R. Hastings
Actually, now that I think about it, the version using iter() has one
advantage over your version: it will work correctly if passed either a
list or an iterator.  So, for versions of Python that have iterators, I'd
use the iter() version, but for older versions of Python, I'd use your
version.

P.S. I benchmarked your version; it ran in 22.0 seconds, just a gnat's
whisker faster than the iter() version.
-- 
Steve R. HastingsVita est
[EMAIL PROTECTED]http://www.blarg.net/~steveha

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: maximum() efficency

2006-03-26 Thread Paul McGuire
Steve R. Hastings [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 On Sun, 26 Mar 2006 10:34:16 -0700, Steven Bethard wrote:
  What's the original?



 def minimum(cmp, lst):
   minimum(cmp, lst)

   Returns the minimal element in non-empty list LST with elements
   compared via CMP() which should return values with the same semantics
   as Python's cmp().  If there are several minimal elements, the last
   one is returned.
   

   if not lst:
 raise ValueError, 'empty list'

   minval = lst[0]

   for i in xrange(1, len(lst)):
 v = lst[i]
 if cmp(minval, v)  0:
   minval = v

   return minval



 This is from Google's goopy package.

 http://goog-goopy.sourceforge.net/

 -- 
 Steve R. HastingsVita est
 [EMAIL PROTECTED]http://www.blarg.net/~steveha


The doc string is not correct.  If there are several minimal elements, the
*first* one is returned.  See this example:

import math

class Vector(object):
def __init__(self,*x):
# x is list of n-dimensional coordinates
self.x = x

def magnitude(self):
return math.sqrt(sum(x**2 for x in self.x))

def __cmp__(self,other):
return cmp(self.magnitude(), other.magnitude())

def __str__(self):
return Vector(%s) % ,.join(str(x) for x in self.x)

a = Vector(0,1)
b = Vector(0,0)
c = Vector(1,0)

# these print 1, -1, and 0 to show that cmp() and Vector are doing the right
thing
print cmp(a,b)
print cmp(b,a)
print cmp(a,c)

print min(a,b)   # gives Vector(0,0), or b
print max(a,b,c)   # gives Vector(0,1) or a
print min(a,c)# gives Vector(0,1) or a

print minimum(Vector.__cmp__, [a,b])   # gives Vector(0,0), or b
print minimum(Vector.__cmp__, [a,c])# gives Vector(0,1) or a


-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: maximum() efficency

2006-03-26 Thread Steven Bethard
Steve R. Hastings wrote:
 On Sun, 26 Mar 2006 10:34:16 -0700, Steven Bethard wrote:
 What's the original?
 
 def minimum(cmp, lst):
   minimum(cmp, lst)
 
   Returns the minimal element in non-empty list LST with elements
   compared via CMP() which should return values with the same semantics
   as Python's cmp().  If there are several minimal elements, the last
   one is returned.
   
 
   if not lst:
 raise ValueError, 'empty list'
 
   minval = lst[0]
 
   for i in xrange(1, len(lst)):
 v = lst[i]
 if cmp(minval, v)  0:
   minval = v
 
   return minval
 
 This is from Google's goopy package.
 
 http://goog-goopy.sourceforge.net/

Ahh.  Yeah, nasty cmp= arguments certainly do screw that up.

If you're not just trying to implement a particular API, and you 
actually have a use-case you think you need this for, you might consider 
a key= argument instead of a cmp= argument.  It'll probably be at least 
as fast, and Python 2.5 will have one on both min() and max().  It'll 
work something like:
 http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/389659


STeVe
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: maximum() efficency

2006-03-26 Thread Steve R. Hastings
On Mon, 27 Mar 2006 00:11:04 +, Paul McGuire wrote:
 The doc string is not correct.

The fault is mine, I'm afraid.  I had thought I was copying that from an
intact original version, but I must have edited it.  I don't remember
doing it, but I must have.

To check, I went to Sourceforge and downloaded goopy again.  Here, for
the record, is the original Google maximum():



def maximum(cmp, lst):
  maximum(cmp, lst)

  Returns the maximal element in non-empty list LST with elements
  compared via CMP() which should return values with the same semantics
  as Python's cmp().  If there are several maximal elements, the last
  one is returned.
  

  if not lst:
raise ValueError, 'empty list'

  maxval = lst[0]

  for i in xrange(1, len(lst)):
v = lst[i]
if cmp(maxval, v) = 0:
  maxval = v

  return maxval



I apologize for the error.

P.S. I'm happy to see that Google uses lst for a list.  I do, too.  I
don't much like the convention of using L for a list, but I agree with
Guido that using l is bad (looks too much like a one: 1), so lst it
is.

I would have put the cmp argument second, and made it default to the
built-in Python cmp function if not specified.  Oh, well.
-- 
Steve R. HastingsVita est
[EMAIL PROTECTED]http://www.blarg.net/~steveha

-- 
http://mail.python.org/mailman/listinfo/python-list