Re: why memoizing is faster

2011-03-26 Thread Terry Reedy

On 3/26/2011 12:17 AM, Stefan Behnel wrote:


Not restarted in the sense that it gets cleaned up, though. The above
simply passes an explicit value for it that will be used for the single
call.


Which satisfies the time test need, but...

 Future calls won't be affected.

Good point. If one does fib(1) and wants to dump the cache without 
dumping all references to the function object, one can currently do

  fib_iter.__kwdefaults__['_cache'] = [0,1]
to truly reset the cache, at least with CPython.

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: why memoizing is faster

2011-03-26 Thread Lie
On Mar 26, 5:10 pm, Terry Reedy tjre...@udel.edu wrote:
 On 3/26/2011 12:17 AM, Stefan Behnel wrote:

  Not restarted in the sense that it gets cleaned up, though. The above
  simply passes an explicit value for it that will be used for the single
  call.

 Which satisfies the time test need, but...

   Future calls won't be affected.

 Good point. If one does fib(1) and wants to dump the cache without
 dumping all references to the function object, one can currently do
    fib_iter.__kwdefaults__['_cache'] = [0,1]
 to truly reset the cache, at least with CPython.

Alternatively:

new_cache = [0, 1]
fib_iter(100, new_cache)
fib_iter(200, new_cache)


Re: why memoizing is faster

2011-03-26 Thread Andrea Crotti
Stefan Behnel stefan...@behnel.de writes:

 Not restarted in the sense that it gets cleaned up, though. The
 above simply passes an explicit value for it that will be used for the
 single call. Future calls won't be affected.

 Stefan

About this global caching thing I thought, but isn't this a source of
possible HUGE memory leaks?

I mean, when is the object _cache freed from the memory?

From my understanding until the function name is in the scope that
object will never be freed, is that correct?


Re: why memoizing is faster

2011-03-26 Thread Steven D'Aprano
On Sat, 26 Mar 2011 12:06:46 +0100, Andrea Crotti wrote:

 About this global caching thing I thought, but isn't this a source of
 possible HUGE memory leaks?

Hypothetically, but probably not that huge. And I wouldn't call it a 
*leak* as such, since you can access the memory and recover it if you 
want to.

The cache given is potentially unbounded in size, but in practical terms, 
it's unlikely to get that big. Say you have a few million key:value pairs 
in a dict:

>>> d = {}
>>> for i in range(3000000):
...     d[i] = 2**100 + i
...
>>> sys.getsizeof(d)
100663432
>>> sum(sys.getsizeof(k)+sys.getsizeof(v) for k,v in d.items())
125934462

That's of the order of 200 MB of memory -- not that much for today's 
systems. I've had people email me .doc files that big *wink*

But of course, if you want a more sophisticated caching system, you can 
build one. But this will trade off memory for time: the cache will be 
slower, there will be more misses, but you won't use as much memory.
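For reference, Python 3.2 (released just before this thread) added exactly such a trade-off knob to the standard library. A minimal sketch using functools.lru_cache, where maxsize bounds the memory at the cost of extra misses:

```python
from functools import lru_cache

# maxsize bounds the cache; least-recently-used entries are evicted,
# trading recomputation (more misses) for a bounded memory footprint.
@lru_cache(maxsize=128)
def fib(n):
    if n <= 1:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib(100))   # 573147844013817084101
```

Passing maxsize=None gives the unbounded behaviour discussed above.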


 I mean, when is the object _cache freed from the memory?

When the function is freed, which will happen at program exit, or if you 
explicitly delete the function.



-- 
Steven


Re: why memoizing is faster

2011-03-26 Thread eryksun ()
On Saturday, March 26, 2011 7:50:36 AM UTC-4, Steven D'Aprano wrote:
 
 That's of the order of 200 MB of memory -- not that much for today's 
 systems. I've had people email me .doc files that big *wink*

Yikes! I know this thread is about caching the output of a function, but in the 
example of Fibonacci numbers, we're blessed with an easily derived closed form 
expression (via Z transform, etc):

def fib(n):
    phi = (5 ** 0.5 + 1) / 2
    f = (phi ** n - (1 - phi) ** n) / 5 ** 0.5
    return int(f)


Re: why memoizing is faster

2011-03-26 Thread Steven D'Aprano
On Sat, 26 Mar 2011 07:14:17 -0700, eryksun () wrote:

 On Saturday, March 26, 2011 7:50:36 AM UTC-4, Steven D'Aprano wrote:
 
 That's of the order of 200 MB of memory -- not that much for today's
 systems. I've had people email me .doc files that big *wink*
 
 Yikes! I know this thread is about caching the output of a function, but
 in the example of Fibonacci numbers, we're blessed with an easily
 derived closed form expression (via Z transform, etc):
 
 def fib(n):
     phi = (5 ** 0.5 + 1) / 2
     f = (phi ** n - (1 - phi) ** n) / 5 ** 0.5
     return int(f)

Have you tried it?

Unfortunately, with floating point rounding error, that rapidly becomes 
inaccurate. It gives:

>>> fib(72)
498454011879265

compared to the correct value of 498454011879264 (an error of 1) and 
rapidly gets worse. It's true that the above formula is correct for real 
numbers, but floats are not reals. Unless you have an infinite number of 
decimal places, it will be inaccurate.



-- 
Steven


Re: why memoizing is faster

2011-03-26 Thread Terry Reedy

On 3/26/2011 5:55 AM, Lie wrote:

On Mar 26, 5:10 pm, Terry Reedy tjre...@udel.edu wrote:

On 3/26/2011 12:17 AM, Stefan Behnel wrote:


Not restarted in the sense that it gets cleaned up, though. The above
simply passes an explicit value for it that will be used for the single
call.


Which satisfies the time test need, but...

Future calls won't be affected.

Good point. If one does fib(1) and wants to dump the cache without
dumping all references to the function object, one can currently do
fib_iter.__kwdefaults__['_cache'] = [0,1]
to truly reset the cache, at least with CPython.


Alternatively:

new_cache = [0, 1]
fib_iter(100, new_cache)
fib_iter(200, new_cache)


No, that is not an alternative to what I wrote ;-).
As Stefan pointed out above, passing a value to over-ride a default only 
substitutes for the default in that one call. It is like a temporary 
substitute teacher. It does not replace the default in the function 
object. The assignment I give above is like a permanent replacement 
teacher. The previous cache list will be garbage collected.
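For CPython, the substitute-vs-replacement distinction can be shown directly. This sketch uses the keyword-only signature Terry suggests elsewhere in the thread (a plain positional default would live in __defaults__ instead of __kwdefaults__):

```python
def fib_iter(n, *, _cache=[0, 1]):
    k = len(_cache)
    if n >= k:
        for i in range(k, n + 1):
            _cache.append(_cache[i - 2] + _cache[i - 1])
    return _cache[n]

fib_iter(500)
print(len(fib_iter.__kwdefaults__['_cache']))   # 501: the shared default grew

# Substitute teacher: the explicit argument is used for this call only.
fib_iter(1000, _cache=[0, 1])
print(len(fib_iter.__kwdefaults__['_cache']))   # still 501

# Permanent replacement: rebinding the default truly resets the cache.
fib_iter.__kwdefaults__['_cache'] = [0, 1]
print(len(fib_iter.__kwdefaults__['_cache']))   # 2
```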


--
Terry Jan Reedy



Re: why memoizing is faster

2011-03-26 Thread eryksun ()
On Saturday, March 26, 2011 11:49:02 AM UTC-4, Steven D'Aprano wrote:
 
  def fib(n):
      phi = (5 ** 0.5 + 1) / 2
      f = (phi ** n - (1 - phi) ** n) / 5 ** 0.5
      return int(f)
 
 Have you tried it?
 
 Unfortunately, with floating point rounding error, that rapidly becomes 
 inaccurate. It gives:
 
  fib(72)
 498454011879265
 
 compared to the correct value of 498454011879264 (an error of 1) and 
 rapidly gets worse. It's true that the above formula is correct for real 
 numbers, but floats are not reals. Unless you have an infinite number of 
 decimal places, it will be inaccurate.

That's true. If you need more than double precision, obviously it's time to 
defer to the experts. Here's an O(n) algorithm:

http://www.gnu.org/software/gmp/manual/html_node/Fibonacci-Numbers-Algorithm.html

This library is wrapped by the gmpy extension module:

http://code.google.com/p/gmpy

I was just shocked to see 200 MB of memory used up like chump change, and I 
immediately looked for an alternative. In practice, I prefer to rely on 
libraries created by specialists who have carefully worked out the best 
trade-offs for typical use, and puzzled through the algorithms needed to 
compute that which is so easy to derive on paper (such as my simple little 
Z-transform derived formula that fails so wonderfully in practice). I'm not a 
computer scientist; I don't even play one on TV.
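The GMP page above builds on the doubling identities F(2k) = F(k)·(2F(k+1) − F(k)) and F(2k+1) = F(k)² + F(k+1)². As a rough plain-Python sketch of that idea (fib_fast is a made-up name, and this uses the standard F(0)=0 convention), it stays exact where the float formula drifts:

```python
def fib_fast(n):
    """Exact F(n) via fast doubling, using F(0) = 0, F(1) = 1."""
    def pair(k):                      # returns (F(k), F(k+1))
        if k == 0:
            return (0, 1)
        a, b = pair(k // 2)           # a = F(m), b = F(m+1), with m = k // 2
        c = a * (2 * b - a)           # F(2m)
        d = a * a + b * b             # F(2m+1)
        if k % 2 == 0:
            return (c, d)
        return (d, c + d)             # shift by one for odd k
    return pair(n)[0]

print(fib_fast(72))   # 498454011879264 -- exact, where the float formula is off by 1
```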


Re: why memoizing is faster

2011-03-26 Thread Terry Reedy

On 3/26/2011 7:06 AM, Andrea Crotti wrote:

Stefan Behnel stefan...@behnel.de writes:


Not restarted in the sense that it gets cleaned up, though. The
above simply passes an explicit value for it that will be used for the
single call. Future calls won't be affected.

Stefan


About this global caching thing I thought, but isn't this a source of
possible HUGE memory leaks?


The original decorator that used one dictionary for possibly caching 
multiple functions could certainly cause memory problems. It would not 
disappear until the decorator and all wrapped functions were gone.



I mean, when is the object _cache freed from the memory?
From my understanding until the function name is in the scope that
object will never be freed, is that correct?


You mean 'while the function name...'. And yes, correct, unless one used 
the trick I posted in response to Stefan:

 fib_iter.__kwdefaults__['_cache'] = [0,1]
The point of this might be to free a cache swollen by a one-time call 
with a large n, without deleting the function itself. One could then 
continue with calls using normal-sized ints.


Caching is a time-space tradeoff that might need tuning to a particular 
application.


--
Terry Jan Reedy



Re: why memoizing is faster

2011-03-26 Thread Terry Reedy

On 3/26/2011 11:49 AM, Steven D'Aprano wrote:

On Sat, 26 Mar 2011 07:14:17 -0700, eryksun () wrote:



Yikes! I know this thread is about caching the output of a function, but
in the example of Fibonacci numbers, we're blessed with an easily
derived closed form expression (via Z transform, etc):

 def fib(n):
     phi = (5 ** 0.5 + 1) / 2
     f = (phi ** n - (1 - phi) ** n) / 5 ** 0.5


In the real number system, the above is the exact integer.

  return int(f)

In the real number system, this would be a pure type cast, with no 
truncation. In the float system, where, in general, answers could be a 
bit too small as well as too big, rounding might be better. If one 
truncates real numbers, there is no need (except for fib(0)) to subtract 
(1-phi)**n, as its magnitude is always less than 1. However...



Have you tried it?


I have been thinking about it. Thanks for doing it ;-).


Unfortunately, with floating point rounding error, that rapidly becomes
inaccurate. It gives:


fib(72)

498454011879265

compared to the correct value of 498454011879264 (an error of 1) and
rapidly gets worse.


So it appears that float_phi > real_phi, so that the powers get 
increasingly too large with n. Even without this problem, rounding would 
happen anyway as soon as fib(n) reached 18 digits (for standard 8-byte floats).


If one uses fib(n) in a floating point calculation, the error might not 
matter, but if one is verifying any of the existing exact equality 
theorems or searching for a new one, exact values are necessary.


--
Terry Jan Reedy



Re: why memoizing is faster

2011-03-26 Thread Steven D'Aprano
On Sat, 26 Mar 2011 10:25:31 -0700, eryksun () wrote about Fibonacci 
function:

 That's true. If you need more than double precision, obviously it's time
 to defer to the experts. Here's an O(n) algorithm:
 
 http://www.gnu.org/software/gmp/manual/html_node/Fibonacci-Numbers-Algorithm.html


We should always prefer a well-tested, well-profiled and tuned algorithm 
over one we just made up ourselves :)

The GMP algorithm is tuned to use a small cache of (I estimate) a few 
hundred KB, and then uses bit-twiddling to generate higher numbers. That 
would be unacceptably slow in pure Python, but should be acceptable in C.

But again, notice that they are trading off time for space: they care 
more about keeping the memory footprint small than about being fast (a 
luxury you can have when programming in C, where even slow things tend to 
be fast).

[...]
 I was just shocked to see 200 MB of memory used up like chump change,

But it is chump change :)

Seriously, your point is a very good one. If writing a function that uses 
a cache, the right thing to do is to have some sort of strategy for 
dealing with the cache and preventing it from becoming unbounded in size. 
The re module has a very simple-minded strategy: when the number of items 
in the cache hits 100, delete the lot. There are far more sophisticated 
strategies possible, and also an even more simple one: document the cache 
as part of your user interface, and leave it to the caller to manage it.

But having said that, don't be scared of throwing memory at a problem to 
make it fast, especially in a language like Python.
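A sketch of that simple-minded strategy (memoize_capped is a hypothetical helper, not the re module's actual code; the threshold of 100 mirrors the description above):

```python
def memoize_capped(f, maxsize=100):
    """Memoize f, but dump the whole cache once it reaches maxsize."""
    cache = {}
    def g(*args):
        if len(cache) >= maxsize:
            cache.clear()          # delete the lot, re-module style
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]
    g.cache = cache                # exposed so the caller can manage it
    return g

@memoize_capped
def square(n):
    return n * n

for i in range(250):               # the cache gets flushed twice on the way
    square(i)
print(len(square.cache))           # 50: bounded, never past the cap
```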


-- 
Steven


Re: why memoizing is faster

2011-03-25 Thread Andrea Crotti
Terry Reedy tjre...@udel.edu writes:

 For the reason Stefan explained and hinted above. Try the following instead:

 def fib_iter(n, _cache = [1,1]):
     k = len(_cache)
     if n >= k:
         for i in range(k, n+1):
             _cache.append(_cache[i-2] + _cache[i-1])
     return _cache[n]

 This should be slightly faster than the crazy-key cache for the
 memoized recursive version.


You're right this is much faster, also of the recursive function, so it
was just my iterative version that was bad...

But I don't see what's the difference, I don't get what's the difference
between that and:

def fib_iter(n):
    ls = [0, 1]
    for i in range(2, n+1):
        ls.append(ls[i-2] + ls[i-1])

    return ls[n]


Your version runs in 250 ns, while this version runs in 25.1 us.
How can passing a default _cache argument can make such a difference?

What I find interesting in all this is that the memoize function works
also because python has the mutable types problem as argument of
functions, which every beginner stumbles upon at some point.

If it had a more expected behaviour of instantiating a new cache
dictionary every time this trick would not be possible.


Re: why memoizing is faster

2011-03-25 Thread Stefan Behnel

Andrea Crotti, 25.03.2011 09:49:

Terry Reedy writes:


For the reason Stefan explained and hinted above. Try the following instead:

def fib_iter(n, _cache = [1,1]):
    k = len(_cache)
    if n >= k:
        for i in range(k, n+1):
            _cache.append(_cache[i-2] + _cache[i-1])
    return _cache[n]

This should be slightly faster than the crazy-key cache for the
memoized recursive version.



You're right this is much faster, also of the recursive function, so it
was just my iterative version that was bad...

But I don't see what's the difference, I don't get what's the difference
between that and:

def fib_iter(n):
    ls = [0, 1]
    for i in range(2, n+1):
        ls.append(ls[i-2] + ls[i-1])

    return ls[n]


Your version runs in 250 ns, while this version runs in 25.1 us.
How can passing a default _cache argument can make such a difference?


Terry's version is playing with the fact that default arguments are only 
instantiated once, i.e. (unless overridden by passing an explicit argument) 
the _cache is shared over all calls to the function. This is similar to 
what your memoization decorator does, but without the additional dict. And, 
it also suffers from the same timing problem that I mentioned before.


Stefan



Re: why memoizing is faster

2011-03-25 Thread Tim Wintle
On Fri, 2011-03-25 at 09:49 +0100, Andrea Crotti wrote:
 Terry Reedy tjre...@udel.edu writes:
 
  For the reason Stefan explained and hinted above. Try the following instead:
 
  def fib_iter(n, _cache = [1,1]):
      k = len(_cache)
      if n >= k:
          for i in range(k, n+1):
              _cache.append(_cache[i-2] + _cache[i-1])
      return _cache[n]
 
 I don't get what's the difference between that and:
 
 def fib_iter(n):
     ls = [0, 1]
     for i in range(2, n+1):
         ls.append(ls[i-2] + ls[i-1])

     return ls[n]

 How can passing a default _cache argument can make such a difference?

Default values are initialised at definition time.

Since lists are mutable, the _cache variable really is a cache - the
following time you call the function it will store all the previous
calculations.

e.g.

>>> def default_value(_cache=[]):
...     _cache.append(len(_cache))
...     print _cache
...
>>> default_value()
[0]
>>> default_value()
[0, 1]
>>> default_value()
[0, 1, 2]
>>> default_value()
[0, 1, 2, 3]





Re: why memoizing is faster

2011-03-25 Thread eryksun ()
On Thursday, March 24, 2011 8:12:22 PM UTC-4, Terry Reedy wrote:

 If Python did what some have asked, which is to make 'recursive' 
 functions actually recursive by replacing a local variable that matches 
 the function name (in this 'fib') with a direct reference to the 
 function itself, as a constant (and this could be done), then the 
 wrapper would not get called from within the function.

Regarding this I have question about function attributes. Is there an 
equivalent of 'self' for functions? I've seen people use function attributes as 
local static variables, but this seems problematic when all you have is a 
global reference to the function. The original reference might get deleted. 
Then when your function tries to reference its 'static' variable you'll get a 
NameError exception. For example:

In [1]: def test1(n):
   ...: test1.a = n

In [2]: test1(10); test1.a
Out[2]: 10

In [3]: test2 = test1; test2.a
Out[3]: 10

In [4]: del test1

In [5]: test2(20); test2.a
---
NameError


Should such a function always take a reference to itself as the first 
parameter?  

In [6]: def test1(f, n):
   ...: f.a = n

In [7]: test1(test1, 10); test1.a
Out[7]: 10

In [8]: test2 = test1; test2.a
Out[8]: 10

In [9]: del test1

In [10]: test2(test2, 20); test2.a
Out[10]: 20

Granted, this would be better handled in a class.


Re: why memoizing is faster

2011-03-25 Thread Andrea Crotti
eryksun () eryk...@gmail.com writes:


 Regarding this I have question about function attributes. Is there an
 equivalent of 'self' for functions? I've seen people use function
 attributes as local static variables, but this seems problematic when
 all you have is a global reference to the function. The original
 reference might get deleted. Then when your function tries to
 reference its 'static' variable you'll get a NameError exception. For
 example:

 In [1]: def test1(n):
...: test1.a = n

 In [2]: test1(10); test1.a
 Out[2]: 10

 In [3]: test2 = test1; test2.a
 Out[3]: 10


[...]

I had never seen such a thing, but why would this make sense at all?
When does it make sense logically for a function to have an attribute?

A function should have some input, some local variables to compute
temporary results and a return value, I think nothing more...


Re: why memoizing is faster

2011-03-25 Thread eryksun ()
On Friday, March 25, 2011 6:53:48 AM UTC-4, Andrea Crotti wrote:

 I had never seen such a thing, but why would this make sense at all?
 When does it make sense logically for a function to have an attribute?
 
 A function should have some input, some local variables to compute
 temporary results and a return value, I think nothing more...

See PEP 232: http://www.python.org/dev/peps/pep-0232

Also, see the discussion thread on Python-Dev:

http://mail.python.org/pipermail/python-dev/2000-April/thread.html#3282

I don't think self-referenced static variables (as in C's static declaration) 
are discussed as a use case. Nonetheless, I've seen it done. For example:

http://stackoverflow.com/questions/338101/python-function-attributes-uses-and-abuses/338420#338420


Re: why memoizing is faster

2011-03-25 Thread Andrea Crotti
eryksun () eryk...@gmail.com writes:
 See Pep 232: http://www.python.org/dev/peps/pep-0232

 Also, see the discussion thread on Python-Dev:

 http://mail.python.org/pipermail/python-dev/2000-April/thread.html#3282

 I don't think self-referenced static variables (as in C's static
 declaration) are discussed as a use case. Nonetheless, I've seen it
 done. For example:

 http://stackoverflow.com/questions/338101/python-function-attributes-uses-and-abuses/338420#338420

Ah ok I see the PEP, but actually it's a bit strange that in the PEP
they don't include any effective possible use of it, isn't it?

They normally always have some examples of why and what you can do with
it.
And by the way the first link is broken in the pep
http://www.neustar.biz/python/workshops/1998-11/proceedings/papers/aycock-little/aycock-little.html

Where should that be notified?
I was wondering some time ago, but big knowledge bases like this should
not have some kind of check_validity_of_links function, which checks
if you really get something from the pages that you link to?


Re: why memoizing is faster

2011-03-25 Thread Terry Reedy

On 3/25/2011 4:49 AM, Andrea Crotti wrote:

Terry Reedy tjre...@udel.edu writes:

def fib_iter(n, _cache = [1,1]):
    k = len(_cache)
    if n >= k:
        for i in range(k, n+1):
            _cache.append(_cache[i-2] + _cache[i-1])
    return _cache[n]


I just realized that the signature really ought to be
def fib_iter(n, *, _cache = [1,1]):
so that the _cache parameter is keyword only (relatively new feature), 
so that it cannot be overwritten by accident but only by intentional 
resetting of what by its name is obviously intended to be a private, do 
not touch value. This pretty much eliminates one objection people have 
had to this sort of thing.


Actually, it should be
  def fib_iter(n, *, _cache = [0,1]):
to match the standard definition that fib(0)=0, etc.
https://secure.wikimedia.org/wikipedia/en/wiki/Fibonacci_number
I just happened to copy your initialization.
People who omit 0 from the series also usually leave fib(0) undefined.


If it had a more expected behaviour of instantiating a new cache
dictionary every time this trick would not be possible.


Next time someone claims that mutable defaults are not useful, I will 
try to remember to show this example again.


--
Terry Jan Reedy



Re: why memoizing is faster

2011-03-25 Thread Terry Reedy

On 3/25/2011 5:16 AM, Stefan Behnel wrote:


Terry's version is playing with the fact that default arguments are only
instantiated once, i.e. (unless overridden by passing an explicit
argument) the _cache is shared over all calls to the function. This is
similar to what your memoization decorator does, but without the
additional dict. And, it also suffers from the same timing problem
that I mentioned before.


It depends on what one wants to measure. If one is memoizing because one 
expects to call the function 1000s of times in some program, then the 
fast lookup time of repeated calls represents the actual 
post-initialization burden on that program. If one wants to measure the 
initialization time, then the cache can easily be restarted with each 
call (once one knows how): fib_iter(100, _cache=[0,1])


--
Terry Jan Reedy



Re: why memoizing is faster

2011-03-25 Thread Stefan Behnel

Terry Reedy, 25.03.2011 21:18:

On 3/25/2011 5:16 AM, Stefan Behnel wrote:


Terry's version is playing with the fact that default arguments are only
instantiated once, i.e. (unless overridden by passing an explicit
argument) the _cache is shared over all calls to the function. This is
similar to what your memoization decorator does, but without the
additional dict. And, it also suffers from the same timing problem
that I mentioned before.


It depends on what one wants to measure. If one is memoizing because one
expects to call the function 1000s of times in some program, then the fast
lookup time of repeated calls represents the actual post-initialization
burden on that program. If one wants to measure the initialization time,
then the cache can easily be restarted with each call (once one knows how):
fib_iter(100,_cache=[0,1])


Not restarted in the sense that it gets cleaned up, though. The above 
simply passes an explicit value for it that will be used for the single 
call. Future calls won't be affected.


Stefan



why memoizing is faster

2011-03-24 Thread Andrea Crotti
I was showing a nice memoize decorator to a friend using the classic
fibonacci problem.

--8---cut here---start-8---
  def memoize(f, cache={}):
      def g(*args, **kwargs):
          # first must create a key to store the arguments called
          # it's formed by the function pointer, *args and **kwargs
          key = ( f, tuple(args), frozenset(kwargs.items()) )
          # if the result is not there compute it, store and return it
          if key not in cache:
              cache[key] = f(*args, **kwargs)

          return cache[key]

      return g

  # without the caching already for 100 it would be unfeasible
  @memoize
  def fib(n):
      if n <= 1:
          return 1
      else:
          return fib(n-1) + fib(n-2)
  
--8---cut here---end---8---


It works very well, but he said that the iterative version would be
anyway faster.

But I tried this
  
--8---cut here---start-8---
  def fib_iter(n):
      ls = {0: 1, 1:1}
      for i in range(2, n+1):
          ls[i] = ls[i-1] + ls[i-2]

      return ls[max(ls)]
--8---cut here---end---8---

and what happens is surprising, this version is 20 times slower than the
recursive memoized version.

I first had a list of the two last elements and I thought that maybe the
swaps were expensive, now with a dictionary it should be very fast.

Am I missing something?
Thanks, 
Andrea


Re: why memoizing is faster

2011-03-24 Thread Stefan Behnel

Andrea Crotti, 24.03.2011 14:48:

I was showing a nice memoize decorator to a friend using the classic
fibonacci problem.

--8---cut here---start-8---
   def memoize(f, cache={}):
       def g(*args, **kwargs):
           # first must create a key to store the arguments called
           # it's formed by the function pointer, *args and **kwargs
           key = ( f, tuple(args), frozenset(kwargs.items()) )
           # if the result is not there compute it, store and return it
           if key not in cache:
               cache[key] = f(*args, **kwargs)

           return cache[key]

       return g

   # without the caching already for 100 it would be unfeasible
   @memoize
   def fib(n):
       if n <= 1:
           return 1
       else:
           return fib(n-1) + fib(n-2)

--8---cut here---end---8---


It works very well, but he said that the iterative version would be
anyway faster.

But I tried this

--8---cut here---start-8---
   def fib_iter(n):
       ls = {0: 1, 1:1}
       for i in range(2, n+1):
           ls[i] = ls[i-1] + ls[i-2]

       return ls[max(ls)]
--8---cut here---end---8---

and what happens is surprising, this version is 20 times slower than the
recursive memoized version.


What have you used for timing? timeit? Note that the memoized version 
saves the results globally, so even the end result is cached, and the next 
call takes the result from there, instead of actually doing anything. The 
timeit module repeatedly calls the code and only gives you the best runs, 
i.e. those where the result was already cached.
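One way to see the effect Stefan describes is to time the warm path (shared default cache) against a cold path that hands the function a fresh cache on every call, using Terry's version of fib_iter from later in the thread:

```python
import timeit

def fib_iter(n, _cache=[1, 1]):
    k = len(_cache)
    if n >= k:
        for i in range(k, n + 1):
            _cache.append(_cache[i - 2] + _cache[i - 1])
    return _cache[n]

# Warm: after the first call the shared default is filled, so each
# repeat is just a length check plus an index.
warm = timeit.timeit(lambda: fib_iter(100), number=10000)

# Cold: a fresh list is passed each time, so every call recomputes.
cold = timeit.timeit(lambda: fib_iter(100, [1, 1]), number=10000)

print(warm < cold)   # True: the warm loop is mostly measuring cache hits
```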


Stefan



Re: why memoizing is faster

2011-03-24 Thread Terry Reedy

On 3/24/2011 9:48 AM, Andrea Crotti wrote:


   def fib_iter(n):
       ls = {0: 1, 1:1}


Storing a linear array in a dict is a bit bizarre


   for i in range(2, n+1):
       ls[i] = ls[i-1] + ls[i-2]

   return ls[max(ls)]


So is using max(keys) to find the highest index, which you already know 
(it is n). Actually, fib_iter(0) will return fib_iter(1), and in general 
that would be wrong. Above should just be ls[n].


Third, why toss away the array only to recalculate with each call?
Memoizing *is* faster ;-). So do it for the iterative version too.


and what happens is surprising, this version is 20 times slower then the
recursive memoized version.


For the reason Stefan explained and hinted above. Try the following instead:

def fib_iter(n, _cache = [1,1]):
    k = len(_cache)
    if n >= k:
        for i in range(k, n+1):
            _cache.append(_cache[i-2] + _cache[i-1])
    return _cache[n]

This should be slightly faster than the crazy-key cache for the memoized 
recursive version.


--
Terry Jan Reedy



Re: why memoizing is faster

2011-03-24 Thread Terry Reedy

On 3/24/2011 9:48 AM, Andrea Crotti wrote:

I was showing a nice memoize decorator to a friend using the classic
fibonacci problem.

--8---cut here---start-8---
   def memoize(f, cache={}):
       def g(*args, **kwargs):
           # first must create a key to store the arguments called
           # it's formed by the function pointer, *args and **kwargs
           key = ( f, tuple(args), frozenset(kwargs.items()) )
           # if the result is not there compute it, store and return it
           if key not in cache:
               cache[key] = f(*args, **kwargs)

           return cache[key]

       return g

   # without the caching already for 100 it would be unfeasible
   @memoize
   def fib(n):
       if n <= 1:
           return 1
       else:
           return fib(n-1) + fib(n-2)


The irony of this is that memoizing 'recursive' functions with a 
decorator depends on the fact the Python does not have truly recursive 
functions. A function cannot call itself directly. It can only call 
whatever callable is bound externally to its definition name. Hence, 
memoize rebinds the definition name (in this case 'fib') to a wrapper, 
so that the function once known as 'fib' no longer calls itself but 
calls the wrapper instead.


If Python did what some have asked, which is to make 'recursive' 
functions actually recursive by replacing a local variable that matches 
the function name (in this case 'fib') with a direct reference to the 
function itself, as a constant (and this could be done), then the 
wrapper would not get called from within the function.


--
Terry Jan Reedy



Re: why memoizing is faster

2011-03-24 Thread Fons Adriaensen
On Thu, Mar 24, 2011 at 08:12:22PM -0400, Terry Reedy wrote:

 The irony of this is that memoizing 'recursive' functions with a  
 decorator depends on the fact the Python does not have truly recursive  
 functions. A function cannot call itself directly.

I wonder what exactly is meant by that last sentence:

* It doesn't happen (since the function name is evaluated
  to find the function to be called, as you explained).

or

* Even if a variable pointing directly to the function
  were provided (as a global or function argument),
  calling it is not supposed to work (for some reason).

??

Ciao,

-- 
FA



Re: why memoizing is faster

2011-03-24 Thread Terry Reedy

On 3/24/2011 8:26 PM, Fons Adriaensen wrote:

On Thu, Mar 24, 2011 at 08:12:22PM -0400, Terry Reedy wrote:


The irony of this is that memoizing 'recursive' functions with a
decorator depends on the fact the Python does not have truly recursive
functions. A function cannot call itself directly.


I wonder what exactly is meant by that last sentence:

* It doesn't happen (since the function name is evaluated
   to find the function to be called, as you explained).


In particular, the function name is resolved to an object when the 
function is called and executed, rather than when the function is 
compiled. To be more exact, I should have said, "A user-written, 
Python-coded function cannot call itself." This is assuming that we 
exclude implementation-dependent hackery that subverts Python semantics 
by exploiting implementation-specific information that is not part of 
the language or its specification.


Consider

def f(): return f()

This looks bad. Infinite recursion. Bad programmer...
Now continue with

f_orig=f
def f(): return 1
print(f_orig())
# 1

In context, f_orig is not recursive at all, in spite of its appearance.

Now consider this proposal: Introduce a keyword or special name, such as 
'__func__', that means 'this function', to be resolved as such just 
after compilation. With that, 'def f(): return __func__()' would be 
directly recursive, and any wrapper would be ignored by the __func__() call.


Wrappers are not needed to memoize with recursion either:

def fib_recur(n, _cache = [1,1]):
    if n >= len(_cache):
        _cache.append(fib_recur(n-2) + fib_recur(n-1))
    return _cache[n]

Perhaps prettier than fib_iter, posted before, but subject to external 
rebinding and stack overflow.


Both fib_recur and fib_iter calculate fib(100) as 573147844013817084101 
and fib(1000) as 
7033036771142281582183525487718354977018126983635873274260490508715453711819693357974224949456261173348775044924176599108818636326545022364710601205337412127386733998139373125598767690091902245245323403501

in a small fraction of a second.

--
Terry Jan Reedy
