Re: Throw the cat among the pigeons
On Thu, May 7, 2015 at 7:14 PM, Alain Ketterlin wrote:
> Dave Angel writes:
>> On 05/06/2015 11:36 AM, Alain Ketterlin wrote:
>>> Yes, plus the time for memory allocation. Since the code uses "r *=
>>> ...", space is reallocated when the result doesn't fit. The new size
>>> is probably proportional to the current (insufficient) size. This
>>> means that overall, you'll need fewer reallocations, because
>>> allocations are made in bigger chunks.
>>
>> That sounds plausible, but a=5; a*=4 does not update in place. It
>> calculates and creates a new object. Updating lists can work as you
>> say, but an int is immutable.
>
> Ah, okay. Even for big ints? If that is the case, my suggestion
> doesn't explain anything. Anyway, with so many allocations for so
> little arithmetic, the difference is probably due to the behavior of
> the allocator (which maybe always finds blocks big enough, since one
> was released after the previous multiplication, or something like
> that). The only way to know would be to profile the VM.

Yes, all integers are immutable. This is true regardless of the size of
the integer, because:

x = some_big_long_calculation()
y = x
y += 1

should never change the value of x.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
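ChrisA's point can be checked directly at the interpreter. A minimal sketch (the names are illustrative, not from anyone's posted code):

```python
# Ints, including big ints, are immutable: augmented assignment
# rebinds the name to a brand-new object rather than updating in place.
x = 10 ** 100          # a big int, well past the machine word size
y = x                  # both names refer to the same object
before = id(y)
y += 1                 # creates a new int object and rebinds y
assert x == 10 ** 100  # x is untouched
assert y == x + 1
assert id(y) != before # y now refers to a different object than x does
```

Since x still holds a reference to the original object, the new object created by `y += 1` is guaranteed to have a different identity.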
Re: Throw the cat among the pigeons
Dave Angel writes:
> On 05/06/2015 11:36 AM, Alain Ketterlin wrote:
>> Yes, plus the time for memory allocation. Since the code uses "r *=
>> ...", space is reallocated when the result doesn't fit. The new size
>> is probably proportional to the current (insufficient) size. This
>> means that overall, you'll need fewer reallocations, because
>> allocations are made in bigger chunks.
>
> That sounds plausible, but a=5; a*=4 does not update in place. It
> calculates and creates a new object. Updating lists can work as you
> say, but an int is immutable.

Ah, okay. Even for big ints? If that is the case, my suggestion doesn't
explain anything. Anyway, with so many allocations for so little
arithmetic, the difference is probably due to the behavior of the
allocator (which maybe always finds blocks big enough, since one was
released after the previous multiplication, or something like that).
The only way to know would be to profile the VM.

> It's an optimization that might be applied if the code generator were
> a lot smarter, (and if the ref count is exactly 1), but it would then
> be confusing to anyone who used id().

"Abandon all hope, ye [optimizer] who enter here."

Thanks for the clarification.

-- Alain.
Re: Throw the cat among the pigeons
On Wed, 06 May 2015 09:12:05 -0400, Dave Angel wrote:
> Remember the days when you knew how many cycles each assembly
> instruction took, and could simply add them up to compare algorithms?

I do! I do! :-)

And then the MC68020 came out, and instruction execution overlapped in
weird (but predictable) ways, and interrupts would have different
(unpredictable) latency depending on how much internal state of the CPU
had to be saved and restored.

Man, am I *old*.

Dan
Re: Throw the cat among the pigeons
On 06/05/2015 17:17, Ian Kelly wrote:
> On Wed, May 6, 2015 at 1:08 AM, Steven D'Aprano wrote:
>> Besides, "typical load" is a myth -- there is no such thing. A
>> high-end Windows web server getting ten thousand hits a minute, a
>> virtual machine starved for RAM, a Mac laptop, a Linux server idling
>> away with a load of 0.1 all day... any of those machines could run
>> your code. How can you *possibly* say what is typical? The very idea
>> is absurd.
>
> Agreed.

I must disagree with this, on the grounds that a "typical load" of old
cobblers is frequently spouted on this list. Practicality beats purity
any day of the week. But no, this '=' is misused, it should have been
':='. Who really cares? If you don't like the language, pick another
one; there are thousands of them to choose from.

Ah, another chance for me to plug the virtues of CORAL 66 and CORAL 250.
Capability violation, anybody?

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
Re: Throw the cat among the pigeons
On 05/06/2015 11:36 AM, Alain Ketterlin wrote:
> Yes, plus the time for memory allocation. Since the code uses "r *=
> ...", space is reallocated when the result doesn't fit. The new size
> is probably proportional to the current (insufficient) size. This
> means that overall, you'll need fewer reallocations, because
> allocations are made in bigger chunks.

That sounds plausible, but a=5; a*=4 does not update in place. It
calculates and creates a new object. Updating lists can work as you
say, but an int is immutable.

It's an optimization that might be applied if the code generator were a
lot smarter (and if the ref count is exactly 1), but it would then be
confusing to anyone who used id().

--
DaveA
Re: Throw the cat among the pigeons
Ian Kelly writes:
> That was my initial thought as well, but the problem is that this
> actually predicts the *opposite* of what is being reported: upward
> should be less expensive, not more.

Wait, what? Hmm, you're right. Needed coffee, will think about it more
later.
Re: Throw the cat among the pigeons
On Wed, May 6, 2015 at 1:08 AM, Steven D'Aprano wrote:
> On Wednesday 06 May 2015 15:58, Ian Kelly wrote:
>> On Tue, May 5, 2015 at 7:27 PM, Steven D'Aprano wrote:
>>> Only the minimum is statistically useful.
>>
>> I disagree. The minimum tells you how fast the code *can* run, under
>> optimal circumstances. The mean tells you how fast it *realistically*
>> runs, under typical load. Both can be useful to measure.
>
> Er, not even close. Running code using timeit is in no way the same as
> running code "for real" under realistic circumstances. The fact that
> you are running the function or code snippet in isolation, in its own
> scope, via exec, rather than as part of some larger script or
> application, should be a hint. timeit itself has overhead, so you
> cannot measure the time taken by the operation alone, you can only
> measure the time taken by the operation within the timeit environment.
> We have no reason to think that the distribution of noise under timeit
> will be even vaguely similar to the noise when running in production.

You also can't be sure that the base time taken by the operation in
your development environment will be comparable to the time taken in
production; different system architectures may produce different
results, and what is faster on your workstation may be slower on a
server.

Also, different algorithms may react to load differently. For example,
an algorithm that goes to different parts of memory frequently may
start thrashing sooner than an algorithm with better spatial locality
if the system is paging a lot.

I'll grant that just computing the means on a workstation that is not
under a controlled load is not the best way to measure this -- but a
difference in mean that is not simply proportional to the difference in
min is still potentially useful information.

> The purpose of timeit is to compare individual algorithms, in as close
> as possible to an equal footing with as little noise as possible. If
> you want to profile code used in a realistic application, use a
> profiler, not timeit. And even that doesn't tell you how fast the code
> would be alone, because the profiler adds overhead.
>
> Besides, "typical load" is a myth -- there is no such thing. A
> high-end Windows web server getting ten thousand hits a minute, a
> virtual machine starved for RAM, a Mac laptop, a Linux server idling
> away with a load of 0.1 all day... any of those machines could run
> your code. How can you *possibly* say what is typical? The very idea
> is absurd.

Agreed.
Re: Throw the cat among the pigeons
On Wed, May 6, 2015 at 9:12 AM, Paul Rubin wrote:
> Steven D'Aprano writes:
>> Multiplying upwards seems to be more expensive than multiplying
>> downwards... I can only guess that it has something to do with the
>> way multiplication is implemented, or perhaps the memory management
>> involved, or something. Who the hell knows?
>
> It seems pretty natural if multiplication uses the obvious
> quadratic-time pencil and paper algorithm. The cost of multiplying m*n
> is basically w(m)*w(n) where w(x) is the width of x in machine words.
> So for factorial where m is the counter and n is the running product,
> w(m) is always 1 while w(n) is basically log2(n!). From
>
> from math import log
> def xfac(seq):
>     cost = logfac = 0.0
>     for i in seq:
>         logfac += log(i,2)
>         cost += logfac
>     return cost
>
> def upward(n): return xfac(xrange(1,n+1))
> def downward(n): return xfac(xrange(n,1,-1))
>
> print upward(40000),downward(40000)
>
> I get: 10499542692.6 11652843833.5
>
> A lower number for upward than downward. The difference isn't as large
> as your timings, but I think it still gives some explanation of the
> effect.

That was my initial thought as well, but the problem is that this
actually predicts the *opposite* of what is being reported: upward
should be less expensive, not more.
Re: Throw the cat among the pigeons
Paul Rubin writes:
> Steven D'Aprano writes:
>> Multiplying upwards seems to be more expensive than multiplying
>> downwards... I can only guess that it has something to do with the
>> way multiplication is implemented, or perhaps the memory management
>> involved, or something. Who the hell knows?
>
> It seems pretty natural if multiplication uses the obvious
> quadratic-time pencil and paper algorithm. The cost of multiplying m*n
> is basically w(m)*w(n) where w(x) is the width of x in machine words.
> So for factorial where m is the counter and n is the running product,
> w(m) is always 1 while w(n) is basically log2(n!). From
>
> from math import log
> def xfac(seq):
>     cost = logfac = 0.0
>     for i in seq:
>         logfac += log(i,2)
>         cost += logfac
>     return cost
>
> def upward(n): return xfac(xrange(1,n+1))
> def downward(n): return xfac(xrange(n,1,-1))
>
> print upward(40000),downward(40000)
>
> I get: 10499542692.6 11652843833.5
>
> A lower number for upward than downward. The difference isn't as large
> as your timings, but I think it still gives some explanation of the
> effect.

Yes, plus the time for memory allocation. Since the code uses "r *=
...", space is reallocated when the result doesn't fit. The new size is
probably proportional to the current (insufficient) size. This means
that overall, you'll need fewer reallocations, because allocations are
made in bigger chunks.

-- Alain.
Re: Throw the cat among the pigeons
Steven D'Aprano writes:
> Multiplying upwards seems to be more expensive than multiplying
> downwards... I can only guess that it has something to do with the way
> multiplication is implemented, or perhaps the memory management
> involved, or something. Who the hell knows?

It seems pretty natural if multiplication uses the obvious
quadratic-time pencil and paper algorithm. The cost of multiplying m*n
is basically w(m)*w(n) where w(x) is the width of x in machine words.
So for factorial where m is the counter and n is the running product,
w(m) is always 1 while w(n) is basically log2(n!). From

from math import log
def xfac(seq):
    cost = logfac = 0.0
    for i in seq:
        logfac += log(i,2)
        cost += logfac
    return cost

def upward(n): return xfac(xrange(1,n+1))
def downward(n): return xfac(xrange(n,1,-1))

print upward(40000),downward(40000)

I get: 10499542692.6 11652843833.5

A lower number for upward than downward. The difference isn't as large
as your timings, but I think it still gives some explanation of the
effect.
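Paul's snippet is Python 2 (xrange, print statement). A sketch of the same cost model ported to Python 3, for anyone following along on 3.x (the input value here follows the example above):

```python
from math import log

def xfac(seq):
    # Estimated cost of computing a factorial by multiplying the values
    # in seq, in units of word-by-word multiplies: at each step a
    # one-word counter is multiplied by a running product whose width in
    # bits is the running sum of log2 of the factors so far.
    cost = logfac = 0.0
    for i in seq:
        logfac += log(i, 2)   # width of the running product grows
        cost += logfac        # cost of this multiply ~ product width
    return cost

def upward(n):
    return xfac(range(1, n + 1))

def downward(n):
    return xfac(range(n, 1, -1))

print(upward(40000), downward(40000))
```

Counting down front-loads the large factors, so the partial products get wide sooner and the summed cost comes out higher, which is why this model predicts upward should be the cheaper direction.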
Re: Throw the cat among the pigeons
On 05/06/2015 09:55 AM, Chris Angelico wrote:
> On Wed, May 6, 2015 at 11:12 PM, Dave Angel wrote:
>> I had guessed that the order of multiplication would make a big
>> difference, once the product started getting bigger than the machine
>> word size.
>>
>> Reason I thought that is that if you multiply starting at the top
>> value (and end with multiplying by 2) you're spending more of the
>> time multiplying big-ints.
>>
>> That's why I made sure that both Cecil's and my implementations were
>> counting up, so that wouldn't be a distinction.
>>
>> I'm still puzzled, as it seems your results imply that big-int*int is
>> faster than int*int where the product is also int.
>
> Are you using Python 2 or Python 3 for your testing? In Py3, there's
> no type difference, and barely any performance difference as you cross
> the word-size boundary. (Bigger numbers are a bit slower to work with,
> but not hugely.)

Both Cecil and I are using 3.x. I'm using 3.4 in particular.

And I know int covers both big-int and int32; that's why I called it
big-int, rather than long.

I was, however, mistaken. It's not that threshold that we're crossing
here, but another one, for MUCH larger numbers. The factorials of
100000 and of 200000 have 456473 and 973350 digits, respectively. In
binary, that would be about 190k bytes and 404k bytes, respectively.

I was seeing factorial of 200000 taking about 4.5 times as long as
factorial of 100000. All the other increments seemed fairly
proportional.

I'll bet the difference is something like the memory allocator using a
different algorithm for blocks above 256k. Or the cache logic hitting a
threshold. If it's caching, of course the threshold will differ wildly
between machine architectures. If it's the memory allocator, that could
easily vary between Python versions as well.

--
DaveA
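Dave's allocator/cache hypothesis can be probed with a quick, admittedly unscientific sketch: time a single big-int-by-word multiply at two operand widths and see whether the cost scales smoothly or jumps. The operand sizes here are arbitrary choices, not taken from anyone's posted test:

```python
import timeit

# Two big ints of very different widths: roughly 12.5 KB and 125 KB of
# limb data, straddling the kind of size where Dave suspects the
# allocator or cache behaviour changes.
small = 2 ** 100_000
big = 2 ** 1_000_000

# min-of-repeats, per the earlier discussion of why the minimum is the
# statistic to trust.
t_small = min(timeit.repeat("small * 3", globals={"small": small},
                            number=1000, repeat=5))
t_big = min(timeit.repeat("big * 3", globals={"big": big},
                          number=1000, repeat=5))
print("small:", t_small, "big:", t_big, "ratio:", t_big / t_small)
```

Multiplying by a one-word int is linear in the number of limbs, so a ratio well above the ~10x width ratio would hint at exactly the kind of threshold effect Dave is betting on.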
Re: Throw the cat among the pigeons
On Wed, May 6, 2015 at 11:12 PM, Dave Angel wrote:
> I had guessed that the order of multiplication would make a big
> difference, once the product started getting bigger than the machine
> word size.
>
> Reason I thought that is that if you multiply starting at the top
> value (and end with multiplying by 2) you're spending more of the time
> multiplying big-ints.
>
> That's why I made sure that both Cecil's and my implementations were
> counting up, so that wouldn't be a distinction.
>
> I'm still puzzled, as it seems your results imply that big-int*int is
> faster than int*int where the product is also int.

Are you using Python 2 or Python 3 for your testing? In Py3, there's no
type difference, and barely any performance difference as you cross the
word-size boundary. (Bigger numbers are a bit slower to work with, but
not hugely.)

ChrisA
Re: Throw the cat among the pigeons
On 05/06/2015 02:26 AM, Steven D'Aprano wrote:
> On Wednesday 06 May 2015 14:05, Steven D'Aprano wrote:
>
> My interpretation of this is that the difference has something to do
> with the cost of multiplications. Multiplying upwards seems to be more
> expensive than multiplying downwards, a result I never would have
> predicted, but that's what I'm seeing. I can only guess that it has
> something to do with the way multiplication is implemented, or perhaps
> the memory management involved, or something. Who the hell knows?

I had guessed that the order of multiplication would make a big
difference, once the product started getting bigger than the machine
word size.

Reason I thought that is that if you multiply starting at the top value
(and end with multiplying by 2) you're spending more of the time
multiplying big-ints.

That's why I made sure that both Cecil's and my implementations were
counting up, so that wouldn't be a distinction.

I'm still puzzled, as it seems your results imply that big-int*int is
faster than int*int where the product is also int. That could use some
more testing, though.

I still say a cutoff of about 10% is where we should draw the line in
an interpretive system. Below that, you're frequently measuring noise
and coincidence.

Remember the days when you knew how many cycles each assembly
instruction took, and could simply add them up to compare algorithms?

--
DaveA
Re: Throw the cat among the pigeons
On Wednesday 06 May 2015 15:58, Ian Kelly wrote:
> On Tue, May 5, 2015 at 7:27 PM, Steven D'Aprano wrote:
>> Only the minimum is statistically useful.
>
> I disagree. The minimum tells you how fast the code *can* run, under
> optimal circumstances. The mean tells you how fast it *realistically*
> runs, under typical load. Both can be useful to measure.

Er, not even close. Running code using timeit is in no way the same as
running code "for real" under realistic circumstances. The fact that
you are running the function or code snippet in isolation, in its own
scope, via exec, rather than as part of some larger script or
application, should be a hint. timeit itself has overhead, so you
cannot measure the time taken by the operation alone, you can only
measure the time taken by the operation within the timeit environment.
We have no reason to think that the distribution of noise under timeit
will be even vaguely similar to the noise when running in production.

The purpose of timeit is to compare individual algorithms, in as close
as possible to an equal footing with as little noise as possible. If
you want to profile code used in a realistic application, use a
profiler, not timeit. And even that doesn't tell you how fast the code
would be alone, because the profiler adds overhead.

Besides, "typical load" is a myth -- there is no such thing. A high-end
Windows web server getting ten thousand hits a minute, a virtual
machine starved for RAM, a Mac laptop, a Linux server idling away with
a load of 0.1 all day... any of those machines could run your code. How
can you *possibly* say what is typical? The very idea is absurd.

--
Steve
Re: Throw the cat among the pigeons
On Wednesday 06 May 2015 14:05, Steven D'Aprano wrote:

[...]

Here are those anomalous timing results again:

> code = "fact(50000)"
> t1 = Timer(code, setup="from __main__ import factorial_while as fact")
> t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
> t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
> t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")
> for t in (t1, t2, t3, t4):
>     print(min(t.repeat(repeat=3, number=2)))
>
> which takes about two minutes on my computer, and prints:
>
> 8.604736926034093   # while loop
> 10.786483339965343  # reduce
> 10.91099695302546   # for loop
> 10.821452282369137  # silly version of the for loop

[...]

> What is surprising is that for very large input like this, the while
> loop is significantly faster than reduce or either of the for-loops.
> I cannot explain that result.

I re-ran the results on a different computer, and got similar results:

7.364120149984956
9.26512472704053
9.141491871327162
9.16900822892785

The while loop is surprisingly faster for calculating fact(50000).

A thought came to mind: the while loop is different from the other
three versions, in that it performs the multiplications from n down to
2, rather than 2 up to n. Maybe that has something to do with it?
Introducing version five of the factorial:

def factorial_reduce2(n):
    assert n >= 0
    return reduce(mul, range(n, 1, -1), 1)

t5 = Timer(code, setup="from __main__ import factorial_reduce2 as fact")
for t in (t1, t2, t3, t4, t5):
    print(min(t.repeat(repeat=3, number=2)))

And results:

7.36792928725481    # while loop
9.271950023248792   # reduce, counting up
9.14769447594881    # for loop
9.154150342568755   # silly for loop
7.430811045691371   # reduce, counting down

My interpretation of this is that the difference has something to do
with the cost of multiplications. Multiplying upwards seems to be more
expensive than multiplying downwards, a result I never would have
predicted, but that's what I'm seeing. I can only guess that it has
something to do with the way multiplication is implemented, or perhaps
the memory management involved, or something. Who the hell knows?

Just to be sure:

def factorial_forloop3(n):
    assert n >= 0
    result = 1
    for i in range(n, 1, -1):
        result *= i
    return result

t6 = Timer(code, setup="from __main__ import factorial_forloop3 as fact")
print(min(t6.repeat(repeat=3, number=2)))

which prints 7.36256698705256.

Lesson to be learned: what you think is important may not be, and what
you think is irrelevant may be.

Historical fact: for a period of about 60 years, long after the trick
to curing and preventing scurvy was known, the British navy suffered
greatly from scurvy again (although not to the same degree as it had in
the 17th and 18th centuries). The problem was due to confusion between
*lemons* and *limes*. The navy started by using the term "lime" for
both lemons and sweet limes from the Mediterranean, as well as West
Indian limes: three different citrus fruit. To cut costs and encourage
British commerce, the navy gradually stopped buying lemons from Spain
and swapped over almost entirely to West Indian limes. Unfortunately
limes, and especially West Indian limes, have significantly less
Vitamin C than lemons, and the sailors' daily allowance was about 75%
less effective on ships that used limes than on those that used Spanish
lemons. Scurvy, which had practically been eradicated in the 1850s and
60s, returned, and was a significant factor in World War One
(especially for the French and Russian armies). This simple confusion
wasn't sorted out until the 1920s, long after Vitamin C was isolated
and named. Apparently nobody noticed, or cared, that the two different
fruit tasted different and looked different, or concluded that perhaps
they had different medicinal properties. But then, for many decades,
vinegar and dilute sulphuric acid were recommended as cures for scurvy
*solely* on the basis that they were acidic, just like the proven
cures: oranges, lemons and sauerkraut.

--
Steven
Re: Throw the cat among the pigeons
On Tue, May 5, 2015 at 7:27 PM, Steven D'Aprano wrote:
> Only the minimum is statistically useful.

I disagree. The minimum tells you how fast the code *can* run, under
optimal circumstances. The mean tells you how fast it *realistically*
runs, under typical load. Both can be useful to measure.
Re: Throw the cat among the pigeons
On Sun, 3 May 2015 12:20 am, Cecil Westerhof wrote:
[...]
> But to my surprise tail recursion could even be more efficient. I
> wrote two different versions of factorial with self implemented tail
> recursion. For bigger values both are more efficient. And I expect
> that if the tail recursion is done by the compiler instead of by hand,
> it will be a little faster still.

I decided to do some experimentation.

Firstly, when timing code, one should minimise the amount of
"scaffolding" used around the code you care about. Using a single
function like this:

def factorial_iterative(x, flag):
    if flag:
        # first implementation
    else:
        # second implementation

is a bad idea, because you are not just timing the implementations, but
also the scaffolding that selects between them. Also, you increase the
size of the function, which increases the chance of cache misses and
other processor effects. So the first step is to split the
implementations into separate functions. Here are my four
implementations; the first two are taken from your code:

def factorial_forloop1(n):
    assert n >= 0
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def factorial_forloop2(n):
    # Version with a silly extra variable.
    assert n >= 0
    result = 1
    j = 2
    for i in range(2, n + 1):
        result *= j
        j += 1
    return result

from operator import mul
try:
    reduce
except NameError:
    from functools import reduce

def factorial_reduce(n):
    assert n >= 0
    return reduce(mul, range(2, n+1), 1)

def factorial_while(n):
    assert n >= 0
    result = 1
    while n > 1:
        result *= n
        n -= 1
    return result

A quick test to verify that they return the same results:

py> factorial_while(10)
3628800
py> factorial_reduce(10)
3628800
py> factorial_forloop1(10)
3628800
py> factorial_forloop2(10)
3628800

There's no point in optimising code that does the wrong thing!

Now, let's do some timing tests. It is best to use a well-tested timing
framework rather than invent your own dodgy one, so I use the timeit
module. Read the comments in the module to see why rolling your own is
a bad idea.

I'll start with some relatively small input sizes. Remember, the inputs
may be small, but the outputs will be very large.

from timeit import Timer
code = "fact(10); fact(20); fact(30)"
t1 = Timer(code, setup="from __main__ import factorial_while as fact")
t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")

I'm in a bit of a hurry, so I may be a bit more slap-dash than normal.
Normally, I would pick a larger number of trials, and a larger number
of iterations per trial, but here I'm going to use just best of three
trials, each of 10000 iterations:

for t in (t1, t2, t3, t4):
    print(min(t.repeat(repeat=3, number=10000)))

which prints:

0.22797810286283493  # while
0.17145151272416115  # reduce
0.16230526939034462  # for-loop
0.22819281555712223  # silly for-loop

(Comments added by me.) See my earlier post for why the minimum is the
only statistic worth looking at.

These results show that:

- the version using while is significantly slower than the more
  straightforward iterative versions using either reduce or a for loop;

- adding an extra, unnecessary variable to the for-loop, and
  incrementing it by hand, carries as much overhead as using a while
  loop;

- reduce is slightly slower than a pure Python for-loop (at least
  according to this simple trial, on my computer -- your results may
  vary);

- the obvious winner is the straightforward iterative version with a
  for-loop.

Now I'm going to test it with a larger input:

py> big = factorial_forloop1(50000)
py> sys.getsizeof(big)  # number of bytes
94460
py> len(str(big))  # number of digits
213237

Recreate the timer objects, and (again, because I'm in something of a
hurry) do a best-of-three with just 2 iterations per trial.

code = "fact(50000)"
t1 = Timer(code, setup="from __main__ import factorial_while as fact")
t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")
for t in (t1, t2, t3, t4):
    print(min(t.repeat(repeat=3, number=2)))

which takes about two minutes on my computer, and prints:

8.604736926034093   # while loop
10.786483339965343  # reduce
10.91099695302546   # for loop
10.821452282369137  # silly version of the for loop

(Again, annotations are by me.)

These results are fascinating, and rather surprising. I think that they
demonstrate that, at this size of input argument, the time is dominated
by processing the BigNum int objects, not the iteration: whether we use
reduce, a straight-forward for-loop, or a for-loop with an extra
variable makes very little difference.
Re: Throw the cat among the pigeons
On Wed, May 6, 2015 at 12:57 PM, Steven D'Aprano wrote:
> On Wed, 6 May 2015 05:42 am, Cecil Westerhof wrote:
>> I would say that a variable that is filled by a range is different as
>> a normal variable. Do not ask me why. ;-)
>
> I would say that you are wrong. If I have understood you correctly,
> that cannot possibly be the case in Python, all Python variables work
> the same way[1], and none of them can remember where they came from.

My reading of the code was that one had two variables and the other had
just one. Could be the noise in the numbers came from cache locality
differences. No difference between the variables, just a peculiarity of
CPUs and how they use memory.

ChrisA
Re: Throw the cat among the pigeons
On Wed, 6 May 2015 05:42 am, Cecil Westerhof wrote:
> I would say that a variable that is filled by a range is different as
> a normal variable. Do not ask me why. ;-)

I would say that you are wrong. If I have understood you correctly,
that cannot possibly be the case in Python: all Python variables work
the same way[1], and none of them can remember where they came from.

[1] Well, technically local variables and global variables are
implemented differently (locals inside a function use a C array and
globals use a dict), but apart from a few implementation details like
that, any two variables in the same scope[2] operate identically.

[2] Anyone who raises the issue of exec() or import * inside a function
body in Python 2 will be slapped with a halibut.

--
Steven
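Steven's point is easy to demonstrate: a name bound by iterating over a range() is an ordinary int, indistinguishable from one assigned directly. A small illustrative sketch:

```python
# Bind i by iterating over a range, and j by plain assignment.
for i in range(5, 6):
    pass
j = 5

# The two names refer to equal ints of the same type; nothing about i
# records that it "came from" a range.
assert i == j
assert type(i) is type(j) is int
```

The variable carries no trace of its origin; any timing difference between loops must come from somewhere else, such as the bytecode executed or memory effects.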
Re: Throw the cat among the pigeons
On Wed, 6 May 2015 02:18 am, Cecil Westerhof wrote:
> Well, I did not write many tail recursive functions. But what
> surprised me was that for large values the ‘tail recursive’ version
> was more efficient as the iterative version. And that was with myself
> implementing the tail recursion. I expect the code to be more
> efficient when the compiler implements the tail recursion.

You cannot know that. Python is not C or Java, and your intuition as to
what will be fast and what will be slow will not be accurate in Python.
Python has its own fast paths, and slow paths, and they are not the
same as C's or Java's.

I have been using Python for over 15 years, and I would not want to
guess whether a hypothetical compiler-generated tail-recursion-elimination
function would be faster or slower than manually unrolling the
recursion into a while loop.

I am surprised that you find a manually unrolled version with a while
loop is faster than an iterative version. I assume the iterative
version uses a for-loop. In my experience, a for-loop is faster than a
while loop. But perhaps that depends on the exact version and platform.

--
Steven
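The for-loop versus while-loop question is easy to test on any given version and platform. A sketch along these lines (the function names and counts are illustrative only; which loop wins is exactly what varies):

```python
from timeit import Timer

def count_for(n):
    # The iteration is driven by the range iterator at C level.
    total = 0
    for i in range(n):
        total += 1
    return total

def count_while(n):
    # The index is managed in pure Python bytecode.
    total = 0
    i = 0
    while i < n:
        total += 1
        i += 1
    return total

t_for = min(Timer("count_for(10000)",
                  globals=globals()).repeat(repeat=3, number=100))
t_while = min(Timer("count_while(10000)",
                    globals=globals()).repeat(repeat=3, number=100))
print("for: %.4fs  while: %.4fs" % (t_for, t_while))
```

On CPython the for-loop version is usually the faster of the two, for the reason hinted in the comments, but the only way to know for your setup is to run it.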
Re: Throw the cat among the pigeons
On Wed, 6 May 2015 07:23 am, Ian Kelly wrote: > On Tue, May 5, 2015 at 3:00 PM, Dave Angel wrote: >> def loop(func, funcname, arg): >> start = time.time() >> for i in range(repeats): >> func(arg, True) >> print("{0}({1}) took {2:7.4}".format(funcname, arg, >> time.time()-start)) >> >> start = time.time() >> for i in range(repeats): >> func(arg) >> print("{0}({1}) took {2:7.4}".format(funcname, arg, >> time.time()-start)) > > Note that you're explicitly passing True in one case but leaving the > default in the other. I don't know whether that might be responsible > for the difference you're seeing. It will be responsible for *some* difference, it is certainly faster to pass one argument than two, but likely an entirely trivial amount compared to the time to iterate some large number of times. > Also, it's best to use the timeit module for timing code, e.g.: > from timeit import Timer t1 = Timer("factorial_iterative(10, False)", "from __main__ import factorial_iterative") t1.repeat(10, number=1) > [3.8517245299881324, 3.7571076710009947, 3.7780062559759244, > 3.848508063936606, 3.7627131739864126, 3.8278848479967564, > 3.776115525048226, 3.83024005102925, 3.8322679550619796, > 3.8195601429324597] min(_), sum(_) / len(_) > (3.7571076710009947, 3.8084128216956743) Only the minimum is statistically useful. The reason is that every timing measurement has an error, due to random fluctuations of the OS, CPU pipelining, caches, etc, but that error is always positive. The noise always makes the code take longer to run, not faster! So we can imagine every measurement is made up of the "true" value T, plus an error, where T = the perfectly repeatable timing that the function would take to run if we could somehow eliminate every possible source of noise and error in the system. We can't, of course, but we would like to estimate T as closely as possible. 
Without loss of generality, suppose we collect five timings:

    times = [T+ε1, T+ε2, T+ε3, T+ε4, T+ε5]

where the epsilons ε are unknown errors due to noise. We don't know the
distribution of the errors, except that no epsilon can be less than
zero. We want an estimate for T, something which comes as close as
possible to T. It should be obvious that the following relationships
hold:

    mean(times) = T + mean([ε1, ε2, ε3, ε4, ε5])
    min(times)  = T + min([ε1, ε2, ε3, ε4, ε5])

In other words, both the mean of the timings and the minimum of the
timings are equivalent to the "true" timing, plus an error. It should
also be obvious that the mean of the epsilons must be larger than the
smallest of the errors (except in the miraculous case where all the
epsilons are exactly the same). So the statistic with the minimum error
is the minimum. Any other stat, whether you use the mean, geometric
mean, harmonic mean, mode, median, or anything else, will have a larger
error and be a worse estimate for the "true" value T. All because the
errors are one-sided: they always make the timing take longer, never
less time.

--
Steven

--
https://mail.python.org/mailman/listinfo/python-list
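[This take-the-minimum recipe is exactly what `timeit.repeat` supports. A small sketch, assuming a factorial function like the ones elsewhere in the thread:]

```python
from timeit import repeat

def factorial(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

# Each element of `times` is the total for one batch of `number` calls.
# Per the argument above, min(times) is the best estimate of the true
# cost T, because OS noise, cache misses, etc. only ever add time.
times = repeat(lambda: factorial(500), repeat=5, number=50)
print("best of 5 batches: {:.6f}s".format(min(times)))
```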
Re: Throw the cat among the pigeons
On 5/5/2015 5:12 PM, Cecil Westerhof wrote:
> Op Tuesday 5 May 2015 22:46 CEST schreef Terry Reedy:
>>> Well, I did not write many tail recursive functions. But what
>>> surprised me was that for large values the ‘tail recursive’ version
>>> was more efficient as the iterative version.
>>
>> In your first thread, what you mislabelled 'tail recursive version'
>> was an iterative while loop version
>
> That is because Python has no tail recursion,

Yes it does. Please stop using 'tail recursion' to refer to the absence
of recursion or the process of removing recursion. I wrote a tail
recursive function, and I believe you did too. What Python does not have
is automatic tail call optimization, or specifically, tail recursion
*elimination*. Both possibilities, the general and specific, have been
considered and rejected for excellent reasons. I hinted at some of them
in my post. See https://en.wikipedia.org/wiki/Tail_call for 'tail
recursion' and 'tail call elimination'.

What I believe you showed is that a while loop can be faster than a more
or less equivalent for loop that produces the same result by a slightly
different method. That is not surprising. Relative timings for CPython
vary a few percent between different combinations of Python version, C
compiler and settings, operating system, and cpu and other hardware.

--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list
Re: Throw the cat among the pigeons
On 05/05/2015 05:39 PM, Ian Kelly wrote:
> On Tue, May 5, 2015 at 3:23 PM, Ian Kelly wrote:
>> On Tue, May 5, 2015 at 3:00 PM, Dave Angel wrote:
>>> def loop(func, funcname, arg):
>>>     start = time.time()
>>>     for i in range(repeats):
>>>         func(arg, True)
>>>     print("{0}({1}) took {2:7.4}".format(funcname, arg,
>>>           time.time()-start))
>>>
>>>     start = time.time()
>>>     for i in range(repeats):
>>>         func(arg)
>>>     print("{0}({1}) took {2:7.4}".format(funcname, arg,
>>>           time.time()-start))
>>
>> Note that you're explicitly passing True in one case but leaving the
>> default in the other. I don't know whether that might be responsible
>> for the difference you're seeing.
>
> I don't think that's the cause, but I do think that it has something
> to do with the way the timing is being run. When I run your loop
> function, I do observe the difference. If I reverse the order so that
> the False case is tested first, I observe the opposite. That is, the
> slower case is consistently the one that is timed *first* in the loop
> function, regardless of which case that is.

I created two functions and called them with timeit(), and the
difference is now below 3%.

And when I take your lead and double the loop() function so it runs
each test twice, I get steadily decreasing numbers. I think all of this
has been noise caused by the caching of objects, including function
objects.

I was surprised by this, as the loops are small enough I'd have figured
the function object would be fully cached the first time it was called.

--
DaveA

--
https://mail.python.org/mailman/listinfo/python-list
Re: Throw the cat among the pigeons
On Tue, May 5, 2015 at 3:23 PM, Ian Kelly wrote:
> On Tue, May 5, 2015 at 3:00 PM, Dave Angel wrote:
>> def loop(func, funcname, arg):
>>     start = time.time()
>>     for i in range(repeats):
>>         func(arg, True)
>>     print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time()-start))
>>
>>     start = time.time()
>>     for i in range(repeats):
>>         func(arg)
>>     print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time()-start))
>
> Note that you're explicitly passing True in one case but leaving the
> default in the other. I don't know whether that might be responsible
> for the difference you're seeing.

I don't think that's the cause, but I do think that it has something to
do with the way the timing is being run. When I run your loop function,
I do observe the difference. If I reverse the order so that the False
case is tested first, I observe the opposite. That is, the slower case
is consistently the one that is timed *first* in the loop function,
regardless of which case that is.

--
https://mail.python.org/mailman/listinfo/python-list
Re: Throw the cat among the pigeons
Op Tuesday 5 May 2015 22:46 CEST schreef Terry Reedy:

>> Well, I did not write many tail recursive functions. But what
>> surprised me was that for large values the ‘tail recursive’ version
>> was more efficient as the iterative version.
>
> In your first thread, what you mislabelled 'tail recursive version'
> was an iterative while loop version

That is because Python has no tail recursion, so I needed to program
the tail recursion myself. Tail recursion would do under the hood what
I did there manually.

> while the 'iterative version' was an iterative for loop version. In
> this thread, you just posted timings without code. I will not believe
> your claim until I see one file that I can run myself with an actual
> tail recursive function, as above, that beats the equivalent while or
> for loop version.

https://github.com/CecilWesterhof/PythonLibrary/blob/master/mathDecebal.py

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

--
https://mail.python.org/mailman/listinfo/python-list
Re: Throw the cat among the pigeons
On Tue, May 5, 2015 at 3:00 PM, Dave Angel wrote:
> def loop(func, funcname, arg):
>     start = time.time()
>     for i in range(repeats):
>         func(arg, True)
>     print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time()-start))
>
>     start = time.time()
>     for i in range(repeats):
>         func(arg)
>     print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time()-start))

Note that you're explicitly passing True in one case but leaving the
default in the other. I don't know whether that might be responsible
for the difference you're seeing.

Also, it's best to use the timeit module for timing code, e.g.:

>>> from timeit import Timer
>>> t1 = Timer("factorial_iterative(10, False)",
...            "from __main__ import factorial_iterative")
>>> t1.repeat(10, number=1)
[3.8517245299881324, 3.7571076710009947, 3.7780062559759244,
3.848508063936606, 3.7627131739864126, 3.8278848479967564,
3.776115525048226, 3.83024005102925, 3.8322679550619796,
3.8195601429324597]
>>> min(_), sum(_) / len(_)
(3.7571076710009947, 3.8084128216956743)
>>> t2 = Timer("factorial_iterative(10, True)",
...            "from __main__ import factorial_iterative")
>>> t2.repeat(10, number=1)
[3.8363616950809956, 3.753201302024536, 3.7838632150087506,
3.7670978900277987, 3.805312803015113, 3.7682680500438437,
3.856655619922094, 3.796431727008894, 3.8224815409630537,
3.765664782957174]
>>> min(_), sum(_) / len(_)
(3.753201302024536, 3.7955338626052253)

As you can see, in my testing the True case was actually marginally
(probably not significantly) faster in both the min and the average.

--
https://mail.python.org/mailman/listinfo/python-list
Re: Throw the cat among the pigeons
On 05/05/2015 04:30 PM, Ian Kelly wrote:
> On Tue, May 5, 2015 at 12:45 PM, Dave Angel wrote:
>> When the "simple" is True, the function takes noticeably and
>> consistently longer. For example, it might take 116 instead of 109
>> seconds. For the same counts, your code took 111.
>
> I can't replicate this. What version of Python is it, and what value
> of x are you testing with?
>
>> I've looked at dis.dis(factorial_iterative), and can see no explicit
>> reason for the difference.
>
> My first thought is that maybe it's a result of the branch. Have you
> tried swapping the branches, or reimplementing as separate functions
> and comparing?

Logic is quite simple:

def factorial_iterative(x, simple=False):
    assert x >= 0
    result = 1
    j = 2
    if not simple:
        for i in range(2, x + 1):
            #print("range value is of type", type(i), "and value", i)
            #print("ordinary value is of type", type(j), "and value", j)
            result *= i
            j += 1
    else:
        for i in range(2, x + 1):
            result *= j
            j += 1
    return result

def loop(func, funcname, arg):
    start = time.time()
    for i in range(repeats):
        func(arg, True)
    print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time()-start))

    start = time.time()
    for i in range(repeats):
        func(arg)
    print("{0}({1}) took {2:7.4}".format(funcname, arg, time.time()-start))

repeats = 1 and arg is 10**4:

    loop(factorial_iterative, "factorial_iterative ", arg)

My actual program does the same thing with other versions of the
function, including Cecil's factorial_tail_recursion, and my optimized
version of that.

Python 3.4.0 (default, Apr 11 2014, 13:05:11) [GCC 4.8.2] on linux

factorial_iterative (10) took 3.807
factorial_iterative (10) took 3.664
factorial_iterative (20) took 17.07
factorial_iterative (20) took 15.3
factorial_iterative (30) took 38.93
factorial_iterative (30) took 36.01

Note that I test them in the opposite order of where they appear in the
function. That's because I was originally using the simple flag to test
an empty loop. The empty loop is much quicker either way, so it's not
the issue. (But if it were, the for-range version is much quicker.)

I think I'll take your last suggestion and write separate functions.

--
DaveA

--
https://mail.python.org/mailman/listinfo/python-list
Re: Throw the cat among the pigeons
On 5/5/2015 12:18 PM, Cecil Westerhof wrote:
> Op Tuesday 5 May 2015 17:47 CEST schreef Paul Moore:
>> On Sunday, 3 May 2015 16:23:59 UTC+1, Cecil Westerhof wrote:
>>> By the way: I think that even if the recursion does not go further
>>> as 500, it is still a good idea to use tail recursion. Why use
>>> stack space when it is not necessary?
>>>
>>> I pushed the example to GitHub:
>>> https://github.com/CecilWesterhof/PythonLibrary/blob/master/mathDecebal.py
>>
>> You already know this, as your code shows, but tail call recursion
>> elimination is only possible when you have a *direct* tail call (one

An 'indirect tail call' would not be a tail call.

>> with the result of the tail call returned immediately to the
>> caller). Even the standard trivial factorial example doesn't have a
>> direct tail call, without rewriting to use an accumulator variable.
>> Which is a non-intuitive transformation to anyone who's not familiar
>> with recursive functional languages and their idioms.
>>
>> If you're rewriting your recursive function *anyway*, it's not that
>> much harder in many (most?) cases to rewrite it iteratively.

For count functions, the main change between tail recursion and while
iteration is replacing 'if' with 'while' and converting the tail call
to assignment. (One may have to reverse the if-else first to put the
tail call in the if branch.)

from math import factorial as fac
print(fac(0), fac(1), fac(2), fac(6))

def fac_rt(n, i=2, res=1):
    if i <= n:
        return fac_rt(n, i+1, res*i)
    else:
        return res

def fac_iw(n):
    i = 2
    res = 1
    while i <= n:
        i, res = i+1, res*i
    return res

for i in (0, 1, 2, 6):
    print(fac(i) == fac_rt(i) == fac_iw(i))

which prints:

1 1 2 720
True
True
True
True

For collection functions that process each item once, 'for item in
collection: ...' is nearly always easier to write in the first place.

>> An example of a function that naturally uses direct tail call
>> recursion, but which doesn't have a simple iterative rewrite, would
>> be more compelling.
Simple, easily converted functions like the above, with one recursive
call in one branch of an if-else, are the most common. Something with
multiple mutually exclusive tail calls, like the following

def f_rt1(*args):
    if nonbase1:
        return f(*revised_args1)
    elif base_condition:
        return base(args)
    else:
        return f(*revised_args2)

must be converted to if-else with all tail calls in the if branch:

def f_rt2(*args):
    if not base_condition:
        if nonbase1:
            return f(*revised_args1)
        else:
            return f(*revised_args2)
    else:
        return base(args)

Conversion would then be simple. The complication is that the
'base_condition' in the original version might not really be the base
condition, due to a dependence on nonbase1 being false. This is a
generic problem with reordering if-elif statements.

For non-linear (branching) recursion, in which multiple recursive calls
may be made for one function call, the last recursive call may be a
tail call. An example is in-place quick sort. Eliminating just the tail
call may not be simple, but it also does not eliminate the possibility
of blowing the call stack. To do that, one must eliminate all recursive
calls by adding explicit stacks.

> Well, I did not write many tail recursive functions. But what
> surprised me was that for large values the ‘tail recursive’ version
> was more efficient as the iterative version.

In your first thread, what you mislabelled 'tail recursive version' was
an iterative while loop version, while the 'iterative version' was an
iterative for loop version. In this thread, you just posted timings
without code. I will not believe your claim until I see one file that I
can run myself with an actual tail recursive function, as above, that
beats the equivalent while or for loop version.

--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list
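[The quick sort case in the last paragraph can be made concrete. In the sketch below (mine, not from the thread), the tail call on one half is replaced by looping, and the remaining recursive call on the other half is replaced by an explicit stack of (lo, hi) ranges, so no Python-level recursion is left at all:]

```python
def quicksort(a):
    """In-place quicksort with all recursion removed via an explicit stack."""
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        while lo < hi:
            # Lomuto partition around a[hi].
            pivot = a[hi]
            i = lo
            for j in range(lo, hi):
                if a[j] <= pivot:
                    a[i], a[j] = a[j], a[i]
                    i += 1
            a[i], a[hi] = a[hi], a[i]
            # "Recurse" on the left half by pushing it on the stack,
            # and handle the right half by looping (the former tail call).
            stack.append((lo, i - 1))
            lo = i + 1
    return a
```

[The stack plays exactly the role the call stack would, but its depth is bounded by the number of pending subranges rather than by the recursion limit.]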
Re: Throw the cat among the pigeons
On Tue, May 5, 2015 at 12:45 PM, Dave Angel wrote:

> When the "simple" is True, the function takes noticeably and
> consistently longer. For example, it might take 116 instead of 109
> seconds. For the same counts, your code took 111.

I can't replicate this. What version of Python is it, and what value of
x are you testing with?

> I've looked at dis.dis(factorial_iterative), and can see no explicit
> reason for the difference.

My first thought is that maybe it's a result of the branch. Have you
tried swapping the branches, or reimplementing as separate functions
and comparing?

--
https://mail.python.org/mailman/listinfo/python-list
Re: Throw the cat among the pigeons
Op Tuesday 5 May 2015 20:45 CEST schreef Dave Angel:

> On 05/05/2015 12:18 PM, Cecil Westerhof wrote:
>
>> Well, I did not write many tail recursive functions. But what
>> surprised me was that for large values the ‘tail recursive’ version
>> was more efficient as the iterative version. And that was with
>> myself implementing the tail recursion. I expect the code to be
>> more efficient when the compiler implements the tail recursion.
>
> You've said that repeatedly, so I finally took a look at your
> webpage
>
> https://github.com/CecilWesterhof/PythonLibrary/blob/master/mathDecebal.py
>
> I didn't have your framework to call the code, so I just extracted
> some functions and did some testing.

I definitely need to take care of documentation. It can be called with:

    python3 mathDecebal.py --factorial

The problem is that it will do both the correctness and the speed test.
I have to split those into two different things, and use a different
file for each. Maybe make a directory test and put a correctness_.py
and a speed_.py in it.

> I do see some differences, where the so-called tail_recursive
> functions are sometimes faster, but I did some investigating to try
> to determine why.
>
> I came up with the conclusion that sometimes the multiply operation
> takes longer than other times. And in particular, I can see more
> variation between the two following loops than between your two
> functions.
>
> def factorial_iterative(x, simple=False):
>     assert x >= 0
>     result = 1
>     j = 2
>     if not simple:
>         for i in range(2, x + 1):
>             result *= i
>             j += 1
>     else:
>         for i in range(2, x + 1):
>             result *= j
>             j += 1
>     return result
>
> When the "simple" is True, the function takes noticeably and
> consistently longer. For example, it might take 116 instead of 109
> seconds. For the same counts, your code took 111.
>
> I've looked at dis.dis(factorial_iterative), and can see no explicit
> reason for the difference.
I would say that a variable that is filled by a range behaves
differently from a normal variable. Do not ask me why. ;-)

Even if you (the general you, not the personal you) think that tail
recursion is a waste of time, this is an interesting result, I think.

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

--
https://mail.python.org/mailman/listinfo/python-list
Re: Throw the cat among the pigeons
On 05/05/2015 12:18 PM, Cecil Westerhof wrote:

> Well, I did not write many tail recursive functions. But what
> surprised me was that for large values the ‘tail recursive’ version
> was more efficient as the iterative version. And that was with myself
> implementing the tail recursion. I expect the code to be more
> efficient when the compiler implements the tail recursion.

You've said that repeatedly, so I finally took a look at your webpage

https://github.com/CecilWesterhof/PythonLibrary/blob/master/mathDecebal.py

I didn't have your framework to call the code, so I just extracted some
functions and did some testing. I do see some differences, where the
so-called tail_recursive functions are sometimes faster, but I did some
investigating to try to determine why.

I came up with the conclusion that sometimes the multiply operation
takes longer than other times. And in particular, I can see more
variation between the two following loops than between your two
functions.

def factorial_iterative(x, simple=False):
    assert x >= 0
    result = 1
    j = 2
    if not simple:
        for i in range(2, x + 1):
            result *= i
            j += 1
    else:
        for i in range(2, x + 1):
            result *= j
            j += 1
    return result

When "simple" is True, the function takes noticeably and consistently
longer. For example, it might take 116 instead of 109 seconds. For the
same counts, your code took 111.

I've looked at dis.dis(factorial_iterative), and can see no explicit
reason for the difference.

--
DaveA

--
https://mail.python.org/mailman/listinfo/python-list
Re: Throw the cat among the pigeons
Op Tuesday 5 May 2015 17:47 CEST schreef Paul Moore:

> On Sunday, 3 May 2015 16:23:59 UTC+1, Cecil Westerhof wrote:
>>> By the way: I think that even if the recursion does not go further
>>> as 500, it is still a good idea to use tail recursion. Why use
>>> stack space when it is not necessary?
>>
>> I pushed the example to GitHub:
>> https://github.com/CecilWesterhof/PythonLibrary/blob/master/mathDecebal.py
>
> You already know this, as your code shows, but tail call recursion
> elimination is only possible when you have a *direct* tail call (one
> with the result of the tail call returned immediately to the caller).
> Even the standard trivial factorial example doesn't have a direct
> tail call, without rewriting to use an accumulator variable. Which is
> a non-intuitive transformation to anyone who's not familiar with
> recursive functional languages and their idioms.
>
> If you're rewriting your recursive function *anyway*, it's not that
> much harder in many (most?) cases to rewrite it iteratively.
>
> An example of a function that naturally uses direct tail call
> recursion, but which doesn't have a simple iterative rewrite, would
> be more compelling. Not particularly compelling (to me, anyway) even
> so, but still better than factorial or fibonnaci examples.

Well, I did not write many tail recursive functions. But what surprised
me was that for large values the ‘tail recursive’ version was more
efficient than the iterative version. And that was with me implementing
the tail recursion myself. I expect the code to be more efficient when
the compiler implements the tail recursion.

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

--
https://mail.python.org/mailman/listinfo/python-list
Re: Throw the cat among the pigeons
On Sunday, 3 May 2015 16:23:59 UTC+1, Cecil Westerhof wrote:
>> By the way: I think that even if the recursion does not go further
>> as 500, it is still a good idea to use tail recursion. Why use stack
>> space when it is not necessary?
>
> I pushed the example to GitHub:
> https://github.com/CecilWesterhof/PythonLibrary/blob/master/mathDecebal.py

You already know this, as your code shows, but tail call recursion
elimination is only possible when you have a *direct* tail call (one
with the result of the tail call returned immediately to the caller).
Even the standard trivial factorial example doesn't have a direct tail
call without rewriting to use an accumulator variable, which is a
non-intuitive transformation to anyone who's not familiar with
recursive functional languages and their idioms.

If you're rewriting your recursive function *anyway*, it's not that
much harder in many (most?) cases to rewrite it iteratively.

An example of a function that naturally uses direct tail call
recursion, but which doesn't have a simple iterative rewrite, would be
more compelling. Not particularly compelling (to me, anyway) even so,
but still better than factorial or Fibonacci examples.

Paul

--
https://mail.python.org/mailman/listinfo/python-list
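[The accumulator transformation Paul describes, spelled out for the factorial case (a sketch, not code from the thread):]

```python
def fact_naive(n):
    # Not a direct tail call: the multiply happens *after*
    # the recursive call returns, so the frame must be kept alive.
    if n <= 1:
        return 1
    return n * fact_naive(n - 1)

def fact_acc(n, acc=1):
    # Direct tail call: the caller returns the recursive result
    # unchanged, so the frame could in principle be reused.
    if n <= 1:
        return acc
    return fact_acc(n - 1, acc * n)
```

[Both compute the same values; only the second has the shape a tail call eliminator could exploit. In CPython neither frame is actually reused, so both still blow the stack for large n.]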
Re: Throw the cat among the pigeons
Op Saturday 2 May 2015 16:20 CEST schreef Cecil Westerhof:

> I am throwing the cat among the pigeons. ;-)
>
> In another thread I mentioned that I liked to have tail recursion in
> Python. To be clear: not automatic, but asked for.
>
> Looking at the replies I did hit a nerve. But I still want to
> continue.
>
> Some things are better expressed recursively for the people reading
> the code. But there are two problems with that:
> - You can get out of stack space
> - It is less efficient
>
> Most of the time the first problem is the most important.
>
> When I write factorial (I know it is already written, but I use it
> as an example to show a point), the recursive variant can not be
> called with 1.000 without tail recursion. So for functions that
> could go very deep, tail recursion would be a blessing.
>
> By the way: I think that even if the recursion does not go further
> as 500, it is still a good idea to use tail recursion. Why use stack
> space when it is not necessary?

I pushed the example to GitHub:
https://github.com/CecilWesterhof/PythonLibrary/blob/master/mathDecebal.py

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

--
https://mail.python.org/mailman/listinfo/python-list
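[One way to get the opt-in behaviour Cecil asks for, without any interpreter support, is a trampoline: the function returns a zero-argument thunk instead of making the tail call, and a small driver loop unwinds the thunks in constant stack space. A sketch with names of my own choosing, not from Cecil's library:]

```python
import sys

def trampoline(func, *args):
    """Call func; while the result is a thunk, keep calling it."""
    result = func(*args)
    while callable(result):
        result = result()
    return result

def fact_step(n, acc=1):
    if n <= 1:
        return acc
    # Instead of making the tail call, hand a thunk back to the driver:
    return lambda: fact_step(n - 1, acc * n)

# Well past the default recursion limit, yet no RecursionError,
# because each step runs in its own shallow call.
big = trampoline(fact_step, sys.getrecursionlimit() * 2)
```

[The cost is one extra function object per step, so this trades speed for unbounded depth; it only works when the result type is not itself callable.]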