On Sun, 3 May 2015 12:20 am, Cecil Westerhof wrote:

[...]
> But to my surprise tail recursion could even be more efficient. I
> wrote two different versions of factorial with self implemented tail
> recursion. For bigger values both are more efficient. And I expect
> that if the tail recursion is done by the compiler instead of by hand,
> it will be a little faster still.
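[For readers following along: a hand-rolled tail-recursive factorial, with an explicit accumulator argument, looks something like the sketch below. This is my guess at the style Cecil describes, not his actual code; note that CPython does not eliminate tail calls, so the call stack still grows.]

```python
def factorial_tailrec(n, acc=1):
    # Tail-recursive factorial with an accumulator.  CPython performs no
    # tail-call optimisation, so for large n this will hit the recursion
    # limit; it is a sketch of the technique under discussion only.
    assert n >= 0
    if n <= 1:
        return acc
    # The recursive call is in tail position: nothing remains to be done
    # after it returns, which is what a TCO-capable compiler exploits.
    return factorial_tailrec(n - 1, acc * n)
```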
I decided to do some experimentation. Firstly, when timing code, one
should minimise the amount of "scaffolding" used around the code you
care about. Using a single function like this:

def factorial_iterative(x, flag):
    if flag:
        ...  # first implementation
    else:
        ...  # second implementation

is a bad idea, because you are not just timing the implementations, but
also the scaffolding that selects between them. Also, you increase the
size of the function, which increases the chance of cache misses and
other processor effects. So the first step is to split the
implementations into separate functions. Here are my four
implementations; the first two are taken from your code:

def factorial_forloop1(n):
    assert n >= 0
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def factorial_forloop2(n):
    # Version with a silly extra variable.
    assert n >= 0
    result = 1
    j = 2
    for i in range(2, n + 1):
        result *= j
        j += 1
    return result

from operator import mul
try:
    reduce
except NameError:
    from functools import reduce

def factorial_reduce(n):
    assert n >= 0
    return reduce(mul, range(2, n+1), 1)

def factorial_while(n):
    assert n >= 0
    result = 1
    while n > 1:
        result *= n
        n -= 1
    return result

A quick test to verify that they return the same results:

py> factorial_while(10)
3628800
py> factorial_reduce(10)
3628800
py> factorial_forloop1(10)
3628800
py> factorial_forloop2(10)
3628800

There's no point in optimising code that does the wrong thing!

Now, let's do some timing tests. It is best to use a well-tested timing
framework rather than invent your own dodgy one, so I use the timeit
module. Read the comments in the module to see why rolling your own is
a bad idea.

I'll start with some relatively small input sizes. Remember, the inputs
may be small, but the outputs will be very large.
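[As an aside, the spot checks above can be made systematic by comparing every implementation against math.factorial over a range of inputs. Here is a self-contained sketch of that, my addition, with the implementations repeated so it runs on its own.]

```python
import math
from functools import reduce
from operator import mul

def factorial_forloop1(n):
    assert n >= 0
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def factorial_forloop2(n):
    # Version with a silly extra variable.
    assert n >= 0
    result = 1
    j = 2
    for i in range(2, n + 1):
        result *= j
        j += 1
    return result

def factorial_reduce(n):
    assert n >= 0
    return reduce(mul, range(2, n + 1), 1)

def factorial_while(n):
    assert n >= 0
    result = 1
    while n > 1:
        result *= n
        n -= 1
    return result

# Every implementation should agree with math.factorial for 0..100.
for n in range(101):
    expected = math.factorial(n)
    assert factorial_forloop1(n) == expected
    assert factorial_forloop2(n) == expected
    assert factorial_reduce(n) == expected
    assert factorial_while(n) == expected
```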
from timeit import Timer

code = "fact(10); fact(20); fact(30)"
t1 = Timer(code, setup="from __main__ import factorial_while as fact")
t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")

I'm in a bit of a hurry, so I may be a bit more slap-dash than normal.
Normally, I would pick a larger number of trials, and a larger number
of iterations per trial, but here I'm going to use just the best of
three trials of 10000 iterations each:

for t in (t1, t2, t3, t4):
    print(min(t.repeat(repeat=3, number=10000)))

which prints:

0.22797810286283493  # while
0.17145151272416115  # reduce
0.16230526939034462  # for-loop
0.22819281555712223  # silly for-loop

(Comments added by me.) See my earlier post for why the minimum is the
only statistic worth looking at.

These results show that:

- the version using a while loop is significantly slower than the more
  straightforward iterative versions using either reduce or a for loop;

- adding an extra, unnecessary variable to the for-loop, and
  incrementing it by hand, carries as much overhead as using a while
  loop;

- reduce is slightly slower than a pure Python for-loop (at least
  according to this simple trial, on my computer -- your results may
  vary);

- the obvious winner is the straightforward iterative version with a
  for-loop.

Now I'm going to test it with a larger input:

py> big = factorial_forloop1(50000)
py> sys.getsizeof(big)  # number of bytes
94460
py> len(str(big))  # number of digits
213237

Recreate the timer objects, and (again, because I'm in something of a
hurry) do a best-of-three with just 2 iterations per trial.
code = "fact(50000)"
t1 = Timer(code, setup="from __main__ import factorial_while as fact")
t2 = Timer(code, setup="from __main__ import factorial_reduce as fact")
t3 = Timer(code, setup="from __main__ import factorial_forloop1 as fact")
t4 = Timer(code, setup="from __main__ import factorial_forloop2 as fact")

for t in (t1, t2, t3, t4):
    print(min(t.repeat(repeat=3, number=2)))

which takes about two minutes on my computer, and prints:

8.604736926034093   # while loop
10.786483339965343  # reduce
10.91099695302546   # for loop
10.821452282369137  # silly version of the for loop

(Again, annotations are by me.)

These results are fascinating, and rather surprising. I think that they
demonstrate that, at this size of input argument, the time is dominated
by processing the BigNum int objects, not the iteration: whether we use
reduce, a straightforward for-loop, or a for-loop with an extra
variable makes very little difference, of the order of 1%.

(I wouldn't read too much into the fact that the for-loop which does
*more* work, by manually adjusting a second variable, is slightly
faster. Unless that can be shown to be consistently the case, I expect
that is just random noise. I ran some quick trials, and did not
replicate that result:

py> for t in (t3, t4, t3, t4):
...     print(min(t.repeat(repeat=3, number=1)))
...
5.282072028145194  # for-loop
5.415546240285039  # silly for-loop
5.358346642926335  # for-loop
5.328046130016446  # silly for-loop

Note that, as expected, doing two iterations per trial takes about
twice as long as one iteration per trial: 5 seconds versus 10.)

What is surprising is that for very large input like this, the while
loop is significantly faster than reduce or either of the for-loops. I
cannot explain that result. (Just goes to show that timing code in
Python can surprise you when you least expect it.)
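[One way to make the "dominated by BigNum processing" claim concrete, my addition rather than anything from the post: time a multiplication involving a machine-sized int against one involving a factorial-sized int. The value n = 20000 is an arbitrary choice to keep the sketch quick; the point is only that the big-int multiply is vastly more expensive per operation.]

```python
import timeit

# Build a number of roughly the size the factorial loops produce.
# For n = 20000 the product has tens of thousands of digits, so each
# `result *= i` in the loops above is an O(number-of-digits) operation.
n = 20000
big = 1
for i in range(2, n + 1):
    big *= i            # the same running product the loops compute
small = 12345

# Compare one small-by-small multiply with one huge-by-small multiply.
t_small = min(timeit.repeat("small * 7", globals=globals(),
                            repeat=3, number=1000))
t_big = min(timeit.repeat("big * 7", globals=globals(),
                          repeat=3, number=1000))
print(t_small, t_big)   # the big-int multiply dominates
```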
Oh, and for the record, I'm using Python 3.3 on Linux:

py> sys.version
'3.3.0rc3 (default, Sep 27 2012, 18:44:58) \n[GCC 4.1.2 20080704 (Red
Hat 4.1.2-52)]'

Results may vary on other platforms and versions.

--
Steven

--
https://mail.python.org/mailman/listinfo/python-list