Hi, On Fri, 25 Feb 2022 at 16:22, BELAHCENE Abdelkader < abdelkader.belahc...@enst.dz> wrote:
> Hi, > What do you mean? > the python and C programs are not equivalent? > > The python and C programs are NOT equivalent. (surface level they are: they both calculate the triangular number of N, N times, inefficiently and slowly, but the way they do it is radically different) Let's specifically compare the num.py and the C versions. The pure python version *should* always be slower, so it's irrelevant here. (NB, I show a version below where the pure python version is quicker than both :-) ) The num.py version: * It creates an array containing the numbers 1 to n. It then call's num.py's sum function with that array n times. The C version : * Calls a function n times. That function has a tight loop using a long that sums all the numbers from 1 to n. These are *very* different operations. The former can be vectorised, and while I don't use num.py, and while I'm not a betting person I would bet a Mars bar that the num.py version will throw the array at a SIMD implementation. A cursory look at the source to numpy does indeed show that this is the case (fx: grabs a mars bar :) ), and not only that it's optimised for a wide variety of CPU architectures. https://github.com/numpy/numpy/tree/main/numpy/core/src/common/simd If you don't know what that means, take a look at this -- https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions numpy however supports multiple versions of SIMD - including things like this: https://en.wikipedia.org/wiki/AVX-512 - which is pretty neat. Upshot - numpy will /essentially/ just throw the entire array at the CPU and say "add these, give me the result". And it'll be done. (OK there's a bit more to it, but that's the principle) By contrast your naive C version has to repeatedly create stack frames, push arguments onto it, pop arguments allocate memory on the stack, etc. It's also single threaded, and has no means of telling the CPU "add all these together, and I don't care how", so it literally is one after the other. The actual work it's doing to get the same answer is simply much greater and many many more clock cycles. If you took the same approach as numpy it's *possible* you *might* get something similarly fast. (But you'd have a steep learning curve and you'd likely only optimise for one architecture...) Aside: What size is the *value* the C version is creating? Well it's just (n * n+1 ) /2 - primary school triangular number stuff. So 50K is 1250025000 - which is a 31 bit number. So the C version will fail on a 32 bit machine as soon as you try over 65536 as the argument... (On a 64bit machine admittedly the fall over number is admittedly higher... :-) ) C and C++ *are* faster than pure python. That's pretty much always going to be the case ( *except* for a specialising python compiler that compiles a subset of python to either C/C++/Rust or assembler). However, python + highly optimised C/C++/etc libraries will likely outperform naive C/C++ code - as you've ably demonstrated here. Why? Because while on the surface the two programs are vaguely similar - calculate the triangular number of N, N times. In practice, the way that they do it is so different, you get dramatically different results. As a bonus - you can get faster than both your num.py and C versions with pure python, by several orders of magnitude. You can then squeeze out a few more percent out of it. Output from a pure python version of a piece of code that performs the same operation - calculates the triangle number of N, N times: michael@conceptual:~$ ./triangles.py 15000000 time using pure python: 1.81 sec time using pure python & memoisation: 1.7 sec Note that's 15 million, not 50,000 - and it took the same time for my machine (recent core i7) as numpy did for you on your machine for just 50,000. That's not because I have a faster machine (I expect I don't). What's different? Well the pure python version is this - it recognises you were calculating the triangular number for N and just calculates that instead :-) def normaltriangle(n): return (n*(n+1))/2 ... called like this .. tm1=it.timeit(stmt=lambda: normaltriangle(count), number=count) print(f"time using pure python: {round(tm1,2)} sec") The (naive) memoisation version is this: def memoise(f): cache = {} def g(n): if n in cache: return cache[n] else: X = f(n) cache[n] = X return X return g @memoise def triangle(n): return (n*(n+1))/2 ... called like this ... tm2=it.timeit(stmt=lambda: triangle(count), number=count) print(f"time using pure python & memoisation: {round(tm2,2)} sec") Original file containing both: #!/usr/bin/python3 import timeit as it def memoise(f): cache = {} def g(n): if n in cache: return cache[n] else: X = f(n) cache[n] = X return X return g @memoise def triangle(n): return (n*(n+1))/2 def normaltriangle(n): return (n*(n+1))/2 if __name__ == "__main__": import sys count = int(sys.argv[1]) tm1=it.timeit(stmt=lambda: normaltriangle(count), number=count) print(f"time using pure python: {round(tm1,2)} sec") tm2=it.timeit(stmt=lambda: triangle(count), number=count) print(f"time using pure python & memoisation: {round(tm2,2)} sec") I guess the point here is this: focussing on the language/tool C vs python ignores how "well" the C code is written and how "well" the python code is written. However worrying about *that* rather than "is there a smarter way of doing this" means you can sometimes miss the 300 fold speed up (vs numpy) or 6000 fold speed up (vs your python). I must admit as well, I was a little surprised that the memoised version worked even this well with such simple code. But the point is when comparing C to numpy, you're really comparing *your* C with *numpy's* C code, and as you can see from their code, your repeated calls are very different from a vectorised call to CPU processor extensions designed specifically for this sort of work... Gets even sillier (in a fun way :) ) if you can throw the code at several hundred GPU cores... Hope this was vaguely interesting... Michael. On Fri, 25 Feb 2022 at 16:22, BELAHCENE Abdelkader < abdelkader.belahc...@enst.dz> wrote: > Hi, > What do you mean? > the python and C programs are not equivalent? > > > Le ven. 25 févr. 2022 à 10:42, Giorgio Zoppi <giorgio.zo...@gmail.com> a > écrit : > >> Well, >> numpy is written in C :) Maybe your C is not the numpy equivalent? >> Best Regards, >> Giorgio >> >> Il giorno ven 25 feb 2022 alle ore 09:03 BELAHCENE Abdelkader < >> abdelkader.belahc...@enst.dz> ha scritto: >> >>> Hi, >>> a lot of people think that C (or C++) is faster than python, yes I >>> agree, but I think that's not the case with numpy, I believe numpy is >>> faster than C, at least in some cases. >>> >>> >>> *Is there another explanation ?Or where can find a doc speaking about >>> the subject?*Thanks a lot >>> Regards >>> Numpy implements vectorization for arrays, or I'm wrong. Anyway here is >>> an example Let's look at the following case: >>> Here is the result on my laptop i3: >>> >>> Labs$ *python3 tempsExe.py 50000* >>> sum with Python: 1250025000 and NumPy 1250025000 >>> time used Python Sum: * 37.28 sec * >>> time used Numpy Sum: *1.85 sec* >>> >>> Labs$ *./tt 50000 * >>> >>> >>> * CPU time :7.521730 The value : 1250025000 * >>> -------------------------------------------- >>> >>> This is the Python3 program : >>> >>> import timeit as it >>> import numpy as np >>> import sys >>> try : >>> n=eval(sys.argv[1]) >>> except: >>> print ("needs integer as argument") ; exit() >>> >>> a=range(1,n+1) >>> b=np.array(a) >>> def func1(): return sum(a) >>> def func2(): return np.sum(b) >>> >>> print(f"sum with Python: {func1()} and NumPy {func2()} ") >>> tm1=it.timeit(stmt=func1, number=n) >>> print(f"time used Python Sum: {round(tm1,2)} sec") >>> tm2=it.timeit(stmt=func2, number=n) >>> print(f"time used Numpy Sum: {round(tm2,2)} sec") >>> >>> and Here the C program: >>> #include <time.h> >>> #include <stdio.h> >>> #include <stdlib.h> >>> long func1(int n){ >>> long r=0; >>> for (int i=1; i<= n;i++) r+= i; >>> return r; >>> } >>> int main(int argc, char* argv[]){ >>> clock_t c0, c1; >>> long v,count; int n; >>> if ( argc < 2) { >>> printf("Please give an argument"); >>> return -1; >>> } >>> n=atoi(argv[1]); >>> c0 = clock(); >>> *for (int j=0;j < n;j++) v=func1(n);* >>> c1 = clock(); >>> printf ("\tCPU time :%.2f sec", (float)(c1 - c0)/CLOCKS_PER_SEC); >>> printf("\n\tThe value : %ld\n", v); >>> } >>> _______________________________________________ >>> python-uk mailing list >>> python-uk@python.org >>> https://mail.python.org/mailman/listinfo/python-uk >>> >> >> >> -- >> Life is a chess game - Anonymous. >> _______________________________________________ >> python-uk mailing list >> python-uk@python.org >> https://mail.python.org/mailman/listinfo/python-uk >> > _______________________________________________ > python-uk mailing list > python-uk@python.org > https://mail.python.org/mailman/listinfo/python-uk >
_______________________________________________ python-uk mailing list python-uk@python.org https://mail.python.org/mailman/listinfo/python-uk