On Thu, 28 Feb 2013 12:25:07 -0800, kramer65 wrote: > Hello, > > I'm using Python for a while now and I love it. There is just one thing > I cannot understand. There are compilers for languages like C and C++. > why is it impossible to create a compiler that can compile Python code > to machinecode?
Your assumption is incorrect. You can compile Python to machine-code, at least sometimes. It is quite tricky, for various reasons, but it can be done, at various levels of efficiency. One of the oldest such projects was Psyco, which was a Just-In-Time compiler for Python. When Psyco was running, it would detect at run time that you were doing calculations on (say) standard ints, compile on the fly a machine-code function to perform those calculations, and execute it. Psyco has more or less been made obsolete by PyPy, which does the same thing only even more so. http://en.wikipedia.org/wiki/Psyco http://en.wikipedia.org/wiki/PyPy > My reasoning is as follows: > When GCC compiles a program written in C++, it simply takes that code > and decides what instructions that would mean for the computer's > hardware. What does the CPU need to do, what does the memory need to > remember, etc. etc. If you can create this machinecode from C++, then I > would suspect that it should also be possible to do this (without a > C-step in between) for programs written in Python. In principle, yes, but in practice it's quite hard, simply because Python does so much more at runtime than C++ (in general). Take an expression like: x = a + b In C++, the compiler knows what kind of data a and b are, what kind of data x is supposed to be. They are often low-level machine types like int32 or similar, which the CPU can add directly (or at least, the compiler can fake it). Even if the variables are high-level objects, the compiler can usually make many safe assumptions about what methods will be called, and can compile instructions something like this pseudo-code: 10 get the int64 at location 12348 # "a" 20 get the int64 at location 13872 # "b" 30 jump to the function at location 93788 # add two int64s 40 store the result at location 59332 # "x" which is fast and efficient because most of the hard work is done at compile time. But it's also quite restrictive, because you can't change code on the fly, create new types or functions, etc. (Or, where you can, then you lose some of the advantages of C++ and end up with something like Python but with worse syntax.) In Python, you don't know what a and b are until runtime. They could be ints, or lists, or strings, or anything. The + operator could call a custom __add__ method, or a __radd__ method, from some arbitrary class. Because nearly everything is dynamic, the Python compiler cannot safely make many assumptions about the code at compile time. So you end up with code like this: 10 search for the name "a" and take note of it 20 search for the name "b" and take note of it 30 decide whether to call a.__add__ or b.__radd__ 40 call the appropriate method 60 bind the result to the name "x" You can get an idea of what Python actually does by disassembling the byte code into pseudo-assembly language: py> code = compile("x = a + b", '', 'single') py> from dis import dis py> dis(code) 1 0 LOAD_NAME 0 (a) 3 LOAD_NAME 1 (b) 6 BINARY_ADD 7 STORE_NAME 2 (x) 10 LOAD_CONST 0 (None) 13 RETURN_VALUE Nevertheless, PyPy can often speed up Python code significantly, sometimes to the speed of C or even faster. http://morepypy.blogspot.com.au/2011/02/pypy-faster-than-c-on-carefully-crafted.html http://morepypy.blogspot.com.au/2011/08/pypy-is-faster-than-c-again-string.html -- Steven -- http://mail.python.org/mailman/listinfo/python-list