On 02/28/2013 03:25 PM, kramer65 wrote:
Hello,

I'm using Python for a while now and I love it. There is just one thing I 
cannot understand. There are compilers for languages like C and C++. why is it 
impossible to create a compiler that can compile Python code to machinecode?

My reasoning is as follows:
When GCC compiles a program written in C++, it simply takes that code and 
decides what instructions that would mean for the computer's hardware. What 
does the CPU need to do, what does the memory need to remember, etc. etc. If 
you can create this machinecode from C++, then I would suspect that it should 
also be possible to do this (without a C-step in between) for programs written 
in Python.

Where is my reasoning wrong here? Is that because Python is dynamically typed? 
Does machinecode always need to know whether a variable is an int or a float? 
And if so, can't you build a compiler which creates machinecode that can handle 
both ints and floats in case of doubt? Or is it actually possible to do, but so 
much work that nobody does it?

I googled around, and I *think* it is because of the dynamic typing, but I 
really don't understand why this would be an issue..

Any insights on this would be highly appreciated!


Sure, python could be compiled into machine code. But what machine? Do you refer to the hardware inside one of the Pentium chips? Sorry, but Intel doesn't expose those instructions to the public. Instead, they wrote a microcode interpreter, and embedded it inside their processor, and the "machine languages" that are documented as the Pentium Instruction sets are what that interpreter handles. Good thing too, as the microcode machine language has changed radically over time, and I'd guess there have been at least a dozen major variants, and a hundred different sets of details.

So if we agree to ignore that interpreter, and consider the externally exposed machine language, we can pick a subset of the various such instruction sets, and make that our target.

Can Python be compiled directly into that instruction set? Sure, it could. But would it be practical to write a compiler that went directly to it, or is it simpler to target C, and use gcc?

Let's look at gcc. When you run it, does it look like it compiles C directly to machine language? Nope. It has 3 phases (last I looked, which was admittedly over 20 years ago). The final phase translates an internal form of program description into a particular "machine language". Even the mighty gcc doesn't do it in one step. Guess what, that means other languages can use the same back end, and a given language can use different back ends for different target machine languages. (Incidentally, Microsoft C compiler does the exact same thing, and a few of my patents involve injecting code between front end and back end)

So now we have three choices. We could target the C language, and use all of gcc, or we could target the intermediate language, and use only the backend of gcc. Unfortunately, that intermediate language isn't portable between compilers, so you'd either have to write totally separate python compilers for each back end, or skip that approach, or abandon total portability.

Well, we could write a Python compiler that targets an "abstract intermediate language," which in turn gets translated into each of the supported compiler's intermediate language. But that gets remarkably close to just targeting C in the first place.

So how hard would it be just to directly target one machine language? Not too bad if you didn't try to do any optimizations, or adapt to the different quirks and missing features of the different implementations of that machine language. But I expect what you got would be neither smaller nor noticeably faster than the present system. Writing simple optimizations that improve some things is easy. Writing great optimizers that are also reliable and correct is incredibly hard. I'd expect that gcc has hundreds of man years of effort in it.

Now, no matter which of these approaches you would take, there are some issues. The tricky part is not being flexible between int and float (and long, which is not part of the Intel machine instruction set), but between an unlimited set of possible meanings for each operation. Just picking on a+b, each class type that a and b might be can provide their own __add__ and/or __radd__ methods. All those have to be searched for, one has to be picked, and the code has to branch there. And that decision, in general, has to be made at runtime, not by the compiler.

So by default, the code ends up being a twisted set of 4way indirections, calls to dict lookups, and finally calling a function that actually does an instruction or two of real work. Guess what, an interpreter can store those details much more succinctly (code size), and can run those choices nearly as quickly. So we're back to CPython.

Could it be improved? Sure, that's why there are multiple projects which try to improve performance of the reference implementation. But each project seems to get to the point where the early promise of dozen-fold improvement dwindles down to a few times as fast, and not for everything. There are lots of things that can be improved with static analysis (so we're sure of the types of certain things), restricted language (so the developer gives us extra clues). But that work is nothing compared to what it would take to re-implement the equivalent of the back ends of gcc.

Java works roughly the same way as Python, compiling to byte code files, then interpreting them. The interpreter is given the fancy name "virtual machine" because it really is an instruction set, one that could have been interpreted by Intel in their internal microcode. But they have their own history to stay compatible with. Look at the Merced and how it's taken the world by storm (NOT).

But Java is much stricter about its byte code files, so each function is much closer to machine level. Nearly all those Python indirections are eliminated by the compiler (because it's not as dynamic a language), and they do JIT compiling. The latter is why they're quick.



--
DaveA
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to