Hi there,

This is a status report and a "what to do next" summary.

The status is easily summarized as follows: the PyPy interpreter is quite complete and highly compliant with CPython, minus a number of dark corners that are only likely to bite user programs using the most introspective features of the standard library (most notably pickling). The flow graph and annotation subsystems work quite nicely too; we can successfully annotate more or less all of PyPy, or at least we are getting close to this goal.

The "next" step is low-level code generation. Here, we have a rather large number of prototypes all around. The most complete one is genc.py, which is able to produce code for roughly the complete PyPy interpreter, but without using the annotations -- i.e. it is very slow code, and it is essentially dependent on CPython. Approaching the same goal but from the opposite direction, we have genllvm.py, which is only able to translate fully annotated graphs and doesn't have any means of fallback for things like faked objects which cannot be annotated. Finally there are other very incomplete, deprecated or planning-only back-ends: gencl, genpyrex, genjava. Not to mention geninterplevel, whose goal is still different.

The question is which line of work to focus on right now. All of these back-ends are interesting and worthwhile in the long run, but we need to select a first one. There are basically 4 reasonable options:

* Enhance genc.py. This is a step-by-step process, and any intermediate version can still be tested against the whole of PyPy and against the snippet examples. Another advantage of genc is that it is the only option that doesn't depend on any external tool other than a C compiler.

* Enhance genc.py using the C++ facility of function overloading for simplicity (basically, we would generate "z=add(x,y)" in the file and let the C++ compiler decide which version of add() to call based on the declared types of x and y). This might well be the easiest solution. A minor drawback is that it requires a C++ compiler.
  A possibly larger drawback is that C++ compilation might take considerably longer, even for similar-looking code. (Having to know C++ in the first place shouldn't be that big a drawback if we don't use fancy C++ features.)

* Go for genllvm.py. An obvious drawback is that we'd all have to install LLVM. The problem with genllvm right now is that it cannot make sense of unannotated code (or code containing the SomeObject annotation). We don't know yet for sure how many such SomeObjects there are in the annotated PyPy source code, but a guess is that they occur mainly for "fake" stuff (file, long, unicode...). If so, there is one way around this problem. Carl pointed out that it *might* be easy to link the LLVM compiler output with CPython, possibly making a C extension module for CPython. If so, then we would add support in genllvm for "black box" PyObject pointers, and use a few functions from the CPython C-API to manipulate them. The goal here would be to modify the source code of the interpreter/module/objspace to reduce the number of operations that need to be performed on these "black boxes". For example, we could possibly reduce all these operations to method calls. In other words, we would say that using CPython objects like "long" at interp-level is temporarily OK, but only if they are manipulated via method calls. This would make the genllvm support for them much easier.

* genjava.py could be another option. Java has a simpler type system, which matches ours quite well, but genjava doesn't exist yet at all (the one in the java/ subdirectory had a different goal in mind). We get memory management for free. If we add the requirement to compile with GCJ, it could be easy to make a CPython extension module too, with the same problems and solutions about SomeObject as above.

All in all, each of the 4 options is equally possible. If I have to pick one, I guess the first 2 options pass the additional criterion of "very good confidence that it will work within a couple of months".
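
To make the second option (C++ overloading) a bit more concrete, here is a minimal sketch of what the generated code could look like. Everything in it is hypothetical: the Boxed struct, the add() overloads and the generated_* functions are illustrative names only, not what genc.py emits today.

```cpp
// Hypothetical stand-in for a value the annotator could not type
// (a "SomeObject"). A real back-end would hold a PyObject pointer here.
struct Boxed {
    long value;
};

// One overload of add() per low-level type the annotator can infer.
inline long add(long x, long y) { return x + y; }
inline double add(double x, double y) { return x + y; }

// Fallback overload for boxed values. In a real back-end this would
// go through a generic (slow) operation instead of a plain addition.
inline Boxed add(const Boxed& x, const Boxed& y) {
    return Boxed{x.value + y.value};
}

// What the generator might emit for a graph annotated with integers:
// the body is always "z = add(x, y)", only the declarations differ.
long generated_int_func(long x, long y) {
    long z = add(x, y);
    return z;
}

// Same body, but the variables are declared as floats, so the
// C++ compiler selects the double overload.
double generated_float_func(double x, double y) {
    double z = add(x, y);
    return z;
}
```

The generator only has to get the variable declarations right; the operation itself is spelled the same way everywhere. One thing to watch out for is that a call mixing declared types, such as add() on a long and a double, is ambiguous with these overloads, so the generator would have to emit an explicit cast or an extra overload in that case.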

This preference is definitely biased by the fact that I'm not fluent in Java and know very little about LLVM, but also by my lack of knowledge about how easy the corresponding tools are to install and integrate. Between the first two options, the second one (allowing some C++ facilities to sneak in) is my favourite.

Comments and feedback are welcome!

Armin
