On 2017-06-30 07:17 PM, Victor Stinner wrote:
2017-06-30 17:09 GMT+02:00 Soni L. <fakedme...@gmail.com>:
CPython should get a tracing JIT that turns slow bytecode into fast
bytecode.

A JIT doesn't have to produce machine code: bytecode-to-bytecode compilation is still compilation. It also works on iOS, and doesn't require deviating from C.
Optimizations require making assumptions about the code, and deoptimizing
if an assumption becomes wrong. I call these checks "guards". If I
understood correctly, PyPy is able to deoptimize a function in the
middle of the function, while executing it. In my FAT Python project,
I tried something simpler: add guards at the function entry point, and
decide at the entry which version of the code should run (FAT
Python allows more than 2 versions of the code for the same
function).

I described my implementation in the PEP 510:
https://www.python.org/dev/peps/pep-0510/

I agree that you *can* emit more efficient bytecode using assumptions.
But I'm not sure that the best achievable speedup will be worth it. For
example, if your maximum speedup is 20% but the JIT compiler increases
startup time and uses more memory, I'm not sure that users will use
it. The design indirectly restricts the maximum speed.

At the bytecode level, you cannot specialize bytecode for 1+2 (x+y
with x=1 and y=2), for example. The BINARY_ADD instruction calls
PyNumber_Add(), but a previous experiment showed that the dispatch
inside PyNumber_Add() to reach long_add() is expensive.

If you can assert that the sums never overflow a machine int, you can avoid hitting long_add() entirely, along with all the checks around it. The instruction would be IADD rather than NADD: it would add two ints specifically, not two arbitrary numbers. And it would do no overflow checks, because the JIT has already established that no overflow can happen.
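A toy interpreter loop makes the distinction concrete. The opcode names IADD and NADD are hypothetical (they are not CPython opcodes), and in pure Python the `a + b` in the IADD branch still dispatches internally; the comments mark what a real specialized instruction would skip.

```python
import operator

def run(code, stack):
    """Execute a list of (hypothetical) opcodes against an operand stack."""
    for op in code:
        if op == "NADD":
            b, a = stack.pop(), stack.pop()
            # Generic path: full type dispatch, like BINARY_ADD calling
            # PyNumber_Add(), which must find its way down to long_add().
            stack.append(operator.add(a, b))
        elif op == "IADD":
            b, a = stack.pop(), stack.pop()
            # Specialized path: assumes both operands are machine ints and
            # that the result cannot overflow, so a real implementation
            # would do neither type dispatch nor overflow checking here.
            stack.append(a + b)
    return stack[-1]

print(run(["NADD"], [1, 2]))  # 3
print(run(["IADD"], [1, 2]))  # 3
```

Both opcodes compute the same result; the win would come from IADD compiling to a bare machine add once the guard has ruled out other types and overflow.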


I'm trying to find a solution to make CPython not just 20% faster, but
2x faster. See my talk at the recent Python Language Summit (at Pycon
US):
https://github.com/haypo/conf/raw/master/2017-PyconUS/summit.pdf
https://lwn.net/Articles/723949/

My mid-term/long-term plan for FAT Python is to support multiple
optimizers, and to allow developers to choose between bytecode ("Python"
code) and machine code ("C" code). For example, one optimizer could
reuse Cython rather than writing a new compiler from scratch. My
current optimizer works at the AST level and emits more efficient
bytecode by rewriting the AST.
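AST rewriting of this kind can be sketched with the standard `ast` module. This is a minimal illustration, not FAT Python's real optimizer: it constant-folds `1 + 2` before compilation, so the resulting bytecode loads a single constant instead of executing BINARY_ADD at runtime.

```python
import ast

class FoldConstants(ast.NodeTransformer):
    """Replace `<const> + <const>` with the computed constant."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            folded = ast.Constant(node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node

tree = ast.parse("x = 1 + 2")
tree = ast.fix_missing_locations(FoldConstants().visit(tree))

ns = {}
exec(compile(tree, "<optimized>", "exec"), ns)
print(ns["x"])  # 3
```

Because the fold happens before `compile()`, the addition never reaches PyNumber_Add() at all; the same pattern extends to other foldable operators.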

But another major design choice in FAT Python is to run the optimizer
ahead-of-time (AoT), rather than just-in-time (JIT). Maybe it will not
work. We will see :-)

I suggest you take a look at my notes on making CPython faster:
http://faster-cpython.readthedocs.io/

FAT Python homepage:
http://faster-cpython.readthedocs.io/fat_python.html

--

You may also be interested in my Pycon US talk about CPython
optimizations in 3.5, 3.6 and 3.7:
https://lwn.net/Articles/725114/

Victor

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
