Hi,

I guess it would be a good idea to quickly outline my inline caching approach, so that we all have a basic understanding of how it works. Take the BINARY_ADD instruction: the interpreter examines the actual operand types at runtime and chooses the matching operation implementation, i.e., operands that are unicode strings are concatenated via unicode_concatenate, while for float operands the interpreter ends up invoking float_add via binary_op1. Now, a very efficient way to achieve purely interpretative inline caching is to quicken the type-generic BINARY_ADD instruction to a type-dependent FLOAT_ADD instruction (this technique, i.e., inline caching via quickening, is the primary contribution of my ECOOP paper). Hence, I have a very simple code generator that generates type-dependent interpreter instructions in a pre-compile step of the interpreter, and I use runtime type information to quicken/rewrite instructions. Aside from the operators, I have implemented this quickening technique for the FOR_ITER, COMPARE_OP, and CALL_FUNCTION instructions.
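To make the idea concrete, here is a minimal sketch (in Python, purely illustrative -- the real implementation lives in the C interpreter loop) of quickening: a generic BINARY_ADD inspects its operand types once and rewrites itself in place to a type-dependent derivative, so subsequent executions skip the type dispatch. All names here are hypothetical stand-ins, not the actual CPython functions.

```python
# Illustrative sketch of inline caching via quickening, NOT the actual
# CPython implementation: a tiny stack interpreter whose generic
# BINARY_ADD rewrites itself into a type-specialized derivative after
# observing its operand types.

def float_add(a, b):
    # Type-dependent derivative: operands are known to be floats.
    return a + b

def unicode_concatenate(a, b):
    # Type-dependent derivative: operands are known to be strings.
    return a + b

def run(code, stack):
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == "LOAD_CONST":
            pc += 1                      # operand follows the opcode
            stack.append(code[pc])
        elif op == "BINARY_ADD":
            b, a = stack.pop(), stack.pop()
            # Generic instruction: dispatch on runtime operand types...
            if isinstance(a, float) and isinstance(b, float):
                code[pc] = "FLOAT_ADD"   # ...and quicken in place.
                stack.append(float_add(a, b))
            elif isinstance(a, str) and isinstance(b, str):
                code[pc] = "UNICODE_CONCAT"
                stack.append(unicode_concatenate(a, b))
            else:
                stack.append(a + b)      # stay generic
        elif op == "FLOAT_ADD":
            # Quickened path: no type dispatch (a real implementation
            # would guard and de-quicken on an operand-type mismatch).
            b, a = stack.pop(), stack.pop()
            stack.append(float_add(a, b))
        elif op == "UNICODE_CONCAT":
            b, a = stack.pop(), stack.pop()
            stack.append(unicode_concatenate(a, b))
        pc += 1
    return stack

code = ["LOAD_CONST", 1.5, "LOAD_CONST", 2.5, "BINARY_ADD"]
print(run(code, []))   # [4.0]
print(code[4])         # FLOAT_ADD -- the instruction was rewritten
```

The point of the rewrite is that the second and every later execution of that instruction dispatches directly to the specialized implementation, which is where the interpretative speed-up comes from.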
> I'm absolutely interested, although not for the CPython project but for
> Cython. I wonder how you do inline caching in Python if the methods of a
> type can be replaced by whatever at runtime. Could you elaborate on that?

Currently, I only provide optimized derivatives for several separate call
targets, e.g., whether a call target is a C function with varargs, or a
Python function/method -- this already eliminates a lot of overhead from
invoking call_function. Based on further quantitative analysis, I can
easily provide inline-cached derivatives of frequently called functions,
such as some builtin primitives.

> Based on what information do you switch between inlining states?

I have instrumented versions of some functions, such as binary_op1,
do_richcompare, and call_function, that allow me to make quickening
decisions, i.e., to quicken instructions to an optimized, inline-cached
instruction derivative.

> Or do you restrict yourself to builtin types?

Currently, my approach provides optimized derivative instructions for the
standard library, e.g., unicode strings, numerical objects, containers,
and iterators.

> That might be worth it already, just think of list.append(). We have an
> optimistic optimisation for object.append() in Cython that gives us
> massive speed-ups in loops that build lists, even if we don't know at
> compile time that we are dealing with lists.

Yes, that sounds like a reasonable thing to do. I could provide much more
optimized derivatives based on application profiles, too. Since I use a
simple code generator for generating the derivatives, it would also be
possible to provide end-users with the means to analyze their apps and
generate optimized instruction derivatives matching their profile.
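The optimistic list.append() trick mentioned above can be sketched like this (a hypothetical Python illustration, not Cython's actual generated code): guard cheaply on the exact receiver type, take a direct fast path when the guess holds, and fall back to the generic attribute lookup when it does not.

```python
# Hypothetical sketch of an optimistic inline cache for obj.append(x):
# guard on the exact receiver type, fall back to generic lookup otherwise.

_list_append = list.append  # cache the unbound fast-path callable once

def cached_append(obj, item):
    if type(obj) is list:            # cheap exact-type guard
        _list_append(obj, item)      # fast path: direct call, no lookup
    else:
        obj.append(item)             # generic fallback: normal dispatch

xs = []
cached_append(xs, 1)
cached_append(xs, 2)
print(xs)   # [1, 2]
```

Because the guard is a single pointer comparison, the optimization costs almost nothing when it misses, which is why it pays off even when the receiver type is unknown at compile time.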
Regards,
--stefan

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com