On 2016-01-27 3:01 PM, Brett Cannon wrote:
[..]
>> We can also optimize LOAD_METHOD. There is a high chance that 'obj' in
>> 'obj.method()' will be of the same type every time we execute the code
>> object. So if we had an opcode cache, LOAD_METHOD could cache a
>> pointer to the resolved unbound method, a pointer to obj.__class__,
>> and the tp_version_tag of obj.__class__. Then it would only need to
>> check that the cached object type is the same (and that it wasn't
>> modified) and that obj.__dict__ doesn't override 'method'. Long story
>> short, this caching really speeds up method calls on types implemented
>> in C. list.append becomes very fast, because list doesn't have a
>> __dict__, so the check is very cheap (with the cache).
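To make the guard structure concrete, here is a rough Python model of the checks described above. All names here (MethodCacheEntry, lookup_method, version_tag_of) are invented for illustration; the real cache would live in C inside the ceval loop, and version_tag_of is only a crude stand-in for tp_version_tag:

```python
# Rough Python model of the LOAD_METHOD guards described above.  All
# names here are invented for illustration; the real cache would live
# in C inside the ceval loop.

class MethodCacheEntry:
    """What a per-opcode cache slot would hold."""
    def __init__(self, klass, version_tag, method):
        self.klass = klass              # cached obj.__class__
        self.version_tag = version_tag  # cached tp_version_tag stand-in
        self.method = method            # resolved unbound method

def version_tag_of(klass):
    # Crude stand-in for tp_version_tag (real CPython bumps a counter
    # whenever a type or one of its bases is mutated).
    return len(klass.__dict__)

def lookup_method(obj, name, entry):
    klass = type(obj)
    instance_dict = getattr(obj, '__dict__', {})  # empty for e.g. list
    # Fast path: same class, class unchanged, no instance-level shadow.
    if (entry is not None
            and entry.klass is klass
            and entry.version_tag == version_tag_of(klass)
            and name not in instance_dict):
        return entry.method, entry
    if name in instance_dict:
        # Shadowed on the instance: not cacheable as an unbound method.
        return instance_dict[name], entry
    # Slow path: full lookup, then (re)fill the cache slot.
    method = getattr(klass, name)
    return method, MethodCacheEntry(klass, version_tag_of(klass), method)
```

On a hit the interpreter would pay only a couple of word comparisons plus one membership test; for a type like list, whose instances have no __dict__, that last test runs against an empty dict, which is why list.append benefits so much.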
> What would it take to make this work with Python-defined classes?
It already works for Python-defined classes. But it's a bit more
expensive because you still have to check the object's __dict__.
Still, there is a very noticeable performance increase (see the
results of the benchmark runs).
> I guess that would require knowing the version of the instance's
> __dict__, the instance's __class__ version, the MRO, and where the
> method object was found in the MRO and any intermediary classes to
> know if it was suddenly shadowed? I think that's everything. :)
No, unfortunately we can't use the version of the instance's __dict__,
as it is very volatile. The current implementation of the opcode cache
works because types are much more stable. Remember, the cache is per
*code object*, so it persists across all executions of that code object.

    class F:
        def spam(self):
            self.ham()  # <- the version of self.__dict__ is unstable,
                        #    so we'd end up invalidating the cache
                        #    too often

__class__ version, MRO changes, etc. are covered by tp_version_tag,
which I use as one of the guards.
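To see why the instance __dict__ version makes such a poor guard, here is a toy simulation. CPython's real per-dict version counter (ma_version_tag from PEP 509) is not visible from Python code, so VersionedDict below just mimics it:

```python
# Toy simulation of why a per-instance __dict__ version is a poor cache
# key: ordinary attribute writes would invalidate it constantly.
# (CPython's real ma_version_tag is internal; VersionedDict mimics it.)

class VersionedDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        self.version += 1               # every store bumps the version
        super().__setitem__(key, value)

d = VersionedDict()                     # stands in for self.__dict__
before = d.version
d['x'] = 1                              # ordinary attribute writes...
d['y'] = 2
d['x'] = 3
assert d.version == before + 3          # ...bump the version every time
```

A guard built on that counter would miss after almost every attribute assignment, whereas tp_version_tag only changes when someone mutates the class itself.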
> Obviously that's a lot, but I wonder how many classes have a deep
> inheritance model vs. inheriting only from `object`? In that case you
> only have to check self.__dict__.ma_version, self.__class__,
> self.__class__.__dict__.ma_version, and self.__class__.__class__ ==
> `type`. I guess another way to look at this is to get an idea of how
> complex the checks have to get before caching something like this is
> not worth it (probably also depends on how often you mutate
> self.__dict__ by mutating attributes, but in that case you could just
> decide to always look at self.__dict__ for the method's key and then
> do the ma_version cache check for everything coming from the class).
>
> Otherwise we can consider looking at the caching strategies that Self
> helped pioneer (http://bibliography.selflanguage.org/), which all of
> the various JS engines lifted, and consider caching all method lookups.
Yeah, hidden classes are great. But the infrastructure to support them
properly is huge. I think that to make them work you'll need a JIT --
to trace, deoptimize, optimize, and do it all with a reasonable memory
footprint. My patch is much smaller and simpler, something we can
realistically tune and ship in 3.6.
>> A straightforward way to implement such a cache is simple, but it
>> consumes a lot of memory that would just be wasted, since we only
>> need such a cache for the LOAD_GLOBAL and LOAD_METHOD opcodes. So we
>> have to be creative about the cache design. Here's what I came up
>> with:
>>
>> 1. We add a few fields to the code object.
>>
>> 2. ceval will count how many times each code object is executed.
>>
>> 3. When the code object is executed over ~900 times, we mark it as
>> "hot".
> What happens if you simply consider all code as hot? Is the overhead
> of building the mapping such that you really need this, or is this
> simply to avoid some memory/startup cost?
That's the first step for this patch. I think we need to profile
several big applications (I'll do it later for some of my code bases)
and see how big the memory impact is if we optimize everything. In any
case, I expect it to be noticeable (which may be acceptable), so we'll
probably try to optimize it.
>> We also create an 'unsigned char' array "MAPPING", with its length
>> set to match the length of the code object. So we have a 1-to-1
>> mapping between opcodes and the MAPPING array.
>>
>> 4. For the next ~100 calls, while the code object is "hot",
>> LOAD_GLOBAL and LOAD_METHOD do "MAPPING[opcode_offset()]++".
>> 5. After 1024 calls to the code object, the ceval loop will iterate
>> through the MAPPING, counting all opcodes that were executed more
>> than 50 times.
> Where did the "50 times" boundary come from? Was this measured somehow
> or did you just guess at a number?
If the number is too low, then you'll optimize code in branches that are
rarely executed. So I picked 50, because I only trace opcodes for 100
calls.
All of those numbers can be (should be?) changed, and I think we should
experiment with different heuristics.
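Putting those numbers together, the warm-up scheme in steps 2-5 might look roughly like this in Python. The class and method names are invented for illustration, the thresholds (900 / 100 calls / 50 executions) come from the message, and the real counters would be C fields on the code object rather than a Python object:

```python
# Sketch of the warm-up heuristic from steps 2-5.  Thresholds are taken
# from the message; names are invented, and the real counters would be
# C fields on the code object.

HOT_AFTER = 900      # calls before a code object is considered "hot"
TRACE_CALLS = 100    # calls during which opcode hits are recorded
OPTIMIZE_OVER = 50   # opcodes run more often than this get a cache

class CodeStats:
    def __init__(self, n_opcodes):
        self.calls = 0
        self.mapping = [0] * n_opcodes   # the 'unsigned char' MAPPING

    def on_call(self):
        self.calls += 1

    @property
    def hot(self):
        # "Hot" only during the ~100-call tracing window.
        return HOT_AFTER < self.calls <= HOT_AFTER + TRACE_CALLS

    def on_cached_opcode(self, offset):
        # Done by LOAD_GLOBAL / LOAD_METHOD while the code is hot.
        if self.hot:
            self.mapping[offset] += 1

    def opcodes_to_optimize(self):
        # Around call ~1024: pick the offsets worth attaching a cache to.
        return [i for i, n in enumerate(self.mapping)
                if n > OPTIMIZE_OVER]
```

An opcode that only runs in a rarely taken branch will typically score well below 50 inside the 100-call window and so is skipped, which is exactly the point of the threshold.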
[..]
>> If you're interested in these kinds of optimizations, please help
>> with code reviews, ideas, profiling and benchmarks. The latter is
>> especially important; I'd never have imagined how hard it is to come
>> up with a good macro benchmark.
> Have you tried http://hg.python.org/benchmarks?
Yes: https://gist.github.com/1st1/aed69d63a2ff4de4c7be
> Or are you looking for new benchmarks? If the latter then we should
> probably strike up a discussion on speed@ and start considering a new,
> unified benchmark suite that CPython, PyPy, Pyston, Jython, and
> IronPython can all agree on.
Yes, IMHO we need better benchmarks. Some of the existing ones are very
unstable -- I can run them three times and get three completely
different results. Benchmarking is hard :)
I'll create a few issues on bugs.python.org with new/updated benchmarks,
and will join the speed@ mailing list.
Yury
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev