Author: Maciej Fijalkowski <[email protected]>
Branch: extradoc
Changeset: r5547:a979df8a839e
Date: 2015-09-08 17:37 +0200
http://bitbucket.org/pypy/extradoc/changeset/a979df8a839e/
Log:	draft a blog post

diff --git a/blog/draft/warmup-improvements.rst b/blog/draft/warmup-improvements.rst
new file mode 100644
--- /dev/null
+++ b/blog/draft/warmup-improvements.rst
@@ -0,0 +1,51 @@
+Hello everyone!
+
+I'm very pleased to announce that we've just managed to merge the optresult
+branch. Under this cryptic name is the biggest JIT refactoring we've done in
+a couple of years, mostly focused on the warmup time and memory impact of
+PyPy.
+
+To understand why we did that, let's look back in time - back when we
+got the first working JIT prototype in 2009 we were focused exclusively
+on peak performance, with some consideration towards memory usage, but
+without serious consideration towards warmup time. This means we accumulated
+quite a bit of technical debt over time that we're trying, with difficulty,
+to address right now.
+
+The branch does "one" thing - it changes the underlying model of how
+operations are represented during tracing and optimization. Let's consider a
+simple loop like this::
+
+  [i0, i1]
+  i2 = int_add(i0, i1)
+  i3 = int_add(i2, 1)
+  i4 = int_is_true(i3)
+  guard_true(i4)
+  jump(i3, i2)
+
+The original representation would allocate a ``Box`` for each of ``i0`` - ``i4``
+and then store those boxes in instances of ``ResOperation``. The list of such
+operations would then go to the optimizer. Those lists are big - we usually
+remove ``90%`` of them during optimization, but they can still be a couple of
+thousand elements long. Overall, allocating those big lists takes a toll on
+warmup time, especially due to the GC pressure. The branch removes ``Box``
+completely, instead using a link to the ``ResOperation`` itself. So, in the
+example above, ``i2`` would refer to its producer - ``i2 = int_add(i0, i1)`` -
+with arguments getting special treatment.
+
+That alone reduces the GC pressure slightly, but we went an extra mile
+to change a bunch of data structures in the optimizer itself.
+Overall, we measured about a 50% speed improvement in the optimizer, which
+reduces the overall warmup time by between 10% and 30%. The very
+`obvious warmup benchmark`_ got a speedup from 4.5s to 3.5s, an almost
+30% improvement. Obviously, the speedups on benchmarks vastly depend on how
+much warmup time there is in those benchmarks. We observed the annotation
+phase of translating pypy to decrease by about 30% and the overall translation
+time by about 7%, so your mileage may vary. In fact, in most cases there
+should not be a visible difference if you're already running at peak
+performance; however, wherever warmup is a problem there should be a modest
+speedup.
+
+.. _`obvious warmup benchmark`: https://bitbucket.org/pypy/benchmarks/src/fe2e89c0ae6846e3a8d4142106a4857e95f17da7/warmup/function_call2.py?at=default
+
+Cheers!
+fijal & arigo
+

diff --git a/talk/ep2015/performance/Makefile b/talk/ep2015/performance/Makefile
--- a/talk/ep2015/performance/Makefile
+++ b/talk/ep2015/performance/Makefile
@@ -5,7 +5,7 @@
 # https://sourceforge.net/tracker/?func=detail&atid=422032&aid=1459707&group_id=38414
 
 talk.pdf: talk.rst author.latex stylesheet.latex
-	python ../../bin/rst2beamer.py --stylesheet=stylesheet.latex --documentoptions=14pt talk.rst talk.latex || exit
+	python ../../bin/rst2beamer.py --stylesheet=stylesheet.latex --documentoptions=14pt --input-encoding=utf8 --output-encoding=utf8 talk.rst talk.latex || exit
 	#/home/antocuni/.virtualenvs/rst2beamer/bin/python `which rst2beamer.py` --stylesheet=stylesheet.latex --documentoptions=14pt talk.rst talk.latex || exit
 	sed 's/\\date{}/\\input{author.latex}/' -i talk.latex || exit
 	#sed 's/\\maketitle/\\input{title.latex}/' -i talk.latex || exit

diff --git a/talk/ep2015/performance/author.latex b/talk/ep2015/performance/author.latex
--- a/talk/ep2015/performance/author.latex
+++ b/talk/ep2015/performance/author.latex
@@ -2,7 +2,7 @@
 \title[Python and PyPy performance]{Python and PyPy performance\\(not) for dummies}
 
 \author[antocuni,fijal]
-{Antonio Cuni and Maciej Fijalkowski}
+{Antonio Cuni and Maciej Fijałkowski}
 
 \institute{EuroPython 2015}
 \date{July 21, 2015}

diff --git a/talk/ep2015/performance/talk.pdf b/talk/ep2015/performance/talk.pdf
index 874cebbd96fb24e3f93148dfd82afb0985fc7145..b17e14c7d49dc0a2c47900b50419cb90e1a31aae
GIT binary patch

[cut]

diff --git a/talk/ep2015/performance/talk.rst b/talk/ep2015/performance/talk.rst
--- a/talk/ep2015/performance/talk.rst
+++ b/talk/ep2015/performance/talk.rst
@@ -16,16 +16,6 @@
 
 - http://baroquesoftware.com/
 
-About you
--------------
-
-- You are proficient in Python
-
-- Your Python program is slow
-
-- You want to make it fast(er)
-
-
 Optimization for dummies
 -------------------------
 
@@ -55,43 +45,37 @@
 
 2. How to address the problems
 
+Part 1
+------
 
-Part 1
-------
-
-* profiling
-
-* tools
-
+* identifying the slow spots
 
 What is performance?
 --------------------
 
-* you need something quantifiable by numbers
+* something quantifiable by numbers
 
 * usually, time spent doing task X
 
-* sometimes number of requests, latency, etc.
+* number of requests, latency, etc.
 
-* some statistical properties about that metric (average, minimum, maximum)
+* statistical properties about that metric
 
 Do you have a performance problem?
 ----------------------------------
 
-* define what you're trying to measure
+* what you're trying to measure
 
-* measure it (production, benchmarks, etc.)
+* means to measure it (production, benchmarks, etc.)
 
-* see if Python is the cause here (if it's not, we can't help you,
-  but I'm sure someone can)
+* is Python the cause here?
 
-* make sure you can change and test stuff quickly (e.g. benchmarks are better
-  than changing stuff in production)
+* environment to quickly measure and check the results
 
-* same as for debugging
+  - same as for debugging
 
-We have a python problem
-------------------------
+When Python is the problem
+--------------------------
 
 * tools, timers etc.
@@ -106,7 +90,7 @@
 
 * cProfile, runSnakeRun (high overhead) - event based profiler
 
-* plop, vmprof - statistical profiler
+* plop, vmprof - statistical profilers
 
 * cProfile & vmprof work on pypy
 
@@ -121,8 +105,8 @@
 
 * CPython, PyPy, possibly more virtual machines
 
-why not just use gperftools?
-----------------------------
+why not gperftools?
+-------------------
 
 * C stack does not contain python-level frames
 
@@ -349,10 +333,19 @@
 
 * avoid creating classes at runtime
 
+Example
+-------
+
+* ``map(operator.attrgetter('x'), list)``
+
+vs
+
+* ``[x.x for x in list]``
+
 More about PyPy
 ---------------
 
-* we are going to run a PyPy open space
+* we are going to run a PyPy open space (tomorrow 18:00 @ A4)
 
 * come ask more questions

_______________________________________________
pypy-commit mailing list
[email protected]
https://mail.python.org/mailman/listinfo/pypy-commit
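[Editorial note] The representational change the blog draft describes - dropping the
per-value ``Box`` objects and letting each ``ResOperation`` stand for its own result -
can be illustrated with a small, purely hypothetical Python sketch. These class names
and fields are invented for this sketch and are not PyPy's actual implementation; the
point is only the allocation difference between the two models:

```python
# Hypothetical sketch of the two IR models described in the blog draft.
# None of these classes mirror PyPy's real code; they illustrate the idea only.

class Box:
    """Old model: one extra heap object allocated per SSA value."""
    _counter = 0

    def __init__(self):
        Box._counter += 1
        self.name = "i%d" % Box._counter


class OldResOperation:
    """Old model: the operation stores argument boxes plus a result box,
    so every traced operation costs two allocations."""
    def __init__(self, opname, args):
        self.opname = opname
        self.args = args
        self.result = Box()  # separate allocation just to name the result


class ResOperation:
    """New model: the operation object itself *is* its result value,
    so consumers link straight to their producer and no Box exists."""
    def __init__(self, opname, args):
        self.opname = opname
        # args are other ResOperations, constants, or loop inputs
        self.args = args


# Old style: i2 = int_add(i0, i1) allocates an operation *and* a box.
old_op = OldResOperation("int_add", ["inparg0", "inparg1"])

# New style: "i2" from the trace above is simply this operation object;
# "i3" refers to its producer directly, with no intermediate box.
i2 = ResOperation("int_add", ["inparg0", "inparg1"])
i3 = ResOperation("int_add", [i2, 1])
assert i3.args[0] is i2  # consumer points at the producing operation itself
```

Under this sketch's assumptions, a trace of N operations goes from roughly 2N
allocations (operations plus boxes) to N, which is the GC-pressure reduction the
draft attributes to the branch.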
