Author: Konstantin Lopuhin <[email protected]>
Branch: extradoc
Changeset: r5460:9691e27e3379
Date: 2014-11-13 22:32 +0300
http://bitbucket.org/pypy/extradoc/changeset/9691e27e3379/
Log: wip diff --git a/blog/draft/tornado-stm.rst b/blog/draft/tornado-stm.rst --- a/blog/draft/tornado-stm.rst +++ b/blog/draft/tornado-stm.rst @@ -13,7 +13,7 @@ Here we will see how to slightly modify Tornado IO loop to use `transaction <https://bitbucket.org/pypy/pypy/raw/stmgc-c7/lib_pypy/transaction.py>`_ module. -This module is `descibed <http://pypy.readthedocs.org/en/latest/stm.html#atomic-sections-transactions-etc-a-better-way-to-write-parallel-programs>`_ +This module is `described <http://pypy.readthedocs.org/en/latest/stm.html#atomic-sections-transactions-etc-a-better-way-to-write-parallel-programs>`_ in the docs and is really simple to use - please see an example there. An event loop of Tornado, or any other asynchronous web server, looks like this (with some simplifications):: @@ -54,6 +54,9 @@ `here <https://github.com/lopuhin/tornado/commit/246c5e71ce8792b20c56049cf2e3eff192a01b20>`_, - we had to extract a little function to run the callback. +Part 1: a simple benchmark: primes +---------------------------------- + Now we need a simple benchmark, lets start with `this <https://bitbucket.org/kostialopuhin/tornado-stm-bench/src/a038bf99de718ae97449607f944cecab1a5ae104/primes.py?at=default>`_ - just calculate a list of primes up to the given number, and return it @@ -69,7 +72,7 @@ def get(self, num): num = int(num) primes = [n for n in xrange(2, num + 1) if is_prime(n)] - self.write(json.dumps({'primes': primes})) + self.write({'primes': primes}) We can benchmark it with ``siege``:: @@ -103,7 +106,7 @@ For now we can hack around this by disabling this timing - this is only needed for internal profiling in tornado. -If we do it, we get the following results: +If we do it, we get the following results (but see caveats below): ============ ========= Impl. req/s @@ -121,20 +124,109 @@ PyPy STM 4 24.2 ============ ========= +.. image:: results-1.png + As we can see, in this benchmark PyPy STM using just two cores can beat regular PyPy! 
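The IO-loop change mentioned above ("extract a little function to run the callback") can be sketched as a stand-alone snippet. This is not the actual Tornado patch: ``run_callbacks`` and ``handle_request`` are made-up names, and since the ``transaction`` module only exists on PyPy-STM, the fallback below runs the queued callbacks serially so the sketch works anywhere.

```python
# Sketch of the idea: instead of running each IO-loop callback inline,
# queue them all through the transaction module, which on PyPy-STM runs
# them as parallel atomic transactions.
try:
    from transaction import add, run          # available on PyPy-STM only
except ImportError:
    _pending = []

    def add(f, *args):                        # queue a callback
        _pending.append((f, args))

    def run():                                # serial fallback, same interface
        while _pending:
            f, args = _pending.pop(0)
            f(*args)

results = []

def handle_request(n):                        # stands in for one queued callback
    results.append(n * n)

def run_callbacks(callbacks):
    # before the patch: for cb, args in callbacks: cb(*args)  -- strictly serial
    # after the patch: hand them to the transaction module instead
    for cb, args in callbacks:
        add(cb, *args)
    run()

run_callbacks([(handle_request, (i,)) for i in range(5)])
print(sorted(results))  # -> [0, 1, 4, 9, 16]
```

On PyPy-STM the callbacks may run in any order on several cores, but each behaves as if atomic, which is what lets Tornado keep its "callbacks never run concurrently" assumption.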
This is not linear scaling, there are still conflicts left, and this -is a very simple example but still, it works! And it was easy! +is a very simple example but still, it works! + +But it's not that simple yet :) + +First, these are best-case numbers after a long (much longer than for regular +PyPy) warmup. Second, it can sometimes crash (although removing old pyc files +fixes it). Third, benchmark meta-parameters are also tuned. + +Here we get relatively good results only when there are a lot of concurrent +clients - as a result, a lot of requests pile up, the server is not keeping up +with the load, and the transaction module is busy working through these piled-up +requests. If we decrease the number of concurrent clients, results get worse. +Another thing we can tune is how heavy each request is - again, if we ask for +primes up to a smaller number, then less time is spent doing calculations, +more time is spent in conflicts, and results get worse. + +Besides the ``time.time()`` conflict described above, there are a lot of others. +The bulk of the time is lost in these conflicts:: + + 14.153s lost in aborts, 0.000s paused (270x STM_CONTENTION_INEVITABLE) + File "/home/ubuntu/tornado-stm/tornado/tornado/web.py", line 1082, in compute_etag + hasher = hashlib.sha1() + File "/home/ubuntu/tornado-stm/tornado/tornado/web.py", line 1082, in compute_etag + hasher = hashlib.sha1() + + 13.484s lost in aborts, 0.000s paused (130x STM_CONTENTION_WRITE_READ) + File "/home/ubuntu/pypy/lib_pypy/transaction.py", line 164, in _run_thread + got_exception) + +The first one is presumably calling into some C function from the stdlib, and we get +the same conflict as for ``time.time()`` above, but it can be fixed on the PyPy +side, as we can be sure that computing sha1 is pure.
+ +It is easy to hack around this one too, by just removing etag support, but if +we do it, performance is much worse, only slightly faster than regular PyPy, +with the top conflict being:: + + 83.066s lost in aborts, 0.000s paused (459x STM_CONTENTION_WRITE_WRITE) + File "/home/arigo/hg/pypy/stmgc-c7/lib-python/2.7/_weakrefset.py", line 70, in __contains__ + File "/home/arigo/hg/pypy/stmgc-c7/lib-python/2.7/_weakrefset.py", line 70, in __contains__ + +**FIXME** why does it happen? + +The second conflict (without etag tweaks) originates +in the transaction module, from this piece of code:: + + while True: + self._do_it(self._grab_next_thing_to_do(tloc_pending), + got_exception) + counter[0] += 1 + +**FIXME** why does it happen? + +The Tornado modification used in this blog post is based on 3.2.dev2. +As of now, the latest version is 4.0.2, and if we +`apply <https://github.com/lopuhin/tornado/commit/04cd7407f8690fd1dc55b686eb78e3795f4363e6>`_ +the same changes to this version, then we no longer get any scaling on this benchmark, +and there are no conflicts that take any substantial time. + +**FIXME** - maybe this is just me messing something up + + +Part 2: a more interesting benchmark: A-star +-------------------------------------------- + +Although we have seen that PyPy STM is not all moonlight and roses, +it is interesting to see how it works on a more realistic application. + +`astar.py <https://bitbucket.org/kostialopuhin/tornado-stm-bench/src/a038bf99de718ae97449607f944cecab1a5ae104/astar.py>`_ +is a simple game where several players move on a map +(represented as a list of lists of integers), +build and destroy walls, and ask the server to give them +shortest paths between two points +using A-star search, adapted from an `ActiveState recipe <http://code.activestate.com/recipes/577519-a-star-shortest-path-algorithm/>`_.
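The search the game asks the server for can be sketched compactly. This is not the benchmark's ``astar.py``, just a textbook grid A-star with a Manhattan heuristic; treating nonzero cells in the list-of-lists map as walls is an assumption for the sketch.

```python
import heapq
import itertools

def astar(grid, start, goal):
    # Shortest 4-connected path on a grid of ints (nonzero = wall).
    # Returns the list of cells from start to goal, or None if unreachable.
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    tie = itertools.count()           # tiebreaker so the heap never compares cells
    open_heap = [(h(start), next(tie), 0, start, None)]
    came_from = {}                    # cell -> predecessor; doubles as the closed set
    g_best = {start: 0}
    while open_heap:
        _, _, g, cur, parent = heapq.heappop(open_heap)
        if cur in came_from:          # already expanded via a shorter path
            continue
        came_from[cur] = parent
        if cur == goal:               # walk predecessors back to the start
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if (0 <= nx < len(grid) and 0 <= ny < len(grid[0])
                    and not grid[nx][ny]
                    and g + 1 < g_best.get(nxt, float('inf'))):
                g_best[nxt] = g + 1
                heapq.heappush(open_heap,
                               (g + 1 + h(nxt), next(tie), g + 1, nxt, cur))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))  # 7-cell path detouring around the wall row
```

In the game, many such searches run concurrently against a shared, mutable map, which is exactly the kind of read-mostly shared state where STM can pay off.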
+ +The benchmark `bench_astar.py <https://bitbucket.org/kostialopuhin/tornado-stm-bench/src/a038bf99de718ae97449607f944cecab1a5ae104/bench_astar.py>`_ +simulates players, and tries to put the main load on A-star search, +but also does some wall building and destruction. There are no locks +around map modifications, as normal Tornado executes all callbacks +serially, and we can keep this guarantee with atomic blocks of PyPy STM. +This is also an example of a program that is not trivial +to scale to multiple cores with separate processes (assuming +more interesting shared state and logic). + +**TODO** - results Although it is definitely not ready for production use, you can already try to run things, report bugs, and see what is missing in user-facing tools and libraries. -Benchmark setup: + +Benchmarks setup: * Amazon c3.xlarge (4 cores) running Ubuntu 14.04 -* pypy-c-r74011-stm-jit +* pypy-c-r74011-stm-jit for the primes benchmark (but it has more bugs + than more recent versions), and + `pypy-c-r74378-74379-stm-jit <http://cobra.cs.uni-duesseldorf.de/~buildmaster/misc/pypy-c-r74378-74379-stm-jit.xz>`_ + for all other stuff * http://bitbucket.org/kostialopuhin/tornado-stm-bench at a038bf9 * for PyPy-STM in this test the variation is higher, - best results after warmup are given - + best results after a long warmup are given _______________________________________________ pypy-commit mailing list [email protected] https://mail.python.org/mailman/listinfo/pypy-commit
