Author: Armin Rigo <[email protected]>
Branch: 
Changeset: r70396:0d281ac894e2
Date: 2014-04-02 18:13 +0200
http://bitbucket.org/pypy/pypy/changeset/0d281ac894e2/

Log:    Add a document about STM.

diff --git a/pypy/doc/stm.rst b/pypy/doc/stm.rst
new file mode 100644
--- /dev/null
+++ b/pypy/doc/stm.rst
@@ -0,0 +1,236 @@
+======================
+Transactional Memory
+======================
+
+.. contents::
+
+
+This page is about ``pypy-stm``, a special in-development version of
+PyPy which can run multiple independent CPU-hungry threads in the same
+process in parallel.  It is side-stepping what is known in the Python
+world as the "global interpreter lock (GIL)" problem.
+
+"STM" stands for Software Transactional Memory, the technique used
+internally.  This page describes ``pypy-stm`` from the perspective of a
+user, describes work in progress, and finally gives references to more
+implementation details.
+
+This work was done by Remi Meier and Armin Rigo.
+
+
+Introduction and current status
+===============================
+
+``pypy-stm`` is a variant of the regular PyPy interpreter.  With caveats
+listed below, it should be in theory within 25%-50% of the speed of
+PyPy, comparing the JITting version in both cases.  It is called STM for
+Software Transactional Memory, which is the internal technique used (see
+`Reference to implementation details`_).
+
+**pypy-stm requires 64-bit Linux for now.**
+
+Development is done in the branch `stmgc-c7`_.  If you are only
+interested in trying it out, you can download a Ubuntu 12.04 binary
+here__.  The current version supports four "segments", which means that
+it will run up to four threads in parallel (in other words, you get a
+GIL effect again, but only if trying to execute more than 4 threads).
+
+To build a version from sources, you first need to compile a custom
+version of clang; we recommend downloading `llvm and clang like
+described here`__, but at revision 201645 (use ``svn co -r 201645 ...``
+for all checkouts).  Then apply all the patches in `this directory`__:
+they are fixes for the very extensive usage that pypy-stm does of a
+clang-only feature (without them, you get crashes of clang).  Then get
+the branch `stmgc-c7`_ of PyPy and run::
+
+   rpython/bin/rpython -Ojit --stm pypy/goal/targetpypystandalone.py
+
+.. _`stmgc-c7`: https://bitbucket.org/pypy/pypy/src/stmgc-c7/
+.. __: http://buildbot.pypy.org/nightly/stmgc-c7/
+.. __: http://clang.llvm.org/get_started.html
+.. __: https://bitbucket.org/pypy/stmgc/src/default/c7/llvmfix/
+
+
+Caveats:
+
+* It should generally work.  Please do `report bugs`_ that manifest as a
+  crash or wrong behavior (markedly different from the behavior of a
+  regular PyPy).  Performance bugs are likely to be known issues; we're
+  working on them.
+
+* The JIT warm-up time is abysmal (as opposed to the regular PyPy's,
+  which is "only" bad).  Moreover, you should run it with a command like
+  ``pypy-stm --jit trace_limit=60000 args...``; the default value of
+  6000 for ``trace_limit`` is currently too low (6000 should become
+  reasonable again as we improve).  Also, in order to produce machine
+  code, the JIT needs to enter a special single-threaded mode for now.
+  This all means that you *will* get very bad performance results if
+  your program doesn't run for *many* seconds for now.
+
+* The GC is new; although clearly inspired by PyPy's regular GC, it
+  misses a number of optimizations for now.  Programs allocating large
+  numbers of small objects that don't immediately die, as well as
+  programs that modify large lists or dicts, suffer from these missing
+  optimizations.
+
+* The GC has no support for destructors: the ``__del__`` method is
+  never called (including on file objects, which won't be closed for
+  you).  This is of course temporary.
+
+* The STM system is based on very efficient read/write barriers, which
+  are mostly done (their placement could be improved a bit in
+  JIT-generated machine code).  But the overall bookkeeping logic could
+  see more improvements (see Statistics_ below).
+
+* You can use `atomic sections`_, but the most visible missing thing is
+  that you don't get reports about the "conflicts" you get.  This would
+  be the first thing that you need in order to start using atomic
+  sections more extensively.  Also, for now: for better results, try to
+  explicitly force a transaction break just before (and possibly after)
+  each large atomic section, with ``time.sleep(0)``.
+
+.. _`report bugs`: https://bugs.pypy.org/
+
+
+
+Statistics
+==========
+
+When a non-main thread finishes, you get statistics printed to stderr,
+looking like that::
+
+      thread 0x7f73377fe600:
+          outside transaction          42182  0.506 s
+          run current                  85466  0.000 s
+          run committed                34262  3.178 s
+          run aborted write write       6982  0.083 s
+          run aborted write read         550  0.005 s
+          run aborted inevitable         388  0.010 s
+          run aborted other                0  0.000 s
+          wait free segment                0  0.000 s
+          wait write read                 78  0.027 s
+          wait inevitable                887  0.490 s
+          wait other                       0  0.000 s
+          bookkeeping                  51418  0.606 s
+          minor gc                    162970  1.135 s
+          major gc                         1  0.019 s
+          sync pause                   59173  1.738 s
+          spin loop                   129512  0.094 s
+
+The first number is a counter; the second number gives the associated
+time (the amount of real time that the thread was in this state; the sum
+of all the times should be equal to the total time between the thread's
+start and the thread's end).  The most important points are "run
+committed", which gives the amount of useful work, and "outside
+transaction", which should give the time spent e.g. in library calls
+(right now it seems to be a bit larger than that; to investigate).
+Everything else is overhead of various forms.  (Short-, medium- and
+long-term future work involves reducing this overhead :-)
+
+These statistics are not printed out for the main thread, for now.
+
+
+Atomic sections
+===============
+
+While one of the goal of pypy-stm is to give a GIL-free but otherwise
+unmodified Python, the other goal is to push for a better way to use
+multithreading.  For this, you (as the Python programmer) get an API
+in the ``__pypy__.thread`` submodule:
+
+* ``__pypy__.thread.atomic``: a context manager (i.e. you use it in
+  a ``with __pypy__.thread.atomic:`` statement).  It runs the whole
+  block of code without breaking the current transaction --- from
+  the point of view of a regular CPython/PyPy, this is equivalent to
+  saying that the GIL will not be released at all between the start and
+  the end of this block of code.
+
+The obvious usage is to use atomic blocks in the same way as one would
+use locks: to protect changes to some shared data, you do them in a
+``with atomic`` block, just like you would otherwise do them in a ``with
+mylock`` block after ``mylock = thread.allocate_lock()``.  This allows
+you not to care about acquiring the correct locks in the correct order;
+it is equivalent to having only one global lock.  This is how
+transactional memory is `generally described`__: as a way to efficiently
+execute such atomic blocks, running them in parallel while giving the
+illusion that they run in some serial order.
+
+.. __: http://en.wikipedia.org/wiki/Transactional_memory
+
+However, the less obvious intended usage of atomic sections is as a
+wide-ranging replacement of explicit threads.  You can turn a program
+that is not multi-threaded at all into a program that uses threads
+internally, together with large atomic sections to keep the behavior
+unchanged.  This capability can be hidden in a library or in the
+framework you use; the end user's code does not need to be explicitly
+aware of using threads.  For a simple example of this, see
+`lib_pypy/transaction.py`_.  The idea is that if you have a program
+where the function ``f(key, value)`` runs on every item of some big
+dictionary, you can replace the loop with::
+
+    for key, value in bigdict.items():
+        transaction.add(f, key, value)
+    transaction.run()
+
+This code runs the various calls to ``f(key, value)`` using a thread
+pool, but every single call is done in an atomic section.  The end
+result is that the behavior should be exactly equivalent: you don't get
+any extra multithreading issue.
+
+.. _`lib_pypy/transaction.py`: 
https://bitbucket.org/pypy/pypy/raw/stmgc-c7/lib_pypy/transaction.py
+
+==================
+
+Other APIs in pypy-stm:
+
+* ``__pypy__.thread.getsegmentlimit()``: return the number of "segments"
+  in this pypy-stm.  This is the limit above which more threads will not
+  be able to execute on more cores.  (Right now it is limited to 4 due
+  to inter-segment overhead, but should be increased in the future.  It
+  should also be settable, and the default value should depend on the
+  number of actual CPUs.)
+
+* ``__pypy__.thread.exclusive_atomic``: same as ``atomic``, but
+  raises an exception if you attempt to nest it inside another
+  ``atomic``.
+
+* ``__pypy__.thread.signals_enabled``: a context manager that runs
+  its block with signals enabled.  By default, signals are only
+  enabled in the main thread; a non-main thread will not receive
+  signals (this is like CPython).  Enabling signals in non-main threads
+  is useful for libraries where threads are hidden and the end user is
+  not expecting his code to run elsewhere than in the main thread.
+
+Note that all of this API is (or will be) implemented in a regular PyPy
+too: for example, ``with atomic`` will simply mean "don't release the
+GIL" and ``getsegmentlimit()`` will return 1.
+
+==================
+
+
+Reference to implementation details
+===================================
+
+The core of the implementation is in a separate C library called stmgc_,
+in the c7_ subdirectory.  Please see the `README.txt`_ for more
+information.
+
+.. _stmgc: https://bitbucket.org/pypy/stmgc/src/default/
+.. _c7: https://bitbucket.org/pypy/stmgc/src/default/c7/
+.. _`README.txt`: https://bitbucket.org/pypy/stmgc/raw/default/c7/README.txt
+
+PyPy itself adds on top of it the automatic placement of read__ and write__
+barriers and of `"becomes-inevitable-now" barriers`__, the logic to
+`start/stop transactions as an RPython transformation`__ and as
+`supporting`__ `C code`__, and the support in the JIT (mostly as a
+`transformation step on the trace`__ and generation of custom assembler
+in `assembler.py`__).
+
+.. __: 
https://bitbucket.org/pypy/pypy/raw/stmgc-c7/rpython/translator/stm/readbarrier.py
+.. __: 
https://bitbucket.org/pypy/pypy/raw/stmgc-c7/rpython/memory/gctransform/stmframework.py
+.. __: 
https://bitbucket.org/pypy/pypy/raw/stmgc-c7/rpython/translator/stm/inevitable.py
+.. __: 
https://bitbucket.org/pypy/pypy/raw/stmgc-c7/rpython/translator/stm/jitdriver.py
+.. __: 
https://bitbucket.org/pypy/pypy/raw/stmgc-c7/rpython/translator/stm/src_stm/stmgcintf.h
+.. __: 
https://bitbucket.org/pypy/pypy/raw/stmgc-c7/rpython/translator/stm/src_stm/stmgcintf.c
+.. __: 
https://bitbucket.org/pypy/pypy/raw/stmgc-c7/rpython/jit/backend/llsupport/stmrewrite.py
+.. __: 
https://bitbucket.org/pypy/pypy/raw/stmgc-c7/rpython/jit/backend/x86/assembler.py
_______________________________________________
pypy-commit mailing list
[email protected]
https://mail.python.org/mailman/listinfo/pypy-commit

Reply via email to