Author: Armin Rigo <[email protected]>
Branch: extradoc
Changeset: r4300:ba2bd6071a72
Date: 2012-07-14 01:09 +0200
http://bitbucket.org/pypy/extradoc/changeset/ba2bd6071a72/

Log:    tweaks

diff --git a/blog/draft/stm-jul2012.rst b/blog/draft/stm-jul2012.rst
--- a/blog/draft/stm-jul2012.rst
+++ b/blog/draft/stm-jul2012.rst
@@ -64,9 +64,10 @@
 performing some "mostly-independent" work on each value.  By using the
 technique described here, putting each piece of work in one "block"
 running in one thread of a pool, we get exactly the same effect: the
-pieces of work still appear to run in some global serialized order, but
-the order is random (as it is anyway when iterating over the keys of a
-dictionary).
+pieces of work still appear to run in some global serialized order, but
+a random one (as it is anyway when iterating over the keys of a
+dictionary).  (There are even techniques building on top of AME that can
+be used to force the order of the blocks, if needed.)
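A minimal sketch of this pattern, assuming pypy-stm's ``thread.atomic`` as described in the post (with a hypothetical no-op fallback so the sketch runs, without any STM guarantee, on other interpreters):

```python
import contextlib
import threading
from queue import Queue, Empty

# ``thread.atomic`` only exists on pypy-stm; this fallback is a
# hypothetical stand-in so the sketch runs anywhere.
try:
    from thread import atomic          # pypy-stm only (assumption)
except ImportError:
    atomic = contextlib.nullcontext()

def process_values(d, work, nthreads=4):
    """Run ``work(value)`` for each value of ``d``, one block per key.

    Each piece of work is one "block" running in one thread of a pool:
    the pieces still appear to run in some global serialized order, but
    a random one, just like dictionary iteration order.
    """
    keys = Queue()
    for k in d:
        keys.put(k)
    results = {}

    def worker():
        while True:
            try:
                k = keys.get_nowait()
            except Empty:
                return
            with atomic:   # one block; aborted and restarted on conflict
                results[k] = work(d[k])

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```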
 
 
 PyPy and STM
@@ -79,8 +80,8 @@
 we have blocks that are specified explicitly by the programmer using
 ``with thread.atomic:``.  The latter gives typically long-running
 blocks.  It allows us to build the higher-level solution sought after:
-we will run most of our Python code in multiple threads but always
-within a ``thread.atomic``.
+it will run most of our Python code in multiple threads but always
+within a ``thread.atomic`` block, e.g. using a pool of threads.
 
 This gives the nice illusion of a global serialized order, and thus
 gives us a well-behaving model of our program's behavior.  The drawback
@@ -89,10 +90,15 @@
 the execution of one block of code to be aborted and restarted.
 Although the process is transparent, if it occurs more than
 occasionally, then it has a negative impact on performance.  We will
-need better tools to deal with them.  The point here is that at all
-stages our program is *correct*, while it may not be as efficient as it
-could be.  This is the opposite of regular multithreading, where
-programs are efficient but not as correct as they could be...
+need better tools to deal with them.  The point here is that at any
+stage of this "improvement" process our program is *correct*, while it
+may not yet be as efficient as it could be.  This is the opposite of
+regular multithreading, where programs are efficient but not as correct
+as they could be.  (And since you only have resources to do the easy 80%
+of the work and not the remaining hard 20%, you end up with a program
+that reaches 80% of the theoretical maximum performance, which is fine;
+whereas with regular multithreading, you are left with the most obscure
+20% of the original bugs.)
 
 
 CPython and HTM
@@ -114,9 +120,9 @@
 The issue with the first two solutions is the same one: they are meant
 to support small-scale transactions, but not long-running ones.  For
 example, I have no clue how to give GCC rules about performing I/O in a
-transaction; and moreover looking at the STM library that is available
-so far to be linked with the compiled program, it assumes short
-transactions only.
+transaction --- this seems not to be supported at all; and moreover,
+looking at the STM library that is available so far to be linked with
+the compiled program, it assumes short transactions only.
 
 Intel's HTM solution is both more flexible and more strictly limited.
 In one word, the transaction boundaries are given by a pair of special
@@ -140,21 +146,21 @@
 
 So what does it mean?  A Python interpreter overflows the L1 cache of
 the CPU very quickly: just creating new Python function frames takes a
-lot of memory (the order of magnitude is smaller than 100 frames).  This
-means that as long as the HTM support is limited to L1 caches, it is not
-going to be enough to run an "AME Python" with any sort of
-medium-to-long transaction (running for 0.01 second or longer).  It can
-run a "GIL-less Python", though: just running a few dozen bytecodes at a
-time should fit in the L1 cache, for most bytecodes.
+lot of memory (on the order of 1/100 of the whole L1
+cache).  This means that as long as the HTM support is limited to L1
+caches, it is not going to be enough to run an "AME Python" with any
+sort of medium-to-long transaction (running for 0.01 second or longer).
+It can run a "GIL-less Python", though: just running a few dozen
+bytecodes at a time should fit in the L1 cache, for most bytecodes.
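The estimate above can be checked with quick back-of-the-envelope arithmetic; both figures below are rough assumptions (a common L1 data cache size and a guess at the memory one interpreter frame touches), not measurements:

```python
# Rough assumptions, not measurements:
L1_DCACHE_BYTES = 32 * 1024   # a common L1 data cache size
FRAME_BYTES = 400             # order-of-magnitude guess per Python frame

# One frame is on the order of 1/100 of the L1 cache, so fewer than a
# hundred frames already overflow it.
frame_fraction_of_l1 = FRAME_BYTES / L1_DCACHE_BYTES
frames_per_l1 = L1_DCACHE_BYTES // FRAME_BYTES
```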
 
 
 Write your own STM for C
 ------------------------
 
 Let's discuss now the third option: if neither GCC 4.7 nor HTM are
-sufficient for CPython, then this third choice would be to write our own
-C compiler patch (as either extra work on GCC 4.7, or an extra pass to
-LLVM, for example).
+sufficient for an "AME CPython", then this third choice would be to
+write our own C compiler patch (as either extra work on GCC 4.7, or an
+extra pass to LLVM, for example).
 
 We would have to deal with the fact that we get low-level information,
 and somehow need to preserve interesting high-level bits through the
@@ -165,8 +171,8 @@
 against other threads modifying them.)  We can also have custom code to
 handle the reference counters: e.g. not consider it a conflict if
 multiple transactions have changed the same reference counter, but just
-resolve it automatically at commit time.  We can also choose what to do
-with I/O.
+resolve it automatically at commit time.  We are also free to handle I/O
+however we want.
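The commit-time resolution of reference counters could be sketched as follows: each transaction buffers per-object *deltas* instead of reading and writing the shared counter, and since deltas commute, concurrent commits never need to abort each other. All names here are illustrative, not taken from any real STM library.

```python
import threading

class RefCounted:
    """An object with a shared reference counter."""
    def __init__(self):
        self.refcount = 0
        self._lock = threading.Lock()  # stands in for the commit machinery

class Transaction:
    """Buffers refcount changes as deltas, applied only at commit time.

    Two transactions changing the same counter are not a conflict: their
    deltas commute, so the commits can be applied in either order.
    """
    def __init__(self):
        self._deltas = {}              # object -> buffered delta

    def incref(self, obj):
        self._deltas[obj] = self._deltas.get(obj, 0) + 1

    def decref(self, obj):
        self._deltas[obj] = self._deltas.get(obj, 0) - 1

    def commit(self):
        for obj, delta in self._deltas.items():
            with obj._lock:            # atomic add; no read inside the txn
                obj.refcount += delta
        self._deltas.clear()
```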
 
 More generally, the advantage of this approach over the current GCC 4.7
 is that we control the whole process.  While this still looks like a lot
@@ -176,10 +182,11 @@
 Conclusion?
 -----------
 
-I would assume that a programming model specific to PyPy has little
-chances to catch on, as long as PyPy is not the main Python interpreter
-(which looks unlikely to occur anytime soon).  Thus as long as only PyPy
-has STM, I would assume that using it would not become the main model of
-multicore usage in Python.  However, I can conclude with a more positive
-note than during EuroPython: there appears to be a reasonable way
-forward to have an STM version of CPython too.
+I would assume that a programming model specific to PyPy and not
+applicable to CPython has little chance of catching on, as long as PyPy
+is
+not the main Python interpreter (which looks unlikely to occur anytime
+soon).  Thus as long as only PyPy has STM, it looks like it will not
+become the main model of multicore usage in Python.  However, I can
+conclude with a more positive note than during EuroPython: there appears
+to be a more-or-less reasonable way forward to have an STM version of
+CPython too.
_______________________________________________
pypy-commit mailing list
[email protected]
http://mail.python.org/mailman/listinfo/pypy-commit
