Author: Remi Meier <[email protected]>
Branch: extradoc
Changeset: r5254:9d6ae8b1bf95
Date: 2014-05-16 10:25 +0200
http://bitbucket.org/pypy/extradoc/changeset/9d6ae8b1bf95/

Log:    adjust some details

diff --git a/talk/dls2014/paper/paper.tex b/talk/dls2014/paper/paper.tex
--- a/talk/dls2014/paper/paper.tex
+++ b/talk/dls2014/paper/paper.tex
@@ -205,7 +205,7 @@
 semantics as using the GIL
 while still allowing the TM system to
 run transactions in parallel as an optimisation.
-
+\remi{maybe some more explanation of how exactly TM replaces the GIL}
 
 \subsection{Python}
 
@@ -230,8 +230,8 @@
 The second approach, \emph{multiprocessing}, uses multiple instances
 of the interpreter itself and runs them in separate OS processes.
 Here we actually get parallelism because we have one GIL per
-interpreter, but of course we have the overhead of multiple processes
-/ interpreters and also need to exchange data between them explicitly
+interpreter, but of course we have the overhead of multiple processes~/
+interpreters and also need to exchange data between them explicitly
 and expensively.
 
 We focus on the \emph{threading} approach. This requires us to remove
@@ -351,7 +351,7 @@
 an object's offset inside a segment is the same in all segments, we
 can use this offset to reference objects. Because all segments are
 copies of each other, this \emph{Segment Offset ($SO$)} points to the
-private version of an object in all threads\,/\,segments. To then
+private version of an object in all threads~/ segments. To then
 translate this $SO$ to a real virtual memory address when used inside a
 thread, we need to add the thread's segment start address to the
 $SO$. The result of this operation is called a \emph{Linear Address
@@ -392,11 +392,13 @@
 \ref{fig:mmap()-Page-Mapping}, \lstinline!mmap()! creates a mapping
 between a range of virtual memory pages and virtual file pages. The
 virtual file pages are then mapped lazily by the kernel to real
-physical memory pages. The mapping generated by \lstinline!mmap()! is
-initially linear but can be changed arbitrarily. Especially, we can
-remap so that multiple virtual memory pages map to a single virtual
-file page. This is what we use to share memory between the segments
-since then we also only require one page of physical memory.
+physical memory pages.
+
+The mapping generated by \lstinline!mmap()! is initially linear but
+can be changed arbitrarily. In particular, we can remap so that multiple
+virtual memory pages map to a single virtual file page. This is what
+we use to share memory between the segments, since we then only
+require one page of physical memory for all of them.
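+
+The following sketch illustrates such a remapping on Linux, using the
+\lstinline!remap_file_pages()!  system call that we revisit below
+(error handling omitted):
+
+\begin{lstlisting}
+#define _GNU_SOURCE
+#include <sys/mman.h>
+#define PAGE_SIZE 4096UL
+
+void share_three_pages(void)
+{
+    char *region = mmap(NULL, 3 * PAGE_SIZE,
+                        PROT_READ | PROT_WRITE,
+                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+    /* initially: virtual page i -> file page i (linear) */
+    remap_file_pages(region + 1 * PAGE_SIZE, PAGE_SIZE, 0, 0, 0);
+    remap_file_pages(region + 2 * PAGE_SIZE, PAGE_SIZE, 0, 0, 0);
+    /* all three virtual pages now map to file page 0 and
+       share a single page of physical memory */
+}
+\end{lstlisting}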
 
 \begin{figure}[h]
   \begin{centering}
@@ -409,16 +411,19 @@
 
 As illustrated in Figure \ref{fig:Page-Remapping}, in our initial
 configuration (I) all segments are backed by their own range of
-virtual file pages. This is the share-nothing configuration.
+virtual file pages. This is the share-nothing configuration where
+all threads have private versions of all objects.
 
-We then designate segment 0 to be the \emph{Sharing-Segment}. No
+We then designate segment~0 to be the \emph{sharing-segment}. No
 thread gets this segment assigned to it; it simply holds the pages
-shared between all threads. So in (II), we remap all virtual pages of
-the segments $>0$ to the file pages of our sharing-segment. This is
-the fully-shared configuration.
+shared between threads. So in step (II), we remap all virtual pages of
+the segments~$>0$ to the file pages of our sharing-segment. This is
+the fully-shared configuration where no threads have private versions
+of any objects.
 
-During runtime, we can then privatise single pages in segments $>0$
-again by remapping single pages as seen in (III).
+During runtime, we can then privatise single pages in segments~$>0$
+again by remapping them individually, as seen in (III). All objects in that
+page now have a private version in some thread.
 
 Looking back at address translation for object references, we see now
 that this is actually a two-step process. First, $\%gs{::}SO$ gets
@@ -426,14 +431,15 @@
 CPU. Then, depending on the current mapping of virtual pages to file
 pages, these LAs can map to a single file page in the sharing-segment,
 or to privatised file pages in the corresponding segments. This
-mapping is also performed efficiently by the CPU and can easily be
-done on every access to an object.
+mapping is also performed efficiently by CPUs that have a Memory
+Management Unit (MMU) and can easily be done on every access to an
+object.
 
 In summary, $\%gs{::}SO$ is translated efficiently by the CPU to
 either a physical memory location which is shared between several
-threads/segments, or to a location in memory private to the
-segment/thread.  This makes the memory segmentation model for
-isolation memory efficient again.
+threads~/ segments or to a location in memory private to the segment~/
+thread. Page sharing makes the memory segmentation model for isolation
+memory-efficient again.
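+
+Conceptually, the translation amounts to the following address
+arithmetic (a sketch; in reality both steps are performed implicitly
+by the CPU's segmentation and paging hardware on every access):
+
+\begin{lstlisting}
+#include <stdint.h>
+
+/* translating %gs::SO "by hand" */
+char *translate(uintptr_t segment_start, uintptr_t SO)
+{
+    uintptr_t la = segment_start + SO; /* step 1: segmentation */
+    return (char *)la;  /* step 2: the MMU maps la to the shared
+                           file page or to a privatised page */
+}
+\end{lstlisting}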
 
 \begin{figure}[h]
   \begin{centering}
@@ -441,7 +447,7 @@
     \par\end{centering}
 
     \protect\caption{Page Remapping: (I) after \texttt{mmap()}. (II) remap all pages to
-      segment 0, fully shared memory configuration. (III) privatise single
+      segment~0, fully shared memory configuration. (III) privatise single
       pages.\label{fig:Page-Remapping}}
 \end{figure}
 
@@ -457,12 +463,12 @@
 that all pages belonging to the object are private to our segment.
 
 To detect when to privatise pages, we use write barriers before every
-write. When the barrier detects that the object is not in a private
-page (or any pages that belong to the object), we remap and copy the
-pages to the thread's segment. From now on, the translation of
-$\%gs{::}SO$ in this particular segment will resolve to the private
-version of the object. Note, the $SO$ used to reference the object does
-not change during that process.
+write to an object. When the barrier detects that not all pages
+belonging to the object are private, we remap and
+copy the pages to the thread's segment. From now on, the translation
+of $\%gs{::}SO$ in this particular thread will resolve to a private
+version of the object automatically. Note that the $SO$ used to reference
+the object does not change during that process.
 
 
 
@@ -470,13 +476,15 @@
 
 The job of barriers is to ensure complete isolation between transactions
 and to register the objects in the read or write set. We insert read
-and write barriers before reading or modifying an object except if
+and write barriers before reading or modifying an object, except if
 we statically know an object to be readable or writable already.
 \begin{description}
 \item [{Read~Barrier:}] Adds the object to the read set of the current
   transaction. Since our two-step address translation automatically
   resolves the reference to the private version of the object on every
-  access anyway, the read barrier does not need to do address translation anymore.
+  access anyway, the read barrier does not need to find the
+  private version itself. That job is performed entirely by the CPU,
+  so read barriers have very little work to do (see the sketch
+  following this list).
 \item [{Write~Barrier:}] Adds the object to the read and write set of
   the current transaction and checks if all pages of the object are
   private, doing COW otherwise.\\
@@ -484,9 +492,10 @@
   object at a time. To ensure this, we acquire a write lock on the object
   and also eagerly check for a write-write conflict at this point. If
   there is a conflict, we do some contention management to decide which
-  transaction has to wait or abort. Eagerly detecting this kind of conflict
-  is not inherent to our system, future experiments may show that we
-  want to lift this restriction.
+  transaction has to wait or abort.\\
+  Eagerly detecting this kind of conflict is not inherent to our
+  system; future experiments may show that we want to lift this
+  restriction.
 \end{description}
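+
+For illustration, a VM inserting these barriers would transform an
+attribute access roughly as follows (a sketch; \lstinline!node_t!  is
+a hypothetical object layout):
+
+\begin{lstlisting}
+stm_read(obj);                /* register obj in the read set */
+long v = ((node_t *)obj)->value;
+
+stm_write(obj);               /* read+write set, lock, COW */
+((node_t *)obj)->value = v + 1;
+\end{lstlisting}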
 
 
@@ -543,13 +552,16 @@
 \subsubsection{Architecture}
 
 Our TM system is designed as a library that covers all aspects around
-transactions and object management. The library consists of two parts:
-(I) It provides a simple interface to starting and committing
-transactions, as well as the required read and write barriers. (II) It
-also includes a \emph{garbage collector (GC)} that is closely
-integrated with the TM part (e.g. it shares the write barrier). The
-close integration helps in order to know more about the lifetime of an
-object, as will be explained in the following sections.
+transactions and object management. It is designed for object-oriented
+dynamic language VMs as a replacement for the GIL.
+
+The library consists of two parts: (I) It provides a simple interface
+to starting and committing transactions, as well as the required read
+and write barriers. (II) It also includes a \emph{garbage collector
+(GC)} that is closely integrated with the TM part (e.g. it shares the
+write barrier). The close integration helps in order to know more
+about the lifetime of an object, as will be explained in the following
+sections.
 
 
 \subsubsection{Application Programming Interface\label{sub:Application-Programming-Interfac}}
@@ -571,8 +583,8 @@
 \lstinline!stm_commit_transaction()!  tries to commit the current
 transaction. \lstinline!stm_read()!, \lstinline!stm_write()!  perform
 a read or a write barrier on an object and \lstinline!stm_allocate()!
-allocates a new object with the specified size (must be a multiple of
-16). \lstinline!STM_PUSH_ROOT()!  and \lstinline!STM_POP_ROOT()!  push
+allocates a new object with the specified size.
+ \lstinline!STM_PUSH_ROOT()!  and \lstinline!STM_POP_ROOT()!  push
 and pop objects on the shadow stack~\footnote{A stack for pointers to
   GC objects that allows for precise garbage collection. All objects
   on that stack are never seen as garbage and are thus always kept
@@ -583,8 +595,8 @@
 require saving object references.
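+
+A typical use of this interface looks as follows (a sketch: argument
+details such as the thread-local state are elided, and
+\lstinline!node_t!  is again a hypothetical object layout):
+
+\begin{lstlisting}
+stm_start_transaction();
+
+STM_PUSH_ROOT(parent);              /* allocation may collect   */
+object_t *child = stm_allocate(32); /* size: a multiple of 16   */
+STM_POP_ROOT(parent);               /* reload possibly-moved root */
+
+stm_write(parent);                  /* barrier before modifying */
+((node_t *)parent)->child = child;
+
+stm_commit_transaction();
+\end{lstlisting}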
 
 The type \lstinline!object_t!  is special as it causes the
-compiler~\footnote{Clang 3.5 with some patches to this address-space
-  256 feature} to make all accesses through it relative to the $\%gs$
+compiler~\footnote{Clang 3.5 with some patches to its address-space
+ 256 feature} to make all accesses through it relative to the $\%gs$
 register.  With exceptions, nearly all accesses to objects managed by
 the TM system should use this type so that the CPU will translate the
 reference to the right version of the object.
@@ -594,7 +606,7 @@
 
 On startup, we reserve a big range of virtual memory with a call to
 \lstinline!mmap()! and partition this space into $N+1$ segments. We
-want to run $N$ threads in parallel while segment 0 is designated as
+want to run $N$ threads in parallel while segment~0 is designated as
 the \emph{sharing-segment} that is never assigned to a thread.
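+
+A sketch of this startup step (error handling omitted; constants and
+names illustrative):
+
+\begin{lstlisting}
+#define _GNU_SOURCE
+#include <sys/mman.h>
+
+#define NB_SEGMENTS  8           /* N: the number of threads  */
+#define SEGMENT_SIZE (1UL << 30) /* illustrative segment size */
+
+char *stm_object_pages;          /* segment 0 starts here     */
+
+void setup_segments(void)
+{
+    stm_object_pages = mmap(NULL, (NB_SEGMENTS + 1) * SEGMENT_SIZE,
+                            PROT_READ | PROT_WRITE,
+                            MAP_SHARED | MAP_ANONYMOUS | MAP_NORESERVE,
+                            -1, 0);
+    /* thread i > 0 works in the segment starting at
+       stm_object_pages + i * SEGMENT_SIZE */
+}
+\end{lstlisting}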
 
 The next step involves using \lstinline!remap_file_pages()!, a Linux
@@ -660,7 +672,8 @@
 
 Garbage collection plays a big role in our TM system. The GC is
 generational and has two generations: the \emph{young} and the
-\emph{old} generation.
+\emph{old} generation. It is optimised for dynamic languages with
+high allocation rates.
 
 The \textbf{young generation}, where objects are considered to be
 \emph{young} and reside in the \emph{Nursery}, is collected by
@@ -679,7 +692,8 @@
 the old object space with an \lstinline!overflow_number!  globally
 unique to the current transaction. That way we can still detect in a
 medium-fast path inside barriers that the object still belongs to the
-current transaction.
+current transaction. \remi{so this is where we mention how the GC-STM
+integration is useful. highlight more or move to own section?}
 
 The \textbf{old generation}, where objects are considered to be
 \emph{old} and never move again, is collected by \emph{major
@@ -696,7 +710,8 @@
 shadow stack using \lstinline!STM_PUSH_ROOT()!.  That way, they will
 not be freed. And in case they were young, we get their new location
 in the old object space when getting them back from the stack using
-\lstinline!STM_POP_ROOT()!.
+\lstinline!STM_POP_ROOT()!. \remi{cite something which explains
+shadowstacks in more detail}
 
 
 
@@ -717,12 +732,12 @@
 
 This area can be seen as a contiguous array of bytes that is indexed
 from the start of the segment by an object's reference ($SO$) divided
-by 16 (this is where the requirement of objects to be of at least 16
-bytes in size comes from). Instead of just setting the byte to
-\lstinline!true!  if the corresponding object was read, we set it to a
-\lstinline!read_version!  belonging to the transaction, which will be
-incremented on each commit.  Thereby, we can avoid resetting the bytes
-to \lstinline!false!  on commit and only need to do this every 255
+by 16 (this requires object sizes to be a multiple of 16).
+Instead of just setting the byte to \lstinline!true!  if the
+corresponding object was read, we set it to a \lstinline!read_version!
+belonging to the transaction, which will be incremented on each
+commit.  Thereby, we can avoid resetting the bytes to
+\lstinline!false!  on commit and only need to do this every 255
 transactions. The whole code for the barrier is easily optimisable for
 compilers as well as perfectly predictable for CPUs.
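+In essence, it is a single unconditional store (a sketch with
+illustrative names):
+
+\begin{lstlisting}
+#include <stdint.h>
+extern uint8_t read_markers[];  /* %gs-relative: per segment */
+extern uint8_t read_version;
+
+static inline void stm_read(object_t *obj)
+{
+    read_markers[(uintptr_t)obj / 16] = read_version;
+}
+\end{lstlisting}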
 
@@ -736,8 +751,8 @@
 \subsubsection{Write Barrier}
 
 The job of the write barrier is twofold: first, it serves as a write
-barrier for the garbage collector and second, it supports
-copy-on-write and adds objects to the write set of the transaction.
+barrier for the garbage collector and second, it supports COW and adds
+objects to the write set of the transaction.
 
 The \textbf{fast path} of the write barrier is very simple. We only
 need to check for the flag \lstinline!WRITE_BARRIER!  in the object's
@@ -789,29 +804,31 @@
 reference to it that points to a young object. We then need to trace
 it during the next minor collection in order to mark the young object
 alive and to update its reference to the new location it gets moved
-to. The check for \lstinline!is_overflow_obj()!  tells us if the
-object was actually created in this transaction. In that case, we do
-not need to execute the following \emph{TM part}.  We especially do
-not need to privatise the page since no other transaction knows about
-these ``old'' objects.
+to.
 
-For TM, we first perform a read barrier on the object. We then try to
-acquire its write lock. \lstinline!write_locks!  again is a simple
-global array of bytes that is indexed with the $SO$ of the object
-divided by 16. If we already own the lock, we are done.  If someone
-else owns the lock, we will do a write-write contention management
-that will abort either us or the current owner of the object.  If we
-succeed in acquiring the lock using an atomic
+The check for \lstinline!is_overflow_obj()! looks at the
+\lstinline!overflow_number!  and tells us if the object was actually
+created in this transaction. In that case, we do not need to execute
+the following \emph{TM part}. In particular, we do not need to privatise
+its pages since no other transaction knows about these overflow
+objects. Even if they reside in non-private pages, it is guaranteed
+that no other transaction can have a reference to them.
+
+For the \emph{TM part}, we first perform a read barrier on the
+object. We then try to acquire its write lock. \lstinline!write_locks!
+again is a simple global array of bytes that is indexed with the $SO$
+of the object divided by 16. If we already own the lock, we are done.
+If someone else owns the lock, we will do a write-write contention
+management that will abort either us or the current owner of the
+object.  If we succeed in acquiring the lock using an atomic
 \lstinline!cmp_and_swap!, we need to add the object to the write set
 (a simple list called \lstinline!modified_old_objects!)  and privatise
 all pages belonging to it (copy-on-write).
 
 In all cases, we remove the \lstinline!WRITE_BARRIER!  flag from the
 object before we return. Thus, we never trigger the slow path again
-before we do the next minor collection (also part of a commit) or we
-start the next transaction.
-
-
+before we do the next minor collection or we start the next
+transaction (we always do a minor collection during a commit).
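+
+Putting the pieces together, the slow path looks roughly as follows
+(a sketch; names are illustrative and global declarations such as
+the lists and \lstinline!write_locks!  are elided):
+
+\begin{lstlisting}
+#include <stdint.h>
+
+void write_slowpath(object_t *obj)
+{
+    /* GC part: obj may soon point to young objects, so
+       trace it during the next minor collection */
+    list_append(objects_pointing_to_nursery, obj);
+
+    if (!is_overflow_obj(obj)) {
+        /* TM part: a write is also a read */
+        stm_read(obj);
+        uint8_t *lock = &write_locks[(uintptr_t)obj / 16];
+        while (*lock != my_lock_id) {
+            if (cmp_and_swap(lock, 0, my_lock_id)) {
+                list_append(modified_old_objects, obj);
+                privatise_object_pages(obj); /* copy-on-write */
+            } else {
+                write_write_contention(obj); /* aborts us or
+                                                the owner */
+            }
+        }
+    }
+    obj->flags &= ~WRITE_BARRIER; /* fast path succeeds until the
+                                     next minor collection */
+}
+\end{lstlisting}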
 
 
 \subsubsection{Abort}
@@ -823,7 +840,7 @@
 sharing-segment. What is left is to use \lstinline!longjmp()!  to jump
 back to the location initialised by a \lstinline!setjmp()!  in
 \lstinline!stm_start_transaction()!.  Increasing the
-\lstinline!read_version!  is also done there.
+\lstinline!read_version! for the next transaction is also done there.
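+
+The control flow around abort can be sketched as follows (the
+\lstinline!setjmp()!  is shown inline in the caller for clarity;
+names are illustrative):
+
+\begin{lstlisting}
+#include <setjmp.h>
+
+jmp_buf jmpbuf;
+
+void run_transaction(void)
+{
+    setjmp(jmpbuf);            /* aborts restart from here    */
+    stm_start_transaction();   /* also increases read_version */
+    /* ... transactional work with barriers ... */
+    stm_commit_transaction();
+}
+
+void abort_transaction(void)
+{
+    reset_modified_objects();  /* copy back from the
+                                  sharing-segment */
+    longjmp(jmpbuf, 1);        /* back into run_transaction() */
+}
+\end{lstlisting}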
 
 
 
@@ -832,9 +849,9 @@
 
 Committing a transaction needs a bit more work. First, we synchronise
 all threads so that the committing one is the only one running and all
-the others are waiting in a safe point. We then go through the write
+the others are waiting in safe points. We then go through the write
 set (\lstinline!modified_old_objects!)  and check the corresponding
-\lstinline!read_markers!  in other threads/segments. If we detect a
+\lstinline!read_markers!  in other threads~/ segments. If we detect a
 read-write conflict, we do contention management to either abort us or
 the other transaction, or to simply wait a bit (see \ref{subsub:contentionmanagement}).
 
@@ -844,7 +861,7 @@
 threads. We also need to push overflow objects generated by minor
 collections to other segments, since they may reside partially in
 private pages. At that point we also get a new
-\lstinline!overflow_number!  by increasing a global one, so that it
+\lstinline!overflow_number! by increasing a global one, so that it
 stays globally unique for each transaction. Increasing the
 \lstinline!read_version!  is then done at the start of a new
 transaction.
@@ -882,10 +899,10 @@
 
 If we want to synchronise all threads, we can rely on this check being
 performed regularly. So we set the
-\lstinline!nursery_end!  to $0$ in all segments that we want to
-synchronise. The mentioned check will then fail in those segments and
-call the slow path. In \lstinline!allocate_slowpath!  they can simply
-check for this condition and enter a safe point.
+\lstinline!nursery_end!  to some small number in all segments that we
+want to synchronise. The mentioned check will then fail in those
+segments and call the slow path. In \lstinline!allocate_slowpath!
+they can simply check for this condition and enter a safe point.
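+
+A sketch of the corresponding fast and slow paths (names
+illustrative):
+
+\begin{lstlisting}
+#include <stddef.h>
+#include <stdint.h>
+
+extern char     *nursery_current;  /* per segment, %gs-relative */
+extern uintptr_t nursery_end;
+object_t *allocate_slowpath(size_t size);
+
+object_t *stm_allocate(size_t size)
+{
+    char *p = nursery_current;
+    nursery_current = p + size;
+    if ((uintptr_t)nursery_current > nursery_end)
+        return allocate_slowpath(size); /* also taken when another
+                thread shrank nursery_end to request a safe point */
+    return (object_t *)p;
+}
+
+object_t *allocate_slowpath(size_t size)
+{
+    if (nursery_end <= NSE_SIGNAL_MAX) { /* illustrative constant:
+                                            synchronisation request */
+        enter_safe_point();       /* wait; nursery_end is restored */
+        return stm_allocate(size);
+    }
+    minor_collection();                  /* normal case: nursery full */
+    return stm_allocate(size);
+}
+\end{lstlisting}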
 
 For other synchronisation requirements, for example:
 \begin{itemize}[noitemsep]