Author: Armin Rigo <[email protected]>
Branch: extradoc
Changeset: r5083:51a7f13ee794
Date: 2013-10-15 11:31 +0200
http://bitbucket.org/pypy/extradoc/changeset/51a7f13ee794/

Log:    Complete the last section

diff --git a/blog/draft/incremental-gc.rst b/blog/draft/incremental-gc.rst
--- a/blog/draft/incremental-gc.rst
+++ b/blog/draft/incremental-gc.rst
@@ -124,72 +124,106 @@
 Nitty gritty details
 ====================
 
-XXX insert some links where you can read about terms used
+Here are some nitty gritty details for people really interested in
+Garbage Collection.  This was done as a patch to "minimark", our current
+GC, and called "incminimark" for now.  The former is a generational
+stop-the-world GC.  New objects are allocated "young", which means that
+they initially live in the "nursery", a special zone of a few MB of
+memory.  When the nursery is full, a "minor collection" step moves the
+surviving objects out of the nursery.  This can be done quickly (a few
+millisecond) because we only need to walk through the young objects that
+survive --- usually a small fraction of all young objects; and also by
+far not *all* objects that are alive at this point, but only the young
+ones.  However, from time to time this minor collection is followed by a
+"major collection": in that step, we really need to walk all objects to
+classify which ones are still alive and which ones are now dead
+("marking") and free the memory occupied by the dead ones ("sweeping").
+You can read more details here__.
 
-This was done as a patch to "minimark", our current GC, and called
-"incminimark" for now.  The former is a generational stop-the-world GC.
-New objects are allocated "young", i.e. in the nursery, a special zone
-of a few MB of memory.  When it is full, a "minor collection" step moves
-the surviving objects out of the nursery.  This can be done quickly (a
-few millisecond at most) because we only need to walk through the young
-objects that survive --- usually a small fraction of all young objects.
-From time to time, this minor collection is followed by a "major
-collection": in that step, we walk *all* objects to classify which ones
-are still alive and which ones are now dead (*marking*) and free the
-memory occupied by the dead ones (*sweeping*).
+.. __: http://doc.pypy.org/en/latest/garbage_collection.html#minimark-gc
 
 This "major collection" is what gives the long GC pauses.  To fix this
 problem we made the GC incremental: instead of running one complete
-major collection, we split its work into a variable number of pieces
-and run each piece after every minor collection for a while, until there
-are no more pieces.  The pieces are each doing a fraction of marking, or
-a fraction of sweeping.
+major collection, we split its work into a variable number of pieces and
+run each piece after every minor collection for a while, until there are
+no more pieces.  The pieces are each doing a fraction of marking, or a
+fraction of sweeping.  It adds some few milliseconds after each of these
+minor collections, rather than requiring hundreds of milliseconds in one
+go.
 
 The main issue is that splitting the major collections means that the
-main program is actually running between the pieces, and so can change
-the pointers in the objects to point to other objects.  This is not
-a problem for sweeping: dead objects will remain dead whatever the main
-program does.  However, it is a problem for marking.  Let us see why.
+main program is actually running between the pieces, and so it can
+change the pointers in the objects to point to other objects.  This is
+not a problem for sweeping: dead objects will remain dead whatever the
+main program does.  However, it is a problem for marking.  Let us see
+why.
+
+.. __: http://rubini.us/2013/06/22/concurrent-garbage-collection/
+.. __: 
http://wiki.luajit.org/New-Garbage-Collector/01fd5e5ca4f95d45e0c4b8a98b49f2b656cc23dd
 
 In terms of the incremental GC literature, objects are either "white",
-"gray" or "black".  They start as "white", become "gray" when they are
-found to be alive, and become "black" when they have been fully
-traversed --- at which point the objects that it points to have
-themselves been marked gray, or maybe are already black.  The gray
-objects are the "frontier" between the black objects that we have found
-to be reachable, and the white objects that represent the unknown part
-of the world.  When there are no more gray objects, the process is
-finished: all remaining white objects are unreachable and can be freed
-(by the following sweeping phase).
+"gray" or "black".  This is called *tri-color marking.*  See for example
+this `blog post about Rubinius`__, or this `page about LuaJIT`__.  The
+objects start as "white" at the beginning of marking; become "gray" when
+they are found to be alive; and become "black" when they have been fully
+traversed.  Marking proceeds by scanning grey objects for pointers to
+white objects.  The white objects found are turned grey, and the grey
+objects scanned are turned black.  When there are no more grey objects,
+the marking phase is complete: all remaining white objects are truly
+unreachable and can be freed (by the following sweeping phase).
 
 In this model, the important part is that a black object can never point
 to a white object: if the latter remains white until the end, it will be
 freed, which is incorrect because the black object itself can still be
-reached.
+reached.  How do we ensure that the main program, running in the middle
+of marking, will not try to write a pointer to white object into a black
+object?  This requires a "write barrier", i.e. a piece of code that runs
+every time we set a pointer into an object or array.  This piece of code
+checks if some (hopefully rare) condition is met, and calls a function
+if that is the case.
 
 The trick we used in PyPy is to consider minor collections as part of
 the whole, rather than focus only on major collections.  The existing
-minimark GC had always used a "write barrier" (a piece of code run every time
-you set or get a pointer from an object or array) to do its job, like any
-generational GC.  This write barrier is used to detect when an old
-object (outside the nursery) is modified to point to a young object
-(inside the nursery), which is essential information for minor
+minimark GC had always used a write barrier of its own to do its job,
+like any generational GC.  This existing write barrier is used to detect
+when an old object (outside the nursery) is modified to point to a young
+object (inside the nursery), which is essential information for minor
 collections.  Actually, although this was the goal, the actual write
-barrier code was simpler: it just recorded all old objects into which we
-wrote *any* pointer --- to a young or old object.  It is actually a
-performance improvement, because we don't need to check over and over
-again if the written pointer points to a young object or not.
+barrier code is simpler: it just records all old objects into which we
+write *any* pointer --- to a young or old object.  As we found out over
+time, doing so is not actually slower, and might actually be a
+performance improvement: for example, if the main program does a lot of
+writes into the same old object, we don't need to check over and over
+again if the written pointer points to a young object or not.  We just
+record the old object in some list the first time, and that's it.
 
-This *unmodified* write barrier works for incminimark too.  Imagine that
-we are in the middle of the marking phase, running the main program.
-The write barrier will record all old objects that are being modified.
-Then at the next minor collection, all surviving young objects will be
-moved out of the nursery.  At this point, as we're about to continue
-running the major collection's marking phase, we simply add to the list
-of pending gray objects all the objects that we consider --- both the
-objects listed as "old objects that are being modified", and the objects
-that we just moved out of the nursery.  A fraction of the former list
-are turned back from the black to the gray color.  This technique
+The trick is that this *unmodified* write barrier works for incminimark
+too.  Imagine that we are in the middle of the marking phase, running
+the main program.  The write barrier will record all old objects that
+are being modified.  Then at the next minor collection, all surviving
+young objects will be moved out of the nursery.  At this point, as we're
+about to continue running the major collection's marking phase, we
+simply add to the list of pending gray objects all the objects that we
+just considered --- both the objects listed as "old objects that are
+being modified", and the objects that we just moved out of the nursery.
+A fraction from the former list were black object; so this mean that
+they are turned back from the black to the gray color.  This technique
 implements nicely, if indirectly, what is called a "backward write
-barrier" in the literature: the backwardness is about the color that
-occasionally progresses backward from black to gray.
+barrier" in the literature.  The backwardness is about the color that
+needs to be changed in the opposite of the usual direction "white ->
+gray -> black", thus making more work for the GC.  (This is as opposed
+to "forward write barrier", where we would also detect "black -> white"
+writes but turn the white object gray.)
+
+In summary, I realize that this description is less about how we turned
+minimark into incminimark, and more about how we differ from the
+standard way of making a GC incremental.  What we really had to do to
+make incminimark was to write logic that says "if the major collection
+is in the middle of the marking phase, then add this object to the list
+of gray objects", and put it at a few places throughout minor
+collection.  Then we simply split a major collection into increments,
+doing marking or sweeping of some (relatively arbitrary) number of
+objects before returning.  That's why, after we found that the existing
+write barrier would do, it was not much actual work, and could be done
+without major changes.  For example, not a single line from the JIT
+needed adaptation.  All in all it was relatively painless work. ``:-)``
_______________________________________________
pypy-commit mailing list
[email protected]
https://mail.python.org/mailman/listinfo/pypy-commit

Reply via email to