Author: Armin Rigo <[email protected]>
Branch: extradoc
Changeset: r5083:51a7f13ee794
Date: 2013-10-15 11:31 +0200
http://bitbucket.org/pypy/extradoc/changeset/51a7f13ee794/
Log: Complete the last section
diff --git a/blog/draft/incremental-gc.rst b/blog/draft/incremental-gc.rst
--- a/blog/draft/incremental-gc.rst
+++ b/blog/draft/incremental-gc.rst
@@ -124,72 +124,106 @@
Nitty gritty details
====================
-XXX insert some links where you can read about terms used
+Here are some nitty gritty details for people really interested in
+Garbage Collection. This was done as a patch to "minimark", our current
+GC, and called "incminimark" for now. The former is a generational
+stop-the-world GC. New objects are allocated "young", which means that
+they initially live in the "nursery", a special zone of a few MB of
+memory. When the nursery is full, a "minor collection" step moves the
+surviving objects out of the nursery. This can be done quickly (a few
+millisecond) because we only need to walk through the young objects that
+survive --- usually a small fraction of all young objects; and also by
+far not *all* objects that are alive at this point, but only the young
+ones. However, from time to time this minor collection is followed by a
+"major collection": in that step, we really need to walk all objects to
+classify which ones are still alive and which ones are now dead
+("marking") and free the memory occupied by the dead ones ("sweeping").
+You can read more details here__.
-This was done as a patch to "minimark", our current GC, and called
-"incminimark" for now. The former is a generational stop-the-world GC.
-New objects are allocated "young", i.e. in the nursery, a special zone
-of a few MB of memory. When it is full, a "minor collection" step moves
-the surviving objects out of the nursery. This can be done quickly (a
-few millisecond at most) because we only need to walk through the young
-objects that survive --- usually a small fraction of all young objects.
-From time to time, this minor collection is followed by a "major
-collection": in that step, we walk *all* objects to classify which ones
-are still alive and which ones are now dead (*marking*) and free the
-memory occupied by the dead ones (*sweeping*).
+.. __: http://doc.pypy.org/en/latest/garbage_collection.html#minimark-gc
This "major collection" is what gives the long GC pauses. To fix this
problem we made the GC incremental: instead of running one complete
-major collection, we split its work into a variable number of pieces
-and run each piece after every minor collection for a while, until there
-are no more pieces. The pieces are each doing a fraction of marking, or
-a fraction of sweeping.
+major collection, we split its work into a variable number of pieces and
+run each piece after every minor collection for a while, until there are
+no more pieces. The pieces are each doing a fraction of marking, or a
+fraction of sweeping. It adds some few milliseconds after each of these
+minor collections, rather than requiring hundreds of milliseconds in one
+go.
The main issue is that splitting the major collections means that the
-main program is actually running between the pieces, and so can change
-the pointers in the objects to point to other objects. This is not
-a problem for sweeping: dead objects will remain dead whatever the main
-program does. However, it is a problem for marking. Let us see why.
+main program is actually running between the pieces, and so it can
+change the pointers in the objects to point to other objects. This is
+not a problem for sweeping: dead objects will remain dead whatever the
+main program does. However, it is a problem for marking. Let us see
+why.
+
+.. __: http://rubini.us/2013/06/22/concurrent-garbage-collection/
+.. __:
http://wiki.luajit.org/New-Garbage-Collector/01fd5e5ca4f95d45e0c4b8a98b49f2b656cc23dd
In terms of the incremental GC literature, objects are either "white",
-"gray" or "black". They start as "white", become "gray" when they are
-found to be alive, and become "black" when they have been fully
-traversed --- at which point the objects that it points to have
-themselves been marked gray, or maybe are already black. The gray
-objects are the "frontier" between the black objects that we have found
-to be reachable, and the white objects that represent the unknown part
-of the world. When there are no more gray objects, the process is
-finished: all remaining white objects are unreachable and can be freed
-(by the following sweeping phase).
+"gray" or "black". This is called *tri-color marking.* See for example
+this `blog post about Rubinius`__, or this `page about LuaJIT`__. The
+objects start as "white" at the beginning of marking; become "gray" when
+they are found to be alive; and become "black" when they have been fully
+traversed. Marking proceeds by scanning grey objects for pointers to
+white objects. The white objects found are turned grey, and the grey
+objects scanned are turned black. When there are no more grey objects,
+the marking phase is complete: all remaining white objects are truly
+unreachable and can be freed (by the following sweeping phase).
In this model, the important part is that a black object can never point
to a white object: if the latter remains white until the end, it will be
freed, which is incorrect because the black object itself can still be
-reached.
+reached. How do we ensure that the main program, running in the middle
+of marking, will not try to write a pointer to white object into a black
+object? This requires a "write barrier", i.e. a piece of code that runs
+every time we set a pointer into an object or array. This piece of code
+checks if some (hopefully rare) condition is met, and calls a function
+if that is the case.
The trick we used in PyPy is to consider minor collections as part of
the whole, rather than focus only on major collections. The existing
-minimark GC had always used a "write barrier" (a piece of code run every time
-you set or get a pointer from an object or array) to do its job, like any
-generational GC. This write barrier is used to detect when an old
-object (outside the nursery) is modified to point to a young object
-(inside the nursery), which is essential information for minor
+minimark GC had always used a write barrier of its own to do its job,
+like any generational GC. This existing write barrier is used to detect
+when an old object (outside the nursery) is modified to point to a young
+object (inside the nursery), which is essential information for minor
collections. Actually, although this was the goal, the actual write
-barrier code was simpler: it just recorded all old objects into which we
-wrote *any* pointer --- to a young or old object. It is actually a
-performance improvement, because we don't need to check over and over
-again if the written pointer points to a young object or not.
+barrier code is simpler: it just records all old objects into which we
+write *any* pointer --- to a young or old object. As we found out over
+time, doing so is not actually slower, and might actually be a
+performance improvement: for example, if the main program does a lot of
+writes into the same old object, we don't need to check over and over
+again if the written pointer points to a young object or not. We just
+record the old object in some list the first time, and that's it.
-This *unmodified* write barrier works for incminimark too. Imagine that
-we are in the middle of the marking phase, running the main program.
-The write barrier will record all old objects that are being modified.
-Then at the next minor collection, all surviving young objects will be
-moved out of the nursery. At this point, as we're about to continue
-running the major collection's marking phase, we simply add to the list
-of pending gray objects all the objects that we consider --- both the
-objects listed as "old objects that are being modified", and the objects
-that we just moved out of the nursery. A fraction of the former list
-are turned back from the black to the gray color. This technique
+The trick is that this *unmodified* write barrier works for incminimark
+too. Imagine that we are in the middle of the marking phase, running
+the main program. The write barrier will record all old objects that
+are being modified. Then at the next minor collection, all surviving
+young objects will be moved out of the nursery. At this point, as we're
+about to continue running the major collection's marking phase, we
+simply add to the list of pending gray objects all the objects that we
+just considered --- both the objects listed as "old objects that are
+being modified", and the objects that we just moved out of the nursery.
+A fraction from the former list were black object; so this mean that
+they are turned back from the black to the gray color. This technique
implements nicely, if indirectly, what is called a "backward write
-barrier" in the literature: the backwardness is about the color that
-occasionally progresses backward from black to gray.
+barrier" in the literature. The backwardness is about the color that
+needs to be changed in the opposite of the usual direction "white ->
+gray -> black", thus making more work for the GC. (This is as opposed
+to "forward write barrier", where we would also detect "black -> white"
+writes but turn the white object gray.)
+
+In summary, I realize that this description is less about how we turned
+minimark into incminimark, and more about how we differ from the
+standard way of making a GC incremental. What we really had to do to
+make incminimark was to write logic that says "if the major collection
+is in the middle of the marking phase, then add this object to the list
+of gray objects", and put it at a few places throughout minor
+collection. Then we simply split a major collection into increments,
+doing marking or sweeping of some (relatively arbitrary) number of
+objects before returning. That's why, after we found that the existing
+write barrier would do, it was not much actual work, and could be done
+without major changes. For example, not a single line from the JIT
+needed adaptation. All in all it was relatively painless work. ``:-)``
_______________________________________________
pypy-commit mailing list
[email protected]
https://mail.python.org/mailman/listinfo/pypy-commit