Hi All INTRODUCTION
This is the third and concluding post, where I describe a scheme for multi-core reference counting garbage collection. The first two posts are https://mail.python.org/pipermail/python-ideas/2018-July/052054.html https://mail.python.org/pipermail/python-ideas/2018-July/052151.html This subject is quite technical, with pitfalls and a race hazard. If you don't understand what I've written, or think it wrong, it might be that I screwed up and got it wrong. I personally find it a hard subject. Or it might be that you've missed something in one of the two previous posts. Or we might both be wrong. Or both right! For the race hazard, see https://mail.python.org/pipermail/python-ideas/2018-July/052190.html There have been many valuable contributions to this discussion thread, which I will acknowledge later, in another post. However, I do repeat that this scheme is based on other people's work (particularly talks by Larry Hastings), as well as my own thinking. And it is actually a known and published idea (which is reassuring). https://mail.python.org/pipermail/python-ideas/2018-July/052061.html THE STORY SO FAR We have 5 worker processes and one garbage collection (GC) process. Each worker process has an INCR buffer in which it logs an ID, every time the process increases the refcount of the object with that ID. And similarly for DECR. When the worker INCR and DECR buffers are full, they are passed over to the GC process. The GC process keeps, for each object ID, a running total of how many references. In this way it consumes the information in the INCR and DECR buffers. The problem, as always in garbage collection, is to reclaim objects that will no longer be used. With reference counting, objects whose reference count is zero can be reclaimed. This is often done, in single process systems, immediately after the reference count becomes zero. (An important aside: For multi-core machines this is an unhelpful constraint. Relaxing it will allow the worker processes to run more freely.) At each point in time, there are for each object ID two reference counts. The first is the running total, updated as full INCR and DECR buffers are applied to it. I will call this the RAW TOTAL. The second is the raw total, adjusted by all the INCR and DECR buffers in the worker processes. I will call this the REAL TOTAL. Once the real total is zero, it stays zero. This is an immediate consequence of the worker threads having no references to the object. Having no references, they cannot do an INCR or DECR. When the real total is zero, the object can be reclaimed. The problem is to refine what we have, so that all objects whose real count is zero are in good time reclaimed. We want to avoid locking all the worker threads, for example to compute real counts for all IDs. BEING TRICKY Think of the worker threads as an adversary, who are trying to delay or fool the garbage collector. Here's one thing that can be done. Have a worker thread 1. Acquire (references to) some objects (say from other worker processes). 2. Delete those object references, without filling either the INCR or DECR buffer. 3. Spin its wheels endlessly. This will defeat the GC process, unless the system from time to time sends the INCR and DECR buffers to the GC process, whether or not they are full. So we'll add this to the system requirements. Here's something else, which is part of normal use. We have one worker thread do a calculation for another. The return value is passed from one thread to another. It could happen that, for some ID, an INCR in one thread is followed by a DECR in another. Now suppose that the GC process gets * First, the process buffer containing the DECR. * And then the process buffer that contains the INCR. (This is a race hazard. The order of operations has been reversed.) Thus it can happen that the raw total for an ID can go down to zero and then increase up to one. This can't happen for the true total, of course. (With more work, the raw counts can go negative!) DEFEATING TRICKINESS The problem is for the GC process to discover IDs for which the real total is zero. And in good time. Here's how we'll do it. Suppose an ID has raw total zero. That makes it a candidate ID. Now do a global INCR update. By this we mean that we have the GC thread get INCR buffers from all the worker threads, and applies them to the raw totals. If the raw total for the candidate ID (drum roll) is still zero, then the real total is also zero. That's it. The rest is implementation. Let's see why this is. First, note that we don't need to know exactly when the real count became zero. We just need to know that it is now zero. (And once zero, it stays zero.) Now note that the real count can never be negative. Now suppose at time T the raw count (for an ID) is zero, but that the real count is one or more. That means that some worker process has a reference, and so the ID appears in some INCR buffer. Well, we just did a global INCR update, and found that the candidate ID was still zero. So the ID wasn't in an INCR buffer. We can reclaim the memory. A MATHEMATICAL APPROACH The previous argument was perhaps a computer science approach. Here's a more mathematical approach. We are using a special property of global INCR update. Suppose that at time T the raw count is N. Now do a global INCR update, to get raw count M. Let R be the real count at time T. We have * N <= M, because processing only INCR buffers. * R <= M, because ** we skipped the DECR buffers ** and at most enlarged the INCR buffers We always have 0 <= R, and after the global INCR update we have R <= M. That was at our time T, when we started the global INCR update. But once zero, R stays zero. So M == 0 is our condition for garbage collection, when M is the value after a global INCR update. WHERE NOW For me, this thread has been an exploration. I'm now convinced, although you may not be, that buffered multi-core reference counting garbage collections is something that can be done. And that it will allow the worker processes to run freely, at least when it comes to garbage collection. This discussion thread contributes nothing to important problem of multiple processes mutating (writing to) the same object. (Except that buffering provides a solution in the special case where order doesn't matter, such 1 + 2 + 3 = 2 + 3 + 1.) Well done for getting to the end of this long and difficult material. I've written it with extra special care, but expect that it can still be made clearer, and might have errors. I'd be delighted if this discussion thread helps us make CPython run faster on multi-core machines, at least for some users. If you've got some ideas about that, I'd love to hear them. Questions and comments are, of course, welcome. Especially corrections and improvements. -- Jonathan _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/