Big +1. I love the idea. str (especially, docstring), dict, and tuples are major memory eater in Python. This may reduce tuple memory usage massively.
INADA Naoki <songofaca...@gmail.com> On Fri, Sep 8, 2017 at 2:30 AM, Neil Schemenauer <n...@python.ca> wrote: > Python objects that participate in cyclic GC (things like lists, dicts, > sets but not strings, ints and floats) have extra memory overhead. I > think it is possible to mostly eliminate this overhead. Also, while > the GC is running, this GC state is mutated, which destroys > copy-on-write optimizations. This change would mostly fix that > issue. > > All objects that participate in cyclic GC have the Py_TPFLAGS_HAVE_GC > bit set in their type. That causes an extra chunk of memory to be > allocated *before* the ob_refcnt struct member. This is the PyGC_Head > struct. > > The whole object looks like this in memory (PyObject pointer is at > arrow): > > union __gc_head *gc_next; > union __gc_head *gc_prev; > Py_ssize_t gc_refs; > --> > Py_ssize_t ob_refcnt > struct _typeobject *ob_type; > [rest of PyObject members] > > > So, 24 bytes of overhead on a 64-bit machine. The smallest Python > object that can have a pointer to another object (e.g. a single PyObject > * member) is 48 bytes. Removing PyGC_Head would cut the size of these > objects in half. > > Carl Shaprio questioned me today on why we use a double linked-list and > not the memory bitmap. I think the answer is that there is no good > reason. We use a double linked list only due to historical constraints > that are no longer present. > > Long ago, Python objects could be allocated using the system malloc or > other memory allocators. Since we could not control the memory > location, bitmaps would be inefficient. Today, we allocate all Python > objects via our own function. Python objects under a certain size are > allocated using our own malloc, obmalloc, and are stored in memory > blocks known "arenas". > > The PyGC_Head struct performs three functions. First, it allows the GC > to find all Python objects that will be checked for cycles (i.e. follow > the linked list). Second, it stores a single bit of information to let > the GC know if it is safe to traverse the object, set with > PyObject_GC_Track(). Finally, it has a scratch area to compute the > effective reference count while tracing refs (gc_refs). > > Here is a sketch of how we can remove the PyGC_Head struct for small > objects (say less than 512 bytes). Large objects or objects created by > a different memory allocator will still have the PyGC_Head overhead. > > * Have memory arenas that contain only objects with the > Py_TPFLAGS_HAVE_GC flag. Objects like ints, strings, etc will be > in different arenas, not have bitmaps, not be looked at by the > cyclic GC. > > * For those arenas, add a memory bitmap. The bitmap is a bit array that > has a bit for each fixed size object in the arena. The memory used by > the bitmap is a fraction of what is needed by PyGC_Head. E.g. an > arena that holds up to 1024 objects of 48 bytes in size would have a > bitmap of 1024 bits. > > * The bits will be set and cleared by PyObject_GC_Track/Untrack() > > * We also need an array of Py_ssize_t to take over the job of gc_refs. > That could be allocated only when GC is working and it only needs to > be the size of the number of true bits in the bitmap. Or, it could be > allocated when the arena is allocated and be sized for the full arena. > > * Objects that are too large would still get the PyGC_Head struct > allocated "in front" of the PyObject. Because they are big, the > overhead is not so bad. > > * The GC process would work nearly the same as it does now. Rather than > only traversing the linked list, we would also have to crawl over the > GC object arenas, check blocks of memory that have the tracked bit > set. > > There are a lot of smaller details to work out but I see no reason > why the idea should not work. It should significantly reduce memory > usage. Also, because the bitmap and gc_refs are contiguous in > memory, locality will be improved. Ćukasz Langa has mentioned that > the current GC causes issues with copy-on-write memory in big > applications. This change should solve that issue. > > To implement, I think the easiest path is to create new malloc to be > used by small GC objects, e.g. gcmalloc.c. It would be similar to > obmalloc but have the features needed to keep track of the bitmap. > obmalloc has some quirks that makes it hard to use for this purpose. > Once the idea is proven, gcmalloc could be merged or made to be a > variation of obmalloc. Or, maybe just optimized and remain > separate. obmalloc is complicated and highly optimized. So, adding > additional functionality to it will be challenging. > > I believe this change would be ABI compatible. > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com