On 2019-06-06, Tim Peters wrote:
> Like now: if the size were passed in, obmalloc could test the size
> instead of doing the `address_in_range()` dance(*). But if it's ever
> possible that the size won't be passed in, all the machinery
> supporting `address_in_range()` still needs to be there, and every
> obmalloc spelling of malloc/realloc needs to ensure that machinery
> will work if the returned address is passed back to an obmalloc
> free/realloc spelling without the size.

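To make the quoted trade-off concrete, here is a minimal sketch of what a size-passing free could look like. Every name in it (`owner_from_size`, `sized_free`, the counters) is hypothetical, the 512-byte cutoff is an assumption for illustration, and both paths are backed by plain malloc()/free() so the toy can run. It is not obmalloc's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical; this is not CPython's obmalloc code. */
#define SMALL_REQUEST_THRESHOLD 512  /* assumed cutoff for pool-served requests */

typedef enum { FROM_POOL, FROM_SYSTEM } block_owner;

/* With the size in hand, ownership is one comparison, no address lookup. */
static block_owner
owner_from_size(size_t nbytes)
{
    if (nbytes > 0 && nbytes <= SMALL_REQUEST_THRESHOLD) {
        return FROM_POOL;
    }
    return FROM_SYSTEM;
}

/* Counters stand in for the two real free paths so the sketch is observable. */
static size_t pool_frees = 0;
static size_t system_frees = 0;

/* Sized free: the caller promises nbytes matches the original request. */
static void
sized_free(void *p, size_t nbytes)
{
    assert(p != NULL);
    if (owner_from_size(nbytes) == FROM_POOL) {
        pool_frees++;      /* a real allocator would return p to its pool */
    }
    else {
        system_frees++;    /* a real allocator would hand p to free() */
    }
    free(p);               /* the toy backs both paths with malloc() */
}
```

The sketch only works if every caller passes the right size. A single free path that cannot supply it brings back the need for an address-based check, which is the point made in the quote.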
We can almost make it work for GC objects, since the use of obmalloc
is quite well encapsulated. I think I intentionally designed the
PyObject_GC_New/PyObject_GC_Del/etc. APIs that way.

Quick and dirty experiment is here:

    https://github.com/nascheme/cpython/tree/gc_malloc_free_size

The major hitch seems to be my new gc_obj_size() function. We can't
be sure the 'nbytes' passed to _PyObject_GC_Malloc() is the same as
what is computed by gc_obj_size(). It usually works, but there are
exceptions (the freelists for frame objects and tuple objects, for
one).

A nasty problem is the weirdness with PyType_GenericAlloc() and the
sentinel item. _PyObject_GC_NewVar() doesn't include space for the
sentinel but PyType_GenericAlloc() does. When you get to
gc_obj_size(), you don't know if you should use "nitems" or
"nitems+1". I'm not sure how to fix the sentinel issue. Maybe a new
type slot or a type flag? In any case, making a change like my git
branch above would almost certainly break extensions that don't play
nicely. It won't be hard to make it a build option, like the original
gcmodule was. Then, assuming there is a performance boost, people can
enable it if their extensions are friendly.

> The "only" problem with address_in_range is that it limits us to a
> maximum pool size of 4K. Just for fun, I boosted that to 8K to see
> how likely segfaults really are, and a Python built that way couldn't
> even get to its first prompt before dying with an access violation
> (Windows-speak for segfault).

If we can make the above idea work, you could set the pool size to 8K
without issue. A possible problem is that the obmalloc and gcmalloc
arenas are separate. I suppose that affects performance testing.

> We could eliminate the pool size restriction in many ways. For
> example, we could store the addresses obtained from the system
> malloc/realloc - but not yet freed - in a set, perhaps implemented as
> a radix tree to cut the memory burden.
> But digging through 3 or 4
> levels of a radix tree to determine membership is probably
> significantly slower than address_in_range.

You are likely correct. I'm hoping to benchmark the radix tree idea.
I'm not too far from having it working such that it can replace
address_in_range(). Maybe allocating gc_refs as a block would offset
the radix tree cost vs address_in_range().

If the above idea works, we know the object size at free() and
realloc(), so we don't need address_in_range() for those code paths.

Regards,

  Neil

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ILFK2MTCVA7GB7JGBVSUWASKJ7T4LLJE/
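
For readers who want to picture the radix tree membership test discussed in this message, here is a rough sketch. It assumes 48 significant address bits, 256 KiB arenas aligned to their own size, and three 10-bit levels. All of those numbers, and every identifier, are illustrative assumptions and not taken from the experimental branch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: 48-bit addresses, 2**18-byte aligned arenas,
   leaving a 30-bit key split into three 10-bit levels. */
#define ARENA_BITS 18
#define LEVEL_BITS 10
#define LEVEL_SIZE (1u << LEVEL_BITS)
#define LEVEL_MASK (LEVEL_SIZE - 1)

typedef struct { unsigned char used[LEVEL_SIZE]; } leaf_node;
typedef struct { leaf_node *child[LEVEL_SIZE]; } mid_node;
typedef struct { mid_node *child[LEVEL_SIZE]; } root_node;

static root_node radix_root;  /* static storage, so all children start NULL */

/* Split the arena number of an address into one index per tree level. */
static void
split_key(uint64_t addr, unsigned *i1, unsigned *i2, unsigned *i3)
{
    uint64_t key = addr >> ARENA_BITS;
    *i3 = (unsigned)(key & LEVEL_MASK);
    *i2 = (unsigned)((key >> LEVEL_BITS) & LEVEL_MASK);
    *i1 = (unsigned)((key >> (2 * LEVEL_BITS)) & LEVEL_MASK);
}

/* Record that an arena is (or is no longer) ours.
   Returns 0 on success, -1 if a tree node could not be allocated. */
static int
radix_mark(uint64_t arena_base, int in_use)
{
    unsigned i1, i2, i3;
    assert((arena_base & ((1u << ARENA_BITS) - 1)) == 0);  /* aligned base */
    split_key(arena_base, &i1, &i2, &i3);
    if (radix_root.child[i1] == NULL) {
        if (!in_use) {
            return 0;  /* nothing recorded, nothing to clear */
        }
        radix_root.child[i1] = calloc(1, sizeof(mid_node));
        if (radix_root.child[i1] == NULL) {
            return -1;
        }
    }
    mid_node *mid = radix_root.child[i1];
    if (mid->child[i2] == NULL) {
        if (!in_use) {
            return 0;
        }
        mid->child[i2] = calloc(1, sizeof(leaf_node));
        if (mid->child[i2] == NULL) {
            return -1;
        }
    }
    mid->child[i2]->used[i3] = in_use ? 1 : 0;
    return 0;
}

/* Membership test: three dependent loads instead of reading a pool header. */
static int
radix_contains(uint64_t addr)
{
    unsigned i1, i2, i3;
    split_key(addr, &i1, &i2, &i3);
    mid_node *mid = radix_root.child[i1];
    if (mid == NULL) {
        return 0;
    }
    leaf_node *leaf = mid->child[i2];
    if (leaf == NULL) {
        return 0;
    }
    return leaf->used[i3];
}
```

Because the lookup never reads memory near the queried address, nothing in this scheme ties the pool size to the system page size. The three dependent loads in radix_contains() are the cost that would need benchmarking against address_in_range().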