On Wed, Apr 16, 2014 at 7:35 PM, Victor Stinner <victor.stin...@gmail.com> wrote:
> Hi,
>
> 2014-04-16 7:51 GMT-04:00 Julian Taylor <jtaylor.deb...@googlemail.com>:
>> In NumPy what we want is the tracing, not the exchangeable allocators.
>
> Did you read the PEP 445? Using the new malloc API, in fact you can
> have both: install new allocators and set up hooks on allocators.
> http://legacy.python.org/dev/peps/pep-0445/

The context here is that there's been some followup discussion on the
numpy list about whether there are cases where we need even more
exotic memory allocators than calloc(), and what to do about it if so.
(Thread: http://mail.scipy.org/pipermail/numpy-discussion/2014-April/069935.html )

One case that has come up is when efficient use of SIMD instructions
requires better-than-default alignment (e.g. malloc() usually gives
something like 8 byte alignment, but if you're using an instruction
that operates on 32 bytes at once you might need your array to have 32
byte alignment).

Most (all?) OSes provide an extended version of malloc that allows one
to request more alignment (posix_memalign on POSIX, _aligned_malloc on
Windows), and C11 standardizes this as aligned_alloc. An important
feature of the POSIX and C11 functions is that they allocate from the
same heap that 'malloc' does, i.e., when done with the aligned memory
you call free() -- there's no such thing as aligned_free(). (Windows
is the exception: memory from _aligned_malloc has to be released with
_aligned_free.) This means that if your program uses these functions
then swapping out malloc/free without swapping out aligned_alloc will
produce undesirable results: the replacement free() would be handed
pointers that the replacement malloc() never returned.

Numpy does not currently use aligned allocation, and it's not clear
how important it is. On older x86 it matters, but not so much on
current CPUs. When the next round of x86 SIMD instructions is released
next year it might matter again, and apparently on popular IBM
supercomputers it matters (but less on newer versions)[1,2], and who
knows what will happen with ARM. It's a bit of a mess. But if we're
messing about with APIs it seems worth thinking about.
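
To make the "aligned memory is released with plain free()" point
concrete, here is a minimal sketch that calls posix_memalign and free
through ctypes. It assumes a POSIX system (Linux, macOS) where
ctypes.CDLL(None) exposes the libc symbols; alloc_aligned is a
hypothetical helper name for illustration only, not anything numpy or
CPython provides.

```python
import ctypes

# Assumption: a POSIX system where CDLL(None) exposes libc symbols.
libc = ctypes.CDLL(None)
libc.posix_memalign.argtypes = [
    ctypes.POINTER(ctypes.c_void_p),  # void **memptr
    ctypes.c_size_t,                  # alignment (power of two, multiple of sizeof(void *))
    ctypes.c_size_t,                  # size in bytes
]
libc.posix_memalign.restype = ctypes.c_int
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def alloc_aligned(size, alignment=32):
    """Hypothetical helper: return the address of `size` bytes aligned to `alignment`."""
    ptr = ctypes.c_void_p()
    err = libc.posix_memalign(ctypes.byref(ptr), alignment, size)
    if err != 0:
        raise MemoryError(f"posix_memalign failed with error code {err}")
    return ptr.value


addr = alloc_aligned(1024, alignment=32)
is_aligned = addr % 32 == 0

# The block came from the malloc heap, so the ordinary free() releases it.
# If malloc/free were swapped out but posix_memalign were not, this call
# would hand the replacement free() a pointer it never allocated.
libc.free(addr)
```

The allocation and the release go through two different entry points
into the same heap, which is why an allocator struct that covers only
malloc/calloc/realloc/free cannot safely be combined with aligned
allocation done behind its back.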

[1] http://mail.scipy.org/pipermail/numpy-discussion/2014-April/069965.html
[2] http://mail.scipy.org/pipermail/numpy-discussion/2014-April/069967.html

A second possible use case is:

>> my_hugetlb_alloc(size)
>>     p = mmap('hugepagefs', ..., MAP_HUGETLB);
>>     PyMem_Register_Alloc(p, size, __func__, __line__);
>>     return p
>>
>> my_hugetlb_free(p);
>>     PyMem_Register_Free(p, __func__, __line__);
>>     munmap(p, ...);
>
> This is exactly how tracemalloc works. The advantage of the PEP 445 is
> that you have a null overhead when tracemalloc is disabled. There is
> no need to check if a trace function is present or not.

I think the key thing about this example is that you would *never*
want to use MAP_HUGETLB as a generic replacement for malloc(). Huge
pages can have all kinds of weird quirky limitations, and are
certainly unsuited for small allocations. BUT they can provide huge
speed wins if used for certain specific allocations in certain
programs. (In case anyone needs a reminder what "huge pages" even are:
http://lwn.net/Articles/374424/)

If I wrote a Python library to make it easy to use huge pages with
numpy, then I might well want the allocations I was making to be
visible to tracemalloc, even though they would not be going through
malloc/free.

(For that matter -- should calls to Python's mmap.mmap be calling some
tracemalloc hook in general? There are lots of cases where mmap is
really doing memory allocation -- it's very useful for shared memory
and stuff too.)

---

My current impression is something like:

- From the bug report discussion it sounds like calloc() is useful
even in core Python, so it makes sense to go ahead with that
regardless.

- Now that aligned_alloc has been standardized, it might make sense to
add it to the PyMemAllocator struct too.

- And it might also make sense to have an API by which a Python
library can say to tracemalloc: "hey FYI I just allocated something
using my favorite weird exotic method", like in the huge pages
example.
This is a fully generic mechanism, so it could act as a kind
of "safety valve" for future weirdnesses.

All numpy *needs* to support its current and immediately foreseeable
usage is calloc(). But I'm a bit nervous about getting trapped -- if
the PyMem_* machinery implements calloc(), and we switch to using it
and advertise tracemalloc support to our users, and then later it
turns out that we need aligned_alloc or similar, then we'll be stuck
unless and until we can get at least one of these other changes into
CPython upstream, and that will suck for all of us.

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org