On 06/18/2012 12:14 PM, Thouis (Ray) Jones wrote:
> Based on some previous discussion on the numpy list [1] and in
> now-cancelled PRs [2,3], I'd like to solicit opinions on adding an
> interface for numpy memory allocation event tracking, as implemented
> in this PR:
>
> https://github.com/numpy/numpy/pull/309
>
> A brief summary of the changes:
>
> - PyDataMem_NEW/FREE/RENEW become functions in the numpy API.
>   (they used to be macros for malloc/free/realloc)
>   These are the functions used to manage allocations for an array's
>   internal data.  Most other numpy data is allocated through Python's
>   allocator.
>
> - PyDataMem_NEW/RENEW return void* instead of char*.
>
> - Adds PyDataMem_SetEventHook() to the API, with this description:
>  * Sets the allocation event hook for numpy array data.
>  * Takes a PyDataMem_EventHookFunc *, which has the signature:
>  *     void hook(void *old, void *new, size_t size, void *user_data).
>  * Also takes a void *user_data, and void **old_data.
>  *
>  * Returns a pointer to the previous hook or NULL.  If old_data is
>  * non-NULL, the previous user_data pointer will be copied to it.
>  *
>  * If not NULL, hook will be called at the end of each
>  * PyDataMem_NEW/FREE/RENEW:
>  *   result = PyDataMem_NEW(size)
>  *       -> (*hook)(NULL, result, size, user_data)
>  *   PyDataMem_FREE(ptr)
>  *       -> (*hook)(ptr, NULL, 0, user_data)
>  *   result = PyDataMem_RENEW(ptr, size)
>  *       -> (*hook)(ptr, result, size, user_data)
>  *
>  * When the hook is called, the GIL will be held by the calling
>  * thread.  The hook should be written to be reentrant, if it performs
>  * operations that might cause new allocation events (such as the
>  * creation/destruction of numpy objects, or creating/destroying
>  * Python objects which might cause a gc)
>
>
> The PR also includes an example using the hook functions to track
> allocation via Python callback functions (in
> tools/allocation_tracking).
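To make the call pattern in the description above concrete, here is a
minimal, self-contained sketch in plain C. It does not use numpy at all:
every demo_ name is a hypothetical stand-in (for PyDataMem_NEW/FREE/RENEW
and PyDataMem_SetEventHook), and only the hook signature and the
"call the hook at the end of each operation" convention are taken from
the description. The counting hook and the usage function are likewise
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hook signature as given in the description (parameter names are ours). */
typedef void (demo_EventHookFunc)(void *old, void *new_ptr,
                                  size_t size, void *user_data);

static demo_EventHookFunc *demo_hook = NULL;
static void *demo_hook_user_data = NULL;

/* Install a hook; return the previous one and, optionally, its user_data. */
demo_EventHookFunc *demo_SetEventHook(demo_EventHookFunc *newhook,
                                      void *user_data, void **old_data)
{
    demo_EventHookFunc *prev = demo_hook;
    if (old_data != NULL) {
        *old_data = demo_hook_user_data;
    }
    demo_hook = newhook;
    demo_hook_user_data = user_data;
    return prev;
}

void *demo_NEW(size_t size)
{
    void *result = malloc(size);
    if (demo_hook != NULL) {
        (*demo_hook)(NULL, result, size, demo_hook_user_data);
    }
    return result;
}

void demo_FREE(void *ptr)
{
    free(ptr);
    /* The stale address is passed only as an identifier; a hook must
     * never dereference it. */
    if (demo_hook != NULL) {
        (*demo_hook)(ptr, NULL, 0, demo_hook_user_data);
    }
}

void *demo_RENEW(void *ptr, size_t size)
{
    void *result = realloc(ptr, size);
    if (demo_hook != NULL) {
        (*demo_hook)(ptr, result, size, demo_hook_user_data);
    }
    return result;
}

/* Hypothetical tracking hook: classify each event by its arguments. */
typedef struct {
    size_t allocs;
    size_t frees;
    size_t reallocs;
} demo_Stats;

static void count_events(void *old, void *new_ptr, size_t size,
                         void *user_data)
{
    demo_Stats *stats = user_data;
    assert(stats != NULL);
    if (old == NULL) {
        stats->allocs++;        /* NEW: (NULL, result, size) */
    } else if (new_ptr == NULL && size == 0) {
        stats->frees++;         /* FREE: (ptr, NULL, 0) */
    } else {
        stats->reallocs++;      /* RENEW: (ptr, result, size) */
    }
}

/* Install the hook, trigger one event of each kind, restore the old hook. */
demo_Stats demo_usage(void)
{
    demo_Stats stats = {0, 0, 0};
    void *prev_data = NULL;
    demo_EventHookFunc *prev =
        demo_SetEventHook(count_events, &stats, &prev_data);

    void *p = demo_NEW(64);
    void *q = demo_RENEW(p, 128);
    demo_FREE(q != NULL ? q : p);   /* on realloc failure p is still valid */

    demo_SetEventHook(prev, prev_data, NULL);
    return stats;
}
```

Returning the previous hook and its user_data is what lets a tracker
restore whatever was installed before it, as demo_usage() does at the
end. This counting hook allocates nothing itself, so the reentrancy
caveat in the description does not arise here; a hook that calls back
into Python would have to deal with it.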
>
> Why I think this is worth adding to numpy, even though other tools may
> be able to provide similar functionality:
>
> - numpy arrays use orders of magnitude more memory than most python
>   objects, and this is often a limiting factor in algorithms.
>
> - numpy can behave in complicated ways with regard to memory
>   management, e.g., views, OWNDATA, temporaries, etc., making it
>   sometimes difficult to know where memory usage problems are
>   happening and why.
>
> - numpy attracts a large number of programmers with limited low-level
>   programming expertise, and who don't have the skills to use external
>   tools (or time/motivation to acquire those skills), but still need
>   to be able to diagnose these sorts of problems.
>
> - Other tools are not well integrated with Python, and vary a great
>   deal between OS and compiler setup.
>
> I appreciate any feedback.

Are the hooks able to change how allocation happens, i.e. to override
allocation? If one is going to this much pain already, I think one might
as well go the extra step and allow hooks to override memory allocation.

At least it is something to think about. Of course, the above (as I
understand it) would be a good start on a pluggable allocator even if
that isn't done right away.

Examples:

 - Allocate NumPy arrays in process-shared memory using shmem/mmap
 - Allocate NumPy arrays on some boundary (16-byte, 4096-byte, ...)
   using memalign

Dag
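
As a rough illustration of the "pluggable allocator" idea, here is a
self-contained C sketch. Nothing in it comes from the PR: the
demo_Allocator table, demo_SetAllocator, demo_DataNew and demo_DataFree
are all hypothetical names. For the alignment case it uses C11
aligned_alloc rather than memalign, purely because it is in the C
standard; the size is rounded up to a multiple of the alignment, as C11
requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator table: a pair of callbacks plus a context. */
typedef struct {
    void *(*alloc)(size_t size, void *ctx);
    void (*release)(void *ptr, void *ctx);
    void *ctx;
} demo_Allocator;

static void *default_alloc(size_t size, void *ctx)
{
    (void)ctx;
    return malloc(size);
}

static void default_release(void *ptr, void *ctx)
{
    (void)ctx;
    free(ptr);
}

/* Aligned variant; ctx points at a size_t holding a power-of-two alignment. */
static void *aligned_alloc_cb(size_t size, void *ctx)
{
    size_t alignment = *(const size_t *)ctx;
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    if (size > SIZE_MAX - (alignment - 1)) {
        return NULL;                       /* rounding up would overflow */
    }
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    if (rounded == 0) {
        rounded = alignment;               /* avoid a zero-size request */
    }
    return aligned_alloc(alignment, rounded);
}

static demo_Allocator demo_current = { default_alloc, default_release, NULL };

/* Swap in a new allocator and return the previous one. */
demo_Allocator demo_SetAllocator(demo_Allocator next)
{
    demo_Allocator prev = demo_current;
    demo_current = next;
    return prev;
}

void *demo_DataNew(size_t size)
{
    return demo_current.alloc(size, demo_current.ctx);
}

void demo_DataFree(void *ptr)
{
    demo_current.release(ptr, demo_current.ctx);
}

/* Allocate one block through the aligned allocator and check its address. */
int demo_check_alignment(size_t alignment, size_t size)
{
    /* Memory from aligned_alloc is released with free(). */
    demo_Allocator aligned = { aligned_alloc_cb, default_release, &alignment };
    demo_Allocator prev = demo_SetAllocator(aligned);

    void *p = demo_DataNew(size);
    int ok = (p != NULL) && ((uintptr_t)p % alignment == 0);
    demo_DataFree(p);

    demo_SetAllocator(prev);
    return ok;
}
```

One design question this sketch sidesteps: a block has to be released by
the allocator that produced it, so swapping allocators while arrays are
still alive would need some per-block bookkeeping. The sketch avoids the
problem only because it frees its single block before restoring the
previous allocator.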