Re: [Numpy-discussion] Memory allocation cleanup
On Fri, Jan 10, 2014 at 3:48 AM, Nathaniel Smith wrote:
> On Thu, Jan 9, 2014 at 11:21 PM, Charles R Harris wrote:
> > [...]
>
> After a bit more research, some further points to keep in mind:
>
> Currently, PyDimMem_* and PyArray_* are just aliases for malloc/free, and PyDataMem_* is an alias for malloc/free with some extra tracing hooks wrapped around it. (AFAIK, these tracing hooks are not used by anyone anywhere -- at least, if they are I haven't heard about it, and there is no code on github that uses them.)
>
> There is one substantial difference between the PyMem_* and PyObject_* interfaces as compared to malloc(), which is that the Py* interfaces require that the GIL be held when they are called. (@Julian -- I think your PR we just merged fulfills this requirement, is that right?)

I only replaced object allocation, which should always be called under the GIL. I'm not sure about nditer construction, but it does use Python exceptions for errors, which I think also require the GIL.

[...]

> Also, none of the Py* interfaces implement calloc(), which is annoying because it messes up our new optimization of using calloc() for np.zeros. [...]

Another thing that is not directly implemented in Python is aligned allocation. This is going to get increasingly important with the advent of heavily vectorized x86 CPUs (e.g. AVX512 is rolling out now), while the C malloc is optimized for the oldish SSE (16 bytes). I want to change the array buffer allocation to make use of posix_memalign and C11 aligned_alloc, if available, to avoid some penalties when loading from buffers that are not 32-byte aligned. I could imagine it might also help coprocessors and GPUs to have higher alignments, but I'm not very familiar with that type of hardware.

The allocator used by Python 3.4 is pluggable, so we could implement our special allocators with the new API, but only when 3.4 is more widespread.
For this reason, and the missing calloc, I don't think we should use the Python API for data buffers just yet. Any benefits are relatively small anyway.

[...]

> I'm pretty sure that the vast majority of our allocations do occur with GIL protection, so we might want to switch to using PyObject_* for most cases to take advantage of the small-object optimizations, and use PyRawMem_* for any non-GIL cases (like possibly ufunc buffers), with a compatibility wrapper to replace PyRawMem_* with malloc() on pre-3.4 pythons. Of course this will need some profiling to see if PyObject_* is actually better than malloc() in practice.

I don't think it's required to replace everything with PyObject_* just because it can be faster. We should do it only in places where it really makes a difference, and there are not that many of them.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
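[Editorial aside: until posix_memalign/aligned_alloc are wired into the buffer allocation, the effect Julian describes can be approximated from user code. The following is a minimal sketch, not numpy API: it over-allocates a byte buffer and slices to an aligned offset; the function name is made up for illustration.]

```python
import numpy as np

def aligned_zeros(n, dtype=np.float64, align=32):
    """Return a 1-D zeroed array whose data pointer is align-byte aligned.

    Over-allocates by `align` bytes and slices to the first aligned
    offset -- a user-space stand-in for posix_memalign/aligned_alloc.
    """
    dtype = np.dtype(dtype)
    nbytes = n * dtype.itemsize
    buf = np.zeros(nbytes + align, dtype=np.uint8)
    # How many bytes to skip so the data pointer lands on an align boundary.
    offset = (-buf.ctypes.data) % align
    return buf[offset:offset + nbytes].view(dtype)

a = aligned_zeros(1000, align=32)
assert a.ctypes.data % 32 == 0
```

The returned view keeps the oversized buffer alive through its `base` attribute, so no extra bookkeeping is needed on the caller's side.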
Re: [Numpy-discussion] Memory allocation cleanup
On Fri, Jan 10, 2014 at 4:18 AM, Julian Taylor wrote:
> On Fri, Jan 10, 2014 at 3:48 AM, Nathaniel Smith wrote:
>> On Thu, Jan 9, 2014 at 11:21 PM, Charles R Harris wrote:
>> > [...]
>>
>> After a bit more research, some further points to keep in mind:
>>
>> Currently, PyDimMem_* and PyArray_* are just aliases for malloc/free, and PyDataMem_* is an alias for malloc/free with some extra tracing hooks wrapped around it. (AFAIK, these tracing hooks are not used by anyone anywhere -- at least, if they are I haven't heard about it, and there is no code on github that uses them.)
>>
>> There is one substantial difference between the PyMem_* and PyObject_* interfaces as compared to malloc(), which is that the Py* interfaces require that the GIL be held when they are called. (@Julian -- I think your PR we just merged fulfills this requirement, is that right?)
>
> I only replaced object allocation, which should always be called under the GIL. I'm not sure about nditer construction, but it does use Python exceptions for errors, which I think also require the GIL.
>
> [...]
>
>> Also, none of the Py* interfaces implement calloc(), which is annoying because it messes up our new optimization of using calloc() for np.zeros. [...]
>
> Another thing that is not directly implemented in Python is aligned allocation. This is going to get increasingly important with the advent of heavily vectorized x86 CPUs (e.g. AVX512 is rolling out now), while the C malloc is optimized for the oldish SSE (16 bytes). I want to change the array buffer allocation to make use of posix_memalign and C11 aligned_alloc, if available, to avoid some penalties when loading from buffers that are not 32-byte aligned. I could imagine it might also help coprocessors and GPUs to have higher alignments, but I'm not very familiar with that type of hardware.
> The allocator used by Python 3.4 is pluggable, so we could implement our special allocators with the new API, but only when 3.4 is more widespread.

About the coprocessors and GPUs: it could help, but as NumPy is CPU-only and there are other problems in using it directly, I doubt that this change would help code around coprocessors/GPUs.

Fred
Re: [Numpy-discussion] Memory allocation cleanup
On Fri, Jan 10, 2014 at 9:18 AM, Julian Taylor wrote:
> On Fri, Jan 10, 2014 at 3:48 AM, Nathaniel Smith wrote:
>>
>> Also, none of the Py* interfaces implement calloc(), which is annoying because it messes up our new optimization of using calloc() for np.zeros. [...]
>
> Another thing that is not directly implemented in Python is aligned allocation. This is going to get increasingly important with the advent of heavily vectorized x86 CPUs (e.g. AVX512 is rolling out now), while the C malloc is optimized for the oldish SSE (16 bytes). I want to change the array buffer allocation to make use of posix_memalign and C11 aligned_alloc, if available, to avoid some penalties when loading from buffers that are not 32-byte aligned. I could imagine it might also help coprocessors and GPUs to have higher alignments, but I'm not very familiar with that type of hardware.
>
> The allocator used by Python 3.4 is pluggable, so we could implement our special allocators with the new API, but only when 3.4 is more widespread.
>
> For this reason, and the missing calloc, I don't think we should use the Python API for data buffers just yet. Any benefits are relatively small anyway.

It really would be nice if our data allocations were all visible to the tracemalloc library though, somehow. And I doubt we want to patch *all* Python allocations to go through posix_memalign, both because this is rather intrusive and because it would break python -X tracemalloc.

How certain are we that we want to switch to aligned allocators in the future? If we don't, then maybe it makes sense to ask python-dev for a calloc interface; but if we do, then I doubt we can convince them to add aligned allocation interfaces, and we'll need to ask for something else (maybe a "null" allocator, which just notifies the Python memory tracking machinery that we allocated something ourselves?).

It's not obvious to me why aligning data buffers is useful -- can you elaborate?
There's no code simplification, because we always have to handle the unaligned case anyway with the standard unaligned startup/cleanup loops. And intuitively, given the existence of such loops, alignment shouldn't matter much in practice, since the most that shifting alignment can do is change the number of elements that need to be handled by such loops by (SIMD alignment value / element size). For doubles, in a buffer that has 16-byte alignment but not 32-byte alignment, this means that, worst case, we end up doing 4 unnecessary non-SIMD operations. And surely that only matters for very small arrays (for large arrays such constant overhead will amortize out), but for small arrays SIMD doesn't help much anyway?

Probably I'm missing something, because you actually know something about SIMD and I'm just hand-waving from first principles :-). But it'd be nice to understand the reasoning for why/whether alignment really helps in the numpy context.

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
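[Editorial aside: for readers unfamiliar with the tracemalloc workflow the data allocations would ideally show up in, here is a small sketch using the Python 3.4+ stdlib module. Note that, as discussed above, numpy's malloc'd data buffers are invisible to it at this point; only allocations routed through Python's own allocators are recorded.]

```python
import tracemalloc

tracemalloc.start()

# Allocations made through Python's allocator APIs are recorded;
# raw malloc() calls (like numpy's data buffers here) are not.
data = [bytearray(10_000) for _ in range(100)]

snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# The bytearrays (~1 MB total) dominate the traced allocations.
total = sum(stat.size for stat in snapshot.statistics("lineno"))
assert total >= 100 * 10_000
```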
[Numpy-discussion] Why do weights in np.polyfit have to be 1D?
Hi,

in using np.polyfit (in version 1.7.1), I ran across

    TypeError: expected a 1-d array for weights

when trying to fit k polynomials at once (x.shape = (4,), y.shape = (4, 136), w.shape = (4, 136)). Is there any specific reason why this is not supported?

--
Andreas.
Re: [Numpy-discussion] Why do weights in np.polyfit have to be 1D?
On Fri, Jan 10, 2014 at 9:03 AM, Andreas Hilboll wrote:
> Hi,
>
> in using np.polyfit (in version 1.7.1), I ran across
>
>     TypeError: expected a 1-d array for weights
>
> when trying to fit k polynomials at once (x.shape = (4,), y.shape = (4, 136), w.shape = (4, 136)). Is there any specific reason why this is not supported?

The weights are applied to the rows of the design matrix, so if you have multiple weight vectors you essentially need to iterate the fit over them. Said differently, for each weight vector there is a generalized inverse, and if there is a different weight vector for each column of the rhs, then there is a different generalized inverse for each column. You can't just multiply the rhs from the left by *the* inverse. The problem doesn't vectorize.

Chuck
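[Editorial aside: "iterate the fit over them" can be sketched like this, using made-up data with the shapes from the question. np.polyfit accepts a 2-D y (one fit per column) but only a 1-D w, so with per-column weights the loop is explicit.]

```python
import numpy as np

# Hypothetical data with the question's shapes: one shared x,
# 136 y columns, and a matching weight vector per column.
x = np.linspace(0.0, 3.0, 4)   # shape (4,)
y = np.random.rand(4, 136)     # shape (4, 136)
w = np.ones((4, 136))          # shape (4, 136)

deg = 2
# One weighted fit per column: each weight vector yields its own
# generalized inverse, hence its own call to polyfit.
coeffs = np.column_stack(
    [np.polyfit(x, y[:, j], deg, w=w[:, j]) for j in range(y.shape[1])]
)
assert coeffs.shape == (deg + 1, 136)
```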
[Numpy-discussion] Bug in resize of structured array (with initial size = 0)
Hi,

I've tried to resize a record array that was first empty (on purpose, I need it) and I got the following error (while it's working for a regular array).

    Traceback (most recent call last):
      File "test_resize.py", line 10, in <module>
        print np.resize(V, 2)
      File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 1053, in resize
        if not Na: return mu.zeros(new_shape, a.dtype.char)
    TypeError: Empty data-type

I'm using numpy 1.8.0, python 2.7.6, osx 10.9.1. Can anyone confirm before I submit an issue?

Here is the script:

    V = np.zeros(0, dtype=np.float32)
    print V.dtype
    print np.resize(V, 2)

    V = np.zeros(0, dtype=[('a', np.float32, 1)])
    print V.dtype
    print np.resize(V, 2)
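[Editorial aside: the traceback shows the cause -- for an empty input, np.resize falls back on `a.dtype.char`, which does not describe a structured dtype. A workaround sketch (the helper name is made up) is to handle the empty case yourself with the full dtype object.]

```python
import numpy as np

def resize_keep_dtype(a, new_shape):
    """Like np.resize, but passes the full dtype object (not
    a.dtype.char) when the input is empty -- the case where
    np.resize in numpy 1.8 fails for structured dtypes."""
    a = np.asarray(a)
    if a.size == 0:
        return np.zeros(new_shape, dtype=a.dtype)
    return np.resize(a, new_shape)

V = np.zeros(0, dtype=[('a', np.float32)])
out = resize_keep_dtype(V, 2)
assert out.shape == (2,)
assert out.dtype == V.dtype
```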
Re: [Numpy-discussion] Memory allocation cleanup
On 10.01.2014 17:03, Nathaniel Smith wrote:
> On Fri, Jan 10, 2014 at 9:18 AM, Julian Taylor wrote:
>> On Fri, Jan 10, 2014 at 3:48 AM, Nathaniel Smith wrote:
>>> [...]
>>
>> For this reason, and the missing calloc, I don't think we should use the Python API for data buffers just yet. Any benefits are relatively small anyway.
>
> It really would be nice if our data allocations were all visible to the tracemalloc library though, somehow. And I doubt we want to patch *all* Python allocations to go through posix_memalign, both because this is rather intrusive and because it would break python -X tracemalloc.

We can most likely plug aligned allocators into the Python allocator to still be able to use tracemalloc, but it would be Python 3.4 only [0]; older versions would continue to use our aligned allocators directly, with our own tracing. I think that's fine; I doubt the tracemalloc module will be backported to older Pythons.

An issue is that we can't fit calloc in there without abusing one of the domains, but I think it is also not so critical to keep it. The sparseness is neat, but you can lose it very quickly again too (basically on any full copy), and it's not portable.

> How certain are we that we want to switch to aligned allocators in the future? If we don't, then maybe it makes sense to ask python-dev for a calloc interface; but if we do, then I doubt we can convince them to add aligned allocation interfaces, and we'll need to ask for something else (maybe a "null" allocator, which just notifies the Python memory tracking machinery that we allocated something ourselves?).
>
> It's not obvious to me why aligning data buffers is useful -- can you elaborate? There's no code simplification, because we always have to handle the unaligned case anyway with the standard unaligned startup/cleanup loops.
> And intuitively, given the existence of such loops, alignment shouldn't matter much in practice, since the most that shifting alignment can do is change the number of elements that need to be handled by such loops by (SIMD alignment value / element size). For doubles, in a buffer that has 16-byte alignment but not 32-byte alignment, this means that, worst case, we end up doing 4 unnecessary non-SIMD operations.

It's relevant when you have multiple buffer inputs. If they do not have the same alignment, they can't all be peeled to a correct alignment; some of the inputs will always have to be loaded unaligned.

It might be that on modern x86 hardware unaligned loads are cheaper. On Nehalem architectures, using unaligned instructions has almost no penalty if the underlying memory is in fact aligned correctly, but there is still a penalty if it is not aligned. I'm not sure how relevant that is in the even newer architectures; the Intel docs still recommend aligning memory though.

[0] http://www.python.org/dev/peps/pep-0445/
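[Editorial aside: the multiple-input point can be made concrete with back-of-the-envelope arithmetic; the addresses below are made up. A single peel loop advances all buffers in lockstep, so it can align at most one of them: whenever two addresses differ modulo the SIMD width, aligning one leaves the other misaligned.]

```python
def peel_count(addr, itemsize, simd_bytes=32):
    """Number of leading scalar (non-SIMD) elements processed before
    `addr` reaches a simd_bytes-aligned boundary."""
    assert simd_bytes % itemsize == 0
    return ((-addr) % simd_bytes) // itemsize

# Two double (8-byte) buffers: one 32-byte aligned, one only 16-byte aligned.
a_addr, b_addr = 0x1000, 0x1010
peel_a = peel_count(a_addr, 8)  # 0: already aligned
peel_b = peel_count(b_addr, 8)  # 2: two scalar elements needed first

# A binary ufunc peels both buffers by the same count, so after peeling
# peel_a elements, buffer b sits at b_addr + peel_a * 8 -- still
# misaligned, because the addresses differ modulo the SIMD width.
assert (a_addr - b_addr) % 32 != 0
assert (b_addr + peel_a * 8) % 32 != 0
```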