Re: [Numpy-discussion] Memory allocation cleanup

2014-01-10 Thread Julian Taylor
On Fri, Jan 10, 2014 at 3:48 AM, Nathaniel Smith  wrote:

> On Thu, Jan 9, 2014 at 11:21 PM, Charles R Harris
>  wrote:
> > [...]
>
> After a bit more research, some further points to keep in mind:
>
> Currently, PyDimMem_* and PyArray_* are just aliases for malloc/free,
> and PyDataMem_* is an alias for malloc/free with some extra tracing
> hooks wrapped around it. (AFAIK, these tracing hooks are not used by
> anyone anywhere -- at least, if they are I haven't heard about it, and
> there is no code on github that uses them.)


> There is one substantial difference between the PyMem_* and PyObject_*
> interfaces as compared to malloc(), which is that the Py* interfaces
> require that the GIL be held when they are called. (@Julian -- I think
> your PR we just merged fulfills this requirement, is that right?)


I only replaced object allocation, which should always be called under the
GIL. I'm not sure about nditer construction, but it does use Python exceptions
for errors, which I think also require the GIL.

 [...]

>
> Also, none of the Py* interfaces implement calloc(), which is annoying
> because it messes up our new optimization of using calloc() for
> np.zeros. [...]
>

Another thing that is not directly implemented in Python is aligned
allocation. This is going to get increasingly important with the advent of
heavily vectorized x86 CPUs (e.g. AVX512 is rolling out now) and the C
malloc being optimized for the oldish SSE (16 bytes). I want to change the
array buffer allocation to make use of posix_memalign and C11
aligned_alloc if available, to avoid some penalties when loading from buffers
that are not 32-byte aligned. I could imagine it might also help coprocessors
and GPUs to have higher alignments, but I'm not very familiar with that
type of hardware.
The allocator used by Python 3.4 is pluggable, so we could implement our
special allocators with the new API, but only once 3.4 is more widespread.

For this reason, and the missing calloc, I don't think we should use the
Python API for data buffers just yet. Any benefits are relatively small anyway.
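For reference, the kind of fallback chain I have in mind could be sketched
like this (a sketch only; the 32-byte constant, the function name, and the
feature-test macros are illustrative assumptions, not NumPy code):

```c
#include <stdlib.h>

#define BUF_ALIGN 32  /* AVX-friendly alignment; hypothetical constant */

/* Allocate `size` bytes aligned to BUF_ALIGN, falling back to plain
 * malloc when no aligned allocator is available. */
static void *buf_alloc(size_t size)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    /* C11 aligned_alloc requires size to be a multiple of the alignment,
     * so round it up first. */
    size_t rounded = (size + BUF_ALIGN - 1) & ~(size_t)(BUF_ALIGN - 1);
    return aligned_alloc(BUF_ALIGN, rounded);
#elif defined(_POSIX_C_SOURCE) && _POSIX_C_SOURCE >= 200112L
    void *p = NULL;
    if (posix_memalign(&p, BUF_ALIGN, size) != 0)
        return NULL;
    return p;
#else
    return malloc(size);  /* no alignment guarantee beyond the platform's */
#endif
}
```

Buffers from either aligned path can be released with plain free().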

 [...]

>
> I'm pretty sure that the vast majority of our allocations do occur
> with GIL protection, so we might want to switch to using PyObject_*
> for most cases to take advantage of the small-object optimizations,
> and use PyRawMem_* for any non-GIL cases (like possibly ufunc
> buffers), with a compatibility wrapper to replace PyRawMem_* with
> malloc() on pre-3.4 pythons. Of course this will need some profiling
> to see if PyObject_* is actually better than malloc() in practice.


I don't think it's required to replace everything with PyObject_* just
because it can be faster. We should do it only in the places where it really
makes a difference, and there are not that many of them.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Memory allocation cleanup

2014-01-10 Thread Frédéric Bastien
On Fri, Jan 10, 2014 at 4:18 AM, Julian Taylor
 wrote:
> On Fri, Jan 10, 2014 at 3:48 AM, Nathaniel Smith  wrote:
>>
>> On Thu, Jan 9, 2014 at 11:21 PM, Charles R Harris
>>  wrote:
>> > [...]
>>
>> After a bit more research, some further points to keep in mind:
>>
>> Currently, PyDimMem_* and PyArray_* are just aliases for malloc/free,
>> and PyDataMem_* is an alias for malloc/free with some extra tracing
>> hooks wrapped around it. (AFAIK, these tracing hooks are not used by
>> anyone anywhere -- at least, if they are I haven't heard about it, and
>> there is no code on github that uses them.)
>>
>>
>> There is one substantial difference between the PyMem_* and PyObject_*
>> interfaces as compared to malloc(), which is that the Py* interfaces
>> require that the GIL be held when they are called. (@Julian -- I think
>> your PR we just merged fulfills this requirement, is that right?)
>
>
> I only replaced object allocation, which should always be called under the
> GIL. I'm not sure about nditer construction, but it does use Python exceptions
> for errors, which I think also require the GIL.
>
>  [...]
>>
>>
>> Also, none of the Py* interfaces implement calloc(), which is annoying
>> because it messes up our new optimization of using calloc() for
>> np.zeros. [...]
>
>
> Another thing that is not directly implemented in Python is aligned
> allocation. This is going to get increasingly important with the advent of
> heavily vectorized x86 CPUs (e.g. AVX512 is rolling out now) and the C
> malloc being optimized for the oldish SSE (16 bytes). I want to change the
> array buffer allocation to make use of posix_memalign and C11 aligned_alloc
> if available, to avoid some penalties when loading from buffers that are not
> 32-byte aligned. I could imagine it might also help coprocessors and GPUs to
> have higher alignments, but I'm not very familiar with that type of hardware.
> The allocator used by Python 3.4 is pluggable, so we could implement our
> special allocators with the new API, but only once 3.4 is more widespread.

About the co-processors and GPUs: it could help, but as NumPy is CPU-only
and there are other problems with using it directly, I doubt that this
change would help code around co-processors/GPUs.

Fred


Re: [Numpy-discussion] Memory allocation cleanup

2014-01-10 Thread Nathaniel Smith
On Fri, Jan 10, 2014 at 9:18 AM, Julian Taylor
 wrote:
> On Fri, Jan 10, 2014 at 3:48 AM, Nathaniel Smith  wrote:
>>
>> Also, none of the Py* interfaces implement calloc(), which is annoying
>> because it messes up our new optimization of using calloc() for
>> np.zeros. [...]
>
>
> Another thing that is not directly implemented in Python is aligned
> allocation. This is going to get increasingly important with the advent of
> heavily vectorized x86 CPUs (e.g. AVX512 is rolling out now) and the C
> malloc being optimized for the oldish SSE (16 bytes). I want to change the
> array buffer allocation to make use of posix_memalign and C11 aligned_alloc
> if available, to avoid some penalties when loading from buffers that are not
> 32-byte aligned. I could imagine it might also help coprocessors and GPUs to
> have higher alignments, but I'm not very familiar with that type of hardware.
> The allocator used by Python 3.4 is pluggable, so we could implement our
> special allocators with the new API, but only once 3.4 is more widespread.
>
> For this reason, and the missing calloc, I don't think we should use the Python
> API for data buffers just yet. Any benefits are relatively small anyway.

It really would be nice if our data allocations would all be visible
to the tracemalloc library though, somehow. And I doubt we want to
patch *all* Python allocations to go through posix_memalign, both
because this is rather intrusive and because it would break python -X
tracemalloc.

How certain are we that we want to switch to aligned allocators in the
future? If we don't, then maybe it makes sense to ask python-dev for a
calloc interface; but if we do, then I doubt we can convince them to
add aligned allocation interfaces, and we'll need to ask for something
else (maybe a "null" allocator, which just notifies the python memory
tracking machinery that we allocated something ourselves?).

It's not obvious to me why aligning data buffers is useful - can you
elaborate? There's no code simplification, because we always have to
handle the unaligned case anyway with the standard unaligned
startup/cleanup loops. And intuitively, given the existence of such
loops, alignment shouldn't matter much in practice, since the most
that shifting alignment can do is change the number of elements that
need to be handled by such loops by (SIMD alignment value / element
size). For doubles, in a buffer that has 16 byte alignment but not 32
byte alignment, this means that worst case, we end up doing 4
unnecessary non-SIMD operations. And surely that only matters for very
small arrays (for large arrays such constant overhead will amortize
out), but for small arrays SIMD doesn't help much anyway? Probably I'm
missing something, because you actually know something about SIMD and
I'm just hand-waving from first principles :-). But it'd be nice to
understand the reasoning for why/whether alignment really helps in the
numpy context.
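For concreteness, the startup/cleanup loops I mean look roughly like this
(a hand-written sketch, not NumPy's actual generated loops; the function
name and the 32-byte width are assumptions):

```c
#include <stddef.h>
#include <stdint.h>

#define SIMD_ALIGN 32                          /* e.g. AVX vector width in bytes */
#define SIMD_DOUBLES (SIMD_ALIGN / sizeof(double))

/* out[i] = a[i] + b[i]; peel until `out` is aligned, run the (notionally
 * vectorized) main loop, then clean up the tail. */
static void add_doubles(double *out, const double *a, const double *b, size_t n)
{
    size_t i = 0;
    /* startup loop: at most SIMD_DOUBLES - 1 scalar iterations */
    while (i < n && ((uintptr_t)(out + i) % SIMD_ALIGN) != 0) {
        out[i] = a[i] + b[i];
        i++;
    }
    /* main loop: a real kernel would use aligned vector stores here; if
     * a or b has a different alignment than out, their loads must stay
     * unaligned, since only one operand can be peeled to alignment */
    for (; i + SIMD_DOUBLES <= n; i += SIMD_DOUBLES) {
        for (size_t k = 0; k < SIMD_DOUBLES; k++)
            out[i + k] = a[i + k] + b[i + k];
    }
    /* cleanup loop: the remaining n % SIMD_DOUBLES elements */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The worst-case cost of misalignment in this scheme is exactly the handful
of extra scalar iterations described above.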

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


[Numpy-discussion] Why do weights in np.polyfit have to be 1D?

2014-01-10 Thread Andreas Hilboll
Hi,

in using np.polyfit (in version 1.7.1), I ran across

   TypeError: expected a 1-d array for weights

when trying to fit k polynomials at once (x.shape = (4, ), y.shape = (4,
136), w.shape = (4, 136)). Is there any specific reason why this is not
supported?

-- Andreas.


Re: [Numpy-discussion] Why do weights in np.polyfit have to be 1D?

2014-01-10 Thread Charles R Harris
On Fri, Jan 10, 2014 at 9:03 AM, Andreas Hilboll  wrote:

> Hi,
>
> in using np.polyfit (in version 1.7.1), I ran across
>
>TypeError: expected a 1-d array for weights
>
> when trying to fit k polynomials at once (x.shape = (4, ), y.shape = (4,
> 136), w.shape = (4, 136)). Is there any specific reason why this is not
> supported?
>

The weights are applied to the rows of the design matrix, so if you have
multiple weight vectors you essentially need to iterate the fit over them.
Said differently, for each weight vector there is a generalized inverse and
if there is a different weight vector for each column of the rhs, then
there is a different generalized inverse for each column. You can't just
multiply the rhs from the left by *the* inverse. The problem doesn't
vectorize.
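A per-column loop along these lines should work (a sketch; the helper name
and the `deg` parameter are made up for illustration, and the shapes match
Andreas' example):

```python
import numpy as np

def polyfit_columns(x, y, w, deg):
    """Fit y[:, j] against x with weights w[:, j], one fit (and hence one
    generalized inverse) per weight vector, since the weighted problem
    doesn't vectorize across columns."""
    ncols = y.shape[1]
    coeffs = np.empty((deg + 1, ncols))
    for j in range(ncols):
        coeffs[:, j] = np.polyfit(x, y[:, j], deg, w=w[:, j])
    return coeffs

x = np.arange(4.0)
rng = np.random.RandomState(0)
y = rng.rand(4, 136)
w = np.ones((4, 136))  # uniform weights, so this matches the unweighted fit
c = polyfit_columns(x, y, w, deg=2)
print(c.shape)  # (3, 136)
```

With non-uniform weight columns the loop is unavoidable, which is the point
above: each column implies its own generalized inverse.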

Chuck


[Numpy-discussion] Bug in resize of structured array (with initial size = 0)

2014-01-10 Thread Nicolas Rougier

Hi,

I've tried to resize a record array that was initially empty (on purpose, I
need it) and I got the following error (while it works for a regular array).


Traceback (most recent call last):
  File "test_resize.py", line 10, in <module>
    print np.resize(V,2)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 1053, in resize
    if not Na: return mu.zeros(new_shape, a.dtype.char)
TypeError: Empty data-type


I'm using numpy 1.8.0, python 2.7.6, osx 10.9.1.
Can anyone confirm before I submit an issue?


Here is the script:

import numpy as np

V = np.zeros(0, dtype=np.float32)
print V.dtype
print np.resize(V,2)

V = np.zeros(0, dtype=[('a', np.float32, 1)])
print V.dtype
print np.resize(V,2)
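From the traceback, the failure seems to come from resize passing
a.dtype.char (which is 'V' for a structured dtype, losing the fields) to
zeros when the input is empty. Until that's fixed, a workaround might look
like this (a sketch, not a patch; the helper name is made up):

```python
import numpy as np

def resize_workaround(a, new_shape):
    """Like np.resize, but passes the full dtype (not dtype.char) so that
    structured arrays with zero initial size survive."""
    a = np.ravel(a)
    if a.size == 0:
        # np.resize in 1.8.0 does mu.zeros(new_shape, a.dtype.char) here,
        # which raises "TypeError: Empty data-type" for structured dtypes
        return np.zeros(new_shape, a.dtype)
    return np.resize(a, new_shape)

V = np.zeros(0, dtype=[('a', np.float32)])
print(resize_workaround(V, 2))
```

The non-empty path just defers to np.resize, which already works.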




Re: [Numpy-discussion] Memory allocation cleanup

2014-01-10 Thread Julian Taylor
On 10.01.2014 17:03, Nathaniel Smith wrote:
> On Fri, Jan 10, 2014 at 9:18 AM, Julian Taylor
>  wrote:
>> On Fri, Jan 10, 2014 at 3:48 AM, Nathaniel Smith  wrote:
>>> [...]
>>
>> For this reason, and the missing calloc, I don't think we should use the Python
>> API for data buffers just yet. Any benefits are relatively small anyway.
> 
> It really would be nice if our data allocations would all be visible
> to the tracemalloc library though, somehow. And I doubt we want to
> patch *all* Python allocations to go through posix_memalign, both
> because this is rather intrusive and because it would break python -X
> tracemalloc.

we can most likely plug aligned allocators into the Python allocator to
still be able to use tracemalloc, but it would be Python 3.4 only [0];
older versions would continue to use our aligned allocators directly
with our own tracing.
I think that's fine; I doubt the tracemalloc module will be backported to
older Pythons.
An issue is that we can't fit calloc in there without abusing one of the
domains, but I think it is also not so critical to keep it. The
sparseness is neat, but you can lose it very quickly again (basically
on any full copy) and it's not portable.

> 
> How certain are we that we want to switch to aligned allocators in the
> future? If we don't, then maybe it makes to ask python-dev for a
> calloc interface; but if we do, then I doubt we can convince them to
> add aligned allocation interfaces, and we'll need to ask for something
> else (maybe a "null" allocator, which just notifies the python memory
> tracking machinery that we allocated something ourselves?).
> 
> It's not obvious to me why aligning data buffers is useful - can you
> elaborate? There's no code simplification, because we always have to
> handle the unaligned case anyway with the standard unaligned
> startup/cleanup loops. And intuitively, given the existence of such
> loops, alignment shouldn't matter much in practice, since the most
> that shifting alignment can do is change the number of elements that
> need to be handled by such loops by (SIMD alignment value / element
> size). For doubles, in a buffer that has 16 byte alignment but not 32
> byte alignment, this means that worst case, we end up doing 4
> unnecessary non-SIMD operations.

It's relevant when you have multiple buffer inputs. If they do not have
the same alignment, they can't all be peeled to a correct alignment; some
of the inputs will always have to be loaded unaligned.

It might be that on modern x86 hardware unaligned loads are cheaper. On
Nehalem and later architectures, using unaligned load instructions carries
almost no penalty if the underlying memory is in fact aligned correctly,
but there is still a penalty if it is not.
I'm not sure how relevant that is on the even newer architectures; the
Intel docs still recommend aligning memory, though.


[0] http://www.python.org/dev/peps/pep-0445/
