Re: [Numpy-discussion] ufunc overrides

2013-07-17 Thread Nathaniel Smith
On Thu, Jul 11, 2013 at 4:29 AM, Blake Griffith
blake.a.griff...@gmail.com wrote:

 Hello NumPy,

 Part of my GSoC is compatibility with SciPy's sparse matrices and NumPy's 
 ufuncs. Currently there is no feasible way to do this without changing ufuncs 
 a bit.

 I've been considering a mechanism to override ufuncs based on checking the
 ufunc's arguments for a __ufunc_override__ attribute and then handing the
 operation off to a function specified by that attribute. I prototyped this in
 Python and did a demo in a blog post here:
 http://cwl.cx/posts/week-6-ufunc-overrides.html
 This is similar to a previously discussed, but never implemented change:
 http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html
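
For concreteness, here is a rough Python sketch of the general idea. The
apply_ufunc dispatcher below is hypothetical glue written for illustration,
not an actual NumPy entry point; only the __ufunc_override__ attribute name
comes from the proposal.

import numpy as np

class LoggedArray(object):
    """Toy array-like container that wants to handle np.add itself."""
    def __init__(self, data):
        self.data = np.asarray(data)
        # Map ufuncs to replacement implementations, per the proposal.
        self.__ufunc_override__ = {np.add: self._add}

    def _add(self, a, b):
        unwrap = lambda x: x.data if isinstance(x, LoggedArray) else x
        print("custom add called")
        return LoggedArray(np.add(unwrap(a), unwrap(b)))

def apply_ufunc(ufunc, *args):
    """Hypothetical dispatcher: check each argument for an override."""
    for arg in args:
        override = getattr(arg, '__ufunc_override__', None)
        if override is not None and ufunc in override:
            return override[ufunc](*args)
    return ufunc(*args)  # no override found: call the ufunc normally

result = apply_ufunc(np.add, LoggedArray([1, 2, 3]), 10)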

I've just posted a long comment with a slightly different proposal in the PR:
  https://github.com/numpy/numpy/pull/3524#issuecomment-21115548
Mentioning this here because this has the potential to majorly affect
anyone working with ndarray subclasses or other array-like objects
(e.g., masked arrays, GPU arrays, etc.), so if you care about these
things then please take a look and help us make sure that the final
API is flexible enough to handle your needs.

-n


Re: [Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array

2013-07-17 Thread Nathaniel Smith
On Tue, Jul 16, 2013 at 7:53 PM, Frédéric Bastien no...@nouiz.org wrote:
 Hi,


 On Tue, Jul 16, 2013 at 11:55 AM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Jul 16, 2013 at 2:34 PM, Arink Verma arinkve...@gmail.com wrote:

 Each ndarray does two mallocs, for the obj and buffer. These could be
  combined into 1 - just allocate the total size and do some pointer
  arithmetic, then set OWNDATA to false.
 So, those two mallocs have been mentioned in the project introduction. I got
 that wrong.


 On further thought/reading the code, it appears to be more complicated
 than that, actually.

 It looks like (for a non-scalar array) we have two calls to PyMem_Malloc: one
 for the array object itself, and one for the shapes + strides. And one call
 to regular old malloc for the data buffer.

 (Mysteriously, shapes + strides together have 2*ndim elements, but to hold
 them we allocate a memory region sized to hold 3*ndim elements. I'm not sure
 why.)

 And contrary to what I said earlier, this is about as optimized as it can
 be without breaking ABI. We need at least 2 calls to malloc/PyMem_Malloc,
 because the shapes+strides may need to be resized without affecting the much
 larger data area. But it's tempting to allocate the array object and the
 data buffer in a single memory region, like I suggested earlier. And this
 would ALMOST work. But, it turns out there is code out there which assumes
 (whether wisely or not) that you can swap around which data buffer a given
 PyArrayObject refers to (hi Theano!). And supporting this means that data
 buffers and PyArrayObjects need to be in separate memory regions.


 Are you sure that Theano swaps the data ptr of an ndarray? When we play
 with that, it is on a newly created ndarray. So a node in our graph won't
 change the input ndarray structure. It will create a new ndarray structure
 with new shape/strides, pass in a data ptr, and, to my knowledge, we flag
 the new ndarray's own_data correctly.

 If Theano poses a problem here, I suggest that I fix Theano. But currently
 I don't see the problem. So if this makes you change your mind about this
 optimization, tell me. I don't want Theano to prevent optimizations in NumPy.

It's entirely possible I misunderstood, so let's see if we can work it
out. I know that you want to assign to the ->data pointer in a
PyArrayObject, right? That's what caused some trouble with the 1.7 API
deprecations, which were trying to prevent direct access to this
field? Creating a new array given a pointer to a memory region is no
problem, and obviously will be supported regardless of any
optimizations. But if that's all you were doing then you shouldn't
have run into the deprecation problem. Or maybe I'm misremembering!

The problem is if one wants to (a) create a PyArrayObject, which will
by default allocate a new memory region and assign a pointer to it to
the ->data field, and *then* (b) steal that memory region and
replace it with another one, while keeping the same PyArrayObject.
This is technically possible right now (though I wouldn't say it was
necessarily a good idea!), but it would become impossible if we
allocated the PyArrayObject and data into a single region.
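
To restate the distinction at the Python level (only an illustration; the
actual buffer swap happens through the C struct field, which has no Python
spelling):

import numpy as np

# Supported regardless of any optimization: hand NumPy an existing memory
# region at creation time.  The array object and its buffer are separate
# by construction, and NumPy knows it doesn't own the data.
buf = bytearray(8 * 5)                    # externally managed memory
a = np.frombuffer(buf, dtype=np.float64)  # wraps buf, no copy
print(a.flags.owndata)                    # False
print(a.base is not None)                 # True: buf is kept alive as the base

# The problematic case -- allocate normally, *then* swap in a different
# buffer -- means writing to the ->data field of an already-constructed
# PyArrayObject from C; it cannot be done from Python at all.
b = np.empty(5)
print(b.flags.owndata)                    # True: NumPy malloc'ed this buffer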

The profiles suggest that this would only make allocation of arrays
maybe 15% faster, with probably a similar effect on deallocation. And
I'm not sure how often array allocation per se is actually a
bottleneck -- usually you also do things with the arrays, which is
more expensive :-). But hey, 15% is nothing to sneeze at.

-n


Re: [Numpy-discussion] empty_like for masked arrays

2013-07-17 Thread Nathaniel Smith
On Mon, Jul 15, 2013 at 2:33 PM, Gregorio Bastardo
gregorio.basta...@gmail.com wrote:
 Hi,

 On Mon, Jun 10, 2013 at 3:47 PM, Nathaniel Smith n...@pobox.com wrote:
 Hi all,

 Is there anyone out there using numpy masked arrays, who has an
 opinion on how empty_like (and its friends ones_like, zeros_like)
 should handle the mask?

 Right now apparently if you call np.ma.empty_like on a masked array,
 you get a new masked array that shares the original array's mask, so
 modifying one modifies the other. That's almost certainly wrong. This
 PR:
   https://github.com/numpy/numpy/pull/3404
 makes it so instead the new array has values that are all set to
 empty/zero/one, and a mask which is set to match the input array's
 mask (so whenever something was masked in the original array, the
 empty/zero/one in that place is also masked). We don't know if this is
 the desired behaviour for these functions, though. Maybe it's more
 intuitive for the new array to match the original array in shape and
 dtype, but to always have an empty mask. Or maybe not. None of us
 really use np.ma, so if you do and have an opinion then please speak
 up...
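
For concreteness, a sketch of the two candidate behaviours as hypothetical
helper functions (this is not what np.ma.empty_like itself does today):

import numpy as np
import numpy.ma as ma

orig = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

def empty_like_copy_mask(a):
    # Option 1: same shape/dtype, mask copied from the input (independent copy).
    out = ma.masked_array(np.empty_like(np.asarray(a)))
    out.mask = ma.getmaskarray(a).copy()
    return out

def empty_like_no_mask(a):
    # Option 2: same shape/dtype, but nothing masked in the result.
    return ma.masked_array(np.empty_like(np.asarray(a)), mask=False)

print(empty_like_copy_mask(orig).mask)  # [False  True False]
print(empty_like_no_mask(orig).mask)    # False (i.e. nomask)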

 I recently joined the mailing list, so the message might not reach the
 original thread, sorry for that.

 I use masked arrays extensively, and would vote for the first option,
 as I use the *_like operations with the assumption that the resulting
 array has the same mask as the original. I think it's more intuitive
 than selecting between all masked or all unmasked behaviour. If it's
 not too late, please consider my use case.

The original submitter of that PR has been silent since then, so nothing
has happened so far.

So that's 2 votes for copying the mask and 3 against, I guess. That's
not very consensus-ful. If there's really a lot of confusion here,
then it's possible the answer is that np.ma.empty_like should just
raise an error or not be defined. Or can you all agree?

-n


Re: [Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array

2013-07-17 Thread Frédéric Bastien
On Wed, Jul 17, 2013 at 10:39 AM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Jul 16, 2013 at 7:53 PM, Frédéric Bastien no...@nouiz.org wrote:
  Hi,

  On Tue, Jul 16, 2013 at 11:55 AM, Nathaniel Smith n...@pobox.com wrote:

  On Tue, Jul 16, 2013 at 2:34 PM, Arink Verma arinkve...@gmail.com wrote:

  Each ndarray does two mallocs, for the obj and buffer. These could be
   combined into 1 - just allocate the total size and do some pointer
   arithmetic, then set OWNDATA to false.
  So, those two mallocs have been mentioned in the project introduction. I got
  that wrong.

  On further thought/reading the code, it appears to be more complicated
  than that, actually.

  It looks like (for a non-scalar array) we have two calls to PyMem_Malloc: one
  for the array object itself, and one for the shapes + strides. And one call
  to regular old malloc for the data buffer.

  (Mysteriously, shapes + strides together have 2*ndim elements, but to hold
  them we allocate a memory region sized to hold 3*ndim elements. I'm not sure
  why.)

  And contrary to what I said earlier, this is about as optimized as it can
  be without breaking ABI. We need at least 2 calls to malloc/PyMem_Malloc,
  because the shapes+strides may need to be resized without affecting the much
  larger data area. But it's tempting to allocate the array object and the
  data buffer in a single memory region, like I suggested earlier. And this
  would ALMOST work. But, it turns out there is code out there which assumes
  (whether wisely or not) that you can swap around which data buffer a given
  PyArrayObject refers to (hi Theano!). And supporting this means that data
  buffers and PyArrayObjects need to be in separate memory regions.

  Are you sure that Theano swaps the data ptr of an ndarray? When we play
  with that, it is on a newly created ndarray. So a node in our graph won't
  change the input ndarray structure. It will create a new ndarray structure
  with new shape/strides, pass in a data ptr, and, to my knowledge, we flag
  the new ndarray's own_data correctly.

  If Theano poses a problem here, I suggest that I fix Theano. But currently
  I don't see the problem. So if this makes you change your mind about this
  optimization, tell me. I don't want Theano to prevent optimizations in NumPy.

 It's entirely possible I misunderstood, so let's see if we can work it
 out. I know that you want to assign to the ->data pointer in a
 PyArrayObject, right? That's what caused some trouble with the 1.7 API
 deprecations, which were trying to prevent direct access to this
 field? Creating a new array given a pointer to a memory region is no
 problem, and obviously will be supported regardless of any
 optimizations. But if that's all you were doing then you shouldn't
 have run into the deprecation problem. Or maybe I'm misremembering!


What is currently done in only one place is to create a new PyArrayObject
with a given ptr, so NumPy doesn't do the allocation. We later change that
ptr to another one.

It is the change to the ptr of the just-created PyArrayObject that caused
problems with the interface deprecation. I fixed all the other problems
related to the deprecation (mostly just renames of functions/macros), but I
didn't fix this one yet. I would need to change the logic to compute the
final ptr before creating the PyArrayObject and create it with the final
data ptr. But in all cases, NumPy didn't allocate the data memory for this
object, so this case doesn't block your optimization.

One thing on our optimization wish list is to reuse allocated PyArrayObjects
between Theano function calls for intermediate results (so completely under
Theano's control). This could be useful in particular for
reshape/transpose/subtensor. Those functions are pretty fast, and from
memory I already found the allocation time to be significant. But in those
cases it is PyArrayObjects that are views, so the metadata and the data
would be in different memory regions in all cases.

The other case on the optimization wish list is reusing the PyArrayObject
when the shape isn't the right one (but the number of dimensions is the
same). If we do that for operations like addition, we will need to use
PyArray_Resize(). This will be done on PyArrayObjects whose data memory was
allocated by NumPy. So if you do one memory allocation for metadata and
data, just make sure that PyArray_Resize() will handle it correctly.
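
As a rough Python-level analogue of that constraint (ndarray.resize is the
approximate Python counterpart of PyArray_Resize), in-place resizing is only
allowed when the array owns its data, which is exactly the case a combined
metadata+data allocation would have to keep supporting:

import numpy as np

a = np.zeros((2, 3))
a.resize((3, 3), refcheck=False)  # fine: `a` owns its data buffer
print(a.shape)                    # (3, 3)

v = np.zeros((2, 3))[0]           # a view: it does not own its data
try:
    v.resize((6,), refcheck=False)
except ValueError as err:
    print("cannot resize a view:", err)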

On the usefulness of doing only one memory allocation: in our old GPU
ndarray, we were doing two allocations on the GPU, one for metadata and one
for data. I removed this, as it was a bottleneck. Allocation on the CPU is
faster than on the GPU, but it is still slow unless you reuse memory. Does
PyMem_Malloc reuse previous small allocations?

For those that read all of this, the conclusion is that Theano shouldn't
block this optimization. If you optimize the allocation of new
PyArrayObjects, there will be less incentive to do the

Re: [Numpy-discussion] Size/Shape

2013-07-17 Thread Robert Kern
On Wed, Jul 17, 2013 at 6:21 PM, Brady McCary brady.mcc...@gmail.com
wrote:

 NumPy Folks,

 Would someone please discuss or point me to a discussion about the
 discrepancy in size vs shape in the following MWE? In this example I
 have used a grayscale PNG version of the ImageMagick logo, but any
 image which is not square will do.

 $ python
 Python 2.7.4 (default, Apr 19 2013, 18:28:01)
 [GCC 4.7.3] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import Image
 >>> import matplotlib.pyplot as plt
 >>>
 >>> s = 'logo.png'
 >>>
 >>> im = Image.open(s)
 >>> ar = plt.imread(s)
 >>>
 >>> im.size
 (640, 480)
 >>>
 >>> ar.shape
 (480, 640)
 >>>

 The extents/shape of the NumPy array (as loaded by matplotlib, but
 this convention seems uniform through NumPy) are transposed from what
 seems to be the usual convention. Why was this choice made?

It matches Python sequence semantics better. ar[i] indexes along the
first axis to return an array with one fewer dimension, which can itself be
indexed: (ar[i])[j]. Try using a list of lists to see what we are trying to
be consistent with. To extend this to multidimensional indexing, we want
ar[i,j] to give the same thing as ar[i][j]. The .shape attribute needs to
be given in the same order in which indexing happens.
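
A quick illustration with a non-square array:

import numpy as np

ar = np.zeros((480, 640))        # 480 rows (image height), 640 columns (width)
print(ar.shape)                  # (480, 640)
print(ar[0].shape)               # (640,) -- indexing peels off the first axis
print(ar[10, 20] == ar[10][20])  # True: ar[i, j] is consistent with ar[i][j]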

Note that what you call the usual convention isn't all that standard for
general multidimensional arrays. It's just one of two fairly arbitrary
choices, usually derived from the default memory layout at a very low
level. Fortran picked one convention, C picked another; numpy and Python
are built with C so we use its default conventions. Now, you are right that
image dimensions are usually quoted as (width, height), but numpy arrays
represent a much broader range of objects than images.

--
Robert Kern


[Numpy-discussion] Size/Shape

2013-07-17 Thread Brady McCary
NumPy Folks,

Would someone please discuss or point me to a discussion about the
discrepancy in size vs shape in the following MWE? In this example I
have used a grayscale PNG version of the ImageMagick logo, but any
image which is not square will do.

$ python
Python 2.7.4 (default, Apr 19 2013, 18:28:01)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Image
>>> import matplotlib.pyplot as plt
>>>
>>> s = 'logo.png'
>>>
>>> im = Image.open(s)
>>> ar = plt.imread(s)
>>>
>>> im.size
(640, 480)
>>>
>>> ar.shape
(480, 640)
>>>

The extents/shape of the NumPy array (as loaded by matplotlib, but
this convention seems uniform through NumPy) are transposed from what
seems to be the usual convention. Why was this choice made?

Brady



[Numpy-discussion] [ANN] 4th Python Symposium at AMS2014

2013-07-17 Thread Jonathan Rocher
[Apologies for the cross-post]

Dear all,

If you work with Python around themes like big data, climate,
meteorological or oceanic science, and/or GIS, you should come present at
the 4th Python Symposium, as part of the American Meteorological Society
conference in Atlanta in Feb 2014:
http://annual.ametsoc.org/2014/index.cfm/programs-and-events/conferences-and-symposia/fourth-symposium-on-advances-in-modeling-and-analysis-using-python/

The *abstract deadline is Aug 1st*!

Jonathan

-- 
Jonathan Rocher, PhD
Scientific software developer
SciPy2013 conference co-chair
Enthought, Inc.
jroc...@enthought.com
1-512-536-1057
http://www.enthought.com