Re: [Numpy-discussion] ufunc overrides
On Thu, Jul 11, 2013 at 4:29 AM, Blake Griffith blake.a.griff...@gmail.com wrote:
> Hello NumPy,
>
> Part of my GSoC is compatibility with SciPy's sparse matrices and NumPy's
> ufuncs. Currently there is no feasible way to do this without changing
> ufuncs a bit. I've been considering a mechanism to override ufuncs based
> on checking the ufunc's arguments for a __ufunc_override__ attribute, then
> handing off the operation to a function specified by that attribute. I
> prototyped this in Python and did a demo in a blog post here:
> http://cwl.cx/posts/week-6-ufunc-overrides.html
>
> This is similar to a previously discussed, but never implemented, change:
> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html

I've just posted a long comment with a slightly different proposal in the PR:
https://github.com/numpy/numpy/pull/3524#issuecomment-21115548

Mentioning this here because it has the potential to majorly affect anyone
working with ndarray subclasses or other array-like objects (e.g., masked
arrays, GPU arrays, etc.), so if you care about these things then please
take a look and help us make sure that the final API is flexible enough to
handle your needs.

-n
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
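The attribute-based dispatch being proposed can be sketched in pure Python. This is only an illustration of the idea, not NumPy's eventual API: the `__ufunc_override__` name comes from the blog-post prototype, while `apply_ufunc` and `LoggedArray` are hypothetical names invented here.

```python
import numpy as np

def apply_ufunc(ufunc, *args, **kwargs):
    """Hand the call to the first argument that registers a handler for
    this ufunc in a __ufunc_override__ mapping; otherwise call normally."""
    for arg in args:
        override = getattr(arg, "__ufunc_override__", None)
        if override is not None and ufunc.__name__ in override:
            return override[ufunc.__name__](ufunc, *args, **kwargs)
    return ufunc(*args, **kwargs)

class LoggedArray:
    """Toy array-like container that intercepts np.add."""
    def __init__(self, data):
        self.data = np.asarray(data)
        self.__ufunc_override__ = {"add": self._handle_add}

    def _handle_add(self, ufunc, a, b, **kwargs):
        # Unwrap our own type, apply the real ufunc, rewrap the result.
        unwrap = lambda x: x.data if isinstance(x, LoggedArray) else x
        return LoggedArray(ufunc(unwrap(a), unwrap(b)))

# The override fires even when the custom object is mixed with an ndarray.
out = apply_ufunc(np.add, LoggedArray([1, 2]), np.array([10, 20]))
```

The point of checking *all* arguments (not just the first) is exactly the sparse-matrix case: `np.add(dense, sparse)` must still dispatch to the sparse handler.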
Re: [Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array
On Tue, Jul 16, 2013 at 7:53 PM, Frédéric Bastien no...@nouiz.org wrote:
> Hi,
>
> On Tue, Jul 16, 2013 at 11:55 AM, Nathaniel Smith n...@pobox.com wrote:
>> On Tue, Jul 16, 2013 at 2:34 PM, Arink Verma arinkve...@gmail.com wrote:
>>> Each ndarray does two mallocs, for the obj and buffer. These could be
>>> combined into 1 -- just allocate the total size and do some pointer
>>> arithmetic, then set OWNDATA to false. So, those two mallocs have been
>>> mentioned in the project introduction. I got that wrong.
>>
>> On further thought/reading the code, it appears to be more complicated
>> than that, actually. It looks like (for a non-scalar array) we have 2
>> calls to PyMem_Malloc: 1 for the array object itself, and one for the
>> shapes + strides. And one call to regular-old malloc: for the data
>> buffer. (Mysteriously, shapes + strides together have 2*ndim elements,
>> but to hold them we allocate a memory region sized to hold 3*ndim
>> elements. I'm not sure why.)
>>
>> And contrary to what I said earlier, this is about as optimized as it
>> can be without breaking ABI. We need at least 2 calls to
>> malloc/PyMem_Malloc, because the shapes+strides may need to be resized
>> without affecting the much larger data area. But it's tempting to
>> allocate the array object and the data buffer in a single memory
>> region, like I suggested earlier. And this would ALMOST work. But it
>> turns out there is code out there which assumes (whether wisely or not)
>> that you can swap around which data buffer a given PyArrayObject refers
>> to (hi Theano!). And supporting this means that data buffers and
>> PyArrayObjects need to be in separate memory regions.
>
> Are you sure that Theano swaps the data ptr of an ndarray? When we play
> with that, it is on a newly created ndarray. So a node in our graph won't
> change the input ndarray structure. It will create a new ndarray structure
> with new shape/strides and pass a data ptr, and we flag the new ndarray
> with own_data correctly, to my knowledge.
>
> If Theano poses a problem here, I'll suggest that I fix Theano. But
> currently I don't see the problem. So if this makes you change your mind
> about this optimization, tell me. I don't want Theano to prevent
> optimization in NumPy.

It's entirely possible I misunderstood, so let's see if we can work it out.
I know that you want to assign to the ->data pointer in a PyArrayObject,
right? That's what caused some trouble with the 1.7 API deprecations, which
were trying to prevent direct access to this field? Creating a new array
given a pointer to a memory region is no problem, and obviously will be
supported regardless of any optimizations. But if that's all you were doing
then you shouldn't have run into the deprecation problem. Or maybe I'm
misremembering!

The problem is if one wants to (a) create a PyArrayObject, which will by
default allocate a new memory region and assign a pointer to it to the
->data field, and *then* (b) steal that memory region and replace it with
another one, while keeping the same PyArrayObject. This is technically
possible right now (though I wouldn't say it was necessarily a good idea!),
but it would become impossible if we allocated the PyArrayObject and data
into a single region.

The profiles suggest that this would only make allocation of arrays maybe
15% faster, with probably a similar effect on deallocation. And I'm not sure
how often array allocation per se is actually a bottleneck -- usually you
also do things with the arrays, which is more expensive :-). But hey, 15% is
nothing to sneeze at.

-n
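The ownership distinction at the center of this thread is visible from Python without touching the C API: an array that allocated its own buffer has the OWNDATA flag set, while one wrapping caller-provided memory does not. A minimal illustration (plain NumPy):

```python
import numpy as np

# An array created normally allocates -- and owns -- its data buffer.
a = np.empty(4, dtype=np.float64)
assert a.flags.owndata

# An array wrapping external memory does not own its buffer. This is the
# "create an array from a given pointer" case, which any single-region
# (object + data in one malloc) optimization would have to keep supporting.
buf = bytearray(4 * 8)           # room for 4 float64 values
b = np.frombuffer(buf, dtype=np.float64)
assert not b.flags.owndata
```

The C-level question in the thread is the step beyond this: whether `->data` on an *existing* array object may later be repointed at a different buffer, which only works if the object header and the buffer live in separate allocations.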
Re: [Numpy-discussion] empty_like for masked arrays
On Mon, Jul 15, 2013 at 2:33 PM, Gregorio Bastardo gregorio.basta...@gmail.com wrote:
> Hi,
>
> On Mon, Jun 10, 2013 at 3:47 PM, Nathaniel Smith n...@pobox.com wrote:
>> Hi all,
>>
>> Is there anyone out there using numpy masked arrays who has an opinion
>> on how empty_like (and its friends ones_like, zeros_like) should handle
>> the mask?
>>
>> Right now, apparently, if you call np.ma.empty_like on a masked array,
>> you get a new masked array that shares the original array's mask, so
>> modifying one modifies the other. That's almost certainly wrong. This PR:
>> https://github.com/numpy/numpy/pull/3404
>> makes it so that instead the new array has values that are all set to
>> empty/zero/one, and a mask which is set to match the input array's mask
>> (so wherever something was masked in the original array, the
>> empty/zero/one in that place is also masked). We don't know if this is
>> the desired behaviour for these functions, though. Maybe it's more
>> intuitive for the new array to match the original array in shape and
>> dtype, but to always have an empty mask. Or maybe not. None of us really
>> use np.ma, so if you do and have an opinion then please speak up...
>
> I recently joined the mailing list, so the message might not reach the
> original thread, sorry for that. I use masked arrays extensively, and
> would vote for the first option, as I use the *_like operations with the
> assumption that the resulting array has the same mask as the original. I
> think it's more intuitive than selecting between all-masked or
> all-unmasked behaviour. If it's not too late, please consider my use case.

The original submitter of that PR has been silent since then, so nothing has
happened so far. That makes 2 votes for copying the mask and 3 against, I
guess. That's not very consensus-ful. If there's really a lot of confusion
here, then it's possible the answer is that np.ma.empty_like should just
raise an error or not be defined. Or can you all agree?

-n
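The candidate behaviors being voted on can be spelled out with plain `np.ma` calls. This illustrates the *options* under discussion, not what `np.ma.empty_like` itself does in any particular release:

```python
import numpy as np

orig = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# Option 1 (the PR's behavior): new array, mask *copied* from the original.
copied = np.ma.array(np.empty(orig.shape), mask=orig.mask.copy())
copied.mask[0] = True
assert not orig.mask[0]        # the two masks are independent
assert orig.mask[1]            # original masking untouched

# Option 2: same shape/dtype, but a fresh, all-unmasked mask.
fresh = np.ma.array(np.empty(orig.shape), mask=np.ma.nomask)
assert not np.ma.is_masked(fresh)
```

The bug being fixed is a third, unintended behavior: the new array *sharing* the original's mask object, so that masking an element in one array silently masks it in the other.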
Re: [Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array
On Wed, Jul 17, 2013 at 10:39 AM, Nathaniel Smith n...@pobox.com wrote:
> [...]
>
> It's entirely possible I misunderstood, so let's see if we can work it
> out. I know that you want to assign to the ->data pointer in a
> PyArrayObject, right? That's what caused some trouble with the 1.7 API
> deprecations, which were trying to prevent direct access to this field?
> Creating a new array given a pointer to a memory region is no problem,
> and obviously will be supported regardless of any optimizations. But if
> that's all you were doing then you shouldn't have run into the
> deprecation problem. Or maybe I'm misremembering!

What is currently done in only one place is to create a new PyArrayObject
with a given ptr, so NumPy doesn't do the allocation. We later change that
ptr to another one. It is the change to the ptr of the just-created
PyArrayObject that caused problems with the interface deprecation. I fixed
all the other problems related to the deprecation (mostly just renames of
functions/macros), but I haven't fixed this one yet. I would need to change
the logic to compute the final ptr before creating the PyArrayObject, and
create it with the final data ptr. But in all cases NumPy didn't allocate
the data memory for this object, so this case doesn't block your
optimization.

One thing on our optimization wish list is to reuse allocated
PyArrayObjects between Theano function calls for intermediate results (so
completely under Theano's control). This could be useful in particular for
reshape/transpose/subtensor. Those functions are pretty fast, and from
memory I already found the allocation time to be significant. But in those
cases it is PyArrayObjects that are views, so the metadata and the data
would be in different memory regions in all cases.

The other case on our optimization wish list is reusing a PyArrayObject
when the shape isn't the right one (but the number of dimensions is the
same). If we do that for operations like addition, we will need to use
PyArray_Resize(). This will be done on PyArrayObjects whose data memory was
allocated by NumPy. So if you do one memory allocation for metadata and
data, just make sure that PyArray_Resize() will handle that correctly.

On the usefulness of doing only 1 memory allocation: on our old gpu
ndarray, we were doing 2 allocs on the GPU, one for metadata and one for
data. I removed this, as it was a bottleneck. Allocation on the CPU is
faster than on the GPU, but it is still slow unless you reuse memory. Does
PyMem_Malloc reuse previous small allocations?

For those who read all this, the conclusion is that Theano shouldn't block
this optimization. If you optimize the allocation of new PyArrayObjects,
there will be less incentive to do the
Re: [Numpy-discussion] Size/Shape
On Wed, Jul 17, 2013 at 6:21 PM, Brady McCary brady.mcc...@gmail.com wrote:
> NumPy Folks,
>
> Would someone please discuss or point me to a discussion about the
> discrepancy in size vs shape in the following MWE? In this example I have
> used a grayscale PNG version of the ImageMagick logo, but any image which
> is not square will do.
>
> $ python
> Python 2.7.4 (default, Apr 19 2013, 18:28:01)
> [GCC 4.7.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import Image
> >>> import matplotlib.pyplot as plt
> >>> s = 'logo.png'
> >>> im = Image.open(s)
> >>> ar = plt.imread(s)
> >>> im.size
> (640, 480)
> >>> ar.shape
> (480, 640)
>
> The extents/shape of the NumPy array (as loaded by matplotlib, but this
> convention seems uniform through NumPy) are transposed from what seems to
> be the usual convention. Why was this choice made?

It matches Python sequence semantics better. ar[i] will index along the
first axis to return an array of one less dimension, which itself can be
indexed: (ar[i])[j]. Try using a list of lists to see what we are trying to
be consistent with. To extend this to multidimensional indexing, we want
ar[i, j] to give the same thing as ar[i][j]. The .shape attribute needs to
be given in the same order that indexing happens.

Note that what you call the usual convention isn't all that standard for
general multidimensional arrays. It's just one of two fairly arbitrary
choices, usually derived from the default memory layout at a very low
level. Fortran picked one convention, C picked another; numpy and Python
are built with C, so we use its default conventions. Now, you are right
that image dimensions are usually quoted as (width, height), but numpy
arrays represent a much broader range of objects than images.

--
Robert Kern
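Robert's point, that `.shape` follows Python's nested-sequence indexing order rather than the image (width, height) convention, can be checked directly. The 480x640 array below is a stand-in for the logo image:

```python
import numpy as np

# A list of 2 rows, each with 3 elements: Python indexes rows first.
lst = [[1, 2, 3], [4, 5, 6]]
ar = np.array(lst)
assert ar.shape == (2, 3)            # matches (len(lst), len(lst[0]))
assert ar[1][2] == ar[1, 2] == 6     # ar[i][j] and ar[i, j] agree

# For an image this means shape == (height, width):
img = np.zeros((480, 640))
assert img.shape == (480, 640)
assert img[0].shape == (640,)        # one row, i.e. `width` pixels
```

So the "transposition" Brady sees is PIL reporting (width, height) while NumPy reports (rows, columns); neither array is actually transposed in memory.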
[Numpy-discussion] Size/Shape
NumPy Folks,

Would someone please discuss or point me to a discussion about the
discrepancy in size vs shape in the following MWE? In this example I have
used a grayscale PNG version of the ImageMagick logo, but any image which is
not square will do.

$ python
Python 2.7.4 (default, Apr 19 2013, 18:28:01)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Image
>>> import matplotlib.pyplot as plt
>>> s = 'logo.png'
>>> im = Image.open(s)
>>> ar = plt.imread(s)
>>> im.size
(640, 480)
>>> ar.shape
(480, 640)

The extents/shape of the NumPy array (as loaded by matplotlib, but this
convention seems uniform through NumPy) are transposed from what seems to be
the usual convention. Why was this choice made?

Brady
[Numpy-discussion] [ANN] 4th Python Symposium at AMS2014
[Apologies for the cross-post]

Dear all,

If you work with Python around themes like big data, climate, meteorological
or oceanic science, and/or GIS, you should come present at the 4th Python
Symposium, part of the American Meteorological Society conference in Atlanta
in Feb 2014:
http://annual.ametsoc.org/2014/index.cfm/programs-and-events/conferences-and-symposia/fourth-symposium-on-advances-in-modeling-and-analysis-using-python/

The *abstract deadline is Aug 1st*!

Jonathan

--
Jonathan Rocher, PhD
Scientific software developer
SciPy2013 conference co-chair
Enthought, Inc.
jroc...@enthought.com
1-512-536-1057
http://www.enthought.com