Re: [Numpy-discussion] Setting contents of buffer for array object
On 11/02/2008, Matthew Brett [EMAIL PROTECTED] wrote:

> > I can also see that this could possibly be improved by using a for loop to iterate over the output elements, so that there was no need to duplicate the large input array, or perhaps a blocked iteration that duplicated arrays of modest size would be better. But how can a single float per data set whose median is being taken help?
>
> Sorry, you are right to call me on this very sloppy late-night phrasing - I only meant that it would be useful in due course to use a C implementation for median such as the ones you're describing, and that this could write the result directly into the in-place memory - in the same way that mean() does. It's quite true that it's difficult to imagine the algorithm itself benefiting from the memory buffer.

My point was not to catch you in an error - goodness knows I make enough of those, and not only late at night! - but to point out that there may not really be much need for an output argument. Even with a C code, for the median to be of much use, the output array can be at most half the size of the input array. The extra storage space required is not that big a concern, unlike a ufunc, and including an output argument forces you to deal with all sorts of data conversion issues.

On the other hand, there is something to be said for allowing the code to destroy the input array. Perhaps *that* should be an optional argument (defaulting to zero)?

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
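Anne's suggestion of an optional flag that lets the routine destroy its input could look roughly like the sketch below. The function name median_1d and the flag name destroy_input are hypothetical, chosen only for illustration, and the sketch assumes a non-empty 1-D array; it is not the implementation discussed in the thread.

```python
import numpy as np

def median_1d(a, destroy_input=False):
    """Median of a non-empty 1-D array (hypothetical helper, illustration only)."""
    a = np.asarray(a)
    if destroy_input:
        # Sort the caller's buffer in place: no extra copy is made,
        # but the caller's array is left reordered.
        a.sort()
        s = a
    else:
        # np.sort returns a sorted copy and leaves the input untouched.
        s = np.sort(a)
    n = s.shape[0]
    mid = n // 2
    if n % 2 == 1:
        return float(s[mid])
    # Even length: mean of the two central values.
    return (float(s[mid - 1]) + float(s[mid])) / 2.0

x = np.array([5.0, 1.0, 4.0, 2.0, 3.0])
m_copy = median_1d(x)                          # x keeps its original order
x_before = x.copy()
m_inplace = median_1d(x, destroy_input=True)   # x is sorted afterwards
```

The trade-off is the one Anne describes: the in-place path saves the temporary copy of the input at the price of reordering the caller's data.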
Re: [Numpy-discussion] Setting contents of buffer for array object
Hi,

> I can also see that this could possibly be improved by using a for loop to iterate over the output elements, so that there was no need to duplicate the large input array, or perhaps a blocked iteration that duplicated arrays of modest size would be better. But how can a single float per data set whose median is being taken help?

Sorry, you are right to call me on this very sloppy late-night phrasing - I only meant that it would be useful in due course to use a C implementation for median such as the ones you're describing, and that this could write the result directly into the in-place memory - in the same way that mean() does. It's quite true that it's difficult to imagine the algorithm itself benefiting from the memory buffer.

Thanks,

Matthew
[Numpy-discussion] Setting contents of buffer for array object
Hi,

I am sorry if I have missed something obvious, but is there any way in python of doing this:

import numpy as np
a = np.arange(10)
b = np.arange(10)+1
a.data = b.data # raises error, but I hope you see what I mean

?

Thanks a lot for any pointers.

Matthew
Re: [Numpy-discussion] Setting contents of buffer for array object
On Feb 10, 2008 5:15 PM, Matthew Brett [EMAIL PROTECTED] wrote:

> Hi,
>
> I am sorry if I have missed something obvious, but is there any way in python of doing this:
>
> import numpy as np
> a = np.arange(10)
> b = np.arange(10)+1
> a.data = b.data # raises error, but I hope you see what I mean
>
> ?

Not really, no. Can you describe your use case in more detail?

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth.
 -- Umberto Eco
Re: [Numpy-discussion] Setting contents of buffer for array object
On Feb 10, 2008 6:48 PM, Matthew Brett [EMAIL PROTECTED] wrote:

> > > import numpy as np
> > > a = np.arange(10)
> > > b = np.arange(10)+1
> > > a.data = b.data # raises error, but I hope you see what I mean
> > >
> > > ?
> >
> > Not really, no. Can you describe your use case in more detail?
>
> Yes - I am just writing the new median implementation. To allow future optimization, I would like to have the same signature as mean():
>
> def median(a, axis=0, dtype=None, out=None)
>
> (axis=0 to change to axis=None default at some point).
>
> To do this, I need to copy the results of the median calculation in the routine into the array object given by 'out' - when passed.

Ah, I see. You definitely do not want to reassign the .data buffer in this case. An out= parameter does not reassign the memory location that the array object points to. It should use the allocated memory that was already there. It shouldn't copy anything at all; otherwise, median(x, out=out) is no better than out[:] = median(x). Personally, I don't think that a function should expose an out= parameter unless it can make good on that promise of memory efficiency.

Can you show us the current implementation that you have?

--
Robert Kern
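The distinction Robert draws can be seen with mean(), which the thread cites as the model for the out= signature. This is a minimal sketch; the array names are illustrative only.

```python
import numpy as np

x = np.arange(12, dtype=float).reshape(3, 4)
out = np.empty(4)
alias = out                      # second reference to the same array object

# mean() writes into the memory that `out` already owns and returns `out`;
# the array object is not rebound to a new buffer.
res = np.mean(x, axis=0, out=out)

# Slice assignment also fills existing memory, but the right-hand side is
# first computed into a temporary array and then copied across.
out2 = np.empty(4)
out2[:] = np.mean(x, axis=0)
```

Both forms leave the same values in the destination; the difference is only whether a temporary result array is allocated along the way.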
Re: [Numpy-discussion] Setting contents of buffer for array object
> > import numpy as np
> > a = np.arange(10)
> > b = np.arange(10)+1
> > a.data = b.data # raises error, but I hope you see what I mean
> >
> > ?
>
> Not really, no. Can you describe your use case in more detail?

Yes - I am just writing the new median implementation. To allow future optimization, I would like to have the same signature as mean():

def median(a, axis=0, dtype=None, out=None)

(axis=0 to change to axis=None default at some point).

To do this, I need to copy the results of the median calculation in the routine into the array object given by 'out' - when passed.

Matthew
Re: [Numpy-discussion] Setting contents of buffer for array object
> Ah, I see. You definitely do not want to reassign the .data buffer in this case. An out= parameter does not reassign the memory location that the array object points to. It should use the allocated memory that was already there. It shouldn't copy anything at all; otherwise, median(x, out=out) is no better than out[:] = median(x). Personally, I don't think that a function should expose an out= parameter unless it can make good on that promise of memory efficiency.

I agree - but there are more efficient median algorithms out there which can make use of the memory efficiently. I wanted to establish the call signature to allow that. I don't feel strongly about it though.

> Can you show us the current implementation that you have?

is attached, comments welcome...

Matthew

import numpy as np

def median(a, axis=0, dtype=None, out=None):
    """Compute the median along the specified axis.

    Returns the median of the array elements.  The median is taken
    over the first dimension of the array by default, otherwise over
    the specified axis.

    Parameters
    ----------
    axis : {None, int}, optional
        Axis along which the medians are computed. The default is to
        compute the median along the first dimension.  axis=None
        returns the median of the flattened array
    dtype : type, optional
        Type to use in returning the medians. For arrays of integer
        type the default is float32, for arrays of float types it is
        the same as the array type.  Integer arrays may return float
        medians because, given the chosen axis has length N, and N is
        even, the median is given by the mean of the two central
        values (see notes)
    out : ndarray, optional
        Alternative output array in which to place the result. It
        must have the same shape as the expected output but the type
        will be cast if necessary.

    Returns
    -------
    median : The return type varies, see above.
        A new array holding the result is returned unless out is
        specified, in which case a reference to out is returned.

    See Also
    --------
    mean

    Notes
    -----
    Given a vector V length N, the median of V is the middle value of
    a sorted copy of V (Vs) - i.e. Vs[(N-1)/2], when N is odd. It is
    the mean of the two middle values of Vs, when N is even.
    """
    # np.sort returns a sorted copy; axis=None sorts the flattened array
    sorted_a = np.sort(a, axis)
    if dtype is None:
        # Integer input defaults to a float32 result
        # (np.issubdtype replaces the np.sctypes lookup, which was
        # removed in NumPy 2.0)
        if np.issubdtype(a.dtype, np.integer):
            dtype = np.float32
        else:
            dtype = a.dtype
    if axis is None:
        axis = 0
    indexer = [slice(None)] * sorted_a.ndim
    index = sorted_a.shape[axis] // 2
    if sorted_a.shape[axis] % 2 == 1:
        # Odd length: take the single middle value
        indexer[axis] = index
        ret = sorted_a[tuple(indexer)]
    else:
        # Even length: mean of the two middle values
        indexer[axis] = slice(index-1, index+1)
        ret = np.sum(sorted_a[tuple(indexer)], axis=axis)/2.0
    if np.issubdtype(dtype, np.integer):
        ret = ret.round()
    if ret.dtype != dtype:
        ret = ret.astype(dtype)
    if out is not None:
        if not (out.shape == ret.shape and out.nbytes == ret.nbytes):
            raise ValueError('wrong shape for output')
        # This doesn't work - out.data = ret.data
        raise ValueError('out parameter not working yet')
    return ret
Re: [Numpy-discussion] Setting contents of buffer for array object
On Feb 10, 2008 7:17 PM, Matthew Brett [EMAIL PROTECTED] wrote:

> > Ah, I see. You definitely do not want to reassign the .data buffer in this case. An out= parameter does not reassign the memory location that the array object points to. It should use the allocated memory that was already there. It shouldn't copy anything at all; otherwise, median(x, out=out) is no better than out[:] = median(x). Personally, I don't think that a function should expose an out= parameter unless it can make good on that promise of memory efficiency.
>
> I agree - but there are more efficient median algorithms out there which can make use of the memory efficiently. I wanted to establish the call signature to allow that. I don't feel strongly about it though.

I say add the out= parameter when you use such an algorithm. But if you like, just use slice assignment for now.

--
Robert Kern
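The slice-assignment fallback Robert mentions might be sketched as below. The helper name store_result is hypothetical and stands in for the tail end of a median routine; it copies the computed result into the caller's array, so it offers the out= signature without the memory saving.

```python
import numpy as np

def store_result(ret, out=None):
    """Hypothetical helper: return `ret`, or copy it into `out` if given."""
    if out is None:
        return ret
    ret = np.asarray(ret)
    if out.shape != ret.shape:
        raise ValueError('wrong shape for output')
    # Assignment through [...] copies values into out's existing buffer,
    # casting to out.dtype if necessary; `out` keeps its memory.
    out[...] = ret
    return out

medians = np.array([1.5, 2.5, 3.5])
out = np.zeros(3, dtype=np.float32)
result = store_result(medians, out)
```

Returning out itself matches the docstring in the attached code, which promises a reference to out when it is specified.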
Re: [Numpy-discussion] Setting contents of buffer for array object
Matthew Brett wrote:

> > > import numpy as np
> > > a = np.arange(10)
> > > b = np.arange(10)+1
> > > a.data = b.data # raises error, but I hope you see what I mean
> > >
> > > ?
> >
> > Not really, no. Can you describe your use case in more detail?
>
> Yes - I am just writing the new median implementation. To allow future optimization, I would like to have the same signature as mean():
>
> def median(a, axis=0, dtype=None, out=None)
>
> (axis=0 to change to axis=None default at some point).
>
> To do this, I need to copy the results of the median calculation in the routine into the array object given by 'out' - when passed.

My understanding of numerical routines that accept an out parameter is that this is a convention for in-place algorithms. When None is passed in the out parameter, it's the caller's way of indicating that in-place is not needed, and a new array is allocated to store the result; otherwise, the result is stored in the 'out' array. Either way, the result is returned. One can break from this convention by allocating more memory than provided by the out array, but that's a performance issue that may or may not be avoidable.

Remember that A[:] = expr sets the value of the elements in A to the values of array elements in the expression expr, and this copying is done in-place. To copy an array C, and make the copy contiguous, use the .copy() method on C.

Assigning the .data buffers is not something I have seen before in non-constructor (or non-pseudo-constructor like from_buffer) code. I think it might even be dangerous if you don't do it right. If one does not properly recalculate the strides of A, slicing operations on A may not behave as expected.

If this is library code, reassigning the .data buffer can confuse the user, since it messes up array view semantics.
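Damian's two reminders, that A[:] = expr copies in place and that .copy() yields a contiguous array, can be checked with a short sketch. The variable names are illustrative only.

```python
import numpy as np

A = np.zeros((2, 3))
view = A[0]                 # a view onto A's first row

# Values are copied into A's existing buffer; A is not rebound.
A[:] = np.arange(6).reshape(2, 3) + 1
# `view` therefore sees the new first row.

C = A.T                     # transposed view: shares memory, not C-contiguous
D = C.copy()                # independent, C-contiguous copy of C
```

Because the assignment reuses A's memory, existing views such as view stay valid, which is exactly what reassigning the .data buffer would break.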
Suppose I'm an ignorant user and I write the following code:

A=numpy.random.rand(10,20)
dummy_input=numpy.random.rand(10,20)
B=A.T
C=B[0::-1,:]

then I use a library function foo (suppose foo accepts an input array inp and an output array out, and assigns out.data to something else)

foo(inp=dummy_input, out=B)

Now, A and B point to two different .data buffers, B's base points to A, and C's base points to B, but A and C share the same .data buffer. As a user, I may expect B and C to be a view of A (certainly B isn't), and C to be a view of B (which is verified by checking 'C.base is B'), but changing C's values changes A's but not B's. That's confusing.

Also, suppose B's new data buffer has fewer elements than its original data buffer. I may be clever and set B's size and strides attributes accordingly, but changing C's values might cause the manipulation of undefined memory.

Damian
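The view relationships that Damian's scenario starts from, before any hypothetical .data reassignment, can be checked directly. This sketch uses np.shares_memory rather than the .base attribute to test sharing, and shifts the random values by 1.0 (an assumption made here so that zeros written through C are easy to spot).

```python
import numpy as np

A = np.random.rand(10, 20) + 1.0   # every entry is at least 1.0
B = A.T                            # (20, 10) transposed view of A
C = B[0::-1, :]                    # (1, 10) view: row 0 of B only

# Writing through C reaches the single buffer that A, B and C share.
C[...] = 0.0
```

With intact view semantics a write through C shows up in both A and B; the confusion Damian describes arises only once B is pointed at a different buffer while C still refers to the old one.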