Re: [Numpy-discussion] Setting contents of buffer for array object

2008-02-11 Thread Anne Archibald
On 11/02/2008, Matthew Brett [EMAIL PROTECTED] wrote:

  I can also see that this could possibly be improved by using a for
  loop to iterate over the output elements, so that there was no need to
  duplicate the large input array, or perhaps a blocked iteration that
  duplicated arrays of modest size would be better. But how can a single
  float per data set whose median is being taken help?

 Sorry, you are right to call me on this very sloppy late-night
 phrasing - I only meant that it would be useful in due course to use a
 C implementation for median such as the ones you're describing, and
 that this could write the result directly into the in-place memory -
 in the same way that mean() does.  It's quite true that it's difficult
 to imagine the algorithm itself benefiting from the memory buffer.

My point was not to catch you in an error - goodness knows I make
enough of those, and not only late at night! - but to point out that
there may not really be much need for an output argument. Even with
C code, for the median to be of much use, the output array can be at
most half the size of the input array. The extra storage space
required is not that big a concern, unlike with a ufunc, and including an
output argument forces you to deal with all sorts of data conversion
issues.
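
The blocked iteration mentioned earlier in the thread could look roughly like the sketch below. It assumes a 2-D input with the median taken along axis 0; `blocked_median` and `block_cols` are hypothetical names used only for illustration. Only one block of columns is copied and sorted at a time, so the temporary storage stays modest.

```python
import numpy as np

def blocked_median(a, block_cols=1024):
    """Median along axis 0 of a 2-D array, sorting one column block at a time."""
    a = np.asarray(a)
    n_rows, n_cols = a.shape
    out = np.empty(n_cols, dtype=np.float64)
    mid = n_rows // 2
    for start in range(0, n_cols, block_cols):
        stop = min(start + block_cols, n_cols)
        # np.sort returns a sorted copy of this block only
        block = np.sort(a[:, start:stop], axis=0)
        if n_rows % 2 == 1:
            out[start:stop] = block[mid]
        else:
            # even length: mean of the two central rows
            out[start:stop] = block[mid - 1:mid + 1].mean(axis=0)
    return out

rng = np.random.default_rng(0)
x = rng.random((101, 5000))
result = blocked_median(x, block_cols=512)
```

The output here has one value per column, so it is far smaller than the input, which is the point made above about the extra storage.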

On the other hand, there is something to be said for allowing the code
to destroy the input array. Perhaps *that* should be an optional
argument (defaulting to zero)?
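
A minimal sketch of that idea, using the `overwrite_input` argument that `np.median` has in later NumPy releases. With the flag set, the routine may reorder the input in place instead of copying it, so the input should be treated as scratch space afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1001)
expected = np.median(x)      # default: the input is left untouched

scratch = x.copy()
# overwrite_input=True allows median to reorder `scratch` in place
m = np.median(scratch, overwrite_input=True)
```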

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Setting contents of buffer for array object

2008-02-11 Thread Matthew Brett
Hi,

 I can also see that this could possibly be improved by using a for
 loop to iterate over the output elements, so that there was no need to
 duplicate the large input array, or perhaps a blocked iteration that
 duplicated arrays of modest size would be better. But how can a single
 float per data set whose median is being taken help?

Sorry, you are right to call me on this very sloppy late-night
phrasing - I only meant that it would be useful in due course to use a
C implementation for median such as the ones you're describing, and
that this could write the result directly into the in-place memory -
in the same way that mean() does.  It's quite true that it's difficult
to imagine the algorithm itself benefiting from the memory buffer.

Thanks,

Matthew


[Numpy-discussion] Setting contents of buffer for array object

2008-02-10 Thread Matthew Brett
Hi,

I am sorry if I have missed something obvious, but is there any way in
python of doing this:

import numpy as np
a = np.arange(10)
b = np.arange(10)+1
a.data = b.data # raises error, but I hope you see what I mean

?

Thanks a lot for any pointers.
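
If the goal is only for `a` to end up holding `b`'s values, copying the elements into `a`'s existing buffer avoids touching `.data` at all. A minimal sketch, assuming the two arrays have compatible shapes and dtypes:

```python
import numpy as np

a = np.arange(10)
b = np.arange(10) + 1

addr_before = a.__array_interface__["data"][0]   # address of a's buffer
np.copyto(a, b)                                   # same effect as a[:] = b
addr_after = a.__array_interface__["data"][0]
```

After the call, `a` has the values of `b` but still owns the same block of memory, and the two arrays remain independent.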

Matthew


Re: [Numpy-discussion] Setting contents of buffer for array object

2008-02-10 Thread Robert Kern
On Feb 10, 2008 5:15 PM, Matthew Brett [EMAIL PROTECTED] wrote:
 Hi,

 I am sorry if I have missed something obvious, but is there any way in
 python of doing this:

 import numpy as np
 a = np.arange(10)
 b = np.arange(10)+1
 a.data = b.data # raises error, but I hope you see what I mean

 ?

Not really, no. Can you describe your use case in more detail?

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Setting contents of buffer for array object

2008-02-10 Thread Robert Kern
On Feb 10, 2008 6:48 PM, Matthew Brett [EMAIL PROTECTED] wrote:
   import numpy as np
   a = np.arange(10)
   b = np.arange(10)+1
   a.data = b.data # raises error, but I hope you see what I mean
  
   ?
 
  Not really, no. Can you describe your use case in more detail?

 Yes - I am just writing the new median implementation.   To allow
 future optimization, I would like to have the same signature as
 mean():

 def median(a, axis=0, dtype=None, out=None)

 (axis=0 to change to axis=None default at some point).

 To do this, I need to copy the results of the median calculation in
 the routine into the array object given by 'out' - when passed.

Ah, I see. You definitely do not want to reassign the .data buffer in
this case. An out= parameter does not reassign the memory location
that the array object points to. It should use the allocated memory
that was already there. It shouldn't copy anything at all;
otherwise, median(x, out=out) is no better than out[:] =
median(x). Personally, I don't think that a function should expose an
out= parameter unless it can make good on that promise of memory
efficiency. Can you show us the current implementation that you have?
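
The distinction can be sketched with an ordinary ufunc. This is only an illustration of the out= convention, not of any median code: `np.add` writes into the buffer that `out` already owns, while the slice-assignment form first builds a temporary and then copies it.

```python
import numpy as np

x = np.arange(5, dtype=float)
y = np.ones(5)
out = np.empty(5)
addr = out.__array_interface__["data"][0]

# The ufunc writes its result straight into out's existing buffer
res = np.add(x, y, out=out)

# Same values, but x + y allocates a temporary before the copy
out2 = np.empty(5)
out2[:] = x + y
```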

-- 
Robert Kern



Re: [Numpy-discussion] Setting contents of buffer for array object

2008-02-10 Thread Matthew Brett
  import numpy as np
  a = np.arange(10)
  b = np.arange(10)+1
  a.data = b.data # raises error, but I hope you see what I mean
 
  ?

 Not really, no. Can you describe your use case in more detail?

Yes - I am just writing the new median implementation.   To allow
future optimization, I would like to have the same signature as
mean():

def median(a, axis=0, dtype=None, out=None)

(axis=0 to change to axis=None default at some point).

To do this, I need to copy the results of the median calculation in
the routine into the array object given by 'out' - when passed.

Matthew


Re: [Numpy-discussion] Setting contents of buffer for array object

2008-02-10 Thread Matthew Brett
 Ah, I see. You definitely do not want to reassign the .data buffer in
 this case. An out= parameter does not reassign the memory location
 that the array object points to. It should use the allocated memory
 that was already there. It shouldn't copy anything at all;
 otherwise, median(x, out=out) is no better than out[:] =
 median(x). Personally, I don't think that a function should expose an
 out= parameter unless it can make good on that promise of memory
 efficiency.

I agree - but there are more efficient median algorithms out there
which can make use of the memory efficiently.  I wanted to establish
the call signature to allow that.  I don't feel strongly about it
though.
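
One family of cheaper approaches is selection, which finds the middle element without sorting everything. The sketch below uses `np.partition` from later NumPy releases; it is not necessarily one of the algorithms referred to above. `median_by_selection` is a hypothetical name, and the input is assumed to be a non-empty 1-D array.

```python
import numpy as np

def median_by_selection(v):
    """Median of a 1-D array using partial sorting instead of a full sort."""
    v = np.asarray(v)
    n = v.size
    mid = n // 2
    if n % 2 == 1:
        # after partitioning, position `mid` holds the value a full sort would put there
        return float(np.partition(v, mid)[mid])
    # even length: both central positions must be in their sorted place
    part = np.partition(v, [mid - 1, mid])
    return (float(part[mid - 1]) + float(part[mid])) / 2.0

rng = np.random.default_rng(1)
odd = rng.random(101)
even = rng.random(100)
m_odd = median_by_selection(odd)
m_even = median_by_selection(even)
```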

 Can you show us the current implementation that you have?

is attached, comments welcome...

Matthew
import numpy as np

def median(a, axis=0, dtype=None, out=None):
    """Compute the median along the specified axis.

    Returns the median of the array elements.  The median is taken
    over the first dimension of the array by default, otherwise over
    the specified axis.

    Parameters
    ----------
    axis : {None, int}, optional
        Axis along which the medians are computed. The default is to
        compute the median along the first dimension.  axis=None
        returns the median of the flattened array

    dtype : type, optional
        Type to use in returning the medians. For arrays of integer
        type the default is float32, for arrays of float types it is
        the same as the array type. Integer arrays may return float
        medians because, given the chosen axis has length N, and N is
        even, the median is given by the mean of the two central
        values (see notes)

    out : ndarray, optional
        Alternative output array in which to place the result. It must
        have the same shape as the expected output but the type will be
        cast if necessary.

    Returns
    -------
    median : The return type varies, see above.
        A new array holding the result is returned unless out is
        specified, in which case a reference to out is returned.

    See Also
    --------
    mean

    Notes
    -----
    Given a vector V of length N, the median of V is the middle value of
    a sorted copy of V (Vs) - i.e. Vs[(N-1)/2], when N is odd. It is
    the mean of the two middle values of Vs, when N is even.
    """
    sorted_a = np.sort(a, axis)
    if dtype is None:
        # integer input gets a float result by default
        if np.issubdtype(a.dtype, np.integer):
            dtype = np.float32
        else:
            dtype = a.dtype
    if axis is None:
        # np.sort(a, None) returned a flattened copy
        axis = 0
    indexer = [slice(None)] * sorted_a.ndim
    index = sorted_a.shape[axis] // 2
    if sorted_a.shape[axis] % 2 == 1:
        indexer[axis] = index
        ret = sorted_a[tuple(indexer)]
    else:
        indexer[axis] = slice(index-1, index+1)
        ret = np.sum(sorted_a[tuple(indexer)], axis=axis)/2.0
    if np.issubdtype(dtype, np.integer):
        ret = ret.round()
    if ret.dtype != dtype:
        ret = ret.astype(dtype)
    if out is not None:
        if not (out.shape == ret.shape and
                out.nbytes == ret.nbytes):
            raise ValueError('wrong shape for output')
        # This doesn't work - out.data = ret.data
        raise ValueError('out parameter not working yet')
    return ret


Re: [Numpy-discussion] Setting contents of buffer for array object

2008-02-10 Thread Robert Kern
On Feb 10, 2008 7:17 PM, Matthew Brett [EMAIL PROTECTED] wrote:
  Ah, I see. You definitely do not want to reassign the .data buffer in
  this case. An out= parameter does not reassign the memory location
  that the array object points to. It should use the allocated memory
  that was already there. It shouldn't copy anything at all;
  otherwise, median(x, out=out) is no better than out[:] =
  median(x). Personally, I don't think that a function should expose an
  out= parameter unless it can make good on that promise of memory
  efficiency.

 I agree - but there are more efficient median algorithms out there
 which can make use of the memory efficiently.  I wanted to establish
 the call signature to allow that.  I don't feel strongly about it
 though.

I say add the out= parameter when you use such an algorithm. But if
you like, just use slice assignment for now.
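
A sketch of what the slice-assignment stopgap might look like. `median_with_out` is a hypothetical wrapper around `np.median`, used here only to show the pattern; it still allocates the intermediate result, so it offers the calling convention but not the memory saving.

```python
import numpy as np

def median_with_out(a, axis=0, out=None):
    """Hypothetical wrapper: compute the median, then copy it into `out` if given."""
    ret = np.median(a, axis=axis)
    if out is None:
        return ret
    if out.shape != np.shape(ret):
        raise ValueError("wrong shape for output")
    out[...] = ret      # element-wise copy into out's buffer, cast to out.dtype
    return out

x = np.arange(12.0).reshape(4, 3)
buf = np.empty(3)
res = median_with_out(x, axis=0, out=buf)
```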

-- 
Robert Kern



Re: [Numpy-discussion] Setting contents of buffer for array object

2008-02-10 Thread Damian R. Eads
Matthew Brett wrote:
 import numpy as np
 a = np.arange(10)
 b = np.arange(10)+1
 a.data = b.data # raises error, but I hope you see what I mean

 ?
 Not really, no. Can you describe your use case in more detail?

 Yes - I am just writing the new median implementation.   To allow
 future optimization, I would like to have the same signature as
 mean():

 def median(a, axis=0, dtype=None, out=None)

 (axis=0 to change to axis=None default at some point).

 To do this, I need to copy the results of the median calculation in
 the routine into the array object given by 'out' - when passed.

My understanding of numerical routines that accept an out parameter is
that this is a convention for in-place algorithms.  When None is passed in
the out parameter, it's the caller's way of indicating that in-place is
not needed, and a new array is allocated to store the result; otherwise,
the result is stored in the 'out' array. Either way, the result is
returned. One can break from this convention by allocating more memory
than provided by the out array, but that's a performance issue that may or
may not be avoidable.

Remember that A[:] = expr sets the value of the elements in A to the
values of array elements in the expression expr, and this copying is done
in-place. To copy an array C, and make the copy contiguous, use the
.copy() method on C.
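
A small sketch of both points, with array names chosen only for illustration: element assignment through `A[:]` is visible to existing views of `A`, and `.copy()` on a non-contiguous array yields an independent, C-contiguous one.

```python
import numpy as np

A = np.zeros((2, 3))
view = A[0]                  # a view onto the first row of A

# element values are written into A's existing buffer
A[:] = np.arange(6).reshape(2, 3) * 2.0

C = A.T                      # transposed view, not C-contiguous
C_copy = C.copy()            # independent copy in C order
```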

Assigning the .data buffers is not something I have seen before in
non-constructor (or non-pseudo-constructor like from_buffer) code. I think
it might even be dangerous if you don't do it right. If one does not
properly recalculate the strides of A, slicing operations on A may not
behave as expected.

If this is library code, reassigning the .data buffer can confuse the
user, since it messes up array view semantics. Suppose I'm an ignorant
user and I write the following code:

   A=numpy.random.rand(10,20)
   dummy_input=numpy.random.rand(10,20)
   B=A.T
   C=B[0::-1,:]

then I use a library function foo (suppose foo accepts an input array inp
and an output array out, and assigns out.data to something else)

   foo(inp=dummy_input, out=B)

Now, A and B point to two different .data buffers, B's base points to A,
and C's base points to B but A and C share the same .data buffer. As a
user, I may expect B and C to be a view of A (certainly B isn't), and C to
be a view of B (which is verified by checking 'C.base is B') but changing
C's values changes A's but not B's. That's confusing. Also, suppose B's
new data buffer has fewer elements than its original data buffer. I may be
clever and set B's size and strides attributes accordingly but changing
C's values might cause the manipulation of undefined memory.
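
For contrast, a sketch of the well-behaved case, assuming the library routine fills its output by element assignment rather than by swapping buffers. The assignment `B[:] = ...` stands in for what such a routine would do internally.

```python
import numpy as np

A = np.random.rand(10, 20)
dummy_input = np.random.rand(10, 20)
B = A.T                # view of A, shape (20, 10)
C = B[0::-1, :]        # view of B's first row, shape (1, 10)

# Writing through B[:] keeps A, B and C on the same buffer
B[:] = dummy_input.T
```

Because nothing was rebound, a change made through any of the three arrays is seen by the other two.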

Damian