Re: [Numpy-discussion] Another suggestion for making numpy's functions generic
On Tue, Oct 20, 2009 at 5:45 AM, Anne Archibald peridot.face...@gmail.com wrote:

2009/10/19 Sebastian Walter sebastian.wal...@gmail.com: I'm all for generic (u)funcs, since they might come in handy for me: I'm doing lots of operations on arrays of polynomials.

Just as a side note, if you don't mind my asking, what sorts of operations do you do on arrays of polynomials? In a thread on scipy-dev we're discussing improving scipy's polynomial support, and we'd be happy to get some more feedback on what they need to be able to do.

I've been reading (and commenting on) that thread ;) I'm doing algorithmic differentiation by computing on truncated Taylor polynomials in the power basis, i.e. always truncating all operations at degree D:

z(t) = \sum_{d=0}^{D-1} z_d t^d = x(t) y(t) + O(t^D), where z_d = \sum_{k=0}^{d} x_k y_{d-k}

Using other bases does not make sense in my case, since the truncation of all terms of degree higher than t^D has, AFAIK, no good counterpart for bases like Chebyshev. On the other hand, I need to be generic in the coefficients: z_d above could be a tensor of any shape, e.g. a matrix.

A typical case where I need to perform operations on arrays of polynomials is best explained in a talk I gave earlier this year: http://github.com/b45ch1/pyadolc/raw/master/doc/walter_talk_algorithmic_differentiation_in_python_with_pyadolc_pycppad_algopy.pdf on slides 7 and 8 (the class adouble is a Taylor polynomial).

Thanks! Anne

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
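The truncated multiplication above can be sketched in a few lines of numpy. `truncated_mul` is a hypothetical helper (not part of any package mentioned in the thread); the leading axis holds the Taylor coefficients, so the trailing axes can make each coefficient a tensor, as Sebastian describes. Coefficients are combined elementwise here; a matrix-polynomial variant would use a matrix product instead.

```python
import numpy as np

def truncated_mul(x, y):
    """z = x * y truncated at degree D, i.e. z_d = sum_{k<=d} x_k * y_{d-k}.

    x, y: arrays of shape (D, ...); x[d] is the degree-d coefficient,
    and any trailing axes allow tensor-valued coefficients.
    """
    D = x.shape[0]
    z = np.zeros_like(x)
    for d in range(D):
        for k in range(d + 1):
            z[d] += x[k] * y[d - k]   # elementwise product of coefficients
    return z

# (1 + 2t + 3t^2) * (4 + 5t + 6t^2), truncated at degree 3
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
print(truncated_mul(x, y))   # [ 4. 13. 28.]
```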
Re: [Numpy-discussion] Another suggestion for making numpy's functions generic
I'm not very familiar with the underlying C API of numpy, so this has to be taken with a grain of salt. The reason why I'm curious about the genericity is that it would be awesome to have:
1) ufuncs like sin, cos, exp... work on arrays of any object (this works already)
2) funcs like dot, eig, etc. work on arrays of objects (works for dot already, but not for eig)
3) ufuncs and funcs work on any objects

Examples that would be nice to have working are, among others:
* arrays of polynomials, i.e. arrays of objects
* polynomials with tensor coefficients, i.e. objects with an underlying array structure

I thought that the most elegant way to implement that would be to have all numpy functions try to call either
1) the class function with the same name as the numpy function,
2) or, if the class function is not implemented, the member function with the same name as the numpy function,
3) or, if none exists, raise an exception.

E.g.:
1) if isinstance(x, Foo), then numpy.sin(x) would call Foo.sin(x) if it doesn't know how to handle Foo
2) similarly, for arrays of objects of type Foo: x = np.array([Foo(1), Foo(2)]). Then numpy.sin(x) should try to return np.array([Foo.sin(xi) for xi in x]) or, in case Foo.sin is not implemented as a class function, return np.array([xi.sin() for xi in x])

Therefore, I somehow expected something like this: Quantity would derive from numpy.ndarray. Calling Quantity.__new__(cls) creates the member functions __add__, __imul__, sin, exp, ..., where each function has a preprocessing part and a postprocessing part. After the preprocessing, call the original ufuncs on the base class object, e.g.
__add__.

Sebastian

On Mon, Oct 19, 2009 at 1:55 PM, Darren Dale dsdal...@gmail.com wrote:

On Mon, Oct 19, 2009 at 3:10 AM, Sebastian Walter sebastian.wal...@gmail.com wrote:

On Sat, Oct 17, 2009 at 2:49 PM, Darren Dale dsdal...@gmail.com wrote:

numpy's functions, especially ufuncs, have had some ability to support subclasses through the ndarray.__array_wrap__ method, which provides masked arrays or quantities (for example) with an opportunity to set the class and metadata of the output array at the end of an operation. An example:

q1 = Quantity(1, 'meter')
q2 = Quantity(2, 'meters')
numpy.add(q1, q2)  # yields Quantity(3, 'meters')

At SciPy2009 we committed a change to the numpy trunk that provides a chance to determine the class and some metadata of the output *before* the ufunc performs its calculation, but after the output array has been established (while its data is still uninitialized). Consider:

q1 = Quantity(1, 'meter')
q2 = Quantity(2, 'J')
numpy.add(q1, q2, q1)
# or equivalently:
# q1 += q2

With only __array_wrap__, the attempt to propagate the units happens after q1's data has been updated in place: too late to raise an error, so the data is now corrupted. __array_prepare__ solves that problem; an exception can be raised in time.

Now I'd like to suggest one more improvement to numpy to make its functions more generic. Consider one more example:

q1 = Quantity(1, 'meter')
q2 = Quantity(2, 'feet')
numpy.add(q1, q2)

In this case, I'd like an opportunity to operate on the input arrays on the way into the ufunc, to rescale the second input to meters. I think it would be a hack to try to stuff this capability into __array_prepare__. One form of this particular example is already supported in quantities (q1 + q2, by overriding the __add__ method to rescale the second input), but there are ufuncs that do not have an associated special method. So I'd like to look into adding another check for a special method, perhaps called __input_prepare__.
My time is really tight for the next month, so I'd rather not start if there are strong objections, but otherwise I'd like to try to get it in in time for numpy-1.4. (Has a timeline been established?) I think it will not be too difficult to document this overall scheme. When calling numpy functions:
1) __input_prepare__ provides an opportunity to operate on the inputs to yield versions that are compatible with the operation (they should obviously not be modified in place)
2) the output array is established
3) __array_prepare__ is used to determine the class of the output array, as well as any metadata that needs to be established before the operation proceeds
4) the ufunc performs its operations
5) __array_wrap__ provides an opportunity to update the output array based on the results of the computation

Comments, criticisms? If PEP 3124^ were already a part of the standard library, it could serve as the basis for generalizing numpy's functions. But I think the PEP will not be approved in its current form, and it is unclear when and if the author will revisit the proposal. The scheme I'm imagining might be sufficient for our purposes.

I'm all for generic (u)funcs, since they might come in handy for me: I'm doing lots of operations on arrays of polynomials. I don't quite get the reasoning, though.
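Step 5 is the hook that already existed at the time; a minimal sketch of a subclass using __array_wrap__ to carry metadata through a ufunc follows. The Unit class is hypothetical (not the quantities package), and the return_scalar argument is accepted only for compatibility with newer numpy releases.

```python
import numpy as np

class Unit(np.ndarray):
    """Toy ndarray subclass carrying a `unit` string through ufuncs."""
    def __new__(cls, data, unit=''):
        obj = np.asarray(data, dtype=float).view(cls)
        obj.unit = unit
        return obj

    def __array_wrap__(self, out, context=None, return_scalar=False):
        # step 5: the ufunc has already filled `out`; re-attach metadata
        out = out.view(Unit)
        out.unit = getattr(self, 'unit', '')
        return out

u = Unit([1.0, 2.0], 'meter')
v = np.add(u, 1.0)   # __array_wrap__ runs after the addition
print(v.unit)        # meter
```

__array_prepare__ (step 3) and the proposed __input_prepare__ (step 1) would slot into the same subclass analogously, before the computation instead of after it.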
Re: [Numpy-discussion] Another suggestion for making numpy's functions generic
On Tue, Oct 20, 2009 at 5:24 AM, Sebastian Walter sebastian.wal...@gmail.com wrote:

I'm not very familiar with the underlying C API of numpy, so this has to be taken with a grain of salt. The reason why I'm curious about the genericity is that it would be awesome to have: 1) ufuncs like sin, cos, exp... work on arrays of any object (this works already) 2) funcs like dot, eig, etc. work on arrays of objects (works for dot already, but not for eig) 3) ufuncs and funcs work on any objects

I think if you want to work on any object, you need something like the PEP I mentioned earlier. What I am proposing is to use the existing mechanism in numpy: check __array_priority__ to determine which input's __input_prepare__ to call.

Examples that would be nice to have working are, among others: * arrays of polynomials, i.e. arrays of objects * polynomials with tensor coefficients, i.e. objects with an underlying array structure. I thought that the most elegant way to implement that would be to have all numpy functions try to call either 1) the class function with the same name as the numpy function, 2) or, if the class function is not implemented, the member function with the same name as the numpy function, 3) or, if none exists, raise an exception. E.g.: 1) if isinstance(x, Foo), then numpy.sin(x) would call Foo.sin(x) if it doesn't know how to handle Foo

How does numpy.sin know if it knows how to handle Foo? numpy.sin will happily process the data of subclasses of ndarray, but if you give it a quantity with units of degrees it is going to return garbage and not care.

2) similarly, for arrays of objects of type Foo: x = np.array([Foo(1), Foo(2)]). Then numpy.sin(x) should try to return np.array([Foo.sin(xi) for xi in x]) or, in case Foo.sin is not implemented as a class function, return np.array([xi.sin() for xi in x])

I'm not going to comment on this, except to say that it is outside the scope of my proposal.
Therefore, I somehow expected something like this: Quantity would derive from numpy.ndarray. Calling Quantity.__new__(cls) creates the member functions __add__, __imul__, sin, exp, ..., where each function has a preprocessing part and a postprocessing part. After the preprocessing, call the original ufuncs on the base class object, e.g. __add__.

It is more complicated than that. Ufuncs don't call array methods; it's the other way around: ndarray.__add__ calls numpy.add. If you have a custom operation to perform on numpy arrays, you write a ufunc, not a subclass. What you are proposing is a very significant change to numpy.

Darren
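As an aside, part of what Sebastian asks for in his case 2) already works: for object-dtype arrays, numpy's ufunc loops fall back to calling a method of the same name on each element, so any object with a sin method participates in numpy.sin. A sketch (the class Foo is hypothetical):

```python
import numpy as np

class Foo:
    def __init__(self, v):
        self.v = v

    def sin(self):
        # np.sin calls this for each element of an object array
        return Foo(np.sin(self.v))

x = np.array([Foo(0.0), Foo(np.pi / 2)])   # dtype=object
y = np.sin(x)   # effectively np.array([xi.sin() for xi in x])
```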
Re: [Numpy-discussion] Another suggestion for making numpy's functions generic
Hi Travis,

On Mon, Oct 19, 2009 at 6:29 PM, Travis Oliphant oliph...@enthought.com wrote:

On Oct 17, 2009, at 7:49 AM, Darren Dale wrote: [...] When calling numpy functions: 1) __input_prepare__ provides an opportunity to operate on the inputs to yield versions that are compatible with the operation (they should obviously not be modified in place) 2) the output array is established 3) __array_prepare__ is used to determine the class of the output array, as well as any metadata that needs to be established before the operation proceeds 4) the ufunc performs its operations 5) __array_wrap__ provides an opportunity to update the output array based on the results of the computation. Comments, criticisms? If PEP 3124^ were already a part of the standard library, it could serve as the basis for generalizing numpy's functions. But I think the PEP will not be approved in its current form, and it is unclear when and if the author will revisit the proposal. The scheme I'm imagining might be sufficient for our purposes.

This seems like it could work. So, basically, ufuncs will take any object as input and call its __input_prepare__ method? This should return a subclass of an ndarray?

ufuncs would call __input_prepare__ on the input declaring the highest __array_priority__, just like ufuncs do with __array_wrap__, passing a tuple of inputs and the ufunc itself (provided for context). __input_prepare__ would return a tuple of inputs that the ufunc would use for computation. I'm not sure if these need to be arrays or not; I think I can give a better answer once I start the implementation (in the next few days, I think).

Darren
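Since __input_prepare__ never made it into numpy, the dispatch Darren describes can only be sketched by emulation. Everything below (the Feet wrapper, call_ufunc, the conversion handling) is hypothetical scaffolding, not numpy API; it mirrors the proposed rule that the input with the highest __array_priority__ gets to prepare all inputs.

```python
import numpy as np

METERS_PER_FOOT = 0.3048

class Feet:
    """Hypothetical quantity-like wrapper used to emulate the proposal."""
    __array_priority__ = 10.0

    def __init__(self, value):
        self.value = np.asarray(value, dtype=float)

    def __input_prepare__(self, inputs, ufunc):
        # rescale any Feet input to plain meters before the ufunc runs
        return tuple(i.value * METERS_PER_FOOT if isinstance(i, Feet) else i
                     for i in inputs)

def call_ufunc(ufunc, *inputs):
    """Emulate the proposed dispatch: the input declaring the highest
    __array_priority__ prepares all inputs, then the ufunc runs."""
    leader = max(inputs, key=lambda i: getattr(i, '__array_priority__', 0.0))
    if hasattr(leader, '__input_prepare__'):
        inputs = leader.__input_prepare__(inputs, ufunc)
    return ufunc(*inputs)

meters = np.array([1.0, 2.0])
result = call_ufunc(np.add, meters, Feet([1.0, 1.0]))  # meters + feet
print(result)   # [1.3048 2.3048]
```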
Re: [Numpy-discussion] Another suggestion for making numpy's functions generic
2009/10/20 Sebastian Walter sebastian.wal...@gmail.com:

On Tue, Oct 20, 2009 at 5:45 AM, Anne Archibald peridot.face...@gmail.com wrote: 2009/10/19 Sebastian Walter sebastian.wal...@gmail.com: I'm all for generic (u)funcs, since they might come in handy for me: I'm doing lots of operations on arrays of polynomials. Just as a side note, if you don't mind my asking, what sorts of operations do you do on arrays of polynomials? In a thread on scipy-dev we're discussing improving scipy's polynomial support, and we'd be happy to get some more feedback on what they need to be able to do.

I've been reading (and commenting on) that thread ;) I'm doing algorithmic differentiation by computing on truncated Taylor polynomials in the power basis, i.e. always truncating all operations at degree D:

z(t) = \sum_{d=0}^{D-1} z_d t^d = x(t) y(t) + O(t^D), where z_d = \sum_{k=0}^{d} x_k y_{d-k}

Using other bases does not make sense in my case, since the truncation of all terms of degree higher than t^D has, AFAIK, no good counterpart for bases like Chebyshev. On the other hand, I need to be generic in the coefficients: z_d above could be a tensor of any shape, e.g. a matrix.

In fact, truncating at degree D for Chebyshev polynomials works exactly the same way as it does for power polynomials, and if what you care about is function approximation, it has much nicer behaviour. But if what you care about is really truncated Taylor polynomials, there's no beating the power basis.

I realize that arrays of polynomial objects are handy from a bookkeeping point of view, but how does an array of polynomials, say of shape (N,), differ from a single polynomial with coefficients of shape (N,)? I think we need to provide the latter in our polynomial classes, but as you point out, getting ufunc support for the former is nontrivial.

Anne
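Anne's equivalence can be made concrete with a coefficient array whose leading axis is degree: evaluating N power-basis polynomials at once is then a single matrix product. `polyval_many` is a hypothetical helper, not the interface scipy eventually settled on.

```python
import numpy as np

def polyval_many(coeffs, t):
    """Evaluate N power-basis polynomials at scalar t.

    coeffs has shape (D, N): coeffs[d, n] is the degree-d coefficient
    of the n-th polynomial, so one 2-d array stands in for an
    object array of N polynomial instances.
    """
    powers = t ** np.arange(coeffs.shape[0])   # [1, t, t^2, ...]
    return powers @ coeffs                     # shape (N,)

# columns: p0(t) = 1 + 2t, p1(t) = t^2
coeffs = np.array([[1.0, 0.0],
                   [2.0, 0.0],
                   [0.0, 1.0]])
print(polyval_many(coeffs, 2.0))   # [5. 4.]
```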
Re: [Numpy-discussion] Another suggestion for making numpy's functions generic
On Sat, Oct 17, 2009 at 2:49 PM, Darren Dale dsdal...@gmail.com wrote:

numpy's functions, especially ufuncs, have had some ability to support subclasses through the ndarray.__array_wrap__ method, which provides masked arrays or quantities (for example) with an opportunity to set the class and metadata of the output array at the end of an operation. An example:

q1 = Quantity(1, 'meter')
q2 = Quantity(2, 'meters')
numpy.add(q1, q2)  # yields Quantity(3, 'meters')

At SciPy2009 we committed a change to the numpy trunk that provides a chance to determine the class and some metadata of the output *before* the ufunc performs its calculation, but after the output array has been established (while its data is still uninitialized). Consider:

q1 = Quantity(1, 'meter')
q2 = Quantity(2, 'J')
numpy.add(q1, q2, q1)
# or equivalently:
# q1 += q2

With only __array_wrap__, the attempt to propagate the units happens after q1's data has been updated in place: too late to raise an error, so the data is now corrupted. __array_prepare__ solves that problem; an exception can be raised in time.

Now I'd like to suggest one more improvement to numpy to make its functions more generic. Consider one more example:

q1 = Quantity(1, 'meter')
q2 = Quantity(2, 'feet')
numpy.add(q1, q2)

In this case, I'd like an opportunity to operate on the input arrays on the way into the ufunc, to rescale the second input to meters. I think it would be a hack to try to stuff this capability into __array_prepare__. One form of this particular example is already supported in quantities (q1 + q2, by overriding the __add__ method to rescale the second input), but there are ufuncs that do not have an associated special method. So I'd like to look into adding another check for a special method, perhaps called __input_prepare__. My time is really tight for the next month, so I'd rather not start if there are strong objections, but otherwise I'd like to try to get it in in time for numpy-1.4.
(Has a timeline been established?) I think it will not be too difficult to document this overall scheme. When calling numpy functions:
1) __input_prepare__ provides an opportunity to operate on the inputs to yield versions that are compatible with the operation (they should obviously not be modified in place)
2) the output array is established
3) __array_prepare__ is used to determine the class of the output array, as well as any metadata that needs to be established before the operation proceeds
4) the ufunc performs its operations
5) __array_wrap__ provides an opportunity to update the output array based on the results of the computation

Comments, criticisms? If PEP 3124^ were already a part of the standard library, it could serve as the basis for generalizing numpy's functions. But I think the PEP will not be approved in its current form, and it is unclear when and if the author will revisit the proposal. The scheme I'm imagining might be sufficient for our purposes.

I'm all for generic (u)funcs, since they might come in handy for me: I'm doing lots of operations on arrays of polynomials. I don't quite get the reasoning, though. Could you correct me where I get it wrong?
* the class Quantity derives from numpy.ndarray
* Quantity overrides __add__, __mul__, etc., and you get the correct behaviour for q1 = Quantity(1, 'meter'); q2 = Quantity(2, 'J') by raising an exception when performing q1 += q2
* The problem is that numpy.add(q1, q2, q1) would corrupt q1 before raising an exception

Sebastian

Darren

^ http://www.python.org/dev/peps/pep-3124/
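Sebastian's last bullet is the crux: __array_wrap__ runs only after the output buffer has been written, which is fatal for in-place operations where the output buffer is an operand. A contrived sketch (the Hooked class is hypothetical):

```python
import numpy as np

class Hooked(np.ndarray):
    def __array_wrap__(self, out, context=None, return_scalar=False):
        # by the time this hook runs, `out` (which aliases `a` below)
        # has already been overwritten; raising here cannot protect it
        return out

a = np.ones(3).view(Hooked)
np.add(a, 1.0, out=a)   # in-place: a's own buffer is the output
print(a)                # already holds 2.0 everywhere
```

This is exactly the gap __array_prepare__ closed: it fires after the output array exists but before any data is written, so an exception there leaves the operand intact.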
Re: [Numpy-discussion] Another suggestion for making numpy's functions generic
On Mon, Oct 19, 2009 at 3:10 AM, Sebastian Walter sebastian.wal...@gmail.com wrote:

On Sat, Oct 17, 2009 at 2:49 PM, Darren Dale dsdal...@gmail.com wrote:

numpy's functions, especially ufuncs, have had some ability to support subclasses through the ndarray.__array_wrap__ method, which provides masked arrays or quantities (for example) with an opportunity to set the class and metadata of the output array at the end of an operation. An example:

q1 = Quantity(1, 'meter')
q2 = Quantity(2, 'meters')
numpy.add(q1, q2)  # yields Quantity(3, 'meters')

At SciPy2009 we committed a change to the numpy trunk that provides a chance to determine the class and some metadata of the output *before* the ufunc performs its calculation, but after the output array has been established (while its data is still uninitialized). Consider:

q1 = Quantity(1, 'meter')
q2 = Quantity(2, 'J')
numpy.add(q1, q2, q1)
# or equivalently:
# q1 += q2

With only __array_wrap__, the attempt to propagate the units happens after q1's data has been updated in place: too late to raise an error, so the data is now corrupted. __array_prepare__ solves that problem; an exception can be raised in time.

Now I'd like to suggest one more improvement to numpy to make its functions more generic. Consider one more example:

q1 = Quantity(1, 'meter')
q2 = Quantity(2, 'feet')
numpy.add(q1, q2)

In this case, I'd like an opportunity to operate on the input arrays on the way into the ufunc, to rescale the second input to meters. I think it would be a hack to try to stuff this capability into __array_prepare__. One form of this particular example is already supported in quantities (q1 + q2, by overriding the __add__ method to rescale the second input), but there are ufuncs that do not have an associated special method. So I'd like to look into adding another check for a special method, perhaps called __input_prepare__.
My time is really tight for the next month, so I'd rather not start if there are strong objections, but otherwise I'd like to try to get it in in time for numpy-1.4. (Has a timeline been established?) I think it will not be too difficult to document this overall scheme. When calling numpy functions:
1) __input_prepare__ provides an opportunity to operate on the inputs to yield versions that are compatible with the operation (they should obviously not be modified in place)
2) the output array is established
3) __array_prepare__ is used to determine the class of the output array, as well as any metadata that needs to be established before the operation proceeds
4) the ufunc performs its operations
5) __array_wrap__ provides an opportunity to update the output array based on the results of the computation

Comments, criticisms? If PEP 3124^ were already a part of the standard library, it could serve as the basis for generalizing numpy's functions. But I think the PEP will not be approved in its current form, and it is unclear when and if the author will revisit the proposal. The scheme I'm imagining might be sufficient for our purposes.

I'm all for generic (u)funcs, since they might come in handy for me: I'm doing lots of operations on arrays of polynomials. I don't quite get the reasoning though. Could you correct me where I get it wrong? * the class Quantity derives from numpy.ndarray * Quantity overrides __add__, __mul__, etc., and you get the correct behaviour for q1 = Quantity(1, 'meter'); q2 = Quantity(2, 'J') by raising an exception when performing q1 += q2

No, Quantity does not override __iadd__ to catch this. Quantity implements __array_prepare__ to perform the dimensional analysis based on the identity of the ufunc and the inputs, and to set the class and dimensionality of the output array, or to raise an error when dimensional analysis fails. This approach lets quantities support all ufuncs (in principle), not just the built-in numerical operations.
It should also make it easier to subclass from MaskedArray, so we could have a MaskedQuantity without having to establish yet another suite of ufuncs specific to quantities or masked quantities.

* The problem is that numpy.add(q1, q2, q1) would corrupt q1 before raising an exception

That was solved by the addition of __array_prepare__ to numpy back in August. What I am proposing now is supporting operations on arrays that would be compatible if we had a chance to transform them on the way into the ufunc, like meter + foot.

Darren
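A stripped-down sketch of the dimensional-analysis idea Darren describes: this Quantity is hypothetical (the real quantities package is far more complete), and since __array_prepare__ was later removed from numpy, the hook is exercised directly below rather than through a ufunc call.

```python
import numpy as np

class Quantity(np.ndarray):
    """Toy quantity: a float array plus a units string."""
    def __new__(cls, data, units=''):
        obj = np.asarray(data, dtype=float).view(cls)
        obj.units = units
        return obj

    def __array_prepare__(self, out, context=None):
        # called before the ufunc writes any data;
        # context is (ufunc, inputs, output_index)
        if context is not None:
            ufunc, inputs, _ = context
            if ufunc is np.add:
                units = {i.units for i in inputs if isinstance(i, Quantity)}
                if len(units) > 1:
                    # incompatible units: fail before any data is written
                    raise ValueError('cannot add %s' % sorted(units))
        out = out.view(Quantity)
        out.units = getattr(self, 'units', '')
        return out

q1 = Quantity([1.0], 'meter')
q2 = Quantity([2.0], 'J')
# exercising the hook directly:
# q1.__array_prepare__(np.empty(1), (np.add, (q1, q2), 0))  raises ValueError
```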
Re: [Numpy-discussion] Another suggestion for making numpy's functions generic
2009/10/19 Sebastian Walter sebastian.wal...@gmail.com:

I'm all for generic (u)funcs, since they might come in handy for me: I'm doing lots of operations on arrays of polynomials.

Just as a side note, if you don't mind my asking, what sorts of operations do you do on arrays of polynomials? In a thread on scipy-dev we're discussing improving scipy's polynomial support, and we'd be happy to get some more feedback on what they need to be able to do.

Thanks! Anne
Re: [Numpy-discussion] Another suggestion for making numpy's functions generic
On Sat, Oct 17, 2009 at 6:49 AM, Darren Dale dsdal...@gmail.com wrote:

numpy's functions, especially ufuncs, have had some ability to support subclasses through the ndarray.__array_wrap__ method, which provides masked arrays or quantities (for example) with an opportunity to set the class and metadata of the output array at the end of an operation. An example:

q1 = Quantity(1, 'meter')
q2 = Quantity(2, 'meters')
numpy.add(q1, q2)  # yields Quantity(3, 'meters')

At SciPy2009 we committed a change to the numpy trunk that provides a chance to determine the class and some metadata of the output *before* the ufunc performs its calculation, but after the output array has been established (while its data is still uninitialized). Consider:

q1 = Quantity(1, 'meter')
q2 = Quantity(2, 'J')
numpy.add(q1, q2, q1)
# or equivalently:
# q1 += q2

With only __array_wrap__, the attempt to propagate the units happens after q1's data has been updated in place: too late to raise an error, so the data is now corrupted. __array_prepare__ solves that problem; an exception can be raised in time.

Now I'd like to suggest one more improvement to numpy to make its functions more generic. Consider one more example:

q1 = Quantity(1, 'meter')
q2 = Quantity(2, 'feet')
numpy.add(q1, q2)

In this case, I'd like an opportunity to operate on the input arrays on the way into the ufunc, to rescale the second input to meters. I think it would be a hack to try to stuff this capability into __array_prepare__. One form of this particular example is already supported in quantities (q1 + q2, by overriding the __add__ method to rescale the second input), but there are ufuncs that do not have an associated special method. So I'd like to look into adding another check for a special method, perhaps called __input_prepare__. My time is really tight for the next month, so I'd rather not start if there are strong objections, but otherwise I'd like to try to get it in in time for numpy-1.4.
(Has a timeline been established?) I think it will not be too difficult to document this overall scheme. When calling numpy functions:
1) __input_prepare__ provides an opportunity to operate on the inputs to yield versions that are compatible with the operation (they should obviously not be modified in place)
2) the output array is established
3) __array_prepare__ is used to determine the class of the output array, as well as any metadata that needs to be established before the operation proceeds
4) the ufunc performs its operations
5) __array_wrap__ provides an opportunity to update the output array based on the results of the computation

Comments, criticisms? If PEP 3124^ were already a part of the standard library, it could serve as the basis for generalizing numpy's functions. But I think the PEP will not be approved in its current form, and it is unclear when and if the author will revisit the proposal. The scheme I'm imagining might be sufficient for our purposes.

This sounds interesting to me, as it would push the use of __array_wrap__ down into a common function and make it easier to use. I wonder what the impact would be on the current subclasses of ndarray?

On a side note, I wonder if you could look into adding your reduce-loop optimizations into the generic loops? It would be interesting to see whether that sped up some common operations. In any case, it can't hurt.

Chuck