[Numpy-discussion] TypeError when calling numpy.kaiser()
Howdy, I'm having trouble getting the Kaiser window to work. Any time I try to call numpy.kaiser(), it throws an exception. Here's the output when I run the example code from http://docs.scipy.org/doc/numpy/reference/generated/numpy.kaiser.html :

    Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
    [GCC 4.3.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from numpy import kaiser
    >>> kaiser(12, 14)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.6/dist-packages/numpy/lib/function_base.py", line 2630, in kaiser
        return i0(beta * sqrt(1-((n-alpha)/alpha)**2.0))/i0(beta)
      File "/usr/lib/python2.6/dist-packages/numpy/lib/function_base.py", line 2507, in i0
        y[ind] = _i0_1(x[ind])
    TypeError: array cannot be safely cast to required type
    >>>

Is this a bug? Am I doing something wrong? (I'm using the Ubuntu 9.04 packages for Python and NumPy.)

Thanks,
Jeff

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
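For what it's worth, the traceback points at i0() choking on the integer arguments. A hedged workaround sketch — assuming the integer-input path is indeed what triggers the TypeError (later NumPy releases fixed kaiser itself) — is to pass beta as a float:

```python
import numpy as np

# Workaround sketch: pass beta as a float so the Bessel helper i0()
# never has to cast an integer array down to its working float type.
# (Assumes the integer-input path is what triggers the TypeError.)
window = np.kaiser(12, 14.0)

print(window.shape)           # (12,)
print(window[0], window[-1])  # the window is symmetric
```

On a NumPy where the bug is fixed this runs identically with an integer beta; the float form is simply safe on both.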
Re: [Numpy-discussion] Optimized sum of squares
On Sun, Oct 18, 2009 at 11:37 AM, wrote:
> On Sun, Oct 18, 2009 at 12:06 PM, Skipper Seabold wrote:
> > On Sun, Oct 18, 2009 at 8:09 AM, Gael Varoquaux wrote:
> >> On Sun, Oct 18, 2009 at 09:06:15PM +1100, Gary Ruben wrote:
> >>> Hi Gaël,
> >>
> >>> If you've got a 1D array/vector called "a", I think the normal idiom is
> >>
> >>> np.dot(a,a)
> >>
> >>> For the more general case, I think
> >>> np.tensordot(a, a, axes=something_else)
> >>> should do it, where you should be able to figure out something_else for
> >>> your particular case.
> >>
> >> Ha, yes. Good point about the tensordot trick.
> >>
> >> Thank you
> >>
> >> Gaël
> >
> > I'm curious about this as I use ss, which is just np.sum(a*a, axis),
> > in statsmodels and didn't much think about it.
> >
> > There is
> >
> > import numpy as np
> > from scipy.stats import ss
> >
> > a = np.ones(5000)
> >
> > but
> >
> > timeit ss(a)
> > 1 loops, best of 3: 21.5 µs per loop
> >
> > timeit np.add.reduce(a*a)
> > 10 loops, best of 3: 15 µs per loop
> >
> > timeit np.dot(a,a)
> > 10 loops, best of 3: 5.38 µs per loop
> >
> > Do the number of loops matter in the timings and is dot always faster
> > even without the blas dot?
>
> David's reply once was that it depends on ATLAS and the version of
> lapack/blas.
>
> I usually switched to using dot for 1d. Using tensordot looks too
> complicated for me, to figure out the axes when I quickly want a sum of
> squares.
>
> I never tried the timing of tensordot for 2d arrays, especially for
> axis=0 for a c ordered array. If it's faster, this could be useful to
> rewrite stats.ss.
>
> I don't remember that np.add.reduce is much faster than np.sum. This
> might be the additional call overhead from using another function in
> between.

If you are using numpy from svn, it might be due to the recent
optimizations that Luca Citi did for some of the ufuncs. Now we just need
a multiply-and-add function.

Chuck
Re: [Numpy-discussion] Optimized sum of squares
Skipper Seabold skrev:
> I'm curious about this as I use ss, which is just np.sum(a*a, axis),
> in statsmodels and didn't much think about it.
>
> Do the number of loops matter in the timings and is dot always faster
> even without the blas dot?

The thing is that a*a returns a temporary array with the same shape as a,
and then that is passed to np.sum. The BLAS dot product doesn't need to
allocate and deallocate temporary arrays.

S.M.
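The point about the temporary can be seen directly — a small sketch comparing the two spellings (np.dot accumulates the same sum of squares in one pass, assuming a BLAS-backed dot, without ever materializing a*a):

```python
import numpy as np

a = np.ones(5000)

# np.sum(a * a) first allocates a full 5000-element temporary for a * a;
# np.dot(a, a) accumulates the sum of squares in a single pass.
ss_temporary = np.sum(a * a)
ss_dot = np.dot(a, a)

print(ss_temporary, ss_dot)  # both 5000.0
```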
Re: [Numpy-discussion] Optimized sum of squares
On Sun, Oct 18, 2009 at 12:06 PM, Skipper Seabold wrote:
> On Sun, Oct 18, 2009 at 8:09 AM, Gael Varoquaux wrote:
>> On Sun, Oct 18, 2009 at 09:06:15PM +1100, Gary Ruben wrote:
>>> Hi Gaël,
>>
>>> If you've got a 1D array/vector called "a", I think the normal idiom is
>>
>>> np.dot(a,a)
>>
>>> For the more general case, I think
>>> np.tensordot(a, a, axes=something_else)
>>> should do it, where you should be able to figure out something_else for
>>> your particular case.
>>
>> Ha, yes. Good point about the tensordot trick.
>>
>> Thank you
>>
>> Gaël
>
> I'm curious about this as I use ss, which is just np.sum(a*a, axis),
> in statsmodels and didn't much think about it.
>
> There is
>
> import numpy as np
> from scipy.stats import ss
>
> a = np.ones(5000)
>
> but
>
> timeit ss(a)
> 1 loops, best of 3: 21.5 µs per loop
>
> timeit np.add.reduce(a*a)
> 10 loops, best of 3: 15 µs per loop
>
> timeit np.dot(a,a)
> 10 loops, best of 3: 5.38 µs per loop
>
> Do the number of loops matter in the timings and is dot always faster
> even without the blas dot?

David's reply once was that it depends on ATLAS and the version of
lapack/blas.

I usually switched to using dot for 1d. Using tensordot looks too
complicated for me, to figure out the axes when I quickly want a sum of
squares.

I never tried the timing of tensordot for 2d arrays, especially for
axis=0 for a c ordered array. If it's faster, this could be useful to
rewrite stats.ss.

I don't remember that np.add.reduce is much faster than np.sum. This
might be the additional call overhead from using another function in
between.

Josef
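For the 2-D, axis=0 case mentioned above, a sketch of some equivalent spellings (np.einsum postdates this thread — it arrived in NumPy 1.6 — so treat this as a with-a-modern-NumPy illustration; the tensordot route computes the full a.T @ a Gram matrix just to take its diagonal, so it is shown for correctness, not speed):

```python
import numpy as np

a = np.arange(12.0).reshape(4, 3)  # C-ordered 2-D array

# Per-column (axis=0) sum of squares, three equivalent spellings:
ss_naive = (a * a).sum(axis=0)                     # allocates a temporary
ss_einsum = np.einsum('ij,ij->j', a, a)            # no temporary array
ss_dot = np.diag(np.tensordot(a, a, axes=(0, 0)))  # full a.T @ a, then diagonal

print(ss_naive)  # [126. 166. 214.]
```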
[Numpy-discussion] Multiple string formatting while writing an array into a file
Hello, I have a relatively simple question which I couldn't figure out myself yet. I have an array that I am writing into a file using the following savetxt call:

np.savetxt(fid, output_array, fmt='%12.4f', delimiter='')

However, I have made some changes in the code, and now I need to write everything after the 7th element of the array as integers instead of 12.4-formatted floats. The change below doesn't solve the problem, since I get "ValueError: setting an array element with a sequence.":

np.savetxt(fid, (output_array[:7], output_array[7:]), fmt=('%12.4f', '%12d'), delimiter='')

What would be the right approach to fix this issue? Thanks.

--
Gökhan
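One approach that seems to work: savetxt accepts a sequence of format strings, one per column of a 2-D array, so the split can go into fmt rather than into the data. A sketch — the data here is a hypothetical stand-in, and it assumes output_array is written one row per line (for a 1-D array, reshape to (1, -1) first):

```python
import io
import numpy as np

# Hypothetical stand-in for the real data: one row, 10 columns.
output_array = np.array([[1.5, 2.25, 3.0, 4.5, 5.75, 6.0, 7.125, 8, 9, 10]])

ncols = output_array.shape[1]
# Floats for the first 7 columns, integers for the rest.
fmt = ['%12.4f'] * 7 + ['%12d'] * (ncols - 7)

buf = io.StringIO()  # any writable file object works here
np.savetxt(buf, output_array, fmt=fmt, delimiter='')
print(buf.getvalue())
```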
Re: [Numpy-discussion] Optimized sum of squares
On Sun, Oct 18, 2009 at 8:09 AM, Gael Varoquaux wrote:
> On Sun, Oct 18, 2009 at 09:06:15PM +1100, Gary Ruben wrote:
>> Hi Gaël,
>
>> If you've got a 1D array/vector called "a", I think the normal idiom is
>
>> np.dot(a,a)
>
>> For the more general case, I think
>> np.tensordot(a, a, axes=something_else)
>> should do it, where you should be able to figure out something_else for
>> your particular case.
>
> Ha, yes. Good point about the tensordot trick.
>
> Thank you
>
> Gaël

I'm curious about this as I use ss, which is just np.sum(a*a, axis),
in statsmodels and didn't much think about it.

There is

import numpy as np
from scipy.stats import ss

a = np.ones(5000)

but

timeit ss(a)
1 loops, best of 3: 21.5 µs per loop

timeit np.add.reduce(a*a)
10 loops, best of 3: 15 µs per loop

timeit np.dot(a,a)
10 loops, best of 3: 5.38 µs per loop

Do the number of loops matter in the timings and is dot always faster
even without the blas dot?

Skipper
Re: [Numpy-discussion] object array alignment issues
On Sun, Oct 18, 2009 at 6:04 AM, Michael Droettboom wrote: > On 10/16/2009 11:35 PM, Travis Oliphant wrote: > > > > On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote: > > > >> I recently committed a regression test and bugfix for object pointers in > >> record arrays of unaligned size (meaning where each record is not a > >> multiple of sizeof(PyObject **)). > >> > >> For example: > >> > >>a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')]) > >>a2 = np.zeros((10,), 'S10') > >># This copying would segfault > >>a1['o'] = a2 > >> > >> http://projects.scipy.org/numpy/ticket/1198 > >> > >> Unfortunately, this unit test has opened up a whole hornet's nest of > >> alignment issues on Solaris. The various reference counting functions > >> (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers, > >> for instance. Interestingly, there are comments in there saying > >> "handles misaligned data" (eg. line 190), but in fact it doesn't, and > >> doesn't look to me like it would. But I won't rule out a mistake in > >> building it on my part. > > > > Thanks for this bug report. It would be very helpful if you could > > provide the line number where the code is giving a bus error and > > explain why you think the code in question does not handle misaligned > > data (it still seems like it should to me --- but perhaps I must be > > missing something --- I don't have a Solaris box to test on). > > Perhaps, the real problem is elsewhere (such as other places where the > > mistake of forgetting about striding needing to be aligned also before > > pursuing the fast alignment path that you pointed out in another place > > of code). > > > > This was the thinking for why the code (that I think is in question) > > should handle mis-aligned data: > > > > 1) pointers that are not aligned to the correct size need to be copied > > to an aligned memory area before being de-referenced. > > 2) static variables defined in a function will be aligned by the C > > compiler. 
> > > > So, what the code in refcnt.c does is to copy the value in the NumPy > > data-area (i.e. pointed to by it->dataptr) to another memory location > > (the stack variable temp), dereference it and then increment it's > > reference count. > > > > 196: temp = (PyObject **)it->dataptr; > > 197: Py_XINCREF(*temp); > This is exactly an instance that fails. Let's say we have a PyObject at > an aligned location 0x4000 (PyObjects themselves always seem to be > aligned -- I strongly suspect CPython is enforcing that). Then, we can > create a recarray such that some of the PyObject*'s in it are at > unaligned locations. For example, if the dtype is 'O,c', you have a > record stride of 5 which creates unaligned PyObject*'s: > >ccc >0123456789abcde > ^^ > > Now in the code above, let's assume that it->dataptr points to an > unaligned location, 0x8005. Assigning it to temp puts the same > unaligned value in temp, 0x8005. That is: > > &temp == 0x1000 /* The location of temp *is* on the stack and aligned */ >temp == 0x8005 /* But its value as a pointer points to an unaligned > memory location */ >*temp == 0x4000 /* Dereferencing it should get us back to the original > PyObject * pointer, but dereferencing an > unaligned memory location > fails with a bus error on Solaris */ > > So the bus error occurs on line 197. > > Note that something like: > >PyObject* temp; >temp = *(PyObject **)it->dataptr; > > would also fail. 
> > The solution (this is what works for me, though there may be a better way):
> >
> > PyObject *temp; /* NB: temp is now a (PyObject *), not a (PyObject **) */
> > /* memcpy works byte-by-byte, so can handle an unaligned assignment */
> > memcpy(&temp, it->dataptr, sizeof(PyObject *));
> > Py_XINCREF(temp);
> >
> > I'm proposing adding a macro which on Intel/AMD would be defined as:
> >
> > #define COPY_PYOBJECT_PTR(dst, src) (*(dst) = *(src))
> >
> > and on alignment-required platforms as:
> >
> > #define COPY_PYOBJECT_PTR(dst, src) (memcpy((dst), (src), sizeof(PyObject *)))
> >
> > and it would be used something like:
> >
> > COPY_PYOBJECT_PTR(&temp, it->dataptr);

This looks right to me, but I'll let Travis sign off on it.

Chuck
Re: [Numpy-discussion] Optimized sum of squares
On Sun, Oct 18, 2009 at 09:06:15PM +1100, Gary Ruben wrote:
> Hi Gaël,
> If you've got a 1D array/vector called "a", I think the normal idiom is
> np.dot(a,a)
> For the more general case, I think
> np.tensordot(a, a, axes=something_else)
> should do it, where you should be able to figure out something_else for
> your particular case.

Ha, yes. Good point about the tensordot trick.

Thank you

Gaël
Re: [Numpy-discussion] Subclassing record array
On Sun, Oct 18, 2009 at 12:22 AM, Charles R Harris wrote:
>
> On Sat, Oct 17, 2009 at 9:13 AM, Loïc BERTHE wrote:
>>
>> Hi,
>>
>> I would like to create my own class of record array to deal with units.
>>
>> Here is the code I used, inspired from
>> http://docs.scipy.org/doc/numpy-1.3.x/user/basics.subclassing.html#slightly-more-realistic-example-attribute-added-to-existing-array :
>>
>> [code]
>> from numpy import *
>>
>> class BlocArray(rec.ndarray):
>>     """ Recarray with units and pretty print """
>>
>>     fmt_dict = {'S' : '%10s', 'f' : '%10.6G', 'i': '%10d'}
>>
>>     def __new__(cls, data, titles=None, units=None):
>>
>>         # guess format for each column
>>         data2 = []
>>         for line in zip(*data) :
>>             try : data2.append(cast[int](line))            # integers
>>             except ValueError :
>>                 try : data2.append(cast[float](line))      # reals
>>                 except ValueError :
>>                     data2.append(cast[str](line))          # characters
>>
>>         # create the array
>>         dt = dtype(zip(titles, [line.dtype for line in data2]))
>>         obj = rec.array(data2, dtype=dt).view(cls)
>>
>>         # add custom attributes
>>         obj.units = units or []
>>         obj._fmt = " ".join(obj.fmt_dict[d[1][1]] for d in dt.descr) + '\n'
>>         obj._head = "%10s "*len(dt.names) % dt.names + '\n'
>>         obj._head += "%10s "*len(dt.names) % tuple('(%s)' % u for u in units) + '\n'
>>
>>         # Finally, we must return the newly created object:
>>         return obj
>>
>> titles = ['Name', 'Nb', 'Price']
>> units = ['/', '/', 'Eur']
>> data = [['fish', '1', '12.25'], ['egg', '6', '0.85'], ['TV', 1, '125']]
>> bloc = BlocArray(data, titles=titles, units=units)
>>
>> In [544]: bloc
>> Out[544]:
>>       Name         Nb      Price
>>        (/)        (/)      (Eur)
>>       fish          1      12.25
>>        egg          6       0.85
>>         TV          1        125
>> [/code]
>>
>> It's almost working, but I have some issues:
>>
>> - I can't access data through indexing
>> In [563]: bloc['Price']
>> /home/loic/Python/numpy/test.py in ((r,))
>>      50
>>      51     def __repr__(self):
>> ---> 52         return self._head + ''.join(self._fmt % tuple(r) for r in self)
>>
>> TypeError: 'numpy.float64' object is not iterable
>>
>> So I think that overloading the __repr__ method is not that easy
>>
>> - I can't access data through attributes now:
>> In [564]: bloc.Nb
>> AttributeError: 'BlocArray' object has no attribute 'Nb'
>>
>> - I can't use 'T' as a field in these arrays, as the T method is
>> already here as a shortcut for transpose
>>
>> Have you any hints to make this work?
>
> On adding units in general, you might want to contact Darren Dale who has
> been working in that direction also and has added some infrastructure in svn
> to make it easier. He also gave a short presentation at scipy2009 on that
> problem, which has been worked on before. No sense in reinventing the wheel
> here.

The units package I have been working on is called quantities. It is
available at the Python package index, and the project is hosted at
launchpad as python-quantities. If quantities isn't a good fit, please let
me know why. At least the code can provide some example of how to subclass
ndarray.

Darren
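For the record, the lost-attribute symptoms above (bloc.units vanishing, __repr__ blowing up on rows it never prepared) are the classic subclassing pitfall: attributes set only in __new__ don't propagate to views, slices, or ufunc results. The subclassing guide's fix is __array_finalize__. A minimal sketch with a hypothetical class (deliberately simpler than BlocArray):

```python
import numpy as np

class UnitArray(np.ndarray):
    """Toy ndarray subclass whose `units` attribute survives views and slices."""

    def __new__(cls, data, units=None):
        obj = np.asarray(data).view(cls)
        obj.units = units
        return obj

    def __array_finalize__(self, obj):
        # Runs for explicit construction, view casting, and
        # new-from-template (e.g. slicing); copy the attribute over.
        if obj is None:
            return
        self.units = getattr(obj, 'units', None)

a = UnitArray([12.25, 0.85, 125.0], units='Eur')
print(a[1:].units)    # 'Eur' -- slices keep the attribute
print((a * 2).units)  # 'Eur' -- so do ufunc results
```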
Re: [Numpy-discussion] object array alignment issues
On 10/16/2009 11:35 PM, Travis Oliphant wrote:
>
> On Oct 15, 2009, at 11:40 AM, Michael Droettboom wrote:
>
>> I recently committed a regression test and bugfix for object pointers in
>> record arrays of unaligned size (meaning where each record is not a
>> multiple of sizeof(PyObject **)).
>>
>> For example:
>>
>>     a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c')])
>>     a2 = np.zeros((10,), 'S10')
>>     # This copying would segfault
>>     a1['o'] = a2
>>
>> http://projects.scipy.org/numpy/ticket/1198
>>
>> Unfortunately, this unit test has opened up a whole hornet's nest of
>> alignment issues on Solaris. The various reference counting functions
>> (PyArray_INCREF etc.) in refcnt.c all fail on unaligned object pointers,
>> for instance. Interestingly, there are comments in there saying
>> "handles misaligned data" (eg. line 190), but in fact it doesn't, and
>> doesn't look to me like it would. But I won't rule out a mistake in
>> building it on my part.
>
> Thanks for this bug report. It would be very helpful if you could
> provide the line number where the code is giving a bus error and
> explain why you think the code in question does not handle misaligned
> data (it still seems like it should to me --- but perhaps I must be
> missing something --- I don't have a Solaris box to test on).
> Perhaps the real problem is elsewhere (such as other places where the
> mistake of forgetting about striding needing to be aligned also before
> pursuing the fast alignment path that you pointed out in another place
> of code).
>
> This was the thinking for why the code (that I think is in question)
> should handle misaligned data:
>
> 1) pointers that are not aligned to the correct size need to be copied
> to an aligned memory area before being dereferenced.
> 2) static variables defined in a function will be aligned by the C
> compiler.
>
> So, what the code in refcnt.c does is to copy the value in the NumPy
> data-area (i.e. pointed to by it->dataptr) to another memory location
> (the stack variable temp), dereference it and then increment its
> reference count.
>
> 196: temp = (PyObject **)it->dataptr;
> 197: Py_XINCREF(*temp);

This is exactly an instance that fails. Let's say we have a PyObject at
an aligned location 0x4000 (PyObjects themselves always seem to be
aligned -- I strongly suspect CPython is enforcing that). Then, we can
create a recarray such that some of the PyObject*'s in it are at
unaligned locations. For example, if the dtype is 'O,c', you have a
record stride of 5 which creates unaligned PyObject*'s:

    ccc
    0123456789abcde
         ^^

Now in the code above, let's assume that it->dataptr points to an
unaligned location, 0x8005. Assigning it to temp puts the same
unaligned value in temp, 0x8005. That is:

    &temp == 0x1000 /* The location of temp *is* on the stack and aligned */
     temp == 0x8005 /* But its value as a pointer points to an unaligned
                       memory location */
    *temp == 0x4000 /* Dereferencing it should get us back to the original
                       PyObject * pointer, but dereferencing an
                       unaligned memory location
                       fails with a bus error on Solaris */

So the bus error occurs on line 197.

Note that something like:

    PyObject* temp;
    temp = *(PyObject **)it->dataptr;

would also fail.

The solution (this is what works for me, though there may be a better way):

    PyObject *temp; /* NB: temp is now a (PyObject *), not a (PyObject **) */
    /* memcpy works byte-by-byte, so can handle an unaligned assignment */
    memcpy(&temp, it->dataptr, sizeof(PyObject *));
    Py_XINCREF(temp);

I'm proposing adding a macro which on Intel/AMD would be defined as:

    #define COPY_PYOBJECT_PTR(dst, src) (*(dst) = *(src))

and on alignment-required platforms as:

    #define COPY_PYOBJECT_PTR(dst, src) (memcpy((dst), (src), sizeof(PyObject *)))

and it would be used something like:

    COPY_PYOBJECT_PTR(&temp, it->dataptr);

If you agree with this assessment, I'm working on a patch for all of the
locations that require this change. All that I've found so far are related
to object arrays. It seems that many places where this would be an issue
for numeric types are already using this memcpy technique (e.g.
*_copyswap in arraytype.c.src:1716). I think this issue shows up in object
arrays much more because there are many more places where the unaligned
memory is dereferenced (in order to do reference counting).

So here's the traceback from:

    a1 = np.zeros((10,), dtype=[('o', 'O'), ('c', 'c'), ('i', 'i'), ('c2', 'c')])

Unfortunately, I'm having trouble getting line numbers out of the debugger,
but "print statement debugging" tells me the innermost frame here is in
refcount.c:

    275    PyObject **temp;
    276    Py_XINCREF(obj);
    277    temp = (PyObject **)optr;
    278    *temp = obj; /* <-- here */
    279    return;

My fix was:

    Py_XINCREF(obj);
    memcpy(optr, &obj, si
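The unaligned layout described in this thread is easy to confirm from Python. A small sketch showing that a packed 'O,c' record stride cannot keep every object pointer aligned, while dtype(..., align=True) pads the records out (sizes shown in the comments assume a 64-bit build, but the assertions hold on 32-bit too):

```python
import numpy as np

packed = np.dtype([('o', 'O'), ('c', 'c')])              # no padding
padded = np.dtype([('o', 'O'), ('c', 'c')], align=True)  # aligned records

ptr = np.dtype('O').itemsize  # sizeof(PyObject *): 8 on 64-bit builds

# Packed records are pointer-size + 1 char, so every second record's
# object pointer sits at an offset that is not a multiple of ptr.
print(packed.itemsize)  # ptr + 1 (9 on 64-bit)
print(padded.itemsize)  # rounded up to a multiple of ptr (16 on 64-bit)
```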
Re: [Numpy-discussion] Another suggestion for making numpy's functions generic
On Sat, Oct 17, 2009 at 6:45 PM, Charles R Harris wrote:
>
> On Sat, Oct 17, 2009 at 6:49 AM, Darren Dale wrote:
[...]
>> I think it will be not too difficult to document this overall scheme:
>>
>> When calling numpy functions:
>>
>> 1) __input_prepare__ provides an opportunity to operate on the inputs
>> to yield versions that are compatible with the operation (they should
>> obviously not be modified in place)
>>
>> 2) the output array is established
>>
>> 3) __array_prepare__ is used to determine the class of the output
>> array, as well as any metadata that needs to be established before the
>> operation proceeds
>>
>> 4) the ufunc performs its operations
>>
>> 5) __array_wrap__ provides an opportunity to update the output array
>> based on the results of the computation
>>
>> Comments, criticisms? If PEP 3124^ were already a part of the standard
>> library, that could serve as the basis for generalizing numpy's
>> functions. But I think the PEP will not be approved in its current
>> form, and it is unclear when and if the author will revisit the
>> proposal. The scheme I'm imagining might be sufficient for our
>> purposes.
>
> This sounds interesting to me, as it would push the use of array wrap down
> into a common function and make it easier to use.

Sorry, I don't understand what you mean.

> I wonder what the impact would be on the current subclasses of ndarray?

I don't think it will have any impact. The only change would be the
addition of __input_prepare__, which by default would simply return the
unmodified inputs.

> On a side note, I wonder if you could look into adding your reduce loop
> optimizations into the generic loops? It would be interesting to see if
> that speeded up some common operations. In any case, it can't hurt.

I think you are confusing me with someone else.

Darren
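For comparison with step 5 of the proposed scheme, here is a hedged sketch of what the existing __array_wrap__ hook already allows — a hypothetical subclass that records which ufunc produced it. (The proposed __input_prepare__ hook does not exist; only __array_prepare__/__array_wrap__ do.)

```python
import numpy as np

class LoggedArray(np.ndarray):
    """Subclass that records, via __array_wrap__, which ufunc produced it."""

    def __new__(cls, data):
        obj = np.asarray(data).view(cls)
        obj.history = []
        return obj

    def __array_finalize__(self, obj):
        self.history = list(getattr(obj, 'history', []))

    def __array_wrap__(self, out_arr, context=None, return_scalar=False):
        # Step 5: update the output array after the ufunc has run.
        # `context` is (ufunc, inputs, domain) when called from a ufunc.
        result = out_arr.view(LoggedArray)
        result.history = list(self.history)
        if context is not None:
            result.history.append(context[0].__name__)
        return result

a = LoggedArray([1.0, 2.0])
b = np.add(a, 1.0)
print(b.history)  # ['add']
```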
Re: [Numpy-discussion] Optimized sum of squares
Hi Gaël,

If you've got a 1D array/vector called "a", I think the normal idiom is

np.dot(a,a)

For the more general case, I think

np.tensordot(a, a, axes=something_else)

should do it, where you should be able to figure out something_else for
your particular case.

Gary R.

Gael Varoquaux wrote:
> On Sat, Oct 17, 2009 at 07:27:55PM -0400, josef.p...@gmail.com wrote:
>>> Why aren't you using logaddexp ufunc from numpy?
>> Maybe because it is difficult to find, it doesn't have its own docs entry.
>
> Speaking of which...
>
> I thought that there was a readily-written, optimized function (or ufunc)
> in numpy or scipy that calculated the sum of squares for an array
> (possibly along an axis). However, I cannot find it.
>
> Is there something similar? If not, it is not the end of the world, the
> operation is trivial to write.
>
> Cheers,
>
> Gaël
[Numpy-discussion] Optimized sum of squares (was: vectorized version of logsumexp? (from scipy.maxentropy))
On Sat, Oct 17, 2009 at 07:27:55PM -0400, josef.p...@gmail.com wrote:
>>> Why aren't you using logaddexp ufunc from numpy?
>> Maybe because it is difficult to find, it doesn't have its own docs entry.

Speaking of which...

I thought that there was a readily-written, optimized function (or ufunc)
in numpy or scipy that calculated the sum of squares for an array
(possibly along an axis). However, I cannot find it.

Is there something similar? If not, it is not the end of the world, the
operation is trivial to write.

Cheers,

Gaël