[Numpy-discussion] Bug
___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug
Sorry, wrong shortcut key, question will arrive later. Josef On Fri, Oct 16, 2015 at 1:40 PM, wrote:
Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?
Agreed that indexing functions should return bare `ndarray`. Note that in Jaime's PR one can override it anyway by defining __nonzero__. -- Marten On Sat, May 9, 2015 at 9:53 PM, Stephan Hoyer sho...@gmail.com wrote: With regards to np.where -- shouldn't where be a ufunc, so subclasses or other array-likes can control its behavior with __numpy_ufunc__? As for the other indexing functions, I don't have a strong opinion about how they should handle subclasses. But it is certainly tricky to attempt to handle arbitrary subclasses. I would agree that the least error-prone thing to do is usually to return base ndarrays. Better to force subclasses to override methods explicitly.
Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?
On May 9, 2015 12:54 PM, Benjamin Root ben.r...@ou.edu wrote: Absolutely, it should be writable. As for subclassing, that might be messy. Consider the following: inds = np.where(data > 5) In that case, I'd expect a normal, bog-standard ndarray, because that is what you use for indexing (although pandas might have a good argument for having it return one of their special indexing types if data was a pandas array...). Pandas doesn't subclass ndarray (anymore), so they're irrelevant to this particular discussion :-). Of course they're an argument for having a cleaner, more general way of allowing non-ndarray array-like objects, but the legacy subclassing system will never be that. Next: foobar = np.where(data > 5, 1, 2) Again, I'd expect a normal, bog-standard ndarray because the scalar elements are very simple. This question gets very complicated when considering array arguments. Consider: merged_data = np.where(data > 5, data, data2) So, what should merged_data be? If both data and data2 are the same type, then it would be reasonable to return the same type, if possible. But what if they aren't the same? Maybe use __array_priority__ to determine the return type? Or, perhaps it does make sense to say sod it all and always return an ndarray? Not sure what this has to do with Jaime's post about nonzero? There is indeed a potential question about what 3-argument where() should do with subclasses, but that's effectively a different operation entirely, and to discuss it we'd need to know things like what it historically has done and why that was causing problems. -n
Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?
On Sat, May 9, 2015 at 1:27 PM, Benjamin Root ben.r...@ou.edu wrote: On Sat, May 9, 2015 at 4:03 PM, Nathaniel Smith n...@pobox.com wrote: Not sure what this has to do with Jaime's post about nonzero? There is indeed a potential question about what 3-argument where() should do with subclasses, but that's effectively a different operation entirely and to discuss it we'd need to know things like what it historically has done and why that was causing problems. Because my train of thought started at np.nonzero(), which I have always just mentally mapped to np.where(), and then... squirrel! Indeed, np.where() has no bearing here. Ah, gotcha :-). There is an argument that we should try to reduce this confusion by nudging people to use np.nonzero() consistently instead of np.where(), via the documentation and/or a warning message... -- Nathaniel J. Smith -- http://vorpus.org
Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?
With regards to np.where -- shouldn't where be a ufunc, so subclasses or other array-likes can control its behavior with __numpy_ufunc__? As for the other indexing functions, I don't have a strong opinion about how they should handle subclasses. But it is certainly tricky to attempt to handle arbitrary subclasses. I would agree that the least error-prone thing to do is usually to return base ndarrays. Better to force subclasses to override methods explicitly.
Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?
On Sat, May 9, 2015 at 4:03 PM, Nathaniel Smith n...@pobox.com wrote: Not sure what this has to do with Jaime's post about nonzero? There is indeed a potential question about what 3-argument where() should do with subclasses, but that's effectively a different operation entirely and to discuss it we'd need to know things like what it historically has done and why that was causing problems. Because my train of thought started at np.nonzero(), which I have always just mentally mapped to np.where(), and then... squirrel! Indeed, np.where() has no bearing here. Ben Root
[Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?
There is a reported bug (issue #5837, https://github.com/numpy/numpy/issues/5837) regarding different returns from np.nonzero with 1-D vs higher dimensional arrays. A full summary of the differences can be seen from the following output:

>>> class C(np.ndarray): pass
...
>>> a = np.arange(6).view(C)
>>> b = np.arange(6).reshape(2, 3).view(C)
>>> anz = a.nonzero()
>>> bnz = b.nonzero()
>>> type(anz[0])
<type 'numpy.ndarray'>
>>> anz[0].flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> anz[0].base
>>> type(bnz[0])
<class '__main__.C'>
>>> bnz[0].flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False
>>> bnz[0].base
array([[0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])

The original bug report was only concerned with the non-writeability of higher dimensional array returns, but there are more differences: 1-D always returns an ndarray that owns its memory and is writeable, but higher dimensional arrays return views, of the type of the original array, that are non-writeable. I have a branch that attempts to fix this by making both 1-D and n-D arrays:

1. return a view, never the base array,
2. return an ndarray, never a subclass, and
3. return a writeable view.

I guess the most controversial choice is #2, and in fact making that change breaks a few tests. I nevertheless think that all of the index returning functions (nonzero, argsort, argmin, argmax, argpartition) should always return a bare ndarray, not a subclass. I'd be happy to be corrected, but I can't think of any situation in which preserving the subclass would be needed for these functions. Since we are changing the returns of a few other functions in 1.10 (diagonal, diag, ravel), it may be a good moment to revisit the behavior for these other functions. Any thoughts?

Jaime

-- (\__/) ( O.o) ( ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
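For readers who want to reproduce the discrepancy Jaime describes, here is a condensed version of his session as a runnable script. The index values are the same either way; what differed across NumPy versions is the type and writeability of the returned arrays, so those are only printed, not asserted:

```python
import numpy as np

class C(np.ndarray):
    pass

a = np.arange(6).view(C)                 # 1-D subclass instance
b = np.arange(6).reshape(2, 3).view(C)   # n-D subclass instance

anz = a.nonzero()
bnz = b.nonzero()

# The indices themselves are identical regardless of version:
print(anz[0].tolist())             # [1, 2, 3, 4, 5]
print([i.tolist() for i in bnz])   # [[0, 0, 1, 1, 1], [1, 2, 0, 1, 2]]

# Type and writeability are what the bug report was about; on a NumPy
# with Jaime's fix both should be plain, writable ndarrays.
print(type(anz[0]).__name__, anz[0].flags.writeable)
print(type(bnz[0]).__name__, bnz[0].flags.writeable)
```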
Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?
On May 9, 2015 10:48 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote: There is a reported bug (issue #5837) regarding different returns from np.nonzero with 1-D vs higher dimensional arrays. A full summary of the differences can be seen from the following output: <snip session output> The original bug report was only concerned with the non-writeability of higher dimensional array returns, but there are more differences: 1-D always returns an ndarray that owns its memory and is writeable, but higher dimensional arrays return views, of the type of the original array, that are non-writeable. I have a branch that attempts to fix this by making both 1-D and n-D arrays: return a view, never the base array, This doesn't matter, does it? "View" isn't a thing, only "view of" is meaningful. And in this case, none of the returned arrays share any memory with any other arrays that the user has access to... so whether they were created as a view or not should be an implementation detail that's transparent to the user? return an ndarray, never a subclass, and return a writeable view. I guess the most controversial choice is #2, and in fact making that change breaks a few tests. I nevertheless think that all of the index returning functions (nonzero, argsort, argmin, argmax, argpartition) should always return a bare ndarray, not a subclass. I'd be happy to be corrected, but I can't think of any situation in which preserving the subclass would be needed for these functions.
I also can't see any logical reason why the return type of these functions has anything to do with the type of the inputs. You can index me with my phone number, but my phone number is not a person. OTOH logic and ndarray subclassing don't have much to do with each other; the practical effect is probably more important. Looking at the subclasses I know about (masked arrays, np.matrix, and astropy quantities), though, I also can't see much benefit in copying the subclass of the input, and the fact that we were never consistent about this suggests that people probably aren't depending on it too much. So in summary my feeling is: +1 to making them writable, no objection to the view thing (though I don't see how it matters), and provisional +1 to consistently returning ndarray (to be revised if the people who use the subclassing functionality disagree). -n
Re: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?
Absolutely, it should be writable. As for subclassing, that might be messy. Consider the following: inds = np.where(data > 5) In that case, I'd expect a normal, bog-standard ndarray, because that is what you use for indexing (although pandas might have a good argument for having it return one of their special indexing types if data was a pandas array...). Next: foobar = np.where(data > 5, 1, 2) Again, I'd expect a normal, bog-standard ndarray because the scalar elements are very simple. This question gets very complicated when considering array arguments. Consider: merged_data = np.where(data > 5, data, data2) So, what should merged_data be? If both data and data2 are the same type, then it would be reasonable to return the same type, if possible. But what if they aren't the same? Maybe use __array_priority__ to determine the return type? Or, perhaps it does make sense to say sod it all and always return an ndarray? I don't know the answer. I do find it interesting that the result from a multi-dimensional array is not writable. I don't know why I have never encountered that. Ben Root

On Sat, May 9, 2015 at 2:42 PM, Nathaniel Smith n...@pobox.com wrote: <snip>
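To make the three np.where cases from this exchange concrete, here is a small sketch with plain arrays (`data` and `data2` are made-up inputs, not from the thread):

```python
import numpy as np

data = np.array([3, 7, 1, 9])
data2 = np.array([30, 70, 10, 90])

# Case 1: index-returning form, equivalent to np.nonzero(data > 5)
inds = np.where(data > 5)
print(inds[0].tolist())      # [1, 3]

# Case 2: scalar branches; the result is built from the scalars 1 and 2
foobar = np.where(data > 5, 1, 2)
print(foobar.tolist())       # [2, 1, 2, 1]

# Case 3: array branches; the result mixes elements of data and data2,
# which is where the "what type should this return?" question arises
merged_data = np.where(data > 5, data, data2)
print(merged_data.tolist())  # [30, 7, 10, 9]
```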
[Numpy-discussion] Bug in 1.9?
Hello, Is this desired behaviour or a regression or a bug? http://stackoverflow.com/questions/26497656/how-do-i-align-a-numpy-record-array-recarray Thanks, Neil
Re: [Numpy-discussion] Bug in 1.9?
On Wed, Oct 22, 2014 at 11:32 AM, Neil Girdhar mistersh...@gmail.com wrote: Hello, Is this desired behaviour or a regression or a bug? http://stackoverflow.com/questions/26497656/how-do-i-align-a-numpy-record-array-recarray Thanks, I'd guess that the definition of aligned may have become stricter, that's the only thing I think has changed. Maybe Julian can comment on that. Chuck
Re: [Numpy-discussion] Bug in 1.9?
On 22.10.2014 20:00, Charles R Harris wrote: On Wed, Oct 22, 2014 at 11:32 AM, Neil Girdhar mistersh...@gmail.com wrote: Hello, Is this desired behaviour or a regression or a bug? http://stackoverflow.com/questions/26497656/how-do-i-align-a-numpy-record-array-recarray Thanks, I'd guess that the definition of aligned may have become stricter, that's the only thing I think has changed. Maybe Julian can comment on that. Structured dtypes don't really have a well-defined alignment; e.g. the stride of this one is 12, so when element 0 is aligned, element 1 is always unaligned. Before 1.9, structured dtypes always had the aligned flag set, even if they were unaligned. Now we require a minimum alignment of 16 for strings and structured types, so that copying (which sometimes operates on the whole compound type instead of on each item) always works. This was the easiest way to get the testsuite running on sparc after fixing a couple of code paths that were not updating alignment information, which forced some functions to always take super slow unaligned paths (e.g. ufunc.at). But the logic could certainly be improved.
Re: [Numpy-discussion] Bug in 1.9?
On Wed, Oct 22, 2014 at 12:28 PM, Julian Taylor jtaylor.deb...@googlemail.com wrote: <snip> The stackexchange example:

In [9]: a = np.zeros(4, dtype=dtype([('x', 'f8'), ('y', 'i4')], align=False))
In [10]: a.data
Out[10]: <read-write buffer for 0x2f94440, size 48, offset 0 at 0x2f8caf0>
In [11]: a = np.zeros(4, dtype=dtype([('x', 'f8'), ('y', 'i4')], align=True))
In [12]: a.data
Out[12]: <read-write buffer for 0x2f94030, size 64, offset 0 at 0x2f8c5b0>

Note that using an aligned dtype yields a different size on my 64 bit system, and 64 / 4 = 16. Chuck
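The packed-vs-padded itemsizes behind the buffer sizes above can be checked directly from the dtypes (a quick sketch; the sizes assume standard x86-64 alignment rules, matching Chuck's 64-bit system):

```python
import numpy as np

packed = np.dtype([('x', 'f8'), ('y', 'i4')], align=False)
aligned = np.dtype([('x', 'f8'), ('y', 'i4')], align=True)

# Packed: 8 + 4 = 12 bytes per element, so every other element puts
# the f8 field at a misaligned offset -- Julian's point about stride 12.
print(packed.itemsize)   # 12

# Aligned: padded out to a multiple of the largest field alignment (8),
# giving 16 bytes per element, hence 4 * 16 = 64 bytes for 4 elements.
print(aligned.itemsize)  # 16
```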
Re: [Numpy-discussion] Bug in genfromtxt with usecols and converters
On 26 Aug 2014, at 09:05 pm, Adrian Altenhoff adrian.altenh...@inf.ethz.ch wrote: But you are right that the problem with using the first_values, which should of course be valid, somehow stems from the use of usecols, it seems that in that loop for (i, conv) in user_converters.items(): i in user_converters and in usecols get out of sync. This certainly looks like a bug, the entire way of modifying i inside the loop appears a bit dangerous to me. I'll have a look at whether I can make this safer. Thanks. As long as your data don't actually contain any missing values you might also simply use np.loadtxt. Ok, wasn't aware of that function so far. I will try that! It was first_values that needed to be addressed by the original indices. I have created a short test from your case and submitted a fix at https://github.com/numpy/numpy/pull/5006 Cheers, Derek
Re: [Numpy-discussion] Bug in genfromtxt with usecols and converters
Hi Adrian, I tried to load data from a csv file into numpy using genfromtxt. I need only a subset of the columns and want to apply some conversions to the data. attached is a minimal script showing the error. In brief, I want to load columns 1,2 and 4. But in the converter function for the 4th column, I get the 3rd value. The issue does not occur if I also load the 3rd column. Did I somehow misunderstand how the function is supposed to work or is this indeed a bug? not sure whether to call it a bug; the error seems to arise before reading any actual data (even on reading from an empty string); when genfromtxt is checking the filling_values used to substitute missing or invalid data, it is apparently testing on default testing values of 1 or -1, which your conversion scheme does not know about. Although I think it is rather the user's responsibility to provide valid converters, the documentation should probably at least be updated to make users aware of this requirement. I see two possible fixes/workarounds: provide a keyword argument filling_values=[0, 0, '1:1'], or add the default filling values to your relEnum dictionary, e.g. { … '-1':-1, '1':-1}. Could you check if this works for your case? HTH, Derek
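A cut-down, runnable version of the setup under discussion may help readers follow along: load only columns 0, 1 and 3 and convert the '1:1'-style strings in column 3 to integer codes. This is a sketch based on Adrian's attached script, not his exact code; on a NumPy with the fix from this thread it works without extra filling_values. The converter is written to tolerate both bytes and str input, since what genfromtxt passes to converters depends on the encoding setting of the NumPy version in use:

```python
import io
import numpy as np

relEnum = {'1:1': 0, '1:n': 1, 'm:1': 2, 'm:n': 3}
raw = b"1,5,-1,1:1,1.98\n2,8,-1,1:n,22.56\n"

def conv_rel(rel):
    # Accept bytes or str, whichever genfromtxt hands us.
    if isinstance(rel, bytes):
        rel = rel.decode()
    return relEnum[rel]

data = np.genfromtxt(
    io.BytesIO(raw),
    dtype=[('EntryNr1', 'i4'), ('EntryNr2', 'i4'), ('RelType', 'i1')],
    delimiter=',',
    usecols=(0, 1, 3),            # converter keys refer to original columns
    converters={3: conv_rel})

print(data['EntryNr1'].tolist())  # [1, 2]
print(data['RelType'].tolist())   # [0, 1]
```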
Re: [Numpy-discussion] Bug in genfromtxt with usecols and converters
Hi Derek, thanks for your answer. not sure whether to call it a bug; the error seems to arise before reading any actual data (even on reading from an empty string); when genfromtxt is checking the filling_values used to substitute missing or invalid data it is apparently testing on default testing values of 1 or -1 which your conversion scheme does not know about. Although I think it is rather the user's responsibility to provide valid converters, probably the documentation should at least be updated to make them aware of this requirement. I see two possible fixes/workarounds: provide a keyword argument filling_values=[0,0,'1:1'] This workaround seems to work, but I doubt that the actual problem is the converter function I pass. The '-1', which is used as the testing value, is the first_values entry from the 3rd column (line 1574 in npyio.py), but the converter is defined for column 4. By setting filling_values to an array of length 3, this obviously makes the problem disappear. But if the first row is used, it should also use the values from the column for which the converter is defined. Best Adrian
Re: [Numpy-discussion] Bug in genfromtxt with usecols and converters
Hi Adrian, not sure whether to call it a bug; the error seems to arise before reading any actual data (even on reading from an empty string); when genfromtxt is checking the filling_values used to substitute missing or invalid data it is apparently testing on default testing values of 1 or -1 which your conversion scheme does not know about. Although I think it is rather the user's responsibility to provide valid converters, probably the documentation should at least be updated to make them aware of this requirement. I see two possible fixes/workarounds: provide a keyword argument filling_values=[0,0,'1:1'] This workaround seems to work, but I doubt that the actual problem is the converter function I pass. The '-1', which is used as the testing value is the first_values from the 3rd column (line 1574 in npyio.py), but the converter is defined for column 4. by setting the filling_values to an array of length 3, this obviously makes the problem disappear. But I think if the first row is used, it should also use the values from the column for which the converter is defined. It is certainly related to the converter function, because a KeyError for the dictionary you provide is raised:

  File "test.py", line 13, in <module>
    3: lambda rel: relEnum[rel.decode()]})
  File "/sw/lib/python3.4/site-packages/numpy/lib/npyio.py", line 1581, in genfromtxt
    missing_values=missing_values[i],)
  File "/sw/lib/python3.4/site-packages/numpy/lib/_iotools.py", line 784, in update
    tester = func(testing_value or asbytes('1'))
  File "test.py", line 13, in <lambda>
    3: lambda rel: relEnum[rel.decode()]})
KeyError: '-1'

But you are right that the problem with using the first_values, which should of course be valid, somehow stems from the use of usecols; it seems that in that loop for (i, conv) in user_converters.items(): i in user_converters and in usecols get out of sync. This certainly looks like a bug, the entire way of modifying i inside the loop appears a bit dangerous to me. I'll have a look at whether I can make this safer. As long as your data don't actually contain any missing values you might also simply use np.loadtxt. Cheers, Derek
Re: [Numpy-discussion] Bug in genfromtxt with usecols and converters
Hi Derek, But you are right that the problem with using the first_values, which should of course be valid, somehow stems from the use of usecols, it seems that in that loop for (i, conv) in user_converters.items(): i in user_converters and in usecols get out of sync. This certainly looks like a bug, the entire way of modifying i inside the loop appears a bit dangerous to me. I'll have a look at whether I can make this safer. Thanks. As long as your data don't actually contain any missing values you might also simply use np.loadtxt. Ok, wasn't aware of that function so far. I will try that! Best wishes Adrian
[Numpy-discussion] Bug in genfromtxt with usecols and converters
Hi, I tried to load data from a csv file into numpy using genfromtxt. I need only a subset of the columns and want to apply some conversions to the data. Attached is a minimal script showing the error. In brief, I want to load columns 1, 2 and 4. But in the converter function for the 4th column, I get the 3rd value. The issue does not occur if I also load the 3rd column. Did I somehow misunderstand how the function is supposed to work or is this indeed a bug? I'm using python 3.3.1 with numpy 1.8.1. Regards Adrian

import numpy
import io

off1, off2 = 0, 4000
dtype = [('EntryNr1', 'i4'), ('EntryNr2', 'i4'), ('RelType', 'i1')]
fn = io.BytesIO("1,5,-1,1:1,1.98\n2,8,-1,1:n,22.56\n3,3,-2,m:n,18.2\n".encode('utf-8'))
relEnum = {'1:1': 0, '1:n': 1, 'm:1': 2, 'm:n': 3}
data = numpy.genfromtxt(fn, dtype=dtype, delimiter=',', usecols=(0, 1, 3),
                        converters={0: lambda nr: int(nr)+off1,
                                    1: lambda nr: int(nr)+off2,
                                    3: lambda rel: relEnum[rel.decode()]})
Re: [Numpy-discussion] Bug in np.cross for 2D vectors
On Di, 2014-07-15 at 10:22 +0100, Neil Hodgson wrote: Hi, We came across this bug while using np.cross on 3D arrays of 2D vectors. Hi, which numpy version are you using? Until recently, the cross product simply did *not* work in a broadcasting manner (3d arrays of 2d vectors); it did something, but usually not the right thing. This is fixed in recent versions (not sure if 1.8 or only now with 1.9). - Sebastian <snip quoted example and code>
Re: [Numpy-discussion] Bug in np.cross for 2D vectors
Hi, We came across this bug while using np.cross on 3D arrays of 2D vectors. What version of numpy are you using? This should already be solved in numpy master, and be part of the 1.9 release. Here's the relevant commit, although the code has been cleaned up a bit in later ones: https://github.com/numpy/numpy/commit/b9454f50f23516234c325490913224c3a69fb122 Jaime Yes, we are using 1.8 - sorry I should have checked! Thanks Neil
Re: [Numpy-discussion] Bug in np.cross for 2D vectors
Hi, We came across this bug while using np.cross on 3D arrays of 2D vectors. What version of numpy are you using? This should already be solved in numpy master, and be part of the 1.9 release. Here's the relevant commit, although the code has been cleaned up a bit in later ones: https://github.com/numpy/numpy/commit/b9454f50f23516234c325490913224c3a69fb122 Jaime Hi, which numpy version are you using? Until recently, the cross product simply did *not* work in a broadcasting manner (3d arrays of 2d vectors), it did something, but usually not the right thing. This is fixed in recent versions (not sure if 1.8 or only now with 1.9) - Sebastian Hi, I thought I replied, but I don't see it on the list, so here goes again... Yes, we are using 1.8, will confirm it's ok with 1.9 Thanks Neil
Re: [Numpy-discussion] Bug in np.cross for 2D vectors
On Tue, Jul 15, 2014 at 2:22 AM, Neil Hodgson hodgson.n...@yahoo.co.uk wrote: Hi, We came across this bug while using np.cross on 3D arrays of 2D vectors. What version of numpy are you using? This should already be solved in numpy master, and be part of the 1.9 release. Here's the relevant commit, although the code has been cleaned up a bit in later ones: https://github.com/numpy/numpy/commit/b9454f50f23516234c325490913224c3a69fb122 Jaime -- (\__/) ( O.o) ( ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
[Numpy-discussion] Bug in np.cross for 2D vectors
Hi, We came across this bug while using np.cross on 3D arrays of 2D vectors. The first example shows the problem, and we looked at the source for np.cross and believe we found the bug: an unnecessary swapaxes when returning the output (comment inserted in the code). Thanks Neil

# Example
shape = (3,5,7,2)
# These are effectively 3D arrays (3*5*7) of 2D vectors
data1 = np.random.randn(*shape)
data2 = np.random.randn(*shape)
# The cross product of data1 and data2 should produce a (3*5*7) array of scalars
cross_product_longhand = data1[:,:,:,0]*data2[:,:,:,1] - data1[:,:,:,1]*data2[:,:,:,0]
print 'longhand output shape:', cross_product_longhand.shape
# and it does
cross_product_numpy = np.cross(data1, data2)
print 'numpy output shape:', cross_product_numpy.shape
# It seems to have transposed the last 2 dimensions
if (cross_product_longhand == np.transpose(cross_product_numpy, (0,2,1))).all():
    print 'Unexpected transposition in numpy.cross (numpy version %s)' % np.__version__

# np.cross, L1464
if axis is not None:
    axisa, axisb, axisc = (axis,)*3
a = asarray(a).swapaxes(axisa, 0)
b = asarray(b).swapaxes(axisb, 0)
msg = "incompatible dimensions for cross product\n(dimension must be 2 or 3)"
if (a.shape[0] not in [2, 3]) or (b.shape[0] not in [2, 3]):
    raise ValueError(msg)
if a.shape[0] == 2:
    if (b.shape[0] == 2):
        cp = a[0]*b[1] - a[1]*b[0]
        if cp.ndim == 0:
            return cp
        else:
            ## WE SHOULD NOT SWAPAXES HERE!
            ## For 2D vectors the first axis has been
            ## collapsed during the cross product
            return cp.swapaxes(0, axisc)
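On a NumPy with the fix, Neil's longhand check and np.cross agree and no transposition occurs. Here is the same test rewritten as a Python 3 sketch (note that 2-D vector support in np.cross is deprecated as of NumPy 2.0, so this may emit a DeprecationWarning):

```python
import numpy as np

shape = (3, 5, 7, 2)  # a 3*5*7 array of 2-D vectors
rng = np.random.default_rng(0)
data1 = rng.standard_normal(shape)
data2 = rng.standard_normal(shape)

# z-component of the cross product, computed longhand
longhand = data1[..., 0] * data2[..., 1] - data1[..., 1] * data2[..., 0]

result = np.cross(data1, data2)
print(result.shape)                  # (3, 5, 7) once the spurious swapaxes is gone
print(np.allclose(result, longhand))
```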
[Numpy-discussion] bug with mmap'ed datetime64 arrays
test case:

#!/usr/bin/env python
import numpy as np
a = np.array(['2014', '2015', '2016'], dtype='datetime64')
x = np.datetime64('2015')
print a > x
np.save('test.npy', a)
b = np.load('test.npy', mmap_mode='c')
print b > x

result:

[False False True]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/t.py", line 12, in <module>
    print b > x
  File "/usr/lib64/python2.7/site-packages/numpy/core/memmap.py", line 279, in __array_finalize__
    if hasattr(obj, '_mmap') and np.may_share_memory(self, obj):
  File "/usr/lib64/python2.7/site-packages/numpy/lib/utils.py", line 298, in may_share_memory
    a_low, a_high = byte_bounds(a)
  File "/usr/lib64/python2.7/site-packages/numpy/lib/utils.py", line 258, in byte_bounds
    bytes_a = int(ai['typestr'][2:])
ValueError: invalid literal for int() with base 10: '8[Y]'

fix:

diff --git a/numpy/lib/utils.py b/numpy/lib/utils.py
index 1f1cdfc..c73f2f1 100644
--- a/numpy/lib/utils.py
+++ b/numpy/lib/utils.py
@@ -210,7 +210,7 @@ def byte_bounds(a):
     a_data = ai['data'][0]
     astrides = ai['strides']
     ashape = ai['shape']
-    bytes_a = int(ai['typestr'][2:])
+    bytes_a = a.dtype.itemsize
     a_low = a_high = a_data
     if astrides is None:
         # contiguous case

will submit pull request via github
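The root cause is visible from the array interface: for datetime64 the typestr carries a unit suffix that int() cannot parse, while dtype.itemsize gives the element size directly. A small illustration of the idea behind the fix (not the exact byte_bounds code):

```python
import numpy as np

a = np.array(['2014', '2015', '2016'], dtype='datetime64[Y]')
ai = a.__array_interface__

# typestr is '<M8[Y]': dropping the first two chars leaves '8[Y]',
# which int() rejects -- hence the ValueError in byte_bounds
print(ai['typestr'])

# The robust alternative used in the fix: ask the dtype for its byte size
print(a.dtype.itemsize)
```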
[Numpy-discussion] bug in comparing object arrays to None (?)
Hi Numpy folks. I just noticed that comparing an array of type 'object' to None does not behave as I expected. Is this a feature or a bug? (I can take a stab at fixing it if it's a bug, as I believe it is.)

>>> np.version.full_version
'1.8.0'
>>> a = np.array(['Frank', None, 'Nancy'])
>>> a
array(['Frank', None, 'Nancy'], dtype=object)
>>> a == 'Frank'
array([ True, False, False], dtype=bool)  # Return value is an array
>>> a == None
False  # Return value is scalar (BUG?)
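For reference, this behaviour was later changed: on recent NumPy versions the comparison broadcasts None elementwise like any other scalar. A sketch of the now-expected behaviour:

```python
import numpy as np

a = np.array(['Frank', None, 'Nancy'], dtype=object)

# Both comparisons broadcast and return boolean arrays on modern NumPy
print(a == 'Frank')  # elementwise match against the string
print(a == None)     # elementwise match against None, not the scalar False
```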
Re: [Numpy-discussion] bug in comparing object arrays to None (?)
On Mon, Jan 27, 2014 at 3:43 PM, Charles G. Waldman char...@crunch.io wrote: Hi Numpy folks. I just noticed that comparing an array of type 'object' to None does not behave as I expected. Is this a feature or a bug? (I can take a stab at fixing it if it's a bug, as I believe it is.)

>>> np.version.full_version
'1.8.0'
>>> a = np.array(['Frank', None, 'Nancy'])
>>> a
array(['Frank', None, 'Nancy'], dtype=object)
>>> a == 'Frank'
array([ True, False, False], dtype=bool)  # Return value is an array
>>> a == None
False  # Return value is scalar (BUG?)

Looks like a fix is in progress: https://github.com/numpy/numpy/pull/3514

Warren
[Numpy-discussion] Bug in resize of structured array (with initial size = 0)
Hi, I've tried to resize a record array that was first empty (on purpose, I need it) and I got the following error (while it's working for a regular array).

Traceback (most recent call last):
  File "test_resize.py", line 10, in <module>
    print np.resize(V,2)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 1053, in resize
    if not Na: return mu.zeros(new_shape, a.dtype.char)
TypeError: Empty data-type

I'm using numpy 1.8.0, python 2.7.6, osx 10.9.1. Can anyone confirm before I submit an issue? Here is the script:

V = np.zeros(0, dtype=np.float32)
print V.dtype
print np.resize(V,2)

V = np.zeros(0, dtype=[('a', np.float32, 1)])
print V.dtype
print np.resize(V,2)
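A workaround on affected versions is to build the enlarged array directly from the full dtype object rather than going through np.resize (whose old code path used a.dtype.char, which loses the field information of a structured dtype). A sketch, hedged on version behaviour:

```python
import numpy as np

V = np.zeros(0, dtype=[('a', np.float32)])

# Workaround: allocate from the dtype object, which keeps the fields
W = np.zeros(2, dtype=V.dtype)
print(W.dtype.names, W.shape)

# On NumPy versions with the fix, np.resize handles structured arrays too
print(np.resize(V, 2).shape)
```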
Re: [Numpy-discussion] Bug in numpy.correlate documentation
On 11.10.2013, at 01:19, Julian Taylor jtaylor.deb...@googlemail.com wrote: Yeah, unless the current behaviour is actually broken or redundant in some way, we're not going to switch from one perfectly good convention to another perfectly good convention and break everyone's code in the process. The most helpful thing would be if you could file a pull request that just changes the docstring to what you think it should be. Extra bonus points if it points out that there is another definition some people might be expecting instead, and explains how those people can use the existing functions to get what they want. :-) -n

IMHO, point[ing] out that there is another definition some people might be expecting instead, and explain[ing] how those people can use the existing functions to get what they want should be a requirement for the docstring (Notes section), not merely worth extra bonus points. But then I'm not, presently, in a position to edit the docstring myself, so that's just MHO. IAE, I found what appears to me to be another vote for the extant docstring: Box & Jenkins, 1976, Time Series Analysis: Forecasting and Control, Holden-Day, Oakland, pg. 374. Perhaps a switch (with a default value that maintains the current definition, so that extant uses would not require a code change) c/should be added to the function signature so that users can easily get what they want?

As pointed out in another post in this thread, there are now at least three different definitions of correlation which are in use in different disciplines of science and engineering:

Numpy code:        z_numpyCode[k] = sum_n a[n+k] * conj(v[n])
Numpy docs:        z_numpyDoc[k]  = sum_n a[n] * conj(v[n+k]) = sum_n a[n-k] * conj(v[n]) = z_numpyCode[-k]
Wolfram MathWorld: z_mmca[k]      = sum_n conj(a[n]) * v[n+k] = conj( sum_n a[n] * conj(v[n+k]) ) = conj( z_numpyDoc[k] ) = conj( z_numpyCode[-k] )

I'm sure there are even more if you search long enough.
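The relationship between these conventions can be checked numerically. A sketch using the two-element example from earlier in the thread (the names z_code / z_doc / z_mathworld are mine, following the summary above):

```python
import numpy as np

a = np.array([1.0, 2.0])
v = np.array([2.0, 1.0j])

# What the code computes: z_code[k] = sum_n a[n+k] * conj(v[n])
z_code = np.correlate(a, v, 'full')

# The (old) docstring convention is the time-reversed version:
# z_doc[k] = z_code[-k]
z_doc = z_code[::-1]

# The MathWorld convention is the conjugate of the docstring one
z_mathworld = np.conj(z_doc)

print(z_code)
```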
But shouldn't the primary objective be to bring the docs in line with the code (which is definitely not broken)? It took me 2 days of debugging my code recently only to discover that numpy correlate() was calculating a different correlation than the docs said. I can try to come up with a proposal for the docs. Could anyone point me to where I can find the docs? I can clone the numpy repo, however, I'm not a numpy developer. yes we should only change the documentation to match the (hopefully correct) code. the documentation is in the docstring of the correlate function in numpy/core/numeric.py line 819 ___ Ok, corrected the docstring, mentioning one alternative definition of correlation. Pull request filed: https://github.com/numpy/numpy/pull/3913. Bernhard ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug in numpy.correlate documentation
It seems to me that Wolfram is following yet another path. From http://mathworld.wolfram.com/Autocorrelation.html and more importantly http://mathworld.wolfram.com/Cross-Correlation.html, equation (5):

z_mathworld[k] = sum_n conj(a[n]) * v[n+k] = conj( sum_n a[n] * conj(v[n+k]) ) = conj( z_numpyDocstring[k] ) = conj( z_numpyCode[-k] )

is the conjugate of what the numpy docstring says. So, now we have at least three definitions to choose from :-) Cheers, Bernhard On 09.10.2013, at 22:19, David Goldsmith d.l.goldsm...@gmail.com wrote: Looks like Wolfram MathWorld would favor the docstring, but the possibility of a use-domain dependency seems plausible (after all, a similar dilemma is observed, e.g., w/ the Fourier Transform)--I guess one discipline's future is another discipline's past. :-) http://mathworld.wolfram.com/Autocorrelation.html
Re: [Numpy-discussion] Bug in numpy.correlate documentation
On 10.10.2013, at 19:27, David Goldsmith d.l.goldsm...@gmail.com wrote: On Wed, Oct 9, 2013 at 7:48 PM, Bernhard Spinnler bernhard.spinn...@gmx.net wrote: Hi Richard, Ah, I searched the list but didn't find those posts before? I can easily imagine that correlation is defined differently in different disciplines. Both ways are correct and it's just a convention or definition. In my field (Digital Communications, Digital Signal Processing) the vast majority uses the convention implemented by the code. Here are a few examples of prominent text books: - Papoulis, Probaility, Random Variables, and Stochastic Processes, McGraw-Hill, 2nd ed. - Benvenuto, Cherubini, Algorithms for Communications Systems and their Applications, Wiley. - Carlson, Communication Systems 4th ed. 2002, McGraw-Hill. Last not least, Matlab's xcorr() function behaves exactly like correlate() does right now, see - http://www.mathworks.de/de/help/signal/ref/xcorr.html But, as you say, the most important aspect might be, that most people will probably prefer changing the docs instead of changing the code. Yeah, unless the current behaviour is actually broken or redundant in some way, we're not going to switch from one perfectly good convention to another perfectly good convention and break everyone's code in the process. The most helpful thing would be if you could file a pull request that just changes the docstring to what you think it should be. Extra bonus points if it points out that there is another definition some people might be expecting instead, and explains how those people can use the existing functions to get what they want. :-) -n IMHO, point[ing] out that there is another definition some people might be expecting instead, and explain[ing] how those people can use the existing functions to get what they want should be a requirement for the docstring (Notes section), not merely worth extra bonus points. 
But then I'm not, presently, in a position to edit the docstring myself, so that's just MHO. IAE, I found what appears to me to be another vote for the extant docstring: Box Jenkins, 1976, Time Series Analysis: Forecasting and Control, Holden-Day, Oakland, pg. 374. Perhaps a switch (with a default value that maintains current definition, so that extant uses would not require a code change) c/should be added to the function signature so that users can get easily get what they want? As pointed out in another post in this thread, there are now at least three different definitions of correlation which are in use in different disciplines of science and engineering: Numpy code: z_numpyCode[k] = sum_n a[n+k] * conj(v[n]) Numpy docs: z_numpyDoc[k] = sum_n a[n] * conj(v[n+k]) = sum_n a[n-k] * conj(v[n]) = z_numpyCode[-k] Wolfram Mathworld: z_mmca[k] = sum_n conj(a[n]) * v[n+k] = conj( sum_n a[n] * conj(v[n+k]) ) = conj( z_numpyDoc[k] ) = conj( z_numpyCode[-k] ) I'm sure there are even more if you search long enough. But shouldn't the primary objective be to bring the docs in line with the code (which is definitely not broken)? It took me 2 days of debugging my code recently only to discover that numpy correlate() was calculating a different correlation than the docs said. I can try to come up with a proposal for the docs. Could anyone point me to where I can find the docs? I can clone the numpy repo, however, I'm not a numpy developer. Best wishes, Bernhard ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug in numpy.correlate documentation
On 10.10.2013 21:31, Bernhard Spinnler wrote: On 10.10.2013, at 19:27, David Goldsmith d.l.goldsm...@gmail.com mailto:d.l.goldsm...@gmail.com wrote: On Wed, Oct 9, 2013 at 7:48 PM, Bernhard Spinnler bernhard.spinn...@gmx.net mailto:bernhard.spinn...@gmx.net wrote: Hi Richard, Ah, I searched the list but didn't find those posts before? I can easily imagine that correlation is defined differently in different disciplines. Both ways are correct and it's just a convention or definition. In my field (Digital Communications, Digital Signal Processing) the vast majority uses the convention implemented by the code. Here are a few examples of prominent text books: - Papoulis, Probaility, Random Variables, and Stochastic Processes, McGraw-Hill, 2nd ed. - Benvenuto, Cherubini, Algorithms for Communications Systems and their Applications, Wiley. - Carlson, Communication Systems 4th ed. 2002, McGraw-Hill. Last not least, Matlab's xcorr() function behaves exactly like correlate() does right now, see - http://www.mathworks.de/de/help/signal/ref/xcorr.html But, as you say, the most important aspect might be, that most people will probably prefer changing the docs instead of changing the code. Yeah, unless the current behaviour is actually broken or redundant in some way, we're not going to switch from one perfectly good convention to another perfectly good convention and break everyone's code in the process. The most helpful thing would be if you could file a pull request that just changes the docstring to what you think it should be. Extra bonus points if it points out that there is another definition some people might be expecting instead, and explains how those people can use the existing functions to get what they want. 
:-) -n IMHO, point[ing] out that there is another definition some people might be expecting instead, and explain[ing] how those people can use the existing functions to get what they want should be a requirement for the docstring (Notes section), not merely worth extra bonus points. But then I'm not, presently, in a position to edit the docstring myself, so that's just MHO. IAE, I found what appears to me to be another vote for the extant docstring: Box Jenkins, 1976, Time Series Analysis: Forecasting and Control, Holden-Day, Oakland, pg. 374. Perhaps a switch (with a default value that maintains current definition, so that extant uses would not require a code change) c/should be added to the function signature so that users can get easily get what they want? As pointed out in another post in this thread, there are now at least three different definitions of correlation which are in use in different disciplines of science and engineering: Numpy code: z_numpyCode[k] = sum_n a[n+k] * conj(v[n]) Numpy docs: z_numpyDoc[k] = sum_n a[n] * conj(v[n+k]) = sum_n a[n-k] * conj(v[n]) = z_numpyCode[-k] Wolfram Mathworld: z_mmca[k] = sum_n conj(a[n]) * v[n+k] = conj( sum_n a[n] * conj(v[n+k]) ) = conj( z_numpyDoc[k] ) = conj( z_numpyCode[-k] ) I'm sure there are even more if you search long enough. But shouldn't the primary objective be to bring the docs in line with the code (which is definitely not broken)? It took me 2 days of debugging my code recently only to discover that numpy correlate() was calculating a different correlation than the docs said. I can try to come up with a proposal for the docs. Could anyone point me to where I can find the docs? I can clone the numpy repo, however, I'm not a numpy developer. yes we should only change the documentation to match the (hopefully correct) code. 
the documentation is in the docstring of the correlate function in numpy/core/numeric.py line 819 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug in numpy.correlate documentation
Hi Richard, Ah, I searched the list but didn't find those posts before… I can easily imagine that correlation is defined differently in different disciplines. Both ways are correct and it's just a convention or definition. In my field (Digital Communications, Digital Signal Processing) the vast majority uses the convention implemented by the code. Here are a few examples of prominent text books:

- Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, 2nd ed.
- Benvenuto, Cherubini, Algorithms for Communications Systems and their Applications, Wiley.
- Carlson, Communication Systems, 4th ed. 2002, McGraw-Hill.

Last but not least, Matlab's xcorr() function behaves exactly like correlate() does right now, see http://www.mathworks.de/de/help/signal/ref/xcorr.html

But, as you say, the most important aspect might be that most people will probably prefer changing the docs instead of changing the code. Should I file a bug somewhere? Cheers, Bernhard On 08.10.2013, at 21:10, Richard Hattersley rhatters...@gmail.com wrote: Hi Bernard, Looks like you're on to something - two other people have raised this discrepancy before: https://github.com/numpy/numpy/issues/2588. Unfortunately, when it comes to resolving the discrepancy one of the previous comments takes the opposite view. Namely, that the docstring is correct and the code is wrong. Do different domains use different conventions here? Are there some references to back up one stance or another? But all else being equal, I'm guessing there'll be far more appetite for updating the documentation than the code.
Regards, Richard Hattersley On 7 October 2013 22:09, Bernhard Spinnler bernhard.spinn...@gmx.net wrote: The numpy.correlate documentation says:

correlate(a, v) = z[k] = sum_n a[n] * conj(v[n+k])

In [1]: a = [1, 2]
In [2]: v = [2, 1j]
In [3]: z = correlate(a, v, 'full')
In [4]: z
Out[4]: array([ 0.-1.j,  2.-2.j,  4.+0.j])

However, according to the documentation, z should be

z[-1] = a[1] * conj(v[0]) = 4.+0.j
z[0]  = a[0] * conj(v[0]) + a[1] * conj(v[1]) = 2.-2.j
z[1]  = a[0] * conj(v[1]) = 0.-1.j

which is the time-reversed version of what correlate() calculates. IMHO, the correlate() code is correct. The correct formula in the docs (which is also the correlation formula in standard text books) should be

z[k] = sum_n a[n+k] * conj(v[n])

Cheers, Bernhard
Re: [Numpy-discussion] Bug in numpy.correlate documentation
Looks like Wolfram MathWorld would favor the docstring, but the possibility of a use-domain dependency seems plausible (after all, a similar dilemma is observed, e.g., w/ the Fourier Transform)--I guess one discipline's future is another discipline's past. :-) http://mathworld.wolfram.com/Autocorrelation.html DG Date: Tue, 8 Oct 2013 20:10:41 +0100 From: Richard Hattersley rhatters...@gmail.com Subject: Re: [Numpy-discussion] Bug in numpy.correlate documentation To: Discussion of Numerical Python numpy-discussion@scipy.org Hi Bernard, Looks like you're on to something - two other people have raised this discrepancy before: https://github.com/numpy/numpy/issues/2588. Unfortunately, when it comes to resolving the discrepancy one of the previous comments takes the opposite view. Namely, that the docstring is correct and the code is wrong. Do different domains use different conventions here? Are there some references to back up one stance or another? But all else being equal, I'm guessing there'll be far more appetite for updating the documentation than the code. Regards, Richard Hattersley On 7 October 2013 22:09, Bernhard Spinnler bernhard.spinn...@gmx.net wrote: The numpy.correlate documentation says: correlate(a, v) = z[k] = sum_n a[n] * conj(v[n+k]) snip [so] according to the documentation, z should be z[-1] = a[1] * conj(v[0]) = 4.+0.j z[0] = a[0] * conj(v[0]) + a[1] * conj(v[1]) = 2.-2.j z[1] = a[0] * conj(v[1]) = 0.-1.j which is the time reversed version of what correlate() calculates.
Re: [Numpy-discussion] Bug in numpy.correlate documentation
On Wed, Oct 9, 2013 at 7:48 PM, Bernhard Spinnler bernhard.spinn...@gmx.net wrote: Hi Richard, Ah, I searched the list but didn't find those posts before… I can easily imagine that correlation is defined differently in different disciplines. Both ways are correct and it's just a convention or definition. In my field (Digital Communications, Digital Signal Processing) the vast majority uses the convention implemented by the code. Here are a few examples of prominent text books: - Papoulis, Probaility, Random Variables, and Stochastic Processes, McGraw-Hill, 2nd ed. - Benvenuto, Cherubini, Algorithms for Communications Systems and their Applications, Wiley. - Carlson, Communication Systems 4th ed. 2002, McGraw-Hill. Last not least, Matlab's xcorr() function behaves exactly like correlate() does right now, see - http://www.mathworks.de/de/help/signal/ref/xcorr.html But, as you say, the most important aspect might be, that most people will probably prefer changing the docs instead of changing the code. Yeah, unless the current behaviour is actually broken or redundant in some way, we're not going to switch from one perfectly good convention to another perfectly good convention and break everyone's code in the process. The most helpful thing would be if you could file a pull request that just changes the docstring to what you think it should be. Extra bonus points if it points out that there is another definition some people might be expecting instead, and explains how those people can use the existing functions to get what they want. :-) -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug in numpy.correlate documentation
Hi Bernard, Looks like you're on to something - two other people have raised this discrepancy before: https://github.com/numpy/numpy/issues/2588. Unfortunately, when it comes to resolving the discrepancy one of the previous comments takes the opposite view. Namely, that the docstring is correct and the code is wrong. Do different domains use different conventions here? Are there some references to back up one stance or another? But all else being equal, I'm guessing there'll be far more appetite for updating the documentation than the code. Regards, Richard Hattersley On 7 October 2013 22:09, Bernhard Spinnler bernhard.spinn...@gmx.net wrote: The numpy.correlate documentation says:

correlate(a, v) = z[k] = sum_n a[n] * conj(v[n+k])

In [1]: a = [1, 2]
In [2]: v = [2, 1j]
In [3]: z = correlate(a, v, 'full')
In [4]: z
Out[4]: array([ 0.-1.j,  2.-2.j,  4.+0.j])

However, according to the documentation, z should be

z[-1] = a[1] * conj(v[0]) = 4.+0.j
z[0]  = a[0] * conj(v[0]) + a[1] * conj(v[1]) = 2.-2.j
z[1]  = a[0] * conj(v[1]) = 0.-1.j

which is the time-reversed version of what correlate() calculates. IMHO, the correlate() code is correct. The correct formula in the docs (which is also the correlation formula in standard text books) should be

z[k] = sum_n a[n+k] * conj(v[n])

Cheers, Bernhard
[Numpy-discussion] Bug in numpy.correlate documentation
The numpy.correlate documentation says:

correlate(a, v) = z[k] = sum_n a[n] * conj(v[n+k])

In [1]: a = [1, 2]
In [2]: v = [2, 1j]
In [3]: z = correlate(a, v, 'full')
In [4]: z
Out[4]: array([ 0.-1.j,  2.-2.j,  4.+0.j])

However, according to the documentation, z should be

z[-1] = a[1] * conj(v[0]) = 4.+0.j
z[0]  = a[0] * conj(v[0]) + a[1] * conj(v[1]) = 2.-2.j
z[1]  = a[0] * conj(v[1]) = 0.-1.j

which is the time-reversed version of what correlate() calculates. IMHO, the correlate() code is correct. The correct formula in the docs (which is also the correlation formula in standard text books) should be

z[k] = sum_n a[n+k] * conj(v[n])

Cheers, Bernhard
[Numpy-discussion] Bug (?) converting list to array
I'm trying to enter a 2-D array and np.array() is returning a 1-D array of lists. I'm using Python(x,y) on Windows 7 with numpy 1.7.1. Here's the code that is giving me issues.

>>> f1 = [[15.207, 15.266, 15.181, 15.189, 15.215, 15.198], [-45, -57, -62, -70, -72, -73.5, -77]]
>>> f1a = np.array(f1)
>>> f1a
array([[15.207, 15.266, 15.181, 15.189, 15.215, 15.198],
       [-45, -57, -62, -70, -72, -73.5, -77]], dtype=object)

What am I missing?
Re: [Numpy-discussion] Bug (?) converting list to array
The two lists are of different sizes. Had to count twice to catch that. Ben Root On Mon, Sep 9, 2013 at 9:46 AM, Chad Kidder cckid...@gmail.com wrote: I'm trying to enter a 2-D array and np.array() is returning a 1-D array of lists. I'm using Python (x,y) on Windows 7 with numpy 1.7.1. Here's the code that is giving me issues. f1 = [[15.207, 15.266, 15.181, 15.189, 15.215, 15.198], [-45, -57, -62, -70, -72, -73.5, -77]] f1a = np.array(f1) f1a array([[15.207, 15.266, 15.181, 15.189, 15.215, 15.198], [-45, -57, -62, -70, -72, -73.5, -77]], dtype=object) What am I missing? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug (?) converting list to array
One list has 6 entries and one has 7, so they can't be aligned into a single array. Possibly it would be better to raise an error here instead of returning an object array, but that's what's going on. -n On 9 Sep 2013 14:49, Chad Kidder cckid...@gmail.com wrote: I'm trying to enter a 2-D array and np.array() is returning a 1-D array of lists. I'm using Python (x,y) on Windows 7 with numpy 1.7.1. Here's the code that is giving me issues. f1 = [[15.207, 15.266, 15.181, 15.189, 15.215, 15.198], [-45, -57, -62, -70, -72, -73.5, -77]] f1a = np.array(f1) f1a array([[15.207, 15.266, 15.181, 15.189, 15.215, 15.198], [-45, -57, -62, -70, -72, -73.5, -77]], dtype=object) What am I missing? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
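The mismatch is easy to check programmatically. A small sketch (trimming the longer row is just for illustration, not a suggested fix for the original data):

```python
import numpy as np

f1 = [[15.207, 15.266, 15.181, 15.189, 15.215, 15.198],
      [-45, -57, -62, -70, -72, -73.5, -77]]

# The rows are ragged (6 vs 7 entries), so no rectangular 2-D array exists
print([len(row) for row in f1])

# Once the lengths agree, np.array produces the expected (2, 6) float array
f1a = np.array([row[:6] for row in f1])
print(f1a.shape, f1a.dtype)
```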
Re: [Numpy-discussion] Bug (?) converting list to array
Oh, so there was a bug in the user... On Mon, Sep 9, 2013 at 7:52 AM, Nathaniel Smith n...@pobox.com wrote: One list has 6 entries and one has 7, so they can't be aligned into a single array. Possibly it would be better to raise an error here instead of returning an object array, but that's what's going on. -n On 9 Sep 2013 14:49, Chad Kidder cckid...@gmail.com wrote: I'm trying to enter a 2-D array and np.array() is returning a 1-D array of lists. I'm using Python (x,y) on Windows 7 with numpy 1.7.1. Here's the code that is giving me issues. f1 = [[15.207, 15.266, 15.181, 15.189, 15.215, 15.198], [-45, -57, -62, -70, -72, -73.5, -77]] f1a = np.array(f1) f1a array([[15.207, 15.266, 15.181, 15.189, 15.215, 15.198], [-45, -57, -62, -70, -72, -73.5, -77]], dtype=object) What am I missing? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug (?) converting list to array
On Mon, Sep 9, 2013 at 9:52 AM, Nathaniel Smith n...@pobox.com wrote: One list has 6 entries and one has 7, so they can't be aligned into a single array. Possibly it would be better to raise an error here instead of returning an object array, but that's what's going on.

It did at some point (and I relied on the exception to catch bugs, since I'm still using mainly numpy 1.5):

>>> f1 = [[15.207, 15.266, 15.181, 15.189, 15.215, 15.198], [-45, -57, -62, -70, -72, -73.5, -77]]
>>> np.array(f1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.
>>> np.__version__
'1.5.1'

Now we get object arrays (in scipy.stats, and I didn't know what to do with them). I don't remember any discussion on this.

Josef

-n On 9 Sep 2013 14:49, Chad Kidder cckid...@gmail.com wrote: I'm trying to enter a 2-D array and np.array() is returning a 1-D array of lists. I'm using Python(x,y) on Windows 7 with numpy 1.7.1. Here's the code that is giving me issues. f1 = [[15.207, 15.266, 15.181, 15.189, 15.215, 15.198], [-45, -57, -62, -70, -72, -73.5, -77]] f1a = np.array(f1) f1a array([[15.207, 15.266, 15.181, 15.189, 15.215, 15.198], [-45, -57, -62, -70, -72, -73.5, -77]], dtype=object) What am I missing?
Re: [Numpy-discussion] Bug (?) converting list to array
On Mon, Sep 9, 2013 at 11:35 AM, Nathaniel Smith n...@pobox.com wrote: On 9 Sep 2013 15:50, josef.p...@gmail.com wrote: On Mon, Sep 9, 2013 at 9:52 AM, Nathaniel Smith n...@pobox.com wrote: One list has 6 entries and one has 7, so they can't be aligned into a single array. Possibly it would be better to raise an error here instead of returning an object array, but that's what's going on. It did at some point (and I relied on the exception to catch bugs, since I'm still using mainly numpy 1.5) f1 = [[15.207, 15.266, 15.181, 15.189, 15.215, 15.198], [-45, -57, -62, -70, -72, -73.5, -77]] np.array(f1) Traceback (most recent call last): File stdin, line 1, in module ValueError: setting an array element with a sequence. np.__version__ '1.5.1' now we get object arrays (in scipy.stats, and I didn't know what to do with them) I don't remember any discussion on this. There may not have been any. Isn't it too late now? Feel free to submit a PR and we can argue about which way is better... (I also prefer the 1.5 approach personally.) I'm just a balcony muppet (and user) (and I lost the argument against object arrays in scipy.stats) Josef -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug (?) converting list to array
On 9 Sep 2013 15:50, josef.p...@gmail.com wrote: On Mon, Sep 9, 2013 at 9:52 AM, Nathaniel Smith n...@pobox.com wrote: One list has 6 entries and one has 7, so they can't be aligned into a single array. Possibly it would be better to raise an error here instead of returning an object array, but that's what's going on. It did at some point (and I relied on the exception to catch bugs, since I'm still using mainly numpy 1.5) f1 = [[15.207, 15.266, 15.181, 15.189, 15.215, 15.198], [-45, -57, -62, -70, -72, -73.5, -77]] np.array(f1) Traceback (most recent call last): File stdin, line 1, in module ValueError: setting an array element with a sequence. np.__version__ '1.5.1' now we get object arrays (in scipy.stats, and I didn't know what to do with them) I don't remember any discussion on this. There may not have been any. Feel free to submit a PR and we can argue about which way is better... (I also prefer the 1.5 approach personally.) -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Bug in gufuncs affecting umath_linalg
Hi, I think I have found an undocumented feature of the gufuncs machinery. I have filed a bug report: https://github.com/numpy/numpy/issues/3582 Some more background on what I am seeing... I have coded a gufunc with signature '(r,c,p),(g,g,g,q)->(r,c,q)'. It is a color map, i.e. a transformation of a 3-dimensional array of p channels (with p=3, an RGB image of r rows and c columns), into a 3-dimensional array of q channels (with q=4, a CMYK image of the same size), via a p-dimensional look-up-table (LUT). For all practical purposes, the LUT always has the first three dimensions identical, hence the repeated g's in the signature. The function registered with this signature receives the expected values in the dimensions argument: 'n', 'r', 'c', 'p', 'g', 'q', with 'n' being the length of the gufunc loop. But there is a problem with the steps argument. As expected I get a 13-item long array: 3 main loop strides, 3 strides (r, c, p) for the first argument, 4 strides (g, g, g, q) for the second, and 3 strides (r, c, q) for the return. Everything is OK except for the strides for the repeating 'g's: instead of getting three different stride values, the first two are the same as the last. This does not happen if I modify the signature to be '(r,c,p),(i,j,k,q)->(r,c,q)', which is the workaround I am implementing in my code for the time being. I have also managed to repeat the behavior in repeated dimensions on other arguments, e.g. '(r,r,p),(i,j,k,q)->(r,c,p)' shows the same issue for the strides of the first argument. I have seen that the gufunc version of umath_linalg makes use of a similar, repeated index scheme, e.g. in the 'solve' and 'det' gufuncs. At least for these two, the tests in place do not catch the error.
For solve, the tests run these two cases (the results below come from the traditional linalg, as I am running numpy 1.7.1): np.linalg.solve([[1, 2], [3, 4]], [[4, 3], [2, 1]]) array([[-6., -5.], [ 5., 4.]]) np.linalg.solve([[1+2j,2+3j], [3+4j,4+5j]], [[4+3j,3+2j], [2+1j,1+0j]]) array([[-6. +0.e+00j, -5. +0.e+00j], [ 5. -3.46944695e-16j, 4. -3.46944695e-16j]]) But because of their highly structured nature, these particular test cases give the same result if you get the strides wrong (!!!): np.linalg.solve([[1, 2], [2, 3]], [[4, 3], [3, 2]]) array([[-6., -5.], [ 5., 4.]]) np.linalg.solve([[1+2j,2+3j], [2+3j,3+4j]], [[4+3j,3+2j], [3+2j,2+1j]]) array([[-6. -1.09314267e-15j, -5. -1.09314267e-15j], [ 5. +1.08246745e-15j, 4. +1.08246745e-15j]]) As for the determinant, no absolute check of the return value is performed: the return of 'det' is compared to the product of the return of 'eigvals', which also has the '(m, m)' signature, and interprets the data equally wrong. For my particular issue, I am simply going to register the gufunc with non-repeating dimensions, check for equality in a Python wrapper, and discard the repeated values in my C code. Not sure what is the best way of going about umath_linalg. Probably better to fix the issue in the gufunc machinery than to patch umath_linalg. If there's any way, other than reporting it, in which I can help getting this fixed, I'll be more than happy to do it. But for this job I am clearly unqualified labor, and would need to work under someone else's command. Regards, Jaime -- (\__/) ( O.o) ( ) This is Conejo. Copy Conejo into your signature and help him in his plans for world domination. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
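[Editorial note: a less structured solve case would catch a stride mix-up, because transposing the coefficient matrix changes the answer. A sketch; the matrices below are arbitrary, not from the NumPy test suite.]

```python
import numpy as np

# Deliberately non-symmetric system: solving against a.T gives a
# different result, so a stride/transpose bug cannot cancel out the
# way it does for the symmetric test matrices quoted above.
a = np.array([[1.0, 2.0], [4.0, 9.0]])
b = np.eye(2)
x = np.linalg.solve(a, b)
assert np.allclose(a.dot(x), b)
assert not np.allclose(np.linalg.solve(a.T, b), x)
```
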
[Numpy-discussion] bug fixes: which branch?
What is the preferred strategy for handling bug fix PRs? Initial fix on master, and then a separate PR to backport to v1.7.x? Or the reverse? It doesn't look like v1.7.x is being merged into master regularly, so the matplotlib pattern (fix on maintenance, merge maintenance into master) seems not to be used here. Thanks. Eric ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug fixes: which branch?
On Sun, Jun 16, 2013 at 10:57 PM, Eric Firing efir...@hawaii.edu wrote: What is the preferred strategy for handling bug fix PRs? Initial fix on master, and then a separate PR to backport to v1.7.x? Or the reverse? It doesn't look like v1.7.x is being merged into master regularly, so the matplotlib pattern (fix on maintenance, merge maintenance into master) seems not to be used here. Fix on master then backport is the current strategy. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in deepcopy() of rank-zero arrays?
+1 for getting rid of this inconsistency We've hit this with Iris (a met/ocean analysis package - see github), and have had to add several workarounds. On 19 April 2013 16:55, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: Hi folks, In [264]: np.__version__ Out[264]: '1.7.0' I just noticed that deep copying a rank-zero array yields a scalar -- probably not what we want. In [242]: a1 = np.array(3) In [243]: type(a1), a1 Out[243]: (numpy.ndarray, array(3)) In [244]: a2 = copy.deepcopy(a1) In [245]: type(a2), a2 Out[245]: (numpy.int32, 3) regular copy.copy() seems to work fine: In [246]: a3 = copy.copy(a1) In [247]: type(a3), a3 Out[247]: (numpy.ndarray, array(3)) Higher-rank arrays seem to work fine: In [253]: a1 = np.array((3,4)) In [254]: type(a1), a1 Out[254]: (numpy.ndarray, array([3, 4])) In [255]: a2 = copy.deepcopy(a1) In [256]: type(a2), a2 Out[256]: (numpy.ndarray, array([3, 4])) Array scalars seem to work fine as well: In [257]: s1 = np.float32(3) In [258]: s2 = copy.deepcopy(s1) In [261]: type(s1), s1 Out[261]: (numpy.float32, 3.0) In [262]: type(s2), s2 Out[262]: (numpy.float32, 3.0) There are other ways to copy arrays, but in this case, I had a dict with a bunch of arrays in it, and needed a deepcopy of the dict. I was surprised to find that my rank-0 array got turned into a scalar. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in deepcopy() of rank-zero arrays?
hmm -- I suppose one of us should post an issue on github -- then ask for it to be fixed before 1.8 ;-) I'll try to get to the issue if no one beats me to it -- got to run now... -Chris On Tue, Apr 30, 2013 at 5:35 AM, Richard Hattersley rhatters...@gmail.com wrote: +1 for getting rid of this inconsistency We've hit this with Iris (a met/ocean analysis package - see github), and have had to add several workarounds. On 19 April 2013 16:55, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: Hi folks, In [264]: np.__version__ Out[264]: '1.7.0' I just noticed that deep copying a rank-zero array yields a scalar -- probably not what we want. In [242]: a1 = np.array(3) In [243]: type(a1), a1 Out[243]: (numpy.ndarray, array(3)) In [244]: a2 = copy.deepcopy(a1) In [245]: type(a2), a2 Out[245]: (numpy.int32, 3) regular copy.copy() seems to work fine: In [246]: a3 = copy.copy(a1) In [247]: type(a3), a3 Out[247]: (numpy.ndarray, array(3)) Higher-rank arrays seem to work fine: In [253]: a1 = np.array((3,4)) In [254]: type(a1), a1 Out[254]: (numpy.ndarray, array([3, 4])) In [255]: a2 = copy.deepcopy(a1) In [256]: type(a2), a2 Out[256]: (numpy.ndarray, array([3, 4])) Array scalars seem to work fine as well: In [257]: s1 = np.float32(3) In [258]: s2 = copy.deepcopy(s1) In [261]: type(s1), s1 Out[261]: (numpy.float32, 3.0) In [262]: type(s2), s2 Out[262]: (numpy.float32, 3.0) There are other ways to copy arrays, but in this case, I had a dict with a bunch of arrays in it, and needed a deepcopy of the dict. I was surprised to find that my rank-0 array got turned into a scalar. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] bug in deepcopy() of rank-zero arrays?
Hi folks, In [264]: np.__version__ Out[264]: '1.7.0' I just noticed that deep copying a rank-zero array yields a scalar -- probably not what we want. In [242]: a1 = np.array(3) In [243]: type(a1), a1 Out[243]: (numpy.ndarray, array(3)) In [244]: a2 = copy.deepcopy(a1) In [245]: type(a2), a2 Out[245]: (numpy.int32, 3) regular copy.copy() seems to work fine: In [246]: a3 = copy.copy(a1) In [247]: type(a3), a3 Out[247]: (numpy.ndarray, array(3)) Higher-rank arrays seem to work fine: In [253]: a1 = np.array((3,4)) In [254]: type(a1), a1 Out[254]: (numpy.ndarray, array([3, 4])) In [255]: a2 = copy.deepcopy(a1) In [256]: type(a2), a2 Out[256]: (numpy.ndarray, array([3, 4])) Array scalars seem to work fine as well: In [257]: s1 = np.float32(3) In [258]: s2 = copy.deepcopy(s1) In [261]: type(s1), s1 Out[261]: (numpy.float32, 3.0) In [262]: type(s2), s2 Out[262]: (numpy.float32, 3.0) There are other ways to copy arrays, but in this case, I had a dict with a bunch of arrays in it, and needed a deepcopy of the dict. I was surprised to find that my rank-0 array got turned into a scalar. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
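[Editorial note: until the inconsistency is fixed, a container of arrays can be copied while keeping rank-0 entries as ndarrays by forcing a fresh np.array copy. A sketch; `deepcopy_arrays` is an illustrative helper, not a NumPy function.]

```python
import numpy as np

def deepcopy_arrays(d):
    # np.array(v, copy=True) preserves ndarray-ness even for rank-0
    # arrays, unlike copy.deepcopy on the NumPy versions discussed here.
    return {k: np.array(v, copy=True) for k, v in d.items()}

d = {"scalar": np.array(3), "vec": np.array([1, 2])}
d2 = deepcopy_arrays(d)
assert type(d2["scalar"]) is np.ndarray and d2["scalar"].ndim == 0
assert d2["vec"] is not d["vec"]
```
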
Re: [Numpy-discussion] Bug in np.records?
On Wed, Mar 20, 2013 at 2:57 PM, Pierre Barbier de Reuille pierre.barbierdereui...@gmail.com wrote: Hey, I am trying to use titles for the record arrays. In the documentation, it is specified that any column can be set to None. However, trying this fails on numpy 1.6.2 because in np.core.records, on line 195, the strip method is called on the title object. This is really annoying. Could we fix this by replacing line 195 with: self._titles = [n.strip() if n is not None else None for n in titles[:self._nfields]] ? That sounds reasonable. Ideally you'd send a pull request for this, including a regression test. Otherwise providing a self-contained example that can be turned into a test would help. Ralf ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
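[Editorial note: the proposed None-tolerant comprehension behaves like this, shown standalone outside np.core.records; the title values are made up for illustration.]

```python
# Mixed titles: real strings get stripped, None entries pass through
# instead of raising AttributeError on None.strip().
titles = ["Temp ", None, " Pressure"]
nfields = 3
cleaned = [n.strip() if n is not None else None for n in titles[:nfields]]
assert cleaned == ["Temp", None, "Pressure"]
```
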
[Numpy-discussion] Bug in einsum?
Hi, I have encountered a very weird behaviour with einsum. I try to compute something like R*A*R', where * denotes a kind of matrix multiplication. However, for particular shapes of R and A, the results are extremely bad. I compare two einsum results: First, I compute in two einsum calls as (R*A)*R'. Second, I compute the whole result in one einsum call. However, the results are significantly different for some shapes. My test: import numpy as np for D in range(30): A = np.random.randn(100,D,D) R = np.random.randn(D,D) Y1 = np.einsum('...ik,...kj->...ij', R, A) Y1 = np.einsum('...ik,...kj->...ij', Y1, R.T) Y2 = np.einsum('...ik,...kl,...lj->...ij', R, A, R.T) print("D=%d" % D, np.allclose(Y1,Y2), np.linalg.norm(Y1-Y2)) Output: D=0 True 0.0 D=1 True 0.0 D=2 True 8.40339658678e-15 D=3 True 8.09995399928e-15 D=4 True 3.59428803435e-14 D=5 False 34.755610184 D=6 False 28.3576558351 D=7 False 41.5402690906 D=8 True 2.31709582841e-13 D=9 False 36.0161112799 D=10 True 4.76237746912e-13 D=11 True 4.5790782e-13 D=12 True 4.90302218301e-13 D=13 True 6.96175851271e-13 D=14 True 1.10067181384e-12 D=15 True 1.29095933163e-12 D=16 True 1.3466837332e-12 D=17 True 1.52265065763e-12 D=18 True 2.05407923852e-12 D=19 True 2.33327630748e-12 D=20 True 2.96849358082e-12 D=21 True 3.31063706175e-12 D=22 True 4.28163620455e-12 D=23 True 3.58951880681e-12 D=24 True 4.69973694769e-12 D=25 True 5.47385264567e-12 D=26 True 5.49643316347e-12 D=27 True 6.75132988402e-12 D=28 True 7.86435437892e-12 D=29 True 7.85453681029e-12 So, for D={5,6,7,9}, allclose returns False and the error norm is HUGE. It doesn't seem like just some small numerical inaccuracy because the error norm is so large. I don't know which one is correct (Y1 or Y2) but at least either one is wrong in my opinion. I ran the same test several times, and each time same values of D fail. If I change the shapes somehow, the failing values of D might change too, but I usually have several failing values. 
I'm running the latest version from github (commit bd7104cef4) under Python 3.2.3. With NumPy 1.6.1 under Python 2.7.3 the test crashes and Python exits printing Floating point exception. This seems so weird to me that I wonder if I'm just doing something stupid.. Thanks a lot for any help! Jaakko ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
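[Editorial note: for reference, a seeded version of the failing comparison; on a build with the bug fixed, the two einsum paths agree for every D, including the D=6 case reported above. The seed is arbitrary.]

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed so the check is reproducible
D = 6                           # one of the shapes reported to fail
A = rng.randn(100, D, D)
R = rng.randn(D, D)

# Two-step product (R @ A) @ R.T ...
Y1 = np.einsum('...ik,...kj->...ij', R, A)
Y1 = np.einsum('...ik,...kj->...ij', Y1, R.T)
# ... versus the single three-operand call that triggered the bug.
Y2 = np.einsum('...ik,...kl,...lj->...ij', R, A, R.T)
assert np.allclose(Y1, Y2)  # fails on the buggy builds described above
```
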
Re: [Numpy-discussion] Bug with ufuncs made with frompyfunc
On Wed, Jan 9, 2013 at 7:23 AM, OKB (not okblacke) brenb...@brenbarn.net wrote: A bug causing errors with using methods of ufuncs created with frompyfunc was mentioned on the list over a year ago: http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058501.html Is there any word on the status of this bug? I wasn't able to find a ticket in the bug tracker. That thread says that it had already been fixed in the development version of numpy, so it should be fixed in the upcoming 1.7. If you want to be sure then you can try it on the 1.7 release candidate. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Bug with ufuncs made with frompyfunc
A bug causing errors with using methods of ufuncs created with frompyfunc was mentioned on the list over a year ago: http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058501.html Is there any word on the status of this bug? I wasn't able to find a ticket in the bug tracker. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
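[Editorial note: anyone wanting to check their own build can exercise a frompyfunc-created ufunc and one of its methods with a minimal example like the following; this is an illustrative sketch, not the code from the original report.]

```python
import numpy as np

# frompyfunc builds an object-dtype ufunc from a plain Python callable.
add2 = np.frompyfunc(lambda a, b: a + b, 2, 1)
out = add2(np.arange(3), 10)
assert out.tolist() == [10, 11, 12]

# Ufunc methods such as .reduce were the subject of the reported bug.
total = add2.reduce(np.arange(5))
assert total == 10
```
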
Re: [Numpy-discussion] Bug in as_strided/reshape
Sebastian Berg sebastian at sipsolutions.net writes: Hello, looking at the code, when only adding/removing dimensions with size 1, numpy takes a small shortcut, however it uses 0 stride lengths as value for the new one element dimensions temporarily, then replacing it again to ensure the new array is contiguous. This replacing does not check if the dimension has more than size 1. Likely there is a better way to fix it, but the attached diff should do it. Regards, Sebastian Thanks for the confirmation. So this doesn't get lost I've opened issue #380 on GitHub https://github.com/numpy/numpy/issues/380 -Dave ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug in as_strided/reshape
Dave Hirschfeld dave.hirschfeld at gmail.com writes: It seems that reshape doesn't work correctly on an array which has been resized using the 0-stride trick e.g. In [73]: x = array([5]) In [74]: y = as_strided(x, shape=(10,), strides=(0,)) In [75]: y Out[75]: array([5, 5, 5, 5, 5, 5, 5, 5, 5, 5]) In [76]: y.reshape([10,1]) Out[76]: array([[ 5], [ 8], [ 762933412], [-2013265919], [ 26], [ 64], [ 762933414], [-2013244356], [ 26], [ 64]]) Should all be 5 In [77]: y.copy().reshape([10,1]) Out[77]: array([[5], [5], [5], [5], [5], [5], [5], [5], [5], [5]]) In [78]: np.__version__ Out[78]: '1.6.2' Perhaps a clause such as below is required in reshape? if any(stride == 0 for stride in y.strides): return y.copy().reshape(shape) else: return y.reshape(shape) Regards, Dave Though it would be good to avoid the copy which you should be able to do in this case. Investigating further: In [15]: y.strides Out[15]: (0,) In [16]: z = y.reshape([10,1]) In [17]: z.strides Out[17]: (4, 4) In [18]: z.strides = (0, 4) In [19]: z Out[19]: array([[5], [5], [5], [5], [5], [5], [5], [5], [5], [5]]) In [32]: y.reshape([5, 2]) Out[32]: array([[5, 5], [5, 5], [5, 5], [5, 5], [5, 5]]) In [33]: y.reshape([5, 2]).strides Out[33]: (0, 0) So it seems that reshape is incorrectly setting the stride of axis0 to 4, but only when the appended axis is of size 1. -Dave ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug in as_strided/reshape
Hello, looking at the code, when only adding/removing dimensions with size 1, numpy takes a small shortcut, however it uses 0 stride lengths as value for the new one element dimensions temporarily, then replacing it again to ensure the new array is contiguous. This replacing does not check if the dimension has more than size 1. Likely there is a better way to fix it, but the attached diff should do it. Regards, Sebastian On Do, 2012-08-09 at 13:06 +, Dave Hirschfeld wrote: Dave Hirschfeld dave.hirschfeld at gmail.com writes: It seems that reshape doesn't work correctly on an array which has been resized using the 0-stride trick e.g. In [73]: x = array([5]) In [74]: y = as_strided(x, shape=(10,), strides=(0,)) In [75]: y Out[75]: array([5, 5, 5, 5, 5, 5, 5, 5, 5, 5]) In [76]: y.reshape([10,1]) Out[76]: array([[ 5], [ 8], [ 762933412], [-2013265919], [ 26], [ 64], [ 762933414], [-2013244356], [ 26], [ 64]]) Should all be 5 In [77]: y.copy().reshape([10,1]) Out[77]: array([[5], [5], [5], [5], [5], [5], [5], [5], [5], [5]]) --- a/numpy/core/src/multiarray/shape.c +++ b/numpy/core/src/multiarray/shape.c @@ -273,21 +273,21 @@ PyArray_Newshape(PyArrayObject *self, PyArray_Dims *newdims, * appropriate value to preserve contiguousness */ if (order == NPY_FORTRANORDER) { -if (strides[0] == 0) { +if ((strides[0] == 0) && (dimensions[0] == 1)) { strides[0] = PyArray_DESCR(self)->elsize; } for (i = 1; i < ndim; i++) { -if (strides[i] == 0) { +if ((strides[i] == 0) && (dimensions[i] == 1)) { strides[i] = strides[i-1] * dimensions[i-1]; } } } else { -if (strides[ndim-1] == 0) { +if ((strides[ndim-1] == 0) && (dimensions[ndim-1] == 1)) { strides[ndim-1] = PyArray_DESCR(self)->elsize; } for (i = ndim - 2; i > -1; i--) { -if (strides[i] == 0) { +if ((strides[i] == 0) && (dimensions[i] == 1)) { strides[i] = strides[i+1] * dimensions[i+1]; } } In [78]: np.__version__ Out[78]: '1.6.2' Perhaps a clause such as below is required in reshape? 
if any(stride == 0 for stride in y.strides): return y.copy().reshape(shape) else: return y.reshape(shape) Regards, Dave Though it would be good to avoid the copy which you should be able to do in this case. Investigating further: In [15]: y.strides Out[15]: (0,) In [16]: z = y.reshape([10,1]) In [17]: z.strides Out[17]: (4, 4) In [18]: z.strides = (0, 4) In [19]: z Out[19]: array([[5], [5], [5], [5], [5], [5], [5], [5], [5], [5]]) In [32]: y.reshape([5, 2]) Out[32]: array([[5, 5], [5, 5], [5, 5], [5, 5], [5, 5]]) In [33]: y.reshape([5, 2]).strides Out[33]: (0, 0) So it seems that reshape is incorrectly setting the stride of axis0 to 4, but only when the appended axis is of size 1. -Dave ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From eed2abca6144e16c5d9ca208ef90dd01f7dd6009 Mon Sep 17 00:00:00 2001 From: Sebastian Berg sebast...@sipsolutions.net Date: Thu, 9 Aug 2012 17:17:32 +0200 Subject: [PATCH] Fix reshaping of arrays with stride 0 in a dimension with size of more than 1. --- numpy/core/src/multiarray/shape.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/numpy/core/src/multiarray/shape.c b/numpy/core/src/multiarray/shape.c index 0672326..09a6cb0 100644 --- a/numpy/core/src/multiarray/shape.c +++ b/numpy/core/src/multiarray/shape.c @@ -273,21 +273,21 @@ PyArray_Newshape(PyArrayObject *self, PyArray_Dims *newdims, * appropriate value to preserve contiguousness */ if (order == NPY_FORTRANORDER) { -if (strides[0] == 0) { +if ((strides[0] == 0) && (dimensions[0] == 1)) { strides[0] = PyArray_DESCR(self)->elsize; } for (i = 1; i < ndim; i++) { -if (strides[i] == 0) { +if ((strides[i] == 0) && (dimensions[i] == 1)) { strides[i] = strides[i-1] * dimensions[i-1]; } } } else { -if (strides[ndim-1] == 0) { +if ((strides[ndim-1] == 0) && (dimensions[ndim-1] == 1)) { strides[ndim-1] =
[Numpy-discussion] Bug in as_strided/reshape
It seems that reshape doesn't work correctly on an array which has been resized using the 0-stride trick e.g. In [73]: x = array([5]) In [74]: y = as_strided(x, shape=(10,), strides=(0,)) In [75]: y Out[75]: array([5, 5, 5, 5, 5, 5, 5, 5, 5, 5]) In [76]: y.reshape([10,1]) Out[76]: array([[ 5], [ 8], [ 762933412], [-2013265919], [ 26], [ 64], [ 762933414], [-2013244356], [ 26], [ 64]]) Should all be 5 In [77]: y.copy().reshape([10,1]) Out[77]: array([[5], [5], [5], [5], [5], [5], [5], [5], [5], [5]]) In [78]: np.__version__ Out[78]: '1.6.2' Perhaps a clause such as below is required in reshape? if any(stride == 0 for stride in y.strides): return y.copy().reshape(shape) else: return y.reshape(shape) Regards, Dave ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
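[Editorial note: on NumPy versions that have it (1.10+), np.broadcast_to is a safer spelling of the 0-stride as_strided trick, since it marks the result read-only; copying before a reshape is a defensive way to avoid stride-0 edge cases entirely. A sketch.]

```python
import numpy as np

x = np.array([5])
y = np.broadcast_to(x, (10,))   # read-only, stride-0 view of x
assert y.strides == (0,)

# Defensive reshape: copy first so reshape never sees a 0 stride.
z = y.copy().reshape(10, 1)
assert (z == 5).all() and z.shape == (10, 1)
```
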
Re: [Numpy-discussion] bug in numpy.where?
On 07/27/2012 03:58 PM, Andreas Mueller wrote: Hi Everybody. The bug is that no error is raised, right? The docs say where(condition, [x, y]) x, y : array_like, optional Values from which to choose. `x` and `y` need to have the same shape as `condition` In the example you gave, x was a scalar. net.max() returns an array: print type(net.max()) <type 'numpy.float32'> That was the reason I cast it to a float to check that that did result in the correct behavior for `where`. Phil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in numpy.where?
On Mon, Jul 30, 2012 at 2:30 PM, Phil Hodge ho...@stsci.edu wrote: On 07/27/2012 03:58 PM, Andreas Mueller wrote: Hi Everybody. The bug is that no error is raised, right? The docs say where(condition, [x, y]) x, y : array_like, optional Values from which to choose. `x` and `y` need to have the same shape as `condition` In the example you gave, x was a scalar. net.max() returns an array: print type(net.max()) <type 'numpy.float32'> No, that's a scalar. The type would be numpy.ndarray if it were an array. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in numpy.where?
Can you file a bug report on Github's issue tracker? Thanks, -Travis On Jul 26, 2012, at 1:33 PM, Phil Hodge wrote: On a Linux machine: uname -srvop Linux 2.6.18-308.8.2.el5 #1 SMP Tue May 29 11:54:17 EDT 2012 x86_64 GNU/Linux this example shows an apparent problem with the where function: Python 2.7.1 (r271:86832, Dec 21 2010, 11:19:43) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 Type "help", "copyright", "credits" or "license" for more information. import numpy as np print np.__version__ 1.5.1 net = np.zeros(3, dtype='f4') net[1] = 0.00458849 net[2] = 0.605202 max_net = net.max() test = np.where(net <= 0., max_net, net) print test [ -2.23910537e-35 4.58848989e-03 6.05202019e-01] When I specified the dtype for net as 'f8', test[0] was 3.46244974e+68. It worked as expected (i.e. test[0] should be 0.605202) when I specified float(max_net) as the second argument to np.where. Phil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in numpy.where?
On 07/30/2012 10:53 AM, Travis Oliphant wrote: Can you file a bug report on Github's issue tracker? It's https://github.com/numpy/numpy/issues/369 Phil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug in numpy.mean() revisited
On Thu, 2012-07-26 at 22:15 -0600, Charles R Harris wrote: I would support accumulating in 64 bits but, IIRC, the function will need to be rewritten so that it works by adding 32 bit floats to the accumulator to save space. There are also more stable methods that could also be investigated. There is a nice little project there for someone to cut their teeth on. So a (very) quick read around suggests that using an interim mean gives a more robust algorithm. The problem being, that these techniques are either multi-pass, or inherently slower (due to say a division in the loop). Higher precision would not suffer the same potential slow down and would solve most cases of this problem. Henry ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Bug in numpy.mean() revisited
On Fri, Jul 27, 2012 at 5:15 AM, Charles R Harris charlesr.har...@gmail.com wrote: I would support accumulating in 64 bits but, IIRC, the function will need to be rewritten so that it works by adding 32 bit floats to the accumulator to save space. There are also more stable methods that could also be investigated. There is a nice little project there for someone to cut their teeth on. So the obvious solution here would be to make the ufunc reduce loop smart enough that x = np.zeros(2 ** 30, dtype=float32) np.sum(x, dtype=float64) does not upcast 'x' to float64's as a whole. This shouldn't be too terrible to implement -- iterate over the float32 array, and only upcast each inner-loop buffer as you go, instead of upcasting the whole thing. In fact, nditer might do this already? Then using a wide accumulator by default would just take a few lines of code in numpy.core._methods._mean to select the proper dtype and downcast the result. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
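[Editorial note: the buffer-at-a-time upcast can be sketched in pure Python; this is illustrative only, since the real change would live in the ufunc reduce loop, and `mean_f64` and the chunk size are made-up names.]

```python
import numpy as np

def mean_f64(x, chunk=65536):
    # Accumulate in float64 while only ever upcasting one chunk,
    # so temporary memory stays O(chunk) instead of O(x.size).
    flat = x.ravel()
    total = 0.0
    for start in range(0, flat.size, chunk):
        total += flat[start:start + chunk].astype(np.float64).sum()
    return total / flat.size

a = np.zeros((2, 512 * 512), dtype=np.float32)
a[0, :] = 1.0
a[1, :] = 0.1
# Naive float32 accumulation famously gives ~0.546875 here.
assert abs(mean_f64(a) - 0.55) < 1e-6
```
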
Re: [Numpy-discussion] bug in numpy.where?
On Thu, Jul 26, 2012 at 2:33 PM, Phil Hodge ho...@stsci.edu wrote: On a Linux machine: uname -srvop Linux 2.6.18-308.8.2.el5 #1 SMP Tue May 29 11:54:17 EDT 2012 x86_64 GNU/Linux this example shows an apparent problem with the where function: Python 2.7.1 (r271:86832, Dec 21 2010, 11:19:43) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 Type "help", "copyright", "credits" or "license" for more information. import numpy as np print np.__version__ 1.5.1 net = np.zeros(3, dtype='f4') net[1] = 0.00458849 net[2] = 0.605202 max_net = net.max() test = np.where(net <= 0., max_net, net) print test [ -2.23910537e-35 4.58848989e-03 6.05202019e-01] When I specified the dtype for net as 'f8', test[0] was 3.46244974e+68. It worked as expected (i.e. test[0] should be 0.605202) when I specified float(max_net) as the second argument to np.where. Phil Confirmed with version 1.7.0.dev-470c857 on a CentOS6 64-bit machine. Strange indeed. Breaking it down further: res = (net <= 0.) print res [ True False False] np.where(res, max_net, net) array([ -2.23910537e-35, 4.58848989e-03, 6.05202019e-01], dtype=float32) Very Strange... Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in numpy.where?
On Fri, Jul 27, 2012 at 2:01 PM, Benjamin Root ben.r...@ou.edu wrote: On Thu, Jul 26, 2012 at 2:33 PM, Phil Hodge ho...@stsci.edu wrote: On a Linux machine: uname -srvop Linux 2.6.18-308.8.2.el5 #1 SMP Tue May 29 11:54:17 EDT 2012 x86_64 GNU/Linux this example shows an apparent problem with the where function: Python 2.7.1 (r271:86832, Dec 21 2010, 11:19:43) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 Type "help", "copyright", "credits" or "license" for more information. import numpy as np print np.__version__ 1.5.1 net = np.zeros(3, dtype='f4') net[1] = 0.00458849 net[2] = 0.605202 max_net = net.max() test = np.where(net <= 0., max_net, net) print test [ -2.23910537e-35 4.58848989e-03 6.05202019e-01] When I specified the dtype for net as 'f8', test[0] was 3.46244974e+68. It worked as expected (i.e. test[0] should be 0.605202) when I specified float(max_net) as the second argument to np.where. Phil Confirmed with version 1.7.0.dev-470c857 on a CentOS6 64-bit machine. Strange indeed. Breaking it down further: res = (net <= 0.) print res [ True False False] np.where(res, max_net, net) array([ -2.23910537e-35, 4.58848989e-03, 6.05202019e-01], dtype=float32) Very Strange... Ben Root What I find really interesting is that -2.23910537e-35 is the byte swapped version of 6.05202019e-01. Chris ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in numpy.where?
Hi Everybody. The bug is that no error is raised, right? The docs say where(condition, [x, y]) x, y : array_like, optional Values from which to choose. `x` and `y` need to have the same shape as `condition` In the example you gave, x was a scalar. Cheers, Andy ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in numpy.where?
On Fri, Jul 27, 2012 at 3:58 PM, Andreas Mueller amuel...@ais.uni-bonn.dewrote: Hi Everybody. The bug is that no error is raised, right? The docs say where(condition, [x, y]) x, y : array_like, optional Values from which to choose. `x` and `y` need to have the same shape as `condition` In the example you gave, x was a scalar. Cheers, Andy Hmm, that is incorrect, I believe. I have used a scalar before. Maybe it works because a scalar is broadcastable to the same shape as any other N-dim array? If so, then the wording of that docstring needs to be fixed. No, I think Christopher hit it on the head. For whatever reason, the endian-ness somewhere is not being respected and causes a byte-swapped version to show up. How that happens, though, is beyond me. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in numpy.where?
On 07/27/2012 09:10 PM, Benjamin Root wrote: On Fri, Jul 27, 2012 at 3:58 PM, Andreas Mueller amuel...@ais.uni-bonn.de mailto:amuel...@ais.uni-bonn.de wrote: Hi Everybody. The bug is that no error is raised, right? The docs say where(condition, [x, y]) x, y : array_like, optional Values from which to choose. `x` and `y` need to have the same shape as `condition` In the example you gave, x was a scalar. Cheers, Andy Hmm, that is incorrect, I believe. I have used a scalar before. Maybe it works because a scalar is broadcastable to the same shape as any other N-dim array? If so, then the wording of that docstring needs to be fixed. No, I think Christopher hit it on the head. For whatever reason, the endian-ness somewhere is not being respected and causes a byte-swapped version to show up. How that happens, though, is beyond me. Well, if you use np.repeat(max_net, 3) instead of max_net, it works as expected. So if you use the function as documented, it does the right thing. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] bug in numpy.where?
On Fri, Jul 27, 2012 at 4:10 PM, Benjamin Root ben.r...@ou.edu wrote: On Fri, Jul 27, 2012 at 3:58 PM, Andreas Mueller amuel...@ais.uni-bonn.de wrote: Hi Everybody. The bug is that no error is raised, right? The docs say where(condition, [x, y]) x, y : array_like, optional Values from which to choose. `x` and `y` need to have the same shape as `condition` In the example you gave, x was a scalar. Cheers, Andy Hmm, that is incorrect, I believe. I have used a scalar before. Maybe it works because a scalar is broadcastable to the same shape as any other N-dim array? If so, then the wording of that docstring needs to be fixed. No, I think Christopher hit it on the head. For whatever reason, the endian-ness somewhere is not being respected and causes a byte-swapped version to show up. How that happens, though, is beyond me. Ben Root It may have something to do with the dtype size as well. The problem seen with net = np.zeros(3, dtype='f4') disappears for net = np.zeros(3, dtype='f8') and above. Chris ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] bug in numpy.where?
On a Linux machine: uname -srvop Linux 2.6.18-308.8.2.el5 #1 SMP Tue May 29 11:54:17 EDT 2012 x86_64 GNU/Linux this example shows an apparent problem with the where function: Python 2.7.1 (r271:86832, Dec 21 2010, 11:19:43) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 Type "help", "copyright", "credits" or "license" for more information. import numpy as np print np.__version__ 1.5.1 net = np.zeros(3, dtype='f4') net[1] = 0.00458849 net[2] = 0.605202 max_net = net.max() test = np.where(net <= 0., max_net, net) print test [ -2.23910537e-35 4.58848989e-03 6.05202019e-01] When I specified the dtype for net as 'f8', test[0] was 3.46244974e+68. It worked as expected (i.e. test[0] should be 0.605202) when I specified float(max_net) as the second argument to np.where. Phil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
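[Editorial note: the workaround mentioned in the report, passing a native Python float instead of the NumPy float32 scalar, sidesteps the byte-order confusion on affected builds. A sketch of the same example.]

```python
import numpy as np

net = np.zeros(3, dtype='f4')
net[1] = 0.00458849
net[2] = 0.605202
max_net = net.max()

# float(max_net) converts the NumPy scalar to a Python float, which
# avoided the byte-swapped garbage value on the affected builds.
test = np.where(net <= 0., float(max_net), net)
assert np.allclose(test, [0.605202, 0.00458849, 0.605202])
```
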
[Numpy-discussion] Bug in numpy.mean() revisited
There was a thread in January discussing the non-obvious behavior of numpy.mean() for large arrays of float32 values [1]. This issue is nicely discussed at the end of the numpy.mean() documentation [2] with an example:

    >>> a = np.zeros((2, 512*512), dtype=np.float32)
    >>> a[0, :] = 1.0
    >>> a[1, :] = 0.1
    >>> np.mean(a)
    0.546875

From the docs and previous discussion it seems there is no technical difficulty in choosing a different (higher precision) type for the accumulator using the dtype arg, and in fact this is done automatically for int values.

My question is whether there would be any support for doing something more than documenting this behavior. I suspect very few people ever make it below the fold of the np.mean() documentation. Taking the mean of large arrays of float32 values is a *very* common use case, and giving the wrong answer with default inputs is really disturbing. I recently had to rebuild a complex science data archive because of corrupted mean values.

Possible ideas to stimulate discussion:

1. Always use float64 to accumulate float types that are 64 bits or less. Are there serious performance impacts to automatically using float64 to accumulate float32 arrays? I appreciate this would likely introduce unwanted regressions (sometimes suddenly getting the right answer is a bad thing). So could this be considered for numpy 2.0?

2. Might there be a way to emit a warning if the number of values and the max accumulated value [3] are such that the estimated fractional error is above some tolerance? I'm not even sure if this is a good idea, or whether there will be howls from the community as their codes start warning about inaccurate mean values. Better ideas along this line??
Cheers,
Tom

[1]: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059960.html
[2]: http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html
[3]: Using the max accumulated value during accumulation instead of the final accumulated value seems like the right thing for estimating precision loss. But this would affect performance, so maybe just using the final value would catch many cases.
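A minimal sketch of the dtype workaround Tom mentions, using the docs' example. (Note that newer NumPy releases shrink the default-precision error via pairwise summation, so only the float64 result is asserted here.)

```python
import numpy as np

# Half the values are 1.0, half are 0.1; the exact mean is ~0.55.
a = np.zeros((2, 512 * 512), dtype=np.float32)
a[0, :] = 1.0
a[1, :] = 0.1

# Forcing a float64 accumulator gives the accurate answer regardless of
# how the default float32 accumulation behaves on a given NumPy version.
accurate = a.mean(dtype=np.float64)
```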
Re: [Numpy-discussion] Bug in numpy.mean() revisited
On Thu, Jul 26, 2012 at 9:26 PM, Tom Aldcroft aldcr...@head.cfa.harvard.edu wrote:
> [Tom's message on numpy.mean() accumulator precision, quoted in full above]

I would support accumulating in 64 bits but, IIRC, the function will need to be rewritten so that it works by adding 32-bit floats to the accumulator to save space. There are also more stable methods that could be investigated. There is a nice little project there for someone to cut their teeth on.

Chuck
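One of the "more stable methods" Chuck alludes to is compensated (Kahan) summation. The following is a sketch, not NumPy's actual implementation (NumPy later adopted pairwise summation for its reductions):

```python
import numpy as np

def kahan_mean(x):
    """Mean via Kahan (compensated) summation in float64.

    The compensation term recovers low-order bits that a naive running
    sum would discard, so float32 inputs lose far less precision than
    with a same-width accumulator. Slow (pure Python), for illustration.
    """
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in x.ravel():
        y = float(v) - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total / x.size
```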
[Numpy-discussion] Bug in pickling an ndarray?
I am having trouble pickling (and then unpickling) an ndarray. Upon unpickling, the base attribute of the ndarray is set to some very strange string (base was None when the ndarray was pickled, so it should remain None). I have tried various platforms and versions of numpy, with inconclusive results:

    # tested: Linux (Suse 11.1), numpy 1.5.1      BUG
    #         Linux (Suse 11.0), numpy 1.6.1      OK
    #         Linux (Mint Debian), numpy 1.6.1    BUG
    #         Linux (Mint Debian), numpy 1.6.2    BUG
    #         OSX (Snow Leopard), numpy 1.5.1rc1  BUG
    #         OSX (Snow Leopard), numpy 1.6.2     BUG
    #         Windows 7, numpy 1.4.1              OK

I have attached a script below that can be used to check for the problem; I suppose that this is a bug report, unless I'm doing something terribly wrong or my expectations for the base attribute are off.

    ---- cut here ----
    # This little demo shows a problem with the base attribute of an
    # ndarray when pickling. Before pickling, dset.base is None, but
    # after pickling it is some strange string.
    import cPickle as pickle
    import numpy
    print numpy.__version__

    dset = numpy.ones((2, 2))
    print "BEFORE PICKLING"
    print dset
    print "base = ", dset.base
    print dset.flags

    # pickle, then immediately unpickle.
    s = pickle.dumps(dset)
    dset = pickle.loads(s)

    print "AFTER PICKLING AND THEN IMMEDIATELY UNPICKLING"
    print dset
    print "base = ", dset.base
    print dset.flags

--
Daniel Hyams
dhy...@gmail.com
Re: [Numpy-discussion] Bug in pickling an ndarray?
On Sat, Jun 30, 2012 at 9:15 PM, Daniel Hyams dhy...@gmail.com wrote:
> I am having trouble pickling (and then unpickling) an ndarray. Upon unpickling, the base attribute of the ndarray is set to some very strange string (base was None when the ndarray was pickled, so it should remain None).

This sounds like correct behaviour to me -- is it causing you a problem? In general ndarrays don't keep things like memory layout, view sharing, etc. through pickling, and that means that things like .flags and .base may change.

-n
Re: [Numpy-discussion] Bug in pickling an ndarray?
Hmmm, I wouldn't think that it is correct behavior; I would think that *any* ndarray arising from pickling would have its .base attribute set to None. If not, then who really owns the data? It was my understanding that .base should hold a reference to another ndarray that the data is really coming from, or else be None. It certainly shouldn't be some random string, should it?

And yes, it is causing a problem for me, which is why I noticed it. In my application, ndarrays can come from various sources, pickling being one of them. Later in the app, I wanted to resize the array, which you cannot do if the data is not really owned by that array... I had an explicit check for myarray.base == None, which it is not when I get the ndarray from a pickle.

--
Daniel Hyams
dhy...@gmail.com
Re: [Numpy-discussion] Bug in pickling an ndarray?
This is the expected behavior. It is not a bug. NumPy arrays after pickling are views into the string that is created by the pickling machinery, and so the base is set. This was done to avoid an additional memcpy. It avoids a copy, but yes, it does mean that you can't resize the array until you make another copy.

Best regards,

-Travis

On Jun 30, 2012, at 5:33 PM, Daniel Hyams wrote:
> [Daniel's message quoted in full above]
Re: [Numpy-discussion] Bug in pickling an ndarray?
On Sat, Jun 30, 2012 at 11:33 PM, Daniel Hyams dhy...@gmail.com wrote:
> It was my understanding that .base should hold a reference to another ndarray that the data is really coming from, or else be None. It certainly shouldn't be some random string, should it?

It can be any object that will keep the data memory alive while the object is kept alive. It does not have to be an ndarray. In this case, the numpy unpickling constructor takes the string object that the underlying pickling machinery has just created and views its memory directly. In order to keep Python from freeing that memory, the string object needs to be kept alive via a reference, so it gets assigned to .base.

> And yes, it is causing a problem for me, which is why I noticed it. In my application, ndarrays can come from various sources, pickling being one of them. Later in the app, I was wanting to resize the array, which you cannot do if the data is not really owned by that array...

You also can't resize an array if any *other* array has a view on it, so checking for ownership isn't going to help. .resize() will raise an exception if it can't do its job; it's better to just attempt it and catch the exception than to look before you leap.

> I had an explicit check for myarray.base==None, which it is not when I get the ndarray from a pickle.

That is not the way to check whether an ndarray owns its data. Instead, check a.flags['OWNDATA'].

--
Robert Kern
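Robert's advice can be sketched like this (a hypothetical snippet, not from the thread; the exact flags after unpickling depend on the NumPy version, which is the point of checking rather than assuming):

```python
import pickle
import numpy as np

# Round-trip an array through pickle, as in Daniel's demo.
a = pickle.loads(pickle.dumps(np.ones((2, 2))))

# Check the OWNDATA flag, not .base. If the unpickled array does not
# own its buffer, take a copy, which does.
if not a.flags['OWNDATA']:
    a = a.copy()

# Alternatively, just attempt the resize and catch the failure.
try:
    a.resize((4, 4), refcheck=False)  # enlarging zero-fills new cells
except ValueError:
    a = a.copy()
    a.resize((4, 4), refcheck=False)
```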
Re: [Numpy-discussion] Bug in pickling an ndarray?
Thanks Travis and Robert for the clarification; it is much more clear what is going on now. As the demo code shows, a.flags['OWNDATA'] is also different on its way out of the pickle, which likewise makes sense now. So using that flag instead of checking a.base for None is equivalent, at least in this situation.

So is it a bug, then, that on Windows .base is set to None? (Of course, this may be something that was fixed in later versions of numpy; I was only able to test Windows with numpy 1.4.1.) I'll just make a copy and discard the original to work around the situation -- which is what I had already done, but the inconsistent behavior across versions and platforms made me think it was a bug. Thanks again for the clear explanation of what is going on.

On Sat, Jun 30, 2012 at 6:33 PM, Daniel Hyams dhy...@gmail.com wrote:
> [Daniel's message quoted in full above]

--
Daniel Hyams
dhy...@gmail.com
[Numpy-discussion] bug in array instantiation?
    In [20]: dt_knobs = [('pvName', (str, 40)), ('start', 'float'),
       ....:             ('stop', 'float'), ('mode', (str, 10))]

    In [21]: r_knobs = np.recarray([], dtype=dt_knobs)

    In [22]: r_knobs
    Out[22]:
    rec.array(('\xa0\x8c\xc9\x02\x00\x00\x00\x00(\xc8v\x02\x00\x00\x00\x00\x00\xd3\x86\x02\x00\x00\x00\x00\x10\xdeJ\x02\x00\x00\x00\x00\x906\xb9\x02',
               1.63e-322, 1.351330465085e-312, '\x90\xc6\xa3\x02\x00\x00\x00\x00P'),
              dtype=[('pvName', '|S40'), ('start', '<f8'), ('stop', '<f8'), ('mode', '|S10')])

Why is the array not empty?

--
E
Re: [Numpy-discussion] bug in array instantiation?
On Fri, Jan 27, 2012 at 21:17, Emmanuel Mayssat emays...@gmail.com wrote:
> In [21]: r_knobs = np.recarray([], dtype=dt_knobs)
> ...
> why is the array not empty?

The shape [] creates a rank-0 array, which is essentially a scalar.

    [~]
    |1> x = np.array(10)
    [~]
    |2> x
    array(10)
    [~]
    |3> x.shape
    ()

If you want an empty array, you need at least one dimension of size 0:

    [~]
    |7> r_knobs = np.recarray([0], dtype=dt_knobs)
    [~]
    |8> r_knobs
    rec.array([], dtype=[('pvName', '|S40'), ('start', '<f8'), ('stop', '<f8'), ('mode', '|S10')])

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
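Robert's distinction can be checked directly. A small sketch with an equivalent dtype (bytes-string field codes used here so it runs on Python 3 as well):

```python
import numpy as np

dt = [('pvName', 'S40'), ('start', 'f8'), ('stop', 'f8'), ('mode', 'S10')]

# Shape [] (i.e. ()) is a rank-0 array: one record whose memory is
# uninitialized, which is why the output above shows garbage bytes.
scalar_like = np.recarray([], dtype=dt)

# Shape [0] is a genuinely empty array.
empty = np.recarray([0], dtype=dt)
```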
[Numpy-discussion] bug in numpy.mean() ?
I know, I know, that's pretty outrageous to even suggest, but please bear with me; I am stumped as you may be. 2-D data file here: http://dl.dropbox.com/u/139035/data.npy

Then:

    In [3]: data.mean()
    Out[3]: 3067.024383998
    In [4]: data.max()
    Out[4]: 3052.4343
    In [5]: data.shape
    Out[5]: (1000, 1000)
    In [6]: data.min()
    Out[6]: 3040.498
    In [7]: data.dtype
    Out[7]: dtype('float32')

A mean value calculated per loop over the data gives me 3045.747251076416. I first thought I still misunderstood how data.mean() works, per axis and so on, but I did the same with a flattened version with the same results. Am I really so tired that I can't see what I am doing wrong here?

For completeness: the data was read by an osgeo.gdal dataset method called ReadAsArray(). My numpy.__version__ gives me 1.6.1, and my whole setup is based on Enthought's EPD.

Best regards,
Michael
Re: [Numpy-discussion] bug in numpy.mean() ?
On 01/24/2012 12:33 PM, K.-Michael Aye wrote:
> [Michael's question about data.mean() on a float32 array, quoted in full above]

You have a million 32-bit floating point numbers that are in the thousands. Thus you are exceeding the 32-bit float precision, and if you can, you need to increase the precision of the accumulator in np.mean() or change the input dtype:

    >>> a.mean(dtype=np.float32)  # default, and lacks precision
    3067.024383998
    >>> a.mean(dtype=np.float64)
    3045.747251076416
    >>> a.mean(dtype=np.float128)
    3045.7472510764160156
    >>> b = a.astype(np.float128)
    >>> b.mean()
    3045.7472510764160156

Otherwise you are left to use some alternative approach to calculate the mean.

Bruce
Re: [Numpy-discussion] bug in numpy.mean() ?
I have confirmed this on a 64-bit linux machine running python 2.7.2 with the development version of numpy. It seems to be related to using float32 instead of float64. If the array is first converted to a 64-bit float (via astype), mean gives an answer that agrees with your looped-calculation value: 3045.747250002. With the original 32-bit array, averaging successively on one axis and then on the other gives answers that agree with the 64-bit float answer to the second decimal place.

    In [125]: d = np.load('data.npy')
    In [126]: d.mean()
    Out[126]: 3067.024383998
    In [127]: d64 = d.astype('float64')
    In [128]: d64.mean()
    Out[128]: 3045.747251076416
    In [129]: d.mean(axis=0).mean()
    Out[129]: 3045.748750002
    In [130]: d.mean(axis=1).mean()
    Out[130]: 3045.74448
    In [131]: np.version.full_version
    Out[131]: '2.0.0.dev-55472ca'

On Tue, 2012-01-24 at 12:33 -0600, K.-Michael Aye wrote:
> [Michael's question quoted in full above]

--
Kathleen M. Tacina
NASA Glenn Research Center
MS 5-10, 21000 Brookpark Road
Cleveland, OH 44135
Telephone: (216) 433-6660
Fax: (216) 433-5802
Re: [Numpy-discussion] bug in numpy.mean() ?
On Jan 24, 2012, at 1:33 PM, K.-Michael Aye wrote:
> [Michael's question about data.mean(), quoted in full above]

I get the same result:

    In [1]: import numpy
    In [2]: data = numpy.load('data.npy')
    In [3]: data.mean()
    Out[3]: 3067.024383998
    In [4]: data.max()
    Out[4]: 3052.4343
    In [5]: data.min()
    Out[5]: 3040.498
    In [6]: numpy.version.version
    Out[6]: '2.0.0.dev-433b02a'

This is on OS X 10.7.2 with Python 2.7.1, on an Intel Core i7. Running python as a 32- vs. 64-bit process doesn't make a difference. The data matrix doesn't look too strange when I view it as an image -- all pretty smooth variation around the (min, max) range. But maybe it's still somehow floating-point pathological?

This is fun too:

    In [12]: data.mean()
    Out[12]: 3067.024383998
    In [13]: (data/3000).mean()*3000
    Out[13]: 3020.807437501
    In [15]: (data/2).mean()*2
    Out[15]: 3067.024383998
    In [16]: (data/200).mean()*200
    Out[16]: 3013.67541

Zach
Re: [Numpy-discussion] bug in numpy.mean() ?
Just what Bruce said. You can run the following to confirm:

    np.mean(data - data.mean())

If for some reason you do not want to convert to float64, you can add the result of the previous line to the bad mean:

    bad_mean = data.mean()
    good_mean = bad_mean + np.mean(data - bad_mean)

Val

On Tue, Jan 24, 2012 at 12:33 PM, K.-Michael Aye kmichael@gmail.com wrote:
> [Michael's question quoted in full above]
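Val's two-pass correction can be sketched as a function (hypothetical helper name; since the residuals data - bad_mean are small, summing them in float32 loses far less precision than summing the raw values):

```python
import numpy as np

def corrected_mean(data):
    """Two-pass mean: rough first pass, then mean of the residuals.

    Even if the first pass is inaccurate, the residuals are centered
    near zero, so their sum carries far less rounding error.
    """
    bad_mean = data.mean()
    return bad_mean + np.mean(data - bad_mean)
```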
Re: [Numpy-discussion] bug in numpy.mean() ?
> You have a million 32-bit floating point numbers that are in the thousands. Thus you are exceeding the 32-bit float precision and, if you can, you need to increase the precision of the accumulator in np.mean() or change the input dtype. Otherwise you are left to use some alternative approach to calculate the mean.
> Bruce

Interesting -- I knew that float64 accumulators were used with integer arrays, and I had just assumed that 64-bit or higher accumulators would be used with floating-point arrays too, instead of the array's dtype. This is actually quite a gotcha for floating-point imaging-type tasks -- good to know!

Zach
Re: [Numpy-discussion] bug in numpy.mean() ?
Thank you Bruce and all, I knew I was doing something wrong (I should have read the mean method doc more closely). I am of course glad that it's so easily understandable.

But: if the error can get so big, wouldn't it be a better idea for the accumulator to always be of type 'float64' and then convert later to the type of the original array? As one can see in this case, the result would be much closer to the true value.

Michael

On 2012-01-24 19:01:40 +, Val Kalatsky said:
> [Val's suggestion of np.mean(data - data.mean()), quoted above]
Re: [Numpy-discussion] bug in numpy.mean() ?
Hi,

Oddly, numpy 1.6 seems to behave in a more consistent manner:

    In []: sys.version
    Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
    In []: np.version.version
    Out[]: '1.6.0'
    In []: d = np.load('data.npy')
    In []: d.dtype
    Out[]: dtype('float32')
    In []: d.mean()
    Out[]: 3045.74718
    In []: d.mean(dtype=np.float32)
    Out[]: 3045.74718
    In []: d.mean(dtype=np.float64)
    Out[]: 3045.747251076416
    In []: (d - d.min()).mean() + d.min()
    Out[]: 3045.7472508750002
    In []: d.mean(axis=0).mean()
    Out[]: 3045.74724
    In []: d.mean(axis=1).mean()
    Out[]: 3045.74724

Or do the results of the calculations depend more on the platform?

My 2 cents,
eat
Re: [Numpy-discussion] bug in numpy.mean() ?
I found something similar, with a very simple example. On 64-bit linux, python 2.7.2, numpy development version:

    In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)
    In [23]: a.mean()
    Out[23]: 4034.16357421875
    In [24]: np.version.full_version
    Out[24]: '2.0.0.dev-55472ca'

But a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:

    >>> a = 4000*np.ones((1024,1024),dtype=np.float32)
    >>> a.mean()
    4000.0
    >>> np.version.full_version
    '1.6.1'

On Tue, 2012-01-24 at 17:12 -0600, eat wrote:
> [eat's numpy 1.6 transcript, quoted in full above]

--
Kathleen M. Tacina
NASA Glenn Research Center
MS 5-10, 21000 Brookpark Road
Cleveland, OH 44135
Telephone: (216) 433-6660
Fax: (216) 433-5802
Re: [Numpy-discussion] bug in numpy.mean() ?
Hi,

On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina kathleen.m.tac...@nasa.gov wrote:
> I found something similar, with a very simple example. On 64-bit linux, python 2.7.2, numpy development version:
>
>     In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)
>     In [23]: a.mean()
>     Out[23]: 4034.16357421875
>     In [24]: np.version.full_version
>     Out[24]: '2.0.0.dev-55472ca'
>
> But a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
>
>     >>> a = 4000*np.ones((1024,1024),dtype=np.float32)
>     >>> a.mean()
>     4000.0
>     >>> np.version.full_version
>     '1.6.1'

This indeed looks very nasty, regardless of whether it is a version or platform related problem.

-eat
Re: [Numpy-discussion] bug in numpy.mean() ?
On Tue, Jan 24, 2012 at 7:21 PM, eat e.antero.ta...@gmail.com wrote:
> [eat's reply to Kathleen, quoted above: "This indeed looks very nasty, regardless of whether it is a version or platform related problem."]

Looks like it is platform specific; I get the same result as eat.

    Windows 7, Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32
    >>> a = np.ones((1024,1024),dtype=np.float32)
    >>> a.mean()
    1.0
    >>> (4000*a).dtype
    dtype('float32')
    >>> (4000*a).mean()
    4000.0
    >>> b = np.load("data.npy")
    >>> b.mean()
    3045.74718
    >>> b.shape
    (1000, 1000)
    >>> b.mean(0).mean(0)
    3045.74724
    >>> _.dtype
    dtype('float64')
    >>> b.dtype
    dtype('float32')
    >>> b.mean(dtype=np.float32)
    3045.74718

Josef
Re: [Numpy-discussion] bug in numpy.mean() ?
On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina
kathleen.m.tac...@nasa.gov wrote:

> I found something similar, with a very simple example. On 64-bit linux,
> python 2.7.2, numpy development version:
>
> In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)
> In [23]: a.mean()
> Out[23]: 4034.16357421875
> In [24]: np.version.full_version
> Out[24]: '2.0.0.dev-55472ca'
>
> But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
>
> >>> a = 4000*np.ones((1024,1024),dtype=np.float32)
> >>> a.mean()
> 4000.0
> >>> np.version.full_version
> '1.6.1'

Yes, the results are platform/compiler dependent. The 32 bit platforms
tend to use extended precision accumulators and the x87 instruction set.
The 64 bit platforms tend to use sse2+. Different precisions, even
though you might think they are the same.

<snip>

Chuck
Re: [Numpy-discussion] bug in numpy.mean() ?
On Wed, Jan 25, 2012 at 12:03 AM, Charles R Harris
charlesr.har...@gmail.com wrote:

> <snip>
>
> Yes, the results are platform/compiler dependent. The 32 bit platforms
> tend to use extended precision accumulators and the x87 instruction set.
> The 64 bit platforms tend to use sse2+. Different precisions, even
> though you might think they are the same.

Just to confirm: same computer as before, but this python 3.2 install is
64 bit, and now I get the Linux result.

Python 3.2 (r32:88445, Feb 20 2011, 21:30:00)
[MSC v.1500 64 bit (AMD64)] on win32

>>> import numpy as np
>>> np.__version__
'1.5.1'
>>> a = 4000*np.ones((1024,1024),dtype=np.float32)
>>> a.mean()
4034.16357421875
>>> a.mean(0).mean(0)
4000.0
>>> a.mean(dtype=np.float64)
4000.0

Josef
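The accumulator-precision effect discussed above can be reproduced without any platform tricks: a minimal sketch, using np.cumsum to force a strictly sequential float32 accumulation (recent numpy versions use pairwise summation inside .mean() and largely hide the drift, so the naive float32 result below is illustrative rather than guaranteed to match the 4034.16 figure from the thread).

```python
import numpy as np

a = 4000 * np.ones((1024, 1024), dtype=np.float32)

# Sequential float32 accumulation: once the running sum is large,
# adding 4000 gets rounded to the nearest representable float32,
# and the total drifts away from the exact value.
naive_sum = np.cumsum(a.ravel(), dtype=np.float32)[-1]
print(naive_sum / a.size)

# A float64 accumulator has plenty of headroom: the exact sum
# 4000 * 2**20 is representable, and dividing by 2**20 is exact.
print(a.mean(dtype=np.float64))  # 4000.0

# Averaging along one axis first keeps each partial sum small,
# which is why a.mean(0).mean(0) was exact in Josef's session.
print(a.mean(axis=0).mean())     # 4000.0
```

The same reasoning explains the data.npy numbers earlier in the thread: d.mean(dtype=np.float64) and the axis-wise means agree with each other, while the plain float32 accumulation is off in the fifth significant digit.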
Re: [Numpy-discussion] bug in PyArray_GetCastFunc
On Sat, Dec 3, 2011 at 5:28 PM, Geoffrey Irving irv...@naml.us wrote:

> When attempting to cast to a user defined type, PyArray_GetCastFunc
> looks up the cast function in the dictionary but doesn't check if the
> entry exists. This causes segfaults. Here's a patch.
>
> Geoffrey
>
> diff --git a/numpy/core/src/multiarray/convert_datatype.c b/numpy/core/src/multiarray/convert_datatype.c
> index 818d558..4b8f38b 100644
> --- a/numpy/core/src/multiarray/convert_datatype.c
> +++ b/numpy/core/src/multiarray/convert_datatype.c
> @@ -81,7 +81,7 @@ PyArray_GetCastFunc(PyArray_Descr *descr, int type_num)
>              key = PyInt_FromLong(type_num);
>              cobj = PyDict_GetItem(obj, key);
>              Py_DECREF(key);
> -            if (NpyCapsule_Check(cobj)) {
> +            if (cobj && NpyCapsule_Check(cobj)) {
>                  castfunc = NpyCapsule_AsVoidPtr(cobj);
>              }
>          }

I'm thinking NpyCapsule_Check should catch this. From the documentation
it probably should:

    int PyCObject_Check(PyObject *p)
        Return true if its argument is a PyCObject.

I don't think NULL is a valid PyCObject ;) However, it should be easy to
add the NULL check to the numpy version of the function. I'll do that.

Chuck
Re: [Numpy-discussion] bug in PyArray_GetCastFunc
On Sun, Dec 4, 2011 at 10:02 AM, Charles R Harris
charlesr.har...@gmail.com wrote:

> <snip>
>
> I'm thinking NpyCapsule_Check should catch this. From the documentation
> it probably should:
>
>     int PyCObject_Check(PyObject *p)
>         Return true if its argument is a PyCObject.
>
> I don't think NULL is a valid PyCObject ;) However, it should be easy to
> add the NULL check to the numpy version of the function. I'll do that.

That would work, but I think it would match the rest of the Python API
better if NpyCapsule_Check required a non-null argument. PyCapsule_Check
and essentially every other Python API function have documented
undefined behavior if you pass in NULL, so it might be surprising if one
numpy macro violated this. Incidentally, every other use of
NpyCapsule_Check correctly tests for NULL first.

Geoffrey
Re: [Numpy-discussion] bug in PyArray_GetCastFunc
On Sun, Dec 4, 2011 at 6:30 PM, Geoffrey Irving irv...@naml.us wrote:

> <snip>
>
> That would work, but I think it would match the rest of the Python API
> better if NpyCapsule_Check required a non-null argument. PyCapsule_Check
> and essentially every other Python API function have documented
> undefined behavior if you pass in NULL, so it might be surprising if one
> numpy macro violated this. Incidentally, every other use of
> NpyCapsule_Check correctly tests for NULL first.
>
> Geoffrey

Good points. I may change it back ;)

Chuck
[Numpy-discussion] bug in PyArray_GetCastFunc
When attempting to cast to a user defined type, PyArray_GetCastFunc
looks up the cast function in the dictionary but doesn't check if the
entry exists. This causes segfaults. Here's a patch.

Geoffrey

diff --git a/numpy/core/src/multiarray/convert_datatype.c b/numpy/core/src/multiarray/convert_datatype.c
index 818d558..4b8f38b 100644
--- a/numpy/core/src/multiarray/convert_datatype.c
+++ b/numpy/core/src/multiarray/convert_datatype.c
@@ -81,7 +81,7 @@ PyArray_GetCastFunc(PyArray_Descr *descr, int type_num)
             key = PyInt_FromLong(type_num);
             cobj = PyDict_GetItem(obj, key);
             Py_DECREF(key);
-            if (NpyCapsule_Check(cobj)) {
+            if (cobj && NpyCapsule_Check(cobj)) {
                 castfunc = NpyCapsule_AsVoidPtr(cobj);
             }
         }
[Numpy-discussion] bug in reshape ?
Might be an old story:

>>> np.__version__
'1.5.1'

I thought for once it's easier to use reshape to add a new axis instead
of ...,None, but my results got weird (a normal(0,1) sample of
2.13795875e-314):

>>> x = 1
>>> y = np.arange(3)
>>> z = np.arange(2)[:,None]
>>> np.broadcast(x,y,z)
<numpy.broadcast object at 0x04C0DCA0>
>>> np.broadcast_arrays(x,y,z)
[array([[1, 1, 1],
       [1, 1, 1]]), array([[0, 1, 2],
       [0, 1, 2]]), array([[0, 0, 0],
       [1, 1, 1]])]
>>> x1, y1, z1 = np.broadcast_arrays(x,y,z)
>>> map(np.shape, (x1, y1, z1))
[(2, 3), (2, 3), (2, 3)]

shape looks fine, let's add an extra axis with reshape

>>> x1.reshape(2,3,1)
array([[[          1],
        [          1],
        [ 1099464714]],
       [[-2147481592],
        [        184],
        [          1]]])

what's that?

>>> (0+x1).reshape(2,3,1)
array([[[1],
        [1],
        [1]],
       [[1],
        [1],
        [1]]])
>>> (y1*1.).reshape(2,3,1)
array([[[ 0.],
        [ 1.],
        [ 2.]],
       [[ 0.],
        [ 1.],
        [ 2.]]])
>>> (y1).reshape(2,3,1)
array([[[          0],
        [          1],
        [          2]],
       [[          0],
        [ 1099447643],
        [-2147483648]]])
>>> x1, y1, z1 = np.broadcast_arrays(x,y,z)
>>> x1[...,None]
array([[[1],
        [1],
        [1]],
       [[1],
        [1],
        [1]]])
>>> x1.shape
(2, 3)
>>> x1.reshape(2,3,1)
array([[[          1],
        [          1],
        [ 1099464730]],
       [[-2147479536],
        [ -445054780],
        [ 1063686842]]])

the background story: playing broadcasting tricks for
http://projects.scipy.org/scipy/ticket/1544

Josef
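The garbage values come from reshaping the zero-strided views that broadcast_arrays returns: all six apparent elements of x1 alias one buffer element, and reshape in numpy 1.5.x apparently failed to make the copy that such a non-contiguous layout requires. A minimal sketch of the situation and a workaround, assuming a current numpy where broadcast_arrays still returns zero-stride views (the exact garbage repr values above are from Josef's 1.5.1 session and are not reproduced here):

```python
import numpy as np

x = 1
y = np.arange(3)
z = np.arange(2)[:, None]

x1, y1, z1 = np.broadcast_arrays(x, y, z)

# The broadcast result is a view: each (2, 3) array is backed by the
# original tiny buffer, with stride 0 along every broadcast axis.
print(x1.shape)    # (2, 3)
print(x1.strides)  # (0, 0) -- all six elements alias one value

# Forcing a real, contiguous copy before reshaping sidesteps the bug;
# x1[..., None] also worked in the session above because it merely adds
# an axis to the view without relaying out memory.
safe = np.ascontiguousarray(x1).reshape(2, 3, 1)
print(safe.ravel())  # six ones
```

This is also why (0+x1).reshape(2,3,1) and (y1*1.).reshape(2,3,1) behaved in Josef's session: the arithmetic materializes an ordinary contiguous array first.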