Re: [Numpy-discussion] Generalized ufuncs?
On Aug 15, 2008, at 8:36 AM, Charles R Harris wrote: > The inline keyword also tends to be gcc/icc specific, although it > is part of the C99 standard. For reference, a page on using inline and doing so portably: http://www.greenend.org.uk/rjk/2003/03/inline.html Andrew [EMAIL PROTECTED] ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Generalized ufuncs?
On Fri, Aug 15, 2008 at 12:35 AM, Travis E. Oliphant <[EMAIL PROTECTED]> wrote:
> Travis E. Oliphant wrote:
> >> Can we fix the ticket notification mailings some day? It's been almost
> >> four months now.
> >
> That would be fabulous. So far nobody has figured out how... Jarrod??
>
> >> Re: the patch. I noticed the replacement of the signed type int by an
> >> unsigned size_t.
> >
> Where did you notice this? I didn't see it.
>
> Are you referring to Stefan's patch to Fu's _parse_signature code in
> r5654? This is a local function; I'm not sure why there is a concern.

There probably isn't a problem, but the use of unsigned types in loop counters and such can lead to subtle errors, so when a signed type is changed to an unsigned type the code has to be audited to make sure there won't be any unintended consequences.

Chuck
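[Editorial note: the caution above can be demonstrated from Python with NumPy's fixed-width unsigned types. This is a minimal sketch of the failure class, not code from the patch under discussion: an unsigned counter decremented past zero silently wraps around, which is exactly what a signed-to-unsigned audit has to rule out.]

```python
import numpy as np

# A C loop like "for (size_t i = n - 1; i >= 0; --i)" never terminates,
# because an unsigned i is always >= 0. The same wraparound is visible
# with NumPy's unsigned arrays:
counts = np.zeros(3, dtype=np.uint64)
wrapped = counts - 1          # 0 - 1 underflows instead of going negative
print(wrapped[0])             # 18446744073709551615, i.e. 2**64 - 1
```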
Re: [Numpy-discussion] Generalized ufuncs?
On Fri, Aug 15, 2008 at 12:28 AM, Travis E. Oliphant <[EMAIL PROTECTED]> wrote:
> > Can we fix the ticket notification mailings some day? It's been almost
> > four months now.
> That would be fabulous. So far nobody has figured out how... Jarrod??

> > Re: the patch. I noticed the replacement of the signed type int by an
> > unsigned size_t.
> Where did you notice this? I didn't see it.

r5654.

> > python or numpy types. The use of inline and the local declaration of
> > variables would also have been caught early in a code review.
> What do you mean by the local declaration of variables?

r5653. gcc allows variables to be declared where used rather than at the beginning of a block. I believe this is part of a recent (proposed?) standard, but will fail for other compilers. The inline keyword also tends to be gcc/icc specific, although it is part of the C99 standard.

Chuck
Re: [Numpy-discussion] Generalized ufuncs?
Travis E. Oliphant wrote:
>> Can we fix the ticket notification mailings some day? It's been almost
>> four months now.
> That would be fabulous. So far nobody has figured out how... Jarrod??

>> Re: the patch. I noticed the replacement of the signed type int by an
>> unsigned size_t.
> Where did you notice this? I didn't see it.

Are you referring to Stefan's patch to Fu's _parse_signature code in r5654? This is a local function; I'm not sure why there is a concern.

>> python or numpy types. The use of inline and the local declaration of
>> variables would also have been caught early in a code review.
> What do you mean by the local declaration of variables?

Never mind, I understand it's the mid-code declaration of variables (without a separate block defined) that Stefan fixed.

-Travis
Re: [Numpy-discussion] NumPy 1.2.0b2 released
On Aug 14, 2008, at 11:07 PM, Alan G Isaac wrote:
> Btw, numpy loads noticeably faster.

Any chance of someone reviewing my suggestions for making the import somewhat faster still?

http://scipy.org/scipy/numpy/ticket/874

Andrew
[EMAIL PROTECTED]
Re: [Numpy-discussion] Generalized ufuncs?
> Can we fix the ticket notification mailings some day? It's been almost
> four months now.

That would be fabulous. So far nobody has figured out how... Jarrod??

> Re: the patch. I noticed the replacement of the signed type int by an
> unsigned size_t.

Where did you notice this? I didn't see it.

> python or numpy types. The use of inline and the local declaration of
> variables would also have been caught early in a code review.

What do you mean by the local declaration of variables?

-Travis
Re: [Numpy-discussion] Generalized ufuncs?
On Thu, Aug 14, 2008 at 10:54 PM, Charles R Harris <[EMAIL PROTECTED]> wrote:
> Numpy 1.2 is for documentation, bug fixes, and getting the new testing
> framework in place. Discipline is called for if we are going to have timely
> releases.

First, all your points are very valid. And I apologize for the role I played in this. Thanks for calling us on it.

That said, while you are correct that this release is mainly about documentation, bug fixes, and getting the new testing framework in place, there are several other things that have gone in. There have been a few planned API changes and even a C-API change. Travis emailed me asking where we were on the beta release and whether we should discuss including this change on the list. I contacted Stefan and asked him if he could do me a huge favor and see if we could quickly apply the patch before making the beta release. My reasoning was that this looked very good and useful and just offered something new. Stefan was hesitant, but I persisted. He didn't like that it didn't have any tests, but I said if he put it in in time for the beta he could add tests afterward. I wanted to make sure no new features got in after a beta. Also, we are already requiring recompiling with this release, so I thought now would be a good time to add it.

> We is the numpy community, not you and Travis.

Absolutely. There were several of us involved, not just Travis and Stefan. But that is no excuse. Stefan, David, Chris, and I have been trying very hard to get the beta out over the last few days and had started talking among ourselves, since we were mostly just coordinating. Taking that over to feature adding was a mistake.

> Why not wait until after the release then?

The motivation is that we are not allowing features in bugfix releases anymore. So it can't go in in 1.2.x if it isn't in 1.2.0. I also want to get several 1.2.x releases out. That means the earliest we could get it in is 1.3.0. But I would prefer not having to require recompiling extension code with every minor release.

Sorry. This was handled poorly. But I think this would still be very useful and I would like to see it get in. We were planning on releasing a 1.2.0b3 early next week. But this is it, I promise. How about we work on it and see where we are early next week. If it doesn't look good, we can pull it.

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
Re: [Numpy-discussion] Generalized ufuncs?
> Numpy 1.2 is for documentation, bug fixes, and getting the new testing
> framework in place. Discipline is called for if we are going to have
> timely releases.

We also agreed to a change in the C-API (or at least did not object too loudly). I'm in favor of minimizing that sort of change.

> Why not wait until after the release then?

The biggest reason is that the patch requires changing the C-API and we are already doing that for 1.2. I would rather not do it again for another 6 months at least. I don't think we should make the patch wait that long.

Your code review is very much appreciated.

-Travis
Re: [Numpy-discussion] Generalized ufuncs?
On Thu, Aug 14, 2008 at 11:45 PM, Stéfan van der Walt <[EMAIL PROTECTED]> wrote:
> Hi Charles
>
> 2008/8/14 Charles R Harris <[EMAIL PROTECTED]>:
>> Re: the patch. I noticed the replacement of the signed type int by an
>> unsigned size_t. This is a risky sort of thing and needs to be checked. Nor
>> is it clear we should use size_t instead of one of the python or numpy
>> types. The use of inline and the local declaration of variables would also
>> have been caught early in a code review. So I think in this case the patch
>> should have been discussed and reviewed on the list. An internal discussion
>> at Enthought doesn't serve the same purpose.
>
> I apologise for not keeping the list up to date with the progress on
> this front. The patch is such a great contribution that I wanted it
> to become part of NumPy for 1.2b3. The idea was to merge it and, once

Numpy 1.2 is for documentation, bug fixes, and getting the new testing framework in place. Discipline is called for if we are going to have timely releases.

> done, report on the list.

Wrong way around.

> As is, I am still busy fixing some bugs on
> the Windows platform and integrating unit tests. I did, however, get
> Travis to review the patch beforehand, and we will keep reviewing the
> changes made until 1.2b3 goes out.

We is the numpy community, not you and Travis.

> The patch does not influence
> current NumPy behaviour in any way -- it simply provides hooks for
> general ufuncs, which can be implemented in the future.

Why not wait until after the release then?

Chuck
Re: [Numpy-discussion] Generalized ufuncs?
Hi Charles

2008/8/14 Charles R Harris <[EMAIL PROTECTED]>:
> Re: the patch. I noticed the replacement of the signed type int by an
> unsigned size_t. This is a risky sort of thing and needs to be checked. Nor
> is it clear we should use size_t instead of one of the python or numpy
> types. The use of inline and the local declaration of variables would also
> have been caught early in a code review. So I think in this case the patch
> should have been discussed and reviewed on the list. An internal discussion
> at Enthought doesn't serve the same purpose.

I apologise for not keeping the list up to date with the progress on this front. The patch is such a great contribution that I wanted it to become part of NumPy for 1.2b3. The idea was to merge it and, once done, report on the list. As is, I am still busy fixing some bugs on the Windows platform and integrating unit tests. I did, however, get Travis to review the patch beforehand, and we will keep reviewing the changes made until 1.2b3 goes out. The patch does not influence current NumPy behaviour in any way -- it simply provides hooks for general ufuncs, which can be implemented in the future.

Thanks for your concern,

Regards
Stéfan
Re: [Numpy-discussion] Generalized ufuncs?
On Thu, Aug 14, 2008 at 9:55 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
> On Thu, Aug 14, 2008 at 22:45, Charles R Harris
> <[EMAIL PROTECTED]> wrote:
>> Stefan,
>>
>> I notice that you have merged some new ufunc infrastructure. I think these
>> sort of things should be discussed and reviewed on the list before being
>> committed. Could you explain what the purpose of these patches is? The
>> commit messages are rather skimpy.
>
> Stéfan happens to be in our offices this week, so he did discuss it
> with Travis, at least. This was actually contributed to us with
> extensive details from Wenjie Fu and Hans-Andreas Engel here:
>
> http://projects.scipy.org/scipy/numpy/ticket/887

Can we fix the ticket notification mailings some day? It's been almost four months now.

Re: the patch. I noticed the replacement of the signed type int by an unsigned size_t. This is a risky sort of thing and needs to be checked. Nor is it clear we should use size_t instead of one of the python or numpy types. The use of inline and the local declaration of variables would also have been caught early in a code review. So I think in this case the patch should have been discussed and reviewed on the list. An internal discussion at Enthought doesn't serve the same purpose.

Chuck
Re: [Numpy-discussion] NumPy 1.2.0b2 released
>> is it really necessary to label these dmg's for 10.5 only?
> No. This is done automatically by the tool used to build the mpkg.
> I'll look at changing this to 10.4, thanks for the reminder.

If the dmg name is generated from the distribution name that the python distutils makes (e.g. macosx-10.5-i386-2.5), then the following may be of note: It appears that the MACOSX_DEPLOYMENT_TARGET environment variable controls (among other things) the distutils name. I generally set mine to 10.4, or even 10.3, depending on whether anything that I'm building requires later features. (I'm pretty sure that numpy builds don't.)

Zach

On Aug 14, 2008, at 11:41 PM, Christopher Burns wrote:
> On Thu, Aug 14, 2008 at 6:45 PM, Les Schaffer
> <[EMAIL PROTECTED]> wrote:
>> is it really necessary to label these dmg's for 10.5 only?
>
> No. This is done automatically by the tool used to build the mpkg.
> I'll look at changing this to 10.4, thanks for the reminder.
>
>> will this dmg install on 10.4 if py2.5 is available?
>
> It should. Let us know otherwise.
>
> --
> Christopher Burns
> Computational Infrastructure for Research Labs
> 10 Giannini Hall, UC Berkeley
> phone: 510.643.4014
> http://cirl.berkeley.edu/
Re: [Numpy-discussion] Generalized ufuncs?
On Thu, Aug 14, 2008 at 22:45, Charles R Harris <[EMAIL PROTECTED]> wrote:
> Stefan,
>
> I notice that you have merged some new ufunc infrastructure. I think these
> sort of things should be discussed and reviewed on the list before being
> committed. Could you explain what the purpose of these patches is? The
> commit messages are rather skimpy.

Stéfan happens to be in our offices this week, so he did discuss it with Travis, at least. This was actually contributed to us with extensive details from Wenjie Fu and Hans-Andreas Engel here:

http://projects.scipy.org/scipy/numpy/ticket/887

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
[Numpy-discussion] Generalized ufuncs?
Stefan,

I notice that you have merged some new ufunc infrastructure. I think these sort of things should be discussed and reviewed on the list before being committed. Could you explain what the purpose of these patches is? The commit messages are rather skimpy.

Chuck
Re: [Numpy-discussion] NumPy 1.2.0b2 released
On Thu, Aug 14, 2008 at 6:45 PM, Les Schaffer <[EMAIL PROTECTED]> wrote:
> is it really necessary to label these dmg's for 10.5 only?

No. This is done automatically by the tool used to build the mpkg. I'll look at changing this to 10.4, thanks for the reminder.

> will this dmg install on 10.4 if py2.5 is available?

It should. Let us know otherwise.

--
Christopher Burns
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
Re: [Numpy-discussion] NumPy 1.2.0b2 released
Jarrod Millman wrote:
> Mac binary:
> https://cirl.berkeley.edu/numpy/numpy-1.2.0b2-py2.5-macosx10.5.dmg

Is it really necessary to label these dmg's for 10.5 only? I assume more than myself run 10.4 but have Python 2.5.x installed on their machine. Will this dmg install on 10.4 if py2.5 is available?

Thanks,
Les
Re: [Numpy-discussion] reading *big* inhomogenous text matrices *fast*?
On Thu, 14 Aug 2008 04:40:16 +, Daniel Lenski wrote:
> I assume that list-of-arrays is more memory-efficient since array
> elements don't have the overhead of full-blown Python objects. But
> list-of-lists is probably more time-efficient since I think it's faster
> to convert the whole array at once than do it row-by-row.
>
> Dan

Just a follow-up... Well, I tried the simple, straightforward list-of-lists approach and it's the fastest. About 20 seconds for 1.5 million cells on my machine:

    def _read_cells(self, f, n, debug=False):
        cells = dict()
        for i in xrange(n):
            cell = f.readline().split()
            celltype = cell.pop(2)
            if celltype not in cells:
                cells[celltype] = []
            cells[celltype].append(cell)
        for k in cells:
            cells[k] = N.array(cells[k], dtype=int).T
        return cells

List-of-arrays uses about 20% less memory, but is about 4-5 times slower (presumably due to the overhead of array creation?). And my preallocation approach is 2-3 times slower than list-of-lists. Again, I *think* this is due to array creation/conversion overhead, when assigning to a slice of the array:

    def _read_cells2(self, f, n, debug=False):
        cells = dict()
        count = dict()
        curtype = None
        for i in xrange(n):
            cell = f.readline().split()
            celltype = cell[2]
            if celltype != curtype:
                curtype = celltype
                if curtype not in cells:
                    cells[curtype] = N.empty((n-i, len(cell)-1), dtype=int)
                    count[curtype] = 0
                block = cells[curtype]
            block[count[curtype]] = cell[:2] + cell[3:]  ### THIS LINE HERE
            count[curtype] += 1
        for k in cells:
            cells[k] = cells[k][:count[k]].T
        return cells

So my conclusion is... you guys are right. List-of-lists is the fastest way to build up an array. Then do the string-to-numeric and list-to-array conversion ALL AT ONCE with numpy.array(list_of_lists, dtype=int).

Thanks for all the insight!

Dan
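[Editorial note: Dan's conclusion can be seen in miniature with a self-contained sketch. The two input lines and field layout below are hypothetical, chosen to mirror the code above (cell type in column 2); only the pattern -- accumulate strings, convert once -- is the point.]

```python
import numpy as np

lines = ["0 10 tri 1 2 3",   # hypothetical cell records:
         "1 11 tri 4 5 6"]   # id, flag, type, node indices

rows = []
for line in lines:
    cell = line.split()
    cell.pop(2)              # drop the cell-type column
    rows.append(cell)        # keep the strings; convert later

# one-shot string->int conversion of the whole nested list
arr = np.array(rows, dtype=int).T
print(arr.shape)             # (5, 2): one column per cell
```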
Re: [Numpy-discussion] NumPy 1.2.0b2 released
Jarrod Millman wrote:
> Hey,
>
> NumPy 1.2.0b2 is now available. Please test this so that we can
> uncover any problems ASAP.
>
> Windows binary:
> http://www.enthought.com/~gvaroquaux/numpy-1.2.0b2-win32.zip

As well as the ones from Alan, if you add the "-O" for optimise flag to your python, there is still the interpreter crash as well as seeing some extra failures.

-Jon

c:\python25\python -O -c "import numpy; numpy.test()"
Running unit tests for numpy
NumPy version 1.2.0b2
NumPy is installed in c:\python25\lib\site-packages\numpy
Python version 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)]
nose version 0.10.3
F.......S
Ignoring "Python was built with Visual Studio 2003; extensions must be built with a compiler than can generate compatible binaries. Visual Studio 2003 was not found on this system. If you have Cygwin installed, you can try compiling with MingW32, by passing "-c mingw32" to setup.py." (one should fix me in fcompiler/compaq.py)
.F.F.F.F.F.
==
FAIL: Convolve should raise an error for empty input array.
--
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\core\tests\test_regression.py", line 626, in test_convolve_empty
    self.failUnlessRaises(AssertionError,np.convolve,[],[1])
AssertionError: AssertionError not raised
==
FAIL: Test two arrays with different shapes are found not equal.
--
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 46, in test_array_diffshape
    self._test_not_equal(a, b)
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 18, in _test_not_equal
    raise AssertionError("a and b are found equal but are not")
AssertionError: a and b are found equal but are not
==
FAIL: Test two different array of rank 1 are found not equal.
--
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 32, in test_array_rank1_noteq
    self._test_not_equal(a, b)
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 18, in _test_not_equal
    raise AssertionError("a and b are found equal but are not")
AssertionError: a and b are found equal but are not
==
FAIL: Test two arrays with different shapes are found not equal.
--
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 46, in test_array_diffshape
    self._test_not_equal(a, b)
  File "C:\Python25\Lib\site-packages\numpy\testing\tests\test_utils.py", line 18, in _test_not_equal
    raise AssertionError("a and b are found equal but are not")
AssertionError: a and b are found equal but are not
==
FAIL: Test two different array of rank 1 are found not equal.
Re: [Numpy-discussion] NumPy 1.2.0b2 released
Btw, numpy loads noticeably faster.

Alan
Re: [Numpy-discussion] NumPy 1.2.0b2 released
Two odd failures in test_print.py. Platform: Win XP SP3 on Intel T2600.

Alan Isaac

>>> np.test()
Running unit tests for numpy
NumPy version 1.2.0b2
NumPy is installed in C:\Python25\lib\site-packages\numpy
Python version 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)]
nose version 0.11.0
....FF........S....
Ignoring "Python was built with Visual Studio 2003; extensions must be built with a compiler than can generate compatible binaries. Visual Studio 2003 was not found on this system. If you have Cygwin installed, you can try compiling with MingW32, by passing "-c mingw32" to setup.py." (one should fix me in fcompiler/compaq.py)
.....................
==
FAIL: Check formatting.
--
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\core\tests\test_print.py", line 28, in test_complex_types
    assert_equal(str(t(x)), str(complex(x)))
  File "C:\Python25\Lib\site-packages\numpy\testing\utils.py", line 180, in assert_equal
    assert desired == actual, msg
AssertionError:
Items are not equal:
 ACTUAL: '(0+5.9287877500949585e-323j)'
 DESIRED: '(1+0j)'
==
FAIL: Check formatting.
--
Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\numpy\core\tests\test_print.py", line 16, in test_float_types
    assert_equal(str(t(x)), str(float(x)))
  File "C:\Python25\Lib\site-packages\numpy\testing\utils.py", line 180, in assert_equal
    assert desired == actual, msg
AssertionError:
Items are not equal:
 ACTUAL: '0.0'
 DESIRED: '1.0'
--
Ran 1567 tests in 8.234s

FAILED (SKIP=1, failures=2)
>>>
[Numpy-discussion] NumPy 1.2.0b2 released
Hey,

NumPy 1.2.0b2 is now available. Please test this so that we can uncover any problems ASAP.

SVN tag:
http://svn.scipy.org/svn/numpy/tags/1.2.0b2

Mac binary:
https://cirl.berkeley.edu/numpy/numpy-1.2.0b2-py2.5-macosx10.5.dmg

Windows binary:
http://www.enthought.com/~gvaroquaux/numpy-1.2.0b2-win32.zip

Source tarball:
https://cirl.berkeley.edu/numpy/numpy-1.2.0b2.tar.gz

Thanks,

--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
Re: [Numpy-discussion] reading *big* inhomogenous text matrices *fast*?
On Thu, Aug 14, 2008 at 11:51, Christopher Barker <[EMAIL PROTECTED]> wrote:
> One other potential downside of using python lists to accumulate numbers
> is that you are storing python objects (python ints or floats, or...)
> rather than raw numbers, which has got to incur some memory overhead.
>
> How does array.array perform in this context?

Pretty well for 1D arrays, at least.

> It has an append() method,
> and one would hope it uses a similar memory allocation scheme.

It does.

> Also, does numpy convert array.array objects to numpy arrays more
> efficiently? It could, of course, but someone would have to have written
> the special case code.

It does. array.array() exposes the Python buffer interface.

--
Robert Kern
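[Editorial note: a minimal sketch of the pattern described above -- accumulate into an array.array, then hand it to NumPy through the buffer interface. The names and data are illustrative, not from the thread.]

```python
import array
import numpy as np

buf = array.array('d')           # raw C doubles, no per-item Python objects
for x in range(1000):
    buf.append(x * 0.5)          # amortized O(1) append, like a list

# NumPy reads the buffer directly -- no per-element conversion loop
arr = np.frombuffer(buf, dtype=np.float64)
```

Note that frombuffer shares memory with the array.array rather than copying it; use np.array(buf) instead if an independent copy is wanted.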
Re: [Numpy-discussion] min() of array containing NaN
Anne Archibald:
> Sadly, it's not possible without extra overhead. Specifically: the
> NaN-ignorant implementation does a single comparison between each
> array element and a placeholder, and decides based on the result which
> to keep.

Did my example code go through? The test for NaN only needs to be done when a new min value is found, which will occur something like O(log(n)) times in a randomly distributed array. (Here's the hand-waving: the first element requires a NaN check. The second has a 1/2 chance of being the new minimum. The third has a 1/3 chance, etc. The sum of the harmonic series goes as O(ln(n)).) This depends on a double inversion, so the test for a new min value and the test for NaN occur at the same time. Here's pseudocode:

    best = array[0]
    if isnan(best):
        return best
    for item in array[1:]:
        if !(best <= item):
            best = item
            if isnan(best):
                return best
    return best

> If you're willing to do two tests, sure, but that's overhead (and
> probably comparable to an isnan).

In Python the extra inversion costs an extra PVM instruction. In C, by comparison, the resulting assembly code for "best > item" and "!(best <= item)" have identical lengths, with no real performance difference. There's no extra cost for doing the extra inversion in the common case, and for large arrays the ratio of (NaN check) / (no check) -> 1.0.

> What do compilers' min builtins do with NaNs? This might well be
> faster than an if statement even in the absence of NaNs...

This comes from a g++ implementation of min:

    /**
     *  @brief This does what you think it does.
     *  @param  a  A thing of arbitrary type.
     *  @param  b  Another thing of arbitrary type.
     *  @return   The lesser of the parameters.
     *
     *  This is the simple classic generic implementation.  It will work on
     *  temporary expressions, since they are only evaluated once, unlike a
     *  preprocessor macro.
     */
    template<typename _Tp>
    inline const _Tp&
    min(const _Tp& __a, const _Tp& __b)
    {
      // concept requirements
      __glibcxx_function_requires(_LessThanComparableConcept<_Tp>)
      //return __b < __a ? __b : __a;
      if (__b < __a)
        return __b;
      return __a;
    }

The isnan function in another version of gcc uses a bunch of #defines, leading to:

    static __inline__ int __inline_isnanf( float __x ) { return __x != __x; }
    static __inline__ int __inline_isnand( double __x ) { return __x != __x; }
    static __inline__ int __inline_isnan( long double __x ) { return __x != __x; }

Andrew
[EMAIL PROTECTED]
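[Editorial note: the pseudocode above translates directly into runnable Python. This is a sketch of the inverted-comparison trick, not NumPy's implementation; the function name is mine. `not best <= item` is true both when item is smaller than best and when item is NaN, so a single branch catches both cases.]

```python
import math

def min_with_nan(values):
    """Min that propagates NaN, checking isnan only on new minima."""
    best = values[0]
    if math.isnan(best):
        return best
    for item in values[1:]:
        if not best <= item:      # item < best, or item is NaN
            best = item
            if math.isnan(best):  # reached ~O(log n) times on random data
                return best
    return best

print(min_with_nan([3.0, 1.0, 2.0]))           # 1.0
print(min_with_nan([3.0, float('nan'), 2.0]))  # nan
```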
Re: [Numpy-discussion] Different results from repeated calculation, part 2
On Thu, Aug 14, 2008 at 11:29 AM, Bruce Southey <[EMAIL PROTECTED]> wrote:
> Keith Goodman wrote:
>> I get slightly different results when I repeat a calculation.
>>
>> I've seen this problem before (it went away but has returned):
>>
>> http://projects.scipy.org/pipermail/numpy-discussion/2007-January/025724.html
>>
>> A unit test is attached. It contains three tests:
>>
>> In test1, I construct matrices x and y and then repeatedly calculate z
>> = calc(x,y). The result z is the same every time. So this test passes.
>>
>> In test2, I construct matrices x and y each time before calculating z
>> = calc(x,y). Sometimes z is slightly different. But the x's test to be
>> equal and so do the y's. This test fails (on Debian Lenny, Core 2 Duo,
>> with libatlas3gf-sse2 but not with libatlas3gf-sse).
>>
>> test3 is the same as test2 but I calculate z like this: z =
>> calc(100*x,y) / (100 * 100). This test passes.
>>
>> I get:
>>
>> ==
>> FAIL: repeatability #2
>> --
>> Traceback (most recent call last):
>>   File "/home/[snip]/test/repeat_test.py", line 73, in test_repeat_2
>>     self.assert_(result, msg)
>> AssertionError: Max difference = 2.04946e-16
>> --
>>
>> Should a unit test like this be added to numpy?
>
> Hi,
> In the function 'test_repeat_2' you are redefining variables 'x and y'
> that were first defined using the setup function. (Also, you are not
> using the __init__ function.) I vaguely recall there are some quirks to
> Python classes with this, so does the problem go away if you use
> 'a,b' instead of 'x, y'? (I suspect the answer is yes given test_repeat_3).
>
> Note that you should also test that 'x' and 'y' are the same here as well
> (but these have been redefined...).
> Otherwise, can you please provide your OS (version), computer processor,
> Python version, numpy version, version of atlas (or similar) and
> compiler used?
>
> I went back and reread the thread but I could not see this information.

Here's a test that doesn't use classes and checks that x and y do not change:

http://projects.scipy.org/pipermail/numpy-discussion/attachments/20070127/52b3a51c/attachment.py

I'm using binaries from Debian Lenny:

$ uname -a
Linux jan 2.6.25-2-686 #1 SMP Fri Jul 18 17:46:56 UTC 2008 i686 GNU/Linux
$ python -V
Python 2.5.2
>> numpy.__version__
'1.1.0'
$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 15
model name  : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
stepping    : 6
cpu MHz     : 2402.004
cache size  : 4096 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 2
fdiv_bug    : no
hlt_bug     : no
f00f_bug    : no
coma_bug    : no
fpu         : yes
fpu_exception : yes
cpuid level : 10
wp          : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips    : 4807.45
clflush size : 64

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 15
model name  : Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
stepping    : 6
cpu MHz     : 2402.004
cache size  : 4096 KB
physical id : 0
siblings    : 2
core id     : 1
cpu cores   : 2
fdiv_bug    : no
hlt_bug     : no
f00f_bug    : no
coma_bug    : no
fpu         : yes
fpu_exception : yes
cpuid level : 10
wp          : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips    : 4750.69
clflush size : 64
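[Editorial note: the 2.04946e-16 maximum difference reported above is on the scale of one ULP for values near 1, which is consistent with the ATLAS SSE2 kernels reordering floating-point operations between code paths. Floating-point addition is not associative, as this minimal sketch (my example, unrelated to ATLAS internals) shows:]

```python
a = (0.1 + 0.2) + 0.3   # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)   # 0.6
print(a == b)           # False: summation order changes the last bit
print(abs(a - b))       # ~1.1e-16, the same scale as the test failure
```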
Re: [Numpy-discussion] Mac OSX 4-way universal Re: [Pythonmac-SIG] python 2.6 trunk
The 4-way universal install of numpy-1.1.1 is working now with the Python 2.6b2+ (trunk:65678), and all the tests pass (running as i386 and x86_64 at least). Unfortunately, I didn't find exactly what was causing it. I just erased /Library/Frameworks/Python64.framework and rebuilt the 4-way universal python and numpy module again, because I had noticed that the numpy build directory kept coming up with this structure:

% ls build
lib.macosx-10.5-universal-2.6    scripts.macosx-10.5-universal-2.6
src.macosx-10.3-i386-2.6         temp.macosx-10.5-universal-2.6

Apparently something in my python framework was left over from a previous install and causing the src.macosx-10.3-i386-2.6 to get built, which seems to be related to distutils deciding that I was cross-compiling. Anyway, thanks for the help and sorry for the trouble. I guess one should always erase the python framework before re-installing it? Should I be running an uninstall script instead of just erasing it?

Chris

On 8/13/08 6:08 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> On 2008-08-13, David Cournapeau <[EMAIL PROTECTED]> wrote:
>> On Wed, Aug 13, 2008 at 4:20 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
>>>
>>> Hmm. Odd. I can't find the string "Can't install when cross-compiling"
>>> anywhere in the numpy or Python sources. Can you try again with the
>>> environment variable DISTUTILS_DEBUG=1 set?
>>
>> You can find it in python svn: the message seems python 2.6 specific.
>
> Okay, it looks like this happens when distutils.util.get_platform()
> and the build command's plat_name are different. Chris, can you do the
> following and show me the output?
>
> $ python setup.py build --help
> ...
> $ python -c "from distutils import util; print util.get_platform()"
> ...
>
> Probably a workaround is to do
>
> $ python setup.py build --plat-name=... install
>
> where ... is whatever the output of the second command above gives.
___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Different results from repeated calculation, part 2
Keith Goodman wrote:
> I get slightly different results when I repeat a calculation.
>
> I've seen this problem before (it went away but has returned):
>
> http://projects.scipy.org/pipermail/numpy-discussion/2007-January/025724.html
>
> A unit test is attached. It contains three tests:
>
> In test1, I construct matrices x and y and then repeatedly calculate z
> = calc(x,y). The result z is the same every time. So this test passes.
>
> In test2, I construct matrices x and y each time before calculating z
> = calc(x,y). Sometimes z is slightly different. But the x's test to be
> equal and so do the y's. This test fails (on Debian Lenny, Core 2 Duo,
> with libatlas3gf-sse2 but not with libatlas3gf-sse).
>
> test3 is the same as test2 but I calculate z like this: z =
> calc(100*x,y) / (100 * 100). This test passes.
>
> I get:
>
> ======================================================================
> FAIL: repeatability #2
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/[snip]/test/repeat_test.py", line 73, in test_repeat_2
>     self.assert_(result, msg)
> AssertionError: Max difference = 2.04946e-16
>
> ----------------------------------------------------------------------
>
> Should a unit test like this be added to numpy?

Hi,

In the function 'test_repeat_2' you are redefining the variables 'x' and 'y' that were first defined using the setup function. (Also, you are not using the __init__ function.) I vaguely recall there are some quirks to Python classes with this, so does the problem go away if you use 'a, b' instead of 'x, y'? (I suspect the answer is yes, given test_repeat_3.)

Note that you should also test that 'x' and 'y' are the same here as well (but these have been redefined...).

Otherwise, can you please provide your OS (version), computer processor, Python version, numpy version, version of atlas (or similar) and compiler used? I went back and reread the thread but I could not see this information.
Bruce ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Different results from repeated calculation, part 2
Hi,

Am 14.08.2008 um 19:48 schrieb Alok Singhal:
> On 14/08/08: 10:20, Keith Goodman wrote:
>> A unit test is attached. It contains three tests:
>>
>> In test1, I construct matrices x and y and then repeatedly calculate z
>> = calc(x,y). The result z is the same every time. So this test passes.
>>
>> In test2, I construct matrices x and y each time before calculating z
>> = calc(x,y). Sometimes z is slightly different. But the x's test to be
>> equal and so do the y's. This test fails (on Debian Lenny, Core 2 Duo,
>> with libatlas3gf-sse2 but not with libatlas3gf-sse).
>>
>> test3 is the same as test2 but I calculate z like this: z =
>> calc(100*x,y) / (100 * 100). This test passes.
>>
>> I get:
>>
>> ======================================================================
>> FAIL: repeatability #2
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>   File "/home/[snip]/test/repeat_test.py", line 73, in test_repeat_2
>>     self.assert_(result, msg)
>> AssertionError: Max difference = 2.04946e-16
>
> Could this be because of how the calculations are done? If the
> floating point numbers are stored in the cpu registers, in this case
> (intel core duo), they are 80-bit values, whereas 'double' precision
> is 64-bits. Depending upon gcc's optimization settings, the amount of
> automatic variables, etc., it is entirely possible that the numbers
> are stored in registers only in some cases, and are in the RAM in
> other cases. Thus, in your tests, sometimes some numbers get stored
> in the cpu registers, making the calculations with those values
> different from the case if they were not stored in the registers.

The tests never fail on my Core 2 Duo on MacOS X, just for the record ;)

Holger

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] min() of array containing NaN
On 2008-08-14, Joe Harrington <[EMAIL PROTECTED]> wrote:
>> I'm doing nothing. Someone else must volunteer.
>
> Fair enough. Would the code be accepted if contributed?

Like I said, I would be amenable to such a change. The other developers haven't weighed in on this particular proposal, but I suspect they will agree with me.

>> There is a reasonable design rule that if you have a boolean argument
>> which you expect to only be passed literal Trues and Falses, you
>> should instead just have two different functions.
>
> Robert, can you list some reasons to favor this design rule?

nanmin(x) vs. min(x, nan=True)

A boolean argument that will almost always take literal Trues and Falses basically is just a switch between different functionality. The usual mechanism for the programmer to pick between different functionality is to use the appropriate function. The "=True" is extraneous, and puts important semantic information last rather than at the front.

> Here are some reasons to favor richly functional routines:
>
> User's code is more readable because subtle differences affect args,
> not functions

This isn't subtle.

> Easier learning for new users

You have no evidence of this.

> Much briefer and more readable docs

Briefer is possible. More readable is debatable. "Much" is overstating the case.

> Similar behavior across languages

This is not, has never been, and never will be a goal. Similar behavior happens because of convergent design constraints and occasionally laziness, never for its own sake.

> Smaller number of functions in the core package (a recent list topic)

In general, this is a reasonable concern that must be traded off against the other concerns. In this particular case, it has no weight: nanmin() and nanmax() already exist.

> Many fewer routines to maintain, particularly if multiple switches exist

Again, in this case, neither of these is relevant. Yes, if there are multiple boolean switches, it might make sense to fold them all into the same function.
Typically, such switches affect the semantics only in minor details, too.

> Availability of the NaN functionality in a method of ndarray

Point, but see below.

> The last point is key. The NaN behavior is central to analyzing real
> data containing unavoidable bad values, which is the bread and butter
> of a substantial fraction of the user base. In the languages they're
> switching from, handling NaNs is just part of doing business, and is
> an option of every relevant routine; there's no need for redundant
> sets of routines. In contrast, numpy appears to consider data
> analysis to be secondary, somehow, to pure math, and takes the NaN
> functionality out of routines like min() and std(). This means it's
> not possible to use many ndarray methods. If we're ready to handle a
> NaN by returning it, why not enable the more useful behavior of
> ignoring it, at user discretion?

Let's get something straight. numpy has no opinion on the primacy of data analysis tasks versus "pure math", however you want to define those. Now, the numpy developers *do* tend to have an opinion on how NaNs are used. NaNs were invented to handle invalid results of *computations*. They were not invented as place markers for missing data. They can frequently be used as such because the IEEE-754 semantics of NaNs sometimes works for missing data (e.g. in z=x+y, z will have a NaN wherever either x or y have NaNs). But at least as frequently, they don't, and other semantics need to be specifically placed on top of them (e.g. nanmin()). numpy is a general purpose computational tool that needs to apply to many different fields and use cases. Consequently, when presented with a choice like this, we tend to go for the path that makes the minimum of assumptions and overlaid semantics.

Now to address the idea that all of the relevant ndarray methods should take nan=True arguments. I am sympathetic to the idea that we should have the functionality somewhere.
I do doubt that the users you are thinking about will be happy adding nan=True to a substantial fraction of their calls. My experience with such APIs is that it gets tedious real fast. Instead, I would suggest that if you want a wide range of nan-skipping versions of functions that we have, let's put them all as functions into a module. This gives the programmer the possibility of using relatively clean calls. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
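For min/max, Robert's suggested split already has a concrete counterpart in numpy itself: nanmin()/nanmax() skip NaNs, while the plain reduction lets a NaN propagate into the result (the clean propagating behavior shown here is what later numpy versions settled on, not the buggy 4.0 result discussed elsewhere in this thread):

```python
import numpy as np

x = np.array([0.0, 1.0, np.nan, 4.0])

# nan-skipping function: the NaN placeholder is ignored entirely
assert np.nanmin(x) == 0.0

# plain reduction: the NaN propagates into the result
assert np.isnan(np.min(x))
```

The two calls spell out the two semantics as two functions, which is exactly the design rule being argued for above.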
Re: [Numpy-discussion] Different results from repeated calculation, part 2
On 14/08/08: 10:20, Keith Goodman wrote:
> A unit test is attached. It contains three tests:
>
> In test1, I construct matrices x and y and then repeatedly calculate z
> = calc(x,y). The result z is the same every time. So this test passes.
>
> In test2, I construct matrices x and y each time before calculating z
> = calc(x,y). Sometimes z is slightly different. But the x's test to be
> equal and so do the y's. This test fails (on Debian Lenny, Core 2 Duo,
> with libatlas3gf-sse2 but not with libatlas3gf-sse).
>
> test3 is the same as test2 but I calculate z like this: z =
> calc(100*x,y) / (100 * 100). This test passes.
>
> I get:
>
> ======================================================================
> FAIL: repeatability #2
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/[snip]/test/repeat_test.py", line 73, in test_repeat_2
>     self.assert_(result, msg)
> AssertionError: Max difference = 2.04946e-16

Could this be because of how the calculations are done? If the floating point numbers are stored in the cpu registers, in this case (intel core duo), they are 80-bit values, whereas 'double' precision is 64-bits. Depending upon gcc's optimization settings, the amount of automatic variables, etc., it is entirely possible that the numbers are stored in registers only in some cases, and are in the RAM in other cases. Thus, in your tests, sometimes some numbers get stored in the cpu registers, making the calculations with those values different from the case if they were not stored in the registers.

See "The pitfalls of verifying floating-point computations" at http://portal.acm.org/citation.cfm?doid=1353445.1353446 (or, if that needs a subscription, you can download the PDF from http://arxiv.org/abs/cs/0701192). The paper has a lot of examples of surprises like this. Quote:

  We shall discuss the following myths, among others:
  ...
  - "Arithmetic operations are deterministic; that is, if I do z=x+y in
    two places in the same program and my program never touches x and y
    in the meantime, then the results should be the same."
  - A variant: "If x < 1 tests true at one point, then x < 1 stays true
    later if I never modify x."
  ...

-Alok

--
Alok Singhal
http://www.astro.virginia.edu/~as8ca/

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
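Alok's larger point — that mathematically equivalent evaluation orders need not agree in floating point — doesn't even require register effects; plain double arithmetic is already non-associative:

```python
a, b, c = 1e16, -1e16, 1.0

# (a + b) first: the inner sum is exactly 0.0, so the result is 1.0
left = (a + b) + c

# (b + c) first: 1.0 is below the rounding step near 1e16, so b + c
# rounds back to -1e16 and the result is 0.0
right = a + (b + c)

assert left == 1.0
assert right == 0.0
assert left != right
```

Whenever a compiler or BLAS kernel reorders a sum (e.g. SSE2 vs. x87 code paths, as suspected in this thread), differences of this kind at the last-bit level are exactly what one should expect.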
[Numpy-discussion] Different results from repeated calculation, part 2
I get slightly different results when I repeat a calculation.

I've seen this problem before (it went away but has returned):

http://projects.scipy.org/pipermail/numpy-discussion/2007-January/025724.html

A unit test is attached. It contains three tests:

In test1, I construct matrices x and y and then repeatedly calculate z = calc(x,y). The result z is the same every time. So this test passes.

In test2, I construct matrices x and y each time before calculating z = calc(x,y). Sometimes z is slightly different. But the x's test to be equal and so do the y's. This test fails (on Debian Lenny, Core 2 Duo, with libatlas3gf-sse2 but not with libatlas3gf-sse).

test3 is the same as test2 but I calculate z like this: z = calc(100*x,y) / (100 * 100). This test passes.

I get:

======================================================================
FAIL: repeatability #2
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/[snip]/test/repeat_test.py", line 73, in test_repeat_2
    self.assert_(result, msg)
AssertionError: Max difference = 2.04946e-16

----------------------------------------------------------------------

Should a unit test like this be added to numpy?

"""Unit tests for repeatability.

The problem and possible causes are discussed in the following thread:

http://projects.scipy.org/pipermail/numpy-discussion/2007-January/025724.html

If the tests do not pass, then try removing ATLAS-SSE2 from your system. But similar problems (not tested) may occur unless you remove all versions of ATLAS (SSE and BASE, for example).
""" import unittest import numpy.matlib as M M.seterr(divide='ignore') M.seterr(invalid='ignore') def calc(x, y): return x * (x.T * y) def load(): # x data x = M.zeros((3,3)) x[0,0] = 0.00301404794991108 x[0,1] = 0.0026474226678 x[0,2] = -0.00112705028731085 x[1,0] = 0.0228605377994491 x[1,1] = 0.00337153112741583 x[1,2] = -0.00823674912992519 x[2,0] = 0.00447839875836716 x[2,1] = 0.00274880280576514 x[2,2] = -0.00161133933606597 # y data y = M.zeros((3,1)) y[0,0] = 0.000885398 y[1,0] = 0.00667193 y[2,0] = 0.000324727 return x, y class Test_repeat(unittest.TestCase): "Test repeatability" def setUp(self): self.nsim = 100 self.x, self.y = load() def test_repeat_1(self): "repeatability #1" z0 = calc(self.x, self.y) msg = '' result = True for i in xrange(self.nsim): z = calc(self.x, self.y) if (z != z0).any(): msg = 'Max difference = %g' % abs((z - z0)/z0).max() result = False break self.assert_(result, msg) def test_repeat_2(self): "repeatability #2" z0 = calc(self.x, self.y) msg = '' result = True for i in xrange(self.nsim): x, y = load() z = calc(x, y) if (z != z0).any(): msg = 'Max difference = %g' % abs((z - z0)/z0).max() result = False break self.assert_(result, msg) def test_repeat_3(self): "repeatability #3" z0 = calc(100*self.x, self.y) / (100 * 100) msg = '' result = True for i in xrange(self.nsim): x, y = load() z = calc(100*x, y) / (100 * 100) if (z != z0).any(): msg = 'Max difference = %g' % abs((z - z0)/z0).max() result = False break self.assert_(result, msg) def testsuite(): unit = unittest.TestLoader().loadTestsFromTestCase s = [] s.append(unit(Test_repeat)) return unittest.TestSuite(s) def run(): suite = testsuite() unittest.TextTestRunner(verbosity=2).run(suite) ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] reading *big* inhomogenous text matrices *fast*?
One other potential downside of using python lists to accumulate numbers is that you are storing python objects (python ints or floats, or...) rather than raw numbers, which has got to incur some memory overhead. How does array.array perform in this context? It has an append() method, and one would hope it uses a similar memory allocation scheme. Also, does numpy convert array.array objects to numpy arrays more efficiently? It could, of course, but someone would have to have written the special case code. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception [EMAIL PROTECTED] ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
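The pattern Chris describes can be sketched like this — array.array stores raw C doubles and has an append() method, and numpy can adopt the resulting buffer without touching each element (np.frombuffer is my choice of conversion here, not something tested in the thread):

```python
import array
import numpy as np

# accumulate raw C doubles instead of boxed Python float objects
buf = array.array('d')
for value in (1.5, 2.5, 3.5):
    buf.append(value)

# view the accumulated buffer as a numpy array via the buffer protocol
arr = np.frombuffer(buf, dtype=np.float64)

assert arr.shape == (3,)
assert list(arr) == [1.5, 2.5, 3.5]
```

Whether the append()-side allocation is as smart as list's, and whether np.array() special-cases array.array on input, would still need the kind of benchmarking discussed earlier in the thread.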
Re: [Numpy-discussion] min() of array containing NaN
2008/8/14 Norbert Nemec <[EMAIL PROTECTED]>:
> Travis E. Oliphant wrote:
>> NAN's don't play well with comparisons because comparison with them is
>> undefined. See numpy.nanmin
>
> This is not true! Each single comparison with a NaN has a well defined
> outcome. The difficulty is only that certain logical assumptions do not
> hold any more when NaNs are involved (e.g. [A<B] <=> [not(A>=B)]).
> Assuming an IEEE compliant processor and C compiler, it should be
> possible to code a NaN safe min routine without additional overhead.

Sadly, it's not possible without extra overhead. Specifically: the NaN-ignorant implementation does a single comparison between each array element and a placeholder, and decides based on the result which to keep. If you try to rewrite the comparison to do the right thing when a NaN is involved, you get stuck: any comparison with a NaN on either side always returns False, so you cannot distinguish between the temporary being a NaN and the new element being a non-NaN (keep the temporary), and the temporary being a non-NaN and the new element being a NaN (replace the temporary). If you're willing to do two tests, sure, but that's overhead (and probably comparable to an isnan). If you're willing to do arithmetic you might even be able to pull it off, since NaNs tend to propagate.

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
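The single-comparison argument above is easy to see in a pure-Python sketch of the NaN-ignorant loop (names mine): because every ordered comparison involving a NaN is False, whether the NaN "wins" depends only on whether it happens to be sitting in the temporary:

```python
def naive_min(values):
    # one comparison per element, as in the NaN-ignorant C loop
    temp = values[0]
    for new in values[1:]:
        if new < temp:      # always False when either side is NaN
            temp = new
    return temp

nan = float('nan')

r1 = naive_min([nan, 0.0, 1.0])   # NaN lands in temp and never loses
r2 = naive_min([0.0, nan, 1.0])   # NaN never wins a comparison

assert r1 != r1    # r1 is NaN (NaN compares unequal to itself)
assert r2 == 0.0   # the NaN was silently skipped
```

Getting either consistent semantics — always propagate, or always skip — requires a second test per element, which is exactly the overhead being discussed.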
Re: [Numpy-discussion] unique1d returning indices
Stéfan van der Walt wrote: > 2008/8/13 Robert Cimrman <[EMAIL PROTECTED]>: >>> Yeah, that's why I think not many people used the extra return anyway. >>> I will do as you say unless somebody steps in. >> ... but not before August 25, as I am about to leave on holidays and >> have not managed to do it yet. I do not want to mess with the SVN now as >> I would not be able to follow it. >> >> If you think the patch is ok, and have time, then go ahead :) > > Thanks, Robert. This has been merged in r5639. Nice, thank you. After I return I will go through the other arraysetops functions and add return_index-like flags when appropriate. r. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
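For readers following along with a current numpy, the return_index behavior merged here survives in np.unique (the arraysetops function unique1d was later folded into it); a quick sketch:

```python
import numpy as np

a = np.array([3, 1, 3, 2, 1])

# u is the sorted unique values; idx holds positions in `a` whose
# elements reproduce u, i.e. a[idx] == u
u, idx = np.unique(a, return_index=True)

assert list(u) == [1, 2, 3]
assert list(a[idx]) == [1, 2, 3]
```

The extra index array is what makes it possible to recover, say, the original (unsorted) first occurrence of each value — the use case the flag was added for.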
Re: [Numpy-discussion] min() of array containing NaN
Travis E. Oliphant wrote:
> Thomas J. Duck wrote:
>
>> Determining the minimum value of an array that contains NaN produces
>> a surprising result:
>>
>> >>> x = numpy.array([0,1,2,numpy.nan,4,5,6])
>> >>> x.min()
>> 4.0
>>
>> I expected 0.0. Is this the intended behaviour or a bug? I am using
>> numpy 1.1.1.
>
> NAN's don't play well with comparisons because comparison with them is
> undefined. See numpy.nanmin

This is not true! Each single comparison with a NaN has a well defined outcome. The difficulty is only that certain logical assumptions do not hold any more when NaNs are involved (e.g. [A<B] <=> [not(A>=B)]). Assuming an IEEE compliant processor and C compiler, it should be possible to code a NaN safe min routine without additional overhead.

___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
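The point made above — every individual comparison with a NaN is well defined (it is False), but familiar equivalences break — can be checked directly in Python, whose floats follow IEEE 754:

```python
nan = float('nan')

# each ordered comparison has a well defined (False) outcome
assert (nan < 1.0) is False
assert (nan >= 1.0) is False
assert (nan == nan) is False

# ...but A < B is no longer equivalent to not(A >= B)
A, B = nan, 1.0
assert (A < B) != (not (A >= B))
```

It is this broken equivalence, not any undefinedness, that trips up a min loop written with a single comparison per element.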