Re: [Numpy-discussion] Embedded NumPy LAPACK errors
On Sat, Jan 5, 2013 at 1:03 PM, Robin robi...@gmail.com wrote:
 If not, is there a reasonable way to build numpy.linalg such that it interfaces with MKL correctly ?

I managed to get this to work in the end. Since Matlab uses MKL with the ILP64 interface, it is not possible to get Numpy to use that without modifications to all the lapack calls. However, I was able to keep the two different versions of lapack separate.

The first step is to build numpy to link statically against MKL. I wasn't sure how to get distutils to do this, so I copied all the mkl static .a libraries to a temporary directory and pointed numpy to that to force the issue (so dynamic linking wasn't an option). Even with that it still uses the Lapack from the Matlab dynamic global symbols. The trick was adding the linker flag -Bsymbolic, which means lapack_lite calls to lapack use the statically linked local copies. With these changes everything appears to work. There are two test failures (below) which do not appear when running the same Numpy build outside of Matlab, but they don't seem so severe. So:

[robini@robini2-pc numpy]$ cat site.cfg
[mkl]
search_static_first = true
library_dirs = /tmp/intel64
include_dirs = /opt/intel/mkl/include
#mkl_libs = mkl_sequential, mkl_intel_lp64, mkl_core, mkl_lapack95_lp64, mkl_blas95_lp64
mkl_libs = mkl_lapack95, mkl_blas95, mkl_intel_lp64, mkl_sequential, mkl_core, svml, imf, irc
lapack_libs =

[robini@robini2-pc numpy]$ ls /tmp/intel64/
libimf.a                       libmkl_gnu_thread.a
libirc.a                       libmkl_intel_ilp64.a
libmkl_blacs_ilp64.a           libmkl_intel_lp64.a
libmkl_blacs_intelmpi_ilp64.a  libmkl_intel_sp2dp.a
libmkl_blacs_intelmpi_lp64.a   libmkl_intel_thread.a
libmkl_blacs_lp64.a            libmkl_lapack95_ilp64.a
libmkl_blacs_openmpi_ilp64.a   libmkl_lapack95_lp64.a
libmkl_blacs_openmpi_lp64.a    libmkl_pgi_thread.a
libmkl_blacs_sgimpt_ilp64.a    libmkl_scalapack_ilp64.a
libmkl_blacs_sgimpt_lp64.a     libmkl_scalapack_lp64.a
libmkl_blas95_ilp64.a          libmkl_sequential.a
libmkl_blas95_lp64.a           libmkl_solver_ilp64.a
libmkl_cdft_core.a             libmkl_solver_ilp64_sequential.a
libmkl_core.a                  libmkl_solver_lp64.a
libmkl_gf_ilp64.a              libmkl_solver_lp64_sequential.a
libmkl_gf_lp64.a               libsvml.a

in numpy/distutils/intelccompiler.py:

class IntelEM64TCCompiler(UnixCCompiler):
    """ A modified Intel x86_64 compiler compatible with a 64bit gcc built Python.
    """
    compiler_type = 'intelem'
    cc_exe = 'icc -m64 -fPIC'
    cc_args = '-fPIC'
    def __init__(self, verbose=0, dry_run=0, force=0):
        UnixCCompiler.__init__(self, verbose, dry_run, force)
        self.cc_exe = 'icc -m64 -fPIC -O3 -fomit-frame-pointer'
        compiler = self.cc_exe
        self.set_executables(compiler=compiler,
                             compiler_so=compiler,
                             compiler_cxx=compiler,
                             linker_exe=compiler,
                             linker_so=compiler + ' -shared -static-intel -Bsymbolic')

Test failures (test_special_values also fails outside Matlab, but the other 2 only occur when embedded):

======================================================================
FAIL: test_umath.test_nextafterl
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/epd-7.3/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/robini/slash/lib/python2.7/site-packages/numpy/testing/decorators.py", line 215, in knownfailer
    return f(*args, **kwargs)
  File "/home/robini/slash/lib/python2.7/site-packages/numpy/core/tests/test_umath.py", line 1123, in test_nextafterl
    return _test_nextafter(np.longdouble)
  File "/home/robini/slash/lib/python2.7/site-packages/numpy/core/tests/test_umath.py", line 1108, in _test_nextafter
    assert np.nextafter(one, two) - one == eps
AssertionError

======================================================================
FAIL: test_umath.test_spacingl
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/epd-7.3/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/robini/slash/lib/python2.7/site-packages/numpy/testing/decorators.py", line 215, in knownfailer
    return f(*args, **kwargs)
  File "/home/robini/slash/lib/python2.7/site-packages/numpy/core/tests/test_umath.py", line 1149, in test_spacingl
    return _test_spacing(np.longdouble)
  File "/home/robini/slash/lib/python2.7/site-packages/numpy/core/tests/test_umath.py", line 1132, in _test_spacing
    assert np.spacing(one) == eps
AssertionError

======================================================================
FAIL: test_special_values (test_umath_complex.TestClog)
----------------------------------------------------------------------
Traceback (most recent call last):
  File
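As a quick cross-check of a build like the one above (a sketch added here, not part of the original message, and using only standard NumPy calls): np.show_config() reports which BLAS/LAPACK libraries numpy was built against, and a small eigendecomposition exercises the LAPACK path at run time, which is where the wrong-library symbols would show up.

import numpy as np

# Should list the static MKL libraries from site.cfg rather than
# whatever LAPACK Matlab loads dynamically (build-time info only).
np.show_config()

# Tiny numerical sanity check of the LAPACK path: eigh of a symmetric
# matrix should return eigenpairs satisfying A.dot(v) == v * w.
a = np.array([[2.0, 1.0],
              [1.0, 2.0]])
w, v = np.linalg.eigh(a)
assert np.allclose(a.dot(v), v * w)
print("eigenvalues:", w)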
Re: [Numpy-discussion] Which Python to use for Mac binaries
On Mon, Jan 7, 2013 at 7:31 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Jan 6, 2013 at 10:40 PM, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: On Sun, Jan 6, 2013 at 2:04 AM, Ralf Gommers ralf.gomm...@gmail.com wrote: Which exact Python do we need to use on Mac? Do we need to use the binary installer from python.org? Yes, the one from python.org. Or can I install it from source? you could install from source using the same method that the python.org binaries are built -- there is a script with the source to do that, though I'm not sure what the point of that would be. The 10.3 installers for 2.5, 2.6 and 2.7 should be compiled on OS X 10.5. It would be great to continue support for that, though I wonder how many people still need it -- I don't think Apple supports 10.5 anymore, for instance. The 10.7 -> 10.6 support hasn't been checked, but I wouldn't trust it. I have a 10.6 machine, so I can compile those binaries if needed. That would be better, but it would also be nice to check how building on 10.7 works. Avoid using system Python for anything. The first thing to do on any new OS X system is install Python some other way, preferably from python.org. +1 Last note: bdist_mpkg is unmaintained and doesn't support Python 3.x. Most recent version is at: https://github.com/matthew-brett/bdist_mpkg, for previous numpy releases I've used that at commit e81a58a471 There has been recent discussion on the pythonmac list about this -- some waffling about how important it is -- though I think it would be good to keep it up to date. I updated my fork of bdist_mpkg with Python 3k support. It doesn't have any tests that I could see, but I've run it on python 2.6 and 3.2 and 3.3 on one of my packages as a first pass. If we want 3.x binaries, then we should fix that or (preferably) build binaries with Bento. Bento has grown support for mpkg's; I'm not sure how robust that is. So maybe bento is a better route than bdist_mpkg -- this is worth discussion on the pythonmac list. David - can you give a status update on that? It is more a starting point than anything else, and barely tested. I would advise against using it ATM. thanks, David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Which Python to use for Mac binaries
On Mon, Jan 7, 2013 at 11:36 PM, Ralf Gommers ralf.gomm...@gmail.com wrote: On Tue, Jan 8, 2013 at 3:12 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote: On Sun, Jan 6, 2013 at 2:04 AM, Ralf Gommers ralf.gomm...@gmail.com wrote: On Sun, Jan 6, 2013 at 3:21 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote: Hi, Currently the NumPy binaries are built using the pavement.py script, which uses the following Pythons:

MPKG_PYTHON = {
    "2.5": ["/Library/Frameworks/Python.framework/Versions/2.5/bin/python"],
    "2.6": ["/Library/Frameworks/Python.framework/Versions/2.6/bin/python"],
    "2.7": ["/Library/Frameworks/Python.framework/Versions/2.7/bin/python"],
    "3.1": ["/Library/Frameworks/Python.framework/Versions/3.1/bin/python3"],
    "3.2": ["/Library/Frameworks/Python.framework/Versions/3.2/bin/python3"],
    "3.3": ["/Library/Frameworks/Python.framework/Versions/3.3/bin/python3"],
}

So for example I can easily create the 2.6 binary if that Python is pre-installed on the Mac box that I am using. On one of the Mac boxes that I am using, the 2.7 is missing, so are 3.1, 3.2 and 3.3. So I was thinking of updating my Fabric fab file to automatically install all Pythons from source and build against that, just like I do for Wine. Which exact Python do we need to use on Mac? Do we need to use the binary installer from python.org? Yes, the one from python.org. Or can I install it from source? Finally, for which Python versions should we provide binary installers for Mac? For reference, the 1.6.2 release had installers for 2.5, 2.6 and 2.7 only for OS X 10.3. There is only a 2.7 version for OS X 10.6. The provided installers and naming scheme should match what's done for Python itself on python.org. The 10.3 installers for 2.5, 2.6 and 2.7 should be compiled on OS X 10.5. This is kind of hard to come by these days, but Vincent Davis maintains a build machine for numpy and scipy. That's already set up correctly, so all you have to do is connect to it via ssh, check out v1.7.0 in ~/Code/numpy, check in release.sh that the section for OS X 10.6 is disabled and for 10.5 enabled, and run it. OS X 10.6 broke support for previous versions in some subtle ways, so even when using the 10.4 SDK, numpy compiled on 10.6 won't run on 10.5. As long as we're supporting 10.5 you therefore need to compile on it. The 10.7 -> 10.6 support hasn't been checked, but I wouldn't trust it. I have a 10.6 machine, so I can compile those binaries if needed. Also, what is the meaning of the following piece of code in pavement.py:

def _build_mpkg(pyver):
    # account for differences between Python 2.7.1 versions from python.org
    if os.environ.get('MACOSX_DEPLOYMENT_TARGET', None) == "10.6":
        ldflags = "-undefined dynamic_lookup -bundle -arch i386 -arch x86_64 -Wl,-search_paths_first"
    else:
        ldflags = "-undefined dynamic_lookup -bundle -arch i386 -arch ppc -Wl,-search_paths_first"
    ldflags += " -L%s" % os.path.join(os.path.dirname(__file__), "build")

The 10.6 binaries support only Intel Macs, both 32-bit and 64-bit. The 10.3 binaries support PPC Macs and 32-bit Intel. That's what the above does. Note that we simply follow the choice made by the Python release managers here.

    if pyver == "2.5":
        sh("CC=gcc-4.0 LDFLAGS='%s' %s setupegg.py bdist_mpkg" % (ldflags, " ".join(MPKG_PYTHON[pyver])))
    else:
        sh("LDFLAGS='%s' %s setupegg.py bdist_mpkg" % (ldflags, " ".join(MPKG_PYTHON[pyver])))

This is necessary because in Python 2.5, distutils asks for gcc instead of gcc-4.0, so you may get the wrong one without CC=gcc-4.0. From Python 2.6 on this was fixed.
In particular, the last line gets executed and it then fails with:

paver dmg -p 2.6
---> pavement.dmg
---> pavement.clean
LDFLAGS='-undefined dynamic_lookup -bundle -arch i386 -arch ppc -Wl,-search_paths_first -Lbuild' /Library/Frameworks/Python.framework/Versions/2.6/bin/python setupegg.py bdist_mpkg
Traceback (most recent call last):
  File "setupegg.py", line 17, in <module>
    from setuptools import setup
ImportError: No module named setuptools

The reason is (I think) that if the Python binary is called explicitly with /Library/Frameworks/Python.framework/Versions/2.6/bin/python, then the paths are not set up properly in the virtualenv, and thus setuptools (which is only installed in the virtualenv, but not in system Python) fails to import. The solution is to simply apply this patch:

Avoid using system Python for anything. The first thing to do on any new OS X system is install Python some other way, preferably from python.org.

diff --git a/pavement.py b/pavement.py
index e693016..0c637f8 100644
--- a/pavement.py
+++ b/pavement.py
@@ -449,7 +449,7 @@ def _build_mpkg(pyver):
     if pyver == "2.5":
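A quick diagnostic for this kind of failure (added here as a sketch, not something from the thread): print which interpreter and prefix are actually active, since calling the framework binary by its absolute path bypasses the virtualenv where setuptools lives.

import sys

# Under the virtualenv's `python`, sys.prefix points inside the
# virtualenv; when the framework binary is invoked by absolute path it
# points at the framework install, which has no setuptools.
print("executable:", sys.executable)
print("prefix:    ", sys.prefix)

try:
    import setuptools
    print("setuptools:", setuptools.__file__)
except ImportError:
    print("setuptools is not importable from this interpreter")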
Re: [Numpy-discussion] Which Python to use for Mac binaries
On Mon, Jan 7, 2013 at 10:23 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote: http://www.commandlinefu.com/commands/view/2031/install-an-mpkg-from-the-command-line-on-osx This requires root access. Without sudo, I get:

$ installer -pkg /Volumes/Python\ 2.7.3/Python.mpkg/ -target ondrej
installer: This package requires authentication to install.

and since I don't have root access, it doesn't work. So one way around it would be to install python from source, that shouldn't require root access. hmm -- this all may be tricky -- both the *.mpkg and the standard build put everything in /Library/Frameworks/Python -- which is where it belongs. But if you need root access to write there, then there is a problem. I'm sure a non-root build could put everything in the user's home directory, but then packages built against that would have their paths messed up. What's odd is that I'm pretty sure I've been able to point+click install those without sudo... (I could be recalling incorrectly). This would be a good question for the pythonmac list -- low traffic, but there are some very smart and helpful folks there: http://mail.python.org/mailman/listinfo/pythonmac-sig But I am not currently sure what to do with it. The Python.mpkg directory seems to contain the sources. It should be possible to unpack an mpkg by hand, but it contains both the contents and various install scripts, so that seems like a really ugly solution. -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
Hi, On Mon, Jan 7, 2013 at 10:58 PM, Andrew Collette andrew.colle...@gmail.com wrote: Hi, Taking 2) first, in this example: return self.f[dataset_name][...] + heightmap assuming it is not going to upcast, would you rather it overflow than raise an error? Why? The second seems more explicit and sensible to me. Yes, I think this (the 1.5 overflow behavior) was a bit odd, if easy to understand. For 1) - of course the upcasting in 1.6 is only going to work some of the time. For example:

In [2]: np.array([127], dtype=np.int8) * 1000
Out[2]: array([-4072], dtype=int16)

So - you'll get something, but there's a reasonable chance you won't get what you were expecting. Of course that is true for 1.5 as well, but at least the rule there is simpler and so easier - in my opinion - to think about. Part of what my first example was trying to demonstrate was that the function author assumed arrays and scalars obeyed the same rules for addition. For example, if data were int8 and heightmap were an int16 array with a max value of 32767, and the data had a max value in the same spot with e.g. 10, then the addition would overflow at that position, even with the int16 result. That's how array addition works in numpy, and as I understand it that's not slated to change. But when we have a scalar of value 32767 (which fits in int16 but not int8), we are proposing instead to do nothing under the assumption that it's an error. In summary: yes, there are some odd results, but they're consistent with the rules for addition elsewhere in numpy, and I would prefer that to treating this case as an error. I think you are voting strongly for the current casting rules, because they make it less obvious to the user that scalars are different from arrays. Returning to the question of 1.5 behavior vs the error - I think you are saying you prefer the 1.5 silent-but-deadly approach to the error, but I think I still haven't grasped why. Maybe someone else can explain it? The holiday has not been good to my brain. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
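To make the contrast concrete, here is a small illustration (not from the original message) of the two promotion behaviors being discussed; the dtypes noted in the comments are what the 1.6-era value-based rules produce, and the try/except is only there because later NumPy releases changed these rules and may reject an out-of-range Python int.

import numpy as np

a = np.array([1, 2, 3], dtype=np.int8)
b = np.array([1000], dtype=np.int16)

# Array-array promotion looks only at dtypes: int8 + int16 -> int16.
print((a + b).dtype)

# Scalar-array promotion under 1.6 looks at the scalar's *value*, so the
# result dtype changes with the value: int8, int16, int32 respectively.
for scalar in (1, 1000, 100000):
    try:
        print(scalar, "->", (a + scalar).dtype)
    except OverflowError as exc:
        print(scalar, "->", exc)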
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
Hi, I think you are voting strongly for the current casting rules, because they make it less obvious to the user that scalars are different from arrays. Maybe this is the source of my confusion... why should scalars be different from arrays? They should follow the same rules, as closely as possible. If a scalar value would fit in an int16, why not add it using the rules for an int16 array? Returning to the question of 1.5 behavior vs the error - I think you are saying you prefer the 1.5 silent-but-deadly approach to the error, but I think I still haven't grasped why. Maybe someone else can explain it? The holiday has not been good to my brain. In a strict choice between 1.5-behavior and errors, I'm not sure which one I would pick. I don't think either is particularly useful. Of course, other members of the community would likely have a different view, especially those who got used to the 1.5 behavior. Andrew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Linear least squares
Hi, I did some profiling and testing of my data-fitting code. One of its core parts is doing some linear least squares; until now I used np.linalg.lstsq. Most of the time the size of a is (250, 7) and of b is (250, 800). Today I compared it to using pinv manually, and to my surprise, it is much faster. I thought both are SVD based? To check another computer I also ran my test on wakari: https://www.wakari.io/nb/tillsten/linear_least_squares Also, using scipy.linalg instead of np.linalg is slower for both cases. My numpy and scipy are both from C. Gohlke's website. If my result is valid in general, maybe the lstsq function should be changed or a hint should be added to the documentation. greetings Till ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
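For readers who want to reproduce the comparison, a minimal benchmark sketch along the lines described above (this is not the original wakari notebook; the array contents are random and the timings will depend on the BLAS/LAPACK numpy is linked against).

import time
import numpy as np

rng = np.random.RandomState(0)
a = rng.randn(250, 7)        # design matrix, matching the sizes quoted above
b = rng.randn(250, 800)      # many right-hand sides

def timed(fn):
    t0 = time.time()
    fn()
    return time.time() - t0

def best_of(fn, repeat=20):
    # best-of-N wall-clock timing: crude, but adequate for this comparison
    return min(timed(fn) for _ in range(repeat))

# SVD-based LAPACK driver; also computes residuals, rank and singular values.
t_lstsq = best_of(lambda: np.linalg.lstsq(a, b, rcond=-1))

# Pseudo-inverse route: pinv is SVD-based too, but skips the extra work.
t_pinv = best_of(lambda: np.linalg.pinv(a).dot(b))

print("lstsq: %.4f s   pinv: %.4f s" % (t_lstsq, t_pinv))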
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
2013/1/8 Andrew Collette andrew.colle...@gmail.com: Hi, I think you are voting strongly for the current casting rules, because they make it less obvious to the user that scalars are different from arrays. Maybe this is the source of my confusion... why should scalars be different from arrays? They should follow the same rules, as closely as possible. If a scalar value would fit in an int16, why not add it using the rules for an int16 array? As I mentioned in another post, I also agree that it would make things simpler and safer to just yield the same result as if we were using a one-element array. My understanding of the motivation for the rule scalars do not upcast arrays unless they are of a fundamentally different type is that it avoids accidentally upcasting arrays in operations like x + 1 (for instance if x is a float32 array, the upcast would yield a float64 result, and if x is an int16, it would yield int64), which may waste memory. I find it a useful feature, however I'm not sure it's worth the headaches it can lead to. However, my first reaction at the idea of dropping this rule altogether is that it would lead to a long and painful deprecation process. I may be wrong though, I really haven't thought about it much. -=- Olivier ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
On 1/8/2013 1:48 PM, Olivier Delalleau wrote: As I mentioned in another post, I also agree that it would make things simpler and safer to just yield the same result as if we were using a one-element array. Yes! Anything else is going to drive people insane, especially new users. Alan Isaac ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
On 8 Jan 2013 17:24, Andrew Collette andrew.colle...@gmail.com wrote: Hi, I think you are voting strongly for the current casting rules, because they make it less obvious to the user that scalars are different from arrays. Maybe this is the source of my confusion... why should scalars be different from arrays? They should follow the same rules, as closely as possible. If a scalar value would fit in an int16, why not add it using the rules for an int16 array? The problem is that the rules for arrays - and for every other part of numpy in general - are that we *don't* pick types based on values. Numpy always uses input types to determine output types, not input values.

# This value fits in an int8
In [5]: a = np.array([1])

# And yet...
In [6]: a.dtype
Out[6]: dtype('int64')

In [7]: small = np.array([1], dtype=np.int8)

# Computing 1 + 1 doesn't need a large integer... but we use one
In [8]: (small + a).dtype
Out[8]: dtype('int64')

Python scalars have unambiguous types: a Python 'int' is a C 'long', and a Python 'float' is a C 'double'. And these are the types that np.array() converts them to. So it's pretty unambiguous that using the same rules for arrays and scalars would mean, ignore the value of the scalar, and in expressions like np.array([1], dtype=np.int8) + 1 we should always upcast to int32/int64. The problem is that this makes working with narrow types very awkward for no real benefit, so everyone pretty much seems to want *some* kind of special case. These are both absolutely special cases: numarray through 1.5: in a binary operation, if one operand has ndim==0 and the other has ndim > 0, ignore the width of the ndim==0 operand. 1.6, your proposal: in a binary operation, if one operand has ndim==0 and the other has ndim > 0, downcast the ndim==0 item to the smallest width that is consistent with its value and the other operand's type. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
On Tue, Jan 8, 2013 at 7:28 PM, Alan G Isaac alan.is...@gmail.com wrote: On 1/8/2013 1:48 PM, Olivier Delalleau wrote: As I mentioned in another post, I also agree that it would make things simpler and safer to just yield the same result as if we were using a one-element array. Yes! Anything else is going to drive people insane, especially new users. New users don't use narrow-width dtypes... it's important to remember in this discussion that in numpy, non-standard dtypes only arise when users explicitly request them, so there's some expressed intention there that we want to try and respect. (As opposed to the type associated with Python manifest constants like the 2 in 2 * a, which probably no programmer looked at and thought hmm, what I want here is 2-as-an-int64.) -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
Hi Nathaniel, (Responding to both your emails) The problem is that the rules for arrays - and for every other part of numpy in general - are that we *don't* pick types based on values. Numpy always uses input types to determine output types, not input values. Yes, of course... array operations are governed exclusively by their dtypes. It seems to me that, using the language of the bug report (2878), if we have this: result = arr + scalar I would argue that our job is, rather than to pick result.dtype, to pick scalar.dtype, and apply the normal rules for array operations. So it's pretty unambiguous that using the same rules for arrays and scalars would mean, ignore the value of the scalar, and in expressions like np.array([1], dtype=np.int8) + 1 we should always upcast to int32/int64. Ah, but that's my point: we already, in 1.6, ignore the intrinsic width of the scalar and effectively substitute one based on its value:

>>> a = np.array([1], dtype=int8)
>>> (a + 1).dtype
dtype('int8')
>>> (a + 1000).dtype
dtype('int16')
>>> (a + 9).dtype
dtype('int32')
>>> (a + 2**40).dtype
dtype('int64')

1.6, your proposal: in a binary operation, if one operand has ndim==0 and the other has ndim > 0, downcast the ndim==0 item to the smallest width that is consistent with its value and the other operand's type. Yes, exactly. I'm not trying to propose a completely new behavior: as I mentioned (although very far upthread), this is the mental model I had of how things worked in 1.6 already. New users don't use narrow-width dtypes... it's important to remember in this discussion that in numpy, non-standard dtypes only arise when users explicitly request them, so there's some expressed intention there that we want to try and respect. I would respectfully disagree. One example I cited was that when dealing with HDF5, it's very common to get int16's (and even int8's) when reading from a file because they are used to save disk space. All a new user has to do to get int8's from a file they got from someone else is:

data = some_hdf5_file['MyDataset'][...]

This is a general issue applying to data which is read from real-world external sources. For example, digitizers routinely represent their samples as int8's or int16's, and you apply a scale and offset to get a reading in volts. As you say, the proposed change will prevent accidental upcasting by people who selected int8/int16 on purpose to save memory, by notifying them with a ValueError. But another assumption we could make is that people who choose to use narrow types for performance reasons should be expected to use caution when performing operations that might upcast, and that the default behavior should be to follow the normal array rules as closely as possible, as is done in 1.6. Andrew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
On Tue, 2013-01-08 at 19:59 +, Nathaniel Smith wrote: On 8 Jan 2013 17:24, Andrew Collette andrew.colle...@gmail.com wrote: Hi, I think you are voting strongly for the current casting rules, because they make it less obvious to the user that scalars are different from arrays. Maybe this is the source of my confusion... why should scalars be different from arrays? They should follow the same rules, as closely as possible. If a scalar value would fit in an int16, why not add it using the rules for an int16 array? The problem is that rule for arrays - and for every other party of numpy in general - are that we *don't* pick types based on values. Numpy always uses input types to determine output types, not input values. # This value fits in an int8 In [5]: a = np.array([1]) # And yet... In [6]: a.dtype Out[6]: dtype('int64') In [7]: small = np.array([1], dtype=np.int8) # Computing 1 + 1 doesn't need a large integer... but we use one In [8]: (small + a).dtype Out[8]: dtype('int64') Python scalars have an unambiguous types: a Python 'int' is a C 'long', and a Python 'float' is a C 'double'. And these are the types that np.array() converts them to. So it's pretty unambiguous that using the same rules for arrays and scalars would mean, ignore the value of the scalar, and in expressions like np.array([1], dtype=np.int8) + 1 we should always upcast to int32/int64. The problem is that this makes working with narrow types very awkward for no real benefit, so everyone pretty much seems to want *some* kind of special case. These are both absolutely special cases: numarray through 1.5: in a binary operation, if one operand has ndim==0 and the other has ndim0, ignore the width of the ndim==0 operand. 1.6, your proposal: in a binary operation, if one operand has ndim==0 and the other has ndim0, downcast the ndim==0 item to the smallest width that is consistent with its value and the other operand's type. Well, that leaves the maybe not quite implausible proposal of saying that numpy scalars behave like arrays with ndim0, but python scalars behave like they do in 1.6. to allow for easier working with narrow types. Sebastian -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
On Tue, Jan 8, 2013 at 12:43 PM, Alan G Isaac alan.is...@gmail.com wrote: New users don't use narrow-width dtypes... it's important to remember 1. I think the first statement is wrong. Control over dtypes is a good reason for a new user to consider NumPy. Absolutely. Because NumPy supports broadcasting, it is natural for array-array operations and scalar-array operations to be consistent. I believe anything else will be too confusing. Theoretically true -- but in practice, the problem arises because it is easy to write literals with the standard python scalars, so one is very likely to want to do:

arr = np.zeros((m,n), dtype=np.uint8)
arr += 3

and not want an upcast. I don't think we want to require that to be spelled:

arr += np.array(3, dtype=np.uint8)

so that defines desired behaviour for array-scalar. But what should this do?

arr1 = np.zeros((m,n), dtype=np.uint8)
arr2 = np.zeros((m,n), dtype=np.uint16)
arr1 + arr2

or

arr2 + arr1

Upcast in both cases? Use the type of the left operand? Raise an exception? Matching the array-scalar approach would mean always keeping the smallest type, which is unlikely to be what is wanted. Having it be dependent on order would be really ripe for confusion. Raising an exception might have been the best idea from the beginning (though I wouldn't want that in the array-scalar case). So perhaps having a scalar/array distinction, while quite impure, is the best compromise. NOTE: no matter how you slice it, at some point reducing operations produce something different (that can no longer be reduced), so I do think it would be nice for rank-zero arrays and scalars to be the same thing (in this regard and others) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
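A concrete version of the cases listed above (added as an illustration, not part of the original email): the in-place scalar add keeps the narrow dtype, while array-array addition upcasts the same way regardless of operand order.

import numpy as np

m, n = 4, 3

arr = np.zeros((m, n), dtype=np.uint8)
arr += 3                          # in-place add with a Python scalar
print(arr.dtype)                  # stays uint8, no upcast

arr1 = np.zeros((m, n), dtype=np.uint8)
arr2 = np.zeros((m, n), dtype=np.uint16)
# Array-array promotion is symmetric, so operand order does not matter:
print((arr1 + arr2).dtype, (arr2 + arr1).dtype)   # uint16 uint16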
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
2013/1/8 Sebastian Berg sebast...@sipsolutions.net: On Tue, 2013-01-08 at 19:59 +, Nathaniel Smith wrote: On 8 Jan 2013 17:24, Andrew Collette andrew.colle...@gmail.com wrote: Hi, I think you are voting strongly for the current casting rules, because they make it less obvious to the user that scalars are different from arrays. Maybe this is the source of my confusion... why should scalars be different from arrays? They should follow the same rules, as closely as possible. If a scalar value would fit in an int16, why not add it using the rules for an int16 array? The problem is that rule for arrays - and for every other party of numpy in general - are that we *don't* pick types based on values. Numpy always uses input types to determine output types, not input values. # This value fits in an int8 In [5]: a = np.array([1]) # And yet... In [6]: a.dtype Out[6]: dtype('int64') In [7]: small = np.array([1], dtype=np.int8) # Computing 1 + 1 doesn't need a large integer... but we use one In [8]: (small + a).dtype Out[8]: dtype('int64') Python scalars have an unambiguous types: a Python 'int' is a C 'long', and a Python 'float' is a C 'double'. And these are the types that np.array() converts them to. So it's pretty unambiguous that using the same rules for arrays and scalars would mean, ignore the value of the scalar, and in expressions like np.array([1], dtype=np.int8) + 1 we should always upcast to int32/int64. The problem is that this makes working with narrow types very awkward for no real benefit, so everyone pretty much seems to want *some* kind of special case. These are both absolutely special cases: numarray through 1.5: in a binary operation, if one operand has ndim==0 and the other has ndim0, ignore the width of the ndim==0 operand. 1.6, your proposal: in a binary operation, if one operand has ndim==0 and the other has ndim0, downcast the ndim==0 item to the smallest width that is consistent with its value and the other operand's type. Well, that leaves the maybe not quite implausible proposal of saying that numpy scalars behave like arrays with ndim0, but python scalars behave like they do in 1.6. to allow for easier working with narrow types. I know I already said it, but I really think it'd be a bad idea to have a different behavior between Python scalars and Numpy scalars, because I think most people would expect them to behave the same (when knowing what dtype is a Python float / int). It could lead to very tricky bugs to handle them differently. -=- Olivier ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
2013/1/8 Chris Barker - NOAA Federal chris.bar...@noaa.gov: On Tue, Jan 8, 2013 at 12:43 PM, Alan G Isaac alan.is...@gmail.com wrote: New users don't use narrow-width dtypes... it's important to remember 1. I think the first statement is wrong. Control over dtypes is a good reason for a new user to consider NumPy. Absolutely. Because NumPy supports broadcasting, it is natural for array-array operations and scalar-array operations to be consistent. I believe anything else will be too confusing. Theoretically true -- but in practice, the problem arises because it is easy to write literals with the standard python scalars, so one is very likely to want to do:

arr = np.zeros((m,n), dtype=np.uint8)
arr += 3

and not want an upcast. Note that the behavior with in-place operations is also an interesting topic, but slightly different, since there is no ambiguity on the dtype of the output (which is required to match that of the input). I was actually thinking about this earlier today but decided not to mention it yet to avoid making the discussion even more complex ;) The key question is whether the operand should be cast before the operation, or whether to perform the operation in an upcasted array, then downcast it back into the original version. I actually think the latter makes more sense (and that's actually what's being done I think in 1.6.1 from a few tests I tried), and to me this is an argument in favor of the upcast behavior for non-inplace operations. -=- Olivier ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
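A tiny check of the unambiguous part (a sketch added here, not from the original message): for in-place operations the output dtype is pinned to the array being modified, so whatever intermediate precision is used, the stored result keeps the narrow type and wraps on overflow.

import numpy as np

a = np.array([32760, 5], dtype=np.int16)
a += 10                    # output dtype is fixed by `a`, not by the scalar

# Whether the sum was computed directly in int16 or in a wider temporary
# and then cast back (the question raised above) is not visible here:
# either way the first element wraps around.
print(a.dtype, a)          # int16 [-32766     15]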
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
On 01/08/2013 06:20 PM, Andrew Collette wrote: Hi, I think you are voting strongly for the current casting rules, because they make it less obvious to the user that scalars are different from arrays. Maybe this is the source of my confusion... why should scalars be different from arrays? They should follow the same rules, as closely Scalars (as in, Python float/int) are inherently different because the user didn't specify a dtype. For an array, there was always *some* point where the user chose, explicitly or implicitly, a dtype. as possible. If a scalar value would fit in an int16, why not add it using the rules for an int16 array? So you are saying that, for an array x, you want x + random.randint(10) to produce an array with a random dtype? So that after carefully testing that your code works, suddenly a different draw (or user input, or whatever) causes a different set of dtypes to ripple through your entire program? To me this is something that must be avoided at all costs. It's hard enough to reason about the code one writes without throwing in complete randomness (by which I mean, types determined by values). Dag Sverre Returning to the question of 1.5 behavior vs the error - I think you are saying you prefer the 1.5 silent-but-deadly approach to the error, but I think I still haven't grasped why. Maybe someone else can explain it? The holiday has not been good to my brain. In a strict choice between 1.5-behavior and errors, I'm not sure which one I would pick. I don't think either is particularly useful. Of course, other members of the community would likely have a different view, especially those who got used to the 1.5 behavior. Andrew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
On 01/08/2013 10:32 PM, Dag Sverre Seljebotn wrote: On 01/08/2013 06:20 PM, Andrew Collette wrote: Hi, I think you are voting strongly for the current casting rules, because they make it less obvious to the user that scalars are different from arrays. Maybe this is the source of my confusion... why should scalars be different from arrays? They should follow the same rules, as closely Scalars (as in, Python float/int) are inherently different because the user didn't specify a dtype. For an array, there was always *some* point where the user chose, explicitly or implicitly, a dtype. as possible. If a scalar value would fit in an int16, why not add it using the rules for an int16 array? So you are saying that, for an array x, you want x + random.randint(10) to produce an array with a random dtype? So that after carefully testing that your code works, suddenly a different draw (or user input, or whatever) causes a different set of dtypes to ripple through your entire program? To me this is something that must be avoided at all costs. It's hard enough to reason about the code one writes without throwing in complete randomness (by which I mean, types determined by values). Oh, sorry, given that this is indeed the present behaviour, this just sounds silly. I should have said it's something I dislike about the present behaviour then. Dag Sverre Dag Sverre Returning to the question of 1.5 behavior vs the error - I think you are saying you prefer the 1.5 silent-but-deadly approach to the error, but I think I still haven't grasped why. Maybe someone else can explain it? The holiday has not been good to my brain. In a strict choice between 1.5-behavior and errors, I'm not sure which one I would pick. I don't think either is particularly useful. Of course, other members of the community would likely have a different view, especially those who got used to the 1.5 behavior. Andrew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
Hi Dag, So you are saying that, for an array x, you want x + random.randint(10) to produce an array with a random dtype? Under the proposed behavior, depending on the dtype of x and the value from random, this would sometimes add-with-rollover and sometimes raise ValueError. Under the 1.5 behavior, it would always add-with-rollover and preserve the type of x. Under the 1.6 behavior, it produces a range of dtypes, each of which is at least large enough to hold the random int. Personally, I prefer the third option, but I strongly prefer either the second or the third to the first. Andrew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Linear least squares
On Tue, Jan 8, 2013 at 1:17 PM, Till Stensitz mail.t...@gmx.de wrote: Hi, i did some profiling and testing of my data-fitting code. One of its core parts is doing some linear least squares, until now i used np.linalg.lstsq. Most of time the size a is (250, 7) and of b is (250, 800). My guess is that this depends a lot on the shape try a is (1, 7) and b is (1, 1) Josef Today i compared it to using pinv manually, to my surprise, it is much faster. I taught, both are svd based? Too check another computer i also run my test on wakari: https://www.wakari.io/nb/tillsten/linear_least_squares Also using scipy.linalg instead of np.linalg is slower for both cases. My numpy and scipy are both from C. Gohlkes website. If my result is valid in general, maybe the lstsq function should be changed or a hint should be added to the documentation. greetings Till ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
On Tuesday, January 8, 2013, Andrew Collette wrote: Hi Dag, So you are saying that, for an array x, you want x + random.randint(10) to produce an array with a random dtype? Under the proposed behavior, depending on the dtype of x and the value from random, this would sometimes add-with-rollover and sometimes raise ValueError. Under the 1.5 behavior, it would always add-with-rollover and preserve the type of x. Under the 1.6 behavior, it produces a range of dtypes, each of which is at least large enough to hold the random int. Personally, I prefer the third option, but I strongly prefer either the second or the third to the first. Andrew Keep in mind that in the third option (current 1.6 behavior) the dtype is large enough to hold the random number, but not necessarily to hold the result. So for instance if x is an int16 array with only positive values, the result of this addition may contain negative values (or not, depending on the number being drawn). That's the part I feel is flawed with this behavior, it is quite unpredictable. -=- Olivier ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
Hi, Keep in mind that in the third option (current 1.6 behavior) the dtype is large enough to hold the random number, but not necessarily to hold the result. So for instance if x is an int16 array with only positive values, the result of this addition may contain negative values (or not, depending on the number being drawn). That's the part I feel is flawed with this behavior, it is quite unpredictable. Yes, certainly. But in either the proposed or 1.5 behavior, if the values in x are close to the limits of the type, this can happen also. Andrew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
On Tuesday, January 8, 2013, Andrew Collette wrote: Hi, Keep in mind that in the third option (current 1.6 behavior) the dtype is large enough to hold the random number, but not necessarily to hold the result. So for instance if x is an int16 array with only positive values, the result of this addition may contain negative values (or not, depending on the number being drawn). That's the part I feel is flawed with this behavior, it is quite unpredictable. Yes, certainly. But in either the proposed or 1.5 behavior, if the values in x are close to the limits of the type, this can happen also. My previous email may not have been clear enough, so to be sure: in my above example, if the random number is 3, then the result may contain negative values (int16). If the random number is 5, then the result will only contain positive values (upcast to int32). Do you believe it is a good behavior? -=- Olivier ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Linear least squares
On Tue, Jan 8, 2013 at 6:17 PM, Till Stensitz mail.t...@gmx.de wrote: Hi, i did some profiling and testing of my data-fitting code. One of its core parts is doing some linear least squares, until now i used np.linalg.lstsq. Most of time the size a is (250, 7) and of b is (250, 800). Today i compared it to using pinv manually, to my surprise, it is much faster. I taught, both are svd based? np.linalg.lstsq is written in Python (calling LAPACK for the SVD), so you could run the line_profiler over it and see where the slowdown is. An obvious thing is that it always computes residuals, which could be costly; if your pinv code isn't doing that then it's not really comparable. (Though might still be well-suited for your actual problem.) Depending on how well-conditioned your problems are, and how much speed you need, there are faster ways than pinv as well. (Going via qr might or might not, going via cholesky almost certainly will be.) -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
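To spell out the Cholesky/normal-equations route mentioned above (a sketch, not code from the thread): for a well-conditioned 250x7 problem one can solve (a.T a) x = a.T b instead of calling lstsq, with the usual caveat that the normal equations square the condition number.

import numpy as np

rng = np.random.RandomState(0)
a = rng.randn(250, 7)
b = rng.randn(250, 800)

# Normal equations: the 7x7 Gram matrix is symmetric positive definite,
# so a Cholesky-based solve applies (np.linalg.solve uses LU; for the
# last bit of speed scipy.linalg.cho_factor/cho_solve could be used).
gram = a.T.dot(a)          # (7, 7)
rhs = a.T.dot(b)           # (7, 800)
x_ne = np.linalg.solve(gram, rhs)

# Agrees with the SVD-based solution for well-conditioned problems.
x_svd = np.linalg.lstsq(a, b, rcond=-1)[0]
print(np.allclose(x_ne, x_svd))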
Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?
Hi Olivier, Yes, certainly. But in either the proposed or 1.5 behavior, if the values in x are close to the limits of the type, this can happen also. My previous email may not have been clear enough, so to be sure: in my above example, if the random number is 3, then the result may contain negative values (int16). If the random number is 5, then the result will only contain positive values (upcast to int32). Do you believe it is a good behavior? Under the proposed behavior, if the random number is 3, then you *still* may have negative values, and if it's 5, you get ValueError. No, I don't think the behavior you outlined is particularly nice, but (1) it's consistent with array addition elsewhere, at least in my mind, and (2) I don't think that sometimes getting a ValueError is a big improvement. Although I still prefer automatic upcasting, this discussion is really making me see the value of a nice, simple rule like in 1.5. :) Andrew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Linear least squares
On Tue, Jan 8, 2013 at 11:17 AM, Till Stensitz mail.t...@gmx.de wrote: Hi, i did some profiling and testing of my data-fitting code. One of its core parts is doing some linear least squares, until now i used np.linalg.lstsq. Most of time the size a is (250, 7) and of b is (250, 800). Today i compared it to using pinv manually, to my surprise, it is much faster. I taught, both are svd based? Too check another computer i also run my test on wakari: https://www.wakari.io/nb/tillsten/linear_least_squares Also using scipy.linalg instead of np.linalg is slower for both cases. My numpy and scipy are both from C. Gohlkes website. If my result is valid in general, maybe the lstsq function should be changed or a hint should be added to the documentation. Do you know if both are using Atlas (MKL)? Numpy will compile a default unoptimized version if there is no Atlas (or MKL). Also, lstsq is a direct call to an LAPACK least squares function, so the underlying functions themselves are probably different for lstsq and pinv. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Bug with ufuncs made with frompyfunc
A bug causing errors with using methods of ufuncs created with frompyfunc was mentioned on the list over a year ago: http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058501.html Is there any word on the status of this bug? I wasn't able to find a ticket in the bug tracker. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
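For context, the kind of call the linked thread describes looks roughly like the sketch below (an illustration added here, not a confirmed reproduction; whether the ufunc methods raise depends on the NumPy version).

import numpy as np

# frompyfunc wraps a plain Python function as a ufunc over object arrays.
add = np.frompyfunc(lambda x, y: x + y, 2, 1)

print(add(np.arange(5), 10))          # element-wise call works

# The report is about ufunc *methods* such as reduce/accumulate erroring:
try:
    print(add.reduce(np.arange(5)))
    print(add.accumulate(np.arange(5)))
except Exception as exc:              # raised on affected versions
    print("ufunc method failed:", exc)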