Re: [Numpy-discussion] Embedded NumPy LAPACK errors

2013-01-08 Thread Robin
On Sat, Jan 5, 2013 at 1:03 PM, Robin robi...@gmail.com wrote:
 If not, is there a reasonable way to build numpy.linalg such that
 it interfaces with MKL correctly ?

I managed to get this to work in the end. Since Matlab uses MKL with the
ILP64 interface, it is not possible to get Numpy to use that without
modifying all of the LAPACK calls. However, I was able to keep the
two different versions of LAPACK separate.

The first step is to build numpy to link statically against MKL. I
wasn't sure how to get distutils to do this so I copied all the MKL
static .a libraries to a temporary directory and pointed numpy to that
to force the issue (so dynamic linking wasn't an option).

Even with that, it still picked up the LAPACK symbols that Matlab exports
globally from its own dynamic libraries. The trick was adding the linker
flag -Bsymbolic, which makes the lapack_lite calls to LAPACK resolve to the
statically linked local copies. With these changes everything appears to
work. There are two test failures (below) which do not appear when running
the same Numpy build outside of Matlab, but they don't seem too severe.

So:
[robini@robini2-pc numpy]$ cat site.cfg
[mkl]
search_static_first = true
library_dirs = /tmp/intel64
include_dirs = /opt/intel/mkl/include
#mkl_libs = mkl_sequential, mkl_intel_lp64, mkl_core, mkl_lapack95_lp64, mkl_blas95_lp64
mkl_libs = mkl_lapack95, mkl_blas95, mkl_intel_lp64, mkl_sequential, mkl_core, svml, imf, irc
lapack_libs =

[robini@robini2-pc numpy]$ ls /tmp/intel64/
libimf.a   libmkl_gnu_thread.a
libirc.a   libmkl_intel_ilp64.a
libmkl_blacs_ilp64.a   libmkl_intel_lp64.a
libmkl_blacs_intelmpi_ilp64.a  libmkl_intel_sp2dp.a
libmkl_blacs_intelmpi_lp64.a   libmkl_intel_thread.a
libmkl_blacs_lp64.a            libmkl_lapack95_ilp64.a
libmkl_blacs_openmpi_ilp64.a   libmkl_lapack95_lp64.a
libmkl_blacs_openmpi_lp64.a    libmkl_pgi_thread.a
libmkl_blacs_sgimpt_ilp64.a    libmkl_scalapack_ilp64.a
libmkl_blacs_sgimpt_lp64.a libmkl_scalapack_lp64.a
libmkl_blas95_ilp64.a  libmkl_sequential.a
libmkl_blas95_lp64.a   libmkl_solver_ilp64.a
libmkl_cdft_core.a libmkl_solver_ilp64_sequential.a
libmkl_core.a  libmkl_solver_lp64.a
libmkl_gf_ilp64.a  libmkl_solver_lp64_sequential.a
libmkl_gf_lp64.a   libsvml.a

in numpy/distutils/intelccompiler.py:
class IntelEM64TCCompiler(UnixCCompiler):
    """ A modified Intel x86_64 compiler compatible with a 64bit gcc built Python. """
    compiler_type = 'intelem'
    cc_exe = 'icc -m64 -fPIC'
    cc_args = '-fPIC'
    def __init__(self, verbose=0, dry_run=0, force=0):
        UnixCCompiler.__init__(self, verbose, dry_run, force)
        self.cc_exe = 'icc -m64 -fPIC -O3 -fomit-frame-pointer'
        compiler = self.cc_exe
        self.set_executables(compiler=compiler,
                             compiler_so=compiler,
                             compiler_cxx=compiler,
                             linker_exe=compiler,
                             linker_so=compiler + ' -shared -static-intel -Bsymbolic')
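
A quick way to sanity-check such a build from the Matlab-embedded interpreter
(a minimal sketch, not part of the original recipe):

import numpy as np

np.show_config()                       # which BLAS/LAPACK numpy was built against
np.linalg.svd(np.random.rand(50, 50))  # exercises lapack_lite, which should now
np.linalg.eig(np.random.rand(50, 50))  # resolve to the statically linked MKL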

Test failures (test_special_values also fails outside Matlab, but the
other 2 only occur when embedded):
==
FAIL: test_umath.test_nextafterl
--
Traceback (most recent call last):
  File "/opt/epd-7.3/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/robini/slash/lib/python2.7/site-packages/numpy/testing/decorators.py", line 215, in knownfailer
    return f(*args, **kwargs)
  File "/home/robini/slash/lib/python2.7/site-packages/numpy/core/tests/test_umath.py", line 1123, in test_nextafterl
    return _test_nextafter(np.longdouble)
  File "/home/robini/slash/lib/python2.7/site-packages/numpy/core/tests/test_umath.py", line 1108, in _test_nextafter
    assert np.nextafter(one, two) - one == eps
AssertionError

==
FAIL: test_umath.test_spacingl
--
Traceback (most recent call last):
  File "/opt/epd-7.3/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/robini/slash/lib/python2.7/site-packages/numpy/testing/decorators.py", line 215, in knownfailer
    return f(*args, **kwargs)
  File "/home/robini/slash/lib/python2.7/site-packages/numpy/core/tests/test_umath.py", line 1149, in test_spacingl
    return _test_spacing(np.longdouble)
  File "/home/robini/slash/lib/python2.7/site-packages/numpy/core/tests/test_umath.py", line 1132, in _test_spacing
    assert np.spacing(one) == eps
AssertionError

==
FAIL: test_special_values (test_umath_complex.TestClog)
--
Traceback (most recent call last):
  File 

Re: [Numpy-discussion] Which Python to use for Mac binaries

2013-01-08 Thread David Cournapeau
On Mon, Jan 7, 2013 at 7:31 AM, Matthew Brett matthew.br...@gmail.com wrote:
 Hi,

 On Sun, Jan 6, 2013 at 10:40 PM, Chris Barker - NOAA Federal
 chris.bar...@noaa.gov wrote:
 On Sun, Jan 6, 2013 at 2:04 AM, Ralf Gommers ralf.gomm...@gmail.com wrote:
 Which exact Python do we need to use on Mac? Do we need to use the
 binary installer from python.org?

 Yes, the one from python.org.

 Or can I install it from source?

 you could install from source using the same method that the
 python.org binaries are built with -- there is a script with the source to
 do that, though I'm not sure what the point of that would be.

 The 10.3 installers for 2.5, 2.6 and 2.7 should be compiled on OS X 10.5.

 It would be great to continue support for that, though I wonder how
 many people still need it -- I don't think Apple supports 10.5
 anymore, for instance.

 The 10.7 -> 10.6 support hasn't been checked, but I wouldn't trust it. I
 have a 10.6 machine, so I can compile those binaries if needed.

 That would be better, but it would also be nice to check how building
 on 10.7 works.

 Avoid using system Python for anything. The first thing to do on any new OS
 X system is install Python some other way, preferably from python.org.

 +1

 Last note: bdist_mpkg is unmaintained and doesn't support Python 3.x. Most
 recent version is at: https://github.com/matthew-brett/bdist_mpkg, for
 previous numpy releases I've used that at commit e81a58a471

 There has been recent discussion on the pythonmac list about this --
 some waffling about how important it is -- though I think it would be
 good to keep it up to date.

 I updated my fork of bdist_mpkg with Python 3k support.  It doesn't
 have any tests that I could see, but I've run it on python 2.6 and 3.2
 and 3.3 on one of my packages as a first pass.

 If we want 3.x binaries, then we should fix that or (preferably) build
 binaries with Bento. Bento has grown support for mpkg's; I'm not sure how
 robust that is.

 So maybe bento is a better route than bdist_mpkg -- this is worth
 discussion on the pythonmac list.

 David - can you give a status update on that?

It is more a starting point than anything else, and barely tested. I
would advise against using it ATM.

thanks,
David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Which Python to use for Mac binaries

2013-01-08 Thread Ondřej Čertík
On Mon, Jan 7, 2013 at 11:36 PM, Ralf Gommers ralf.gomm...@gmail.com wrote:



 On Tue, Jan 8, 2013 at 3:12 AM, Ondřej Čertík ondrej.cer...@gmail.com
 wrote:

 On Sun, Jan 6, 2013 at 2:04 AM, Ralf Gommers ralf.gomm...@gmail.com
 wrote:
 
 
 
  On Sun, Jan 6, 2013 at 3:21 AM, Ondřej Čertík ondrej.cer...@gmail.com
  wrote:
 
  Hi,
 
  Currently the NumPy binaries are built using the pavement.py script,
  which uses the following Pythons:
 
   MPKG_PYTHON = {
       "2.5": ["/Library/Frameworks/Python.framework/Versions/2.5/bin/python"],
       "2.6": ["/Library/Frameworks/Python.framework/Versions/2.6/bin/python"],
       "2.7": ["/Library/Frameworks/Python.framework/Versions/2.7/bin/python"],
       "3.1": ["/Library/Frameworks/Python.framework/Versions/3.1/bin/python3"],
       "3.2": ["/Library/Frameworks/Python.framework/Versions/3.2/bin/python3"],
       "3.3": ["/Library/Frameworks/Python.framework/Versions/3.3/bin/python3"],
   }
 
  So for example I can easily create the 2.6 binary if that Python is
  pre-installed on the Mac box that I am using.
  On one of the Mac boxes that I am using, the 2.7 is missing, so are
  3.1, 3.2 and 3.3. So I was thinking
  of updating my Fabric fab file to automatically install all Pythons
  from source and build against that, just like I do for Wine.
 
  Which exact Python do we need to use on Mac? Do we need to use the
  binary installer from python.org?
 
 
  Yes, the one from python.org.
 
 
  Or can I install it from source? Finally, for which Python versions
  should we provide binary installers for Mac?
   For reference, the 1.6.2 release had installers for 2.5, 2.6 and 2.7 only for
   OS X 10.3. There is only a 2.7 version for OS X 10.6.
 
 
  The provided installers and naming scheme should match what's done for
  Python itself on python.org.
 
  The 10.3 installers for 2.5, 2.6 and 2.7 should be compiled on OS X
  10.5.
  This is kind of hard to come by these days, but Vincent Davis maintains
  a
  build machine for numpy and scipy. That's already set up correctly, so
  all
   you have to do is connect to it via ssh, check out v1.7.0 in
  ~/Code/numpy,
  check in release.sh that the section for OS X 10.6 is disabled and for
  10.5
  enabled and run it.
 
  OS X 10.6 broke support for previous versions in some subtle ways, so
  even
  when using the 10.4 SDK numpy compiled on 10.6 won't run on 10.5. As
  long as
  we're supporting 10.5 you therefore need to compile on it.
 
   The 10.7 -> 10.6 support hasn't been checked, but I wouldn't trust it. I
   have a 10.6 machine, so I can compile those binaries if needed.
 
 
  Also, what is the meaning of the following piece of code in
  pavement.py:
 
  def _build_mpkg(pyver):
   # account for differences between Python 2.7.1 versions from python.org
   if os.environ.get('MACOSX_DEPLOYMENT_TARGET', None) == "10.6":
       ldflags = "-undefined dynamic_lookup -bundle -arch i386 -arch x86_64 -Wl,-search_paths_first"
   else:
       ldflags = "-undefined dynamic_lookup -bundle -arch i386 -arch ppc -Wl,-search_paths_first"
   ldflags += " -L%s" % os.path.join(os.path.dirname(__file__), "build")
 
 
  The 10.6 binaries support only Intel Macs, both 32-bit and 64-bit. The
  10.3
  binaries support PPC Macs and 32-bit Intel. That's what the above does.
  Note
  that we simply follow the choice made by the Python release managers
  here.
 
 
   if pyver == "2.5":
       sh("CC=gcc-4.0 LDFLAGS='%s' %s setupegg.py bdist_mpkg" %
          (ldflags, " ".join(MPKG_PYTHON[pyver])))
   else:
       sh("LDFLAGS='%s' %s setupegg.py bdist_mpkg" %
          (ldflags, " ".join(MPKG_PYTHON[pyver])))
 
 
  This is necessary because in Python 2.5, distutils asks for gcc
  instead of
  gcc-4.0, so you may get the wrong one without CC=gcc-4.0. From Python
  2.6
  on this was fixed.
 
 
  In particular, the last line gets executed and it then fails with:
 
  paver dmg -p 2.6
  --- pavement.dmg
  --- pavement.clean
  LDFLAGS='-undefined dynamic_lookup -bundle -arch i386 -arch ppc
  -Wl,-search_paths_first -Lbuild'
  /Library/Frameworks/Python.framework/Versions/2.6/bin/python
  setupegg.py bdist_mpkg
  Traceback (most recent call last):
     File "setupegg.py", line 17, in <module>
  from setuptools import setup
  ImportError: No module named setuptools
 
 
   The reason is (I think) that if the Python binary is called explicitly
   with /Library/Frameworks/Python.framework/Versions/2.6/bin/python,
   then the paths are not set up properly in the virtualenv, and thus
   setuptools (which is only installed in the virtualenv, but not in the
   system Python) fails to import. The solution is to simply apply this patch:
 
 
  Avoid using system Python for anything. The first thing to do on any new
  OS
  X system is install Python some other way, preferably from python.org.
 
 
  diff --git a/pavement.py b/pavement.py
  index e693016..0c637f8 100644
  --- a/pavement.py
  +++ b/pavement.py
  @@ -449,7 +449,7 @@ def _build_mpkg(pyver):
   if pyver == 2.5:
   

Re: [Numpy-discussion] Which Python to use for Mac binaries

2013-01-08 Thread Chris Barker - NOAA Federal
On Mon, Jan 7, 2013 at 10:23 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
 http://www.commandlinefu.com/commands/view/2031/install-an-mpkg-from-the-command-line-on-osx

 This requires root access. Without sudo, I get:

 $ installer -pkg /Volumes/Python\ 2.7.3/Python.mpkg/ -target ondrej
 installer: This package requires authentication to install.

 and since I don't have root access, it doesn't work.

 So one way around it would be to install python from source, that
 shouldn't require root access.

hmm -- this all may be a bit tricky -- both the *.mpkg and the standard
build put everything in /Library/Frameworks/Python -- which is where
it belongs. But if you need root access to write there, then there is
a problem. I'm sure a non-root build could put everything in the
user's home directory, but then packages built against that would have
their paths messed up.

What's odd is that I'm pretty sure I've been able to point+click
install those without sudo... (I could be recalling incorrectly).

This would be a good question for the pythonmac list -- low traffic,
but there are some very smart and helpful folks there:

http://mail.python.org/mailman/listinfo/pythonmac-sig


 But I am not currently sure what to do with it. The Python.mpkg
 directory seems to contain the sources.

It should be possible to unpack an mpkg by hand, but it contains both
the contents and various install scripts, so that seems like a really
ugly solution.



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Matthew Brett
Hi,

On Mon, Jan 7, 2013 at 10:58 PM, Andrew Collette
andrew.colle...@gmail.com wrote:
 Hi,

 Taking 2) first, in this example:

 return self.f[dataset_name][...] + heightmap

 assuming it is not going to upcast, would you rather it overflow than
 raise an error?  Why?  The second seems more explicit and sensible to
 me.

 Yes, I think this (the 1.5 overflow behavior) was a bit odd, if easy
 to understand.

 For 1) - of course the upcasting in 1.6 is only going to work some of
 the time.   For example:

 In [2]: np.array([127], dtype=np.int8) * 1000
 Out[2]: array([-4072], dtype=int16)

 So - you'll get something, but there's a reasonable chance you won't
 get what you were expecting.  Of course that is true for 1.5 as well,
 but at least the rule there is simpler and so easier - in my opinion -
 to think about.

 Part of what my first example was trying to demonstrate was that the
 function author assumed arrays and scalars obeyed the same rules for
 addition.

 For example, if data were int8 and heightmap were an int16 array with
 a max value of 32767, and the data had a max value in the same spot
 with e.g. 10, then the addition would overflow at that position, even
 with the int16 result.  That's how array addition works in numpy, and
 as I understand it that's not slated to change.

 But when we have a scalar of value 32767 (which fits in int16 but not
 int8), we are proposing instead to do nothing under the assumption
 that it's an error.

 In summary: yes, there are some odd results, but they're consistent
 with the rules for addition elsewhere in numpy, and I would prefer
 that to treating this case as an error.

I think you are voting strongly for the current casting rules, because
they make it less obvious to the user that scalars are different from
arrays.

Returning to the question of 1.5 behavior vs the error - I think you
are saying you prefer the 1.5 silent-but-deadly approach to the error,
but I think I still haven't grasped why.  Maybe someone else can
explain it?  The holiday has not been good to my brain.

Best,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Andrew Collette
Hi,

 I think you are voting strongly for the current casting rules, because
 they make it less obvious to the user that scalars are different from
 arrays.

Maybe this is the source of my confusion... why should scalars be
different from arrays?  They should follow the same rules, as closely
as possible.  If a scalar value would fit in an int16, why not add it
using the rules for an int16 array?

 Returning to the question of 1.5 behavior vs the error - I think you
 are saying you prefer the 1.5 silent-but-deadly approach to the error,
 but I think I still haven't grasped why.  Maybe someone else can
 explain it?  The holiday has not been good to my brain.

In a strict choice between 1.5-behavior and errors, I'm not sure which
one I would pick.  I don't think either is particularly useful.  Of
course, other members of the community would likely have a different
view, especially those who got used to the 1.5 behavior.

Andrew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Linear least squares

2013-01-08 Thread Till Stensitz
Hi,
I did some profiling and testing of my data-fitting code.
One of its core parts is doing some linear least squares;
until now I used np.linalg.lstsq. Most of the time the size
of a is (250, 7) and of b is (250, 800).

Today I compared it to using pinv manually;
to my surprise, it is much faster. I thought
both are SVD based? To check on another computer
I also ran my test on wakari:

https://www.wakari.io/nb/tillsten/linear_least_squares

Also using scipy.linalg instead of np.linalg is
slower for both cases. My numpy and scipy
are both from C. Gohlke's website. If my result
is valid in general, maybe the lstsq function
should be changed or a hint should be added
to the documentation.
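
For illustration, a rough reconstruction of the comparison being described
(a sketch with random data, not the original benchmark code):

import numpy as np

a = np.random.rand(250, 7)
b = np.random.rand(250, 800)

x_lstsq = np.linalg.lstsq(a, b)[0]       # SVD-based LAPACK driver
x_pinv = np.dot(np.linalg.pinv(a), b)    # explicit pseudo-inverse, also SVD-based
print(np.allclose(x_lstsq, x_pinv))      # same solution; the timings are what differ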

greetings
Till



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Olivier Delalleau
2013/1/8 Andrew Collette andrew.colle...@gmail.com:
 Hi,

 I think you are voting strongly for the current casting rules, because
 they make it less obvious to the user that scalars are different from
 arrays.

 Maybe this is the source of my confusion... why should scalars be
 different from arrays?  They should follow the same rules, as closely
 as possible.  If a scalar value would fit in an int16, why not add it
 using the rules for an int16 array?

As I mentioned in another post, I also agree that it would make things
simpler and safer to just yield the same result as if we were using a
one-element array.

My understanding of the motivation for the rule "scalars do not upcast
arrays unless they are of a fundamentally different type" is that it
avoids accidentally upcasting arrays in operations like x + 1 (for
instance if x is a float32 array, the upcast would yield a float64
result, and if x is an int16 array, it would yield int64), which may
waste memory. I find it a useful feature; however, I'm not sure it's
worth the headaches it can lead to.
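
A quick illustration of that rule (a behaviour sketch; the exact width of the
upcast in the last case varies across numpy versions, the point is which
operations do and do not upcast):

import numpy as np

x = np.zeros(3, dtype=np.float32)
print((x + 1.0).dtype)   # float32 -- the Python float does not upcast the array

y = np.zeros(3, dtype=np.int16)
print((y + 1).dtype)     # int16 -- the Python int does not upcast it either
print((y + 1.0).dtype)   # a floating dtype -- a fundamentally different kind does upcast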

However, my first reaction at the idea of dropping this rule
altogether is that it would lead to a long and painful deprecation
process. I may be wrong though, I really haven't thought about it
much.

-=- Olivier
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Alan G Isaac
On 1/8/2013 1:48 PM, Olivier Delalleau wrote:
 As I mentioned in another post, I also agree that it would make things
 simpler and safer to just yield the same result as if we were using a
 one-element array.


Yes!
Anything else is going to drive people insane,
especially new users.

Alan Isaac
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Nathaniel Smith
On 8 Jan 2013 17:24, Andrew Collette andrew.colle...@gmail.com wrote:

 Hi,

  I think you are voting strongly for the current casting rules, because
  they make it less obvious to the user that scalars are different from
  arrays.

 Maybe this is the source of my confusion... why should scalars be
 different from arrays?  They should follow the same rules, as closely
 as possible.  If a scalar value would fit in an int16, why not add it
 using the rules for an int16 array?

The problem is that the rules for arrays - and for every other part of
numpy in general - are that we *don't* pick types based on values.
Numpy always uses input types to determine output types, not input
values.

# This value fits in an int8
In [5]: a = np.array([1])

# And yet...
In [6]: a.dtype
Out[6]: dtype('int64')

In [7]: small = np.array([1], dtype=np.int8)

# Computing 1 + 1 doesn't need a large integer... but we use one
In [8]: (small + a).dtype
Out[8]: dtype('int64')

Python scalars have unambiguous types: a Python 'int' is a C
'long', and a Python 'float' is a C 'double'. And these are the types
that np.array() converts them to. So it's pretty unambiguous that
using the same rules for arrays and scalars would mean: ignore the
value of the scalar, and in expressions like
  np.array([1], dtype=np.int8) + 1
we should always upcast to int32/int64. The problem is that this makes
working with narrow types very awkward for no real benefit, so
everyone pretty much seems to want *some* kind of special case. These
are both absolutely special cases:

numarray through 1.5: in a binary operation, if one operand has
ndim==0 and the other has ndim > 0, ignore the width of the ndim==0
operand.

1.6, your proposal: in a binary operation, if one operand has ndim==0
and the other has ndim > 0, downcast the ndim==0 item to the smallest
width that is consistent with its value and the other operand's type.

-n
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Nathaniel Smith
On Tue, Jan 8, 2013 at 7:28 PM, Alan G Isaac alan.is...@gmail.com wrote:
 On 1/8/2013 1:48 PM, Olivier Delalleau wrote:
 As I mentioned in another post, I also agree that it would make things
 simpler and safer to just yield the same result as if we were using a
 one-element array.

 Yes!
 Anything else is going to drive people insane,
 especially new users.

New users don't use narrow-width dtypes... it's important to remember
in this discussion that in numpy, non-standard dtypes only arise when
users explicitly request them, so there's some expressed intention
there that we want to try and respect. (As opposed to the type
associated with Python manifest constants like the 2 in "2 * a",
which probably no programmer looked at and thought "hmm, what I want
here is 2-as-an-int64".)

-n
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Andrew Collette
Hi Nathaniel,

(Responding to both your emails)

 The problem is that the rules for arrays - and for every other part of
 numpy in general - are that we *don't* pick types based on values.
 Numpy always uses input types to determine output types, not input
 values.

Yes, of course... array operations are governed exclusively by their
dtypes.  It seems to me that, using the language of the bug report
(2878), if we have this:

result = arr + scalar

I would argue that our job is, rather than to pick result.dtype, to
pick scalar.dtype, and apply the normal rules for array operations.

 So it's pretty unambiguous that
 using the same rules for arrays and scalars would mean, ignore the
 value of the scalar, and in expressions like
   np.array([1], dtype=np.int8) + 1
 we should always upcast to int32/int64.

Ah, but that's my point: we already, in 1.6, ignore the intrinsic
width of the scalar and effectively substitute one based on its
value:

>>> a = np.array([1], dtype=int8)
>>> (a + 1).dtype
dtype('int8')
>>> (a + 1000).dtype
dtype('int16')
>>> (a + 100000).dtype
dtype('int32')
>>> (a + 2**40).dtype
dtype('int64')

 1.6, your proposal: in a binary operation, if one operand has ndim==0
 and the other has ndim > 0, downcast the ndim==0 item to the smallest
 width that is consistent with its value and the other operand's type.

Yes, exactly.  I'm not trying to propose a completely new behavior: as
I mentioned (although very far upthread), this is the mental model I
had of how things worked in 1.6 already.

 New users don't use narrow-width dtypes... it's important to remember
 in this discussion that in numpy, non-standard dtypes only arise when
 users explicitly request them, so there's some expressed intention
 there that we want to try and respect.

I would respectfully disagree.  One example I cited was that when
dealing with HDF5, it's very common to get int16's (and even int8's)
when reading from a file because they are used to save disk space.
All a new user has to do to get int8's from a file they got from
someone else is:

>>> data = some_hdf5_file['MyDataset'][...]

This is a general issue applying to data which is read from real-world
external sources.  For example, digitizers routinely represent their
samples as int8's or int16's, and you apply a scale and offset to get
a reading in volts.
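
For instance (a made-up calibration, purely to illustrate the pattern):

import numpy as np

counts = np.array([512, -512, 1024], dtype=np.int16)  # raw digitizer samples
scale, offset = 2.5e-4, 0.0                           # hypothetical calibration constants
volts = counts * scale + offset                       # upcasts to a floating dtype
print(volts.dtype)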

As you say, the proposed change will prevent accidental upcasting by
people who selected int8/int16 on purpose to save memory, by notifying
them with a ValueError.  But another assumption we could make is that
people who choose to use narrow types for performance reasons should
be expected to use caution when performing operations that might
upcast, and that the default behavior should be to follow the normal
array rules as closely as possible, as is done in 1.6.

Andrew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Sebastian Berg
On Tue, 2013-01-08 at 19:59 +, Nathaniel Smith wrote:
 On 8 Jan 2013 17:24, Andrew Collette andrew.colle...@gmail.com wrote:
 
  Hi,
 
   I think you are voting strongly for the current casting rules, because
   they make it less obvious to the user that scalars are different from
   arrays.
 
  Maybe this is the source of my confusion... why should scalars be
  different from arrays?  They should follow the same rules, as closely
  as possible.  If a scalar value would fit in an int16, why not add it
  using the rules for an int16 array?
 
 The problem is that the rules for arrays - and for every other part of
 numpy in general - are that we *don't* pick types based on values.
 Numpy always uses input types to determine output types, not input
 values.
 
 # This value fits in an int8
 In [5]: a = np.array([1])
 
 # And yet...
 In [6]: a.dtype
 Out[6]: dtype('int64')
 
 In [7]: small = np.array([1], dtype=np.int8)
 
 # Computing 1 + 1 doesn't need a large integer... but we use one
 In [8]: (small + a).dtype
 Out[8]: dtype('int64')
 
 Python scalars have unambiguous types: a Python 'int' is a C
 'long', and a Python 'float' is a C 'double'. And these are the types
 that np.array() converts them to. So it's pretty unambiguous that
 using the same rules for arrays and scalars would mean, ignore the
 value of the scalar, and in expressions like
   np.array([1], dtype=np.int8) + 1
 we should always upcast to int32/int64. The problem is that this makes
 working with narrow types very awkward for no real benefit, so
 everyone pretty much seems to want *some* kind of special case. These
 are both absolutely special cases:
 
 numarray through 1.5: in a binary operation, if one operand has
 ndim==0 and the other has ndim > 0, ignore the width of the ndim==0
 operand.

 1.6, your proposal: in a binary operation, if one operand has ndim==0
 and the other has ndim > 0, downcast the ndim==0 item to the smallest
 width that is consistent with its value and the other operand's type.
 

Well, that leaves the maybe not quite implausible proposal of saying
that numpy scalars behave like arrays with ndim > 0, but python scalars
behave like they do in 1.6, to allow for easier working with narrow
types.

Sebastian

 -n
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Chris Barker - NOAA Federal
On Tue, Jan 8, 2013 at 12:43 PM, Alan G Isaac alan.is...@gmail.com wrote:
 New users don't use narrow-width dtypes... it's important to remember

 1. I think the first statement is wrong.
 Control over dtypes is a good reason for
 a new user to consider NumPy.

Absolutely.

 Because NumPy supports broadcasting,
 it is natural for array-array operations and
 scalar-array operations to be consistent.
 I believe anything else will be too confusing.

Theoretically true -- but in practice, the problem arises because it
is easy to write literals with the standard python scalars, so one is
very likely to want to do:

arr = np.zeros((m,n), dtype=np.uint8)
arr += 3

and not want an upcast.

I don't think we want to require that to be spelled:

arr += np.array(3, dtype=np.uint8)

so that defines desired behaviour for array-scalar.

but what should this do?

arr1 = np.zeros((m,n), dtype=np.uint8)
arr2 = np.zeros((m,n), dtype=np.uint16)

arr1 + arr2
  or
arr2 + arr1

upcast in both cases?
use the type of the left operand?
raise an exception?

matching the array-scalar approach would mean always keeping the
smallest type, which is unlikely to be what is wanted.

Having it be dependent on order would be really ripe for confusion.

raising an exception might have been the best idea from the beginning
(though I wouldn't want that in the array-scalar case).

So perhaps having a scalar/array distinction, while quite impure, is
the best compromise.

NOTE: no matter how you slice it, at some point reducing operations
produce something different (that can no longer be reduced), so I do
think it would be nice for rank-zero arrays and scalars to be the same
thing (in this regard and others)

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Olivier Delalleau
2013/1/8 Sebastian Berg sebast...@sipsolutions.net:
 On Tue, 2013-01-08 at 19:59 +, Nathaniel Smith wrote:
 On 8 Jan 2013 17:24, Andrew Collette andrew.colle...@gmail.com wrote:
 
  Hi,
 
   I think you are voting strongly for the current casting rules, because
   they make it less obvious to the user that scalars are different from
   arrays.
 
  Maybe this is the source of my confusion... why should scalars be
  different from arrays?  They should follow the same rules, as closely
  as possible.  If a scalar value would fit in an int16, why not add it
  using the rules for an int16 array?

 The problem is that the rules for arrays - and for every other part of
 numpy in general - are that we *don't* pick types based on values.
 Numpy always uses input types to determine output types, not input
 values.

 # This value fits in an int8
 In [5]: a = np.array([1])

 # And yet...
 In [6]: a.dtype
 Out[6]: dtype('int64')

 In [7]: small = np.array([1], dtype=np.int8)

 # Computing 1 + 1 doesn't need a large integer... but we use one
 In [8]: (small + a).dtype
 Out[8]: dtype('int64')

 Python scalars have unambiguous types: a Python 'int' is a C
 'long', and a Python 'float' is a C 'double'. And these are the types
 that np.array() converts them to. So it's pretty unambiguous that
 using the same rules for arrays and scalars would mean, ignore the
 value of the scalar, and in expressions like
   np.array([1], dtype=np.int8) + 1
 we should always upcast to int32/int64. The problem is that this makes
 working with narrow types very awkward for no real benefit, so
 everyone pretty much seems to want *some* kind of special case. These
 are both absolutely special cases:

 numarray through 1.5: in a binary operation, if one operand has
 ndim==0 and the other has ndim > 0, ignore the width of the ndim==0
 operand.

 1.6, your proposal: in a binary operation, if one operand has ndim==0
 and the other has ndim > 0, downcast the ndim==0 item to the smallest
 width that is consistent with its value and the other operand's type.


 Well, that leaves the maybe not quite implausible proposal of saying
 that numpy scalars behave like arrays with ndim > 0, but python scalars
 behave like they do in 1.6, to allow for easier working with narrow
 types.

I know I already said it, but I really think it'd be a bad idea to
have a different behavior between Python scalars and Numpy scalars,
because I think most people would expect them to behave the same (when
they know what dtype a Python float / int corresponds to). It could lead
to very tricky bugs to handle them differently.

-=- Olivier
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Olivier Delalleau
2013/1/8 Chris Barker - NOAA Federal chris.bar...@noaa.gov:
 On Tue, Jan 8, 2013 at 12:43 PM, Alan G Isaac alan.is...@gmail.com wrote:
 New users don't use narrow-width dtypes... it's important to remember

 1. I think the first statement is wrong.
 Control over dtypes is a good reason for
  a new user to consider NumPy.

 Absolutely.

 Because NumPy supports broadcasting,
 it is natural for array-array operations and
 scalar-array operations to be consistent.
 I believe anything else will be too confusing.

 Theoretically true -- but in practice, the problem arises because it
 is easy to write literals with the standard python scalars, so one is
 very likely to want to do:

 arr = np.zeros((m,n), dtype=np.uint8)
 arr += 3

 and not want an upcast.

Note that the behavior with in-place operations is also an interesting
topic, but slightly different, since there is no ambiguity on the
dtype of the output (which is required to match that of the input). I
was actually thinking about this earlier today but decided not to
mention it yet to avoid making the discussion even more complex ;)

The key question is whether the operand should be cast before the
operation, or whether to perform the operation in an upcast array,
then downcast it back into the original version. I actually think the
latter makes more sense (and that's actually what's being done I think
in 1.6.1 from a few tests I tried), and to me this is an argument in
favor of the upcast behavior for non-inplace operations.
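
A sketch of that distinction (recent numpy refuses this particular implicit
in-place cast, so the downcast is spelled out with casting='unsafe'):

import numpy as np

a = np.array([10, 11], dtype=np.int32)

# perform the multiply in float64, then cast the result back into the int32 array
np.multiply(a, 0.5, out=a, casting='unsafe')
print(a)   # [5 5]

# casting the scalar 0.5 to int32 *before* the operation would instead
# multiply by zero and give [0 0]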

-=- Olivier
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Dag Sverre Seljebotn
On 01/08/2013 06:20 PM, Andrew Collette wrote:
 Hi,

 I think you are voting strongly for the current casting rules, because
 they make it less obvious to the user that scalars are different from
 arrays.

 Maybe this is the source of my confusion... why should scalars be
 different from arrays?  They should follow the same rules, as closely

Scalars (as in, Python float/int) are inherently different because the 
user didn't specify a dtype.

For an array, there was always *some* point where the user chose, 
explicitly or implicitly, a dtype.

 as possible.  If a scalar value would fit in an int16, why not add it
 using the rules for an int16 array?

So you are saying that, for an array x, you want

x + random.randint(100000)

to produce an array with a random dtype?

So that after carefully testing that your code works, suddenly a 
different draw (or user input, or whatever) causes a different set of 
dtypes to ripple through your entire program?

To me this is something that must be avoided at all costs. It's hard 
enough to reason about the code one writes without throwing in complete 
randomness (by which I mean, types determined by values).

Dag Sverre





 Returning to the question of 1.5 behavior vs the error - I think you
 are saying you prefer the 1.5 silent-but-deadly approach to the error,
 but I think I still haven't grasped why.  Maybe someone else can
 explain it?  The holiday has not been good to my brain.

 In a strict choice between 1.5-behavior and errors, I'm not sure which
 one I would pick.  I don't think either is particularly useful.  Of
 course, other members of the community would likely have a different
 view, especially those who got used to the 1.5 behavior.

 Andrew
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Dag Sverre Seljebotn
On 01/08/2013 10:32 PM, Dag Sverre Seljebotn wrote:
 On 01/08/2013 06:20 PM, Andrew Collette wrote:
 Hi,

 I think you are voting strongly for the current casting rules, because
 they make it less obvious to the user that scalars are different from
 arrays.

 Maybe this is the source of my confusion... why should scalars be
 different from arrays?  They should follow the same rules, as closely

 Scalars (as in, Python float/int) are inherently different because the
 user didn't specify a dtype.

 For an array, there was always *some* point where the user chose,
 explicitly or implicitly, a dtype.

 as possible.  If a scalar value would fit in an int16, why not add it
 using the rules for an int16 array?

 So you are saying that, for an array x, you want

  x + random.randint(100000)

 to produce an array with a random dtype?

 So that after carefully testing that your code works, suddenly a
 different draw (or user input, or whatever) causes a different set of
 dtypes to ripple through your entire program?

 To me this is something that must be avoided at all costs. It's hard
 enough to reason about the code one writes without throwing in complete
 randomness (by which I mean, types determined by values).

Oh, sorry, given that this is indeed the present behaviour, this just 
sounds silly. I should have said it's something I dislike about the 
present behaviour then.

Dag Sverre


 Dag Sverre





 Returning to the question of 1.5 behavior vs the error - I think you
 are saying you prefer the 1.5 silent-but-deadly approach to the error,
 but I think I still haven't grasped why.  Maybe someone else can
 explain it?  The holiday has not been good to my brain.

 In a strict choice between 1.5-behavior and errors, I'm not sure which
 one I would pick.  I don't think either is particularly useful.  Of
 course, other members of the community would likely have a different
 view, especially those who got used to the 1.5 behavior.

 Andrew
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Andrew Collette
Hi Dag,

 So you are saying that, for an array x, you want

  x + random.randint(100000)

 to produce an array with a random dtype?

Under the proposed behavior, depending on the dtype of x and the value
from random, this would sometimes add-with-rollover and sometimes
raise ValueError.

Under the 1.5 behavior, it would always add-with-rollover and preserve
the type of x.

Under the 1.6 behavior, it produces a range of dtypes, each of which
is at least large enough to hold the random int.

Personally, I prefer the third option, but I strongly prefer either
the second or the third to the first.

Andrew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Linear least squares

2013-01-08 Thread josef . pktd
On Tue, Jan 8, 2013 at 1:17 PM, Till Stensitz mail.t...@gmx.de wrote:
 Hi,
 I did some profiling and testing of my data-fitting code.
 One of its core parts is doing some linear least squares;
 until now I used np.linalg.lstsq. Most of the time the size
 of a is (250, 7) and of b is (250, 800).

My guess is that this depends a lot on the shape

try a is (1, 7) and b is (1, 1)

Josef



 Today I compared it to using pinv manually;
 to my surprise, it is much faster. I thought
 both are SVD based? To check on another computer
 I also ran my test on wakari:

 https://www.wakari.io/nb/tillsten/linear_least_squares

 Also using scipy.linalg instead of np.linalg is
 slower for both cases. My numpy and scipy
 are both from C. Gohlke's website. If my result
 is valid in general, maybe the lstsq function
 should be changed or a hint should be added
 to the documentation.

 greetings
 Till



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Olivier Delalleau
Le mardi 8 janvier 2013, Andrew Collette a écrit :

 Hi Dag,

  So you are saying that, for an array x, you want
 
  x + random.randint(100000)
 
  to produce an array with a random dtype?

 Under the proposed behavior, depending on the dtype of x and the value
 from random, this would sometimes add-with-rollover and sometimes
 raise ValueError.

 Under the 1.5 behavior, it would always add-with-rollover and preserve
 the type of x.

 Under the 1.6 behavior, it produces a range of dtypes, each of which
 is at least large enough to hold the random int.

 Personally, I prefer the third option, but I strongly prefer either
 the second or the third to the first.

 Andrew


Keep in mind that in the third option (current 1.6 behavior) the dtype is
large enough to hold the random number, but not necessarily to hold the
result. So for instance if x is an int16 array with only positive values,
the result of this addition may contain negative values (or not, depending
on the number being drawn). That's the part I feel is flawed with this
behavior, it is quite unpredictable.

-=- Olivier
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Andrew Collette
Hi,

 Keep in mind that in the third option (current 1.6 behavior) the dtype is
 large enough to hold the random number, but not necessarily to hold the
 result. So for instance if x is an int16 array with only positive values,
 the result of this addition may contain negative values (or not, depending
 on the number being drawn). That's the part I feel is flawed with this
 behavior, it is quite unpredictable.

Yes, certainly.  But in either the proposed or 1.5 behavior, if the
values in x are close to the limits of the type, this can happen also.

Andrew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Olivier Delalleau
Le mardi 8 janvier 2013, Andrew Collette a écrit :

 Hi,

  Keep in mind that in the third option (current 1.6 behavior) the dtype is
  large enough to hold the random number, but not necessarily to hold the
  result. So for instance if x is an int16 array with only positive values,
  the result of this addition may contain negative values (or not,
 depending
  on the number being drawn). That's the part I feel is flawed with this
  behavior, it is quite unpredictable.

 Yes, certainly.  But in either the proposed or 1.5 behavior, if the
 values in x are close to the limits of the type, this can happen also.


My previous email may not have been clear enough, so to be sure: in my
above example, if the random number is 30000, then the result may
contain negative values (int16). If the random number is 50000, then
the result will only contain positive values (upcast to int32). Do you
believe this is a good behavior?

-=- Olivier
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Linear least squares

2013-01-08 Thread Nathaniel Smith
On Tue, Jan 8, 2013 at 6:17 PM, Till Stensitz mail.t...@gmx.de wrote:
 Hi,
 I did some profiling and testing of my data-fitting code.
 One of its core parts is doing some linear least squares;
 until now I used np.linalg.lstsq. Most of the time the size
 of a is (250, 7) and of b is (250, 800).

 Today I compared it to using pinv manually;
 to my surprise, it is much faster. I thought
 both are SVD based?

np.linalg.lstsq is written in Python (calling LAPACK for the SVD), so
you could run the line_profiler over it and see where the slowdown is.

An obvious thing is that it always computes residuals, which could be
costly; if your pinv code isn't doing that then it's not really
comparable. (Though might still be well-suited for your actual
problem.)

Depending on how well-conditioned your problems are, and how much
speed you need, there are faster ways than pinv as well. (Going via QR
might or might not be faster; going via Cholesky almost certainly will
be.)
-n
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

2013-01-08 Thread Andrew Collette
Hi Olivier,

 Yes, certainly.  But in either the proposed or 1.5 behavior, if the
 values in x are close to the limits of the type, this can happen also.


 My previous email may not have been clear enough, so to be sure: in my above
 example, if the random number is 30000, then the result may contain negative
 values (int16). If the random number is 50000, then the result will only
 contain positive values (upcast to int32). Do you believe this is a good
 behavior?

Under the proposed behavior, if the random number is 30000, then you
*still* may have negative values, and if it's 50000, you get
ValueError.
particularly nice, but (1) it's consistent with array addition
elsewhere, at least in my mind, and (2) I don't think that sometimes
getting a ValueError is a big improvement.

Although I still prefer automatic upcasting, this discussion is really
making me see the value of a nice, simple rule like in 1.5. :)

Andrew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Linear least squares

2013-01-08 Thread Charles R Harris
On Tue, Jan 8, 2013 at 11:17 AM, Till Stensitz mail.t...@gmx.de wrote:

 Hi,
  I did some profiling and testing of my data-fitting code.
  One of its core parts is doing some linear least squares;
  until now I used np.linalg.lstsq. Most of the time the size
  of a is (250, 7) and of b is (250, 800).

  Today I compared it to using pinv manually;
  to my surprise, it is much faster. I thought
  both are SVD based? To check on another computer
  I also ran my test on wakari:

 https://www.wakari.io/nb/tillsten/linear_least_squares

 Also using scipy.linalg instead of np.linalg is
 slower for both cases. My numpy and scipy
  are both from C. Gohlke's website. If my result
 is valid in general, maybe the lstsq function
 should be changed or a hint should be added
 to the documentation.


Do you know if both are using Atlas (or MKL)? Numpy will compile a default
unoptimized version if there is no Atlas (or MKL). Also, lstsq is a direct
call to a LAPACK least squares function, so the underlying functions
themselves are probably different for lstsq and pinv.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Bug with ufuncs made with frompyfunc

2013-01-08 Thread OKB (not okblacke)
A bug causing errors when using methods of ufuncs created with
frompyfunc was mentioned on the list over a year ago:
http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058501.html

Is there any word on the status of this bug?  I wasn't able to find
a ticket in the bug tracker.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion