Re: [Numpy-discussion] Is there a pure numpy recipe for this?

2014-03-26 Thread Skipper Seabold
On Wed, Mar 26, 2014 at 4:28 PM, Slaunger  wrote:
> jseabold wrote
>> IIUC,
>>
>> [~/]
>> [1]: np.logical_and([True, False, True], [False, False, True])
>> [1]: array([False, False,  True], dtype=bool)
>>
>> You can avoid looping over k since they're all the same length
>>
>> [~/]
>> [3]: np.logical_and([[True, False],[False, True],[False, True]],
>> [[False, False], [False, True], [True, True]])
>> [3]:
>> array([[False, False],
>>[False,  True],
>>[False,  True]], dtype=bool)
>>
>> [~/]
>> [4]: np.sum(np.logical_and([[True, False],[False, True],[False,
>> True]], [[False, False], [False, True], [True, True]]), axis=0)
>> [4]: array([0, 2])
>
> Well, yes, if you work with the pure f_k and g_k that is true, but this
> two-dimensional array will have 4*10^14 elements and will exhaust my memory.
>
> That is why I have found a more efficient method for finding only the much
> fewer changes_at elements for each k. These arrays have unequal lengths and
> have to be considered for each k (which is tolerable as long as I avoid a
> further inner loop for each k in explicit Python).
>
> I could implement this in C and get it done sufficiently efficiently. I
> would just like to make a point of demonstrating that this is also doable
> in finite time in Python/numpy.
>

If you want to attack it straight on and keep it conceptually simple,
this looks like it would work. Fair warning, I've never done this and
have no idea if it's actually memory- and computationally efficient, so
I'd be interested to hear from experts. I just wanted to see if it
would work from disk. I wonder if a solution using PyTables would be
faster.

Provided that you can chunk your data into a memmap array, here is
something you *could* do:

import numpy as np

N = 2 * 10**7
chunk_size = 10

farr1 = 'scratch/arr1'
farr2 = 'scratch/arr2'

arr1 = np.memmap(farr1, dtype='uint8', mode='w+', shape=(N, 4))
arr2 = np.memmap(farr2, dtype='uint8', mode='w+', shape=(N, 4))

# fill the on-disk arrays chunk by chunk
for i in xrange(0, N, chunk_size):
    arr1[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)
    arr2[i:i+chunk_size] = np.random.randint(2, size=(chunk_size, 4)).astype(np.uint8)

del arr1
del arr2

arr1 = np.memmap(farr1, mode='r', dtype='uint8', shape=(N, 4))
arr2 = np.memmap(farr2, mode='r', dtype='uint8', shape=(N, 4))

# accumulate the column-wise counts chunk by chunk
equal = np.logical_and(arr1[:chunk_size],
                       arr2[:chunk_size]).sum(0)

for i in xrange(chunk_size, N, chunk_size):
    equal += np.logical_and(arr1[i:i+chunk_size],
                            arr2[i:i+chunk_size]).sum(0)

Skipper


Re: [Numpy-discussion] Is there a pure numpy recipe for this?

2014-03-26 Thread Skipper Seabold
On Wed, Mar 26, 2014 at 3:48 PM, Slaunger  wrote:
> I am working on solving a recent recreational mathematical problem on
> Project Euler. I have a solution, which works
> fine for small N up to 10^5 but it takes too long to compute for the actual
> problem, where N is of the order 2*10^7. The problem is nested loops, and I
> am hoping to avoid one level in the loop by using clever numpy magic (as I
> have done often before). However, there is one step, which I cannot figure
> out how to do using numpy operations alone, and I am hoping for some help
>
> The subproblem is that I have in principle k = 1, ..., N sets of boolean
> arrays
> f_k and g_k each of length N.
>
> For each k I need to find the number of elements i where both f_k[i] and
> g_k[i] are True and sum that up over all N values of k.

IIUC,

[~/]
[1]: np.logical_and([True, False, True], [False, False, True])
[1]: array([False, False,  True], dtype=bool)

You can avoid looping over k since they're all the same length

[~/]
[3]: np.logical_and([[True, False],[False, True],[False, True]],
[[False, False], [False, True], [True, True]])
[3]:
array([[False, False],
   [False,  True],
   [False,  True]], dtype=bool)

[~/]
[4]: np.sum(np.logical_and([[True, False],[False, True],[False,
True]], [[False, False], [False, True], [True, True]]), axis=0)
[4]: array([0, 2])


>
> A problem of the order 4*10^14 if I just do it brute force. This takes way
> too long (there is a one minute rule).
>
> However, after a lot of thinking and by using some properties of the f_k and
> g_k I have managed to construct using only pure numpy function and only a
> single loop over k, arrays
>
> f_k_changes_at
> g_k_changes_at
>
> which contain the indices i at which the functions change their boolean
> value from True to False or False to True.
>
> It so happens that the number of changes is only a small fraction of N, and
> the fraction decreases with larger N, so each changes_at array contains
> perhaps only 1000 elements instead of N for each k, a significant reduction
> in complexity.
>
> Now, my problem is to figure out for how many values of i both f_k and g_k
> are True given the changes_at arrays.
>
> As this may be a little hard to understand, here is a specific example of
> what these arrays can look like for k = 2 and N = 150
>
> f_2_changes_at = [  2   3  39  41  58  59  65  66  93 102 145]
> g_2_changes_at = [  2  94 101 146 149]
>
> with the boundary condition that f_2[0] = g_2[0] = False
>
> Which expands to
> i     f_2   g_2   f_2 and g_2
> 0     F     F     F
> 1     F     F     F   <-
> 2     T     T     T   <-
> 3     F     T     F
> 4     F     T     F
> ...
> 38    F     T     F   <-
> 39    T     T     T
> 40    T     T     T   <-
> 41    F     T     F
> 42    F     T     F
> ...
> 57    F     T     F   <-
> 58    T     T     T   <-
> 59    F     T     F
> 60    F     T     F
> ...
> 64    F     T     F   <-
> 65    T     T     T   <-
> 66    F     T     F
> ...
> 92    F     T     F   <-
> 93    T     T     T   <-
> 94    T     F     F
> ...
> 100   T     F     F   <-
> 101   T     T     T   <-
> 102   F     T     F
> ...
> 144   F     T     F   <-
> 145   T     T     T   <-
> 146   T     F     F
> 147   T     F     F
> 148   T     F     F   <-
> 149   T     T     T   <-
>
> With the sum of elements fulfilling the condition being (see arrows)
>
> (2 - 1) + (40 - 38) + (58 - 57) + (65 - 64) + (93 - 92) + (101 - 100) +
> (145 - 144) + (149 - 148) = 1 + 2 + 1 + 1 + 1 + 1 + 1 + 1 = 9
>
> So, is there a numpy recipe for doing the equivalent process without
> expanding it into the full arrays?
>
> I have tried looping over each element in the changes_at arrays and
> building up the sums, but that is too inefficient, as I then have an inner
> for loop containing conditional branching code.
>
> Thanks in advance, Slaunger
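
For concreteness, here is a minimal sketch (not from the thread) of one way
to do this interval intersection directly on the changes_at arrays: merge
the toggle indices of f_k and g_k into segment boundaries, recover each
function's state on a segment from the parity of preceding toggles, and sum
the lengths of the segments where both are True. It assumes both functions
start at False, as in the example above.

```
import numpy as np

def count_both_true(f_changes, g_changes, N):
    # Positions where either function toggles; between consecutive
    # boundaries both functions are constant.
    bounds = np.union1d(f_changes, g_changes)
    starts = np.concatenate(([0], bounds))
    ends = np.concatenate((bounds, [N]))
    # A function is True on a segment iff an odd number of its toggles
    # occur at or before the segment start (both start False).
    f_state = np.searchsorted(f_changes, starts, side='right') % 2 == 1
    g_state = np.searchsorted(g_changes, starts, side='right') % 2 == 1
    return int(((ends - starts)[f_state & g_state]).sum())

f_2 = np.array([2, 3, 39, 41, 58, 59, 65, 66, 93, 102, 145])
g_2 = np.array([2, 94, 101, 146, 149])
print(count_both_true(f_2, g_2, 150))  # 9, matching the hand count above
```

This keeps the per-k work proportional to the number of changes rather than
to N.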


Re: [Numpy-discussion] 1.8.1 release

2014-03-06 Thread Skipper Seabold
Hi,

Should [1] be considered a release blocker for 1.8.1?

Skipper

[1] https://github.com/numpy/numpy/issues/4442


Re: [Numpy-discussion] svd error checking vs. speed

2014-02-15 Thread Skipper Seabold
On Sat, Feb 15, 2014 at 5:08 PM,  wrote:

> On Sat, Feb 15, 2014 at 4:56 PM, Sebastian Berg
>  wrote:
> > On Sa, 2014-02-15 at 16:37 -0500, alex wrote:
> >> Hello list,
> >>
> >> Here's another idea resurrection from numpy github comments that I've
> >> been advised could be posted here for re-discussion.
> >>
> >> The proposal would be to make np.linalg.svd more like scipy.linalg.svd
> >> with respect to input checking.  The argument against the change is
> >> raw speed; if you know that you will never feed non-finite input to
> >> svd, then np.linalg.svd is a bit faster than scipy.linalg.svd.  An
> >> argument for the change could be to avoid issues reported on github
> >> like crashes, hangs, spurious non-convergence exceptions, etc. from
> >> the undefined behavior of svd of non-finite input.
> >>
> >
> > +1, unless this is a huge speed penalty, correctness (and decent error
> > messages) should come first in my opinion; this is python after all. If
> > this is a noticeable speed difference, a kwarg may be an option (but I
> > would think about that some more).
>
> maybe -1
>
> statsmodels is using np.linalg.pinv which uses svd.
> I have never run into or heard of any crash (*), and the only time I
> compared with scipy I didn't like the slowdown.
> I didn't do any serious timings, just a few examples.
>
> (*) not converged, ...
>
> pinv(x.T).dot(x) -> pinv(x.T, please_don_t_check=True).dot(y)
>
> numbers ?
>

FWIW, I see this spurious "SVD did not converge" warning very frequently
with ARMA when a nan has crept in. I usually know where to find the
problem, but I think it'd be nice if this error message were a little
better.

Skipper


>
> grep: we also use scipy.linalg.pinv in some cases
>
> Josef
>
>
> >
> > - Sebastian
> >
> >> """
> >> [...] the following numpy code hangs until I `kill -9` it.
> >>
> >> ```
> >> $ python runtests.py --shell
> >> $ python
> >> Python 2.7.5+
> >> [GCC 4.8.1] on linux2
> >> >>> import numpy as np
> >> >>> np.__version__
> >> '1.9.0.dev-e3f0f53'
> >> >>> A = np.array([[1e3, 0], [0, 1]])
> >> >>> B = np.array([[1e300, 0], [0, 1]])
> >> >>> C = np.array([[1e3000, 0], [0, 1]])
> >> >>> np.linalg.svd(A)
> >> (array([[ 1.,  0.],
> >>[ 0.,  1.]]), array([ 1000., 1.]), array([[ 1.,  0.],
> >>[ 0.,  1.]]))
> >> >>> np.linalg.svd(B)
> >> (array([[ 1.,  0.],
> >>[ 0.,  1.]]), array([  1.e+300,   1.e+000]),
> >> array([[ 1.,  0.],
> >>[ 0.,  1.]]))
> >> >>> np.linalg.svd(C)
> >> [hangs forever]
> >> ```
> >> """
> >>
> >> Alex
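
For reference, a minimal sketch (not numpy's code) of the kind of up-front
check being proposed; scipy.linalg.svd exposes essentially this behavior
through input validation:

```
import numpy as np

def checked_svd(a):
    # Hypothetical wrapper: reject non-finite input up front instead of
    # handing undefined values to the LAPACK routine.
    a = np.asarray(a)
    if not np.isfinite(a).all():
        raise ValueError("array must not contain infs or NaNs")
    return np.linalg.svd(a)

checked_svd(np.array([[1e3, 0], [0, 1]]))       # works as usual
# checked_svd(np.array([[1e3000, 0], [0, 1]]))  # raises instead of hanging
```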


Re: [Numpy-discussion] deprecate numpy.matrix

2014-02-10 Thread Skipper Seabold
On Mon, Feb 10, 2014 at 2:49 PM, Matthew Brett wrote:

> Hi,
>
> On Mon, Feb 10, 2014 at 11:44 AM,   wrote:
> >
> >
> > On Mon, Feb 10, 2014 at 2:12 PM, eat  wrote:
> >>
> >>
> >>
> >>
> >> On Mon, Feb 10, 2014 at 9:08 PM, alex  wrote:
> >>>
> >>> On Mon, Feb 10, 2014 at 2:03 PM, eat  wrote:
> >>> > Rhetorical or not, FWIW I'd prefer to take the singular value
> >>> > decomposition (u, s, vt = svd(x)) and then, based on the singular
> >>> > values s, estimate a "numerically feasible rank" r. The diagonal of
> >>> > such a hat matrix would then be (u[:, :r]** 2).sum(1).
> >>>
> >>> It's a small detail but you probably want svd(x, full_matrices=False)
> >>> to avoid anything NxN.
> >>
> >> Indeed.
> >
> >
> > I meant the entire diagonal not the trace of the projection matrix.
> >
> > My (not articulated) thought was that I would use elementwise
> > multiplication together with dot products instead of the three dot
> > products; however, elementwise algebra is not very common in linear
> > algebra based textbooks.
> >
> > The question is whether students and new users coming from `matrix`
> > languages can translate formulas into code, or just copy formulas to
> > code. (It took me a while to get used to numpy and take advantage of its
> > features, coming from GAUSS and Matlab.)
> >
> > OT since the presence or absence of matrix in numpy doesn't affect me.
>
> Josef - as a data point - does statsmodels use np.matrix?
>
>
No.

Skipper
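
As an aside, here is a sketch of the hat-matrix diagonal computation eat
describes above, assuming x is n-by-p with n > p; the rank tolerance shown
is one common convention, not something specified in the thread:

```
import numpy as np

x = np.random.random((20, 3))
u, s, vt = np.linalg.svd(x, full_matrices=False)   # avoids the (n, n) U
tol = s[0] * max(x.shape) * np.finfo(s.dtype).eps  # a common tolerance choice
r = int((s > tol).sum())                           # "numerically feasible rank"
h_diag = (u[:, :r] ** 2).sum(1)                    # diagonal of the hat matrix
np.allclose(h_diag, np.diag(x.dot(np.linalg.pinv(x))))  # True
```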


Re: [Numpy-discussion] Behavior of nan{max, min} and nanarg{max, min} for all-nan slices.

2013-10-03 Thread Skipper Seabold
On Thu, Oct 3, 2013 at 9:10 AM, Benjamin Root  wrote:
> On Thu, Oct 3, 2013 at 6:06 AM, Daniele Nicolodi  wrote:
>>
>> Hello,
>>
>> sorry, I don't know exactly where to jump into the thread, it is getting
>> quite long and articulated...
>>
>> On 02/10/2013 21:19, Charles R Harris wrote:
>> > The main problem I had was deciding what arg{max, min} should return as
>> > the return value is an integer. I like your suggestion of returning 0.
>>
>> What about returning -1? It is still an integer (on my numpy version the
>> return value is a signed integer), it still has the property that
>> a[np.argmin(a)] == nan for a of only nans, but it is easily identifiable
>> as an anomalous return value if needed.
>>
>
> This actually makes a lot of sense. We would never return -1 for any other
> reason. And if it is used for indexing anywhere else, that's (mostly) ok. A
> problem might occur if the indexes gathered from this function are then used
> to define slices. But I can't really convince myself that it would be all
> that terrible in that case either. Documentation will be paramount here.

Please, no. It's another thing to remember and another way to shoot
yourself in the foot and introduce casual bugs.

FWIW, my vote is to raise an error or return a nan, which will likely
eventually raise an error. If I have all nans, it's usually the case
that something's off, and I'd like to know sooner rather than later.

Skipper
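
To make the footgun concrete, a small sketch (not from the thread) of how a
-1 sentinel would leak into later indexing:

```
import numpy as np

vals = np.array([np.nan, np.nan, np.nan])
other = np.array([10., 20., 30.])
idx = -1      # the proposed all-NaN sentinel return value
vals[idx]     # nan, which looks superficially consistent...
other[idx]    # 30.0 -- a stale sentinel silently grabs the last element
other[idx:]   # array([30.]) -- and it "works" as a slice bound, too
```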


Re: [Numpy-discussion] Deprecation of financial routines

2013-08-19 Thread Skipper Seabold
On Mon, Aug 19, 2013 at 9:38 AM, Cera, Tim  wrote:

> On Mon, Aug 19, 2013 at 2:37 AM, Juan Luis Cano 
> wrote:
> >
> > As now master is open for 1.9, following the discussion opened here
> >
> > https://github.com/numpy/numpy/issues/2880
> >
> > it was suggested that we deprecate and eventually remove the financial
> > functions in NumPy, because they pollute the main namespace and some are
> > unimplemented. We could put them in a separate package, in case it
> > doesn't exist yet. Nathaniel Smith and Ralf Gommers already gave +1, and
> > Charles Harris suggested bringing this up in the mailing list.
>

+1 on scipy.finance / scipy.financial (or even numpy.finance /
numpy.financial)

Skipper


Re: [Numpy-discussion] the mean, var, std of non-arrays

2013-07-18 Thread Skipper Seabold
On Thu, Jul 18, 2013 at 10:49 PM, Yaroslav Halchenko
wrote:

> Hi everyone,
>
> Some of my elderly code stopped working upon upgrades of numpy and
> upcoming pandas: https://github.com/pydata/pandas/issues/4290 so I have
> looked at the code of
>
>   2481   def mean(a, axis=None, dtype=None, out=None, keepdims=False):
>   2482   """
>   ...
>   2489   Parameters
>   2490   ----------
>   2491   a : array_like
>   2492   Array containing numbers whose mean is desired. If `a` is
> not an
>   2493   array, a conversion is attempted.
>   ...
>   2555   """
>   2556   if type(a) is not mu.ndarray:
>   2557   try:
>   2558   mean = a.mean
>   2559   return mean(axis=axis, dtype=dtype, out=out)
>   2560   except AttributeError:
>   2561   pass
>   2562
>   2563   return _methods._mean(a, axis=axis, dtype=dtype,
>   2564   out=out, keepdims=keepdims)
>
> here 'array_like'-ness is checked by a having a mean function.  Then it is
> assumed that this mean has the same signature as ndarray's, including the
> dtype keyword argument.
>
> Not sure anyway if my direct numpy.mean application to a pandas DataFrame
> is "kosher" -- initially I just assumed that any argument is asanyarray'ed
> first -- but I think catching TypeError for those incompatible .mean's
> would not hurt either.  What do you think?  Similar logic applies to mean's
> cousins (var, std, ...?) decorated around the _methods implementations.


Related? From a while ago.

https://github.com/numpy/numpy/pull/160

Skipper
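
For comparison, a sketch of the dispatch with the suggested TypeError catch
added (not the actual numpy source; mu.ndarray and _methods are the numpy
internals quoted above):

```
if type(a) is not mu.ndarray:
    try:
        mean = a.mean
        return mean(axis=axis, dtype=dtype, out=out)
    except (AttributeError, TypeError):
        # objects whose .mean has an incompatible signature fall
        # through to the generic implementation instead of raising
        pass
return _methods._mean(a, axis=axis, dtype=dtype,
                      out=out, keepdims=keepdims)
```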


Re: [Numpy-discussion] Unable to building numpy with openblas usingbento or distutils

2013-03-25 Thread Skipper Seabold
On Mon, Mar 25, 2013 at 2:40 PM, Dinesh B Vadhia
wrote:

> Caveat: Not tested but it did look interesting:
> http://osdf.github.com/blog/numpyscipy-with-openblas-for-ubuntu-1204-second-try.html
> Would be interested to know if it worked out, as I want to try out OpenBlas
> in the future.
>
>
Yes, this is one of the sources I used. I needed to change the c_check file
in openblas as described up thread, and I didn't like the
half-distutils/half-bento hack, but with Ake's patch to numpy's distutils,
and my site.cfg, this works as described for me (Kubuntu 12.10) using just
the usual setup.py.

Skipper


Re: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils

2013-03-23 Thread Skipper Seabold
On Sat, Mar 23, 2013 at 8:44 PM, Skipper Seabold  wrote:
> On Sat, Mar 23, 2013 at 7:26 PM, Ake Sandgren  
> wrote:
>> On Sat, 2013-03-23 at 14:19 -0400, Skipper Seabold wrote:
>>> Some help on this would be greatly appreciated. It's been recommended
>>> to use OpenBlas over ATLAS, so I've been trying to build numpy with
>>> openblas and have run into a few problems.
>>
>>>
>>> To truly support OpenBlas, is it maybe necessary to make some
>>> additions to numpy/distutils/system_info.py?
>>
>> Here is how.
>>
>> https://github.com/akesandgren/numpy/commit/363339dd3a9826f3e3e7dc4248c258d3c4dfcd7c
>>
>
>
> Thanks, that works well for numpy. Tests pass. I hope that makes it into
> a pull request. My site.cfg looks like this. I don't know about the
> lapack_opt section. It doesn't seem to work.
>
> [DEFAULT]
> library_dirs = /home/skipper/.local/lib
> include_dirs = /home/skipper/.local/include
>
> [openblas]
> libraries = openblas
>
> [lapack_opt]
> libraries = openblas
>
> Do you have any idea how to get scipy working too? I have a similar
> site.cfg, but it does not find lapack, which is rolled into
> libopenblas from what I understand. I can do
>
> export LAPACK=~/.local/lib/libopenblas.a
> python setup.py build &> build.log
> sudo -E python setup.py install
>
> There are no obvious failures in the build.log, but scipy is still
> broken because it needs lapack from numpy I guess.

The answer is to

export BLAS=~/.local/lib/libopenblas.a
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.local/lib/

before building and installing. Now everything works. Whew. Thanks a
lot for the help.

>
>>>> import numpy as np
>>>> np.show_config()
> lapack_info:
>   NOT AVAILABLE
> atlas_threads_info:
>   NOT AVAILABLE
> blas_opt_info:
> libraries = ['openblas', 'openblas']
> library_dirs = ['/home/skipper/.local/lib']
> language = f77
> lapack_src_info:
>   NOT AVAILABLE
> openblas_info:
> libraries = ['openblas', 'openblas']
> library_dirs = ['/home/skipper/.local/lib']
> language = f77
> lapack_opt_info:
>   NOT AVAILABLE
> atlas_info:
>   NOT AVAILABLE
> lapack_mkl_info:
>   NOT AVAILABLE
> blas_mkl_info:
>   NOT AVAILABLE
> mkl_info:
>   NOT AVAILABLE
>>>> from scipy import stats
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/usr/local/lib/python2.7/dist-packages/scipy/stats/__init__.py",
> line 320, in 
> from .stats import *
>   File "/usr/local/lib/python2.7/dist-packages/scipy/stats/stats.py",
> line 242, in 
> import scipy.linalg as linalg
>   File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/__init__.py",
> line 147, in 
> from .misc import *
>   File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/misc.py",
> line 5, in 
> from . import blas
>   File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/blas.py",
> line 113, in 
> from scipy.linalg import _fblas
> ImportError: libopenblas.so.0: cannot open shared object file: No such
> file or directory
>
> Skipper


Re: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils

2013-03-23 Thread Skipper Seabold
On Sat, Mar 23, 2013 at 7:26 PM, Ake Sandgren  wrote:
> On Sat, 2013-03-23 at 14:19 -0400, Skipper Seabold wrote:
>> Some help on this would be greatly appreciated. It's been recommended
>> to use OpenBlas over ATLAS, so I've been trying to build numpy with
>> openblas and have run into a few problems.
>
>>
>> To truly support OpenBlas, is it maybe necessary to make some
>> additions to numpy/distutils/system_info.py?
>
> Here is how.
>
> https://github.com/akesandgren/numpy/commit/363339dd3a9826f3e3e7dc4248c258d3c4dfcd7c
>


Thanks, that works well for numpy. Tests pass. I hope that makes it into
a pull request. My site.cfg looks like this. I don't know about the
lapack_opt section. It doesn't seem to work.

[DEFAULT]
library_dirs = /home/skipper/.local/lib
include_dirs = /home/skipper/.local/include

[openblas]
libraries = openblas

[lapack_opt]
libraries = openblas

Do you have any idea how to get scipy working too? I have a similar
site.cfg, but it does not find lapack, which is rolled into
libopenblas from what I understand. I can do

export LAPACK=~/.local/lib/libopenblas.a
python setup.py build &> build.log
sudo -E python setup.py install

There are no obvious failures in the build.log, but scipy is still
broken because it needs lapack from numpy I guess.

>>> import numpy as np
>>> np.show_config()
lapack_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/home/skipper/.local/lib']
language = f77
lapack_src_info:
  NOT AVAILABLE
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/home/skipper/.local/lib']
language = f77
lapack_opt_info:
  NOT AVAILABLE
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE
>>> from scipy import stats
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/local/lib/python2.7/dist-packages/scipy/stats/__init__.py",
line 320, in 
from .stats import *
  File "/usr/local/lib/python2.7/dist-packages/scipy/stats/stats.py",
line 242, in 
import scipy.linalg as linalg
  File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/__init__.py",
line 147, in 
from .misc import *
  File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/misc.py",
line 5, in 
from . import blas
  File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/blas.py",
line 113, in 
from scipy.linalg import _fblas
ImportError: libopenblas.so.0: cannot open shared object file: No such
file or directory

Skipper


Re: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils

2013-03-23 Thread Skipper Seabold
On Sat, Mar 23, 2013 at 2:32 PM, Hanno Klemm  wrote:

> Skipper,
> this looks like a problem that I had in the bad old days with ATLAS, as
> well. Try compiling openblas with the -fPIC flag; that used to help.
>
>
Thanks for having a look. I checked after seeing that odd bento failure
(see here [1]), and it looks to me like OpenBlas uses the -fPIC flag in all
of the gcc and gfortran calls. Possibly related? [2]

Skipper

[1] https://github.com/cournape/Bento/issues/116
[2] https://github.com/cournape/Bento/issues/128



> Best of luck,
> Hanno
>
> hanno.kl...@me.com
>
> Sent from my mobile device, please excuse my brevity.
>
> On 23.03.2013, at 19:19, Skipper Seabold  wrote:
>
> Some help on this would be greatly appreciated. It's been recommended to
> use OpenBlas over ATLAS, so I've been trying to build numpy with openblas
> and have run into a few problems.
>
> 1) Build fails using bento master and waf 1.7.9, see below.
> 2) Distutils doesn't seem to be able to find lapack as part of atlas. I
> tried to skip a site.cfg and define environmental variables. No idea what I
> missed.
>
> I followed instructions found scattered over the internet and only
> understand vaguely the issues. Maybe someone can help. I'd be happy to
> update the wiki with any answers.
>
> To truly support OpenBlas, is it maybe necessary to make some additions to
> numpy/distutils/system_info.py?
>
> Thanks for having a look,
>
> Skipper
>
> Install OpenBlas
> -
> git clone git://github.com/xianyi/OpenBLAS
> cd OpenBlas
>
> Edit c_check to look for libpthreads in the right place (Kubuntu 12.10)
>
> |4 $ git diff c_check
> ```
> diff --git a/c_check b/c_check
> index 4d82237..de0fd33 100644
> --- a/c_check
> +++ b/c_check
> @@ -241,7 +241,7 @@ print CONFFILE "#define FUNDERSCORE\t$need_fu\n" if $need_fu
>
>  if ($os eq "LINUX") {
>
> -@pthread = split(/\s+/, `nm /lib/libpthread.so* | grep _pthread_create`);
> +@pthread = split(/\s+/, `nm /lib/x86_64-linux-gnu/libpthread.so* | grep _pthread_create`);
>
>  if ($pthread[2] ne "") {
>     print CONFFILE "#define PTHREAD_CREATE_FUNC $pthread[2]\n";
> ```
>
> make fc=gfortran
> make PREFIX=~/.local install
>
> Everything looks ok, so far.
>
> Install NumPy
> ---
> Using numpy master
>
> I tried to use bento master and waf 1.7.9, following instructions from
> David's blog
>
> bentomaker configure --prefix=/home/skipper/.local
> --with-blas-lapack-libdir=/home/skipper/.local/lib
> --blas-lapack-type=openblas ..
> bentomaker build -j4
>
> ```
> 
> [101/104] cshlib: build/numpy/core/src/umath/umath_tests.c.11.o ->
> build/numpy/core/umath_tests.so
>
> /usr/bin/ld: numpy/core/lib/libnpymath.a(halffloat.c.12.o): relocation
> R_X86_64_PC32 against symbol `npy_halfbits_to_floatbits' can not be used
> when making a shared object; recompile with -fPIC
> /usr/bin/ld: final link failed: Bad value
> collect2: error: ld returned 1 exit status
> /usr/bin/ld: numpy/core/lib/libnpymath.a(halffloat.c.12.o): relocation
> R_X86_64_PC32 against symbol `npy_halfbits_to_floatbits' can not be used
> when making a shared object; recompile with -fPIC
> /usr/bin/ld: final link failed: Bad value
> collect2: error: ld returned 1 exit status
> ```
>
> No idea, so, let's try distutils
>
> export LAPACK=~/.local/lib/libopenblas.a
> export BLAS=~/.local/lib/libopenblas.a
> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.local/lib/
> echo $LD_LIBRARY_PATH
> ```
> :/usr/local/lib64/R/bin:/home/skipper/.local/lib/
> ```
>
> This step seems to be necessary?
>
> python setup.py config
> ```
> Running from numpy source directory.
> non-existing path in 'numpy/distutils': 'site.cfg'
> F2PY Version 2
> numpy/core/setup_common.py:88: MismatchCAPIWarning: API mismatch detected,
> the C API version numbers have to be updated. Current C api version is 8,
> with checksum f4362353e2d72f889fda0128aa015037, but recorded checksum for C
> API version 8 in codegen_dir/cversions.txt is
> 17321775fc884de0b1eda478cd61c74b. If functions were added in the C API, you
> have to update C_API_VERSION  in numpy/core/setup_common.py.
>   MismatchCAPIWarning)
> blas_opt_info:
> blas_mkl_info:
>   libraries mkl,vml,guide not found in ['/usr/local/lib64',
> '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
>   NOT AVAILABLE
>
> atlas_blas_threads_info:
> Setting PTATLAS=ATLAS
>   libraries ptf77blas,ptcblas,atlas not found in ['/

[Numpy-discussion] Unable to building numpy with openblas using bento or distutils

2013-03-23 Thread Skipper Seabold
Some help on this would be greatly appreciated. It's been recommended to
use OpenBlas over ATLAS, so I've been trying to build numpy with openblas
and have run into a few problems.

1) Build fails using bento master and waf 1.7.9, see below.
2) Distutils doesn't seem to be able to find lapack as part of atlas. I
tried to skip a site.cfg and define environmental variables. No idea what I
missed.

I followed instructions found scattered over the internet and only
understand vaguely the issues. Maybe someone can help. I'd be happy to
update the wiki with any answers.

To truly support OpenBlas, is it maybe necessary to make some additions to
numpy/distutils/system_info.py?

Thanks for having a look,

Skipper

Install OpenBlas
-
git clone git://github.com/xianyi/OpenBLAS
cd OpenBlas

Edit c_check to look for libpthreads in the right place (Kubuntu 12.10)

|4 $ git diff c_check
```
diff --git a/c_check b/c_check
index 4d82237..de0fd33 100644
--- a/c_check
+++ b/c_check
@@ -241,7 +241,7 @@ print CONFFILE "#define FUNDERSCORE\t$need_fu\n" if $need_fu

 if ($os eq "LINUX") {

-@pthread = split(/\s+/, `nm /lib/libpthread.so* | grep _pthread_create`);
+@pthread = split(/\s+/, `nm /lib/x86_64-linux-gnu/libpthread.so* | grep _pthread_create`);

 if ($pthread[2] ne "") {
    print CONFFILE "#define PTHREAD_CREATE_FUNC $pthread[2]\n";
```

make fc=gfortran
make PREFIX=~/.local install

Everything looks ok, so far.

Install NumPy
---
Using numpy master

I tried to use bento master and waf 1.7.9, following instructions from
David's blog

bentomaker configure --prefix=/home/skipper/.local
--with-blas-lapack-libdir=/home/skipper/.local/lib
--blas-lapack-type=openblas ..
bentomaker build -j4

```

[101/104] cshlib: build/numpy/core/src/umath/umath_tests.c.11.o ->
build/numpy/core/umath_tests.so

/usr/bin/ld: numpy/core/lib/libnpymath.a(halffloat.c.12.o): relocation
R_X86_64_PC32 against symbol `npy_halfbits_to_floatbits' can not be used
when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
/usr/bin/ld: numpy/core/lib/libnpymath.a(halffloat.c.12.o): relocation
R_X86_64_PC32 against symbol `npy_halfbits_to_floatbits' can not be used
when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
```

No idea, so, let's try distutils

export LAPACK=~/.local/lib/libopenblas.a
export BLAS=~/.local/lib/libopenblas.a
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.local/lib/
echo $LD_LIBRARY_PATH
```
:/usr/local/lib64/R/bin:/home/skipper/.local/lib/
```

This step seems to be necessary?

python setup.py config
```
Running from numpy source directory.
non-existing path in 'numpy/distutils': 'site.cfg'
F2PY Version 2
numpy/core/setup_common.py:88: MismatchCAPIWarning: API mismatch detected,
the C API version numbers have to be updated. Current C api version is 8,
with checksum f4362353e2d72f889fda0128aa015037, but recorded checksum for C
API version 8 in codegen_dir/cversions.txt is
17321775fc884de0b1eda478cd61c74b. If functions were added in the C API, you
have to update C_API_VERSION  in numpy/core/setup_common.py.
  MismatchCAPIWarning)
blas_opt_info:
blas_mkl_info:
  libraries mkl,vml,guide not found in ['/usr/local/lib64',
'/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
  NOT AVAILABLE

atlas_blas_threads_info:
Setting PTATLAS=ATLAS
  libraries ptf77blas,ptcblas,atlas not found in ['/usr/local/lib64',
'/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
  NOT AVAILABLE

atlas_blas_info:
  libraries f77blas,cblas,atlas not found in ['/usr/local/lib64',
'/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
  NOT AVAILABLE

/home/skipper/src/numpy-skipper/numpy/distutils/system_info.py:1501:
UserWarning:
Atlas (http://math-atlas.sourceforge.net/) libraries not found.
Directories to search for the libraries can be specified in the
numpy/distutils/site.cfg file (section [atlas]) or by setting
the ATLAS environment variable.
  warnings.warn(AtlasNotFoundError.__doc__)
blas_info:
Replacing _lib_names[0]=='blas' with 'openblas'
Replacing _lib_names[0]=='openblas' with 'openblas'
  FOUND:
libraries = ['openblas']
library_dirs = ['/home/skipper/.local/lib']
language = f77

  FOUND:
libraries = ['openblas']
library_dirs = ['/home/skipper/.local/lib']
define_macros = [('NO_ATLAS_INFO', 1)]
language = f77

non-existing path in 'numpy/lib': 'benchmarks'
lapack_opt_info:
lapack_mkl_info:
mkl_info:
  libraries mkl,vml,guide not found in ['/usr/local/lib64',
'/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu']
  NOT AVAILABLE

  NOT AVAILABLE

atlas_threads_info:
Setting PTATLAS=ATLAS
  libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64
  libraries lapack_atlas not found in /usr/local/lib64
 

Re: [Numpy-discussion] Numpy correlate

2013-03-18 Thread Skipper Seabold
On Mon, Mar 18, 2013 at 1:00 PM, Pierre Haessig wrote:

>  Hi Sudheer,
>
> On 14/03/2013 10:18, Sudheer Joseph wrote:
>
> Dear Numpy/Scipy experts,
> Attached is a script which I made to test numpy.correlate (which is called
> by plt.xcorr) to see how the cross correlation is calculated. From this it
> appears that if I call plt.xcorr(x, y), y is slid back in time compared to
> x, i.e. if y is a process that causes a delayed response in x after 5
> timesteps, then there should be a high correlation at lag 5. However, in
> the attached plot the response is seen only on the -ve side of the lags.
> Can anyone advise me on how to see which way exactly the two series are
> slid back or forth, and understand the cause-result relation better? (I
> understand that by correlation alone one cannot assume a cause-result
> relation, but it is important to know which series is older in time at a
> given lag.)
>
> You have indeed pointed out a lack of documentation in the matplotlib.xcorr
> function, because the definition of covariance can be ambiguous.
>
> The way I would try to get an interpretation of the xcorr function (& its
> friends) is to go back to the theoretical definition of cross-correlation,
> which is a normalized version of the covariance.
>
> In your example you've created a time series X(k) and a lagged one : Y(k)
> = X(k-5)
>
> Now, the covariance function of X and Y is commonly defined as :
>  Cov_{X,Y}(h) = E(X(k+h) * Y(k))   where E is the expectation
>  (assuming that X and Y are centered for the sake of clarity).
>
> If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)). This
> naturally yields the fact that the covariance is indeed maximal at h=-5 and
> not at h=+5.
>
> Note that this reasoning does yield the opposite result with a different
> definition of the covariance, ie. Cov_{X,Y}(h) = E(X(k) * Y(k+h))  (and
> that's what I first did !).
>
>
> Therefore, I think there should be a definition of cross correlation in the
> matplotlib xcorr docstring. In R's acf doc, there is this mention: "The
> lag k value returned by ccf(x, y) estimates the correlation between x[t+k]
> and y[t]."
> (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html)
>
> Now I believe this discussion really belongs on the matplotlib ML. I'll
> open an issue on github (I just spotted a mistake in the definition of the
> normalization anyway).
>


You might be interested in the statsmodels implementation which should be
similar to the R functionality.

http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb
http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html
http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html

Skipper
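
A quick sketch (not from the thread) that checks the lag convention with
numpy alone, relying on np.correlate's documented definition
c[k] = sum_n a[n+k] * conj(v[n]):

```
import numpy as np

rng = np.random.RandomState(0)
y = rng.randn(200)
x = np.roll(y, 5)        # x responds to y with a 5-step delay
x[:5] = 0.0              # drop the wraparound samples

c = np.correlate(x - x.mean(), y - y.mean(), mode='full')
lags = np.arange(-199, 200)
print(lags[c.argmax()])  # 5: the peak is at the lag where x[t+5] matches y[t]
```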


Re: [Numpy-discussion] Adding .abs() method to the array object

2013-02-25 Thread Skipper Seabold
On Mon, Feb 25, 2013 at 10:43 AM, Till Stensitzki  wrote:
>
> First, sorry that I didn't search for an old thread, but because I
> disagree with the conclusion I would at least address my reason:
>
>> I don't like
>> np.abs(arr).max()
>> because I have to concentrate too much on the braces, especially if arr
>> is a calculation
>
> This exactly; adding an abs into an old expression is always a little
> annoyance due to the parentheses. The argument that np.abs() also works is
> true for (almost?) every other method. The fact that so many methods
> already exist, especially for most of the commonly used functions (min,
> max, dot, mean, std, argmin, argmax, conj, T), makes me miss abs. Of
> course, if one were to redesign the api, one would drop most methods (i am
> looking at you, ptp and byteswap). But the object is already cluttered and
> adding abs is imo a logical application of "practicality beats purity".
>

I tend to agree here. The situation isn't all that dire for the number of
methods in an array. No scrolling at reasonably small terminal sizes.

[~/]
[3]: x.
x.T          x.copy      x.getfield     x.put          x.std
x.all        x.ctypes    x.imag         x.ravel        x.strides
x.any        x.cumprod   x.item         x.real         x.sum
x.argmax     x.cumsum    x.itemset      x.repeat       x.swapaxes
x.argmin     x.data      x.itemsize     x.reshape      x.take
x.argsort    x.diagonal  x.max          x.resize       x.tofile
x.astype     x.dot       x.mean         x.round        x.tolist
x.base       x.dtype     x.min          x.searchsorted x.tostring
x.byteswap   x.dump      x.nbytes       x.setfield     x.trace
x.choose     x.dumps     x.ndim         x.setflags     x.transpose
x.clip       x.fill      x.newbyteorder x.shape        x.var
x.compress   x.flags     x.nonzero      x.size         x.view
x.conj       x.flat      x.prod         x.sort
x.conjugate  x.flatten   x.ptp          x.squeeze


I find myself typing things like

arr.abs()

and

arr.unique()

quite often.

Skipper


Re: [Numpy-discussion] New numpy functions: filled, filled_like

2013-01-13 Thread Skipper Seabold
On Sun, Jan 13, 2013 at 6:39 PM, Nathaniel Smith  wrote:

> On Sun, Jan 13, 2013 at 11:24 PM, Robert Kern 
> wrote:
> > On Sun, Jan 13, 2013 at 6:27 PM, Nathaniel Smith  wrote:
> >> Hi all,
> >>
> >> PR 2875 adds two new functions, that generalize zeros(), ones(),
> >> zeros_like(), ones_like(), by simply taking an arbitrary fill value:
> >>   https://github.com/numpy/numpy/pull/2875
> >> So
> >>   np.ones((10, 10))
> >> is the same as
> >>   np.filled((10, 10), 1)
> >>
> >> The implementations are trivial, but the API seems useful because it
> >> provides an idiomatic way of efficiently creating an array full of
> >> inf, or nan, or None, whatever funny value you need. All the
> >> alternatives are either inefficient (np.ones(...) * np.inf) or
> >> cumbersome (a = np.empty(...); a.fill(...)). Or so it seems to me. But
> >> there's a question of taste here; one could argue instead that these
> >> just add more clutter to the numpy namespace. So, before we merge,
> >> anyone want to chime in?
> >
> > One alternative that does not expand the API with two-liners is to let
> > the ndarray.fill() method return self:
> >
> >   a = np.empty(...).fill(20.0)
>
> This violates the convention that in-place operations never return
> self, to avoid confusion with out-of-place operations. E.g.
> ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus
> np.sort(), and in the broader Python world, list.sort() versus
> sorted(), list.reverse() versus reversed(). (This was an explicit
> reason given for list.sort to not return self, even.)
>
> Maybe enabling this idiom is a good enough reason to break the
> convention ("Special cases aren't special enough to break the rules. /
> Although practicality beats purity"), but it at least makes me -0 on
> this...
>
>
I tend to agree with the notion that in-place operations shouldn't return
self, but I don't know if it's just because I've been conditioned this way.
Not returning self breaks the fluent interface pattern [1], as noted in a
similar discussion on pandas [2], FWIW, though there's likely some way to
have both worlds.

Skipper

[1] https://en.wikipedia.org/wiki/Fluent_interface
[2] https://github.com/pydata/pandas/issues/1893
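
For reference, trivial implementations along the lines the PR describes (a
sketch, not the PR's actual code; this discussion eventually landed in
numpy 1.8 as np.full / np.full_like):

```
import numpy as np

def filled(shape, fill_value, dtype=None):
    a = np.empty(shape, dtype)
    a.fill(fill_value)
    return a

def filled_like(arr, fill_value, dtype=None):
    a = np.empty_like(arr, dtype=dtype)
    a.fill(fill_value)
    return a

filled((10, 10), 1)      # same result as np.ones((10, 10))
filled((2, 3), np.inf)   # the inf/nan use case from the proposal
```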


Re: [Numpy-discussion] Proposal to drop python 2.4 support in numpy 1.8

2012-12-13 Thread Skipper Seabold
On Thu, Dec 13, 2012 at 12:00 PM, David Cournapeau  wrote:

> I would even go as far as dropping 2.5 as well then (RHEL 6
> uses python 2.6).

+1

Skipper


Re: [Numpy-discussion] result shape from dot for 0d, 1d, 2d scalar

2012-11-28 Thread Skipper Seabold
On Wed, Nov 28, 2012 at 12:31 PM, Sebastian Berg  wrote:

> Maybe a strict matrix product would make sense too, but the dot function
> behavior cannot be changed in any case, so it's pointless to argue about
> it. Just make sure your arrays are 2-d (or matrices) if you want a
> matrix product, which will give the behavior you expect in a much more
> controlled fashion anyway.
>
>
I'm not arguing anything. I was just stating why I was surprised and was
looking for guidance to update my expectations, which you've provided.
Thanks. Assuring input dimensions is my solution.

Skipper


Re: [Numpy-discussion] result shape from dot for 0d, 1d, 2d scalar

2012-11-28 Thread Skipper Seabold
On Tue, Nov 27, 2012 at 11:16 AM, Sebastian Berg  wrote:

> On Mon, 2012-11-26 at 13:54 -0500, Skipper Seabold wrote:
> > I discovered this because scipy.optimize.fmin_powell appears to
> > squeeze 1d argmin to 0d unlike the other optimizers, but that's a
> > different story.
> >
> >
> > I would expect the 0d array to behave like the 1d array not the 2d as
> > it does below. Thoughts? Maybe too big of a pain to change this
> > behavior if indeed it's not desired, but I found it to be unexpected.
>
> I don't quite understand why it is unexpected. A 1-d array is considered
> a vector, a 0-d array is a scalar.
>
>
When you put it like this I guess it makes sense. I don't encounter 0d
arrays often and never think of a 0d array as truly a scalar like
np.array(1.).item(). See below for my intuition.


> > [255]: np.version.full_version # same on 1.5.1
> > [255]: '1.8.0.dev-8e0a542'
> >
> >
> > [262]: arr = np.random.random((25,1))
> >
> >
> > [~/]
> > [263]: np.dot(arr, np.array([1.])).shape
> > [263]: (25,)
> >
> Matrix times vector = vector
> >
> > [~/]
> > [264]: np.dot(arr, np.array([[1.]])).shape
> > [264]: (25, 1)
> >
> Matrix times matrix = matrix
> >
> > [~/]
> > [265]: np.dot(arr, np.array(1.)).shape
> > [265]: (25, 1)
> >
> matrix times scalar = matrix (of same shape)
> >
> > [~/]
> > [271]: np.dot(arr.squeeze(), np.array(1.)).shape
> > [271]: (25,)
> >
> vector times scalar = vector (of same shape)
> >
> > Huh? 0d arrays broadcast with dot?
> >
> Remember a 0-d array is a scalar, there is no actual broadcasting
> involved here. (except that vectors (1-d arrays) are special)
>
>
Maybe I'm misunderstanding. How do you mean there is no broadcasting?
They're clearly not conformable. Is vector.scalar specially defined (I have
no idea)? I recall arguing once and submitting a patch such that
np.linalg.det(5) and np.linalg.inv(5) should be well-defined and work but
the counter-argument was that a scalar is not the same as a scalar matrix.
This seems to be an exception.

Here, I guess, following that counterargument, I'd expected the scalar to
fail in dot. I certainly don't expect a (N,2).scalar -> (N,2). Or I'd
expect it to follow the rules of matrix notation and be treated like the 1d
scalar vector so that (N,1).scalar -> (N,). To my mind, this follows more
closely the expectation that (J,K).(M,N) -> (J,N), i.e., the second
dimension of the result is the same as the second dimension of whatever is
post-multiplying, where the first dimension is inferred if necessary (or
should fail if non-existent). So my expectations are (were) as follows (a
sketch contrasting them with numpy's actual behavior appears after the
quoted session below):

(N,).() -> (N,)
(N,1).() -> (N,)
(N,1).(1,) -> (N,)
(N,1).(1,1) -> (N,1)
(N,2).() -> Error

Skipper

> [~]
> > [279]: arr = np.random.random((25,2))
> >
> >
> > [~/]
> > [280]: np.dot(arr.squeeze(), np.array(2.)).shape
> > [280]: (25, 2)
> >
> >
> > Skipper
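
Collecting the shapes demonstrated across this thread against the
expectations listed above (a summary sketch; the shapes are exactly as
reported in the quoted sessions):

```
import numpy as np

N = 25
np.dot(np.ones(N), np.array(2.)).shape          # (25,)
np.dot(np.ones((N, 1)), np.ones(1)).shape       # (25,)
np.dot(np.ones((N, 1)), np.ones((1, 1))).shape  # (25, 1)
np.dot(np.ones((N, 1)), np.array(2.)).shape     # (25, 1), not (25,)
np.dot(np.ones((N, 2)), np.array(2.)).shape     # (25, 2), not an error
```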


[Numpy-discussion] result shape from dot for 0d, 1d, 2d scalar

2012-11-26 Thread Skipper Seabold
I discovered this because scipy.optimize.fmin_powell appears to squeeze 1d
argmin to 0d unlike the other optimizers, but that's a different story.

I would expect the 0d array to behave like the 1d array not the 2d as it
does below. Thoughts? Maybe too big of a pain to change this behavior if
indeed it's not desired, but I found it to be unexpected.

[255]: np.version.full_version # same on 1.5.1
[255]: '1.8.0.dev-8e0a542'

[262]: arr = np.random.random((25,1))

[~/]
[263]: np.dot(arr, np.array([1.])).shape
[263]: (25,)

[~/]
[264]: np.dot(arr, np.array([[1.]])).shape
[264]: (25, 1)

[~/]
[265]: np.dot(arr, np.array(1.)).shape
[265]: (25, 1)

[~/]
[271]: np.dot(arr.squeeze(), np.array(1.)).shape
[271]: (25,)

Huh? 0d arrays broadcast with dot?

[~]
[279]: arr = np.random.random((25,2))

[~/]
[280]: np.dot(arr.squeeze(), np.array(2.)).shape
[280]: (25, 2)

Skipper


Re: [Numpy-discussion] the mean, var, std of empty arrays

2012-11-21 Thread Skipper Seabold
On Wed, Nov 21, 2012 at 9:22 PM, Olivier Delalleau  wrote:

> Current behavior looks sensible to me. I personally would prefer no
> warning but I think it makes sense to have one as it can be helpful to
> detect issues faster.
>
> -=- Olivier
>

It's configurable.

[~/]
[1]: np.seterr(all='ignore')

[1]: {'divide': 'ignore', 'invalid': 'ignore', 'over': 'ignore', 'under':
'ignore'}

[~/]
[2]: np.array([]).mean()

[2]: nan

[~/]
[3]: np.seterr(all='warn')

[3]: {'divide': 'ignore', 'invalid': 'ignore', 'over': 'ignore', 'under':
'ignore'}

[~/]
[4]: np.array([]).mean()

/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:57:
RuntimeWarning: invalid value encountered in double_scalars
  ret = ret / float(rcount)
[4]: nan

Skipper


> 2012/11/21 Charles R Harris 
>
>> What should be the value of the mean, var, and std of empty arrays?
>> Currently
>>
>> In [12]: a
>> Out[12]: array([], dtype=int64)
>>
>> In [13]: a.mean()
>> Out[13]: nan
>>
>> In [14]: a.std()
>> Out[14]: nan
>>
>> In [15]: a.var()
>> Out[15]: nan
>>
>> I think the nan comes from 0/0. All of these also raise warnings the
>> first time they are called.
>>
>> Chuck


Re: [Numpy-discussion] ANN: NumPy 1.7.0b1 release

2012-08-21 Thread Skipper Seabold
On Tue, Aug 21, 2012 at 3:59 PM, Christoph Gohlke  wrote:

> On 8/21/2012 9:24 AM, Ondřej Čertík wrote:
> > Hi,
> >
> > I'm pleased to announce the availability of the first beta release of
> > NumPy 1.7.0b1.
> >
> > Sources and binary installers can be found at
> > https://sourceforge.net/projects/numpy/files/NumPy/1.7.0b1/
> >
> > Please test this release and report any issues on the numpy-discussion
> > mailing list. The following problems are known and
> > we'll work on fixing them before the final release:
> >
> > http://projects.scipy.org/numpy/ticket/2187
> > http://projects.scipy.org/numpy/ticket/2185
> > http://projects.scipy.org/numpy/ticket/2066
> > http://projects.scipy.org/numpy/ticket/1588
> > http://projects.scipy.org/numpy/ticket/2076
> > http://projects.scipy.org/numpy/ticket/2101
> > http://projects.scipy.org/numpy/ticket/2108
> > http://projects.scipy.org/numpy/ticket/2150
> > http://projects.scipy.org/numpy/ticket/2189
> >
> > I would like to thank Ralf for a lot of help with creating binaries
> > and other help for this release.
> >
> > Cheers,
> > Ondrej
> >
> >
>
> Hi Ondrej,
>
> will numpy 1.7.0 final support Python 3.3? The recent patch in the
> master branch seems to work well.
>
> I tested a win-amd64-py2.7\msvc9\MKL build of the numpy
> maintenance/1.7.x branch against a number of package binaries from
> <...>.
>
> The test results are at
> <http://www.lfd.uci.edu/~gohlke/pythonlibs/tests/20120821-win-amd64-py2.7-numpy-MKL-1.7.0rc1.dev-28ffac7/>.
> For comparison, the tests against numpy-MKL-1.6.2 are at <...>.
>
> Besides some numpy 1.7.x test errors due to RuntimeWarning and
> DeprecationWarning, there are hundreds of "RuntimeWarning (numpy.dtype
> size changed, may indicate binary incompatibility)" when loading Cython
> extensions.
>
> There are additional test failures in scipy, statsmodels, bottleneck,
> skimage, vigra, and mahotas. I did not check in detail or with existing
> tickets (http://projects.scipy.org/ is timing out or responding with
> HTTP 500 status).
>

Most (all?) of the statsmodels issues are due to structured / record array
view changes discussed in the thread "view of recarray issue." I.e.,
rec_array.view((float, 3)) no longer works, though I thought this was fixed.


>
> Other packages test OK against numpy 1.7.x, e.g. PIL, PyGame,
> matplotlib, Pandas, tables, and numexpr.
>
> Hope it helps.
>
> Christoph


Re: [Numpy-discussion] view of recarray issue

2012-07-25 Thread Skipper Seabold
On Sun, Jul 22, 2012 at 2:15 PM, Ralf Gommers
 wrote:
> Hi,
>
> Just a heads up that right now views of recarrays seem to be problematic,
> this doesn't work anymore:
>
>>>> import statsmodels.api as sm
>>>> dta = sm.datasets.macrodata.load()  # returns a record array with 14 fields
>>>> dta.data[['infl', 'realgdp']].view((float,2))
>
> I opened http://projects.scipy.org/numpy/ticket/2187 for this. Probably a
> blocker for 1.7.0.
>
> Question: is that really the recommended way to get an (N, 2) size float
> array from two columns of a larger record array? If so, why isn't there a
> better way? If you'd want to write to that (N, 2) array you have to append a
> copy, making it even uglier. Also, then there really should be tests for
> views in test_records.py.
>

Any comments on this? I have a lot of broken code to deal with if
((float, shape[1])) is no longer allowed on structured and rec arrays.

Skipper
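
One workaround sketch (not from the thread) that sidesteps the multi-field
view entirely, at the cost of an explicit copy; the dtype below is a
stand-in for the macrodata record array:

```
import numpy as np

arr = np.zeros(5, dtype=[('infl', 'f8'), ('realgdp', 'f8'), ('year', 'i8')])
cols = np.column_stack([arr['infl'], arr['realgdp']])
cols.shape  # (5, 2); always a copy, so writing to it never touches arr
```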


Re: [Numpy-discussion] Multivariate hypergeometric distribution?

2012-07-02 Thread Skipper Seabold
On Mon, Jul 2, 2012 at 9:35 PM,  wrote:

> On Mon, Jul 2, 2012 at 8:08 PM,   wrote:
> > On Mon, Jul 2, 2012 at 4:16 PM, Fernando Perez 
> wrote:
> >> Hi all,
> >>
> >> in recent work with a colleague, the need came up for a multivariate
> >> hypergeometric sampler; I had a look in the numpy code and saw we have
> >> the bivariate version, but not the multivariate one.
> >>
> >> I had a look at the code in scipy.stats.distributions, and it doesn't
> >> look too difficult to add a proper multivariate hypergeometric by
> >> extending the bivariate code, with one important caveat: the hard part
> >> is the implementation of the actual discrete hypergeometric sampler,
> >> which lives inside of numpy/random/mtrand/distributions.c:
> >>
> >>
> https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/distributions.c#L743
> >>
> >> That code is hand-written C, and it only works for the bivariate case
> >> right now.  It doesn't look terribly difficult to extend, but it will
> >> certainly take a bit of care and testing to ensure all edge cases are
> >> handled correctly.
> >
> > My only foray into this
> >
> > http://projects.scipy.org/numpy/ticket/921
> > http://projects.scipy.org/numpy/ticket/923
> >
> > This looks difficult to add without a good reference and clear
> > description of the algorithm.
> >
> >>
> >> Does anyone happen to have that implemented lying around, in a form
> >> that would be easy to merge to add this capability to numpy?
> >
> > not me, I have never even heard of multivariate hypergeometric
> distribution.
> >
> >
> > maybe http://hal.inria.fr/docs/00/11/00/56/PDF/perm.pdf  p.11
> > with some properties
> http://www.math.uah.edu/stat/urn/MultiHypergeometric.html
> >
> > I've seen one other algorithm, that seems to need N (number of draws
> > in hypergeom) random variables for one multivariate hypergeometric
> > random draw, which seems slow to me.
> >
> > But maybe someone has it lying around.
>
> Now I have a pure num/sci/python version around.
>
> A bit more than an hour, so no guarantees, but freq and pmf look close
> enough.


I could be wrong, but I think PyMC has sampling and likelihood.

Skipper
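
For the record, a sketch of the standard conditional decomposition (one
known approach, not code from the thread): draw each color against the
pooled remainder with the univariate sampler, so a k-color draw costs k-1
calls to np.random.hypergeometric.

```
import numpy as np

def mv_hypergeometric(colors, nsample, rng=np.random):
    # colors: count of each color in the urn; nsample: total draws.
    remaining = sum(colors)
    sample = []
    for ncolor in colors[:-1]:
        remaining -= ncolor
        # draws of this color vs. everything left in the urn
        draw = rng.hypergeometric(ncolor, remaining, nsample) if nsample else 0
        sample.append(int(draw))
        nsample -= int(draw)
    sample.append(nsample)  # the last color gets the leftover draws
    return sample

print(mv_hypergeometric([10, 5, 3], 6))  # e.g. [4, 1, 1]; always sums to 6
```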


Re: [Numpy-discussion] NumPy 1.7 release delays

2012-06-26 Thread Skipper Seabold
On Tue, Jun 26, 2012 at 7:59 PM, Fernando Perez  wrote:
> On Tue, Jun 26, 2012 at 1:10 PM, Travis Oliphant  wrote:
>> One issues is the one that Sage identified about the array interface
>> regression as noted by Jason.    Any other regressions from 1.5.x need to be
>> addressed as well.    We'll have to decide on a case-by-case basis if there
>> are issues that conflict with 1.6.x behavior.
>>
>
> One thing this discussion made me think about, is that it would be
> great to identify a few key projects that:
>
> - use numpy heavily
> - have reasonably solid test suites
>
> and create a special build job that runs *those* test suites
> periodically.  Not necessarily on every last numpy commit, but at
> least on a reasonable schedule.
>
> I think having that kind of information readily available, and with
> the ability to switch which numpy branch/commit those tests do get run
> against, could be very valuable as an early warning system for numpy
> to know if an apparently inconsequential change has unexpected side
> effects downstream.
>
> In IPython we've really benefited greatly from our improved CI
> infrastructure, but that only goes as far as catching *our own*
> problems.  This kind of downstream integration testing could be very
> useful.
>

+1. Was thinking the same thing.

My uninformed opinion from the sidelines: For me, this begged the
question of why projects would wait so long and be upgrading 1.5.x ->
1.7.x. It sounded to me like an outreach problem. The whole point of
having release candidates is so that downstream users (and especially
big public downstream libraries) can test the release candidate and
give feedback on any changes that affect them. This feedback step is
especially crucial for a project without 100% test coverage (for new
code and old)... Putting more restrictions on changes that can be made
in releases doesn't seem to me to be the right fix, though,
admittedly, numpy is a bit of a different beast than other projects. I
would think you would want downstream projects not to wait 2 years to
upgrade and skip a couple of minor releases.

Skipper


Re: [Numpy-discussion] Fwd: Named dtype array: Difference between a[0]['name'] and a['name'][0]?

2012-05-21 Thread Skipper Seabold
On Mon, May 21, 2012 at 4:37 PM, Travis Oliphant  wrote:
> This is the right place to ask, it's just that it can take time to get an
> answer because people who might know the answer may not have the time to
> respond immediately.
>
> The short answer is that this is not really a "normal" bug, but it could be
> considered a "design" bug (although the issues may not be straightforward to
> resolve).    What that means is that it may not be changed in the short term
> --- and you should just use the first spelling.
>
> Structured arrays can be a confusing area of NumPy for several of reasons.
> You've constructed an example that touches on several of them.   You have a
> data-type that is a "structure" array with one member ("tuple").  That
> member contains a 2-vector of integers.
>
> First of all, it is important to remember that with Python, doing
> a['tuple'][0] = (1,2) is equivalent to b = a['tuple']; b[0] = (1,2).   In
> like manner, a[0]['tuple'] = (1,2) is equivalent to b = a[0]; b['tuple'] =
> (1,2).
>
> To understand the behavior, we need to dissect both code paths and what
> happens.   You built a (3,) array of those elements in 'a'.  When you write
> b = a['tuple'] you should probably be getting a (3,) array of (2,)-integers,
> but as there is currently no formal dtype support for (n,)-integers as a
> general dtype in NumPy, you get back a (3,2) array of integers which is the
> closest thing that NumPy can give you.    Setting the [0] row of this object
> via
>
> a['tuple'][0] = (1,2)
>
> works just fine and does what you would expect.
>
> On the other hand, when you type:
>
> b = a[0]
>
> you are getting back an array-scalar which is a particularly interesting
> kind of array scalar that can hold records.    This new object is formally
> of type numpy.void and it holds a "scalar representation" of anything that
> fits under the "VOID" basic dtype.
>
> For some reason:
>
> b['tuple'] = [1,2]
>
> is not working.   On my system I'm getting a different error: TypeError:
> object of type 'int' has no len()
>
> I think this should be filed as a bug on the issue tracker which is for the
> time being here:    http://projects.scipy.org/numpy
>
> The problem is ultimately the void->copyswap function being called in
> voidtype_setfields if someone wants to investigate.   I think this behavior
> should work.
>

Just playing around, I found this to be odd, though I guess it makes
some sense given your comments
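
For context, here is a minimal reconstruction of the setup under
discussion (the exact integer dtype is an assumption; the original
definition wasn't quoted in this excerpt):

import numpy as np

# hypothetical setup: a (3,)-structured array whose single field
# 'tuple' holds a 2-vector of integers
a = np.zeros(3, dtype=[('tuple', '<i8', (2,))])
b = a[0]  # a numpy.void record scalar, as described above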

[~/]
[12]: b['tuple'] = [(1,2)]

[~/]
[13]: b
[13]: ([1, 0],)

[~/]
[14]: a
[14]:
array([([1, 0],), ([0, 0],), ([0, 0],)],
      dtype=[('tuple', '<i8', (2,))])

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Scipy build can't find BLAS when using numpy master? (Was: Should arr.diagonal() return a copy or a view? (1.7 compatibility issue))

2012-05-16 Thread Skipper Seabold
On Wed, May 16, 2012 at 11:50 AM, Robert Kern  wrote:
> On Wed, May 16, 2012 at 4:35 PM, Nathaniel Smith  wrote:
>> On Wed, May 16, 2012 at 4:24 PM, Robert Kern  wrote:
>>> On Wed, May 16, 2012 at 4:21 PM, Nathaniel Smith  wrote:
>>>
 I built some pristine python 2.7 installs from scratch (no virtualenv,
 no distro tweaks, etc.). Then I installed some version of numpy in
 each, then tried building scipy. With numpy 1.6.1 (built from git),
 everything seems fine - it finds atlas in /usr/lib64/atlas.

 With current numpy master (3d416a0a), numpy itself builds fine,
 but scipy's setup.py can't find atlas (even though it's looking in the
 right places). Log attached.
>>>
>>> The log says it's looking in /usr/lib64/atlas-base/, not /usr/lib64/atlas/.
>>
>> Sorry, that was an error in my email. The libraries are actually in
>> /usr/lib64/atlas-base, and when I build against numpy 1.6.1, the build
>> log says:
>>
>> Setting PTATLAS=ATLAS
>>  FOUND:
>>    libraries = ['ptf77blas', 'ptcblas', 'atlas']
>>    library_dirs = ['/usr/lib64/atlas-base']
>>    language = c
>>    define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')]
>>    include_dirs = ['/usr/include/atlas']
>>
>>  FOUND:
>>    libraries = ['ptf77blas', 'ptcblas', 'atlas']
>>    library_dirs = ['/usr/lib64/atlas-base']
>>    language = c
>>    define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')]
>>    include_dirs = ['/usr/include/atlas']
>
> Hmm. I would throw in some print statements into the "_check_libs()"
> and "_lib_list()" methods in numpy/distutils/system_info.py and see
> what it's checking for.

Possibly related?

http://thread.gmane.org/gmane.comp.python.numeric.general/46145

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] linalg.lstsq

2012-04-25 Thread Skipper Seabold
On Wed, Apr 25, 2012 at 12:13 PM, Till Stensitzki  wrote:
> Hello,
> is there weighted version of linalg.lstsq available?
> In my case, b is a (N,K) matrix, so i can't use manual scaling of x and b.
>

What shape are the weights in this case? I'm not that familiar with
problems with an N x K b matrix.
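
For what it's worth, if there is one weight per observation (an
assumption on my part), the usual manual route still works for an
N x K b, since the row scaling broadcasts across all K columns:

import numpy as np

N, P, K = 50, 4, 3
A = np.random.randn(N, P)  # design matrix
b = np.random.randn(N, K)  # K right-hand sides
w = np.random.rand(N)      # one weight per row

sw = np.sqrt(w)
# scale the rows of both sides; lstsq then solves the weighted problem
coef = np.linalg.lstsq(A * sw[:, None], b * sw[:, None])[0]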

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] adding a cut function to numpy

2012-04-16 Thread Skipper Seabold
On Mon, Apr 16, 2012 at 8:08 PM, Tony Yu  wrote:
>
>
> On Mon, Apr 16, 2012 at 6:01 PM, Skipper Seabold 
> wrote:
>>
>> On Mon, Apr 16, 2012 at 5:51 PM, Tony Yu  wrote:
>> >
>> >
>> > On Mon, Apr 16, 2012 at 5:27 PM, Skipper Seabold 
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> I have a pull request here [1] to add a cut function similar to R's
>> >> [2]. It seems there are often requests for similar functionality. It's
>> >> something I'm making use of for my own work and would like to use in
>> >> statsmodels and in generating instances of pandas' Factor class, but
>> >> is this generally something people would find useful to warrant its
>> >> inclusion in numpy? It will be even more useful I think with an enum
>> >> dtype in numpy.
>> >>
>> >> If you aren't familiar with cut, here's a potential use case. Going
>> >> from a continuous to a categorical variable.
>> >>
>> >> Given a continuous variable
>> >>
>> >> [~/]
>> >> [8]: age = np.random.randint(15,70, size=100)
>> >>
>> >> [~/]
>> >> [9]: age
>> >> [9]:
>> array([58, 32, 20, 25, 34, 69, 52, 27, 20, 23, 51, 61, 39, 54, 39, 44, 27,
>>       17, 29, 18, 66, 25, 44, 21, 54, 32, 50, 60, 25, 41, 68, 25, 42, 69,
>>       50, 69, 24, 69, 69, 48, 30, 20, 18, 15, 50, 48, 44, 27, 57, 52, 40,
>>       27, 58, 45, 44, 32, 54, 19, 36, 32, 55, 17, 55, 15, 19, 29, 22, 25,
>>       36, 44, 29, 53, 37, 31, 51, 39, 21, 66, 25, 26, 20, 17, 41, 50, 27,
>>       23, 62, 69, 65, 34, 38, 61, 39, 34, 38, 35, 18, 36, 29, 26])
>> >>
>> >> Give me a variable where people are in age groups (lower bound is not
>> >> inclusive)
>> >>
>> >> [~/]
>> >> [10]: groups = [14, 25, 35, 45, 55, 70]
>> >>
>> >> [~/]
>> >> [11]: age_cat = np.cut(age, groups)
>> >>
>> >> [~/]
>> >> [12]: age_cat
>> >> [12]:
>> array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, 3,
>>       1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, 4,
>>       3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, 3,
>>       3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, 5,
>>       3, 2, 3, 2, 1, 3, 2, 2])
>> >>
>> >> Skipper
>> >>
>> >> [1] https://github.com/numpy/numpy/pull/248
>> >> [2] http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html
>> >
>> >
>> > Is this the same as `np.searchsorted` (with reversed arguments)?
>> >
>> > In [292]: np.searchsorted(groups, age)
>> > Out[292]:
> array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, 3,
>        1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, 4,
>        3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, 3,
>        3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, 5,
>        3, 2, 3, 2, 1, 3, 2, 2])
>> >
>>
>> That's news to me, and I don't know how I missed it.
>
>
> Actually, the only reason I remember searchsorted is because I also
> implemented a variant of it before finding that it existed.
>

It's certainly not an obvious name for the behavior I wanted at least
with my background. Ie., I want something that works on the data not
the bins/groups. And it's not referenced in histogram or digitize,
though now that I wade back through some threads I see people pointing
to it. It also appears to be faster than my implementation with
digitize with a quick look.

>>
>> It looks like
>> there is overlap, but cut will also do binning for equal width
>> categorization
>>
>> [~/]
>> [21]: np.cut(age, 6)
>> [21]:
>> array([5, 2, 1, 2, 3, 6, 5, 2, 1, 1, 4, 6, 3, 5, 3, 4, 2, 1, 2, 1, 6, 2, 4,
>>       1, 5, 2, 4, 5, 2, 3, 6, 2, 3, 6, 4, 6, 1, 6, 6, 4, 2, 1, 1, 1, 4, 4,
>>       4, 2, 5, 5, 3, 2, 5, 4, 4, 2, 5, 1, 3, 2, 5, 1, 5, 1, 1, 2, 1, 2, 3,
>>       4, 2, 5, 3, 2, 4, 3, 1, 6, 2, 2, 1, 1, 3, 4, 2, 1, 6, 6, 6, 3, 3, 6,
>>       3, 3, 3, 3, 1, 3, 2, 2])

Re: [Numpy-discussion] adding a cut function to numpy

2012-04-16 Thread Skipper Seabold
On Mon, Apr 16, 2012 at 5:51 PM, Tony Yu  wrote:
>
>
> On Mon, Apr 16, 2012 at 5:27 PM, Skipper Seabold 
> wrote:
>>
>> Hi,
>>
>> I have a pull request here [1] to add a cut function similar to R's
>> [2]. It seems there are often requests for similar functionality. It's
>> something I'm making use of for my own work and would like to use in
>> statsmodels and in generating instances of pandas' Factor class, but
>> is this generally something people would find useful to warrant its
>> inclusion in numpy? It will be even more useful I think with an enum
>> dtype in numpy.
>>
>> If you aren't familiar with cut, here's a potential use case. Going
>> from a continuous to a categorical variable.
>>
>> Given a continuous variable
>>
>> [~/]
>> [8]: age = np.random.randint(15,70, size=100)
>>
>> [~/]
>> [9]: age
>> [9]:
>> array([58, 32, 20, 25, 34, 69, 52, 27, 20, 23, 51, 61, 39, 54, 39, 44, 27,
>>       17, 29, 18, 66, 25, 44, 21, 54, 32, 50, 60, 25, 41, 68, 25, 42, 69,
>>       50, 69, 24, 69, 69, 48, 30, 20, 18, 15, 50, 48, 44, 27, 57, 52, 40,
>>       27, 58, 45, 44, 32, 54, 19, 36, 32, 55, 17, 55, 15, 19, 29, 22, 25,
>>       36, 44, 29, 53, 37, 31, 51, 39, 21, 66, 25, 26, 20, 17, 41, 50, 27,
>>       23, 62, 69, 65, 34, 38, 61, 39, 34, 38, 35, 18, 36, 29, 26])
>>
>> Give me a variable where people are in age groups (lower bound is not
>> inclusive)
>>
>> [~/]
>> [10]: groups = [14, 25, 35, 45, 55, 70]
>>
>> [~/]
>> [11]: age_cat = np.cut(age, groups)
>>
>> [~/]
>> [12]: age_cat
>> [12]:
>> array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, 3,
>>       1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, 4,
>>       3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, 3,
>>       3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, 5,
>>       3, 2, 3, 2, 1, 3, 2, 2])
>>
>> Skipper
>>
>> [1] https://github.com/numpy/numpy/pull/248
>> [2] http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html
>
>
> Is this the same as `np.searchsorted` (with reversed arguments)?
>
> In [292]: np.searchsorted(groups, age)
> Out[292]:
> array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, 3,
>        1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, 4,
>        3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, 3,
>        3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, 5,
>        3, 2, 3, 2, 1, 3, 2, 2])
>

That's news to me, and I don't know how I missed it. It looks like
there is overlap, but cut will also do binning for equal width
categorization

[~/]
[21]: np.cut(age, 6)
[21]:
array([5, 2, 1, 2, 3, 6, 5, 2, 1, 1, 4, 6, 3, 5, 3, 4, 2, 1, 2, 1, 6, 2, 4,
   1, 5, 2, 4, 5, 2, 3, 6, 2, 3, 6, 4, 6, 1, 6, 6, 4, 2, 1, 1, 1, 4, 4,
   4, 2, 5, 5, 3, 2, 5, 4, 4, 2, 5, 1, 3, 2, 5, 1, 5, 1, 1, 2, 1, 2, 3,
   4, 2, 5, 3, 2, 4, 3, 1, 6, 2, 2, 1, 1, 3, 4, 2, 1, 6, 6, 6, 3, 3, 6,
   3, 3, 3, 3, 1, 3, 2, 2])

and explicitly handles the case with constant x

[~/]
[26]: x = np.ones(100)*6

[~/]
[27]: np.cut(x, 5)
[27]:
array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
   3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
   3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
   3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
   3, 3, 3, 3, 3, 3, 3, 3])

I guess I could patch searchsorted. Thoughts?
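
In case it helps the comparison, a rough sketch of the equal-width
binning on top of plain searchsorted; this is my approximation of the
behavior shown above, not the code from the PR:

import numpy as np

def cut_equal_width(x, nbins):
    # constant input: everything goes in the middle bin, as above
    if x.min() == x.max():
        return np.zeros(len(x), dtype=int) + (nbins + 1) // 2
    edges = np.linspace(x.min(), x.max(), nbins + 1)
    edges[0] -= 1e-8  # make the lowest bound effectively non-inclusive
    return np.searchsorted(edges, x)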

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] adding a cut function to numpy

2012-04-16 Thread Skipper Seabold
Hi,

I have a pull request here [1] to add a cut function similar to R's
[2]. It seems there are often requests for similar functionality. It's
something I'm making use of for my own work and would like to use in
statsmodels and in generating instances of pandas' Factor class, but
is this generally something people would find useful to warrant its
inclusion in numpy? It will be even more useful I think with an enum
dtype in numpy.

If you aren't familiar with cut, here's a potential use case. Going
from a continuous to a categorical variable.

Given a continuous variable

[~/]
[8]: age = np.random.randint(15,70, size=100)

[~/]
[9]: age
[9]:
array([58, 32, 20, 25, 34, 69, 52, 27, 20, 23, 51, 61, 39, 54, 39, 44, 27,
   17, 29, 18, 66, 25, 44, 21, 54, 32, 50, 60, 25, 41, 68, 25, 42, 69,
   50, 69, 24, 69, 69, 48, 30, 20, 18, 15, 50, 48, 44, 27, 57, 52, 40,
   27, 58, 45, 44, 32, 54, 19, 36, 32, 55, 17, 55, 15, 19, 29, 22, 25,
   36, 44, 29, 53, 37, 31, 51, 39, 21, 66, 25, 26, 20, 17, 41, 50, 27,
   23, 62, 69, 65, 34, 38, 61, 39, 34, 38, 35, 18, 36, 29, 26])

Give me a variable where people are in age groups (lower bound is not inclusive)

[~/]
[10]: groups = [14, 25, 35, 45, 55, 70]

[~/]
[11]: age_cat = np.cut(age, groups)

[~/]
[12]: age_cat
[12]:
array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, 3,
   1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, 4,
   3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, 3,
   3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, 5,
   3, 2, 3, 2, 1, 3, 2, 2])

Skipper

[1] https://github.com/numpy/numpy/pull/248
[2] http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy doc for percentile function

2012-04-04 Thread Skipper Seabold
On Wed, Apr 4, 2012 at 6:10 AM, Pierre Haessig wrote:

> Hi,
>
> I'm looking for the entry point in Numpy doc for the percentile function.
> I'm assuming it should sit in routines.statistics but do not see it :
> http://docs.scipy.org/doc/numpy/reference/routines.statistics.html
>
>
I don't see it in the docs either.


> Am I missing something ? If indeed the percentile entry should be added,
> do you agree it could be added to the "Histogram" section ? (and
> "Histogram" would become "Histograms and percentiles")
>
>
IMO it should go under the extremal values header and this should be
changed to order statistics.


> Also, as Frédéric Bastien pointed out, I feel that the current doc build
> is broken (especially the links :-( )
>
>
Indeed. It does look that way. It is the same on my local build. Perhaps
this deserves a separate message. They show up here.

http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.statistics.rst/#routines-statistics

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy videos

2012-03-12 Thread Skipper Seabold
On Mon, Mar 12, 2012 at 6:04 PM, Abhishek Pratap  wrote:
>
> Hey Guys
>
> Few days with folks at my first pycon has made me wonder how much of
> cool things I was missing ..
>
> I am looking to do some quick catch up on numpy and wondering if there
> are any set of videos that I can refer to. I learn quicker seeing
> videos  and would appreciate if you guys can point me to anything
> available it will be of great help.
>

You'll find a lot of videos here. The tutorials from past conferences
may be of particular interest.

http://conference.scipy.org/index.html

Oddly though it doesn't look like there's a straight link to the 2011
conference there.

http://conference.scipy.org/scipy2011/

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] checking for c compiler during build

2012-03-07 Thread Skipper Seabold
On Wed, Mar 7, 2012 at 12:35 PM, Skipper Seabold  wrote:
> Is there a way to use numpy.distuils to programmatically check for a C
> compiler at build time in a platform independent way?

Wading through the numpy/distutils code some more. Would something as
simple as this work all the time? Seems to do the trick for me.

from distutils.dist import Distribution
from distutils.command.config import config as distutils_config
from distutils import log

dummy_c_text = r'''
void do_nothing(void);
int main(void) {
    do_nothing();
    return 0;
}
'''

def has_c_compiler():
    c = distutils_config(Distribution())
    try:
        c.try_compile(dummy_c_text)
        return True
    except Exception:
        log.info("No C compiler detected.")
        return False

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] checking for c compiler during build

2012-03-07 Thread Skipper Seabold
Is there a way to use numpy.distuils to programmatically check for a C
compiler at build time in a platform independent way?
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Missing data again

2012-03-03 Thread Skipper Seabold
On Sat, Mar 3, 2012 at 4:46 PM, Mark Wiebe  wrote:
> On Sat, Mar 3, 2012 at 12:30 PM, Travis Oliphant 

>>
>>        * the reduction operations need to default to "skipna" --- this is
>> the most common use case which has been re-inforced again to me today by a
>> new user to Python who is using masked arrays presently
>
>
> This is a completely trivial change. I went with the default as I did
> because it's what R, the primary inspiration for the NA design, does. We'll
> have to be sure this is well-marked in the documentation about "NumPy NA for
> R users".
>

It may be trivial to change the code, but this isn't a trivial change.
"Most common use case" is hard for me to swallow, since there are so
many. Of the different statistical softwares I've used, none that I
recall ignores missing data (silently) by default. This sounds
dangerous to me. It's one thing to be convenient to work with missing
data, but it's another to try to sweep the problem under the rug. I
imagine the choice of the R developers was a thoughtful one.

Perhaps something like np.seterr should also be implemented for
missing data since there's probably no resolution to what's most
sensible here.

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] speed of array creation from tuples

2012-02-27 Thread Skipper Seabold
I am surprised by this (though maybe I shouldn't be?). Is it always faster to
use a list comprehension to unpack lists of tuples than np.array/asarray?

[~/]
[1]: X = [tuple(np.random.randint(10,size=2)) for _ in range(100)]

[~/]
[2]: timeit np.array([x1 for _,x1 in X])
10000 loops, best of 3: 26.4 us per loop

[~/]
[3]: timeit np.array(X)
1000 loops, best of 3: 378 us per loop

[~/]
[4]: timeit x1, x2 = np.array([x for _,x in X]), np.array([y for y,_ in X])
10000 loops, best of 3: 53.4 us per loop

[~/]
[5]: timeit x1, x2 = np.array(X).T
1000 loops, best of 3: 384 us per loop
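
An untimed alternative that skips the generic np.array parser entirely,
assuming, as in the example, that every tuple has length 2:

import numpy as np

X = [tuple(np.random.randint(10, size=2)) for _ in range(100)]  # as above
flat = np.fromiter((v for tup in X for v in tup), dtype=int, count=2 * len(X))
x1, x2 = flat.reshape(-1, 2).T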


Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bincount([], minlength=2) should work right?

2012-02-25 Thread Skipper Seabold
On Sat, Feb 25, 2012 at 5:13 PM, Alan G Isaac  wrote:
> On 2/25/2012 4:44 PM, James Bergstra wrote:
>> bincount([]) makes no sense,
>
> I disagree:
> http://permalink.gmane.org/gmane.comp.python.numeric.general/42041
>
>
>> but if a minlength argument is provided,
>> then the routine should succeed.
>
> Definitely!
>

There's a PR to fix this here.

https://github.com/numpy/numpy/pull/84
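
With that patch in, the expected behavior is (the dtype of the empty
result is platform-dependent):

import numpy as np

np.bincount(np.array([], dtype=np.intp))
# -> array([], dtype=int64)
np.bincount(np.array([], dtype=np.intp), minlength=6)
# -> array([0, 0, 0, 0, 0, 0])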

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Forbidden charcter in the "names" argument of genfromtxt?

2012-02-20 Thread Skipper Seabold
On Mon, Feb 20, 2012 at 1:35 PM, Brett Olsen  wrote:
> On Sat, Feb 18, 2012 at 8:12 PM, Adam Hughes  wrote:
>> Hey everyone,
>>
>> I have timeseries data in which the column label is simply a filename from
>> which the original data was taken.  Here's some sample data:
>>
>> name1.txt  name2.txt  name3.txt
>> 32  34    953
>> 32  03    402
>>
>> I've noticed that the standard genfromtxt() method works great; however, the
>> names aren't written correctly.  That is, if I use the command:
>>
>> print data['name1.txt']
>>
>> Nothing happens.
>>
>> However, when I remove the file extension, Eg:
>>
>> name1  name2  name3
>> 32  34    953
>> 32  03    402
>>
>> Then print data['name1'] return (32, 32) as expected.  It seems that the
>> period in the name isn't compatible with the genfromtxt() names attribute.
>> Is there a workaround, or do I need to restructure my program to get the
>> extension removed?  I'd rather not do this if possible for reasons that
>> aren't important for the discussion at hand.
>
> It looks like the period is just getting stripped out of the names:
>
> In [1]: import numpy as N
>
> In [2]: N.genfromtxt('sample.txt', names=True)
> Out[2]:
> array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
>      dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
> Interestingly, this still happens if you supply the names manually:
>
> In [17]: def reader(filename):
>   :     infile = open(filename, 'r')
>   :     names = infile.readline().split()
>   :     data = N.genfromtxt(infile, names=names)
>   :     infile.close()
>   :     return data
>   :
>
> In [20]: data = reader('sample.txt')
>
> In [21]: data
> Out[21]:
> array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
>      dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
> What you can do is reset the names after genfromtxt is through with it, 
> though:
>
> In [34]: def reader(filename):
>   :     infile = open(filename, 'r')
>   :     names = infile.readline().split()
>   :     infile.close()
>   :     data = N.genfromtxt(filename, names=True)
>   :     data.dtype.names = names
>   :     return data
>   :
>
> In [35]: data = reader('sample.txt')
>
> In [36]: data
> Out[36]:
> array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
>      dtype=[('name1.txt', '<f8'), ('name2.txt', '<f8'), ('name3.txt', '<f8')])
> Be warned, I don't know why the period is getting stripped; there may
> be a good reason, and adding it in might cause problems.

I think the period is stripped because recarrays also offer attribute
access of names. So you wouldn't be able to do

your_array.sample.txt

All the names get passed through a name validator. IIRC it's something like

from numpy.lib import _iotools

validator = _iotools.NameValidator()

validator.validate('sample1.txt')
validator.validate('a name with spaces')

NameValidator has a good docstring and the gist of this should be in
the genfromtxt docs, if it's not already.

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] adding unsigned int and int

2011-12-06 Thread Skipper Seabold
On Tue, Dec 6, 2011 at 7:53 AM, Matthew Brett  wrote:
> Hi,
>
> On Tue, Dec 6, 2011 at 4:45 AM, Skipper Seabold  wrote:
>> Hi,
>>
>> Is this intended?
>>
>> [~/]
>> [1]: np.result_type(np.uint, np.int)
>> [1]: dtype('float64')
>
> I would guess so - if your system ints are 64 bit.  int64 can't
> contain the range of uint64, nor can uint64 contain all int64.  If
> there had been a larger int type, it would promote to that, I believe.
> At least on my system:
>
> In [4]: np.result_type(np.int32, np.uint32)
> Out[4]: dtype('int64')
>

Makes sense. Thanks,
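
For the archive, a few more pairs; this is current behavior on a 64-bit
box, and the exact promoted sizes are platform-dependent:

import numpy as np

np.result_type(np.uint8, np.int8)    # dtype('int16')
np.result_type(np.uint32, np.int32)  # dtype('int64')
np.result_type(np.uint64, np.int64)  # dtype('float64'); no int128 to fall back on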

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] adding unsigned int and int

2011-12-06 Thread Skipper Seabold
Hi,

Is this intended?

[~/]
[1]: np.result_type(np.uint, np.int)
[1]: dtype('float64')

[~/]
[2]: np.version.version
[2]: '2.0.0.dev-aded70c'


Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] simulate AR

2011-10-14 Thread Skipper Seabold
On Fri, Oct 14, 2011 at 2:18 PM, Alan G Isaac  wrote:
>
> On 10/14/2011 1:42 PM, josef.p...@gmail.com wrote:
> > If I remember correctly, signal.lfilter doesn't require stationarity,
> > but handling of the starting values is a bit difficult.
>
>
> Hmm.  Yes.
> AR(1) is trivial, but how do you handle higher orders?
>

Not sure if this is what you're after, but here I go the other way
signal -> noise with known initial values of an ARMA(p,q) process.
Here I want to set it such that the first p error terms are zero, I
had to solve for the zi that make this so

https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/tsa/arima_model.py#L295

This is me talking to myself about this.

http://thread.gmane.org/gmane.comp.python.scientific.user/27162/focus=27162
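
For the archive, the noise -> signal direction for a stationary AR(p)
is a single lfilter call; a minimal sketch with made-up coefficients
and zero initial conditions:

import numpy as np
from scipy import signal

# y_t = 0.6*y_{t-1} - 0.2*y_{t-2} + e_t, started from zeros
ar = np.array([1.0, -0.6, 0.2])  # AR polynomial in lfilter's 'a' convention
e = np.random.randn(500)
y = signal.lfilter([1.0], ar, e)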

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] wanted: decent matplotlib alternative

2011-10-13 Thread Skipper Seabold
On Thu, Oct 13, 2011 at 6:22 PM, Gökhan Sever  wrote:
>
>
> On Thu, Oct 13, 2011 at 4:15 PM, Benjamin Root  wrote:
>>
>> Myself and other developers would greatly appreciate help from the
>> community to point out which examples are too confusing or out of date. We
>
> It would be nice to have a social interface for the mpl gallery like the one
> similar to the R-gallery
> [http://www.r-bloggers.com/the-r-graph-gallery-goes-social/]

Big +1. Just yesterday I wanted to add some cool "notes to self"
plots. IIRC there was a lightning talk at SciPy conference two summers
ago about starting a web site just like this. Don't know what happened
though.

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] speeding up operations on small vectors

2011-10-11 Thread Skipper Seabold
On Tue, Oct 11, 2011 at 12:41 PM, Christoph Groth  wrote:
> Skipper Seabold  writes:
>
>> So it's the dot function being called repeatedly on smallish arrays
>> that's the bottleneck? I've run into this as well. See this thread
>> [1].
>> (...)
>
> Thanks for the links.  "tokyo" is interesting, though I fear the
> intermediate matrix size regime where it really makes a difference will
> be rather small.  My concern is in really tiny vectors, where it's not
> even worth to call BLAS.
>

IIUC, it's not so much the BLAS that's helpful but avoiding the
overhead in calling numpy.dot from cython.
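
A toy illustration of that overhead: for length-2 operands the
arithmetic is trivial next to the per-call dispatch cost (timings
deliberately omitted, since they vary by machine):

import numpy as np

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
d1 = np.dot(a, b)  # one tiny C call, plus Python-level dispatch

at, bt = (1.0, 2.0), (3.0, 4.0)
d2 = at[0] * bt[0] + at[1] * bt[1]  # the pure-tuple version being compared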

>> I'd be very interested to hear if you achieve a great speed-up with
>> cython+tokyo.
>
> I try to solve this problem in some way or other.  I'll post here if I
> end up with something interesting.

Please do.

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] speeding up operations on small vectors

2011-10-11 Thread Skipper Seabold
On Tue, Oct 11, 2011 at 11:57 AM, Christoph Groth  wrote:
> Pauli Virtanen  writes:
>
>>> Thank you for your suggestion.  It doesn't help me however, because
>>> the algorithm I'm _really_ trying to speed up cannot be vectorized
>>> with numpy in the way you vectorized my toy example.
>>>
>>> Any other ideas?
>>
>> Reformulate the problem so that it can be vectorized. Without knowing
>> more about the actual algorithm you are trying to implement, it's not
>> easy to give more detailed help.
>
> My question was about ways to achieve a speedup without modifying the
> algorithm.  I was hoping that there is some numpy-like library for
> python which for small arrays achieves a performance at least on par
> with the implementation using tuples.  This should be possible
> technically.

So it's the dot function being called repeatedly on smallish arrays
that's the bottleneck? I've run into this as well. See this thread
[1]. You might gain some speed if you drop it down into Cython, some
examples in that thread. If you're still up against it, you can try
the C code that Fernando posted for fast matrix multiplication (I
haven't yet), or you might do well to use tokyo from Cython
since Wes has fixed it up [2].

I'd be very interested to hear if you achieve a great speed-up with
cython+tokyo.

Cheers,

Skipper

[1] http://mail.scipy.org/pipermail/scipy-user/2010-December/thread.html#27791
[2] https://github.com/wesm/tokyo
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] why doesn't numpy.distutils find ATLAS?

2011-09-21 Thread Skipper Seabold
I have installed numpy with my own ATLAS, but trying to install PyMC,
it can't find the ATLAS libs. I also have an older package that
formerly installed but no longer does. I'm being a bit lazy, but am I
missing something? Briefly checking, it looks like the
conventions/assumptions for site.cfg in an installation as per
site.cfg.example are different from those expected of
numpy.distutils.system_info. I have the directories specified under
[DEFAULT] as per the installation instructions. If I don't have the
libraries installed in one of the default folders, it looks like
they're not found.
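
For posterity, one of the knobs the warning below mentions, the ATLAS
environment variable, can be exercised like this (whether it picks up
the libraries is exactly what's in question here; the path is just the
machine-specific directory from the output below):

import os
# must be set before system_info goes looking
os.environ['ATLAS'] = '/home/skipper/built/atlas_files/lib'

from numpy.distutils import system_info
print(system_info.get_info('atlas'))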

[~/]
[1]: from numpy.distutils import system_info

[~/]
[2]: np.show_config()
atlas_threads_info:
libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
library_dirs = ['/home/skipper/built/atlas_files/lib']
define_macros = [('ATLAS_INFO', '"\\"3.9.32\\""')]
language = f77
include_dirs = ['/home/skipper/built/atlas_files/include']
blas_opt_info:
libraries = ['ptf77blas', 'ptcblas', 'atlas']
library_dirs = ['/home/skipper/built/atlas_files/lib']
define_macros = [('ATLAS_INFO', '"\\"3.9.32\\""')]
language = c
include_dirs = ['/home/skipper/built/atlas_files/include']
atlas_blas_threads_info:
libraries = ['ptf77blas', 'ptcblas', 'atlas']
library_dirs = ['/home/skipper/built/atlas_files/lib']
define_macros = [('ATLAS_INFO', '"\\"3.9.32\\""')]
language = c
include_dirs = ['/home/skipper/built/atlas_files/include']
lapack_opt_info:
libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
library_dirs = ['/home/skipper/built/atlas_files/lib']
define_macros = [('ATLAS_INFO', '"\\"3.9.32\\""')]
language = f77
include_dirs = ['/home/skipper/built/atlas_files/include']
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

[~/]
[3]: system_info.get_info('atlas',1)
/usr/local/lib/python2.7/dist-packages/numpy/distutils/system_info.py:462:
UserWarning:
Atlas (http://math-atlas.sourceforge.net/) libraries not found.
Directories to search for the libraries can be specified in the
numpy/distutils/site.cfg file (section [atlas]) or by setting
the ATLAS environment variable.
  warnings.warn(self.notfounderror.__doc__)
[3]: {}
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Bug in numpy std, etc. with other data structures?

2011-09-17 Thread Skipper Seabold
Just ran into this. Any objections to having numpy.std and other
functions in core/fromnumeric.py call asanyarray before trying to use
the array's method? Other data structures like pandas and larry define
their own std method, for instance, and this doesn't allow them to
pass through. I'm inclined to say that the issue is with numpy, though
maybe the data structures shouldn't shadow numpy array methods while
altering the signature. I dunno.

import numpy as np
import pandas

df = pandas.DataFrame(np.random.random((10,5)))

np.std(df,axis=0)

TypeError: std() got an unexpected keyword argument 'dtype'

np.std(np.asanyarray(df),axis=0)
array([ 0.30883352,  0.3133324 ,  0.26517361,  0.26389029,  0.20022444])

Though I don't think this would work with larry yet.

Pull request: https://github.com/numpy/numpy/pull/160

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix

2011-08-26 Thread Skipper Seabold
On Fri, Aug 26, 2011 at 1:10 PM, Mark Janikas  wrote:
> Hello All,
>
>
>
> I am trying to identify columns of a matrix that are perfectly collinear.
> It is not that difficult to identify when two columns are identical or have
> zero variance, but I do not know how to ID when the culprit is of a higher
> order. i.e. columns 1 + 2 + 3 = column 4.  NUM.corrcoef(matrix.T) will
> return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide
> a very large condition number…. But they do not tell me which columns are
> causing the problem.   For example:
>
>
>
> zt = numpy.array([[ 1.  ,  1.  ,  1.  ,  1.  ,  1.  ],
>                   [ 0.25,  0.1 ,  0.2 ,  0.25,  0.5 ],
>                   [ 0.75,  0.9 ,  0.8 ,  0.75,  0.5 ],
>                   [ 3.  ,  8.  ,  0.  ,  5.  ,  0.  ]])
>
>
>
> How can I identify that columns 0,1,2 are the issue because: column 1 +
> column 2 = column 0?
>
>
>
> Any input would be greatly appreciated.  Thanks much,
>

The way that I know to do this in a regression context for (near
perfect) multicollinearity is VIF. It's long been on my todo list for
statsmodels.

http://en.wikipedia.org/wiki/Variance_inflation_factor

Maybe there are other ways with decompositions. I'd be happy to hear about them.
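
One decomposition-based sketch: right-singular vectors belonging to
(near-)zero singular values span the null space, and their nonzero
entries point at the offending columns. The tolerance is an assumption:

import numpy as np

zt = np.array([[1., 1., 1., 1., 1.],
               [0.25, 0.1, 0.2, 0.25, 0.5],
               [0.75, 0.9, 0.8, 0.75, 0.5],
               [3., 8., 0., 5., 0.]])
M = zt.T  # observations x variables, as in the post
u, s, vt = np.linalg.svd(M, full_matrices=False)
deps = vt[s < 1e-10 * s.max()]  # each row: coefficients of one dependency
# here the single row is proportional to (1, -1, -1, 0): col0 = col1 + col2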

Please post back if you write any code to do this.

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] summing an array

2011-08-18 Thread Skipper Seabold
On Thu, Aug 18, 2011 at 10:19 AM, Chris Withers  wrote:
> Hi All,
>
> Hopefully a simple newbie question, if I have an array such as :
>
> array([0, 1, 2, 3, 4])
>
> ...what's the best way to cumulatively sum it so that I end up with:
>
> array([0, 1, 3, 6, 10])
>
> How would I do this both in-place and to create a new array?
>

[~/]
[1]: a = np.arange(5)

[~/]
[2]: a
[2]: array([0, 1, 2, 3, 4])

[~/]
[3]: np.cumsum(a)
[3]: array([ 0,  1,  3,  6, 10])

[~/]
[4]: np.cumsum(a,out=a)
[4]: array([ 0,  1,  3,  6, 10])

[~/]
[5]: a
[5]: array([ 0,  1,  3,  6, 10])

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpydoc - latex longtables error

2011-08-10 Thread Skipper Seabold
On Wed, Aug 10, 2011 at 3:28 PM, Matthew Brett  wrote:
> Hi,
>
> I think this one might be for Pauli.
>
> I've run into an odd problem that seems to be an interaction of
> numpydoc and autosummary and large classes.
>
> In summary, large classes and numpydoc lead to large tables of class
> methods, and there seems to be an error in the creation of the large
> tables in latex.
>
> Specifically, if I run 'make latexpdf' with the attached minimal
> sphinx setup, I get a pdflatex error ending thus:
>
> ...
> l.118 \begin{longtable}{LL}
>
> and this is because longtable does not accept LL as an argument, but
> needs '|l|l|' (bar - el - bar - el - bar).
>
> I see in sphinx.writers.latex.py, around line 657, that sphinx knows
> about this in general, and long tables in standard ReST work fine with
> the el-bar arguments passed to longtable.
>
>        if self.table.colspec:
>            self.body.append(self.table.colspec)
>        else:
>            if self.table.has_problematic:
>                colwidth = 0.95 / self.table.colcount
>                colspec = ('p{%.3f\\linewidth}|' % colwidth) * \
>                          self.table.colcount
>                self.body.append('{|' + colspec + '}\n')
>            elif self.table.longtable:
>                self.body.append('{|' + ('l|' * self.table.colcount) + '}\n')
>            else:
>                self.body.append('{|' + ('L|' * self.table.colcount) + '}\n')
>
> However, using numpydoc and autosummary (see the conf.py file), what
> seems to happen is that, when we reach the self.table.colspec test at
> the beginning of the snippet above, 'self.table.colspec' is defined:
>
> In [1]: self.table.colspec
> Out[1]: '{LL}\n'
>
> and thus the LL gets written as the arg to longtable:
>
> \begin{longtable}{LL}
>
> and the pdf build breaks.
>
> I'm using the numpydoc out of the current numpy source tree.
>
> At that point I wasn't sure how to proceed with debugging.  Can you
> give any hints?
>

It's not a proper fix, but our workaround is to edit the Makefile for
latex (and latexpdf) to

https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/Makefile#L94
https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/make.bat#L121

to call the script to replace the longtable arguments

https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/fix_longtable.py

The workaround itself probably isn't optimal, and I'd be happy to hear
of a proper fix.
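
For reference, the guts of that workaround amount to a regex pass over
the generated .tex before pdflatex runs; this is a from-memory sketch,
not the actual script:

import re

def fix_longtables(tex):
    # turn \begin{longtable}{LL} into \begin{longtable}{|l|l|}
    return re.sub(r'\\begin\{longtable\}\{(L+)\}',
                  lambda m: r'\begin{longtable}{|%s}' % ('l|' * len(m.group(1))),
                  tex)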

Cheers,

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] limit to number of fields in recarray

2011-08-02 Thread Skipper Seabold
On Tue, Aug 2, 2011 at 4:09 PM, Craig Yoshioka  wrote:
> yup, duplicate field names given.  I didn't commit the non-working version 
> and I didn't want to mess up my working code so I tried duplicating the dtype 
> in a new file and couldn't recreate the error.   I suppose the answer to my 
> question is, there is no limit to the number of records?  Must have been an 
> invalid name, or a different error on my part.  Out of curiosity, what does 
> recarray consider an invalid field name?

I guess this checking is only done in genfromtxt and that's where I
recall coming across it.

http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html#validating-names

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] limit to number of fields in recarray

2011-08-02 Thread Skipper Seabold
On Tue, Aug 2, 2011 at 3:19 PM, Craig Yoshioka  wrote:
> duplicate column in dtype?
>

"Duplicate field names given."? Can you post code to replicate?

> I just consolidated some of the columns and the error went away... none had 
> duplicate field names... hence the question.
>

I don't think this would be raised unless there are duplicates. There
is some name changing for invalid field names that could result in a
name collision. I think I've run into this before.

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] code review/build & test for datetime business day API

2011-07-25 Thread Skipper Seabold
2011/7/25 Stéfan van der Walt :
> On Mon, Jul 25, 2011 at 2:11 PM, Charles R Harris
>  wrote:
>> It's just asking for import problems and general confusion to shadow a
>> Python module, that's why we renamed io to npyio.
>
> Why?  Users can simply do
>
> import numpy.io as npyio ?
>

IIRC this was changed because of a (now fixed) bug in 2to3.

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NA/Missing Data Conference Call Summary

2011-07-06 Thread Skipper Seabold
On Wed, Jul 6, 2011 at 7:14 PM, Christopher Jordan-Squire
 wrote:
> On Wed, Jul 6, 2011 at 3:47 PM,  wrote:
>> On Wed, Jul 6, 2011 at 4:38 PM,   wrote:
>> > On Wed, Jul 6, 2011 at 4:22 PM, Christopher Jordan-Squire

>> >> Mean value replacement, or more generally single scalar value
>> >> replacement,
>> >> is generally not a good idea. It biases downward your standard error
>> >> estimates if you use mean replacement, and it will bias both if you use
>> >> anything other than mean replacement. The bias is gets worse with more
>> >> missing data. So it's worst in the precisely the cases where you'd want
>> >> to
>> >> fill in the data the most. (Though I admit I'm not too familiar with
>> >> time
>> >> series, so maybe this doesn't apply. But it's true as a general
>> >> principle in
>> >> statistics.) I'm not sure why we'd want to make this use case easier.
>>
>> Another qualification on this (I cannot help it).
>> I think this only applies if you use a prefabricated no-missing-values
>> algorithm. If I write it myself, I can do the proper correction for
>> the reduced number of observations. (similar to the case when we
>> ignore correlated information and use statistics based on uncorrelated
>> observations which also overestimate the amount of information we have
>> available.)
>>
>
> Can you do that sort of technique with longitudinal (panel) data? I'm
> honestly curious because I haven't looked into such corrections before. I
> haven't been able to find a reference after a few quick google searches. I
> don't suppose you know one off the top of your head?
> And you're right about the last measurement carried forward. I was just
> thinking about filling in all missing values with the same value.
> -Chris Jordan-Squire
> PS--Thanks for mentioning the statsmodels discussion. I'd been keeping track
> of that on a different email account, and I haven't realized it wasn't
> forwarding those messages correctly.
>

Maybe a bit OT, but I've seen people doing imputation using Bayesian
MCMC or multiple imputation for missing values in panel data. Google
'data augmentation' or 'multiple imputation'. I haven't looked much
into the details yet, but it's definitely not mean replacement.

FWIW (I haven't been following closely the discussion), there is a
distinction in statistics between ignorable and nonignorable missing
data, but I can't think of a situation where I would need this at the
computational level rather than relying on a (numerically comparable)
missing data type(s) a la SAS/Stata. I've also found the odd examples
of IGNORE without a clear answer to be scary.

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Moving lib.recfunctions?

2011-07-05 Thread Skipper Seabold
On Tue, Jul 5, 2011 at 2:46 PM, Pierre GM  wrote:
>
> On Jul 5, 2011, at 8:33 PM, Skipper Seabold wrote:
>
>> On Fri, Jul 1, 2011 at 2:32 PM, Skipper Seabold  wrote:
>>> On Fri, Jul 1, 2011 at 2:22 PM,   wrote:
>>>> On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold  
>>>> wrote:
>>>>> lib.recfunctions has never been fully advertised. The two bugs I just
>>>>> discovered lead me to believe that it's not that well vetted, but it
>>>>> is useful. I can't be the only one using these?
>>>>>
>>>>> What do people think of either deprecating lib.recfunctions or at
>>>>> least importing them into the numpy.rec namespace?
>>>>>
>>>>> I'm sure this has come up before, but gmane search isn't working for me.
>>>>
>>>> about once a year
>>>>
>>>> http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655
>>>>
>>>> my guess is not much has changed since then
>>>>
>>>
>>> Ah, yes. I recall now.
>>>
>>> I agree that they're more general than rec, but also don't have a
>>> first best solution for this. So I think we should move them (in a
>>> correct way) to numpy.rec and add (some of?) them as methods to
>>> recarrays. The best we can do beyond that is put some docs on the
>>> structured array page and notes in the docstrings that they also work
>>> for ndarrays with structured dtype.
>>>
>>> I'll submit a pull request soon and maybe that'll generate some interest.
>>>
>>
>> Had a brief look at what getting lib.recfunctions into rec/core.rec
>> namespace would entail. It's not as simple as it could be, because
>> there are circular imports between core.records and recfunctions (and
>> its imports). It seems that it is possible to work around the circular
>> imports in some of the code except for the degree to which
>> recfunctions is wrapped up with the masked array code.
>
> Hello,
> The idea behind having a lib.recfunctions and not a rec.recfunctions or
> whatever was to illustrate that the functions of this package are more 
> generic than they appear. They work with regular structured ndarrays and 
> don't need recarrays. Methinks we gonna lose this aspect if you try to rename 
> it, but hey, your call.

I agree (even though 'rec' is already in the name). My goal was to
just have numpy.rec.join_by, numpy.rec.stack_arrays, etc, so they're
right there (rec seems more intuitive than lib to me). Do you think
that they may be better off in the main numpy namespace? This is far
from my call, just trying to reach some consensus and make an effort
to move the status quo.

> As to why they were never really advertised? Because I never received any
> feedback when I started developing them (developing is a big word here, I 
> just took a lot of code that John D Hunter had developed in matplotlib and 
> make it more consistent). I advertised them once or twice on the list, wrote 
> the basic docstrings, but waited for other people to start using them.

As Josef pointed out before, it's a chicken and egg thing re:
advertisement and feedback. I think the best advertisement is by
namespace. I use them frequently, and I haven't offered any feedback
because I've never been left wanting (recent pull request is the only
exception). For the most part they do what I want and the docs are
good.

> Anyhow.
> So, yes, there might be some weird import to polish. Note that if you decided 
> to just rename the package and leave it where it was, it would probably be 
> easier.
>

Imports are fine as long as they stay where they are and aren't
imported into numpy.core.

>
>> The path of least resistance is to just import lib.recfunctions.* into
>> the (already crowded) main numpy namespace and be done with it.
>
> Why ? Why can't you leave it available through numpy.lib ? Once again, if 
> it's only a matter of PRing, you could start writing an entry page in the doc 
> describing the functions, that would improve the visibility.

I'm fine with leaving the code where it is, but typing
numpy.lib.recfunctions. is painful (ditto `import
numpy.lib.recfunctions as nprf`). Every time. And I have to do this
often. Even if they are imported into the lib namespace (they aren't),
it would be an improvement, but I still don't think it occurs to
people to hunt through lib to try and join two structured arrays. It
looks like everything in the lib namespace is imported into the main
numpy namespace anyway. And 2) I found a little buglet recently that
made me think 

Re: [Numpy-discussion] Moving lib.recfunctions?

2011-07-05 Thread Skipper Seabold
On Fri, Jul 1, 2011 at 2:32 PM, Skipper Seabold  wrote:
> On Fri, Jul 1, 2011 at 2:22 PM,   wrote:
>> On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold  wrote:
>>> lib.recfunctions has never been fully advertised. The two bugs I just
>>> discovered lead me to believe that it's not that well vetted, but it
>>> is useful. I can't be the only one using these?
>>>
>>> What do people think of either deprecating lib.recfunctions or at
>>> least importing them into the numpy.rec namespace?
>>>
>>> I'm sure this has come up before, but gmane search isn't working for me.
>>
>> about once a year
>>
>> http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655
>>
>> my guess is not much has changed since then
>>
>
> Ah, yes. I recall now.
>
> I agree that they're more general than rec, but also don't have a
> first best solution for this. So I think we should move them (in a
> correct way) to numpy.rec and add (some of?) them as methods to
> recarrays. The best we can do beyond that is put some docs on the
> structured array page and notes in the docstrings that they also work
> for ndarrays with structured dtype.
>
> I'll submit a pull request soon and maybe that'll generate some interest.
>

Had a brief look at what getting lib.recfunctions into rec/core.rec
namespace would entail. It's not as simple as it could be, because
there are circular imports between core.records and recfunctions (and
its imports). It seems that it is possible to work around the circular
imports in some of the code except for the degree to which
recfunctions is wrapped up with the masked array code.

The path of least resistance is to just import lib.recfunctions.* into
the (already crowded) main numpy namespace and be done with it.
Another option, though it's more work, is to remove all the internal
masked array support and let the user decide what do with the
record/structured arrays after they're returned (I invariably have to
set usemask=False anyway). The functions can then be wrapped by
higher-level ones in np.ma if the old usemask behavior is still
desirable for people. This should probably wait until the new masked
array changes are in and settled a bit though.

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] An NA compromise idea -- many-NA

2011-07-01 Thread Skipper Seabold
On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn
 wrote:
> I propose a simple idea *for the long term* for generalizing Mark's
> proposal, that I hope may perhaps put some people behind Mark's concrete
> proposal in the short term.
>
> The key feature missing in Mark's proposal is the ability to distinguish
> between different reasons for NA-ness: IGNORE vs. NA. However, one could
> conceive of wanting to track a whole host of reasons:
>
> homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, TOO_LAZY])
>
> Wouldn't it be a shame to put a lot of work into NA, but then have users
> to still keep a seperate "shadow-array" for stuff like this?
>
> a) In this case the generality of Mark's proposal seems justified and
> less confusing to teach newcomers (?)
>
> b) Since Mark's proposal seems to generalize well to many NAs (there's 8
> bits in the mask, and millions of available NaN-s in floating point), if
> people agreed to this one could leave it for later and just go on with
> the proposed idea.
>

I have not been following the discussion in much detail, so forgive me
if this has come up. But I think this approach is also similar to
thinking behind missing values in SAS and "extended" missing values in
Stata. They are missing but preserve an order. This way you can pull
out values that are missing because they were eaten by a dog and see
if these missing ones are systematically different than the ones that
are missing because they're too lazy. Use case that pops to mind,
seeing if the various ways of attrition in surveys or experiments
varies in a non-random way.

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm
http://www.stata.com/help.cgi?missing

Maybe this is neither here nor there, I just don't want to end up with
the R way is the only way. That's why I prefer Python :)

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Moving lib.recfunctions?

2011-07-01 Thread Skipper Seabold
On Fri, Jul 1, 2011 at 3:05 PM, Benjamin Root  wrote:
>
>
> On Fri, Jul 1, 2011 at 12:59 PM, Skipper Seabold 
> wrote:
>>
>> lib.recfunctions has never been fully advertised. The two bugs I just
>> discovered lead me to believe that it's not that well vetted, but it
>> is useful. I can't be the only one using these?
>>
>
> Nope, you aren't the only one.  I use them in my code.

Ah, good. I was recently just surprised by some results of pretty
basic use of join_by.
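
For concreteness, the kind of basic use I mean (made-up data;
usemask=False because that is what I invariably end up passing):

import numpy as np
import numpy.lib.recfunctions as nprf

a = np.array([(1, 10.), (2, 20.)], dtype=[('key', int), ('x', float)])
b = np.array([(1, 0.1), (3, 0.3)], dtype=[('key', int), ('y', float)])

# inner join on 'key' -> a single row with fields key, x, y
joined = nprf.join_by('key', a, b, jointype='inner', usemask=False)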

>
>>
>> What do people think of either deprecating lib.recfunctions or at
>> least importing them into the numpy.rec namespace?
>>
>
> I wouldn't mind moving them around, but I certainly would not want them
> deprecated.
>

Just meant deprecated in terms of the namespace.

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Moving lib.recfunctions?

2011-07-01 Thread Skipper Seabold
On Fri, Jul 1, 2011 at 2:22 PM,   wrote:
> On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold  wrote:
>> lib.recfunctions has never been fully advertised. The two bugs I just
>> discovered lead me to believe that it's not that well vetted, but it
>> is useful. I can't be the only one using these?
>>
>> What do people think of either deprecating lib.recfunctions or at
>> least importing them into the numpy.rec namespace?
>>
>> I'm sure this has come up before, but gmane search isn't working for me.
>
> about once a year
>
> http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655
>
> my guess is not much has changed since then
>

Ah, yes. I recall now.

I agree that they're more general than rec, but also don't have a
first best solution for this. So I think we should move them (in a
correct way) to numpy.rec and add (some of?) them as methods to
recarrays. The best we can do beyond that is put some docs on the
structured array page and notes in the docstrings that they also work
for ndarrays with structured dtype.

I'll submit a pull request soon and maybe that'll generate some interest.

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Moving lib.recfunctions?

2011-07-01 Thread Skipper Seabold
lib.recfunctions has never been fully advertised. The two bugs I just
discovered lead me to believe that it's not that well vetted, but it
is useful. I can't be the only one using these?

What do people think of either deprecating lib.recfunctions or at
least importing them into the numpy.rec namespace?

I'm sure this has come up before, but gmane search isn't working for me.

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Two bugs in recfunctions.join_by with patch

2011-06-29 Thread Skipper Seabold
These two cases failed in recfunctions.join_by

1) the case for having either r1postfix or r2postfix as an empty
string was not handled.

2) If there is more than one key and more than variable with a name collision.

Patch and tests in a pull request here: https://github.com/numpy/numpy/pull/100

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Multiply along axis

2011-06-29 Thread Skipper Seabold
On Wed, Jun 29, 2011 at 10:32 AM, Robert Elsner  wrote:
>
> Hello everyone,
>
> I would like to solve the following problem (preferably without
> reshaping / flipping the array a).
>
> Assume I have a vector v of length x and an n-dimensional array a where
> one dimension has length x as well. Now I would like to multiply the
> vector v along a given axis of a.
>
> Some example code
>
> a = np.random.random((2,3))
> x = np.zeros(2)
>
> a * x   # Fails because not broadcastable
>
>
> So how do I multiply x with the columns of a so that for each column j
> a[:,j] = a[:,j] * x
>
> without using a loop. Is there some (fast) fast way to accomplish that
> with numpy/scipy?
>

Is this what you want?

a * x[:,None]

or

a *  x[:,np.newaxis]

or more generally

a * np.expand_dims(x,1)

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy submodules missing in intersphinx objects.inv?

2011-06-20 Thread Skipper Seabold
I was just trying to link to the numpy.testing module but can't using
intersphinx. Does anyone know why certain submodules aren't included
in objects.inv? It looks as though it has something to do either with
having a reference at the top of the rst file (so you can :ref: link
to it) or having that module autodoc'd (then you can :mod: link to
it), but I can't find any actual documentation about this actually
being the case. This is the relevant ticket
http://projects.scipy.org/numpy/ticket/1432

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy/Scipy Testing Guidelines URL?

2011-06-20 Thread Skipper Seabold
On Mon, Jun 20, 2011 at 5:41 PM, Ralf Gommers
 wrote:
>
>
> On Mon, Jun 20, 2011 at 11:36 PM, Skipper Seabold 
> wrote:
>>
>> Are the testing guidelines included in the HTML docs anywhere? If I
>> recall, they used to be, and I couldn't find them with a brief
>> look/google. I'd like to link to them. Maybe the rendered rst page is
>> considered their new home?
>
> That is the new home. Should be linked from the old home. A patch to add
> those to the numpy devguide is welcome I think.
>

numpy/doc/source/dev? This is what I was thinking. But I didn't know
if there was some other grand design to having a lot of the developer
specific information in numpy/doc.

Skipper
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Numpy/Scipy Testing Guidelines URL?

2011-06-20 Thread Skipper Seabold
Are the testing guidelines included in the HTML docs anywhere? If I
recall, they used to be, and I couldn't find them with a brief
look/google. I'd like to link to them. Maybe the rendered rst page is
considered their new home?

https://github.com/numpy/numpy/blob/master/doc/TESTS.rst.txt

Skipper


Re: [Numpy-discussion] bincount limitations

2011-06-02 Thread Skipper Seabold
On Thu, Jun 2, 2011 at 1:11 PM, Robert Kern  wrote:
> On Thu, Jun 2, 2011 at 12:08, Skipper Seabold  wrote:
>> On Wed, Jun 1, 2011 at 10:10 PM, Alan G Isaac  wrote:
>>>> On Thu, Jun 2, 2011 at 1:49 AM, Mark Miller  
>>>> wrote:
>>>>> Not quite. Bincount is fine if you have a set of approximately
>>>>> sequential numbers. But if you don't
>>>
>>>
>>> On 6/1/2011 9:35 PM, David Cournapeau wrote:
>>>> Even worse, it fails miserably if you have sequential numbers but with a high
>>>> shift.
>>>> np.bincount([10001, 10002]) # will take a lot of memory
>>>> Doing bincount with dict is faster in those cases.
>>>
>>>
>>> Since this discussion has turned shortcomings of bincount,
>>> may I ask why np.bincount([]) is not an empty array?
>>> Even more puzzling, why is np.bincount([],minlength=6)
>>> not a 6-array of zeros?
>>>
>>
>> Just looks like it wasn't coded that way, but it's low-hanging fruit.
>> Any objections to adding this behavior? This commit should take care
>> of it. Tests pass. Comments welcome, as I'm just getting my feet wet
>> here.
>>
>> https://github.com/jseabold/numpy/commit/133148880bba5fa3a11dfbb95cefb3da4f7970d5
>
> I would use np.zeros(5, dtype=int) in test_empty_with_minlength(), but
> otherwise, it looks good.
>

Ok, thanks. Made the change and removed the old test that it fails on
empty. Pull request.

https://github.com/numpy/numpy/pull/84

Skipper
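
For concreteness, the behavior the commit above adds (a sketch; on
numpy versions without the fix, the empty-input cases raised instead):

import numpy as np

empty = np.array([], dtype=int)
np.bincount(empty)                # -> array([], dtype=int...), no error
np.bincount(empty, minlength=6)   # -> array([0, 0, 0, 0, 0, 0])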


Re: [Numpy-discussion] bincount limitations

2011-06-02 Thread Skipper Seabold
On Wed, Jun 1, 2011 at 10:10 PM, Alan G Isaac  wrote:
>> On Thu, Jun 2, 2011 at 1:49 AM, Mark Miller  
>> wrote:
>>> Not quite. Bincount is fine if you have a set of approximately
>>> sequential numbers. But if you don't
>
>
> On 6/1/2011 9:35 PM, David Cournapeau wrote:
>> Even worse, it fails miserably if you have sequential numbers but with a high
>> shift.
>> np.bincount([10001, 10002]) # will take a lot of memory
>> Doing bincount with dict is faster in those cases.
>
>
> Since this discussion has turned shortcomings of bincount,
> may I ask why np.bincount([]) is not an empty array?
> Even more puzzling, why is np.bincount([],minlength=6)
> not a 6-array of zeros?
>

Just looks like it wasn't coded that way, but it's low-hanging fruit.
Any objections to adding this behavior? This commit should take care
of it. Tests pass. Comments welcome, as I'm just getting my feet wet
here.

https://github.com/jseabold/numpy/commit/133148880bba5fa3a11dfbb95cefb3da4f7970d5

Skipper
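
The dict-based alternative David mentions, sketched with
collections.Counter (one convenient dict implementation; memory here
scales with the number of distinct values, not with their magnitude):

from collections import Counter

counts = Counter([10001, 10002, 10002])
# Counter({10002: 2, 10001: 1}), versus np.bincount([10001, 10002]),
# which allocates an array of length 10003.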


Re: [Numpy-discussion] New functions.

2011-06-01 Thread Skipper Seabold
On Wed, Jun 1, 2011 at 11:31 AM, Mark Miller  wrote:
> I'd love to see something like a "count_unique" function included. The
> numpy.unique function is handy, but it can be a little awkward to
> efficiently go back and get counts of each unique value after the
> fact.
>

Does bincount do what you're after?

[~/]
[1]: x = np.random.randint(1,6,5)

[~/]
[2]: x
[2]: array([1, 3, 4, 5, 1])

[~/]
[3]: np.bincount(x)
[3]: array([0, 2, 0, 1, 1, 1])

Skipper
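
A sketch of a "count_unique" built on np.unique's return_inverse,
which also avoids bincount's memory blowup for large shifted values:

import numpy as np

x = np.array([10001, 5, 10002, 10001])
uniq, inv = np.unique(x, return_inverse=True)
counts = np.bincount(inv)   # inv is dense (0 .. len(uniq)-1), so this is cheap
# uniq   -> array([    5, 10001, 10002])
# counts -> array([1, 2, 1])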


Re: [Numpy-discussion] New functions.

2011-05-31 Thread Skipper Seabold
On Tue, May 31, 2011 at 9:53 PM, Warren Weckesser
 wrote:
>
>
> On Tue, May 31, 2011 at 8:36 PM, Skipper Seabold 
> wrote:
>> I don't know if it's one pass off the top of my head, but I've used
>> percentile for interpercentile ranges.
>>
>> [docs]
>> [1]: X = np.random.random(1000)
>>
>> [docs]
>> [2]: np.percentile(X,[0,100])
>> [2]: [0.00016535235312509222, 0.99961513543316571]
>>
>> [docs]
>> [3]: X.min(),X.max()
>> [3]: (0.00016535235312509222, 0.99961513543316571)
>>
>
>
> percentile() isn't one pass; using percentile like that is much slower:
>
> In [25]: %timeit np.percentile(X,[0,100])
> 1 loops, best of 3: 103 us per loop
>
> In [26]: %timeit X.min(),X.max()
> 10 loops, best of 3: 11.8 us per loop
>

Probably should've checked that before opening my mouth. Never
actually used it for a minmax, but it is faster than two calls to
scipy.stats.scoreatpercentile. Guess I'm +1 for fast order statistics.

Skipper


Re: [Numpy-discussion] New functions.

2011-05-31 Thread Skipper Seabold
On Tue, May 31, 2011 at 9:31 PM, Benjamin Root  wrote:
>
>
> On Tue, May 31, 2011 at 8:18 PM, Warren Weckesser
>  wrote:
>>
>>
>> On Tue, May 31, 2011 at 8:08 PM, Charles R Harris
>>  wrote:
>>>
>>> Hi All,
>>>
>>> I've been contemplating new functions that could be added to numpy and
>>> thought I'd run them by folks to see if there is any interest.
>>>
>>> 1) Modified sort/argsort functions that return the maximum k values.
>>>     This is easy to do with heapsort and almost as easy with mergesort.
>>>
>>
>>
>> While you're at, how about a function that finds both the max and min in
>> one pass?  (Mentioned previously in this thread:
>> http://mail.scipy.org/pipermail/numpy-discussion/2010-June/051072.html)
>>
>>
>
> +1 from myself and probably just about anybody in matplotlib.  If both the
> maxs and mins are searched during the same run through an array, I would
> imagine that would result in a noticeable speedup with automatic range
> finding.
>

I don't know if it's one pass off the top of my head, but I've used
percentile for interpercentile ranges.

[docs]
[1]: X = np.random.random(1000)

[docs]
[2]: np.percentile(X,[0,100])
[2]: [0.00016535235312509222, 0.99961513543316571]

[docs]
[3]: X.min(),X.max()
[3]: (0.00016535235312509222, 0.99961513543316571)

Skipper
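
Not the single-pass C loop being proposed, but a chunked pure-numpy
compromise that keeps the working set cache-sized (a sketch, assuming
a 1-D array):

import numpy as np

def minmax(arr, chunk=2**16):
    # Two reductions per chunk; lo/hi are carried across chunks.
    lo, hi = np.inf, -np.inf
    for i in range(0, arr.size, chunk):
        block = arr[i:i + chunk]
        lo = min(lo, block.min())
        hi = max(hi, block.max())
    return lo, hi

X = np.random.random(1000)
assert minmax(X) == (X.min(), X.max())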


Re: [Numpy-discussion] finding elements that match any in a set

2011-05-27 Thread Skipper Seabold
On Fri, May 27, 2011 at 12:48 PM, Michael Katz
 wrote:
> I have a numpy array, records, with named fields including a field named
> "integer_field". I have an array (or list) of values of interest, and I want
> to get the indexes where integer_field has any of those values.
>
> Because I can do
>
>     indexes = np.where( records.integer_field > 5 )
>
> I thought I could do
>
>     indexes = np.where( records.integer_field in values )
>
> But that doesn't work. (As a side question I'm interested in why that
> doesn't work, when values is a python list.)
>
> How can I get those indexes? I though perhaps I need a nested np.where, or
> some other two-step process, but it wasn't clear to me how to do it.
>

Check out this recent thread. I think the proposed class does what you
want. It's more efficient than in1d, if values is small compared to
the length of records.

http://thread.gmane.org/gmane.comp.python.scientific.user/29035/

Skipper
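
For reference, the plain np.in1d version (reasonable when values is
not tiny relative to records). The original `in` attempt fails because
Python asks the element-wise comparison for a single truth value,
which is ambiguous for an array:

import numpy as np

integer_field = np.array([3, 7, 1, 9, 7])   # stands in for records.integer_field
values = [7, 9]
indexes = np.where(np.in1d(integer_field, values))[0]
# -> array([1, 3, 4])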


Re: [Numpy-discussion] Bug: np.linalg.qr crash the python interpreter if getting a shape (0, x) array.

2011-05-08 Thread Skipper Seabold
On Sun, May 8, 2011 at 11:42 AM, Skipper Seabold  wrote:
> On Fri, May 6, 2011 at 11:30 AM, Till Stensitzki  wrote:
>> Hi,
>> discovered another small bug. Windows 7 32 bit, Python 2.6.
>>
>> In [1]: np.__version__
>> Out[1]: '1.5.1'
>>
>> In [2]: a=np.zeros((0,2))
>>
>> In [3]: np.linalg.qr(a)
>>  ** On entry to DGEQRF parameter number  4 had an illegal value
>>
>
> In [3]: a
> Out[3]: array([], shape=(0, 2), dtype=float64)
>
> It doesn't crash for me on 1.5.1, but returns error code -4 from
dgeqrf. Maybe np.linalg.linalg._assertRank2 should also check that
a.size > 0, or there should be an _assertNotEmpty?
>
> Maybe you should file a ticket so it doesn't get lost.
>

Added an _assertNonEmpty to qr and filed a pull request.

https://github.com/jseabold/numpy/compare/master...qr-nonempty-check

Skipper
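
A sketch of what such a guard looks like (illustrative only; the name
and wording here differ from the actual pull request linked above):

import numpy as np
from numpy.linalg import LinAlgError

def _assert_nonempty(*arrays):
    # Reject size-0 inputs before they reach LAPACK's dgeqrf, which
    # otherwise aborts on an illegal dimension argument.
    for a in arrays:
        if a.size == 0:
            raise LinAlgError("Arrays cannot be empty")

_assert_nonempty(np.zeros((2, 2)))    # passes silently
# _assert_nonempty(np.zeros((0, 2)))  # would raise LinAlgError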


Re: [Numpy-discussion] Bug: np.linalg.qr crash the python interpreter if getting a shape (0, x) array.

2011-05-08 Thread Skipper Seabold
On Fri, May 6, 2011 at 11:30 AM, Till Stensitzki  wrote:
> Hi,
> discovered another small bug. Windows 7 32 bit, Python 2.6.
>
> In [1]: np.__version__
> Out[1]: '1.5.1'
>
> In [2]: a=np.zeros((0,2))
>
> In [3]: np.linalg.qr(a)
>  ** On entry to DGEQRF parameter number  4 had an illegal value
>

In [3]: a
Out[3]: array([], shape=(0, 2), dtype=float64)

It doesn't crash for me on 1.5.1, but returns error code -4 from
dgeqrf. Maybe np.linalg.linalg._assertRank2 should also check that
a.size > 0, or there should be an _assertNotEmpty?

Maybe you should file a ticket so it doesn't get lost.

Skipper

>
> -- Here python has crashed.
> While np.linalg.lstsq doesn't crash, the error message is not very helpful.
>
> greetings Till
>


[Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-11 Thread Skipper Seabold
All,

We noticed some failing tests for statsmodels between numpy 1.5.1 and
numpy >= 1.6.0. These are the versions where I noticed the change. It
seems that when you divide a float array and multiply by a boolean
array the answers are different (unless the others are also off by
some floating point). Can anyone replicate the following using this
script or point out what I'm missing?

import numpy as np
print np.__version__
np.random.seed(12345)

test = np.random.randint(0,2,size=10).astype(bool)
t = 1.345
arr = np.random.random(size=10)

arr # okay
t/arr # okay
(1-test)*arr # okay
(1-test)*t/arr # huh?

Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> print np.__version__
2.0.0.dev-fe3852f
>>> np.random.seed(12345)
>>>
>>> test = np.random.randint(0,2,size=10).astype(bool)
>>> t = 1.345
>>> arr = np.random.random(size=10)
>>>
>>> arr # okay
array([ 0.5955447 ,  0.96451452,  0.6531771 ,  0.74890664,  0.65356987,
0.74771481,  0.96130674,  0.0083883 ,  0.10644438,  0.29870371])
>>> t/arr # okay
array([   2.25843668,    1.39448393,    2.05916589,    1.7959515 ,
          2.0579284 ,    1.79881418,    1.39913718,  160.342421  ,
         12.63570742,    4.50278968])
>>> (1-test)*arr # okay
array([ 0.5955447 ,  0.        ,  0.        ,  0.        ,  0.65356987,
        0.        ,  0.96130674,  0.0083883 ,  0.        ,  0.29870371])
>>> (1-test)*t/arr # huh?
array([   2.25797754,    0.        ,    0.        ,    0.        ,
          2.05751003,    0.        ,    1.39885274,  160.3098235 ,
          0.        ,    4.50187427])


Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> print np.__version__
1.5.1
>>> np.random.seed(12345)
>>>
>>> test = np.random.randint(0,2,size=10).astype(bool)
>>> t = 1.345
>>> arr = np.random.random(size=10)
>>>
>>> arr # okay
array([ 0.5955447 ,  0.96451452,  0.6531771 ,  0.74890664,  0.65356987,
0.74771481,  0.96130674,  0.0083883 ,  0.10644438,  0.29870371])
>>> t/arr # okay
array([   2.25843668,    1.39448393,    2.05916589,    1.7959515 ,
          2.0579284 ,    1.79881418,    1.39913718,  160.342421  ,
         12.63570742,    4.50278968])
>>> (1-test)*arr # okay
array([ 0.5955447 ,  0.        ,  0.        ,  0.        ,  0.65356987,
        0.        ,  0.96130674,  0.0083883 ,  0.        ,  0.29870371])
>>> (1-test)*t/arr # huh?
array([   2.25843668,    0.        ,    0.        ,    0.        ,
          2.0579284 ,    0.        ,    1.39913718,  160.342421  ,
          0.        ,    4.50278968])

Skipper
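
One way to localize the discrepancy is to inspect the dtype of the
intermediate product; a reduced-precision intermediate would be
consistent with results that agree only in the leading digits, though
that is an inference from the output, not something established in
this message:

import numpy as np

np.random.seed(12345)
test = np.random.randint(0, 2, size=10).astype(bool)
t = 1.345

step = (1 - test) * t
# Anything narrower than float64 here would explain the precision
# loss in the final (1-test)*t/arr result.
print(step.dtype)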


Re: [Numpy-discussion] Remove deprecated skiprows

2011-04-05 Thread Skipper Seabold
On Tue, Apr 5, 2011 at 6:05 PM, Pierre GM  wrote:
>
> On Apr 5, 2011, at 11:52 PM, Ralf Gommers wrote:
>
>> On Tue, Apr 5, 2011 at 11:45 PM, Skipper Seabold  wrote:
>>> On Sun, Apr 3, 2011 at 8:20 PM, Charles R Harris
>>>  wrote:
>>>> Should skiprows be removed?
>>>>
>>>>     if skiprows:
>>>>         warnings.warn(\
>>>>             "The use of `skiprows` is deprecated, it will be removed in
>>>> numpy 2.0.\n" \
>>>>             "Please use `skip_header` instead.",
>>>>             DeprecationWarning)
>>>>         skip_header = skiprows
>>>>
>>>> It's been deprecated since 1.4. Personally, I like skiprows better than
>>>> skip_header ;)
>>>>
>>>
>>> +1 for skiprows. I always have to look it up.
>>
>> To me one is not much better than the other, but -1 for skiprows
>> because un-deprecating it and deprecating skip_header is inconsistent
>> and annoying for users.
>
> -1 for skiprows. When  I introduced skip_footer, it looked more consistent to 
> have a skip_header and skip_footer than a skiprows and a skip_footer.
> Now, we could probably find more meaningful names, like, skip_header_rows... 
> OK, just joking.
>

I'm only slightly serious here, but inconsistent keywords are also
annoying for users, e.g., versus numpy.loadtxt, though I know that the use
cases are slightly different. To my mind, skip_header suggests a bool and
skiprows is (more general and) an int. Maybe I'm using R too much.

Skipper


Re: [Numpy-discussion] Remove deprecated skiprows

2011-04-05 Thread Skipper Seabold
On Sun, Apr 3, 2011 at 8:20 PM, Charles R Harris
 wrote:
> Should skiprows be removed?
>
>     if skiprows:
>     warnings.warn(\
>     "The use of `skiprows` is deprecated, it will be removed in
> numpy 2.0.\n" \
>     "Please use `skip_header` instead.",
>     DeprecationWarning)
>     skip_header = skiprows
>
> It's been deprecated since 1.4. Personally, I like skiprows better than
> skip_header ;)
>

+1 for skiprows. I always have to look it up.

Skipper


[Numpy-discussion] "Best" dot product along an axis?

2011-04-04 Thread Skipper Seabold
Going through some of the recent threads on similar problems, I'm
trying to discern which is "best."

I have X, which is T x i, and theta, which is T x i x j. I want a T by j array that
contains X[t].T.dot(theta[t]) along axis 0.

Say,

T,i,j = 166,7,3

X = np.random.random((T,i))
theta = np.random.random((T,i,j))

# Using sum

F1 = (X[:,:,None] * theta).sum(1)

# Using a loop

F2 = np.zeros((T,j))
for t in range(T):
for j_idx in range(j):
for i_idx in range(7):
F2[t,j_idx] += X[t,i_idx] * theta[t,i_idx,j_idx]

# One way with dot, keeps an extra T index
F3 = np.squeeze(np.dot(X[:,None,:],theta)).reshape(-1,3)[::167]

# the above is way faster than
F3_2 = np.dot(X,theta).reshape(-1,3)[::167]

# zipped arrays
F4 = np.asarray([np.dot(x,theta_i) for x,theta_i in zip(X,theta)])

It seems that F3 is the fastest way to do this given T,i,j, but it
feels inefficient to compute things I don't want. Does it make sense
that F3 is faster than F3_2? I'm using generic Ubuntu ATLAS on this
machine. Is there a more efficient way to do this? I'm not coming up
with one.

Skipper
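
On numpy versions that have it, np.einsum expresses the T x j result
directly, avoiding the (T, T, j) intermediate that F3 slices back down
(a sketch reusing the names above):

import numpy as np

T, i, j = 166, 7, 3
X = np.random.random((T, i))
theta = np.random.random((T, i, j))

F1 = (X[:, :, None] * theta).sum(1)
F5 = np.einsum('ti,tij->tj', X, theta)   # sum over i for each t
assert np.allclose(F1, F5)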


Re: [Numpy-discussion] should get rid of the annoying numpy STDERR output

2011-03-24 Thread Skipper Seabold
On Thu, Mar 24, 2011 at 10:52 AM, eat  wrote:
> Hi
>
> On Thu, Mar 24, 2011 at 4:17 PM, Skipper Seabold 

>> Also, as Robert pointed out to me before np.errstate is a
>> context-manager for ignoring these warnings. I often wrap optimization
>> code with it.
>
> I didn't realize that it's a context-manager, but that's really great!
> (Perhaps the documentation could reflect that.)
> Regards,
> eat

I added errstate to the see also of seterr:
http://docs.scipy.org/numpy/docs/numpy.core.numeric.seterr/

Skipper


Re: [Numpy-discussion] should get rid of the annoying numpy STDERR output

2011-03-24 Thread Skipper Seabold
On Thu, Mar 24, 2011 at 9:45 AM, Ralf Gommers
 wrote:
> 2011/3/24 Dmitrey :
>>
>> Hi
>>
>> 2011/3/24 Dmitrey 
>>>
>>> >>> from numpy import inf, array
>>> >>> inf*0
>>> nan
>>>
>>> (ok)
>>>
>>> >>> array(inf) * 0.0
>>> StdErr: Warning: invalid value encountered in multiply
>>> nan
>>>
>>> My cycled calculations yields this thousands times slowing computations
>>> and making text output completely non-readable.
>>
>> Would old= seterr(invalid= 'ignore') be sufficient for you?
>>
>> yes for me, but I'm not sure for all those users who use my soft. Maybe it
>> will hide some bugs in their objective functions and nonlinear constraints
>> in numerical optimization and nonlinear equation systems.
>
> If we do what you request in your message subject your users will have
> the same issue.
>
> You can wrap (parts of) your code in something like:
>  olderr = seterr(invalid= 'ignore')
>  
>  seterr(**olderr)
>

Also, as Robert pointed out to me before np.errstate is a
context-manager for ignoring these warnings. I often wrap optimization
code with it.

In [3]: np.array(np.inf)*0.
Warning: invalid value encountered in multiply
Out[3]: nan

In [4]: with np.errstate(all='ignore'):
   ...: np.array(np.inf)*0.
   ...:
   ...:
Out[4]: nan

In [5]: np.array(np.inf)*0.
Warning: invalid value encountered in multiply
Out[5]: nan

Skipper


Re: [Numpy-discussion] Extracting data from genfromtxt?

2011-03-18 Thread Skipper Seabold
On Fri, Mar 18, 2011 at 4:27 PM, Rohaq  wrote:
> I've got a CSV file with the following layout:
>
> Location,Result1,Result2
> 1,0,0
> 2,0,0
> 3,1,0
> 4,1,0
> 5,1,1
> 6,0,1
> 7,0,1
> 8,0,0
>
> ...etc., and I've loaded it using the following:
>
> import numpy as np
> data = np.genfromtxt('file.csv',delimiter=',',dtype=None,names=True)
>
> Resulting in the following:
>
> array([(1, 0, 0), (2, 0, 0), (3, 0, 0), ..., (16382, 0, 0), (16383, 0, 0),
>       (16384, 0, 0)],
>      dtype=[('Location', '<i8'), ('Result1', '<i8'), ('Result2', '<i8')])
>
> I want to plot the data in matplotlib.pyplot, but I'm having trouble
> reshaping the data into something it'll recognise. I can pull the
> Location easily enough with data['Location'], but I need to get the
> results from every other column present.
>
> There could be a lot more than 2 other columns, Result3,Result4, etc.,
> so I need to code it to be scalable; if this were acting like a normal
> dict, I could write a loop that iterates over the keys, and loads the
> results data into a new object, but I can't seem to find a function
> that will return the keys so that I can do this.
>
> Can anybody offer any tips for this? I'm fairly new to Python/Numpy,
> so any help is appreciated.
>

You've got a few options here. If you just want a plain array from
genfromtxt in the first place, then you can pass skip_header=1 to
genfromtxt and it will give you a plain array, if all the data
is of a homogeneous type.

If you want to use your structured array, the most generic way to get an
ndarray (I think) is

data[list(data.dtype.names)].view((float, len(data.dtype.names)))

You can replace float with int, if that's what you want.

Skipper
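
The dict-style loop the poster asked for: dtype.names plays the role
of keys() for a structured array (a sketch assuming the CSV layout
shown above, with matplotlib for the plotting):

import numpy as np
import matplotlib.pyplot as plt

data = np.genfromtxt('file.csv', delimiter=',', dtype=None, names=True)
for name in data.dtype.names[1:]:        # every result column after Location
    plt.plot(data['Location'], data[name], label=name)
plt.legend()
plt.show()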


Re: [Numpy-discussion] Multiple Linear Regression

2011-03-16 Thread Skipper Seabold
On Wed, Mar 16, 2011 at 2:53 AM, dileep kunjaai  wrote:
> Dear sir,
>  Can we do multiple linear regression(MLR)  in python is there any
> inbuilt function for MLR
>

You might be interested in statsmodels

http://statsmodels.sourceforge.net/

Skipper


Re: [Numpy-discussion] Request for a bit more info on structured arrays in the "basics" page

2011-03-09 Thread Skipper Seabold
On Wed, Mar 9, 2011 at 8:13 PM, Ralf Gommers
 wrote:
> On Wed, Mar 9, 2011 at 11:24 PM, Skipper Seabold  wrote:
>> On Tue, Mar 8, 2011 at 8:08 PM, Skipper Seabold  wrote:
>>> On Sun, Mar 6, 2011 at 11:12 PM, Ralf Gommers
>>>  wrote:
>>>> On Sun, Mar 6, 2011 at 1:10 AM, Skipper Seabold  
>>>> wrote:
>>>>> On Sat, Mar 5, 2011 at 9:28 AM, Ralf Gommers
>>>>>  wrote:
>>>>>> On Sat, Mar 5, 2011 at 8:09 AM, Russell E. Owen  wrote:
>>>>>>> The page <http://docs.scipy.org/doc/numpy/user/basics.rec.html>
>>>>>>>
>>>>>>> gives a good introduction to structured arrays. However, it says nothing
>>>>>>> about how to set a particular element (all fields at once) from a
>>>>>>> collection of data.
>>>>>>>
>>>>>>> For instance:
>>>>>>>
>>>>>>> stArr = numpy.zeros([4,5], dtype=[("pos", float, (2,)), ("rot", float)])
>>>>>>>
>>>>>>> The question is how to set stArr[0]?
>>>>>>>
>>>>>>> From experimentation it appears that you can provide a tuple, but not a
>>>>>>> list. Hence the following works just fine (and that the tuple can
>>>>>>> contain a list):
>>>>>>> strArr[0,0] = ([1.0, 1.1], 2.0)
>>>>>>>
>>>>>>> but the following fails:
>>>>>>> strArr[0,0] = [[1.0, 1.1], 2.0]
>>>>>>> with an error:
>>>>>>> TypeError: expected a readable buffer object
>>>>>>>
>>>>>>> This is useful information if one is trying to initialize a structured
>>>>>>> array from a collection of data, such as that returned from a database
>>>>>>> query.
>>>>>>>
>>>>>
>>>>> I added a bit at the end here, though it is mentioned briefly above.
>>>>> Feel free to expand. It's a wiki. You just need edit rights.
>>>>>
>>>>> http://docs.scipy.org/numpy/docs/numpy.doc.structured_arrays/
>>>>
>>>> Thanks, I'll make sure that goes in for 1.6.0.
>>>>
>>>>>> I'm wondering if that's not a bug? If it's intentional then it is
>>>>>> certainly counterintuitive.
>>>>>>
>>>>>
>>>>> This comes up from time to time.
>>>>>
>>>>> http://thread.gmane.org/gmane.comp.python.numeric.general/30793/focus=30793
>>>>>
>>>>> Perhaps an enhancement ticket could be filed? It doesn't sound trivial
>>>>> to implement.
>>>>
>>>> I filed #1758.
>>>>
>>>> You can also assign with an array which fails silently, certainly a bug:
>>>>
>>>>>>> arr = np.zeros((5,), dtype=[('var1','f8'),('var2','f8')])
>>>>>>> arr['var1'] = np.arange(5)
>>>>>>> arr[0] = (10,20)
>>>>>>> arr[0]
>>>> (10.0, 20.0)
>>>>
>>>>>>> arr[0] = np.array([10,20])  # no exception, but garbage out
>>>>>>> arr[0]
>>>> (4.2439915824246103e-313, 0.0)
>>>>
>>>
>>> This is a casting issue. Your array is an integer array. You can
>>> assign with an array.
>>>
>>> arr = np.zeros((5,), dtype=[('var1','f8'),('var2','f8')])
>>> arr[0] = np.array([10.0,20])
>>> arr[0]
>>> (10.0, 20.0)
>>>
>>
>> FYI, I fixed the docs to reflect this.
>
> Thanks, the doc is accurate now. Although I'm not sure we want to
> document a bug (which I'm still sure it is) like that without making
> clear it it is a bug.
>
>> I know numpy is already pretty verbose by default, but should the
>> integer case throw up a warning similar to casting from complex to
>> real?
>
> Please no, that will be very annoying. Plus it's much better defined
> than complex -> real.
>

I assume this is the same thing

x = np.array([1.5])
x.view(int)
array([4609434218613702656])

x = np.array([1])
x.view(float)
array([  4.94065646e-324])

I've been bitten by this before, so I know not to do it, but I think it
would be nice if it threw up a "don't try to do that!".

Skipper


Re: [Numpy-discussion] Request for a bit more info on structured arrays in the "basics" page

2011-03-09 Thread Skipper Seabold
On Tue, Mar 8, 2011 at 8:08 PM, Skipper Seabold  wrote:
> On Sun, Mar 6, 2011 at 11:12 PM, Ralf Gommers
>  wrote:
>> On Sun, Mar 6, 2011 at 1:10 AM, Skipper Seabold  wrote:
>>> On Sat, Mar 5, 2011 at 9:28 AM, Ralf Gommers
>>>  wrote:
>>>> On Sat, Mar 5, 2011 at 8:09 AM, Russell E. Owen  wrote:
>>>>> The page <http://docs.scipy.org/doc/numpy/user/basics.rec.html>
>>>>>
>>>>> gives a good introduction to structured arrays. However, it says nothing
>>>>> about how to set a particular element (all fields at once) from a
>>>>> collection of data.
>>>>>
>>>>> For instance:
>>>>>
>>>>> stArr = numpy.zeros([4,5], dtype=[("pos", float, (2,)), ("rot", float)])
>>>>>
>>>>> The question is how to set stArr[0]?
>>>>>
>>>>> From experimentation it appears that you can provide a tuple, but not a
>>>>> list. Hence the following works just fine (and that the tuple can
>>>>> contain a list):
>>>>> strArr[0,0] = ([1.0, 1.1], 2.0)
>>>>>
>>>>> but the following fails:
>>>>> strArr[0,0] = [[1.0, 1.1], 2.0]
>>>>> with an error:
>>>>> TypeError: expected a readable buffer object
>>>>>
>>>>> This is useful information if one is trying to initialize a structured
>>>>> array from a collection of data, such as that returned from a database
>>>>> query.
>>>>>
>>>
>>> I added a bit at the end here, though it is mentioned briefly above.
>>> Feel free to expand. It's a wiki. You just need edit rights.
>>>
>>> http://docs.scipy.org/numpy/docs/numpy.doc.structured_arrays/
>>
>> Thanks, I'll make sure that goes in for 1.6.0.
>>
>>>> I'm wondering if that's not a bug? If it's intentional then it is
>>>> certainly counterintuitive.
>>>>
>>>
>>> This comes up from time to time.
>>>
>>> http://thread.gmane.org/gmane.comp.python.numeric.general/30793/focus=30793
>>>
>>> Perhaps an enhancement ticket could be filed? It doesn't sound trivial
>>> to implement.
>>
>> I filed #1758.
>>
>> You can also assign with an array which fails silently, certainly a bug:
>>
>>>>> arr = np.zeros((5,), dtype=[('var1','f8'),('var2','f8')])
>>>>> arr['var1'] = np.arange(5)
>>>>> arr[0] = (10,20)
>>>>> arr[0]
>> (10.0, 20.0)
>>
>>>>> arr[0] = np.array([10,20])  # no exception, but garbage out
>>>>> arr[0]
>> (4.2439915824246103e-313, 0.0)
>>
>
> This is a casting issue. Your array is an integer array. You can
> assign with an array.
>
> arr = np.zeros((5,), dtype=[('var1','f8'),('var2','f8')])
> arr[0] = np.array([10.0,20])
> arr[0]
> (10.0, 20.0)
>

FYI, I fixed the docs to reflect this.

I know numpy is already pretty verbose by default, but should the
integer case throw up a warning similar to casting from complex to
real?

>>> x = np.zeros(2)
>>> x[:] = np.array([1+1j,1+1j])
ComplexWarning: Casting complex values to real discards the imaginary part

Skipper


[Numpy-discussion] append_fields behavior with a (sort of) empty array

2011-03-08 Thread Skipper Seabold
First consider this example of column_stack

import numpy as np
x = np.random.random((5,1))
y = np.random.random((5,3))

arr = np.column_stack((x[[]], y))
# this fails, which is expected

arr = np.column_stack((x[:,[]], y))
# this happily works, I guess because

x[:,[]]
# array([], shape=(5, 0), dtype=float64)

Now if I want to do something analogous with a structured array, I don't see
off the bat how.  Is this desired behavior? At the very least I think it's a
partial bug.

import numpy.lib.recfunctions as nprf

xx = np.zeros(5, dtype=[('variable',float)])
xx['variable'] = np.random.random(5)

yy = np.random.random((5,2))

arr2 = nprf.append_fields(xx[[]][:], ['var1','var2'], yy.T, usemask=False)

arr2
array([(1e+20, 0.24277386045950911, 0.3801949115638894),
   (1e+20, 0.81574702386474807, 0.95094000248766541),
   (1e+20, 0.7901945469951196, 0.49315509384277167),
   (1e+20, 0.84140435880491093, 0.37125966704723368),
   (1e+20, 0.66595836283749366, 0.18154580574750145)],
  dtype=[('variable', '<f8'), ('var1', '<f8'), ('var2', '<f8')])


Re: [Numpy-discussion] Request for a bit more info on structured arrays in the "basics" page

2011-03-08 Thread Skipper Seabold
On Sun, Mar 6, 2011 at 11:12 PM, Ralf Gommers
 wrote:
> On Sun, Mar 6, 2011 at 1:10 AM, Skipper Seabold  wrote:
>> On Sat, Mar 5, 2011 at 9:28 AM, Ralf Gommers
>>  wrote:
>>> On Sat, Mar 5, 2011 at 8:09 AM, Russell E. Owen  wrote:
>>>> The page <http://docs.scipy.org/doc/numpy/user/basics.rec.html>
>>>>
>>>> gives a good introduction to structured arrays. However, it says nothing
>>>> about how to set a particular element (all fields at once) from a
>>>> collection of data.
>>>>
>>>> For instance:
>>>>
>>>> stArr = numpy.zeros([4,5], dtype=[("pos", float, (2,)), ("rot", float)])
>>>>
>>>> The question is how to set stArr[0]?
>>>>
>>>> From experimentation it appears that you can provide a tuple, but not a
>>>> list. Hence the following works just fine (and that the tuple can
>>>> contain a list):
>>>> strArr[0,0] = ([1.0, 1.1], 2.0)
>>>>
>>>> but the following fails:
>>>> strArr[0,0] = [[1.0, 1.1], 2.0]
>>>> with an error:
>>>> TypeError: expected a readable buffer object
>>>>
>>>> This is useful information if one is trying to initialize a structured
>>>> array from a collection of data, such as that returned from a database
>>>> query.
>>>>
>>
>> I added a bit at the end here, though it is mentioned briefly above.
>> Feel free to expand. It's a wiki. You just need edit rights.
>>
>> http://docs.scipy.org/numpy/docs/numpy.doc.structured_arrays/
>
> Thanks, I'll make sure that goes in for 1.6.0.
>
>>> I'm wondering if that's not a bug? If it's intentional then it is
>>> certainly counterintuitive.
>>>
>>
>> This comes up from time to time.
>>
>> http://thread.gmane.org/gmane.comp.python.numeric.general/30793/focus=30793
>>
>> Perhaps an enhancement ticket could be filed? It doesn't sound trivial
>> to implement.
>
> I filed #1758.
>
> You can also assign with an array which fails silently, certainly a bug:
>
>>>> arr = np.zeros((5,), dtype=[('var1','f8'),('var2','f8')])
>>>> arr['var1'] = np.arange(5)
>>>> arr[0] = (10,20)
>>>> arr[0]
> (10.0, 20.0)
>
>>>> arr[0] = np.array([10,20])  # no exception, but garbage out
>>>> arr[0]
> (4.2439915824246103e-313, 0.0)
>

This is a casting issue. Your array is an integer array. You can
assign with an array.

arr = np.zeros((5,), dtype=[('var1','f8'),('var2','f8')])
arr[0] = np.array([10.0,20])
arr[0]
(10.0, 20.0)

Skipper


Re: [Numpy-discussion] Request for a bit more info on structured arrays in the "basics" page

2011-03-05 Thread Skipper Seabold
On Sat, Mar 5, 2011 at 9:28 AM, Ralf Gommers
 wrote:
> On Sat, Mar 5, 2011 at 8:09 AM, Russell E. Owen  wrote:
>> The page 
>>
>> gives a good introduction to structured arrays. However, it says nothing
>> about how to set a particular element (all fields at once) from a
>> collection of data.
>>
>> For instance:
>>
>> stArr = numpy.zeros([4,5], dtype=[("pos", float, (2,)), ("rot", float)])
>>
>> The question is how to set stArr[0]?
>>
>> From experimentation it appears that you can provide a tuple, but not a
>> list. Hence the following works just fine (and that the tuple can
>> contain a list):
>> strArr[0,0] = ([1.0, 1.1], 2.0)
>>
>> but the following fails:
>> strArr[0,0] = [[1.0, 1.1], 2.0]
>> with an error:
>> TypeError: expected a readable buffer object
>>
>> This is useful information if one is trying to initialize a structured
>> array from a collection of data, such as that returned from a database
>> query.
>>

I added a bit at the end here, though it is mentioned briefly above.
Feel free to expand. It's a wiki. You just need edit rights.

http://docs.scipy.org/numpy/docs/numpy.doc.structured_arrays/

> I'm wondering if that's not a bug? If it's intentional then it is
> certainly counterintuitive.
>

This comes up from time to time.

http://thread.gmane.org/gmane.comp.python.numeric.general/30793/focus=30793

Perhaps an enhancement ticket could be filed? It doesn't sound trivial
to implement.

Skipper
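
Condensing the thread, what works and what fails when assigning a
whole element on the numpy of this era (values from the example above):

import numpy as np

stArr = np.zeros([4, 5], dtype=[("pos", float, (2,)), ("rot", float)])
stArr[0, 0] = ([1.0, 1.1], 2.0)        # a tuple works; the inner list is fine
try:
    stArr[0, 0] = [[1.0, 1.1], 2.0]    # an outer list fails
except TypeError:
    pass                               # "expected a readable buffer object"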


Re: [Numpy-discussion] (no subject)

2011-03-02 Thread Skipper Seabold
On Wed, Mar 2, 2011 at 5:25 PM, Alex Ter-Sarkissov  wrote:
> hi, the question is probably very silly, but can't get my head around it
>
> Say I have an NxM numerical array. What I want is to obtain the row and
> column number of the smallest value (kinda like the find command in Matlab). I
> use something like where(min(array_name)), but keep getting an error
> message. I'd be glad if anyone could offer suggestions in this regard.
>

In [1]: import numpy as np

In [2]: A = np.random.randn(10,10)

In [3]: np.where(A==np.min(A))
Out[3]: (array([5]), array([3]))

Skipper


Re: [Numpy-discussion] Different behavior for astype('str') in numpy 1.5.1 vs 1.6.0?

2011-02-21 Thread Skipper Seabold
On Mon, Feb 21, 2011 at 3:49 PM, Pauli Virtanen  wrote:
> On Mon, 21 Feb 2011 15:09:05 -0500, Skipper Seabold wrote:
> [clip]
>> Should I file a ticket?
>
> Yes.

http://projects.scipy.org/numpy/ticket/1748

>
> [clip]
>> PS. Is there an incompatibility of numpy 1.5.1 and numpy 1.6.0 trunk for
>> packages that depend on numpy?
>
> It should be binary compatible.
>

I had to reinstall scipy, matplotlib, etc. when I changed from 1.6.0
back to 1.5.1.

It also looked like the behavior of numpy.testing was different (Ie.,
I was getting failures for my tests that pass under 1.5.1 at the
specified precision for assert_almost_equal), but this is a separate
issue that I'll have to explore later.

Skipper


[Numpy-discussion] Different behavior for astype('str') in numpy 1.5.1 vs 1.6.0?

2011-02-21 Thread Skipper Seabold
I get a lot of errors and failures running our tests with most recent
numpy trunk. One in particular (so far) seems to be a bug.

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.6.0.dev-c50af53'

In [3]: tmp_arr = np.array([['black'],['white'],['other']])

In [4]: tmp_arr.astype('str')
Out[4]:
array([['b'],
   ['w'],
   ['o']],
  dtype='|S1')

The old behavior

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.5.1'

In [3]: tmp_arr = np.array([['black'],['white'],['other']])

In [4]: tmp_arr.astype('str')
Out[4]:
array([['black'],
   ['white'],
   ['other']],
  dtype='|S5')

Should I file a ticket?

Skipper

PS. Is there an incompatibility of numpy 1.5.1 and numpy 1.6.0 trunk
for packages that depend on numpy?
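
A workaround on affected versions is to spell out the itemsize rather
than using the bare 'str' (a sketch; 'S5' matches the example's
longest string, or the existing itemsize can be reused):

import numpy as np

tmp_arr = np.array([['black'], ['white'], ['other']])
tmp_arr.astype('|S5')                            # explicit width
tmp_arr.astype('|S%d' % tmp_arr.dtype.itemsize)  # width taken from the input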


Re: [Numpy-discussion] bug in genfromtxt with missing values?

2011-01-25 Thread Skipper Seabold
On Tue, Jan 25, 2011 at 11:17 AM, Pierre GM  wrote:
>
> On Jan 24, 2011, at 11:47 PM, Skipper Seabold wrote:
>
>> Am I misreading the docs or missing something?  Consider the following
>> adapted from here:
>> http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html
>>
>> from StringIO import StringIO
>> import numpy as np
>>
>> data = "1, 2, 3\n4, ,5"
>>
>> np.genfromtxt(StringIO(data), delimiter=",", names="a,b,c",
>> missing_values=" ", filling_values=0)
>> array([(1.0, 2.0, 3.0), (4.0, nan, 5.0)],
>>      dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])
>>
>> np.genfromtxt(StringIO(data), delimiter=",", names="a,b,c",
>> missing_values={'b':" "}, filling_values={'b' : 0})
>> array([(1.0, 2.0, 3.0), (4.0, 0.0, 5.0)],
>>      dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])
>>
>> Unless I use the dict for missing_values, it doesn't fill them in.
>>
>
> It's probably a bug . Mind opening a ticket ? I'll try to care of it when I 
> can.
> Thx in advance
> P.

http://projects.scipy.org/numpy/ticket/1722

Forgot to use the code formatting, and it doesn't look like I can edit.

Thanks,

Skipper


[Numpy-discussion] bug in genfromtxt with missing values?

2011-01-24 Thread Skipper Seabold
Am I misreading the docs or missing something?  Consider the following
adapted from here:
http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html

from StringIO import StringIO
import numpy as np

data = "1, 2, 3\n4, ,5"

np.genfromtxt(StringIO(data), delimiter=",", names="a,b,c",
missing_values=" ", filling_values=0)
array([(1.0, 2.0, 3.0), (4.0, nan, 5.0)],
      dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])

np.genfromtxt(StringIO(data), delimiter=",", names="a,b,c",
missing_values={'b':" "}, filling_values={'b' : 0})
array([(1.0, 2.0, 3.0), (4.0, 0.0, 5.0)],
      dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])

Unless I use the dict for missing_values, it doesn't fill them in.

Skipper


Re: [Numpy-discussion] Question regarding submitting patches

2011-01-05 Thread Skipper Seabold
On Tue, Jan 4, 2011 at 1:49 PM, Justin Peel  wrote:
> Hi all,
>
> I've been submitting some patches recently just by putting them on
> Trac. However, I noticed in the Numpy Developer Guide that it says:
>
>   The recommended way to proceed is either to attach these files to
> an enhancement ticket in the Numpy Trac and send a mail about the
> enhancement to the NumPy mailing list.
>
> This line is rather confusing. Either the 'either' should be removed
> or 'and'->'or'. In other words, is it sufficient to submit the patch
> in Trac or should I also be emailing the Numpy mailing list about each
> patch I submit?
>

I am not positive, but I think that having them on Trac ensures that
they're not lost and that this is sufficient.  An e-mail serves to draw some
attention (or not) for a review or speedier inclusion.

Should this recommendation in the docs be changed or amended with the
switch to git?

Skipper


Re: [Numpy-discussion] Can I add rows and columns to recarray?

2010-12-05 Thread Skipper Seabold
On Sun, Dec 5, 2010 at 10:56 PM, Wai Yip Tung  wrote:
> I'm fairly new to numpy and I'm trying to figure out the right way to do
> things. Continuing on my question about using recarray as a relation. I
> have a recarray like this
>
>
> In [339]: arr = np.array([
>    .:     (1, 2.2, 0.0),
>    .:     (3, 4.5, 0.0)
>    .:     ],
>    .:     dtype=[
>    .:         ('unit',int),
>    .:         ('price',float),
>    .:         ('amount',float),
>    .:     ]
>    .: )
>
> In [340]: data = arr.view(recarray)
>
>
> One of the most common things I want to do is append rows to data.  I
> think concatenate() might be the method. But I get a problem:
>
>
> In [342]: np.concatenate((data0,[1,9.0,9.0]))
> ---
> TypeError                                 Traceback (most recent call last)
>
> c:\Python26\Lib\site-packages\numpy\ in ()
>
> TypeError: expected a readable buffer object
>
>
>
> The other thing I want to do is to calculate column values. Right now
> it can do great things like
>
>
>
> In [343]: data.amount = data.unit * data.price
>
>
>
> But sometimes it may require me to add a new column that doesn't already
> exist, e.g.:
>
>
> In [344]: data.discount_price = data.price * 0.9
>
>
> How can I add a new column? I tried column_stack. But it give a similar
> TypeError. I figure I need to first specify the type of the column. But I
> don't know how.
>

Check out numpy.lib.recfunctions

I often have

import numpy.lib.recfunctions as nprf

Skipper
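
Sketches of both operations with recfunctions, reusing the thread's
arr (stack_arrays to append rows, append_fields to add a computed
column):

import numpy as np
import numpy.lib.recfunctions as nprf

arr = np.array([(1, 2.2, 0.0), (3, 4.5, 0.0)],
               dtype=[('unit', int), ('price', float), ('amount', float)])

# Append a row: build it with the same dtype, then stack.
row = np.array([(1, 9.0, 9.0)], dtype=arr.dtype)
arr = nprf.stack_arrays((arr, row), usemask=False)

# Add a column computed from an existing one.
arr = nprf.append_fields(arr, 'discount_price', arr['price'] * 0.9,
                         usemask=False)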


Re: [Numpy-discussion] Structured array? recarray? issue access by attribute name

2010-12-05 Thread Skipper Seabold
On Sun, Dec 5, 2010 at 10:44 PM, Wai Yip Tung  wrote:
> I'm trying to use numpy to manipulate CSV files. I'm looking for features
> similar to a relational database. So I came across the class recarray, which
> seems to meet my need. Then I saw other references to structured
> arrays. Are these just different names for the same feature?
>
> Also I encountered a problem trying to access a field by attribute name. I
> have
>
>
> In [303]: arr = np.array([
>    .:     (1, 2.2, 0.0),
>    .:     (3, 4.5, 0.0)
>    .:     ],
>    .:     dtype=[
>    .:         ('unit',int),
>    .:         ('price',float),
>    .:         ('amount',float),
>    .:     ]
>    .: )
>
> In [304]: data0 = arr.view(recarray)
>
> In [305]: data0.price[0]
Out[305]: 2.2000000000000002
>

You don't have to take a view as a recarray if you don't want to.  You
lose attribute lookup but gain some speed.

In [14]: arr['price']
Out[14]: array([ 2.2,  4.5])

>
>
> It works fine when I get a price vector and pick the first element of it.
> But if instead I select the first row and try to access its price
> attribute, it doesn't work
>

I'm not sure why this doesn't work.  It looks like taking a view of
the structured array as a recarray does not cast the structs to
records. Is this a bug?  Note that you can do

In [19]: arr[0]['price']
Out[19]: 2.2000000000000002

In [20]: data0[0]['price']
Out[20]: 2.2000000000000002

also slicing seems to work

In [27]: data0[0:1].price
Out[27]: array([ 2.2])

Skipper

>
>
> In [306]: data0[0].price
> ---
> AttributeError                            Traceback (most recent call last)
>
> c:\Python26\Lib\site-packages\numpy\ in ()
>
> AttributeError: 'numpy.void' object has no attribute 'price'
>
>
>
> Then I came across an alternative way to build a recarray. In that case
> both usages work fine.
>
>
>
> In [307]: data1 = np.rec.fromarrays(
>    .:     [[1,3],[2.2,4.5],[0.0,0.0]],
>    .:     names='unit,price,amount')
>
> In [309]: data1.price[0]
Out[309]: 2.2000000000000002
>
> In [310]: data1[0].price
Out[310]: 2.2000000000000002
>
>
> What's going on here?
>
>
> Wai Yip
>


Re: [Numpy-discussion] Warning: invalid value encountered in subtract

2010-11-30 Thread Skipper Seabold
On Tue, Nov 30, 2010 at 1:34 PM, Keith Goodman  wrote:
> After upgrading from numpy 1.4.1 to 1.5.1 I get warnings like
> "Warning: invalid value encountered in subtract" when I run unit tests
> (or timeit) using "python -c 'blah'" but not from an interactive
> session. How can I tell the warnings to go away?

If it's this type of floating point related stuff, you can use np.seterr

In [1]: import numpy as np

In [2]: np.log(1./np.array(0))
Warning: divide by zero encountered in divide
Out[2]: inf

In [3]: orig_settings = np.seterr()

In [4]: np.seterr(all="ignore")
Out[4]: {'divide': 'print', 'invalid': 'print', 'over': 'print', 'under': 'ignore'}

In [5]: np.log(1./np.array(0))
Out[5]: inf

In [6]: np.seterr(**orig_settings)
Out[6]: {'divide': 'ignore', 'invalid': 'ignore', 'over': 'ignore', 'under': 'ignore'}

In [7]: np.log(1./np.array(0))
Warning: divide by zero encountered in divide
Out[7]: inf

I have been saving the orig_settings so that I can take over
control of this from the user and then set it back to how it was.

Skipper


Re: [Numpy-discussion] Catching and dealing with floating point errors

2010-11-08 Thread Skipper Seabold
On Mon, Nov 8, 2010 at 4:04 PM, Bruce Southey  wrote:
> On 11/08/2010 02:52 PM, Skipper Seabold wrote:
>> On Mon, Nov 8, 2010 at 3:42 PM, Bruce Southey  wrote:
>>> On 11/08/2010 02:17 PM, Skipper Seabold wrote:
>>>> On Mon, Nov 8, 2010 at 3:14 PM, Skipper Seabold    
>>>> wrote:
>>>>> I am doing some optimizations on random samples.  In a small number of
>>>>> cases, the objective is not well-defined for a given sample (it's not
>>>>> possible to tell beforehand and hopefully won't happen much in
>>>>> practice).  What is the most numpythonic way to handle this?  It
>>>>> doesn't look like I can use np.seterrcall in this case (without
>>>>> ignoring its actual intent).  Here's a toy example of the method I
>>>>> have come up with.
>>>>>
>>>>> import numpy as np
>>>>>
>>>>> def reset_seterr(d):
>>>>>      """
>>>>>      Helper function to reset FP error-handling to user's original 
>>>>> settings
>>>>>      """
>>>>>      for action in [i+'='+"'"+d[i]+"'" for i in d]:
>>>>>          exec(action)
>>>>>      np.seterr(over=over, divide=divide, invalid=invalid, under=under)
>>>>>
>>>> It just occurred to me that this is unsafe.  Better options for
>>>> resetting seterr?
>>>>
>>>>> def log_random_sample(X):
>>>>>      """
>>>>>      Toy example to catch a FP error, re-sample, and return objective
>>>>>      """
>>>>>      d = np.seterr() # get original values to reset
>>>>>      np.seterr('raise') # set to raise on fp error in order to catch
>>>>>      try:
>>>>>          ret = np.log(X)
>>>>>          reset_seterr(d)
>>>>>          return ret
>>>>>      except:
>>>>>          lb,ub = -1,1  # includes bad domain to test recursion
>>>>>          X = np.random.uniform(lb,ub)
>>>>>          reset_seterr(d)
>>>>>          return log_random_sample(X)
>>>>>
>>>>> lb,ub = 0,0
>>>>> orig_setting = np.seterr()
>>>>> X = np.random.uniform(lb,ub)
>>>>> log_random_sample(X)
>>>>> assert(orig_setting == np.seterr())
>>>>>
>>>>> This seems to work, but I'm not sure it's as transparent as it could
>>>>> be.  If it is, then maybe it will be useful to others.
>>>>>
>>>>> Skipper
>>>>>
>>> What do you mean by 'floating point error'?
>>> For example, log of zero is not what I would consider a 'floating point
>>> error'.
>>>
>>> In this case, if you are after a log distribution, then you should be
>>> ensuring that the lower bound to the np.random.uniform() is always
>>> greater than zero. That is, if lb <= zero then you *know* you have a
>>> problem at the very start.
>>>
>>>
>> Just a toy example to get a similar error.  I call x <= 0 on purpose here.
>>
>>
> I was aware of that.
>

Ah, ok.  I don't mean to be short, just busy.

> Messing about with warnings is not what I consider Pythonic because you
> should be fixing the source of the problem. In this case, your sampling
> must be greater than zero. If you are sampling from a distribution, then
> that should be built into the call otherwise your samples will not be
> from the requested distribution.
>

Basically, it looks like a small sample issue with an estimator.  I'm
not sure about the theory yet (or the underlying numerical issue), but
I've confirmed that the solution also breaks down using several
different solvers with a constrained version of the primal in GAMS to
ensure that it's not just a domain error or numerical
underflow/overflow.  So at this point I just want to catch the warning
and resample.  I am going to explore the "bad" cases further at a later
time.

Skipper


Re: [Numpy-discussion] Catching and dealing with floating point errors

2010-11-08 Thread Skipper Seabold
On Mon, Nov 8, 2010 at 3:45 PM, Warren Weckesser
 wrote:
>
>
> On Mon, Nov 8, 2010 at 2:17 PM, Skipper Seabold  wrote:
>>
>> On Mon, Nov 8, 2010 at 3:14 PM, Skipper Seabold 
>> wrote:
>> > I am doing some optimizations on random samples.  In a small number of
>> > cases, the objective is not well-defined for a given sample (it's not
>> > possible to tell beforehand and hopefully won't happen much in
>> > practice).  What is the most numpythonic way to handle this?  It
>> > doesn't look like I can use np.seterrcall in this case (without
>> > ignoring its actual intent).  Here's a toy example of the method I
>> > have come up with.
>> >
>> > import numpy as np
>> >
>> > def reset_seterr(d):
>> >    """
>> >    Helper function to reset FP error-handling to user's original
>> > settings
>> >    """
>> >    for action in [i+'='+"'"+d[i]+"'" for i in d]:
>> >        exec(action)
>> >    np.seterr(over=over, divide=divide, invalid=invalid, under=under)
>> >
>>
>> It just occurred to me that this is unsafe.  Better options for
>> resetting seterr?
>
>
> Hey Skipper,
>
> I don't understand why you need your helper function.  Why not just pass the
> saved dictionary back to seterr()?  E.g.
>
> saved = np.seterr('raise')
> try:
>     # Do something dangerous...
>     result = whatever...
> except Exception:
>     # Handle the problems...
>     result = better result...
> np.seterr(**saved)
> return result
>

Ha.  I knew I was forgetting something.  Thanks.

>
> Warren
>
>
>
>>
>> > def log_random_sample(X):
>> >    """
>> >    Toy example to catch a FP error, re-sample, and return objective
>> >    """
>> >    d = np.seterr() # get original values to reset
>> >    np.seterr('raise') # set to raise on fp error in order to catch
>> >    try:
>> >        ret = np.log(X)
>> >        reset_seterr(d)
>> >        return ret
>> >    except:
>> >        lb,ub = -1,1  # includes bad domain to test recursion
>> >        X = np.random.uniform(lb,ub)
>> >        reset_seterr(d)
>> >        return log_random_sample(X)
>> >
>> > lb,ub = 0,0
>> > orig_setting = np.seterr()
>> > X = np.random.uniform(lb,ub)
>> > log_random_sample(X)
>> > assert(orig_setting == np.seterr())
>> >
>> > This seems to work, but I'm not sure it's as transparent as it could
>> > be.  If it is, then maybe it will be useful to others.
>> >
>> > Skipper
>> >


Re: [Numpy-discussion] Catching and dealing with floating point errors

2010-11-08 Thread Skipper Seabold
On Mon, Nov 8, 2010 at 3:42 PM, Bruce Southey  wrote:
> On 11/08/2010 02:17 PM, Skipper Seabold wrote:
>> On Mon, Nov 8, 2010 at 3:14 PM, Skipper Seabold  wrote:
>>> I am doing some optimizations on random samples.  In a small number of
>>> cases, the objective is not well-defined for a given sample (it's not
>>> possible to tell beforehand and hopefully won't happen much in
>>> practice).  What is the most numpythonic way to handle this?  It
>>> doesn't look like I can use np.seterrcall in this case (without
>>> ignoring its actual intent).  Here's a toy example of the method I
>>> have come up with.
>>>
>>> import numpy as np
>>>
>>> def reset_seterr(d):
>>>     """
>>>     Helper function to reset FP error-handling to user's original settings
>>>     """
>>>     for action in [i+'='+"'"+d[i]+"'" for i in d]:
>>>         exec(action)
>>>     np.seterr(over=over, divide=divide, invalid=invalid, under=under)
>>>
>> It just occurred to me that this is unsafe.  Better options for
>> resetting seterr?
>>
>>> def log_random_sample(X):
>>>     """
>>>     Toy example to catch a FP error, re-sample, and return objective
>>>     """
>>>     d = np.seterr() # get original values to reset
>>>     np.seterr('raise') # set to raise on fp error in order to catch
>>>     try:
>>>         ret = np.log(X)
>>>         reset_seterr(d)
>>>         return ret
>>>     except:
>>>         lb,ub = -1,1  # includes bad domain to test recursion
>>>         X = np.random.uniform(lb,ub)
>>>         reset_seterr(d)
>>>         return log_random_sample(X)
>>>
>>> lb,ub = 0,0
>>> orig_setting = np.seterr()
>>> X = np.random.uniform(lb,ub)
>>> log_random_sample(X)
>>> assert(orig_setting == np.seterr())
>>>
>>> This seems to work, but I'm not sure it's as transparent as it could
>>> be.  If it is, then maybe it will be useful to others.
>>>
>>> Skipper
>>>
> What do you mean by 'floating point error'?
> For example, log of zero is not what I would consider a 'floating point
> error'.
>
> In this case, if you are after a log distribution, then you should be
> ensuring that the lower bound to the np.random.uniform() is always
> greater than zero. That is, if lb <= zero then you *know* you have a
> problem at the very start.
>
>

Just a toy example to get a similar error.  I call x <= 0 on purpose here.

> Bruce
>
>


Re: [Numpy-discussion] Catching and dealing with floating point errors

2010-11-08 Thread Skipper Seabold
On Mon, Nov 8, 2010 at 3:14 PM, Skipper Seabold  wrote:
> I am doing some optimizations on random samples.  In a small number of
> cases, the objective is not well-defined for a given sample (it's not
> possible to tell beforehand and hopefully won't happen much in
> practice).  What is the most numpythonic way to handle this?  It
> doesn't look like I can use np.seterrcall in this case (without
> ignoring its actual intent).  Here's a toy example of the method I
> have come up with.
>
> import numpy as np
>
> def reset_seterr(d):
>    """
>    Helper function to reset FP error-handling to user's original settings
>    """
>    for action in [i+'='+"'"+d[i]+"'" for i in d]:
>        exec(action)
>    np.seterr(over=over, divide=divide, invalid=invalid, under=under)
>

It just occurred to me that this is unsafe.  Better options for
resetting seterr?

> def log_random_sample(X):
>    """
>    Toy example to catch a FP error, re-sample, and return objective
>    """
>    d = np.seterr() # get original values to reset
>    np.seterr('raise') # set to raise on fp error in order to catch
>    try:
>        ret = np.log(X)
>        reset_seterr(d)
>        return ret
>    except:
>        lb,ub = -1,1  # includes bad domain to test recursion
>        X = np.random.uniform(lb,ub)
>        reset_seterr(d)
>        return log_random_sample(X)
>
> lb,ub = 0,0
> orig_setting = np.seterr()
> X = np.random.uniform(lb,ub)
> log_random_sample(X)
> assert(orig_setting == np.seterr())
>
> This seems to work, but I'm not sure it's as transparent as it could
> be.  If it is, then maybe it will be useful to others.
>
> Skipper
>


[Numpy-discussion] Catching and dealing with floating point errors

2010-11-08 Thread Skipper Seabold
I am doing some optimizations on random samples.  In a small number of
cases, the objective is not well-defined for a given sample (it's not
possible to tell beforehand and hopefully won't happen much in
practice).  What is the most numpythonic way to handle this?  It
doesn't look like I can use np.seterrcall in this case (without
ignoring its actual intent).  Here's a toy example of the method I
have come up with.

import numpy as np

def reset_seterr(d):
"""
Helper function to reset FP error-handling to user's original settings
"""
for action in [i+'='+"'"+d[i]+"'" for i in d]:
exec(action)
np.seterr(over=over, divide=divide, invalid=invalid, under=under)

def log_random_sample(X):
"""
Toy example to catch a FP error, re-sample, and return objective
"""
d = np.seterr() # get original values to reset
np.seterr('raise') # set to raise on fp error in order to catch
try:
ret = np.log(X)
reset_seterr(d)
return ret
except:
lb,ub = -1,1  # includes bad domain to test recursion
X = np.random.uniform(lb,ub)
reset_seterr(d)
return log_random_sample(X)

lb,ub = 0,0
orig_setting = np.seterr()
X = np.random.uniform(lb,ub)
log_random_sample(X)
assert(orig_setting == np.seterr())

This seems to work, but I'm not sure it's as transparent as it could
be.  If it is, then maybe it will be useful to others.

Skipper
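
The same idea with np.errstate doing the save/restore automatically
(a sketch, not the poster's final code):

import numpy as np

def log_random_sample(X, lb=-1, ub=1):
    # errstate restores the caller's FP-error settings on exit, so no
    # manual seterr bookkeeping is needed.
    with np.errstate(all='raise'):
        try:
            return np.log(X)
        except FloatingPointError:
            pass
    return log_random_sample(np.random.uniform(lb, ub), lb, ub)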

