Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 11:30 AM,   wrote:
> On Thu, Feb 16, 2012 at 11:20 AM, Warren Weckesser
>  wrote:
>>
>>
>> On Thu, Feb 16, 2012 at 10:12 AM, Pierre Haessig 
>> wrote:
>>>
>>> On 16/02/2012 16:20, josef.p...@gmail.com wrote:
>>>
 I don't see any way to fix multivariate_normal for this case, except
 for dropping svd or for random perturbing a covariance matrix with
 multiplicity of singular values.
>>>
>>> Hi,
>>> I just made a quick search in what R guys are doing. It happens there are
>>> several codes (http://cran.r-project.org/web/views/Multivariate.html ). For
>>> instance, mvtnorm
>>> (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've attached
>>> the related function from the source code of this package.
>>>
>>> Interestingly enough, it seems they provide 3 different methods (svd,
>>> eigen values, and Cholesky).
>>> I don't have the time now to dive in the assessments of pros and cons of
>>> those three. Maybe one works for our problem, but I didn't check yet.
>>>
>>> Pierre
>>>
>>
>>
>> For some alternatives to numpy's multivariate_normal, see
>> http://www.scipy.org/Cookbook/CorrelatedRandomSamples.  Both versions
>> (Cholesky and eigh) are just a couple lines of code.
>
> Thanks both,
>
> The main point is that it is a "Needs decision"
>
> Robert argued several times on the mailing list why he chose svd.
> (with svd covariance can be closer to singular than with cholesky)
>
> In statsmodels we usually just use Cholesky for similar
> transformation, and I use occasionally an eigh version. (I need to
> look up the thread but I got puzzled about results with eig and
> multiplicity of eigenvalues before.)
>
> The R code is GPL, but the few lines of code look standard without any
> special provision for non-deterministic linear algebra.
>
> If multivariate_normal switches from svd to cholesky or eigh, we still
> need to check that we don't run into similar "determinacy" problems
> with numpy's linalg (I think in statsmodels we use mostly scipy, so I
> don't know.)

np.linalg.eigh always produces the same eigenvectors, both running
repeatedly in the same session and running the script several times on
the command line.

So eigh looks like a good alternative to svd for this case. I don't know
whether we'd buy numerical problems in other corner cases, but for near
singularity it's always possible to check the smallest eigenvalue.
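
As an illustration, a rough sketch (my quick example, not a proposal for
the actual implementation) of drawing the correlated sample with eigh
instead of svd, using the covariance from the ticket:

import numpy as np

d = 10
alpha = 1 / d**0.5
mu = np.ones(d)
R = alpha * np.ones((d, d)) + (1 - alpha) * np.eye(d)

rng = np.random.RandomState(587482)
z = rng.standard_normal((1, d))     # underlying iid standard normal draws

w, V = np.linalg.eigh(R)            # eigenvalues in ascending order
w = np.clip(w, 0, None)             # guard against tiny negative rounding
A = V * np.sqrt(w)                  # A.dot(A.T) reproduces R up to rounding
sample = mu + z.dot(A.T)
print(sample, sample.sum())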

Josef

>
> Josef
>
>>
>> Warren
>>
>>


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 11:20 AM, Warren Weckesser
 wrote:
>
>
> On Thu, Feb 16, 2012 at 10:12 AM, Pierre Haessig 
> wrote:
>>
>> On 16/02/2012 16:20, josef.p...@gmail.com wrote:
>>
>>> I don't see any way to fix multivariate_normal for this case, except
>>> for dropping svd or for random perturbing a covariance matrix with
>>> multiplicity of singular values.
>>
>> Hi,
>> I just made a quick search in what R guys are doing. It happens there are
>> several codes (http://cran.r-project.org/web/views/Multivariate.html ). For
>> instance, mvtnorm
>> (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've attached
>> the related function from the source code of this package.
>>
>> Interestingly enough, it seems they provide 3 different methods (svd,
>> eigen values, and Cholesky).
>> I don't have the time now to dive in the assessments of pros and cons of
>> those three. Maybe one works for our problem, but I didn't check yet.
>>
>> Pierre
>>
>
>
> For some alternatives to numpy's multivariate_normal, see
> http://www.scipy.org/Cookbook/CorrelatedRandomSamples.  Both versions
> (Cholesky and eigh) are just a couple lines of code.

Thanks both,

The main point is that it is a "Needs decision"

Robert argued several times on the mailing list why he chose svd.
(with svd the covariance can be closer to singular than with cholesky)

In statsmodels we usually just use Cholesky for similar
transformations, and I occasionally use an eigh version. (I need to
look up the thread, but I got puzzled about results with eig and
multiplicity of eigenvalues before.)

The R code is GPL, but the few lines of code look standard without any
special provision for non-deterministic linear algebra.

If multivariate_normal switches from svd to cholesky or eigh, we still
need to check that we don't run into similar "determinacy" problems
with numpy's linalg (I think in statsmodels we use mostly scipy, so I
don't know.)

Josef

>
> Warren
>
>


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 9:08 AM, Pauli Virtanen  wrote:
> On 16.02.2012 14:54, josef.p...@gmail.com wrote:
> [clip]
>> If I interpret you correctly, this should be a svd ticket, or an svd
>> ticket as "duplicate" ?
>
> I think it should be a multivariate normal ticket.
>
> "Fixing" SVD is in my opinion not sensible: its only guarantee is that A
> = U S V^H down to numerical precision and S are sorted. If the algorithm
> assumes something extra, it is wrong. This sort of reproducibility
> issues affect potentially all code (depends on the compiler and
> libraries used), and trying to combat it at the linalg level is IMHO not
> our business --- if someone really wants it, they should tell their C
> compiler and all libraries to use a reproducible FP model.

I agree, I added the comments to the ticket.

>
> However, we should ensure the algorithms we provide are stable against
> rounding error. In this case, the random number generation is not, so it
> should be fixed.

storing the last column of v

vli = []
for i in range(10):
    (u, s, v) = svd(R)
    print('v[:,-1]')
    print(v[:, -4:])
    vli.append(v[:, -1])

>>> np.unique([tuple(vv.tolist()) for vv in vli])
array([[-0.31622777, -0.11785113,  0.08706383,  0.42953906,  0.75736963,
        -0.31048693, -0.01693654,  0.10328164, -0.04417299, -0.10540926],
       [-0.31622777, -0.03661979,  0.61237244, -0.15302481,  0.0664198 ,
         0.11341968,  0.38265194,  0.51112292, -0.10540926,  0.25335061]])


The different v are not just a reordering of each other.

If my linear algebra is correct, then the algorithm provides different
basis vectors for the subspace with identical singular values.

I don't see any way to fix multivariate_normal for this case, except
for dropping svd or for randomly perturbing a covariance matrix that
has repeated singular values.
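
To make that concrete, here is a small check (my own sketch, not from the
ticket): rotating the singular vectors within the degenerate subspace gives
a different, equally valid factorization of the same R.

import numpy as np

d = 10
alpha = 1 / d**0.5
R = alpha * np.ones((d, d)) + (1 - alpha) * np.eye(d)

u, s, vH = np.linalg.svd(R)

# rotate the basis of the 9-dimensional subspace of the repeated singular value
rng = np.random.RandomState(0)
Q, _ = np.linalg.qr(rng.standard_normal((d - 1, d - 1)))
u2 = u.copy()
vH2 = vH.copy()
u2[:, 1:] = u[:, 1:].dot(Q)
vH2[1:, :] = Q.T.dot(vH[1:, :])

# both factorizations reconstruct R to rounding error
print(abs(u.dot(np.diag(s)).dot(vH) - R).max())
print(abs(u2.dot(np.diag(s)).dot(vH2) - R).max())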

Josef

>
>        Pauli
>


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 8:45 AM, Pauli Virtanen  wrote:
> On 16.02.2012 14:14, josef.p...@gmail.com wrote:
> [clip]
>> We had other cases of several patterns in quasi-deterministic linalg
>> before, but as far as I remember only in the final digits of
>> precision, where it didn't matter much except for reducing test
>> precision in my cases.
>>
>> In the random multivariate normal case in the ticket the differences
>> are large, which makes them pretty unreliable and useless for
>> reproducability.
>
> Now that I read your mail more carefully, the following piece of code
> indeed does not give reproducible results on Linux with ATLAS either:
>
> 
> import numpy as np
> from numpy.linalg import svd
>
> d = 10
> alpha = 1 / d**0.5
> mu = np.ones(d)
> R = alpha * np.ones((d, d)) + (1 - alpha) * np.eye(d)
>
> for i in range(10):
>    u, s, vH = svd(R)
>    print vH[-1,1], abs(u.dot(np.diag(s)).dot(vH)-R).max()
> print s
> ---
>
> Of course, the returned SVD decomposition *is* correct in all cases.
>
> The reason seems to be that the matrix has 9 coinciding singular values,
> and the (alignment-dependent) rounding error is sufficient to perturb
> the choice (or order?) of singular vectors.
>
> So, the algorithm used to generate multivariate normal random numbers is
> then actually numerically unstable, as it relies on the order of
> singular vectors returned by SVD.
>
> I'm not sure how to fix this. Maybe the vectors returned by SVD should
> be sorted if there are numerically close singular values. Just ensuring
> alignment of the input probably won't guarantee reproducibility across
> platforms.
>
> Please file a bug ticket, so this doesn't get forgotten...

the multivariate normal case is already
http://projects.scipy.org/numpy/ticket/1842
I can add the diagnosis.

If I interpret you correctly, this should be a svd ticket, or an svd
ticket as "duplicate" ?

Thanks,

Josef


>
>        Pauli
>


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 8:14 AM,   wrote:
> On Thu, Feb 16, 2012 at 4:44 AM, Pauli Virtanen  wrote:
>> Hi,
>>
>> On 16.02.2012 06:09, josef.p...@gmail.com wrote:
>> [clip]
>>> numpy linalg.svd doesn't produce always the same results
>>>
>>> running this gives two different answers,
>>> using scipy.linalg.svd I always get the same answer, which is one of
>>> the numpy answers
>>> (numpy random.multivariate_normal is collateral damage)
>>
>> Are you using a Windows binary for Numpy compiled with the Intel
>> compilers, or maybe linked with Intel MKL?
>
> This was with the official numpy installer, compiled with MingW
>
> I just tried with 64 bit python 3.2 with MKL (Gohlke installer) and in
> several runs I always get the same answer.
>
>>
>> If yes, one possibility is that the exact sequence of floating point
>> operations in SVD or some other step in the calculation depends on the
>> data alignment, which can affect rounding error.
>>
>> See http://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf
>>
>> That would explain why the pattern you see is quasi-deterministic. The
>> other explanation would be using uninitialized memory at some point, but
>> that seems quite unlikely.
>
> Running the script on the commandline I always get several patterns,
> but running the script in the same process didn't converge to a unique
> pattern.
>
> We had other cases of several patterns in quasi-deterministic linalg
> before, but as far as I remember only in the final digits of
> precision, where it didn't matter much except for reducing test
> precision in my cases.
>
> In the random multivariate normal case in the ticket the differences
> are large, which makes them pretty unreliable and useless for
> reproducability.

linalg question

Is there anything special, or are there specific numerical problems,
with an svd when most singular values are the same?

The example has all random variables equally correlated.

singular values
>>> s
array([ 3.84604989,  0.68377223,  0.68377223,  0.68377223,  0.68377223,
        0.68377223,  0.68377223,  0.68377223,  0.68377223,  0.68377223])

Josef

>
> Josef
>
>>
>> --
>> Pauli Virtanen
>>


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-16 Thread josef . pktd
On Thu, Feb 16, 2012 at 4:44 AM, Pauli Virtanen  wrote:
> Hi,
>
> On 16.02.2012 06:09, josef.p...@gmail.com wrote:
> [clip]
>> numpy linalg.svd doesn't produce always the same results
>>
>> running this gives two different answers,
>> using scipy.linalg.svd I always get the same answer, which is one of
>> the numpy answers
>> (numpy random.multivariate_normal is collateral damage)
>
> Are you using a Windows binary for Numpy compiled with the Intel
> compilers, or maybe linked with Intel MKL?

This was with the official numpy installer, compiled with MinGW.

I just tried with 64 bit python 3.2 with MKL (Gohlke installer) and in
several runs I always get the same answer.

>
> If yes, one possibility is that the exact sequence of floating point
> operations in SVD or some other step in the calculation depends on the
> data alignment, which can affect rounding error.
>
> See http://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf
>
> That would explain why the pattern you see is quasi-deterministic. The
> other explanation would be using uninitialized memory at some point, but
> that seems quite unlikely.

Running the script from the command line I always get several patterns,
but running the script repeatedly in the same process didn't converge to
a unique pattern.

We had other cases of several patterns in quasi-deterministic linalg
before, but as far as I remember only in the final digits of
precision, where it didn't matter much except for reducing test
precision in my cases.

In the random multivariate normal case in the ticket the differences
are large, which makes them pretty unreliable and useless for
reproducibility.

Josef

>
> --
> Pauli Virtanen
>


Re: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-15 Thread josef . pktd
On Wed, Feb 15, 2012 at 10:52 PM,   wrote:
> Doing a bit of browsing in the numpy tracker, I found this. From my
> search this was not discussed on the mailing list.
>
> http://projects.scipy.org/numpy/ticket/1842
>
> The multivariate normal random sample is not always the same, even
> though a seed is specified.
>
> It seems to alternate randomly between two patterns, the sum of the
> random numbers is always the same.
>
> Is there a randomness in the svd?  Given the constant sum, I guess the
> underlying random numbers are the same.
>
> (windows 7, python 32bit on 64 machine, numpy 1.5.1)
>
> import numpy
>
> d = 10
> alpha = 1 / d**0.5
> mu = numpy.ones(d)
> R = alpha * numpy.ones((d, d)) + (1 - alpha) * numpy.eye(d)
> rs = numpy.random.RandomState(587482)
> rv1 = rs.multivariate_normal(mu, R, 1)
> print rv1, rv1.sum()
>
> numpy.random.seed(587482)
> rv2 = numpy.random.multivariate_normal(mu, R, 1)
> print rv2[0][0], rv2.sum()
>
>
> running it a few times
>

> [[ 0.43028555  1.06584226 -0.03496681 -0.31372591 -0.49716804 -1.50641838
>   0.99209124  0.57236839 -0.32107663  0.5865379 ]] 0.973769580498
> 0.0979690286727 0.973769580498

> [[ 0.09796903  1.41010513 -1.10250773  0.71321445  0.09903517  0.36432555
>  -1.27590062  0.04533834  0.37426153  0.24792873]] 0.973769580498
> 0.430285553017 0.973769580498

> [[ 0.09796903  1.41010513 -1.10250773  0.71321445  0.09903517  0.36432555
>  -1.27590062  0.04533834  0.37426153  0.24792873]] 0.973769580498
> 0.430285553017 0.973769580498

> [[ 0.43028555  1.06584226 -0.03496681 -0.31372591 -0.49716804 -1.50641838
>   0.99209124  0.57236839 -0.32107663  0.5865379 ]] 0.973769580498
> 0.0979690286727 0.973769580498
>
> Josef


numpy linalg.svd doesn't always produce the same results.

Running the script below gives two different answers;
using scipy.linalg.svd I always get the same answer, which is one of
the numpy answers.
(numpy random.multivariate_normal is collateral damage.)

What I don't understand is that numpy.random uses numpy.dual.svd, which
I thought uses scipy.linalg if available, but it looks like it takes the
numpy svd.


import numpy as np
#from numpy.dual import svd
from numpy.linalg import svd
#from scipy.linalg import svd

d = 10
alpha = 1 / d**0.5
mu = np.ones(d)
R = alpha * np.ones((d, d)) + (1 - alpha) * np.eye(d)

for i in range(10):
    (u, s, v) = svd(R)
    print 'v[-1]', v[-1]
---

Josef


[Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842

2012-02-15 Thread josef . pktd
Doing a bit of browsing in the numpy tracker, I found this. From my
search this was not discussed on the mailing list.

http://projects.scipy.org/numpy/ticket/1842

The multivariate normal random sample is not always the same, even
though a seed is specified.

It seems to alternate randomly between two patterns, the sum of the
random numbers is always the same.

Is there a randomness in the svd?  Given the constant sum, I guess the
underlying random numbers are the same.

(windows 7, python 32bit on 64 machine, numpy 1.5.1)

import numpy

d = 10
alpha = 1 / d**0.5
mu = numpy.ones(d)
R = alpha * numpy.ones((d, d)) + (1 - alpha) * numpy.eye(d)
rs = numpy.random.RandomState(587482)
rv1 = rs.multivariate_normal(mu, R, 1)
print rv1, rv1.sum()

numpy.random.seed(587482)
rv2 = numpy.random.multivariate_normal(mu, R, 1)
print rv2[0][0], rv2.sum()


running it a few times

>>>
[[ 0.43028555  1.06584226 -0.03496681 -0.31372591 -0.49716804 -1.50641838
   0.99209124  0.57236839 -0.32107663  0.5865379 ]] 0.973769580498
0.0979690286727 0.973769580498
>>>
[[ 0.09796903  1.41010513 -1.10250773  0.71321445  0.09903517  0.36432555
  -1.27590062  0.04533834  0.37426153  0.24792873]] 0.973769580498
0.430285553017 0.973769580498
>>>
[[ 0.09796903  1.41010513 -1.10250773  0.71321445  0.09903517  0.36432555
  -1.27590062  0.04533834  0.37426153  0.24792873]] 0.973769580498
0.430285553017 0.973769580498
>>>
[[ 0.43028555  1.06584226 -0.03496681 -0.31372591 -0.49716804 -1.50641838
   0.99209124  0.57236839 -0.32107663  0.5865379 ]] 0.973769580498
0.0979690286727 0.973769580498

Josef


Re: [Numpy-discussion] Numpy governance update

2012-02-15 Thread josef . pktd
On Wed, Feb 15, 2012 at 9:12 PM, Matthew Brett  wrote:
> Hi,
>
> On Wed, Feb 15, 2012 at 6:07 PM,   wrote:
>> On Wed, Feb 15, 2012 at 8:49 PM, Matthew Brett  
>> wrote:
>>> Hi,
>>>
>>> On Wed, Feb 15, 2012 at 4:27 PM, Dag Sverre Seljebotn
>>>  wrote:
 On 02/15/2012 02:24 PM, Mark Wiebe wrote:
>>>
> There certainly is governance now, it's just informal. It's a
> combination of how the design discussions are carried out, how pull
> requests occur, and who has commit rights.

 +1

 If non-contributing users came along on the Cython list demanding that
 we set up a system to select non-developers along on a board that would
 have discussions in order to veto pull requests, I don't know whether
 we'd ignore it or ridicule it or try to show some patience, but we
 certainly wouldn't take it seriously.
>>>
>>> In the spirit (as I read) of Dag's post, maybe we should accept that
>>> this thread is not going anywhere much, and summarize:
>>>
>>> The current situation is the following:
>>>
>>> Travis is de-facto BDFL for Numpy
>>> Disputes get resolved by convening an ad-hoc group of interested and /
>>> or active developers to resolve or vote, maybe off-list.  How this
>>> happens is for Travis to call.
>>>
>>> I think that's reasonable?
>>>
>>> As far as I can make out, in favor of the current status quo with no
>>> significant modification are:
>>>
>>> Travis (is that right)?
>>> Mark
>>> Peter
>>> Bryan vdv
>>> Perry
>>> Dag
>>>
>>> In favor of some sort of formalization of governance to be decided are:
>>>
>>> Me
>>> Ben R (did I get that right?)
>>> Bruce Southey
>>> Souheil Inati
>>> TJ
>>> Joe H
>>>
>>> I am not quite sure which side of that fence are:
>>>
>>> Josef
>>
>> Actually in the sense of separation of powers, I would vote for Chuck
>> as president, Travis as prime minister and an independent release
>> manager as supreme court, and the noisy mailing list community as
>> parliament.
>
> That sounds dangerously Canadian ...

Or Austrian or German

>
> But actually - I was hoping for an answer to whether you felt there
> was a need for a more formal governance structure, or not.

I thought a president, a prime minister and a parliament make for a
formal government structure. :) Maybe more personalized in the
American tradition.

I'm in favor of a more formal governance structure; however, the only
real enforcement I see is in reputation and goodwill, if all keys
are in one hand. I think spelling out both governance and guidelines
for development and testing makes it easier to be clear about what we
can expect, so that we know when we should be upset (a bit of
repeated-game enforcement, since I'm an economist).
I have no idea how formal governance structures work in open source.

Actually, I liked the recent situation: a visionary 2.0 sometime in
the future, while Chuck and Ralf kept putting out 1.x releases with
careful control over moving forward without breaking anything (I'm not
sure how to phrase this), with, of course, Mark doing large parts of
the heavy work.

Josef

>
>> (I don't see a constitution yet.)
>
> My feeling is there is not enough appetite for any change for that to
> be worth thinking about, but I might be wrong.
>
> See you,
>
> Matthew


Re: [Numpy-discussion] Numpy governance update

2012-02-15 Thread josef . pktd
On Wed, Feb 15, 2012 at 8:49 PM, Matthew Brett  wrote:
> Hi,
>
> On Wed, Feb 15, 2012 at 4:27 PM, Dag Sverre Seljebotn
>  wrote:
>> On 02/15/2012 02:24 PM, Mark Wiebe wrote:
>
>>> There certainly is governance now, it's just informal. It's a
>>> combination of how the design discussions are carried out, how pull
>>> requests occur, and who has commit rights.
>>
>> +1
>>
>> If non-contributing users came along on the Cython list demanding that
>> we set up a system to select non-developers along on a board that would
>> have discussions in order to veto pull requests, I don't know whether
>> we'd ignore it or ridicule it or try to show some patience, but we
>> certainly wouldn't take it seriously.
>
> In the spirit (as I read) of Dag's post, maybe we should accept that
> this thread is not going anywhere much, and summarize:
>
> The current situation is the following:
>
> Travis is de-facto BDFL for Numpy
> Disputes get resolved by convening an ad-hoc group of interested and /
> or active developers to resolve or vote, maybe off-list.  How this
> happens is for Travis to call.
>
> I think that's reasonable?
>
> As far as I can make out, in favor of the current status quo with no
> significant modification are:
>
> Travis (is that right)?
> Mark
> Peter
> Bryan vdv
> Perry
> Dag
>
> In favor of some sort of formalization of governance to be decided are:
>
> Me
> Ben R (did I get that right?)
> Bruce Southey
> Souheil Inati
> TJ
> Joe H
>
> I am not quite sure which side of that fence are:
>
> Josef

Actually in the sense of separation of powers, I would vote for Chuck
as president, Travis as prime minister and an independent release
manager as supreme court, and the noisy mailing list community as
parliament.

(I don't see a constitution yet.)

Josef

> Alan
> Chuck
>
> If I missed someone who gave an opinion - sorry - please do speak up.
>
> I think it's clear that if - you, Travis, don't want to go this
> direction, there isn't much chance of anything happening, and I think
> those of us who think something needs doing will have to keep quiet,
> as Dag suggests.
>
> I would only suggest that you (Travis) specify that you will take the
> BDFL role so that we can be clear about the informal governance at
> least.
>
> Best,
>
> Matthew


Re: [Numpy-discussion] Numpy governance update

2012-02-15 Thread josef . pktd
On Wed, Feb 15, 2012 at 7:27 PM, Dag Sverre Seljebotn
 wrote:
> On 02/15/2012 02:24 PM, Mark Wiebe wrote:
>> On Wed, Feb 15, 2012 at 1:36 PM, Matthew Brett wrote:
>>
>>     Hi,
>>
>>     On Wed, Feb 15, 2012 at 12:55 PM, Mark Wiebe wrote:
>>      > On Wed, Feb 15, 2012 at 12:09 PM, Matthew Brett
>>      > wrote:
>>      >>
>>      >> Hi,
>>      >>
>>      >> > On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root wrote:
>>      >> >
>>      >> >
>>      >> > On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac
>>      >> > wrote:
>>      >> >> Can you provide an example where a more formal
>>      >> >> governance structure for NumPy would have meant
>>      >> >> more or better code development? (Please do not
>>      >> >> suggest the NA discussion!)
>>      >> >>
>>      >> >
>>      >> > Why not the NA discussion?  Would we really want to have that
>>     happen
>>      >> > again?
>>      >> > Note that it still isn't fully resolved and progress still
>>     needs to be
>>      >> > made
>>      >> > (I think the last thread did an excellent job of fleshing out
>>     the ideas,
>>      >> > but
>>      >> > it became too much to digest.  We may need to have someone go
>>     through
>>      >> > the
>>      >> > information, reduce it down and make one last push to bring it
>>     to a
>>      >> > conclusion).  The NA discussion is the perfect example where a
>>      >> > governance
>>      >> > structure would help resolve disputes.
>>      >>
>>      >> Yes, that was the most obvious example. I don't know about you,
>>     but I
>>      >> can't see any sign of that one being resolved.
>>      >>
>>      >> The other obvious example was the dispute about ABI breakage for
>>     numpy
>>      >> 1.5.0 where I believe Travis did invoke some sort of committee to
>>      >> vote, but (Travis can correct me if I'm wrong), the committee was
>>      >> named ad-hoc and contacted off-list.
>>      >>
>>      >> >
>>      >> >>
>>      >> >> Can you provide an example of what you might
>>      >> >> envision as a "more formal governance structure"?
>>      >> >> (I assume that any such structure will not put people
>>      >> >> who are not core contributors to NumPy in a position
>>      >> >> to tell core contributors what to spend their time on.)
>>      >> >>
>>      >> >> Early last December, Chuck Harris estimated that three
>>      >> >> people were active NumPy developers.  I liked the idea of
>>      >> >> creating a "board" of these 3 and a rule that says any
>>      >> >> active developer can request to join the board, that
>>      >> >> additions are determined by majority vote of the existing
>>      >> >> board, and  that having the board both small and odd
>>      >> >> numbered is a priority.  I also suggested inviting to this
>>      >> >> board a developer or two from important projects that are
>>      >> >> very NumPy dependent (e.g., Matplotlib).
>>      >> >>
>>      >> >> I still like this idea.  Would it fully satisfy you?
>>      >> >>
>>      >> >
>>      >> > I actually like that idea.  Matthew, is this along the lines
>>     of what you
>>      >> > were thinking?
>>      >>
>>      >> Honestly it would make me very happy if the discussion moved to what
>>      >> form the governance should take.  I would have thought that 3
>>     was too
>>      >> small a number.
>>      >
>>      >
>>      > One thing to note about this point is that during the NA
>>     discussion, the
>>      > only people doing active C-level development were Charles and me.
>>     I suspect
>>      > a discussion about how to recruit more people into that group
>>     might be more
>>      > important than governance at this point in time.
>>
>>     Mark - a) thanks for replying, it's good to hear your voice and b) I
>>     don't think there's any competition between the discussion about
>>     governance and the need to recruit more people into the group who
>>     understand the C code.
>>
>>
>> There hasn't really been any discussion about recruiting developers to
>> compete with the governance topic, now we can let the topics compete. :)
>>
>> Some of the mechanisms which will help are already being set in motion
>> through the discussion about better infrastructure support like bug
>> trackers and continuous integration. The forthcoming roadmap discussion
>> Travis alluded to, where we will propose a roadmap for review by the
>> numpy user community, will include many more such points.
>>
>>     Remember we are deciding here between governance - of a form to be
>>     decided - and no governance - which I think is the current situation.
>>     I know your desire is to see more people contributing to the C code.
>>     It would help a lot if you could say what you think the barriers are,
>>     how they could be lowered, and the risks tha

Re: [Numpy-discussion] Numpy governance update

2012-02-15 Thread josef . pktd
On Wed, Feb 15, 2012 at 4:43 PM, Matthew Brett  wrote:
> Hi,
>
> On Wed, Feb 15, 2012 at 12:45 PM, Alan G Isaac  wrote:
>> My analysis is fundamentally different than Matthew
>> and Benjamin's for a few reasons.
>>
>> 1. The problem has been miscast.
>>    The "economic interests" of the developers *always*
>>    has had an apparent conflict with the economic
>>    interests of the users: users want developers to work more
>>    on the code, and developers need to make a living, which
>>    often involves spending their time on other things.
>>    On this score, nothing has really changed.
>> 2. It seems pretty clear that Matthew wants some governance
>>    power to be held by individuals who are not actively
>>    developing NumPy.  As Chuck Harris pointed out long ago,
>>    that dog ain't going to hunt.
>> 3. Constitutions can be broken (and are, all the time).
>>    Designing a stable institution requires making it in
>>    the interests of the members to participate.
>>
>> Any formal governance structure that can be desirable
>> for the NumPy community as a whole has to be desirable
>> for the core developers.  The right way to produce a
>> governance structure is to make concrete proposals and
>> show how these proposals are in the interest of the
>> *developers* (as well as of the users).
>>
>> For example, Benjamin obliquely suggested that with an
>> appropriate governance board, the NA discussion could
>> have simply been shut down by having the developers
>> vote (as part of their governance).  This might be in
>> the interest of the developers and of the community
>> (I'm not sure), but I doubt it is what Matthew has in mind.
>> In any case, until proposals are put on the table along
>> with a clear effort to illustrate why it is in the interest
>> of the *developers* to adopt the proposals, I really do not
>> see this discussion moving forward.
>
> That's helpful, it would be good to discuss concrete proposals.
> Would you care to flesh out your proposal in more detail or is it as
> you quoted it before?
>
> Where do you stand on the desirability of consensus?
>
> Do you have any suggestions on how to ensure that the non-Continuum
> community has sufficient weight in decision making?

I'm going to miss Ralf as release manager, since in terms of
governance he had the last control over what's actually in the
released versions of numpy (and scipy).

(I think the ABI breakage in 1.4, not 1.5, was pretty painful for a long time.)

Josef

>
> Best,
>
> Matthew


Re: [Numpy-discussion] cumsum much slower than simple loop?

2012-02-09 Thread josef . pktd
On Thu, Feb 9, 2012 at 11:39 PM, Dave Cook  wrote:
> Why is numpy.cumsum (along axis=0) so much slower than a simple loop?  The
> same goes for numpy.add.accumulate
>
> # cumsumtest.py
> import numpy as np
>
> def loopcumsum(a):
>     csum = np.empty_like(a)
>     s = 0.0
>     for i in range(len(a)):
>         csum[i] = s = s + a[i]
>     return csum
>
> npcumsum = lambda a: np.cumsum(a, axis=0)
>
> addaccum = lambda a: np.add.accumulate(a)
>
> shape = (100, 8, 512)
> a = np.arange(np.prod(shape), dtype='f').reshape(shape)
> # check that we get the same results
> print (npcumsum(a)==loopcumsum(a)).all()
> print (addaccum(a)==loopcumsum(a)).all()
>
> ipython session:
>
> In [1]: from cumsumtest import *
> True
> True
>
> In [2]: timeit npcumsum(a)
> 100 loops, best of 3: 14.7 ms per loop
>
> In [3]: timeit addaccum(a)
> 100 loops, best of 3: 15.4 ms per loop
>
> In [4]: timeit loopcumsum(a)
> 100 loops, best of 3: 2.16 ms per loop

strange (if I didn't make a mistake)

In [12]: timeit a.cumsum(0)
100 loops, best of 3: 7.17 ms per loop

In [13]: timeit a.T.cumsum(-1).T
1000 loops, best of 3: 1.78 ms per loop

In [14]: (a.T.cumsum(-1).T == a.cumsum(0)).all()
Out[14]: True
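
A guess at the reason (my assumption, not a confirmed explanation): for a
C-contiguous array, cumsum along axis 0 strides across memory, while cumsum
along the last axis works on contiguous runs. A quick check that the
reorderings compute the same thing:

import numpy as np

shape = (100, 8, 512)
a = np.arange(np.prod(shape), dtype='f').reshape(shape)
af = np.asfortranarray(a)   # same values, column-major memory layout

# identical results; only the memory access pattern differs
print((a.cumsum(0) == a.T.cumsum(-1).T).all())
print((a.cumsum(0) == af.cumsum(0)).all())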

Josef

>
> Dave Cook
>


Re: [Numpy-discussion] Logical indexing and higher-dimensional arrays.

2012-02-08 Thread josef . pktd
On Wed, Feb 8, 2012 at 10:29 AM, Sturla Molden  wrote:
> On 08.02.2012 15:49, Travis Oliphant wrote:
>
>> This sort of thing would take time, but is not out of the question in my 
>> mind because I suspect the number of users and use-cases of "broadcasted" 
>> fancy-indexing is small.

I think I use it quite a bit, and I like that the broadcasting in
indexing is as flexible as the broadcasting of numpy arrays
themselves.

x[np.arange(len(x)), np.arange(len(x))]  gives the diagonal for example.

or picking a different element from each column (I don't remember
where I used that).

It is surprising at first and takes some getting used to, but I think
it's pretty nice.  On the other hand, I always avoid mixing slices and
indexes because of "strange" results.
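
A small sketch of that broadcasting (my example values, made up for
illustration):

import numpy as np

x = np.arange(16).reshape(4, 4)

# the index arrays broadcast against each other like ordinary arrays
print(x[np.arange(4), np.arange(4)])     # diagonal: [ 0  5 10 15]

# pick a (possibly different) row from each column
rows = np.array([2, 0, 3, 1])
print(x[rows, np.arange(4)])             # [ 8  1 14  7]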

Josef


>
> In Matlab this (misfeature?) is generally used to compensate for the
> lack of broadbasting. So many Matlab programmers manually broadcast by
> fancy indexing with an array of (boolean) ones (it's less verbose than
> reshape). But with broadcasting for binary array operators (NumPy and
> Fortran 90), this is never needed.
>
> Sturla


Re: [Numpy-discussion] "ValueError: total size of new array must be unchanged" only on Windows

2012-02-05 Thread josef . pktd
On Sun, Feb 5, 2012 at 12:52 PM, Paolo  wrote:

>  How can I do this?
>
>

I'm not sure without trying; numpy.loadtxt might be the easier choice.

matrix = "".join(i.strip() for i in f.readlines())

I think strip() also removes newlines besides other whitespace;
otherwise, more explicitly:
matrix = "".join(i.strip(f.newlines) for i in f.readlines())

or open the file with mode 'rU' and strip('\n').
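
A tiny illustration of why the stripping matters (made-up strings, just to
show the lengths): Windows '\r\n' line endings leave an extra character per
line in the joined buffer, so np.fromstring sees a different total size and
the later reshape fails.

text_unix = "1234\n5678\n"
text_win = "1234\r\n5678\r\n"

print(len("".join(line.strip() for line in text_unix.splitlines(True))))  # 8
print(len("".join(line.strip() for line in text_win.splitlines(True))))   # 8
print(len("".join(text_win.splitlines(True))))                            # 12, extra \r bytes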

Josef




>
>
>
> On 05/02/2012 18:47, josef.p...@gmail.com wrote:
>
>
>
> On Sun, Feb 5, 2012 at 12:39 PM, Paolo  wrote:
>
>>  This is my code:
>>
>> matrix="".join(f.readlines())
>>
>
> my guess would be, that you have to strip the line endings \n versus \r\n
>
> Josef
>
>
>>  matrix=np.fromstring(matrix, dtype=np.int16)
>> matrix=matrix.reshape(siz[2],siz[1],siz[0]).T
>>
>>
>>
>>
>> On 05/02/2012 17:21, Olivier Delalleau wrote:
>>
>> It means there is some of your code that is not entirely
>> platform-independent. It's not possible to tell you which part because you
>> didn't provide your code. The problem may not even be numpy-related.
>> So you should first look at the current shape of 'matrix', and what are
>> the values of a, b and c, then see where the discrepancy is, and work from
>> there.
>>
>> -=- Olivier
>>
>> On 5 February 2012 11:16, Paolo Zaffino wrote:
>>
>>>   Yes, I understand this but I don't know because on Linux and Mac it
>>> works well.
>>> If the matrix size is different it should be different independently
>>> of the OS type.
>>> Am I wrong?
>>> Thanks for your support!
>>>
>>>  --
>>> * From: * Olivier Delalleau ;
>>> * To: * Discussion of Numerical Python ;
>>> * Subject: * Re: [Numpy-discussion] "ValueError: total size of new
>>> array must be unchanged" only on Windows
>>> * Sent: * Sun, Feb 5, 2012 3:02:44 PM
>>>
>>>   It should mean that matrix.size != a * b * c.
>>>
>>> -=- Olivier
>>>
>>> On 5 February 2012 09:32, Paolo wrote:
>>>
 Hello,
 I wrote a function that works on a numpy matrix and it works fine on Mac
 OS and GNU/Linux (I didn't test it on python 3).
 Now I have a problem with numpy: the same python file doesn't work on
 Windows (Windows xp, python 2.7 and numpy 2.6.1).
 I get this error:

 matrix=matrix.reshape(a, b, c)
 ValueError: total size of new array must be unchanged

 Why? Does anyone have an idea about this?
 Thank you very much.

>>>
>>>
>>
>>
>>
>>
>
>
>
>
>
>
>


Re: [Numpy-discussion] "ValueError: total size of new array must be unchanged" only on Windows

2012-02-05 Thread josef . pktd
On Sun, Feb 5, 2012 at 12:39 PM, Paolo  wrote:

>  This is my code:
>
> matrix="".join(f.readlines())
>

my guess would be that you have to strip the line endings, \n versus \r\n

Josef


> matrix=np.fromstring(matrix, dtype=np.int16)
> matrix=matrix.reshape(siz[2],siz[1],siz[0]).T
>
>
>
>
> On 05/02/2012 17:21, Olivier Delalleau wrote:
>
> It means there is some of your code that is not entirely
> platform-independent. It's not possible to tell you which part because you
> didn't provide your code. The problem may not even be numpy-related.
> So you should first look at the current shape of 'matrix', and what are
> the values of a, b and c, then see where the discrepancy is, and work from
> there.
>
> -=- Olivier
>
> On 5 February 2012 11:16, Paolo Zaffino wrote:
>
>>   Yes, I understand this but I don't know because on Linux and Mac it
>> works well.
>> If the matrix size is different it should be different independently of
>> the OS type.
>> Am I wrong?
>> Thanks for your support!
>>
>>  --
>> * From: * Olivier Delalleau ;
>> * To: * Discussion of Numerical Python ;
>> * Subject: * Re: [Numpy-discussion] "ValueError: total size of new array
>> must be unchanged" only on Windows
>> * Sent: * Sun, Feb 5, 2012 3:02:44 PM
>>
>>   It should mean that matrix.size != a * b * c.
>>
>> -=- Olivier
>>
>> On 5 February 2012 09:32, Paolo wrote:
>>
>>> Hello,
>>> I wrote a function that works on a numpy matrix and it works fine on Mac
>>> OS and GNU/Linux (I didn't test it on python 3).
>>> Now I have a problem with numpy: the same python file doesn't work on
>>> Windows (Windows xp, python 2.7 and numpy 2.6.1).
>>> I get this error:
>>>
>>> matrix=matrix.reshape(a, b, c)
>>> ValueError: total size of new array must be unchanged
>>>
>>> Why? Do anyone have an idea about this?
>>> Thank you very much.
>>>
>>
>>
>
>
>
>


Re: [Numpy-discussion] Trick for fast

2012-02-03 Thread josef . pktd
On Fri, Feb 3, 2012 at 4:49 PM, Alan G Isaac  wrote:
> On 2/3/2012 3:37 PM, josef.p...@gmail.com wrote:
>> res = - np.dot(x.T, mass*x)
>> res[np.arange(3), np.arange(3)] -= np.trace(res)
>
>
> Nice!
> Get some speed gain with slicing:
>
> res = - np.dot(x.T, mass*x)
> res.flat[slice(0,None,4)] -= np.trace(res)

Actually, I thought about the most readable version using diag_indices

>>> di = np.diag_indices(3)
>>> res = - np.dot(x.T, mass*x)
>>> res[di] -= np.trace(res)
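
A self-contained check of the shortcut (made-up example data, same shapes as
in the thread):

import numpy as np

n = 15
x = np.arange(n * 3.).reshape(-1, 3)    # n x 3 coordinates
mass = np.linspace(1, 2, n)[:, None]    # n x 1 masses

# reference: explicit loop over particles
inert = np.zeros((3, 3))
for i in range(n):
    ri = x[i, :].reshape(1, 3)
    inert += mass[i] * (np.sum(ri * ri) * np.eye(3) - np.dot(ri.T, ri))

# vectorized version with the trace trick
res = -np.dot(x.T, mass * x)
res[np.diag_indices(3)] -= np.trace(res)

print(np.abs(res - inert).max())        # rounding error only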

(but I thought I said already enough)

Josef

>
> Alan
>


Re: [Numpy-discussion] Trick for fast

2012-02-03 Thread josef . pktd
On Fri, Feb 3, 2012 at 2:33 PM,   wrote:
> On Fri, Feb 3, 2012 at 1:58 PM, santhu kumar  wrote:
>> Hi Josef,
>>
>> I am unclear on what you want to say, but all I am doing in the code is
>> getting inertia tensor for a bunch of particle masses.
>> (http://en.wikipedia.org/wiki/Moment_of_inertia#Moment_of_inertia_tensor)
>>
>> So the diagonals are not actually zeros but would have z^2 + y^2 ..
>> The reason which I said 3secs could be misunderstood .. This code is called
>> many times over a loop and the bulk time is taken in computing this inertial
>> tensor.
>> After the code change, the entire loop finishes off in 3 secs.
>
> ok, I still had a python sum instead of the numpy sum in there  (I
> really really like namespaces :)
>
> >>> np.sum(ri*ri)
> 5549.0
> >>> sum(ri*ri)
> array([ 1764.,  1849.,  1936.])
>
> this might match better.
>
> print  (mass * x**2).sum() * np.eye(3)  - np.dot(x.T, mass*x)
>
> or
> res = - np.dot(x.T, mass*x)
> res[np.arange(3), np.arange(3)] += (mass * x**2).sum()
> print res

if my interpretation is now correct, and because I have seen many traces lately

>>> res = - np.dot(x.T, mass*x)
>>> res[np.arange(3), np.arange(3)] -= np.trace(res)
>>> res
array([[ 35752.5, -16755. , -17287.5],
       [-16755. ,  34665. , -17865. ],
       [-17287.5, -17865. ,  33532.5]])


>>> res - inert
array([[  7.27595761e-12,   0.e+00,   3.63797881e-12],
       [  0.e+00,   0.e+00,   0.e+00],
       [  0.e+00,   0.e+00,   0.e+00]])


Josef


>
> Josef
>>
>> Thanks for alertness,
>> Santhosh
>>
>> On Fri, Feb 3, 2012 at 12:47 PM,  wrote:
>>>
>>>
>>>
>>> Today's Topics:
>>>
>>>   1. Re: Trick for fast (santhu kumar)
>>>   2. Re: Trick for fast (josef.p...@gmail.com)
>>>
>>>
>>> --
>>>
>>> Message: 1
>>> Date: Fri, 3 Feb 2012 12:29:26 -0600
>>> From: santhu kumar 
>>>
>>> Subject: Re: [Numpy-discussion] Trick for fast
>>> To: numpy-discussion@scipy.org
>>> Message-ID:
>>>
>>>  
>>> Content-Type: text/plain; charset="iso-8859-1"
>>>
>>>
>>> On Fri, Feb 3, 2012 at 1:29 PM, santhu kumar  wrote:
>>> > Hello all,
>>> >
>>> > Thanks for lovely solutions. I have sat on it for some time and wrote it
>>> > myself :
>>> >
>>> > n =x.shape[0]
>>> > ea = np.array([1,0,0,0,1,0,0,0,1])
>>> > inert = ((np.tile(ea,(n,1))*((x*x).sum(axis=1)[:,np.newaxis]) -
>>> >
>>> > np.hstack([x*x[:,0][:,np.newaxis],x*x[:,1][:,np.newaxis],x*x[:,2][:,np.newaxis]]))*mass[:,np.newaxis]).sum(axis=0)
>>> > inert.shape = 3,3
>>> >
>>> > Does the trick and reduces the time from over 45 secs to 3 secs.
>>> > I do want to try einsum but my numpy is little old and it does not have
>>> > it.
>>> >
>>> > Thanks Sebastian (it was tricky to understand your code for me) and
>>> > Josef
>>> > (clean).
>>>
>>> Isn't the entire substraction of the first term just to set the
>>> diagonal of the result to zero.
>>>
>>> It looks to me now just like the weighted dot product and setting the
>>> diagonal to zero. That shouldn't take 3 secs unless you actual
>>> dimensions are huge.
>>>
>>> Josef
>>>
>>>
>>>
>>>
>>


Re: [Numpy-discussion] Trick for fast

2012-02-03 Thread josef . pktd
On Fri, Feb 3, 2012 at 1:58 PM, santhu kumar  wrote:
> Hi Josef,
>
> I am unclear on what you want to say, but all I am doing in the code is
> getting inertia tensor for a bunch of particle masses.
> (http://en.wikipedia.org/wiki/Moment_of_inertia#Moment_of_inertia_tensor)
>
> So the diagonals are not actually zeros but would have z^2 + y^2 ..
> The reason which I said 3secs could be misunderstood .. This code is called
> many times over a loop and the bulk time is taken in computing this inertial
> tensor.
> After the code change, the entire loop finishes off in 3 secs.

ok, I still had a python sum instead of the numpy sum in there  (I
really really like namespaces :)

>>> np.sum(ri*ri)
5549.0
>>> sum(ri*ri)
array([ 1764.,  1849.,  1936.])

this might match better.

print  (mass * x**2).sum() * np.eye(3)  - np.dot(x.T, mass*x)

or
res = - np.dot(x.T, mass*x)
res[np.arange(3), np.arange(3)] += (mass * x**2).sum()
print res

Josef
>
> Thanks for alertness,
> Santhosh
>
> On Fri, Feb 3, 2012 at 12:47 PM,  wrote:
>>
>>
>>
>> Today's Topics:
>>
>>   1. Re: Trick for fast (santhu kumar)
>>   2. Re: Trick for fast (josef.p...@gmail.com)
>>
>>
>> --
>>
>> Message: 1
>> Date: Fri, 3 Feb 2012 12:29:26 -0600
>> From: santhu kumar 
>>
>> Subject: Re: [Numpy-discussion] Trick for fast
>> To: numpy-discussion@scipy.org
>> Message-ID:
>>
>>  
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>>
>> On Fri, Feb 3, 2012 at 1:29 PM, santhu kumar  wrote:
>> > Hello all,
>> >
>> > Thanks for lovely solutions. I have sat on it for some time and wrote it
>> > myself :
>> >
>> > n =x.shape[0]
>> > ea = np.array([1,0,0,0,1,0,0,0,1])
>> > inert = ((np.tile(ea,(n,1))*((x*x).sum(axis=1)[:,np.newaxis]) -
>> >
>> > np.hstack([x*x[:,0][:,np.newaxis],x*x[:,1][:,np.newaxis],x*x[:,2][:,np.newaxis]]))*mass[:,np.newaxis]).sum(axis=0)
>> > inert.shape = 3,3
>> >
>> > Does the trick and reduces the time from over 45 secs to 3 secs.
>> > I do want to try einsum but my numpy is little old and it does not have
>> > it.
>> >
>> > Thanks Sebastian (it was tricky to understand your code for me) and
>> > Josef
>> > (clean).
>>
>> Isn't the entire substraction of the first term just to set the
>> diagonal of the result to zero.
>>
>> It looks to me now just like the weighted dot product and setting the
>> diagonal to zero. That shouldn't take 3 secs unless you actual
>> dimensions are huge.
>>
>> Josef
>>
>>
>>
>>
>


Re: [Numpy-discussion] Trick for fast

2012-02-03 Thread josef . pktd
On Fri, Feb 3, 2012 at 1:29 PM, santhu kumar  wrote:
> Hello all,
>
> Thanks for lovely solutions. I have sat on it for some time and wrote it
> myself :
>
> n =x.shape[0]
> ea = np.array([1,0,0,0,1,0,0,0,1])
> inert = ((np.tile(ea,(n,1))*((x*x).sum(axis=1)[:,np.newaxis]) -
> np.hstack([x*x[:,0][:,np.newaxis],x*x[:,1][:,np.newaxis],x*x[:,2][:,np.newaxis]]))*mass[:,np.newaxis]).sum(axis=0)
> inert.shape = 3,3
>
> Does the trick and reduces the time from over 45 secs to 3 secs.
> I do want to try einsum but my numpy is little old and it does not have it.
>
> Thanks Sebastian (it was tricky to understand your code for me) and Josef
> (clean).

Isn't the entire subtraction of the first term just there to set the
diagonal of the result to zero?

It looks to me now just like the weighted dot product and setting the
diagonal to zero. That shouldn't take 3 secs unless your actual
dimensions are huge.

Josef




>
>
> On Fri, Feb 3, 2012 at 12:00 PM,  wrote:
>>
>>
>>
>> Today's Topics:
>>
>>   1. Re: Trick for fast (josef.p...@gmail.com)
>>   2. Re: Trick for fast (Søren Gammelmark)
>>   3. Re: Trick for fast (Sebastian Berg)
>>
>>
>> --
>>
>> Message: 1
>> Date: Fri, 3 Feb 2012 09:10:28 -0500
>> From: josef.p...@gmail.com
>> Subject: Re: [Numpy-discussion] Trick for fast
>> To: Discussion of Numerical Python 
>> Message-ID:
>>
>>  
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>>
>> On Fri, Feb 3, 2012 at 8:44 AM, Alan G Isaac  wrote:
>> > On 2/3/2012 5:16 AM, santhu kumar wrote:
>> >> x = nX3 vector.
>> >> mass = nX1 vector
>> >> inert = zeros((3,3))
>> >> for i in range(n):
>> >>        ri = x[i,:].reshape(1,3)
>> >>        inert = inert + mass[i,]*(sum(ri*ri)*eye(3) - dot(ri.T,ri))
>>
>> >>
>> >
>> >
>> > This should buy you a bit.
>> >
>> > xdot = (x*x).sum(axis=1)
>> > for (massi,xi,xdoti) in zip(mass.flat,x,xdot):
>> >       temp = -np.outer(xi,xi)
>> >       temp.flat[slice(0,None,4)] += xdoti
>> >       inert += massi*temp
>>
>> >
>> > Alan Isaac
>>
>> maybe something like this, (self contained example and name spaces to
>> make running it easier)
>>
>> import numpy as np
>> n = 15
>> x = np.arange(n*3.).reshape(-1,3) #nX3 vector.
>> mass = np.linspace(1,2,n)[:,None] #nX1 vector
>> inert = np.zeros((3,3))
>> for i in range(n):
>>      ri = x[i,:].reshape(1,3)
>>      inert = inert + mass[i,]*(sum(ri*ri)*np.eye(3) - np.dot(ri.T,ri))
>> print inert
>>
>> print  np.diag((mass * x**2).sum(0))  - np.dot(x.T, mass*x)
>>
>> [[     0.  -16755.  -17287.5]
>>  [-16755.       0.  -17865. ]
>>  [-17287.5 -17865.       0. ]]
>> [[     0.  -16755.  -17287.5]
>>  [-16755.       0.  -17865. ]
>>  [-17287.5 -17865.       0. ]]
>>
>> Josef
>>
>>
>> >
>>
>>
>> --
>>
>> Message: 2
>> Date: Fri, 3 Feb 2012 15:43:25 +0100
>> From: Søren Gammelmark
>> Subject: Re: [Numpy-discussion] Trick for fast
>> To: Discussion of Numerical Python 
>> Message-ID:
>>
>>  
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>>
>> What about this?
>>
>> A = einsum("i,ij->", mass, x ** 2)
>> B = einsum("i,ij,ik->jk", mass, x, x)
>> I = A * eye(3) - B
>>
>> /Søren
>>
>>
>> On 3 February 2012 15:10,  wrote:
>>
>> > On Fri, Feb 3, 2012 at 8:44 AM, Alan G Isaac 
>> > wrote:
>> > > On 2/3/2012 5:16 AM, santhu kumar wrote:
>> > >> x = nX3 vector.
>> > >> mass = nX1 vector
>> > >> inert = zeros((3,3))
>> > >> for i in range(n):
>> > >>        ri = x[i,:].reshape(1,3)
>> > >>        inert = inert + mass[i,]*(sum(ri*ri)*eye(3) - dot(ri.T,ri))
>> > >>
>> > >
>> > >
>> > > This should buy you a bit.
>> > >
>> > > xdot = (x*x).sum(axis=1)
>> > > for (massi,xi,xdoti) in zip(mass.flat,x,xdot):
>> > >       temp = -np.outer(xi,xi)
>> > >       temp.flat[slice(0,None,4)] += xdoti
>> > >       inert += massi*temp
>> > >
>> > > Alan Isaac
>> >
>> > maybe something like this, (self contained example and name spaces to
>> > make running it easier)
>> >
>> > import numpy as np
>> > n = 15
>> > x = np.arange(n*3.).reshape(-1,3) #nX3 vector.
>> > mass = np.linspace(1,2,n)[:,None] #nX1 vector
>> > inert = np.zeros((3,3))
>> > for i in range(n):
>> >      ri = x[i,:].reshape(1,3)
>> >       inert = inert + mass[i,]*(sum(ri*ri)*np.eye(3) - np.dot(ri.T,ri))
>

Re: [Numpy-discussion] Trick for fast

2012-02-03 Thread josef . pktd
On Fri, Feb 3, 2012 at 8:44 AM, Alan G Isaac  wrote:
> On 2/3/2012 5:16 AM, santhu kumar wrote:
>> x = nX3 vector.
>> mass = nX1 vector
>> inert = zeros((3,3))
>> for i in range(n):
>>        ri = x[i,:].reshape(1,3)
>>        inert = inert + mass[i,]*(sum(ri*ri)*eye(3) - dot(ri.T,ri))
>>
>
>
> This should buy you a bit.
>
> xdot = (x*x).sum(axis=1)
> for (massi,xi,xdoti) in zip(mass.flat,x,xdot):
>       temp = -np.outer(xi,xi)
>       temp.flat[slice(0,None,4)] += xdoti
>       inert += massi*temp
>
> Alan Isaac

Maybe something like this (self-contained example and namespaces to
make running it easier):

import numpy as np
n = 15
x = np.arange(n*3.).reshape(-1,3) #nX3 vector.
mass = np.linspace(1,2,n)[:,None] #nX1 vector
inert = np.zeros((3,3))
for i in range(n):
  ri = x[i,:].reshape(1,3)
  inert = inert + mass[i,]*(sum(ri*ri)*np.eye(3) - np.dot(ri.T,ri))
print inert

print  np.diag((mass * x**2).sum(0))  - np.dot(x.T, mass*x)

[[     0.  -16755.  -17287.5]
 [-16755.       0.  -17865. ]
 [-17287.5 -17865.       0. ]]
[[     0.  -16755.  -17287.5]
 [-16755.       0.  -17865. ]
 [-17287.5 -17865.       0. ]]

Josef


>


Re: [Numpy-discussion] autocorrelation computation performance : use of np.correlate

2012-02-01 Thread josef . pktd
On Wed, Feb 1, 2012 at 6:48 PM, Benjamin Root  wrote:
>
>
> On Wednesday, February 1, 2012, Pierre Haessig 
> wrote:
>> Hi,
>>
>> [I'm not sure whether this discussion belongs to numpy-discussion or
>> scipy-dev]
>>
>> In day to day time series analysis I regularly need to look at the data
>> autocorrelation ("acorr" or "acf" depending on the software package).
>> The straighforward available function I have is matplotlib.pyplot.acorr.
>> However, for a moderately long time series (say of length 10**5) it taking a
>> huge time just to just dislays the autocorrelation values within a small
>> range of time lags.
>> The main reason being it is relying on np.correlate(x,x, mode=2) while
>> only a few lags are needed.
>> (I guess mode=2 is an (old fashioned?) way to set mode='full')
>>
>> I know that np.correlate performance issue has been discussed already, and
>> there is a *somehow* related ticket
>> (http://projects.scipy.org/numpy/ticket/1260). I noticed in the ticket's
>> change number 2 the following comment by Josef : "Maybe a truncated
>> convolution/correlation would be good". I'll come back to this soon.
>>
>> I made an example script "acf_timing.py" to start my point with some
>> timing data :
>>
>> In Ipython:
> run acf_timing.py # it imports statsmodel's acf + define 2 other acf
> implementations + an example data 10**5 samples long
>>
>> %time l,c = mpl_acf(a, 10)
>> CPU times: user 8.69 s, sys: 0.00 s, total: 8.69 s
>> Wall time: 11.18 s # pretty long...
>>
>>  %time c = sm_acf(a, 10)
>> CPU times: user 8.76 s, sys: 0.01 s, total: 8.78 s
>> Wall time: 10.79 s # long as well. statsmodel has a similar underlying
>> implementation
>> #
>> http://statsmodels.sourceforge.net/generated/scikits.statsmodels.tsa.stattools.acf.html#scikits.statsmodels.tsa.stattools.acf
>>
>> #Now, better option : use the fft convolution
>> %time c=sm_acf(a, 10,fft=True)
>> CPU times: user 0.03 s, sys: 0.01 s, total: 0.04 s
>> Wall time: 0.07 s
>> # Fast, but I'm not sure about the memory implication of using fft though.
>>
>> #The naive option : just compute the acf lags that are needed
>> %time l,c = naive_acf(a, 10)
>> CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
>> Wall time: 0.01 s
>> # Iterative computation. Pretty silly but very fast
>> # (Now of course, this naive implementation won't scale nicely for a lot
>> of lags)

I don't think it's silly to have a short python loop; statsmodels
actually uses the loop in the models, for example in yule_walker (and
GLSAR), because in most statistical applications I wouldn't expect a
large number of lags. The time series models don't use the acov
directly, but I think most of the time we just loop over the lags.
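
For concreteness, a minimal sketch of that loop-over-lags idea (this is
not the statsmodels implementation; it assumes the usual 1/n
normalization of the autocovariance):

import numpy as np

def acf_few_lags(x, nlags):
    # autocovariance for lags 0..nlags via a short python loop,
    # then normalized to an autocorrelation
    x = np.asarray(x, dtype=float)
    xd = x - x.mean()
    n = len(x)
    acov = np.array([np.dot(xd[:n - k], xd[k:]) / n for k in range(nlags + 1)])
    return acov / acov[0]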

>>
>> Now comes (at last) the question : what should be done about this
>> performance issue ?
>>  - should there be a truncated np.convolve/np.correlate function, as Josef
>> suggested ?
>>  - or should people in need of autocorrelation find some workarounds
>> because this usecase is not big enough to call for a change in np.convolve ?
>>
>> I really feel this question is about *where* a change should be
>> implemented (numpy, scipy.signal, matplotlib?) so that it makes sense while
>> not breaking 10^10 lines of numpy related code...
>>
>> Best,
>> Pierre
>>
>>
>
> Speaking for matplotlib, the acorr() (and xcorr()) functions in mpl are
> merely a convenience.  The proper place for any change would not be mpl
> (although, we would certainly take advantage of any improved acorr() and
> xcorr() that are made available in numpy).

I also think that numpy or scipy would be the natural candidates for a
correlate that works fast for an intermediate number of desired lags
(but still short compared to the length of the data).
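
For reference, the fft route mentioned in the timings above
(sm_acf(..., fft=True)) conceptually looks roughly like this -- just a
sketch, not the statsmodels code; it assumes zero-padding to the next
power of two to avoid circular wrap-around:

import numpy as np

def acf_fft(x, nlags):
    # autocorrelation for lags 0..nlags via the fft of the demeaned series
    x = np.asarray(x, dtype=float)
    xd = x - x.mean()
    n = len(x)
    nfft = 2 ** int(np.ceil(np.log2(2 * n - 1)))   # pad to avoid wrap-around
    f = np.fft.rfft(xd, nfft)
    acov = np.fft.irfft(f * np.conj(f), nfft)[:nlags + 1] / n
    return acov / acov[0]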

Josef

>
> Ben Root
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy all unexpected result (generator)

2012-01-31 Thread josef . pktd
On Tue, Jan 31, 2012 at 5:35 PM, Travis Oliphant  wrote:
> Actually i believe the NumPy 'any' and 'all' names pre-date the Python usage 
> which first appeared in Python 2.5
>
> I agree with Chris that namespaces are a great idea.  I don't agree with 
> deprecating 'any' and 'all'

I completely agree here.

I also like to keep np.all, np.any, np.max, ...

>>> np.max((i > 0 for i in xrange(10)))
<generator object <genexpr> at 0x046493F0>
>>> max((i > 0 for i in xrange(10)))
True

I used an old-style matplotlib example as a recipe yesterday, and the
first thing I did was put back the missing namespaces, and I had to
think twice about what amax and amin are.

aall, aany ??? ;)

Josef

>
> It also seems useful to revisit under what conditions 'array' could correctly 
> interpret a generator expression, but in the context of streaming or deferred 
> arrays.
>
> Travis
>
>
> --
> Travis Oliphant
> (on a mobile)
> 512-826-7480
>
>
> On Jan 31, 2012, at 4:22 PM, Robert Kern  wrote:
>
>> On Tue, Jan 31, 2012 at 22:17, Travis Oliphant  wrote:
>>> I also agree that an exception should be raised at the very least.
>>>
>>> It might also be possible to make the NumPy any, all, and sum functions
>>> behave like the builtins when given a generator.  It seems worth exploring
>>> at least.
>>
>> I would rather we deprecate the all() and any() functions in favor of
>> the alltrue() and sometrue() aliases that date back to Numeric.
>> Renaming them to match the builtin names was a mistake.
>>
>> --
>> Robert Kern
>>
>> "I have come to believe that the whole world is an enigma, a harmless
>> enigma that is made terrible by our own mad attempt to interpret it as
>> though it had an underlying truth."
>>   -- Umberto Eco
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Cross-covariance function

2012-01-26 Thread josef . pktd
On Thu, Jan 26, 2012 at 3:58 PM, Bruce Southey  wrote:
> On Thu, Jan 26, 2012 at 12:45 PM,   wrote:
>> On Thu, Jan 26, 2012 at 1:25 PM, Bruce Southey  wrote:
>>> On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig
>>>  wrote:
 Le 26/01/2012 15:57, Bruce Southey a écrit :
> Can you please provide a
> couple of real examples with expected output that clearly show what
> you want?
>
 Hi Bruce,

 Thanks for your ticket feedback ! It's precisely because I see a big
 potential impact of the proposed change that I send first a ML message,
 second a ticket before jumping to a pull-request like a Sergio Leone's
 cowboy (sorry, I watched "for a few dollars more" last weekend...)

 Now, I realize that in the ticket writing I made the wrong trade-off
 between conciseness and accuracy which led to some of the errors you
 raised. Let me try to use your example to try to share what I have in mind.

> >> X = array([-2.1, -1. ,  4.3])
> >> Y = array([ 3.  ,  1.1 ,  0.12])

 Indeed, with today's cov behavior we have a 2x2 array:
> >> cov(X,Y)
 array([[ 11.71      ,  -4.286     ],
        [ -4.286     ,   2.1441]])

 Now, when I used the word 'concatenation', I wasn't precise enough
 because I meant assembling X and Y in the sense of 2 vectors of
 observations from 2 random variables X and Y.
 This is achieved by concatenate(X,Y) *when properly playing with
 dimensions* (which I didn't mentioned) :
> >> XY = np.concatenate((X[None, :], Y[None, :]))
 array([[-2.1 , -1.  ,  4.3 ],
        [ 3.  ,  1.1 ,  0.12]])
>>>
>>> In this context, I find stacking,  np.vstack((X,Y)), more appropriate
>>> than concatenate.
>>>

 In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)".
> >> np.cov(XY)
 array([[ 11.71      ,  -4.286     ],
        [ -4.286     ,   2.1441]])

>>> Sure the resulting array is the same but whole process is totally different.
>>>
>>>
 (And indeed, the actual cov Python code does use concatenate() )
>>> Yes, but the user does not see that. Whereas you are forcing the user
>>> to do the stacking in the correct dimensions.
>>>
>>>


 Now let me come back to my assertion about this behavior *usefulness*.
 You'll acknowledge that np.cov(XY) is made of four blocks (here just 4
 simple scalars blocks).
>>> No there are not '4' blocks just rows and columns.
>>
>> Sturla showed the 4 blocks in his first message.
>>
> Well, I could not follow that because the code is wrong.
> X = np.array([-2.1, -1. ,  4.3])
 cX = X - X.mean(axis=0)[np.newaxis,:]
>
> Traceback (most recent call last):
>  File "", line 1, in 
>    cX = X - X.mean(axis=0)[np.newaxis,:]
> IndexError: 0-d arrays can only use a single () or a list of newaxes
> (and a single ...) as an index
>  whoops!
>
> Anyhow, variance-covariance matrix is symmetric but numpy or scipy
> lacks  lapac's symmetrix matrix
> (http://www.netlib.org/lapack/explore-html/de/d9e/group___s_y.html)
>
>>>
  * diagonal blocks are just cov(X) and cov(Y) (which in this case comes
 to var(X) and var(Y) when setting ddof to 1)
>>> Sure but variances are still covariances.
>>>
  * off diagonal blocks are symetric and are actually the covariance
 estimate of X, Y observations (from
 http://en.wikipedia.org/wiki/Covariance)
>>> Sure

 that is :
> >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1)
 -4.2865

 The new proposed behaviour for cov is that cov(X,Y) would return :
 array(-4.2865)  instead of the 2*2 matrix.
>>>
>>> But how you interpret an 2D array where the rows are greater than 2?
>> Z=Y+X
>> np.cov(np.vstack((X,Y,Z)))
>>> array([[ 11.71      ,  -4.286     ,   7.424     ],
>>>       [ -4.286     ,   2.1441,  -2.14186667],
>>>       [  7.424     ,  -2.14186667,   5.2821]])
>>>
>>>

  * This would be in line with the cov(X,Y) mathematical definition, as
 well as with R behavior.
>>> I don't care what R does because I am using Python and Python is
>>> infinitely better than R is!
>>>
>>> But I think that is only in the 1D case.
>>
>> I just checked R to make sure I remember correctly
>>
>>> xx = matrix((1:20)^2, nrow=4)
>>> xx
>>     [,1] [,2] [,3] [,4] [,5]
>> [1,]    1   25   81  169  289
>> [2,]    4   36  100  196  324
>> [3,]    9   49  121  225  361
>> [4,]   16   64  144  256  400
>>> cov(xx, 2*xx[,1:2])
>>         [,1]      [,2]
>> [1,]  86.  219.
>> [2,] 219.  566.
>> [3,] 352.6667  912.6667
>> [4,] 486. 1259.
>> [5,] 619. 1606.
>>> cov(xx)
>>         [,1]     [,2]      [,3]      [,4]      [,5]
>> [1,]  43. 109.6667  176.  243.  309.6667
>> [2,] 109.6667 283.  456.  629.6667  803.
>> [3,] 176. 456.  736. 1016. 1296.
>> [4,] 243. 629.6667 1016. 1403. 1789.6667
>> [5,] 309.6667 803. 1296. 1789.66

Re: [Numpy-discussion] Cross-covariance function

2012-01-26 Thread josef . pktd
On Thu, Jan 26, 2012 at 1:25 PM, Bruce Southey  wrote:
> On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig
>  wrote:
>> Le 26/01/2012 15:57, Bruce Southey a écrit :
>>> Can you please provide a
>>> couple of real examples with expected output that clearly show what
>>> you want?
>>>
>> Hi Bruce,
>>
>> Thanks for your ticket feedback ! It's precisely because I see a big
>> potential impact of the proposed change that I send first a ML message,
>> second a ticket before jumping to a pull-request like a Sergio Leone's
>> cowboy (sorry, I watched "for a few dollars more" last weekend...)
>>
>> Now, I realize that in the ticket writing I made the wrong trade-off
>> between conciseness and accuracy which led to some of the errors you
>> raised. Let me try to use your example to try to share what I have in mind.
>>
>>> >> X = array([-2.1, -1. ,  4.3])
>>> >> Y = array([ 3.  ,  1.1 ,  0.12])
>>
>> Indeed, with today's cov behavior we have a 2x2 array:
>>> >> cov(X,Y)
>> array([[ 11.71      ,  -4.286     ],
>>        [ -4.286     ,   2.1441]])
>>
>> Now, when I used the word 'concatenation', I wasn't precise enough
>> because I meant assembling X and Y in the sense of 2 vectors of
>> observations from 2 random variables X and Y.
>> This is achieved by concatenate(X,Y) *when properly playing with
>> dimensions* (which I didn't mentioned) :
>>> >> XY = np.concatenate((X[None, :], Y[None, :]))
>> array([[-2.1 , -1.  ,  4.3 ],
>>        [ 3.  ,  1.1 ,  0.12]])
>
> In this context, I find stacking,  np.vstack((X,Y)), more appropriate
> than concatenate.
>
>>
>> In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)".
>>> >> np.cov(XY)
>> array([[ 11.71      ,  -4.286     ],
>>        [ -4.286     ,   2.1441]])
>>
> Sure the resulting array is the same but whole process is totally different.
>
>
>> (And indeed, the actual cov Python code does use concatenate() )
> Yes, but the user does not see that. Whereas you are forcing the user
> to do the stacking in the correct dimensions.
>
>
>>
>>
>> Now let me come back to my assertion about this behavior *usefulness*.
>> You'll acknowledge that np.cov(XY) is made of four blocks (here just 4
>> simple scalars blocks).
> No there are not '4' blocks just rows and columns.

Sturla showed the 4 blocks in his first message.

>
>>  * diagonal blocks are just cov(X) and cov(Y) (which in this case comes
>> to var(X) and var(Y) when setting ddof to 1)
> Sure but variances are still covariances.
>
>>  * off diagonal blocks are symetric and are actually the covariance
>> estimate of X, Y observations (from
>> http://en.wikipedia.org/wiki/Covariance)
> Sure
>>
>> that is :
>>> >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1)
>> -4.2865
>>
>> The new proposed behaviour for cov is that cov(X,Y) would return :
>> array(-4.2865)  instead of the 2*2 matrix.
>
> But how you interpret an 2D array where the rows are greater than 2?
 Z=Y+X
 np.cov(np.vstack((X,Y,Z)))
> array([[ 11.71      ,  -4.286     ,   7.424     ],
>       [ -4.286     ,   2.1441,  -2.14186667],
>       [  7.424     ,  -2.14186667,   5.2821]])
>
>
>>
>>  * This would be in line with the cov(X,Y) mathematical definition, as
>> well as with R behavior.
> I don't care what R does because I am using Python and Python is
> infinitely better than R is!
>
> But I think that is only in the 1D case.

I just checked R to make sure I remember correctly

> xx = matrix((1:20)^2, nrow=4)
> xx
 [,1] [,2] [,3] [,4] [,5]
[1,]1   25   81  169  289
[2,]4   36  100  196  324
[3,]9   49  121  225  361
[4,]   16   64  144  256  400
> cov(xx, 2*xx[,1:2])
 [,1]  [,2]
[1,]  86.  219.
[2,] 219.  566.
[3,] 352.6667  912.6667
[4,] 486. 1259.
[5,] 619. 1606.
> cov(xx)
 [,1] [,2]  [,3]  [,4]  [,5]
[1,]  43. 109.6667  176.  243.  309.6667
[2,] 109.6667 283.  456.  629.6667  803.
[3,] 176. 456.  736. 1016. 1296.
[4,] 243. 629.6667 1016. 1403. 1789.6667
[5,] 309.6667 803. 1296. 1789.6667 2283.


>
>>  * This would save memory and computing resources. (and therefore help
>> save the planet ;-) )
> Nothing that you have provided shows that it will.

I don't know about saving the planet, but if X and Y have the same
number of columns, we save 3 quarters of the calculations, as Sturla
also explained in his first message.
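
To make the saving concrete, here is a minimal sketch of a cross-only
covariance (a hypothetical xcov, not an existing numpy function), with
variables in rows and observations in columns like np.cov:

import numpy as np

def xcov(x, y, ddof=1):
    # only the cross block cov(X, Y); cov(X) and cov(Y) are never formed
    x = np.atleast_2d(np.asarray(x, dtype=float))
    y = np.atleast_2d(np.asarray(y, dtype=float))
    xc = x - x.mean(axis=1)[:, None]
    yc = y - y.mean(axis=1)[:, None]
    return np.dot(xc, yc.T) / (x.shape[1] - ddof)

print(xcov([-2.1, -1., 4.3], [3., 1.1, 0.12]))   # [[-4.286]]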

Josef

>
>>
>> However, I do understand that the impact for this change may be big.
>> This indeed requires careful reviewing.
>>
>> Pierre
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> Bruce
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] Cross-covariance function

2012-01-26 Thread josef . pktd
On Thu, Jan 26, 2012 at 12:26 PM, Sturla Molden  wrote:
> Den 26.01.2012 17:25, skrev Pierre Haessig:
>> However, in the case this change is not possible, I would see this
>> solution :
>> * add and xcov function that does what Elliot and Sturla and I
>> described, because
>
> The current np.cov implementation returns the cross-covariance the way
> it is commonly used in statistics. If MATLAB does not, then that is
> MATLAB's problem I think.

The discussion had this reversed: numpy matches the behavior of
MATLAB, while R (statistics) only returns the cross-covariance part,
as proposed.

If there is a new xcov, then I think there should also be an xcorrcoef.
This case needs a different implementation from corrcoef, since the
xcov doesn't contain the variances, and they need to be calculated
separately.
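
Roughly, a sketch of what such a (hypothetical) xcorrcoef could look
like, with the standard deviations computed separately and the cross
block divided by them:

import numpy as np

def xcorrcoef(x, y):
    # cross-correlation coefficients only; the variances are not part of
    # the cross block, so the scale factors have to be computed separately
    x = np.atleast_2d(np.asarray(x, dtype=float))
    y = np.atleast_2d(np.asarray(y, dtype=float))
    xc = x - x.mean(axis=1)[:, None]
    yc = y - y.mean(axis=1)[:, None]
    cross = np.dot(xc, yc.T)               # un-normalized cross products
    sx = np.sqrt((xc * xc).sum(axis=1))
    sy = np.sqrt((yc * yc).sum(axis=1))
    return cross / np.outer(sx, sy)        # the 1/(n - ddof) factors cancel

print(xcorrcoef([-2.1, -1., 4.3], [3., 1.1, 0.12]))   # about [[-0.8554]]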

Josef

>
> http://www.stat.washington.edu/research/reports/2001/tr391.pdf
>
> Sturla
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] view of a structured array?

2012-01-25 Thread josef . pktd
On Wed, Jan 25, 2012 at 2:27 PM, Chris Barker  wrote:
> HI folks,
>
> Is there a way to get a view of a subset of a structured array? I know
> that an arbitrary subset will not fit into the numpy "strides"offsets"
> model, but some will, and it would be nice to have a view:
>
> For example, here we have a stuctured array:
>
> In [56]: a
> Out[56]:
> array([(1, 2.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8),
>       (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)],
>      dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')])
>
> if I pull out one "field" I get a view:
>
> In [57]: b = a['f1']
>
> In [58]: b[0] = 1000
>
> In [59]: a
> Out[59]:
> array([(1, 1000.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8),
>       (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)],
>      dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')])
> However, if I pull out more than one field, I get a copy:
>
> In [60]: b = a[['f1','f2']]
>
> In [61]: b
> Out[61]:
> array([(1000.0, 3.0), (8.0, 9.0), (123.4, 7.0), (10.0, 11.0), (14.0, 15.0)],
>      dtype=[('f1', '<f8'), ('f2', '<f8')])
> In [62]: b[1] = (2000,3000)
>
> In [63]: b
> Out[63]:
> array([(1000.0, 3.0), (2000.0, 3000.0), (123.4, 7.0), (10.0, 11.0),
>       (14.0, 15.0)],
>      dtype=[('f1', '<f8'), ('f2', '<f8')])
> In [64]: a
> Out[64]:
> array([(1, 1000.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8),
>       (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)],
>      dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')])
>
> However, in this case, the two fields are contiguous, and thus I'm
> pretty sure one could build a numpy array that was a view. Is there
> any way to do so? Ideally without manipulating the strides by hand,
> but I may want to do that if it's the only way.
>
> -Chris
>

that's what I would try:

>>> b = a.view(dtype=[('i', '<i4'), ('fl', [('f1', '<f8'), ('f2', '<f8')]), ('i2', '<i4')])
>>> b['fl']
array([(2.0, 3.0), (8.0, 9.0), (123.4, 7.0), (10.0, 11.0),
   (14.0, 15.0)],
  dtype=[('f1', '<f8'), ('f2', '<f8')])
>>> b['fl'][2] = (200, 500)
>>> a
array([(1, 2.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 200.0, 500.0, 8),
   (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)],
  dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')])
>
>
> --
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread josef . pktd
On Wed, Jan 25, 2012 at 12:03 AM, Charles R Harris
 wrote:
>
>
> On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina
>  wrote:
>>
>> I found something similar, with a very simple example.
>>
>> On 64-bit linux, python 2.7.2, numpy development version:
>>
>> In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)
>>
>> In [23]: a.mean()
>> Out[23]: 4034.16357421875
>>
>> In [24]: np.version.full_version
>> Out[24]: '2.0.0.dev-55472ca'
>>
>>
>> But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
>> >>>a = np.ones((1024,1024),dtype=np.float32)
>> >>>a.mean()
>> 4000.0
>> >>>np.version.full_version
>> '1.6.1'
>>
>>
>
> Yes, the results are platform/compiler dependent. The 32 bit platforms tend
> to use extended precision accumulators and the x87 instruction set. The 64
> bit platforms tend to use sse2+. Different precisions, even though you might
> think they are the same.

Just to confirm: same computer as before, but the Python 3.2 install is
64 bit, and now I get the "Linux" result.

Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit
(AMD64)] on win32

>>> import numpy as np
>>> np.__version__
'1.5.1'
>>> a = 4000*np.ones((1024,1024),dtype=np.float32)
>>> a.mean()
4034.16357421875
>>> a.mean(0).mean(0)
4000.0
>>> a.mean(dtype=np.float64)
4000.0

Josef

>
> 
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bug in numpy.mean() ?

2012-01-24 Thread josef . pktd
On Tue, Jan 24, 2012 at 7:21 PM, eat  wrote:

> Hi
>
> On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina <
> kathleen.m.tac...@nasa.gov> wrote:
>
>> **
>> I found something similar, with a very simple example.
>>
>> On 64-bit linux, python 2.7.2, numpy development version:
>>
>> In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32)
>>
>> In [23]: a.mean()
>> Out[23]: 4034.16357421875
>>
>> In [24]: np.version.full_version
>> Out[24]: '2.0.0.dev-55472ca'
>>
>>
>> But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives:
>> >>>a = np.ones((1024,1024),dtype=np.float32)
>> >>>a.mean()
>> 4000.0
>> >>>np.version.full_version
>> '1.6.1'
>>
> This indeed looks very nasty, regardless of whether it is a version or
> platform related problem.
>

Looks platform specific; same result as eat.

Windows 7,
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit
(Intel)] on win32

>>> a = np.ones((1024,1024),dtype=np.float32)
>>> a.mean()
1.0

>>> (4000*a).dtype
dtype('float32')
>>> (4000*a).mean()
4000.0

>>> b = np.load("data.npy")
>>> b.mean()
3045.74718
>>> b.shape
(1000, 1000)
>>> b.mean(0).mean(0)
3045.74724
>>> _.dtype
dtype('float64')
>>> b.dtype
dtype('float32')

>>> b.mean(dtype=np.float32)
3045.74718

Josef


>
> -eat
>
>>
>>
>>
>> On Tue, 2012-01-24 at 17:12 -0600, eat wrote:
>>
>> Hi,
>>
>>
>>
>>  Oddly, but numpy 1.6 seems to behave more consistent manner:
>>
>>
>>
>>  In []: sys.version
>>
>>  Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit
>> (Intel)]'
>>
>>  In []: np.version.version
>>
>>  Out[]: '1.6.0'
>>
>>
>>
>>  In []: d= np.load('data.npy')
>>
>>  In []: d.dtype
>>
>>  Out[]: dtype('float32')
>>
>>
>>
>>  In []: d.mean()
>>
>>  Out[]: 3045.74718
>>
>>  In []: d.mean(dtype= np.float32)
>>
>>  Out[]: 3045.74718
>>
>>  In []: d.mean(dtype= np.float64)
>>
>>  Out[]: 3045.747251076416
>>
>>  In []: (d- d.min()).mean()+ d.min()
>>
>>  Out[]: 3045.7472508750002
>>
>>  In []: d.mean(axis= 0).mean()
>>
>>  Out[]: 3045.74724
>>
>>  In []: d.mean(axis= 1).mean()
>>
>>  Out[]: 3045.74724
>>
>>
>>
>>  Or does the results of calculations depend more on the platform?
>>
>>
>>
>>
>>
>>  My 2 cents,
>>
>>  eat
>>
>>   --
>> --
>> Kathleen M. Tacina
>> NASA Glenn Research Center
>> MS 5-10
>> 21000 Brookpark Road
>> Cleveland, OH 44135
>> Telephone: (216) 433-6660
>> Fax: (216) 433-5802
>> --
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Cross-covariance function

2012-01-21 Thread josef . pktd
On Sat, Jan 21, 2012 at 6:26 PM, John Salvatier
 wrote:
> I ran into this a while ago and was confused why cov did not behave the way
> pierre suggested.

same here,
When I rewrote scipy.stats.spearmanr, I matched the numpy behavior for
two arrays, while R only returns the cross-correlation part.

Josef

>
> On Jan 21, 2012 12:48 PM, "Elliot Saba"  wrote:
>>
>> Thank you Sturla, that's exactly what I want.
>>
>> I'm sorry that I was not able to reply for so long, but Pierre's code is
>> similar to what I have already implemented, and I am in support of changing
>> the functionality of cov().  I am unaware of any arguments for a covariance
>> function that works in this way, except for the fact that the MATLAB cov()
>> function behaves in the same way. [1]
>>
>> MATLAB, however, has an xcov() function, which is similar to what we have
>> been discussing. [2]
>>
>> Unless you all wish to retain compatibility with MATLAB, I feel that the
>> behaviour of cov() suggested by Pierre is the most straightforward method,
>> and that if users wish to calculate the covariance of X concatenated with Y,
>> then they may simply concatenate the matrices explicitly before passing into
>> cov(), as this way the default method does not use 75% more CPU time.
>>
>> Again, if there is an argument for this functionality, I would love to
>> learn of it!
>> -E
>>
>> [1] http://www.mathworks.com/help//techdoc/ref/cov.html
>> [2] http://www.mathworks.com/help/toolbox/signal/ref/xcov.html
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.zeros(2, 'S') returns empty strings.

2012-01-15 Thread josef . pktd
On Sun, Jan 15, 2012 at 3:15 AM, Nathaniel Smith  wrote:
> On Sat, Jan 14, 2012 at 2:12 PM, Charles R Harris
>  wrote:
>> This sort of makes sense, but is it the 'correct' behavior?
>>
>> In [20]: zeros(2, 'S')
>> Out[20]:
>> array(['', ''],
>>   dtype='|S1')
>
> I think of numpy strings as raw fixed-length byte arrays (since, well,
> that's what they are), so I would expect np.zeros to return all-NUL
> strings, like it does. (Not just 'empty' strings, which just means the
> first byte is NUL -- I expect all-NUL.) Maybe I've spent too much time
> working with C data structures, but that's my $0.02 :-)

Since I'm not coding in C: can a fixed-length empty string, '', be
represented with only the first byte set to NUL?

The following, with the current behavior, all looks reasonable to me:

>>> np.zeros(2).view('S4')
array(['', '', '', ''],
  dtype='|S4')
>>> np.zeros(4, 'S4').view(float)
array([ 0.,  0.])
>>> np.zeros(4, 'S4').view(int)
array([0, 0, 0, 0])
>>> np.zeros(4, 'S4').view('S16')
array([''],
  dtype='|S16')


>>> np.zeros(2, float).view('S4')
array(['', '', '', ''],
  dtype='|S4')

instead of astype

>>> np.zeros(2, float).astype('S4')
array(['0.0', '0.0'],
  dtype='|S4')

my 2c (while trying to understand what the question is)

Josef

>
> -- Nathaniel
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.zeros(2, 'S') returns empty strings.

2012-01-14 Thread josef . pktd
On Sat, Jan 14, 2012 at 5:25 PM, Benjamin Root  wrote:
> On Sat, Jan 14, 2012 at 4:16 PM, Benjamin Root  wrote:
>>
>> On Sat, Jan 14, 2012 at 4:12 PM, Charles R Harris
>>  wrote:
>>>
>>> This sort of makes sense, but is it the 'correct' behavior?
>>>
>>> In [20]: zeros(2, 'S')
>>> Out[20]:
>>> array(['', ''],
>>>   dtype='|S1')
>>>
>>> It might be more consistent to return '0' instead, as in
>>>
>>> In [3]: zeros(2, int).astype('S')
>>> Out[3]:
>>> array(['0', '0'],
>>>   dtype='|S24')



I would be surprised if zeros were not empty strings, since an empty
string is the "zero" (additive identity) for string addition.
Multiplication for strings doesn't exist, so ones can be anything,
even literally '1'.

>>> a = np.zeros(5,'S4')
>>> a[:] = 'b'
>>> reduce(lambda x,y: x+y, a)
'b'


>>> a = np.zeros(1,'S100')
>>> for i in range(5): a[:] = a.item() + 'a'
...
>>> a
array(['a'],
  dtype='|S100')


That's just a logical argument; I have no idea what's practical, since
the last time I tried to use numpy strings I didn't find string
addition and went back to double and triple list comprehensions.

Josef

>>>
>>> Chuck
>>>
>>
>> Whatever it should be, numpy is currently inconsistent:
>>
>> >>> np.empty(2, 'S')
>> array(['0', '\xd4'],
>>   dtype='|S1')
>> >>> np.zeros(2, 'S')
>>
>> array(['', ''],
>>   dtype='|S1')
>> >>> np.ones(2, 'S')
>> array(['1', '1'],
>>   dtype='|S1')
>>
>> I would expect '0''s for the call to zeros() and empty strings for the
>> call to empty().
>>
>> Ben Root
>>
>
> On the other hand, it is fairly standard to assume that the values in the
> array returned by empty() to be random, uninitialized junk.  So, maybe
> empty()'s current behavior is ok, but certainly zeros()'s and ones()'s
> behaviors need to be looked at.
>
> Ben Root
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] polynomial package update

2012-01-02 Thread josef . pktd
On Mon, Jan 2, 2012 at 9:44 PM, Charles R Harris
 wrote:
> Hi All,
>
> I've made a pull request for a  rather large update of the polynomial
> package. The new features are
>
> 1) Bug fixes
> 2) Improved documentation in the numpy reference
> 3) Preliminary support for multi-dimensional coefficient arrays
> 4) Support for NA in the fitting routines
> 5) Improved testing and test coverage
> 6) Gauss quadrature
> 7) Weight functions
> 8) (Mostly) Symmetrized companion matrices
> 9) Add cast and basis as static functions of convenience classes
> 10) Remove deprecated import from package init.py
>
> If anyone has an interest in that package, please take some time and review
> it here.

(Since I'm not set up for compiling numpy I cannot try it out; just
some spotty reading of the code.)

The two things I'm most interested in are the 2d, 3d enhancements and
the quadrature.

What's the return of the 2d vander functions?

If I read it correctly, it's:

>>> xyn = np.array([['x^%d*y^%d'%(px,py) for py in range(5)] for px in range(3)])
>>> xyn
array([['x^0*y^0', 'x^0*y^1', 'x^0*y^2', 'x^0*y^3', 'x^0*y^4'],
   ['x^1*y^0', 'x^1*y^1', 'x^1*y^2', 'x^1*y^3', 'x^1*y^4'],
   ['x^2*y^0', 'x^2*y^1', 'x^2*y^2', 'x^2*y^3', 'x^2*y^4']],
  dtype='|S7')
>>> xyn.ravel()
array(['x^0*y^0', 'x^0*y^1', 'x^0*y^2', 'x^0*y^3', 'x^0*y^4', 'x^1*y^0',
   'x^1*y^1', 'x^1*y^2', 'x^1*y^3', 'x^1*y^4', 'x^2*y^0', 'x^2*y^1',
   'x^2*y^2', 'x^2*y^3', 'x^2*y^4'],
  dtype='|S7')
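
If the 2d vander function from the pull request works this way, the
ordering can be checked directly; a sketch, assuming it is exposed as
polyvander2d in numpy.polynomial.polynomial:

import numpy as np
from numpy.polynomial import polynomial as P

x, y = 2.0, 3.0
V = P.polyvander2d(x, y, [2, 4])      # degrees 2 in x, 4 in y
direct = np.array([x**i * y**j for i in range(3) for j in range(5)])
print(np.allclose(V, direct))         # same ordering as xyn.ravel() above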

Are the normalization constants available in explicit form to get an
orthonormal basis?
The test_100 tests look like good recipes for getting the normalization
and the integration constants.

Are the quadrature weights and points the same as in scipy.special (up
to floating point differences)?
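
For the Gauss-Legendre case, the comparison would look like this (a
sketch; it assumes the quadrature routine is exposed as leggauss in
numpy.polynomial.legendre):

import numpy as np
from numpy.polynomial import legendre
from scipy import special

x_np, w_np = legendre.leggauss(7)     # Gauss-Legendre points and weights
x_sp, w_sp = special.p_roots(7)       # the same rule from scipy.special
print(np.allclose(x_np, x_sp))
print(np.allclose(w_np, w_sp))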

Looks very useful, and I'm looking forward to trying it out; I will
borrow some code like test_100 as recipes.
(For densities, I mostly still need an orthonormal basis and
integration normalized to 1.)

Josef







>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Indexing empty dimensions with empty arrays

2011-12-26 Thread josef . pktd
2011/12/26 Jordi Gutiérrez Hermoso :
> On 26 December 2011 14:56, Ralf Gommers  wrote:
>>
>>
>> On Mon, Dec 26, 2011 at 8:50 PM,  wrote:
>>> I have a hard time thinking through empty 2-dim arrays, and don't know
>>> what rules should apply.
>>> However, in my code I might want to catch these cases rather early
>>> than late, instead of having to work my way backwards to find out where
>>> the content disappeared.
>>
>>
>> Same here. Almost always, my empty arrays are either due to bugs or they
>> signal that I do need to special-case something. Silent passing through of
>> empty arrays to all numpy functions is not what I would want.
>
> I find it quite annoying to treat the empty set with special
> deference. "All of my great-grandkids live in Antarctica" should be
> true for me (I'm only 30 years old). If you decide that is not true
> for me, it leads to a bunch of other logical annoyances up there
>
> The rule that shouldn't be special cased is what I described: x[idx1,
> idx2] should be a valid construction if it's true that all elements of
> idx1 and idx2 are integers in the correct range. The sizes of the
> empty matrices are also somewhat obvious.
>
> Special-casing vacuous truth makes me write annoying special cases.
> Octave doesn't error out for those special cases, and I think it's a
> good thing it doesn't. It's logically consistent.

I don't think I ever ran into an empty matrix in matlab, and wouldn't
know how it behaves.

But it looks like [:, empty] is a special case that doesn't work:

>>> np.ones((3,0))
array([], shape=(3, 0), dtype=float64)
>>> np.ones((3,0))[1,[]]
array([], dtype=float64)
>>> np.ones((3,0))[:,[]]
Traceback (most recent call last):
  File "", line 1, in 
IndexError: invalid index

>>> np.ones((3,0))[np.arange(3),[]]
Traceback (most recent call last):
  File "", line 1, in 
ValueError: shape mismatch: objects cannot be broadcast to a single shape

oops, my mistake

>>> np.broadcast_arrays(np.arange(3)[:,None],[])
[array([], shape=(3, 0), dtype=int32), array([], shape=(3, 0), dtype=float64)]
>>> np.ones((3,0))[np.arange(3)[:,None],[]]
array([], shape=(3, 0), dtype=float64)
>>> np.broadcast_arrays(np.arange(3)[:,None],[[]])
[array([], shape=(3, 0), dtype=int32), array([], shape=(3, 0), dtype=float64)]

>>> np.ones((3,0))[np.arange(3)[:,None],[]]
array([], shape=(3, 0), dtype=float64)
>>> np.ones((3,0))[np.arange(3)[:,None],[[]]]
array([], shape=(3, 0), dtype=float64)
>>> np.ones((3,0))[np.arange(3)[:,None],np.array([],int)]
array([], shape=(3, 0), dtype=float64)

>>> np.take(np.ones((3,0)),[], axis=1)
array([], shape=(3, 0), dtype=float64)
>>> np.take(np.ones((3,0)),[], axis=0)
array([], shape=(0, 0), dtype=float64)

I would prefer consistent indexing, independent of whether I find it
useful to have pages of code working with nothing.

Josef
I don't think a paper where the referee or editor catches the authors
using assumptions that describe an empty set will ever get published
(maybe with a qualifier, outside of philosophy).
It might happen, though, that the empty set slips through the
refereeing process.

>
> - Jordi G. H.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Indexing empty dimensions with empty arrays

2011-12-26 Thread josef . pktd
On Mon, Dec 26, 2011 at 1:51 PM, Ralf Gommers
 wrote:
>
>
> 2011/12/25 Jordi Gutiérrez Hermoso 
>>
>> I have been instructed to bring this issue to the mailing list:
>>
>>   http://projects.scipy.org/numpy/ticket/1994
>>
> The issue is this corner case:
>
 idx = []
 x = np.array([])
 x[idx]  #works
> array([], dtype=float64)
 x[:, idx]  #works
> array([], dtype=float64)
>
 x = np.ones((5,0))
 x[idx]  #works
> array([], shape=(0, 0), dtype=float64)
 x[:, idx]  #doesn't work
> Traceback (most recent call last):
>   File "", line 1, in 
>     x[:, idx]  #doesn't work
> IndexError: invalid index
>
>
> This is obviously inconsistent, but I think just fixing this one case is not
> enough; unexpected behavior with empty inputs/indexes keeps coming up. Do we
> need a clear set of rules that all functions follow and tests to ensure
> these rules are actually followed, or not?

this works
>>> xx = np.arange(12).reshape(3,4)
>>> xx
array([[ 0,  1,  2,  3],
   [ 4,  5,  6,  7],
   [ 8,  9, 10, 11]])
>>> x = xx[:,xx[:,-1]<3]
>>> x
array([], shape=(3, 0), dtype=int32)
>>> x<0
array([], shape=(3, 0), dtype=bool)
>>> x[x<0]
array([], dtype=int32)
>>> x[:,x<0]
array([], dtype=int32)

>>> x.ndim
2

I have a hard time thinking through empty 2-dim arrays, and don't know
what rules should apply.
However, in my code I might want to catch these cases rather early
than late, instead of having to work my way backwards to find out where
the content disappeared.

my 2c

Josef
>
> Ralf
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Binning

2011-12-22 Thread josef . pktd
On Thu, Dec 22, 2011 at 11:39 AM, Adrien  wrote:
> Le 22/12/2011 17:17, josef.p...@gmail.com a écrit :
>> On Thu, Dec 22, 2011 at 6:27 AM, Adrien Gaidon  wrote:
>>> Hello Nicola,
>>>
>>> I am not aware of a magical "one function" numpy solution (is there one
>>> numpy gurus?).
>>>
>>> I don't know if it's optimal, but here's how I usually do similar things.
>>>
>>> I wrote a simple function that assigns points (any number of dimensions) to
>>> a regular multi-dimensional grid. It is
>>> here: https://gist.github.com/1509853 It is short, commented and should be
>>> straightforward to use.
>>>
>>> Once you have the assignments, you can:
>>> - get the non-empty cell indexes with `np.unique(assignments)`
>>> - retrieve the points assigned to a cell with `points[assignments ==
>>> cell_index]`
>>> - iterate over assignments to select the points you want for each cell.
>> looks nice, reading through it.
>> line 71 looks like a nice trick
>>
>> BSD licensed, so we can keep it?
>
> Off course! :-)

with numpy 1.5 compatibility, if I did it right (reverse engineering
ravel_multi_index, which I never used):

https://gist.github.com/1511969/222e3316048bce5763b1004331af898088ffcd9e
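
For reference, a rough self-contained sketch of the kind of binning
asked about in this thread (not the gist code; a grid origin at 0 and
the bin size are assumed):

import numpy as np

points = np.random.rand(1000, 3) * 10                 # columns: x, y, z
bin_size = 1.0
ij = np.floor(points[:, :2] / bin_size).astype(int)   # 2-d cell indices
ny = ij[:, 1].max() + 1
cell_id = ij[:, 0] * ny + ij[:, 1]                    # flat id, like ravel_multi_index

order = np.argsort(cell_id, kind='mergesort')         # group points by cell
sorted_ids = cell_id[order]
starts = np.flatnonzero(np.concatenate(([True], sorted_ids[1:] != sorted_ids[:-1])))

occupied = sorted_ids[starts]                         # flat ids of non-empty cells
zmin = np.minimum.reduceat(points[order, 2], starts)  # minimum z per occupied cell
# indices of the points in occupied cell k: order[starts[k]:starts[k+1]]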

Josef

> Cheers,
>
> Adrien
>
>> as far as I know numpy doesn't have anything like a digitize_nd
>>
>> Thanks,
>>
>> Josef
>>
>>> Hope this helps,
>>>
>>> Adrien
>>>
>>> PS: This is one of the first times I post an answer on this list, so if I
>>> did anything wrong, let me know. Numpy is such a wonderful thing and you
>>> guys do such an amazing work, that I though it is time to give back at least
>>> epsilon of what I got from you :-)
>>>
>>>
>>> 2011/12/22 Nicola Creati
 Hello,

 I have a cloud on sparse points that can be described by a Nx3 array (N
 is the number of points). Each point is defined by an x, y and z
 coordinate:

 x0 y0 z0
 x1 y1 z1
    .    .    .
    .    .    .
    .    .    .
 xn yn zn


 I need to bin the cloud to a regular 2D array according to a desired bin
 size assigning to each cell (bin) the minimum z of all points that fall
 in that cell(bin). Moreover I need indexes of points that fall in each
 cell(bin).

 Is there any way to accomplish this task in numpy?

 Thanks.

 Nicola Creati




 --
 Nicola Creati
 Istituto Nazionale di Oceanografia e di Geofisica Sperimentale - OGS
 www.inogs.it Dipartimento di Geofisica della Litosfera Geophysics of
 Lithosphere Department CARS (Cartography and Remote Sensing) Research Group
 http://www.inogs.it/Cars/ Borgo Grotta Gigante 42/c 34010 Sgonico - Trieste
 - ITALY ncre...@ogs.trieste.it
 off.   +39 040 2140 213
 fax.   +39 040 327307

 _
 This communication, that may contain confidential and/or legally
 privileged information, is intended solely for the use of the intended
 addressees. Opinions, conclusions and other information contained in this
 message, that do not relate to the official business of OGS, shall be
 considered as not given or endorsed by it. Every opinion or advice 
 contained
 in this communication is subject to the terms and conditions provided by 
 the
 agreement governing the engagement with such a client. Any use, disclosure,
 copying or distribution of the contents of this communication by a
 not-intended recipient or in violation of the purposes of this 
 communication
 is strictly prohibited and may be unlawful. For Italy only: Ai sensi del
 D.Lgs.196/2003 - "T.U. sulla Privacy" si precisa che le informazioni
 contenute in questo messaggio sono riservate ed a uso esclusivo del
 destinatario.
 _

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Binning

2011-12-22 Thread josef . pktd
On Thu, Dec 22, 2011 at 11:17 AM,   wrote:
> On Thu, Dec 22, 2011 at 6:27 AM, Adrien Gaidon  wrote:
>> Hello Nicola,
>>
>> I am not aware of a magical "one function" numpy solution (is there one
>> numpy gurus?).
>>
>> I don't know if it's optimal, but here's how I usually do similar things.
>>
>> I wrote a simple function that assigns points (any number of dimensions) to
>> a regular multi-dimensional grid. It is
>> here: https://gist.github.com/1509853 It is short, commented and should be
>> straightforward to use.
>>
>> Once you have the assignments, you can:
>> - get the non-empty cell indexes with `np.unique(assignments)`
>> - retrieve the points assigned to a cell with `points[assignments ==
>> cell_index]`
>> - iterate over assignments to select the points you want for each cell.
>
> looks nice, reading through it.
> line 71 looks like a nice trick

forgot the qualifier: if I understand it correctly, just quickly reading it.
Josef

>
> BSD licensed, so we can keep it?
>
> as far as I know numpy doesn't have anything like a digitize_nd
>
> Thanks,
>
> Josef
>
>>
>> Hope this helps,
>>
>> Adrien
>>
>> PS: This is one of the first times I post an answer on this list, so if I
>> did anything wrong, let me know. Numpy is such a wonderful thing and you
>> guys do such an amazing work, that I though it is time to give back at least
>> epsilon of what I got from you :-)
>>
>>
>> 2011/12/22 Nicola Creati 
>>>
>>> Hello,
>>>
>>> I have a cloud on sparse points that can be described by a Nx3 array (N
>>> is the number of points). Each point is defined by an x, y and z
>>> coordinate:
>>>
>>> x0 y0 z0
>>> x1 y1 z1
>>>   .    .    .
>>>   .    .    .
>>>   .    .    .
>>> xn yn zn
>>>
>>>
>>> I need to bin the cloud to a regular 2D array according to a desired bin
>>> size assigning to each cell (bin) the minimum z of all points that fall
>>> in that cell(bin). Moreover I need indexes of points that fall in each
>>> cell(bin).
>>>
>>> Is there any way to accomplish this task in numpy?
>>>
>>> Thanks.
>>>
>>> Nicola Creati
>>>
>>>
>>>
>>>
>>> --
>>> Nicola Creati
>>> Istituto Nazionale di Oceanografia e di Geofisica Sperimentale - OGS
>>> www.inogs.it Dipartimento di Geofisica della Litosfera Geophysics of
>>> Lithosphere Department CARS (Cartography and Remote Sensing) Research Group
>>> http://www.inogs.it/Cars/ Borgo Grotta Gigante 42/c 34010 Sgonico - Trieste
>>> - ITALY ncre...@ogs.trieste.it
>>> off.   +39 040 2140 213
>>> fax.   +39 040 327307
>>>
>>> _
>>> This communication, that may contain confidential and/or legally
>>> privileged information, is intended solely for the use of the intended
>>> addressees. Opinions, conclusions and other information contained in this
>>> message, that do not relate to the official business of OGS, shall be
>>> considered as not given or endorsed by it. Every opinion or advice contained
>>> in this communication is subject to the terms and conditions provided by the
>>> agreement governing the engagement with such a client. Any use, disclosure,
>>> copying or distribution of the contents of this communication by a
>>> not-intended recipient or in violation of the purposes of this communication
>>> is strictly prohibited and may be unlawful. For Italy only: Ai sensi del
>>> D.Lgs.196/2003 - "T.U. sulla Privacy" si precisa che le informazioni
>>> contenute in questo messaggio sono riservate ed a uso esclusivo del
>>> destinatario.
>>> _
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Binning

2011-12-22 Thread josef . pktd
On Thu, Dec 22, 2011 at 6:27 AM, Adrien Gaidon  wrote:
> Hello Nicola,
>
> I am not aware of a magical "one function" numpy solution (is there one
> numpy gurus?).
>
> I don't know if it's optimal, but here's how I usually do similar things.
>
> I wrote a simple function that assigns points (any number of dimensions) to
> a regular multi-dimensional grid. It is
> here: https://gist.github.com/1509853 It is short, commented and should be
> straightforward to use.
>
> Once you have the assignments, you can:
> - get the non-empty cell indexes with `np.unique(assignments)`
> - retrieve the points assigned to a cell with `points[assignments ==
> cell_index]`
> - iterate over assignments to select the points you want for each cell.

looks nice, reading through it.
line 71 looks like a nice trick

BSD licensed, so we can keep it?

as far as I know numpy doesn't have anything like a digitize_nd

Thanks,

Josef

>
> Hope this helps,
>
> Adrien
>
> PS: This is one of the first times I post an answer on this list, so if I
> did anything wrong, let me know. Numpy is such a wonderful thing and you
> guys do such an amazing work, that I though it is time to give back at least
> epsilon of what I got from you :-)
>
>
> 2011/12/22 Nicola Creati 
>>
>> Hello,
>>
>> I have a cloud on sparse points that can be described by a Nx3 array (N
>> is the number of points). Each point is defined by an x, y and z
>> coordinate:
>>
>> x0 y0 z0
>> x1 y1 z1
>>   .    .    .
>>   .    .    .
>>   .    .    .
>> xn yn zn
>>
>>
>> I need to bin the cloud to a regular 2D array according to a desired bin
>> size assigning to each cell (bin) the minimum z of all points that fall
>> in that cell(bin). Moreover I need indexes of points that fall in each
>> cell(bin).
>>
>> Is there any way to accomplish this task in numpy?
>>
>> Thanks.
>>
>> Nicola Creati
>>
>>
>>
>>
>> --
>> Nicola Creati
>> Istituto Nazionale di Oceanografia e di Geofisica Sperimentale - OGS
>> www.inogs.it Dipartimento di Geofisica della Litosfera Geophysics of
>> Lithosphere Department CARS (Cartography and Remote Sensing) Research Group
>> http://www.inogs.it/Cars/ Borgo Grotta Gigante 42/c 34010 Sgonico - Trieste
>> - ITALY ncre...@ogs.trieste.it
>> off.   +39 040 2140 213
>> fax.   +39 040 327307
>>
>> _
>> This communication, that may contain confidential and/or legally
>> privileged information, is intended solely for the use of the intended
>> addressees. Opinions, conclusions and other information contained in this
>> message, that do not relate to the official business of OGS, shall be
>> considered as not given or endorsed by it. Every opinion or advice contained
>> in this communication is subject to the terms and conditions provided by the
>> agreement governing the engagement with such a client. Any use, disclosure,
>> copying or distribution of the contents of this communication by a
>> not-intended recipient or in violation of the purposes of this communication
>> is strictly prohibited and may be unlawful. For Italy only: Ai sensi del
>> D.Lgs.196/2003 - "T.U. sulla Privacy" si precisa che le informazioni
>> contenute in questo messaggio sono riservate ed a uso esclusivo del
>> destinatario.
>> _
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Would it be possible to enhance np.unique(.) with a keyword kind

2011-12-19 Thread josef . pktd
On Mon, Dec 19, 2011 at 8:16 PM, eat  wrote:
> Hi,
>
> On Tue, Dec 20, 2011 at 2:33 AM,  wrote:
>>
>> On Mon, Dec 19, 2011 at 6:27 PM, eat  wrote:
>> > Hi,
>> >
>> > Especially when the keyword return_index of np.unique(.) is specified to
>> > be
>> > True, would it in general also be reasonable to be able to specify the
>> > sorting algorithm of the underlying np.argsort(.)?
>> >
>> > The rationale is that (for at least some of my cases) higher level
>> > algorithms seems to be too over complicated unless I'm not able to
>> > request a
>> > stable sorting order from np.unique(.) (like np.unique(., return_index=
>> > True, kind= 'mergesort').
>> >
>> > (FWIW, I apparently do have a working local hack for this kind of
>> > functionality, but without extensive testing of 'all' corner cases).
>>
>> Just to understand this:
>>
>> Is the return_index currently always the first occurrence or random?
>
> No, for current implementation it's not always the first occurrence
> returned. AFAIK, the only stable algorithm to provide this is ' mergesort'
> and that's why I'll like to have a keyword 'kind' to propagate down to then
> internals.

Thanks, then I'm all in favor of mergesort.

>>
>>
>> I haven't found a use for return_index yet (but use return_inverse a
>> lot), but if we are guaranteed to get the first instance, then this
>> could be very useful.
>
> I think that 'return_inverse' will suffer of the same
> nondeterministic behavior as well.

I don't think so, because there is a unique mapping from observations
to unique items.
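
A quick check of that (the inverse indices always reconstruct the
original array, whichever duplicate the sort happens to pick):

import numpy as np

a = np.array([3, 1, 3, 2, 1, 3])
u, inv = np.unique(a, return_inverse=True)
print(u)                     # [1 2 3]
print(inv)                   # [2 0 2 1 0 2]
print(np.all(u[inv] == a))   # True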

Josef

>
> Thanks,
> eat
>>
>>
>> Josef
>>
>>
>> >
>> >
>> > Thanks,
>> > eat
>> >
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Would it be possible to enhance np.unique(.) with a keyword kind

2011-12-19 Thread josef . pktd
On Mon, Dec 19, 2011 at 6:27 PM, eat  wrote:
> Hi,
>
> Especially when the keyword return_index of np.unique(.) is specified to be
> True, would it in general also be reasonable to be able to specify the
> sorting algorithm of the underlying np.argsort(.)?
>
> The rationale is that (for at least some of my cases) higher level
> algorithms seems to be too over complicated unless I'm not able to request a
> stable sorting order from np.unique(.) (like np.unique(., return_index=
> True, kind= 'mergesort').
>
> (FWIW, I apparently do have a working local hack for this kind of
> functionality, but without extensive testing of 'all' corner cases).

Just to understand this:

Is the return_index currently always the first occurrence or random?

I haven't found a use for return_index yet (but use return_inverse a
lot), but if we are guaranteed to get the first instance, then this
could be very useful.
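
To make the use case concrete, a sketch of getting guaranteed first
occurrences with a stable sort, which is what a kind='mergesort'
option would buy:

import numpy as np

a = np.array([3, 1, 3, 2, 1, 3])
order = np.argsort(a, kind='mergesort')          # stable: ties keep original order
sa = a[order]
flag = np.concatenate(([True], sa[1:] != sa[:-1]))
uniques = sa[flag]
first_index = order[flag]                        # first occurrence of each unique value
print(uniques)       # [1 2 3]
print(first_index)   # [1 3 0]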

Josef


>
>
> Thanks,
> eat
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] polynomial with negative exponents

2011-12-12 Thread josef . pktd
On Mon, Dec 12, 2011 at 9:35 AM,   wrote:
> On Mon, Dec 12, 2011 at 9:04 AM, LASAGNA DAVIDE
>  wrote:
>> Hi,
>>
>> I have written a class for polynomials with negative
>> exponents like:
>>
>> p(x) = a0 + a1*x**-1 + ... + an*x**-n
>>
>> The code is this one:
>>
>> class NegativeExpPolynomial( object ):
>>      def __init__ ( self, coeffs ):
>>          self.coeffs = np.array( coeffs )
>>
>>      def __call__( self, x ):
>>          return sum( (c*x**(-i) for i, c in enumerate(
>> self.coeffs ) ) )
>
> something like
>
> self.coeffs = np.asarray(self.coeffs)
>
> np.sum(self.coeffs * x**(-np.arange(len(self.coeffs))))
>
> or
> np.dot(self.coeffs, x**(-np.arange(len(self.coeffs))))  # check shapes, or np.inner
>
> Josef
>
>>
>> where coeffs = [a0, a1, ..., an].
>>
>> I find that the way i evaluate the polynomial is kind of
>> *slow*, especially for polynomial with order larger than
>> ~200 and for arrays x large enough.

there might be numerical problems with large polynomials if the range
of values is large

Josef

>>
>> Do you have suggestions on how to speed up this code?
>>
>> Regards,
>>
>> Davide Lasagna
>>
>> --
>> Phd Student
>> Dipartimento di Ingegneria Aeronautica a Spaziale
>> Politecnico di Torino, Italy
>> tel: 011/0906871
>> e-mail: davide.lasa...@polito.it; lasagnadav...@gmail.com
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] polynomial with negative exponents

2011-12-12 Thread josef . pktd
On Mon, Dec 12, 2011 at 9:04 AM, LASAGNA DAVIDE
 wrote:
> Hi,
>
> I have written a class for polynomials with negative
> exponents like:
>
> p(x) = a0 + a1*x**-1 + ... + an*x**-n
>
> The code is this one:
>
> class NegativeExpPolynomial( object ):
>      def __init__ ( self, coeffs ):
>          self.coeffs = np.array( coeffs )
>
>      def __call__( self, x ):
>          return sum( (c*x**(-i) for i, c in enumerate(
> self.coeffs ) ) )

something like

self.coeffs = np.asarray(self.coeffs)

np.sum(self.coeffs * x**(-np.arange(len(self.coeffs))))

or
np.dot(self.coeffs, x**(-np.arange(len(self.coeffs))))  # check shapes, or np.inner
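
Spelled out as a runnable sketch (handling array-valued x through
broadcasting; it assumes x != 0):

import numpy as np

class NegativeExpPolynomial(object):
    def __init__(self, coeffs):
        self.coeffs = np.asarray(coeffs, dtype=float)

    def __call__(self, x):
        # powers has shape x.shape + (len(coeffs),): 1, 1/x, 1/x**2, ...
        x = np.asarray(x, dtype=float)
        powers = x[..., None] ** (-np.arange(len(self.coeffs)))
        return np.dot(powers, self.coeffs)

p = NegativeExpPolynomial([1.0, 2.0, 3.0])    # 1 + 2/x + 3/x**2
print(p(np.array([1.0, 2.0])))                # [ 6.    2.75]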

Josef

>
> where coeffs = [a0, a1, ..., an].
>
> I find that the way i evaluate the polynomial is kind of
> *slow*, especially for polynomial with order larger than
> ~200 and for arrays x large enough.
>
> Do you have suggestions on how to speed up this code?
>
> Regards,
>
> Davide Lasagna
>
> --
> Phd Student
> Dipartimento di Ingegneria Aeronautica a Spaziale
> Politecnico di Torino, Italy
> tel: 011/0906871
> e-mail: davide.lasa...@polito.it; lasagnadav...@gmail.com
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] type checking, what's recommended?

2011-12-07 Thread josef . pktd
If I want to know whether something that might be an array is really a
plain ndarray and not a subclass, is using `type` the safest bet?

All the other forms don't discriminate against subclasses.

>>> type(np.ma.zeros(3)) is np.ndarray
False
>>> type(np.zeros(3)) is np.ndarray
True

>>> isinstance(np.ma.zeros(3), np.ndarray)
True
>>> isinstance(np.zeros(3), np.ndarray)
True

>>> issubclass(np.ma.zeros(3).__class__, np.ndarray)
True
>>> issubclass(np.zeros(3).__class__, np.ndarray)
True

>>> isinstance(np.matrix(np.zeros(3)), np.ndarray)
True
>>> type(np.matrix(np.zeros(3))) is np.ndarray
False

Thanks,

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loop through values in a array and find maximum as looping

2011-12-06 Thread josef . pktd
On Tue, Dec 6, 2011 at 9:36 PM, Olivier Delalleau  wrote:
> The "out=a" keyword will ensure your first array will keep being updated. So
> you can do something like:
>
> a = my_list_of_arrays[0]
> for b in my_list_of_arrays[1:]:
>   numpy.maximum(a, b, out=a)

I didn't think of the out argument, which makes it more efficient; but
in my example I used Python's reduce, which takes an iterable and not
one huge array.
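
Spelled out for the thousands-of-files case, the in-place running
maximum looks roughly like this (a sketch; the generator is just a
stand-in for reading one array per file):

import numpy as np

arrays = (np.random.rand(4, 3) for _ in range(1000))   # stand-in for the file loop

running_max = next(arrays).copy()
for arr in arrays:
    np.maximum(running_max, arr, out=running_max)   # update in place, no big temporary
print(running_max)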

Josef


>
> -=- Olivier
>
> 2011/12/6 questions anon 
>>
>> thanks for all of your help, that does look appropriate but I am not sure
>> how to loop it over thousands of files.
>> I need to keep the first array to compare with but replace any greater
>> values as I loop through each array comparing back to the same array. does
>> that make sense?
>>
>>
>> On Wed, Dec 7, 2011 at 1:12 PM, Olivier Delalleau  wrote:
>>>
>>> Thanks, I didn't know you could specify the out array :)
>>>
>>> (to the OP: my initial suggestion, although probably not very efficient,
>>> seems to work with 2D arrays too, so I have no idea why it didn't work for
>>> you -- but Nathaniel's one seems to be the ideal one anyway).
>>>
>>> -=- Olivier
>>>
>>>
>>> 2011/12/6 Nathaniel Smith 

 I think you want
   np.maximum(a, b, out=a)

 - Nathaniel

 On Dec 6, 2011 9:04 PM, "questions anon" 
 wrote:
>
> thanks for responding Josef but that is not really what I am looking
> for, I have a multidimensional array and if the next array has any values
> greater than what is in my first array I want to replace them. The data 
> are
> contained in netcdf files.
> I can achieve what I want if I combine all of my arrays using numpy
> concatenate and then using the command numpy.max(myarray, axis=0) but
> because I have so many arrays I end up with a memory error so I need to 
> find
> a way to get the maximum while looping.
>
>
>
> On Wed, Dec 7, 2011 at 12:36 PM,  wrote:
>>
>> On Tue, Dec 6, 2011 at 7:55 PM, Olivier Delalleau 
>> wrote:
>> > It may not be the most efficient way to do this, but you can do:
>> > mask = b > a
>> > a[mask] = b[mask]
>> >
>> > -=- Olivier
>> >
>> > 2011/12/6 questions anon 
>> >>
>> >> I would like to produce an array with the maximum values out of
>> >> many
>> >> (1s) of arrays.
>> >> I need to loop through many multidimentional arrays and if a value
>> >> is
>> >> larger (in the same place as the previous array) then I would like
>> >> that
>> >> value to replace it.
>> >>
>> >> e.g.
>> >> a=[1,1,2,2
>> >> 1,1,2,2
>> >> 1,1,2,2]
>> >> b=[1,1,3,2
>> >> 2,1,0,0
>> >> 1,1,2,0]
>> >>
>> >> where b>a replace with value in b, so the new a should be :
>> >>
>> >> a=[1,1,3,2
>> >> 2,1,2,2
>> >> 1,1,2,2]
>> >>
>> >> and then keep looping through many arrays and replace whenever
>> >> value is
>> >> larger.
>> >>
>> >> I have tried numpy.putmask but that results in
>> >> TypeError: putmask() argument 1 must be numpy.ndarray, not list
>> >> Any other ideas? Thanks
>>
>> if I understand correctly it's a maximum.reduce
>>
>> numpy
>>
>> >>> a = np.concatenate((np.arange(5)[::-1],
>> >>> np.arange(5)))*np.ones((4,3,1))
>> >>> np.minimum.reduce(a, axis=2)
>> array([[ 0.,  0.,  0.],
>>       [ 0.,  0.,  0.],
>>       [ 0.,  0.,  0.],
>>       [ 0.,  0.,  0.]])
>> >>> a.T.shape
>> (10, 3, 4)
>>
>> python with iterable
>>
>> >>> reduce(np.maximum, a.T)
>> array([[ 4.,  4.,  4.,  4.],
>>       [ 4.,  4.,  4.,  4.],
>>       [ 4.,  4.,  4.,  4.]])
>> >>> reduce(np.minimum, a.T)
>> array([[ 0.,  0.,  0.,  0.],
>>       [ 0.,  0.,  0.,  0.],
>>       [ 0.,  0.,  0.,  0.]])
>>
>> Josef
>>
>> >>
>> >> ___
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion@scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >>
>> >
>> >
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

>>>
>>>

Re: [Numpy-discussion] loop through values in a array and find maximum as looping

2011-12-06 Thread josef . pktd
On Tue, Dec 6, 2011 at 7:55 PM, Olivier Delalleau  wrote:
> It may not be the most efficient way to do this, but you can do:
> mask = b > a
> a[mask] = b[mask]
>
> -=- Olivier
>
> 2011/12/6 questions anon 
>>
>> I would like to produce an array with the maximum values out of many
>> (1s) of arrays.
>> I need to loop through many multidimentional arrays and if a value is
>> larger (in the same place as the previous array) then I would like that
>> value to replace it.
>>
>> e.g.
>> a=[1,1,2,2
>> 1,1,2,2
>> 1,1,2,2]
>> b=[1,1,3,2
>> 2,1,0,0
>> 1,1,2,0]
>>
>> where b>a replace with value in b, so the new a should be :
>>
>> a=[1,1,3,2
>> 2,1,2,2
>> 1,1,2,2]
>>
>> and then keep looping through many arrays and replace whenever value is
>> larger.
>>
>> I have tried numpy.putmask but that results in
>> TypeError: putmask() argument 1 must be numpy.ndarray, not list
>> Any other ideas? Thanks

if I understand correctly it's a maximum.reduce

numpy

>>> a = np.concatenate((np.arange(5)[::-1], np.arange(5)))*np.ones((4,3,1))
>>> np.minimum.reduce(a, axis=2)
array([[ 0.,  0.,  0.],
   [ 0.,  0.,  0.],
   [ 0.,  0.,  0.],
   [ 0.,  0.,  0.]])
>>> a.T.shape
(10, 3, 4)

python with iterable

>>> reduce(np.maximum, a.T)
array([[ 4.,  4.,  4.,  4.],
   [ 4.,  4.,  4.,  4.],
   [ 4.,  4.,  4.,  4.]])
>>> reduce(np.minimum, a.T)
array([[ 0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.]])

Josef

>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] build numpy matrix out of smaller matrix

2011-12-01 Thread josef . pktd
On Thu, Dec 1, 2011 at 12:26 PM, Benjamin Root  wrote:
> On Thu, Dec 1, 2011 at 10:52 AM, jonasr  wrote:
>>
>>
>> Hi,
>> is there any possibility to define a numpy matrix, via a smaller given
>> matrix, i.e. in matlab
>> i can do this like
>>
>> a=[1 2 ; 3 4 ]
>>
>>
>> A=[a a ; a a ]
>>
>> so that i finally get
>>
>> A=[ [1,2,1,2]
>>      [3,4,3,4]
>>      [1,2,1,2]
>>      [3,4,3,4]]
>>
>> i tried different things on numpy which didn't work
>> any ideas ?
>>
>> thank you
>>
>
> numpy.tile() might be what you are looking for.

or which is my favorite tile and repeat replacement

>>> a= np.array([[1, 2], [3, 4]])
>>> np.kron(np.ones((2,2)), a)
array([[ 1.,  2.,  1.,  2.],
   [ 3.,  4.,  3.,  4.],
   [ 1.,  2.,  1.,  2.],
   [ 3.,  4.,  3.,  4.]])

>>> np.kron(a, np.ones((2,2)))
array([[ 1.,  1.,  2.,  2.],
   [ 1.,  1.,  2.,  2.],
   [ 3.,  3.,  4.,  4.],
   [ 3.,  3.,  4.,  4.]])
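
For the record, the tile version Ben mentioned gives the same block layout and keeps the integer dtype (quick sketch):

import numpy as np

a = np.array([[1, 2], [3, 4]])
print(np.tile(a, (2, 2)))                  # the [a a; a a] layout, still integer
print(np.kron(a, np.ones((2, 2), int)))    # integer version of the second kron example above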

Josef

>
> Cheers!
> Ben Root
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] np.dot and array order

2011-11-30 Thread josef . pktd
np.__version__   '1.5.1'   official win32 installer

(playing with ipython for once)

I thought np.dot is BLAS based and favors Fortran order, but if the
second array is Fortran ordered, then dot takes twice as long. The
order of the first array seems irrelevant
(or maybe just with my shapes, in case it matters: the first array is
float64, the second is bool, and I'm low in left over memory)

In [93]: %timeit np.dot(x.T, indi)
1 loops, best of 3: 1.33 s per loop

In [94]: %timeit np.dot(xf.T, indi)
1 loops, best of 3: 1.27 s per loop

In [95]: %timeit np.dot(xf.T, indif)
1 loops, best of 3: 3 s per loop

In [100]: %timeit np.dot(x.T, indif)
1 loops, best of 3: 3.05 s per loop


In [96]: x.flags.c_contiguous
Out[96]: True

In [97]: xf.flags.c_contiguous
Out[97]: False

In [98]: indi.flags.c_contiguous
Out[98]: True

In [99]: indif.flags.c_contiguous
Out[99]: False

In [101]: x.shape
Out[101]: (20, 20)

In [102]: indi.shape
Out[102]: (20, 500)


It's just the way it is, or does it depend on ?

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] what statistical module to use for python?

2011-11-30 Thread josef . pktd
On Wed, Nov 30, 2011 at 1:16 PM, Chao YUE  wrote:
> Hi all,
>
> I just want to broadly ask what statistical package are you guys using? I
> mean routine statistical function like linear regression, GLM, ANOVA... etc.
>
> I know there is SciKits packages like statsmodels, but are there more
> general and complete ones?
>
> thanks to all,

I forwarded it to the scipy-user mailing list since that is more suitable.

Josef

>
> Chao
> --
> ***
> Chao YUE
> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
> UMR 1572 CEA-CNRS-UVSQ
> Batiment 712 - Pe 119
> 91191 GIF Sur YVETTE Cedex
> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
> 
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] who owns the data?

2011-11-30 Thread josef . pktd
On Wed, Nov 30, 2011 at 4:00 PM, Robert Kern  wrote:
> On Wed, Nov 30, 2011 at 20:30,   wrote:
>> just a basic question (since I haven't looked at this in some time)
>>
>> I'm creating a structured array in a function. However, I want to
>> return the array with just a simple dtype
>>
>> uni = uni.view(dt).reshape(-1, ncols)
>> return uni
>>
>> the returned uni has owndata=False. Who owns the data, since the
>> underlying, original array went out of scope?
>
> Every time you make a view through .view(), slicing, .T, certain
> restricted .reshape() calls , etc. a reference to the original object
> is stored on the view. Consequently, the original object does not get
> garbage collected until all of the views go away too. Making view of a
> view just adds another link in the chain. In your example, the
> original object that was assigned to `uni` before that last assignment
> statement was executed maintains ownership of the memory. The new
> ndarray object that gets assigned to `uni` for the return statement
> refers to the temporary ndarray returned by .view() which in turn
> refers to the original `uni` array which owns the actual memory.

Thanks for the explanation.

There were cases on the mailing list where views created problems, so
I just thought of trying to own the data, but I don't think it's
really relevant.
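
A small illustration of the reference chain (exactly which object `base` points to can differ across numpy versions, so this only checks that the memory stays alive):

import numpy as np

a = np.zeros(12)
v = a.view()
w = v.reshape(3, 4)
print(a.flags.owndata, v.flags.owndata, w.flags.owndata)   # only `a` owns its data
print(w.base is not None)      # the view holds a reference up the chain
del a, v
print(w.sum())                 # still valid, the original buffer is kept alive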


>
>> 2)
>> uni.dtype = dt
>> uni.reshape(-1, ncols)
>> return uni
>>
>> this works and uni owns the data.
>
> uni.reshape() doesn't reshape `uni` inplace, though. It is possible
> that your `uni` array wasn't contiguous to begin with. In all of the
> cases that your first example would have owndata=False, this one
> should too.

this bug happened to me a few times now. I found it but only checked
the flags before fixing it.

Since reshape again creates a view, the next step is to assign to shape

uni.shape = (uni.size//ncols, ncols)

but that starts to look like too much inplace modifications just to avoid a view
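
A tiny sketch for the record: assigning to .shape modifies the array in place and raises instead of quietly making a copy when the memory layout doesn't allow it.

import numpy as np

u = np.arange(12.0)
u.shape = (6, 2)           # in place, no copy
ut = u.T                   # non-contiguous view
try:
    ut.shape = (12,)       # would need a copy
except AttributeError as e:
    print(e)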

Thanks,

Josef

>
>> I'm only worried whether assigning
>> to dtype directly is not a dangerous thing to do.
>
> It's no worse than .view(dt). The same kind of checking goes on in both 
> places.
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>   -- Umberto Eco
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] who owns the data?

2011-11-30 Thread josef . pktd
just a basic question (since I haven't looked at this in some time)

I'm creating a structured array in a function. However, I want to
return the array with just a simple dtype

uni = uni.view(dt).reshape(-1, ncols)
return uni

the returned uni has owndata=False. Who owns the data, since the
underlying, original array went out of scope?

alternatives

1)
uni = np.asarray(uni, dt).reshape(-1, ncols)
return uni

looks obvious but raises exception

2)
uni.dtype = dt
uni.reshape(-1, ncols)
return uni

this works and uni owns the data. I'm only worried whether assigning
to dtype directly is not a dangerous thing to do.

>>> u
array([0, 0, 0, 1, 1, 0, 1, 1])
>>> u.dtype = np.dtype("float")
>>> u
array([  0.e+000,   2.12199579e-314,   4.94065646e-324,
 2.12199579e-314])

adding a safety check:

for t in uni.dtype.fields.values():
    assert (t[0] == dt)


maybe I shouldn't care if nobody owns the data.

Thanks,

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] bug in reshape ?

2011-11-22 Thread josef . pktd
might be an old story >>> np.__version__  -> '1.5.1'

I thought for once it's easier to use reshape to add a new axis
instead of ...,None
but my results got weird (normal(0,1) sample of 2.13795875e-314)

>>> x = 1
>>> y = np.arange(3)
>>> z = np.arange(2)[:,None]
>>> np.broadcast(x,y,z)

>>> np.broadcast_arrays(x,y,z)
[array([[1, 1, 1],
   [1, 1, 1]]), array([[0, 1, 2],
   [0, 1, 2]]), array([[0, 0, 0],
   [1, 1, 1]])]
>>> x1, y1, z1 = np.broadcast_arrays(x,y,z)
>>> map(np.shape, (x1, y1, z1))
[(2, 3), (2, 3), (2, 3)]

shape looks fine, let's add an extra axis with reshape

>>> x1.reshape(2,3,1)
array([[[  1],
[  1],
[ 1099464714]],

   [[-2147481592],
[184],
[  1]]])

what's that ?

>>> (0+x1).reshape(2,3,1)
array([[[1],
[1],
[1]],

   [[1],
[1],
[1]]])

>>> (y1*1.).reshape(2,3,1)
array([[[ 0.],
[ 1.],
[ 2.]],

   [[ 0.],
[ 1.],
[ 2.]]])

>>> (y1).reshape(2,3,1)
array([[[  0],
[  1],
[  2]],

   [[  0],
[ 1099447643],
[-2147483648]]])


>>> x1, y1, z1 = np.broadcast_arrays(x,y,z)
>>> x1[...,None]
array([[[1],
[1],
[1]],

   [[1],
[1],
[1]]])

>>> x1.shape
(2, 3)
>>> x1.reshape(2,3,1)
array([[[  1],
[  1],
[ 1099464730]],

   [[-2147479536],
[ -445054780],
[ 1063686842]]])
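
A possible workaround, assuming the garbage comes from reshaping the zero-strided views that broadcast_arrays returns, is to force a real copy first (or stick with x1[..., None]):

import numpy as np

x1, y1, z1 = np.broadcast_arrays(1, np.arange(3), np.arange(2)[:, None])
x1c = np.ascontiguousarray(x1)     # real copy with ordinary strides
print(x1c.reshape(2, 3, 1))        # all ones, as expected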


the background story: playing broadcasting tricks for
http://projects.scipy.org/scipy/ticket/1544

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread josef . pktd
On Sat, Nov 12, 2011 at 11:32 AM, Warren Weckesser
 wrote:
>
>
> On Sat, Nov 12, 2011 at 9:59 AM,  wrote:
>>
>> On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser
>>  wrote:
>> >
>> >
>> > On Sat, Nov 12, 2011 at 6:43 AM,  wrote:
>> >>
>> >> On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu 
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > I am playing with multiple ways to speed up the following expression
>> >> > (it is in the inner loop):
>> >> >
>> >> >
>> >> > C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
>> >> >
>> >> > where C is an array of about 200-300 elements, M=len(C), a, b, c are
>> >> > scalars.
>> >> >
>> >> > I played with numexpr, but it was way slower than directly using
>> >> > numpy. It would be nice if I could create a Mx3 matrix without
>> >> > copying
>> >> > memory and so I can use dot() to calculate the whole thing.
>> >> >
>> >> > Can anyone help with giving some advices to make this faster?
>> >>
>> >> looks like a np.convolve(C, [a,b,c])  to me except for the boundary
>> >> conditions.
>> >
>> >
>> >
>> > As Josef pointed out, this is a convolution.  There are (at least)
>> > three convolution functions in numpy+scipy that you could use:
>> > numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d.
>> > Of these, scipy.ndimage.convolve1d is the fastest.  However, it doesn't
>> > beat the simple expression
>> >    a*x[2:] + b*x[1:-1] + c*x[:-2]
>> > Your idea of forming a matrix without copying memory can be done using
>> > "stride tricks", and for arrays of the size you are interested in, it
>> > computes the result faster than the simple expression (see below).
>> >
>> > Another fast alternative is to use one of the inline code generators.
>> > This example is a easy to implement with scipy.weave.blitz, and it gives
>> > a big speedup.
>> >
>> > Here's a test script:
>> >
>> > #- convolve1dtest.py -
>> >
>> >
>> > import numpy as np
>> > from numpy.lib.stride_tricks import as_strided
>> > from scipy.ndimage import convolve1d
>> > from scipy.weave import blitz
>> >
>> > # weighting coefficients
>> > a = -0.5
>> > b = 1.0
>> > c = -0.25
>> > w = np.array((a,b,c))
>> > # Reversed w:
>> > rw = w[::-1]
>> >
>> > # Length of C
>> > n = 250
>> >
>> > # The original version of the calculation:
>> > # Some dummy data
>> > C = np.arange(float(n))
>> > C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
>> > # Save for comparison.
>> > C0 = C.copy()
>> >
>> > # Do it again using a matrix multiplication.
>> > C = np.arange(float(n))
>> > # The "virtual" matrix view of C.
>> > V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0],
>> > C.strides[0]))
>> > C[1:-1] = np.dot(V, rw)
>> > C1 = C.copy()
>> >
>> > # Again, with convolve1d this time.
>> > C = np.arange(float(n))
>> > C[1:-1] = convolve1d(C, w)[1:-1]
>> > C2 = C.copy()
>> >
>> > # scipy.weave.blitz
>> > C = np.arange(float(n))
>> > # Must work with a copy, D, in the formula, because blitz does not use
>> > # a temporary variable.
>> > D = C.copy()
>> > expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]"
>> > blitz(expr, check_size=0)
>> > C3 = C.copy()
>> >
>> >
>> > # Verify that all the methods give the same result.
>> > print np.all(C0 == C1)
>> > print np.all(C0 == C2)
>> > print np.all(C0 == C3)
>> >
>> > #-
>> >
>> > And here's a snippet from an ipython session:
>> >
>> > In [51]: run convolve1dtest.py
>> > True
>> > True
>> > True
>> >
>> > In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
>> > 10 loops, best of 3: 16.5 us per loop
>> >
>> > In [53]: %timeit C[1:-1] = np.dot(V, rw)
>> > 10 loops, best of 3: 9.84 us per loop
>> >
>> > In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
>> > 10 loops, best of 3: 18.7 us per loop
>> >
>> > In [55]: %timeit D = C.copy(); blitz(expr, check_size=0)
>> > 10 loops, best of 3: 4.91 us per loop
>> >
>> >
>> >
>> > scipy.weave.blitz is fastest (but note that blitz has already been
>> > called
>> > once, so the time shown does not include the compilation required in
>> > the first call).  You could also try scipy.weave.inline, cython.inline,
>> > or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).
>> >
>> > Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple
>> > expression or convolve1d.  However, if you also have to set up V inside
>> > your inner loop, the speed gain will be lost.  The relative speeds also
>> > depend on the size of C.  For large C, the simple expression is faster
>> > than the matrix multiplication by V (but blitz is still faster).  In
>> > the following, I have changed n to 2500 before running
>> > convolve1dtest.py:
>> >
>> > In [56]: run convolve1dtest.py
>> > True
>> > True
>> > True
>> >
>> > In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
>> > 1 loops, best of 3: 29.5 us per loop
>> >
>> > In [58]: %timeit C[1:-1] = np.dot(V, rw)
>> > 1 loops, best of 3: 56.4 us per loop
>> >
>> > In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
>> > 1 loops, best 

Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread josef . pktd
On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser
 wrote:
>
>
> On Sat, Nov 12, 2011 at 6:43 AM,  wrote:
>>
>> On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu  wrote:
>> > Hi,
>> >
>> > I am playing with multiple ways to speed up the following expression
>> > (it is in the inner loop):
>> >
>> >
>> > C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
>> >
>> > where C is an array of about 200-300 elements, M=len(C), a, b, c are
>> > scalars.
>> >
>> > I played with numexpr, but it was way slower than directly using
>> > numpy. It would be nice if I could create a Mx3 matrix without copying
>> > memory and so I can use dot() to calculate the whole thing.
>> >
>> > Can anyone help with giving some advices to make this faster?
>>
>> looks like a np.convolve(C, [a,b,c])  to me except for the boundary
>> conditions.
>
>
>
> As Josef pointed out, this is a convolution.  There are (at least)
> three convolution functions in numpy+scipy that you could use:
> numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d.
> Of these, scipy.ndimage.convolve1d is the fastest.  However, it doesn't
> beat the simple expression
>    a*x[2:] + b*x[1:-1] + c*x[:-2]
> Your idea of forming a matrix without copying memory can be done using
> "stride tricks", and for arrays of the size you are interested in, it
> computes the result faster than the simple expression (see below).
>
> Another fast alternative is to use one of the inline code generators.
> This example is a easy to implement with scipy.weave.blitz, and it gives
> a big speedup.
>
> Here's a test script:
>
> #- convolve1dtest.py -
>
>
> import numpy as np
> from numpy.lib.stride_tricks import as_strided
> from scipy.ndimage import convolve1d
> from scipy.weave import blitz
>
> # weighting coefficients
> a = -0.5
> b = 1.0
> c = -0.25
> w = np.array((a,b,c))
> # Reversed w:
> rw = w[::-1]
>
> # Length of C
> n = 250
>
> # The original version of the calculation:
> # Some dummy data
> C = np.arange(float(n))
> C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> # Save for comparison.
> C0 = C.copy()
>
> # Do it again using a matrix multiplication.
> C = np.arange(float(n))
> # The "virtual" matrix view of C.
> V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0], C.strides[0]))
> C[1:-1] = np.dot(V, rw)
> C1 = C.copy()
>
> # Again, with convolve1d this time.
> C = np.arange(float(n))
> C[1:-1] = convolve1d(C, w)[1:-1]
> C2 = C.copy()
>
> # scipy.weave.blitz
> C = np.arange(float(n))
> # Must work with a copy, D, in the formula, because blitz does not use
> # a temporary variable.
> D = C.copy()
> expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]"
> blitz(expr, check_size=0)
> C3 = C.copy()
>
>
> # Verify that all the methods give the same result.
> print np.all(C0 == C1)
> print np.all(C0 == C2)
> print np.all(C0 == C3)
>
> #-
>
> And here's a snippet from an ipython session:
>
> In [51]: run convolve1dtest.py
> True
> True
> True
>
> In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> 10 loops, best of 3: 16.5 us per loop
>
> In [53]: %timeit C[1:-1] = np.dot(V, rw)
> 10 loops, best of 3: 9.84 us per loop
>
> In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
> 10 loops, best of 3: 18.7 us per loop
>
> In [55]: %timeit D = C.copy(); blitz(expr, check_size=0)
> 10 loops, best of 3: 4.91 us per loop
>
>
>
> scipy.weave.blitz is fastest (but note that blitz has already been called
> once, so the time shown does not include the compilation required in
> the first call).  You could also try scipy.weave.inline, cython.inline,
> or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).
>
> Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple
> expression or convolve1d.  However, if you also have to set up V inside
> your inner loop, the speed gain will be lost.  The relative speeds also
> depend on the size of C.  For large C, the simple expression is faster
> than the matrix multiplication by V (but blitz is still faster).  In
> the following, I have changed n to 2500 before running convolve1dtest.py:
>
> In [56]: run convolve1dtest.py
> True
> True
> True
>
> In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> 1 loops, best of 3: 29.5 us per loop
>
> In [58]: %timeit C[1:-1] = np.dot(V, rw)
> 1 loops, best of 3: 56.4 us per loop
>
> In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
> 1 loops, best of 3: 37.3 us per loop
>
> In [60]: %timeit D = C.copy(); blitz(expr, check_size=0)
> 10 loops, best of 3: 10.3 us per loop
>
>
> blitz wins, the simple numpy expression is a distant second, and now
> the matrix multiplication is slowest.
>
> I hope that helps--I know I learned quite a bit. :)

Interesting, two questions

does scipy.signal.convolve have a similar overhead as scipy.ndimage.convolve1d?

memory:
the blitz code doesn't include the array copy (D), so the timing might
be a bit misleading?
I assume the as_strided call doesn't allocate any memory yet, so the
timing s

Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread josef . pktd
On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu  wrote:
> Hi,
>
> I am playing with multiple ways to speed up the following expression
> (it is in the inner loop):
>
>
> C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
>
> where C is an array of about 200-300 elements, M=len(C), a, b, c are scalars.
>
> I played with numexpr, but it was way slower than directly using
> numpy. It would be nice if I could create a Mx3 matrix without copying
> memory and so I can use dot() to calculate the whole thing.
>
> Can anyone help with giving some advices to make this faster?

looks like a np.convolve(C, [a,b,c])  to me except for the boundary conditions.
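
A quick check of the equivalence on the interior points (the a, b, c values are made up):

import numpy as np

a, b, c = -0.5, 1.0, -0.25
C = np.arange(10.0)
direct = a * C[2:] + b * C[1:-1] + c * C[:-2]
conv = np.convolve(C, [a, b, c], mode='valid')
print(np.allclose(direct, conv))   # should be True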

Josef


>
> Thanks,
> G
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?

2011-10-30 Thread josef . pktd
On Sun, Oct 30, 2011 at 7:18 AM, David Cournapeau  wrote:
> On Thu, Oct 27, 2011 at 5:19 PM, Ralf Gommers
>  wrote:
>> Hi David,
>>
>> On Thu, Oct 27, 2011 at 3:02 PM, David Cournapeau 
>> wrote:
>>>
>>> Hi,
>>>
>>> I was wondering if we could finally move to a more recent version of
>>> compilers for official win32 installers. This would of course concern
>>> the next release cycle, not the ones where beta/rc are already in
>>> progress.
>>>
>>> Basically, the pros:
>>>  - we will have to move at some point
>>>  - gcc 4.* seem less buggy, especially C++ and fortran.
>>>  - no need to maintain msvcr90 vodoo
>>> The cons:
>>>  - it will most likely break the ABI
>>>  - we need to recompile atlas (but I can take care of it)
>>>  - the biggest: it is difficult to combine gfortran with visual
>>> studio (more exactly you cannot link gfortran runtime to a visual
>>> studio executable). The only solution I could think of would be to
>>> recompile the gfortran runtime with Visual Studio, which for some
>>> reason does not sound very appealing :)
>>
>> To get the datetime changes to work with MinGW, we already concluded that
>> building with 4.x is more or less required (without recognizing some of the
>> points you list above). Changes to mingw32ccompiler to fix compilation with
>> 4.x went in in https://github.com/numpy/numpy/pull/156. It would be good if
>> you could check those.
>
> I will look into it more carefully, but overall, it seems that
> building atlas 3.8.4, numpy and scipy with gcc 4.x works quite well.
> The main issue is that gcc 4.* adds some dependencies on mingw dlls.
> There are two options:
>  - adding the dlls in the installers
>  - statically linking those, which seems to be a bad idea
> (generalizing the dll boundaries problem to exception and things we
> would rather not care about:
> http://cygwin.com/ml/cygwin/2007-06/msg00332.html).
>
>> It probably makes sense make this move for numpy 1.7. If this breaks the ABI
>> then it would be easiest to make numpy 1.7 the minimum required version for
>> scipy 0.11.
>
> My thinking as well.

It looks like it's really time to upgrade. pythonxy comes with
gfortran and a new MinGW, and I cannot build scipy anymore. I can't
find an installer for the old MinGW 3.5 anymore for my new computer.

(I haven't seen any problems mixing gcc 4.4 with C/C++ code like
scikits.learn against the official numpy/scipy installer versions.)

I can volunteer for testing, since it looks like I'm set up for gfortran.

Josef

>
> cheers,
>
> David
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?

2011-10-27 Thread josef . pktd
On Thu, Oct 27, 2011 at 9:02 AM, David Cournapeau  wrote:
> Hi,
>
> I was wondering if we could finally move to a more recent version of
> compilers for official win32 installers. This would of course concern
> the next release cycle, not the ones where beta/rc are already in
> progress.
>
> Basically, the pros:
>  - we will have to move at some point
>  - gcc 4.* seem less buggy, especially C++ and fortran.
>  - no need to maintain msvcr90 vodoo
> The cons:
>  - it will most likely break the ABI
>  - we need to recompile atlas (but I can take care of it)
>  - the biggest: it is difficult to combine gfortran with visual
> studio (more exactly you cannot link gfortran runtime to a visual
> studio executable). The only solution I could think of would be to
> recompile the gfortran runtime with Visual Studio, which for some
> reason does not sound very appealing :)

What does the last point mean in practice? (What counts as linking in this case?)
If numpy and scipy are compiled with MinGW gcc 4.*, can they not be
used with the standard Python?
Or does it just mean we cannot combine Fortran extensions that are
built with MinGW with extensions built with Visual Studio?

another example: would Matplotlib compiled against Visual Studio work
with a new MinGW-compiled numpy? I guess that's what the ABI break
would prevent?

Since we will have to update MinGW sooner or later anyway, I'm in
favor of doing it. And given the comments on the mailing list about
the Linux transition to gfortran, I expect that the transition will
take some time.

Thanks for your successful effort that installation on Windows was
without problems for years for a user like me.

Josef

>
> Thoughts ?
>
> cheers,
>
> David
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] dtyping with .astype()

2011-10-17 Thread josef . pktd
On Mon, Oct 17, 2011 at 10:18 AM, Pauli Virtanen  wrote:
> 17.10.2011 15:48, josef.p...@gmail.com kirjoitti:
>> On Mon, Oct 17, 2011 at 6:17 AM, Pauli Virtanen  wrote:
> [clip]
 What am I missing? How to do this?
>>>
>>> np.rec.fromarrays(arr.T, dtype=dt)
>>
>> y.astype(float16).view(dt)
>
> I think this will give surprises if the original array is not in C-order.

I forgot about those, dangerous if the array is square, otherwise it
raises an error if it is in F-order

maybe
np.asarray(y, np.float16, order='C').view(dt)
if I don't like record arrays
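
A tiny check of the order issue (float16 needs numpy >= 1.6; the dtype and the toy 2-column data are made up for illustration):

import numpy as np

dt = np.dtype([('a', np.float16), ('b', np.float16)])
y = np.asfortranarray(np.arange(6.).reshape(3, 2))
rec = np.asarray(y, np.float16, order='C').view(dt)   # one record per row
print(rec['a'].ravel())   # first column: 0., 2., 4.
print(rec['b'].ravel())   # second column: 1., 3., 5.
# the F-ordered y is the case where a plain y.astype(np.float16).view(dt)
# can go wrong, as noted above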

Josef

>
>        Pauli
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] dtyping with .astype()

2011-10-17 Thread josef . pktd
On Mon, Oct 17, 2011 at 6:17 AM, Pauli Virtanen  wrote:
> 13.10.2011 12:59, Alex van der Spek kirjoitti:
>> gives me a confusing result. I only asked to name the columns and change 
>> their
>> types to half precision floats.
>
> Structured arrays shouldn't be thought as an array with named columns,
> as they are somewhat different.
>
>> What am I missing? How to do this?
>
> np.rec.fromarrays(arr.T, dtype=dt)

y.astype(float16).view(dt)

?
Josef


>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] simulate AR

2011-10-14 Thread josef . pktd
On Fri, Oct 14, 2011 at 2:59 PM,   wrote:
> On Fri, Oct 14, 2011 at 2:29 PM,   wrote:
>> On Fri, Oct 14, 2011 at 2:18 PM, Alan G Isaac  wrote:
>>> On 10/14/2011 1:42 PM, josef.p...@gmail.com wrote:
 If I remember correctly, signal.lfilter doesn't require stationarity,
 but handling of the starting values is a bit difficult.
>>>
>>>
>>> Hmm.  Yes.
>>> AR(1) is trivial, but how do you handle higher orders?
>>
>> I would have to look for it.
>> You can invert the stationary part of the AR polynomial with the numpy
>> polynomial classes using division. The main thing is to pad with
>> enough zeros corresponding to the truncation that you want. And in the
>> old classes to watch out because the order is reversed high to low
>> instead of low to high or the other way around.
>
> I found it in the examples folder (pure numpy)
> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/examples/tsa/lagpolynomial.py
>
 ar = LagPolynomial([1, -0.8])
 ma = LagPolynomial([1])
 ma.div(ar)
> Polynomial([ 1. ,  0.8], [-1.,  1.])
 ma.div(ar, maxlag=50)
> Polynomial([  1.e+00,   8.e-01,   6.4000e-01,
>         5.1200e-01,   4.0960e-01,   3.2768e-01,
>         2.62144000e-01,   2.09715200e-01,   1.67772160e-01,
>         1.34217728e-01,   1.07374182e-01,   8.58993459e-02,
>         6.87194767e-02,   5.49755814e-02,   4.39804651e-02,
>         3.51843721e-02,   2.81474977e-02,   2.25179981e-02,
>         1.80143985e-02,   1.44115188e-02,   1.15292150e-02,
>         9.22337204e-03,   7.37869763e-03,   5.90295810e-03,
>         4.72236648e-03,   3.77789319e-03,   3.02231455e-03,
>         2.41785164e-03,   1.93428131e-03,   1.54742505e-03,
>         1.23794004e-03,   9.90352031e-04,   7.92281625e-04,
>         6.33825300e-04,   5.07060240e-04,   4.05648192e-04,
>         3.24518554e-04,   2.59614843e-04,   2.07691874e-04,
>         1.66153499e-04,   1.32922800e-04,   1.06338240e-04,
>         8.50705917e-05,   6.80564734e-05,   5.44451787e-05,
>         4.35561430e-05,   3.48449144e-05,   2.78759315e-05,
>         2.23007452e-05], [-1.,  1.])
>
 ar = LagPolynomial([1, -0.8, 0.2, 0.1, 0.1])
 ma.div(ar, maxlag=50)
> Polynomial([  1.e+00,   8.e-01,   4.4000e-01,
>         9.2000e-02,  -1.9440e-01,  -2.9792e-01,
>        -2.52656000e-01,  -1.32300800e-01,  -6.07744000e-03,
>         7.66558080e-02,   1.01035814e-01,   7.93353139e-02,
>         3.62032515e-02,  -4.67362386e-03,  -2.90166622e-02,
>        -3.38324615e-02,  -2.44155995e-02,  -9.39695872e-03,
>         3.65046531e-03,   1.06245701e-02,   1.11508188e-02,
>         7.37039040e-03,   2.23864501e-03,  -1.86070097e-03,
>        -3.78841070e-03,  -3.61949191e-03,  -2.17570579e-03,
>        -4.51755084e-04,   8.14527351e-04,   1.32149267e-03,
>         1.15703475e-03,   6.25052041e-04,   5.50326804e-05,
>        -3.28837006e-04,  -4.52284820e-04,  -3.64068927e-04,
>        -1.73417745e-04,   1.21917720e-05,   1.26072341e-04,
>         1.52168186e-04,   1.12642678e-04,   4.58540937e-05,
>        -1.36693133e-05,  -4.65873557e-05,  -5.03856990e-05,
>        -3.42095661e-05], [-1.,  1.])

I just realized that my LagPolynomial has a filter method

>>> marep = ma.div(ar, maxlag=50)
>>> u = np.random.randn(5000)
>>> x = marep.filter(u)[1000:]
>>> import  scikits.statsmodels.api as sm
>>> sm.tsa.AR(x).fit(4, trend='nc').params
array([ 0.80183437, -0.22098967, -0.08484519, -0.12590277])
>>> #true (different convention)
>>> -np.array([1, -0.8, 0.2, 0.1, 0.1])[1:]
array([ 0.8, -0.2, -0.1, -0.1])

not bad, if the sample is large enough. I don't remember what the
numpy polynomial classes use under the hood (maybe convolve)

>
> no guarantees, I don't remember how much I tested these things, but I
> spent a lot of time doing it 3 or 4 different ways.

Josef


>
> Josef
>
>>
>> I switched to using mostly lfilter, but I guess the statsmodels
>> sandbox (and the mailing list) still has my "playing with ARMA
>> polynomials" code.
>> (I think it might be pretty useful for teaching. I wished I had the
>> functions to calculate some examples when I learned this.)
>>
>> Josef
>>>
>>> Thanks,
>>> Alan
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] simulate AR

2011-10-14 Thread josef . pktd
On Fri, Oct 14, 2011 at 2:29 PM,   wrote:
> On Fri, Oct 14, 2011 at 2:18 PM, Alan G Isaac  wrote:
>> On 10/14/2011 1:42 PM, josef.p...@gmail.com wrote:
>>> If I remember correctly, signal.lfilter doesn't require stationarity,
>>> but handling of the starting values is a bit difficult.
>>
>>
>> Hmm.  Yes.
>> AR(1) is trivial, but how do you handle higher orders?
>
> I would have to look for it.
> You can invert the stationary part of the AR polynomial with the numpy
> polynomial classes using division. The main thing is to pad with
> enough zeros corresponding to the truncation that you want. And in the
> old classes to watch out because the order is reversed high to low
> instead of low to high or the other way around.

I found it in the examples folder (pure numpy)
https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/examples/tsa/lagpolynomial.py

>>> ar = LagPolynomial([1, -0.8])
>>> ma = LagPolynomial([1])
>>> ma.div(ar)
Polynomial([ 1. ,  0.8], [-1.,  1.])
>>> ma.div(ar, maxlag=50)
Polynomial([  1.e+00,   8.e-01,   6.4000e-01,
 5.1200e-01,   4.0960e-01,   3.2768e-01,
 2.62144000e-01,   2.09715200e-01,   1.67772160e-01,
 1.34217728e-01,   1.07374182e-01,   8.58993459e-02,
 6.87194767e-02,   5.49755814e-02,   4.39804651e-02,
 3.51843721e-02,   2.81474977e-02,   2.25179981e-02,
 1.80143985e-02,   1.44115188e-02,   1.15292150e-02,
 9.22337204e-03,   7.37869763e-03,   5.90295810e-03,
 4.72236648e-03,   3.77789319e-03,   3.02231455e-03,
 2.41785164e-03,   1.93428131e-03,   1.54742505e-03,
 1.23794004e-03,   9.90352031e-04,   7.92281625e-04,
 6.33825300e-04,   5.07060240e-04,   4.05648192e-04,
 3.24518554e-04,   2.59614843e-04,   2.07691874e-04,
 1.66153499e-04,   1.32922800e-04,   1.06338240e-04,
 8.50705917e-05,   6.80564734e-05,   5.44451787e-05,
 4.35561430e-05,   3.48449144e-05,   2.78759315e-05,
 2.23007452e-05], [-1.,  1.])

>>> ar = LagPolynomial([1, -0.8, 0.2, 0.1, 0.1])
>>> ma.div(ar, maxlag=50)
Polynomial([  1.e+00,   8.e-01,   4.4000e-01,
 9.2000e-02,  -1.9440e-01,  -2.9792e-01,
-2.52656000e-01,  -1.32300800e-01,  -6.07744000e-03,
 7.66558080e-02,   1.01035814e-01,   7.93353139e-02,
 3.62032515e-02,  -4.67362386e-03,  -2.90166622e-02,
-3.38324615e-02,  -2.44155995e-02,  -9.39695872e-03,
 3.65046531e-03,   1.06245701e-02,   1.11508188e-02,
 7.37039040e-03,   2.23864501e-03,  -1.86070097e-03,
-3.78841070e-03,  -3.61949191e-03,  -2.17570579e-03,
-4.51755084e-04,   8.14527351e-04,   1.32149267e-03,
 1.15703475e-03,   6.25052041e-04,   5.50326804e-05,
-3.28837006e-04,  -4.52284820e-04,  -3.64068927e-04,
-1.73417745e-04,   1.21917720e-05,   1.26072341e-04,
 1.52168186e-04,   1.12642678e-04,   4.58540937e-05,
-1.36693133e-05,  -4.65873557e-05,  -5.03856990e-05,
-3.42095661e-05], [-1.,  1.])

no guarantees, I don't remember how much I tested these things, but I
spent a lot of time doing it 3 or 4 different ways.

Josef

>
> I switched to using mostly lfilter, but I guess the statsmodels
> sandbox (and the mailing list) still has my "playing with ARMA
> polynomials" code.
> (I think it might be pretty useful for teaching. I wished I had the
> functions to calculate some examples when I learned this.)
>
> Josef
>>
>> Thanks,
>> Alan
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] simulate AR

2011-10-14 Thread josef . pktd
On Fri, Oct 14, 2011 at 2:39 PM, Skipper Seabold  wrote:
> On Fri, Oct 14, 2011 at 2:18 PM, Alan G Isaac  wrote:
>>
>> On 10/14/2011 1:42 PM, josef.p...@gmail.com wrote:
>> > If I remember correctly, signal.lfilter doesn't require stationarity,
>> > but handling of the starting values is a bit difficult.
>>
>>
>> Hmm.  Yes.
>> AR(1) is trivial, but how do you handle higher orders?
>>
>
> Not sure if this is what you're after, but here I go the other way
> signal -> noise with known initial values of an ARMA(p,q) process.
> Here I want to set it such that the first p error terms are zero, I
> had to solve for the zi that make this so
>
> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/tsa/arima_model.py#L295
>
> This is me talking to myself about this.
>
> http://thread.gmane.org/gmane.comp.python.scientific.user/27162/focus=27162

with two more simultaneous threads on the statsmodels mailing list. :)

Josef

>
> Skipper
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] simulate AR

2011-10-14 Thread josef . pktd
On Fri, Oct 14, 2011 at 2:18 PM, Alan G Isaac  wrote:
> On 10/14/2011 1:42 PM, josef.p...@gmail.com wrote:
>> If I remember correctly, signal.lfilter doesn't require stationarity,
>> but handling of the starting values is a bit difficult.
>
>
> Hmm.  Yes.
> AR(1) is trivial, but how do you handle higher orders?

I would have to look for it.
You can invert the stationary part of the AR polynomial with the numpy
polynomial classes using division. The main thing is to pad with
enough zeros corresponding to the truncation that you want. And in the
old classes to watch out because the order is reversed high to low
instead of low to high or the other way around.
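
A rough numpy-only sketch of what that padded division computes (the defining recursion for the truncated MA weights), assuming the AR part is stationary and with maxlag as an arbitrary truncation choice:

import numpy as np

ar = np.array([1.0, -0.8, 0.2])       # ar(L) y_t = e_t, coefficients low to high
maxlag, nobs = 50, 1000
psi = np.zeros(maxlag)                # truncated MA representation: ar(L) * psi(L) = 1
psi[0] = 1.0
for j in range(1, maxlag):
    k = min(j, len(ar) - 1)
    psi[j] = -np.dot(ar[1:k + 1], psi[j - 1::-1][:k])
e = np.random.randn(nobs + maxlag)
y = np.convolve(e, psi, mode='full')[maxlag:nobs + maxlag]   # simulated AR, up to truncation error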

I switched to using mostly lfilter, but I guess the statsmodels
sandbox (and the mailing list) still has my "playing with ARMA
polynomials" code.
(I think it might be pretty useful for teaching. I wished I had the
functions to calculate some examples when I learned this.)

Josef
>
> Thanks,
> Alan
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] simulate AR

2011-10-14 Thread josef . pktd
On Fri, Oct 14, 2011 at 1:26 PM, Alan G Isaac  wrote:
>>> Assuming stationarity ...
>
> On 10/14/2011 1:22 PM, josef.p...@gmail.com wrote:
>> maybe ?
>
> I just meant that the MA approximation is
> not reliable for a non-stationary AR.
> E.g., http://www.jstor.org/stable/2348631

section 5: simulating an ARIMA: simulate stationary ARMA, then cumsum it.

I guess this only applies to simple integrated processes, where we
can split it up into ar(L)(1-L) y_t with ar(L) a stationary
polynomial.
(besides seasonal integration, I haven't seen or used any other
non-stationary AR processes.)

If I remember correctly, signal.lfilter doesn't require stationarity,
but handling of the starting values is a bit difficult.

Josef

>
> Cheers,
> Alan
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] simulate AR

2011-10-14 Thread josef . pktd
On Fri, Oct 14, 2011 at 12:49 PM, Alan G Isaac  wrote:
> On 10/14/2011 12:21 PM, josef.p...@gmail.com wrote:
>> One other way to simulate the AR is to get the (truncated)
>> MA-representation, and then convolve can be used
>
>
> Assuming stationarity ...

maybe ?
If it's integrated, then you need a starting point and cumsum might
still work. (like in a random walk)
No idea about seasonal integration, it would require too much thinking

(not tested)

Josef

>
> Alan
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] simulate AR

2011-10-14 Thread josef . pktd
On Fri, Oct 14, 2011 at 11:56 AM, Fabrice Silva  wrote:
> Le vendredi 14 octobre 2011 à 10:49 -0400, josef.p...@gmail.com a
> écrit :
>> On Fri, Oct 14, 2011 at 10:24 AM, Alan G Isaac  wrote:
>> > As a simple example, if I have y0 and a white noise series e,
>> > what is the best way to produces a series y such that y[t] = 0.9*y[t-1] + 
>> > e[t]
>> > for t=1,2,...?
>> >
>> > 1. How can I best simulate an autoregressive process using NumPy?
>> >
>> > 2. With SciPy, it looks like I could do this as
>> > e[0] = y0
>> > signal.lfilter((1,),(1,-0.9),e)
>> > Am I overlooking similar (or substitute) functionality in NumPy?
>>
>> I don't think so. At least I didn't find anything in numpy for this.
>> An MA process would be a convolution, but for simulating AR I only
>> found signal.lfilter. (unless numpy has gained extra features that I
>> don't have in 1.5)
>>
>> Except, I think it's possible to do it with fft, if you want to
>> fft-inverse-convolve (?)
>> But simulating an ARMA with fft was much slower than lfilter in my
>> short experimentation with it.
>
> About speed comparison between lfilter, convolve, etc...
> http://www.scipy.org/Cookbook/ApplyFIRFilter

One other way to simulate the AR is to get the (truncated)
MA representation, and then convolve can be used, as in
ApplyFIRFilter.

numpy polynomials can be used to invert the AR polynomial (with a bit
of juggling).

Josef

>
> --
> Fabrice Silva
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] simulate AR

2011-10-14 Thread josef . pktd
On Fri, Oct 14, 2011 at 10:24 AM, Alan G Isaac  wrote:
> As a simple example, if I have y0 and a white noise series e,
> what is the best way to produces a series y such that y[t] = 0.9*y[t-1] + e[t]
> for t=1,2,...?
>
> 1. How can I best simulate an autoregressive process using NumPy?
>
> 2. With SciPy, it looks like I could do this as
> e[0] = y0
> signal.lfilter((1,),(1,-0.9),e)
> Am I overlooking similar (or substitute) functionality in NumPy?

I don't think so. At least I didn't find anything in numpy for this.
An MA process would be a convolution, but for simulating AR I only
found signal.lfilter. (unless numpy has gained extra features that I
don't have in 1.5)

Except, I think it's possible to do it with fft, if you want to
fft-inverse-convolve (?)
But simulating an ARMA with fft was much slower than lfilter in my
short experimentation with it.
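
A plain-numpy fallback is just the defining recursion in a loop, which is fine for moderate sample sizes:

import numpy as np

nobs = 1000
e = np.random.randn(nobs)
y = np.empty(nobs)
y[0] = e[0]                      # or whatever starting value y0 you want
for t in range(1, nobs):
    y[t] = 0.9 * y[t-1] + e[t]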

Josef

>
> Thanks,
> Alan Isaac
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] wanted: decent matplotlib alternative

2011-10-13 Thread josef . pktd
On Thu, Oct 13, 2011 at 6:42 PM, John Hunter  wrote:
> On Thu, Oct 13, 2011 at 5:36 PM, Eric Firing  wrote:
>
>>> It would be nice to have a social interface for the mpl gallery like the
>>> one similar to the R-gallery
>>> [http://www.r-bloggers.com/the-r-graph-gallery-goes-social/]
>>
>> I think that the priority should go towards massive pruning,
>> organization, and cleanup of the gallery.  This would be a great project
>> for a new contributor to mpl.
>
> So as to not hijack poor Christoph's thread, who after all is looking
> for mpl alternatives, and not to abuse numpy-discussion's bandwidth
> with mpl issues, I have opened and issue on the github tracker as an
> "open thread" to register gripes and suggestions for mpl doc and
> gallery improvements, particularly as it regards API usage
>
> https://github.com/matplotlib/matplotlib/issues/524

I started, but I would like to mention that I'm very happy with the
pyplot interface (even if sometimes it takes a bit of time to figure
out which options to use).

Josef

>
> JDH
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] the axis parameter in the np.ma.concatenate is not working?

2011-10-13 Thread josef . pktd
On Thu, Oct 13, 2011 at 1:17 PM, Chao YUE  wrote:
> Dear all,
>
> I use numpy version 1.5.1 which is installed by default when I do sudo
> apt-get install numpy on ubuntu 11.04.
> but it seems that for np.ma.concatenate(arrays, axis), the axis parameter is
> not working?
>
> In [460]: a=np.arange(10)
>
> In [461]: a=np.ma.masked_array(a,a<3)
>
> In [462]: a
> Out[462]:
> masked_array(data = [-- -- -- 3 4 5 6 7 8 9],
>  mask = [ True  True  True False False False False False False
> False],
>    fill_value = 99)
>
>
> In [463]: b=np.arange(10)
>
> In [464]: b=np.ma.masked_array(a,b>7)
>
> In [465]: b
> Out[465]:
> masked_array(data = [-- -- -- 3 4 5 6 7 -- --],
>  mask = [ True  True  True False False False False False  True
> True],
>    fill_value = 99)
>
>
> In [466]: c=np.ma.concatenate((a,b),axis=0)
>
> In [467]: c
> Out[467]:
> masked_array(data = [-- -- -- 3 4 5 6 7 8 9 -- -- -- 3 4 5 6 7 -- --],
>  mask = [ True  True  True False False False False False False
> False  True  True
>   True False False False False False  True  True],
>    fill_value = 99)
>
>
> In [468]: c.shape
> Out[468]: (20,)
>
> In [469]: c=np.ma.concatenate((a,b),axis=1)

maybe you want numpy.ma.column_stack

for concatenate you need to add extra axis first

something like
c=np.ma.concatenate((a[:,None], b[:,None]),axis=1)  (not tested)
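
Something like this should give the (10, 2) result, masks included (untested on 1.5.1, just a sketch):

import numpy as np

a = np.ma.masked_array(np.arange(10), np.arange(10) < 3)
b = np.ma.masked_array(np.arange(10), np.arange(10) > 7)
c1 = np.ma.column_stack((a, b))                            # shape (10, 2), masks should be preserved
c2 = np.ma.concatenate((a[:, None], b[:, None]), axis=1)   # same shape
print(c1.shape, c2.shape)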

Josef

>
> In [470]: c.shape
> Out[470]: (20,)
>
> cheers,
>
> Chao
>
> --
> ***
> Chao YUE
> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
> UMR 1572 CEA-CNRS-UVSQ
> Batiment 712 - Pe 119
> 91191 GIF Sur YVETTE Cedex
> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
> 
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how to list all the values in a ndarray without repeat (like filter in excel)

2011-10-13 Thread josef . pktd
On Thu, Oct 13, 2011 at 9:14 AM, Chao YUE  wrote:
> Dear all,
>
> if I have a ndarray like array([1,2,3,2,3,1,1,1,2,2,,2,2,3]) containing
> some values that are flag for data quality.
> how can list all the values in this array, like doing a descriptive
> statistics. I guess I should use Scipy statistics ?
> Thanks for any ideas.

Not sure what you are asking for

np.unique to get the unique values
selecting by mask: mask = (arr == 3); arr[mask]
or statistics of another variable by groups/flags: np.bincount(arr,
weights=myvar)
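
Something like this, if a frequency table is what you're after (toy flag values):

import numpy as np

flags = np.array([1, 2, 3, 2, 3, 1, 1, 1, 2, 2, 2, 2, 3])
print(np.unique(flags))        # the distinct flag values: 1, 2, 3
print(np.bincount(flags))      # counts for 0, 1, 2, ...: [0 4 6 3]
data = np.random.randn(flags.size)
print(np.bincount(flags, weights=data))   # per-flag sums of another variable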

Josef


>
> Chao
>
> --
> ***
> Chao YUE
> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
> UMR 1572 CEA-CNRS-UVSQ
> Batiment 712 - Pe 119
> 91191 GIF Sur YVETTE Cedex
> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16
> 
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] abs for max negative integers - desired behavior?

2011-10-11 Thread josef . pktd
On Tue, Oct 11, 2011 at 7:13 PM, Benjamin Root  wrote:
> On Tue, Oct 11, 2011 at 2:51 PM, Matthew Brett 
> wrote:
>>
>> Hi
>>
>> On Tue, Oct 11, 2011 at 3:16 PM, Charles R Harris
>>  wrote:
>> >
>> >
>> > On Tue, Oct 11, 2011 at 12:23 PM, Matthew Brett
>> > 
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> I recently ran into this:
>> >>
>> >> In [68]: arr = np.array(-128, np.int8)
>> >>
>> >> In [69]: arr
>> >> Out[69]: array(-128, dtype=int8)
>> >>
>> >> In [70]: np.abs(arr)
>> >> Out[70]: -128
>> >>
>> >
>> > This has come up for discussion before, but no consensus was ever
>> > reached.
>> > One solution is for abs to return an unsigned type, but then combining
>> > that
>> > with signed type of the same number of bits will cause both to be cast
>> > to
>> > higher precision. IIRC, matlab was said to return +127 as abs(-128),
>> > which,
>> > if true, is quite curious.
>>
>> octave-3.2.3:1> a = int8([-128, 127])
>> a =
>>
>>  -128  127
>>
>> octave-3.2.3:2> abs(a)
>> ans =
>>
>>  127  127
>>
>> Matlab is the same.  That is curious...
>>
>> See you,
>>
>> Matthew
>
> Well, it _is_ only off by 0.78%. That should be good enough for government
> work, right?

So, which government is using numpy, only off by 200%
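
For the record, widening to a larger integer type before abs avoids the wraparound (tiny sketch):

import numpy as np

arr = np.array([-128, 127], np.int8)
print(np.abs(arr))                     # the -128 entry wraps back to -128
print(np.abs(arr.astype(np.int16)))    # [128 127] once widened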

Josef

>
> Ben Root
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Nice float -> integer conversion?

2011-10-11 Thread josef . pktd
On Tue, Oct 11, 2011 at 3:06 PM, Derek Homeier
 wrote:
> On 11 Oct 2011, at 20:06, Matthew Brett wrote:
>
>> Have I missed a fast way of doing nice float to integer conversion?
>>
>> By nice I mean, rounding to the nearest integer, converting NaN to 0,
>> inf, -inf to the max and min of the integer range?  The astype method
>> and cast functions don't do what I need here:
>>
>> In [40]: np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16)
>> Out[40]: array([1, 0, 0, 0], dtype=int16)
>>
>> In [41]: np.cast[np.int16](np.array([1.6, np.nan, np.inf, -np.inf]))
>> Out[41]: array([1, 0, 0, 0], dtype=int16)
>>
>> Have I missed something obvious?
>
> np.[a]round comes closer to what you wish (is there consensus
> that NaN should map to 0?), but not quite there, and it's not really
> consistent either!
>
> In [42]: c = np.zeros(4, np.int16)
> In [43]: d = np.zeros(4, np.int32)
> In [44]: np.around([1.6,np.nan,np.inf,-np.inf], out=c)
> Out[44]: array([2, 0, 0, 0], dtype=int16)
>
> In [45]: np.around([1.6,np.nan,np.inf,-np.inf], out=d)
> Out[45]: array([          2, -2147483648, -2147483648, -2147483648], 
> dtype=int32)
>
> Perhaps a starting point to harmonise this behaviour and get it closer to
> your expectations (it still would not be really nice having to define the
> output array first, I guess)...

what numpy is this?

>>> np.array([1.6, np.nan, np.inf, -np.inf]).astype(np.int16)
array([ 1, -32768, -32768, -32768], dtype=int16)
>>> np.__version__
'1.5.1'
>>> a = np.ones(4, np.int16)
>>> a[:]=np.array([1.6, np.nan, np.inf, -np.inf])
>>> a
array([ 1, -32768, -32768, -32768], dtype=int16)


I thought we get ValueError to avoid nan to zero bugs

>>> a[2] = np.nan
Traceback (most recent call last):
  File "", line 1, in 
a[2] = np.nan
ValueError: cannot convert float NaN to integer
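
For what it's worth, a rough sketch of the kind of conversion Matthew asked for (round to nearest, NaN to 0, +/-inf to the integer type's min/max; the helper name is made up):

import numpy as np

def nice_round(x, dtype=np.int16):
    info = np.iinfo(dtype)
    y = np.nan_to_num(np.asarray(x, dtype=np.float64))    # NaN -> 0, inf -> huge finite
    y = np.clip(np.rint(y), info.min, info.max)
    return y.astype(dtype)

print(nice_round([1.6, np.nan, np.inf, -np.inf]))   # -> [2, 0, 32767, -32768]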

Josef



>
> Cheers,
>                                                        Derek
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A Foundation for the support of NumPy and SciPy

2011-10-04 Thread josef . pktd
On Tue, Oct 4, 2011 at 7:36 PM, Nathaniel Smith  wrote:
> [Does the group actually exist yet? Google says: "No groups match
> fastecuhla." Replying here instead...]

https://groups.google.com/group/fastecuhla

>
> I've been following discussions around non-profit incorporation for
> FOSS projects for about a decade (including some years on the internal
> mailing list for SPI Inc. -- Debian's non-profit foundation). My
> strong recommendation is that we not do it ourselves. Setting up our
> own non-profit takes an immense amount of energy, and keeping it going
> requires continuing to jump through annoying hoops on a regular basis
> (you must have a procedure for selecting a board; the board must meet
> on some regular schedule, achieve quorum, and regularly elect
> officers; each board meeting must have minutes produced and approved,
> you must file taxes on time, ...), and it's expensive to boot (you'll
> need a professional accountant, etc.). As a result, most projects that
> try going it on their own end up with a horrible mess sooner or later.
> It works okay if you're, say, Gnome, but most projects are not Gnome.
>
> But fortunately, this is a solved problem: there are several
> non-profit umbrella corporations that are set up to let experts take
> care of this nonsense and amortize the costs over multiple projects.
> The Software Freedom Conservancy is probably the most well put
> together:
>   http://www.sfconservancy.org/overview/
>   http://www.sfconservancy.org/members/services/
>   http://sfconservancy.org/about/board/
> Many large projects with complicated legal situations like Samba,
> Busybox, jQuery, Wine, Boost, ... have also chosen this approach:
>   http://www.sfconservancy.org/members/current/
>
> TL;DR: When it comes to legal matters: starting your own non-profit is
> to joining an existing umbrella non-profit as CVS is to git. (And in
> fact git is also a SF Conservancy member.)
>
> My $0.02,
> -- Nathaniel
>
> On Tue, Oct 4, 2011 at 1:58 PM, Travis Oliphant  wrote:
>> Hi all,
>>
>> At the recent US SciPy conference and at other times in the past I have been 
>> approached about the possibility of creating a foundation to support the 
>> development of SciPy and NumPy.
>>
>> I know there are varying opinions about that, but I am generally very 
>> supportive of the idea and would like to encourage it as much as I can.   It 
>> would be interesting to have a public discussion of the issues, but these 
>> discussions should not clog the main list of either NumPy or SciPy.
>>
>> As a result, there has been set up a public mailing list for discussion of 
>> the creation of a Foundation for the Advancement of Scientific, Technical, 
>> and Engineering Computing Using High Level Abstractions (FASTECUHLA).     
>> The list is fastecu...@googlegroups.com
>>
>> This is a place-holder name that can be replaced if somebody comes up with a 
>> better one.     Please sign up for that list if you would like to contribute 
>> to the discussion.
>>
>> The items to discuss include:
>>        * where to organize
>>        * what the purposes should be
>>        * who should be members
>>        * where should money come from
>>        * what other organizations exist that we could either piggy-back on 
>> or emulate
>>        * what are the pitfalls on starting a foundation to support NumPy and 
>> SciPy versus other approaches
>>        * who has time to participate in its organization and maintenance
>>
>> One important feature is that I see this foundation as a service opportunity 
>> and obligation and not as a "feather-in-the-cap" or something to join 
>> lightly.   I'm hopeful that it can be a place where people and organizations 
>> can donate money and know that it will be going directly to further the core 
>> packages for Scientific Computing with Python.
>>
>> Thank you,
>>
>> -Travis
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dealing with arrays

2011-09-20 Thread josef . pktd
On Tue, Sep 20, 2011 at 10:06 AM,   wrote:
> Hi Chico, my list needs to be converted to a numpy array first before I do 
> the count. Some other calculations rely on the count and its going to be 
> built all on numpy.
>
> I have tried the collections.Counter() option but in python 2.6, the module 
> collections has no Counter function.
>
> Any more suggestions, please.
> Sent from my BlackBerry wireless device from MTN
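
A quick counting sketch (not from the thread) for integer-coded arrays like the ones quoted below; np.bincount assumes small non-negative integer codes:

import numpy as np

a = np.array([2, 2, 1, 3, 2, 1])   # made-up example data
counts = np.bincount(a)            # counts[v] = number of occurrences of value v
n_ones = (a == 1).sum()            # count of a single value
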
>
> -Original Message-
> From: ch...@cnpmf.embrapa.br
> Sender: numpy-discussion-boun...@scipy.org
> Date: Tue, 20 Sep 2011 11:00:08
> To: Discussion of Numerical Python
> Reply-To: Discussion of Numerical Python 
> Subject: Re: [Numpy-discussion] Dealing with arrays
>
> What about:
>
> >>> listoflists = [[1,1,1], [1,2,3], [2,2,2]]
> >>> for eachlist in listoflists:
> ...     print eachlist.count(1), eachlist.count(2), eachlist.count(3)
> ...
> 3 0 0
> 1 1 1
> 0 3 0

>
> Chico
>
>
>
>> Hi All,
>>
>> I am extracting a list of lists from an application. In this case, my list
>> has 12 lists.
>> I then coverted each list into a numpy array. This returns 12 different
>> arrays. Very well. The 12 different arrays are like as follows:
>> [2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 1 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 1 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2
>>  2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2
>>  2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 1 2
>>  2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 1 2 2
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2
>>  2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2 2 2 2 1
>>  2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2
>>  2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 1 2 2 2
>>  1 2 2 2 2 2]
>> [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>>  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Re: [Numpy-discussion] datetime64 y2k38 bug

2011-09-18 Thread josef . pktd
On Sun, Sep 18, 2011 at 11:13 PM, Charles R Harris
 wrote:
>
>
> On Sun, Sep 18, 2011 at 9:08 PM, Charles R Harris
>  wrote:
>>
>>
>> On Sun, Sep 18, 2011 at 6:32 PM, Benjamin Root  wrote:
>>>
>>> I was working on adding some test cases in numpy for the argmin/max
>>> functions with some datetime64s.  I found that on my 32-bit machine, it
>>> fails to parse a date past the Y2.038k date.  I find this odd because the
>>> datetime is supposed to be 64-bits, but I guess there is some arch-dependent
>>> code somewhere?
>>>
>>
>> I think that is actually POSIX for the time_t structure. Which is not to
>> say it's good ;) Google UNIX Year 2038 problem. ISTR reading recently that
>> there is a movement afoot to fix the time_t structure on 32 bit machines for
>> Linux. You've got to wonder, what were the POSIX people thinking?
>>
>
> See comments here.



Thanks for the entertaining link

"
I think it's still perfectly valid to say "you're a moron,
and we need to fix it"
"
(just a quote, doesn't apply to the python community)

Josef

>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Preventing an ndarray subclass from returning new subclass instances for std(), etc

2011-09-18 Thread josef . pktd
On Sun, Sep 18, 2011 at 12:48 PM, Keith Hughitt  wrote:
> Interesting. It works as expected when called as a method:
>
> In [10]: x = np.ma.array([[1,2,3]])
> In [11]: x.std()
> Out[11]: 0.81649658092772603

That's only the treatment of scalars in std

>>> x = np.ma.array(np.random.randn(4,5))
>>> x.std(0)
masked_array(data = [0.842985810318 0.423312129749 0.607139983509
0.583121781933 0.681360542632],
 mask = [False False False False False],
   fill_value = 1e+20)

Josef

>
> but not for my class:
>
> In [14]: aia.std()
> Out[14]: AIAMap(292.4342470467856)
> In [15]: np.std(aia)
> Out[15]: AIAMap(292.4342470467856)
> In [16]: np.array(aia).std()
> Out[16]: 292.43424704678557
>
> Keith
> 2011/9/18 Stéfan van der Walt 
>>
>> On Sun, Sep 18, 2011 at 9:09 AM, Keith Hughitt 
>> wrote:
>> > I'm sure it is simply a coding error on my part, but so far I havne't
>> > been
>> > able to track it down.
>>
>> I don't think it's something you've done wrong.  See e.g.:
>>
>> In [9]: x = np.ma.array([[1,2,3]])
>>
>> In [10]: np.std(x, axis=0)
>> Out[10]:
>> masked_array(data = [0.0 0.0 0.0],
>>             mask = [False False False],
>>       fill_value = 1e+20)
>>
>> Regards
>> Stéfan
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Bug in numpy std, etc. with other data structures?

2011-09-17 Thread josef . pktd
On Sat, Sep 17, 2011 at 5:12 PM, Wes McKinney  wrote:
> On Sat, Sep 17, 2011 at 4:48 PM, Skipper Seabold  wrote:
>> Just ran into this. Any objections for having numpy.std and other
>> functions in core/fromnumeric.py call asanyarray before trying to use
>> the array's method? Other data structures like pandas and larry define
>> their own std method, for instance, and this doesn't allow them to
>> pass through. I'm inclined to say that the issue is with numpy, though
>> maybe the data structures shouldn't shadow numpy array methods while
>> altering the signature. I dunno.
>>
>> df = pandas.DataFrame(np.random.random((10,5)))
>>
>> np.std(df,axis=0)
>> 
>> TypeError: std() got an unexpected keyword argument 'dtype'
>>
>> np.std(np.asanyarray(df),axis=0)
>> array([ 0.30883352,  0.3133324 ,  0.26517361,  0.26389029,  0.20022444])
>>
>> Though I don't think this would work with larry yet.
>>
>> Pull request: https://github.com/numpy/numpy/pull/160
>>
>> Skipper
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> Note I've no real intention of making DataFrame fully ndarray-like--
> but it's nice to be able to type:
>
> df.std(axis=0)
> df.std(axis=1)
> np.sqrt(df)
>
> etc. which works the same as ndarray. I suppose the
> __array__/__array_wrap__ interface is there largely as a convenience.

I'm a bit worried about the different ddof defaults in cases like
this. Essentially we will not be able to rely on ddof=0 anymore.
Different defaults on axis are easy to catch, but having the same
function call return sometimes ddof=0 and sometimes ddof=1 might make
for some fun debugging.
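
A minimal illustration of the ddof pitfall (not from the thread; numpy's std defaults to ddof=0, pandas' DataFrame.std to ddof=1):

import numpy as np

x = np.arange(5.0)
np.std(x)            # 1.414..., population std (ddof=0)
np.std(x, ddof=1)    # 1.581..., sample std (ddof=1), what pandas uses by default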

Josef


> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] lstsq: masked arrays, weights, scaling, and covariance

2011-09-17 Thread josef . pktd
On Sat, Sep 17, 2011 at 2:52 PM, Charles R Harris
 wrote:
> Hi All,
>
> I'd like to start a discussion about modifications to lstsq to accommodate
> the new masked arrays and move weights, scaling, and covariance
> determination down to a lower common level. This is motivated by Travis'
> recent changes to polyfit as well as my own various polynomial fits that
> also allow weights. Also, once these features are pushed down to lstsq, it
> should be possible to push them down further into a c-wrapper for the LAPACK
> routines, which is where I really think they belong in the long run.
>
> Because missing values will effect the std/var/cov in the same way as
> weights of zero, I think support for missing values and weights go naturally
> together. Support for scaling and covariance are less closely tied, but they
> are both features I use all the time in practice and having them available
> will be useful.  It might also be nice to change the return signature,
> though this would require a new function. I rather like the idea of
> returning the coefficients and a dictionary, where everything not a
> coefficient gets stuffed into the dictionary. In this regard see also Denis
> Laxalde's proposal, something we might want to be consistent with.
>
> Thoughts?

What's the speed penalty if we just want to use numpy/scipy linalg as
a library and don't need any of the extra features?

As some of the discussions have shown it can be pretty expensive to
use linalg in loops.

Josef

>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pull Request Review: R-like sample function

2011-09-01 Thread josef . pktd
On Thu, Sep 1, 2011 at 6:02 PM, Christopher Jordan-Squire
 wrote:
> Hi--I've just submitted a numpy 2.0 pull request for a function sample
> in np.random. It's essentially an implementation of R's sample
> function. It allows possibly non-uniform, possibly without-replacement
> sampling from a given 1-D array-like. This is very useful for quickly
> and cleanly creating samples from, for example, a list of strings or a
> list of non-contiguous, non-evenly spaced integers. Both occur in data
> analysis with categorical data.
>
> It is, essentially, a convenience function that wraps a number of
> existing ways to take a random sample. I think it belongs in
> numpy.random rather than scipy.stats because it's just a random
> sampler, rather than a probability distribution. It isn't possible to
> define a scipy.stats discrete random variable on strings--it would
> have to instead be done on the indices of the list containing the
> possible samples. And (as far as I can tell) the scipy.stats
> distributions can't be used for sampling without replacement.
>
> https://github.com/numpy/numpy/pull/151

I don't think you can kill numpy.random.random and similar functions
mixed in with a commit that adds a new function.

First these functions would need to be deprecated.

"it does not break the API as the previous function was not in the docs"

This is a doc bug, I assume. I don't think it means users/developers
don't rely on it.

searching for np.random.random shows 120 threads in my gmail reader,
python uses random.random()
dir(np.random) shows it
I copied it from mailing list examples. It's used quite a bit in
scipy, as I saw because of your work.

I also find the historical multiplicity of aliases confusing, but
which names should be deprecated would at least require a discussion
and a separate commit.

Josef


>
> -Chris JS
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] non-uniform discrete sampling with given probabilities (w/ and w/o replacement)

2011-08-31 Thread josef . pktd
On Wed, Aug 31, 2011 at 3:22 PM, Olivier Delalleau  wrote:
> 2011/8/31 Christopher Jordan-Squire 
>>
>> On Wed, Aug 31, 2011 at 2:07 PM, Olivier Delalleau  wrote:
>> > You can use:
>> > 1 + numpy.argmax(numpy.random.multinomial(1, [0.1, 0.2, 0.7]))
>> >
>> > For your "real" application you'll probably want to use a value >1 for
>> > the
>> > first parameter (equal to your sample size), instead of calling it
>> > multiple
>> > times.
>> >
>> > -=- Olivier
>>
>> Thanks. Warren (Weckesser) mentioned this possibility to me yesterday
>> and I forgot to put it in my post. I assume you mean something like
>>
>> x = np.arange(3)
>> y = np.random.multinomial(30, [0.1,0.2,0.7])
>> z = np.repeat(x, y)
>> np.random.shuffle(z)
>>
>> That look right?
>>
>> -Chris JS
>>
>
> Yes, exactly.

Chuck's answer to the same question, when I asked on the list, used
searchsorted and is fast

cdfvalues.searchsorted(np.random.random(size))

my recent version of it for FiniteLatticeDistribution

    def rvs(self, size=1):
        '''draw random variables with shape given by size

        '''
        #w = self.pdfvalues
        #p = cumsum(w)/float(w.sum())
        #p.searchsorted(np.random.random(size))
        return self.support[self.cdfvalues.searchsorted(np.random.random(size))]
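
A self-contained version of the same idea (a sketch; the support and probabilities are made up):

import numpy as np

support = np.array([0, 2, 5, 9])             # possible values
pdfvalues = np.array([0.1, 0.2, 0.3, 0.4])   # their probabilities
cdfvalues = np.cumsum(pdfvalues)

def rvs(size=1):
    # inverse-cdf sampling on a discrete support via searchsorted
    return support[cdfvalues.searchsorted(np.random.random(size))]

print(rvs(size=10))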

Josef


>
> -=- Olivier
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] converting standard array to record array

2011-08-30 Thread josef . pktd
On Tue, Aug 30, 2011 at 4:34 PM, Bryce Ready  wrote:
> Hello all,
>
> So i'm using numpy 1.6.0, and trying to convert a (4,4) numpy array of dtype
> 'f8' into a record array of this dtype:
>
>> dt = np.dtype([('mat','(4,4)f8')])
>
> Here is the code snippet:
>
>> In [21]: a = np.random.randn(4,4)
>>
>> In [22]: a.view(dt)
>
> and the resulting error:
>
>> ValueError: new type not compatible with array.
>
> Can anyone shed some light for me on why this conversion is not possible?
> It is certainly technically possible, since the memory layout of the two
> arrays should be the same.
>
> Can anyone recommend a better way to do this conversion?

I guess it can only convert along the last axis; the last axis (each row) needs to match the memory size (itemsize) of the dt

>>> np.random.randn(4,4).ravel().view(dt).shape
(1,)
>>> np.random.randn(2,4,4).reshape(-1,16).view(dt)
array([[ ([[1.7107996212005496, 0.64334162481360346,
-2.1589367225479004, 1.2302260107072134], [0.90703092017458831,
-1.0297890301610224, -0.095086304368665275, 0.35407366904038495],
[-1.1083969421298907, 0.83307347286837752, 0.39886399402076494,
0.26313136034262563], [0.81860729029038914, -1.1443047382313905,
0.73326737255810859, 0.34482475392499168]],)],
   [ ([[0.69027418489768777, 0.25867753263599164,
1.0320990807184023, 0.21836691513066409], [0.45913017094388614,
-0.89570247025515981, 0.76452726059163534, -2.2953009964941642],
[0.60248580944596275, 1.0863090037733505, -0.10849220482850662,
-0.19176089514256078], [-1.0700600508627109, -1.4743316703511105,
0.79193567523155062, 0.82243321942810521]],)]],
   dtype=[('mat', '<f8', (4, 4))])
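
In other words (a minimal sketch), a single (4,4) array can be viewed as one record by flattening it first:

import numpy as np

dt = np.dtype([('mat', '(4,4)f8')])
a = np.random.randn(4, 4)

rec = a.ravel().view(dt)     # flatten first: 16 float64 == one dt record
print(rec.shape)             # (1,)
print(rec['mat'].shape)      # (1, 4, 4)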
> Thanks in advance!
>
> -Bryce Ready
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix

2011-08-26 Thread josef . pktd
On Fri, Aug 26, 2011 at 2:57 PM, Charles R Harris
 wrote:
>
>
> On Fri, Aug 26, 2011 at 12:38 PM, Mark Janikas  wrote:
>>
>> Charles!  That looks like it could be a winner!  It looks like you always
>> choose the last column of the U matrix and ID the columns that have the same
>> values?  It works when I add extra columns as well!  BTW, sorry for my lack
>> of knowledge… but what was the point of the dot multiply at the end?  That
>> they add up to essentially zero, indicating singularity?  Thanks so much!
>
> The indicator of collinearity is the singular value in d, the corresponding
> column in u represent the linear combination of rows that are ~0, the
> corresponding row in v represents the linear combination of columns that are
> ~0. If you have several combinations that are ~0, of course you can add them
> together and get another. Basically, if you take the rows in v corresponding
> to small singular values, you get a basis for the for the null space of the
> matrix, the corresponding columns in u are a basis for the orthogonal
> complement of the range of the matrix. If that is getting a bit technical
> you can just play around with things.

Interpretation is a bit difficult if there is more than one zero singular value

>>> zt2 = np.vstack((zt, zt[2,:] + zt[3,:]))
>>> zt2
array([[ 1.  ,  1.  ,  1.  ,  1.  ,  1.  ],
   [ 0.25,  0.1 ,  0.2 ,  0.25,  0.5 ],
   [ 0.75,  0.9 ,  0.8 ,  0.75,  0.5 ],
   [ 3.  ,  8.  ,  0.  ,  5.  ,  0.  ],
   [ 3.75,  8.9 ,  0.8 ,  5.75,  0.5 ]])
>>> u,d,v = np.linalg.svd(zt2)
>>> d
array([  1.51561431e+01,   1.91327688e+00,   3.25113875e-01,
 1.05664844e-15,   5.29054218e-16])
>>> u[:,-2:]
array([[ 0.59948553, -0.12496837],
   [-0.59948553,  0.12496837],
   [-0.51747833, -0.48188813],
   [ 0.0820072 , -0.60685651],
   [-0.0820072 ,  0.60685651]])
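
A self-contained sketch of the identification step (the thresholds are arbitrary choices; variables are in rows, as in the zt example quoted further down):

import numpy as np

zt = np.array([[1.  , 1.  , 1.  , 1.  , 1.  ],
               [0.25, 0.1 , 0.2 , 0.25, 0.5 ],
               [0.75, 0.9 , 0.8 , 0.75, 0.5 ],
               [3.  , 8.  , 0.  , 5.  , 0.  ]])

u, d, v = np.linalg.svd(zt)
tol = 1e-10 * d[0]                  # relative threshold for "zero" singular values
for i in np.flatnonzero(d < tol):
    # rows of zt entering a (near) zero linear combination
    rows = np.flatnonzero(np.abs(u[:, i]) > 1e-10)
    print("collinear rows:", rows, "weights:", u[rows, i])

For this zt it reports rows 0, 1, 2, i.e. row1 + row2 - row0 = 0.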

Josef

>
> 
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix

2011-08-26 Thread josef . pktd
On Fri, Aug 26, 2011 at 1:41 PM, Mark Janikas  wrote:
> I wonder if my last statement is essentially the only answer... which I 
> wanted to avoid...
>
> Should I just use combinations of the columns and try and construct the 
> corrcoef() (then ID whether NaNs are present), or use the condition number to 
> ID the singularity?  I just wanted to avoid the whole k! algorithm.
>
> MJ
>
> -Original Message-
> From: numpy-discussion-boun...@scipy.org 
> [mailto:numpy-discussion-boun...@scipy.org] On Behalf Of Mark Janikas
> Sent: Friday, August 26, 2011 10:35 AM
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix
>
> I actually use the VIF when the design matrix can be inverted I do it the 
> quick and dirty way as opposed to the step regression:
>
> 1. Calc the correlation coefficient of the matrix (w/o the intercept)
> 2. Return the diagonal of the inversion of the correlation matrix in step 1.
>
> Again, the problem lies in the multiple column relationship... I wouldn't be 
> able to run sub regressions at all when the columns are perfectly collinear.
>
> MJ
>
> -Original Message-
> From: numpy-discussion-boun...@scipy.org 
> [mailto:numpy-discussion-boun...@scipy.org] On Behalf Of Skipper Seabold
> Sent: Friday, August 26, 2011 10:28 AM
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix
>
> On Fri, Aug 26, 2011 at 1:10 PM, Mark Janikas  wrote:
>> Hello All,
>>
>>
>>
>> I am trying to identify columns of a matrix that are perfectly collinear.
>> It is not that difficult to identify when two columns are identical or have
>> zero variance, but I do not know how to ID when the culprit is of a higher
>> order. i.e. columns 1 + 2 + 3 = column 4.  NUM.corrcoef(matrix.T) will
>> return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide
>> a very large condition number.. But they do not tell me which columns are
>> causing the problem.   For example:
>>
>>
>>
>> zt = numpy. array([[ 1.  ,  1.  ,  1.  ,  1.  ,  1.  ],
>>
>>    [ 0.25,  0.1 ,  0.2 ,  0.25,  0.5 ],
>>
>>    [ 0.75,  0.9 ,  0.8 ,  0.75,  0.5 ],
>>
>>    [ 3.  ,  8.  ,  0.  ,  5.  ,  0.  ]])
>>
>>
>>
>> How can I identify that columns 0,1,2 are the issue because: column 1 +
>> column 2 = column 0?
>>
>>
>>
>> Any input would be greatly appreciated.  Thanks much,
>>
>
> The way that I know to do this in a regression context for (near
> perfect) multicollinearity is VIF. It's long been on my todo list for
> statsmodels.
>
> http://en.wikipedia.org/wiki/Variance_inflation_factor
>
> Maybe there are other ways with decompositions. I'd be happy to hear about 
> them.
>
> Please post back if you write any code to do this.

Partial answer in a different context: I have written a function that
only adds columns if they maintain invertibility, using brute force:
add each column sequentially and check whether the resulting matrix is singular.
Don't add the columns that are already included as a linear combination of the others.
(But this doesn't tell you which columns are in the collinear combination.)
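
A rough sketch of that brute-force idea (not the original code; the helper name and tolerance are made up):

import numpy as np

def select_independent_columns(X, tol=1e-10):
    """Keep only columns that increase the rank, i.e. drop columns that
    are (numerically) linear combinations of the columns already kept."""
    kept = X[:, :1]
    kept_idx = [0]
    for j in range(1, X.shape[1]):
        trial = np.column_stack([kept, X[:, j]])
        if np.linalg.matrix_rank(trial, tol=tol) > np.linalg.matrix_rank(kept, tol=tol):
            kept = trial
            kept_idx.append(j)
    return kept, kept_idx

# e.g. select_independent_columns(zt.T) drops the column that is a linear
# combination of the earlier ones.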

I did this for categorical variables, so sequence was predefined.

Just finding a non-singular subspace would be easier, PCA, SVD, or
scikits.learn matrix decomposition (?).

(factor models and Johansen's cointegration tests are also just doing
matrix decomposition that identify subspaces)

Maybe rotation in Factor Analysis is able to identify the vectors, but
I don't have much idea about that.

Josef

>
> Skipper
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] c-info.ufunc-tutorial.rst

2011-08-24 Thread josef . pktd
On Wed, Aug 24, 2011 at 8:43 PM, Anthony Scopatz  wrote:
>
>
> On Wed, Aug 24, 2011 at 7:34 PM, srean  wrote:
>>
>> Thanks Anthony and Mark, this is good to know.
>>
>> So what would be the advised way of looking at freshly baked documentation
>> ? Just look at the raw files ? or is there some place else where the correct
>> sphinx rendered docs are hosted.
>
> Building the docs yourself is probably the safest bet.  However, someone
> should probably hook up the numpy and scipy repos to readthedocs.org.  That
> would solve this problem...

Maybe someone just needs to add it here
http://docs.scipy.org/numpy/docs/numpy-docs/user/c-info.rst/#c-info

and it would show up in numpy's own docs, which are hooked up to the
repo, as far as I know.

Josef


>
>>
>> On Wed, Aug 24, 2011 at 7:19 PM, Anthony Scopatz 
>> wrote:
>>>
>>> code-block:: is a directive that I think might be specific to sphinx.
>>>  Naturally, github's renderer will drop it.
>>> On Wed, Aug 24, 2011 at 7:10 PM, Mark Wiebe  wrote:

 I believe this is because of github's .rst processor which simply drops
 blocks it can't understand. When building NumPy documentation, many more
 extensions and context exists. I'm getting the same thing in the C-API
 NA-mask documentation I just posted.
 -Mark
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] life expectancy of scipy.stats nan statistics

2011-08-19 Thread josef . pktd
I'm just looking at http://projects.scipy.org/scipy/ticket/1200

I agree with Ralf that the bias keyword should be changed to ddof as
in the numpy functions. For functions in scipy.stats, and statistics
in general, I prefer the usual axis=0 default.

However, I think these functions, like scipy.stats.nanstd, should be
replaced by corresponding numpy functions, which might happen
relatively soon. But how soon?

Is it worth deprecating bias in scipy 0.10, and then deprecate again
for removal in 0.11 or 0.12?

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-User] disabling SVN (was: Trouble installing scipy after upgrading to Mac OS X 10.7 aka Lion)

2011-08-13 Thread josef . pktd
On Sat, Aug 13, 2011 at 3:45 PM, Pauli Virtanen  wrote:
> Sat, 13 Aug 2011 18:14:11 +0200, Ralf Gommers wrote:
> [clip]
>> We should check if there's still any code in SVN branches that is
>> useful.
>> If so the people who are interested in it should move it somewhere else.
>> Anyone?
>
> All the SVN branches are available in Git, though some are hidden. Do
>
>        git fetch upstream +refs/*:refs/remotes/upstream/everything/*
>
> and you shall receive (also some Github's internal branches named pull/*).
> However, AFAIK, there's not so much useful in there.
>
> In any case, as far as I'm aware, the SVN can be safely be turned off,
> both for Numpy and Scipy. The admins can access the original repository
> on the server, so if something turns out to be missed, it can be brought
> back.
>
>        Pauli

Does Trac require svn access to dig out old information?
for example links to old changesets, annotate/blame, ... ?

Josef


>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpydoc - latex longtables error

2011-08-10 Thread josef . pktd
On Wed, Aug 10, 2011 at 6:17 PM, Matthew Brett  wrote:
> Hi,
>
> On Wed, Aug 10, 2011 at 12:38 PM, Skipper Seabold  wrote:
>> On Wed, Aug 10, 2011 at 3:28 PM, Matthew Brett  
>> wrote:
>>> Hi,
>>>
>>> I think this one might be for Pauli.
>>>
>>> I've run into an odd problem that seems to be an interaction of
>>> numpydoc and autosummary and large classes.
>>>
>>> In summary, large classes and numpydoc lead to large tables of class
>>> methods, and there seems to be an error in the creation of the large
>>> tables in latex.
>>>
>>> Specifically, if I run 'make latexpdf' with the attached minimal
>>> sphinx setup, I get a pdflatex error ending thus:
>>>
>>> ...
>>> l.118 \begin{longtable}{LL}
>>>
>>> and this is because longtable does not accept LL as an argument, but
>>> needs '|l|l|' (bar - el - bar - el - bar).
>>>
>>> I see in sphinx.writers.latex.py, around line 657, that sphinx knows
>>> about this in general, and long tables in standard ReST work fine with
>>> the el-bar arguments passed to longtable.
>>>
>>>        if self.table.colspec:
>>>            self.body.append(self.table.colspec)
>>>        else:
>>>            if self.table.has_problematic:
>>>                colwidth = 0.95 / self.table.colcount
>>>                colspec = ('p{%.3f\\linewidth}|' % colwidth) * \
>>>                          self.table.colcount
>>>                self.body.append('{|' + colspec + '}\n')
>>>            elif self.table.longtable:
>>>                self.body.append('{|' + ('l|' * self.table.colcount) + '}\n')
>>>            else:
>>>                self.body.append('{|' + ('L|' * self.table.colcount) + '}\n')
>>>
>>> However, using numpydoc and autosummary (see the conf.py file), what
>>> seems to happen is that, when we reach the self.table.colspec test at
>>> the beginning of the snippet above, 'self.table.colspec' is defined:
>>>
>>> In [1]: self.table.colspec
>>> Out[1]: '{LL}\n'
>>>
>>> and thus the LL gets written as the arg to longtable:
>>>
>>> \begin{longtable}{LL}
>>>
>>> and the pdf build breaks.
>>>
>>> I'm using the numpydoc out of the current numpy source tree.
>>>
>>> At that point I wasn't sure how to proceed with debugging.  Can you
>>> give any hints?
>>>
>>
>> It's not a proper fix, but our workaround is to edit the Makefile for
>> latex (and latexpdf) to
>>
>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/Makefile#L94
>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/make.bat#L121
>>
>> to call the script to replace the longtable arguments
>>
>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/fix_longtable.py
>>
>> The workaround itself probably isn't optimal, and I'd be happy to hear
>> of a proper fix.
>
> Thanks - yes - I found your workaround in my explorations, I put in a
> version in our tree too:
>
> https://github.com/matthew-brett/nipy/blob/latex_build_fixes/tools/fix_longtable.py
>
>  - but I agree it seems much better to get to the root cause.

When I tried to figure this out, I never found out why the correct
sphinx longtable code path never gets reached, or which code
(numpydoc, autosummary or sphinx) is filling in the colspec.

Josef
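
For reference, the post-processing workaround boils down to something like this (a sketch only; the build path and regex are assumptions, not the actual fix_longtable.py):

import glob
import re

# rewrite the column spec that numpydoc/autosummary leaves in longtables
# so pdflatex accepts it, e.g. {LL} -> {|l|l|}
for fname in glob.glob('build/latex/*.tex'):      # path is an assumption
    with open(fname) as f:
        tex = f.read()
    tex = re.sub(r'\\begin\{longtable\}\{(L+)\}',
                 lambda m: r'\begin{longtable}{|' + 'l|' * len(m.group(1)) + '}',
                 tex)
    with open(fname, 'w') as f:
        f.write(tex)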

>
> See you,
>
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] hold parameters

2011-08-02 Thread josef . pktd
On Mon, Aug 1, 2011 at 8:43 PM, Bevan Jenkins  wrote:
> Hello,
>
> I have a function that I fitting to a curve via scipy.optimize.leastsq.  The
> function has 4 parameters and this is all working fine.
>
> For a site, I have a number of curves (n=10 in the example below).  I would
> like some of the parameters to be the best fit across all curves (best fit
> for a site) while letting the other parameters vary for each curve.  I have
> this working as well.
>
> The issue is that I would like to be able to vary this per run.  That is, do a run
> where parameter1 is best fit for the entire site, with the remaining three
> varying per curve. Then on the next run, have two parameters being held or
> fitted for all curves at once.  Or be able to do a run where all 4 parameters
> are fit for each individual curve.
>
> Using my e.g. below, if I change the 'fix' dict, so that 'a','b', and 'c' are
> True, with 'd' False, then I will have to change the zip to
> for a,b,c in zip(a,b,c):
>    solve(a,b,c,d)
>
> I would prefer to find a way to do this via code.  I hope this example makes
> sense.  The code below is all within my objective function that is being
> called by scipy.optimize.leastsq.
> import numpy as np
>
> def solve(a,b,c,d):
>    print a,b,c,d
>    #return x*a*b*c*d
>
>
>
> fix = {"a":True,"b":True,"c":False,"d":False}
>
> n=10
> params = np.array([0,1,2,3]*n)
> params = params.reshape(-1,4)
>
> if fix["a"] is True:
>    a = params[0,0]
> else:
>    a = params[:,0]
> if fix["b"] is True:
>    b = params[0,1]
> else:
>    b = params[:,1]
> if fix["c"] is True:
>    c = params[0,2]
> else:
>    c = params[:,2]
> if fix["d"] is True:
>    d = params[0,3]
> else:
>    d = params[:,3]
>
> res=[]
> for c,d in zip(c,d):
>    res = solve(a,b,c,d)
>    #res = solve(a,b,c,d)-self.orig
> #return np.hstack(res)**2


I'm not a fan of named individual parameters (or *args) for function
arguments when the number of arguments varies.

What I'm using is a full parameter array with nan's

fixed = np.array([np.nan, np.nan, c, d])  # fix c, d; nan marks a free parameter

def func(args):
    full = fixed.copy()             # work on a copy so the nan "mask" survives repeated calls
    full[np.isnan(fixed)] = args    # fill the free slots with the current values
    a, b, c, d = full
    ...

# starting values, keeping only the free (nan) parameters
allstartvals = np.array([a0, b0, c0, d0])
startvals = allstartvals[np.isnan(fixed)]

optimize.leastsq(func, startvals, other_args)

or something like this.
I find it easier to keep track of the parameters, if I just have an
array or tuple.

for an alternative, Travis used a different way in the scipy.stats
implementation of partially fixed parameters for distributions fit
with named arguments. (I don't remember the details)

Josef



>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] random numbers from arbitrary distribution

2011-07-07 Thread josef . pktd
On Thu, Jul 7, 2011 at 9:26 AM, Wolfgang Kerzendorf
 wrote:
> Hi all,
>
> Is there an  way to get random numbers from an arbitrary distribution
> already built-in to numpy. I am interested to do that for a black body
> distribution

What's a black body distribution? From a quick look at Wikipedia it
might be planck, which is available in scipy.stats.

A new distribution can be generated by subclassing
scipy.stats.distributions.rv_continuous
It has a generic random variable generator using the quantile
function, ppf. If the ppf is available, then this is fast. If only the
pdf is given, generating random variables is slow. (cdf is
calculated by integrate.quad, ppf is generated from cdf with
optimize.fsolve)
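
A minimal sketch of the subclassing approach (the density here is just a toy example, not a black body/Planck pdf):

import numpy as np
from scipy import stats

class MyDist(stats.rv_continuous):
    # only _pdf is provided, so cdf/ppf/rvs fall back to the generic
    # (slow) numerical machinery in rv_continuous
    def _pdf(self, x):
        return 1.5 * x**2      # toy density on [-1, 1]

mydist = MyDist(a=-1.0, b=1.0, name='mydist')
print(mydist.rvs(size=5))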

Some distributions can be generated by a transformation of variables,
which can also be very fast.

Nothing else to generate arbitrary random variables is available in
numpy or scipy.

(I just started to read a paper that uses a piecewise polynomial
approximation to the ppf that should work for fast generation of
random variables when only the pdf is given.)

Josef


>
> Thanks
>    Wolfgang
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NA/Missing Data Conference Call Summary

2011-07-06 Thread josef . pktd
On Wed, Jul 6, 2011 at 7:14 PM, Christopher Jordan-Squire
 wrote:
>
>
> On Wed, Jul 6, 2011 at 3:47 PM,  wrote:
>>
>> On Wed, Jul 6, 2011 at 4:38 PM,   wrote:
>> > On Wed, Jul 6, 2011 at 4:22 PM, Christopher Jordan-Squire
>> >  wrote:
>> >>
>> >>
>> >> On Wed, Jul 6, 2011 at 1:08 PM,  wrote:
>> >>>
>> >>> On Wed, Jul 6, 2011 at 3:38 PM, Christopher Jordan-Squire
>> >>>  wrote:
>> >>> >
>> >>> >
>> >>> > On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker
>> >>> > 
>> >>> > wrote:
>> >>> >>
>> >>> >> Christopher Jordan-Squire wrote:
>> >>> >> > If we follow those rules for IGNORE for all computations, we
>> >>> >> > sometimes
>> >>> >> > get some weird output. For example:
>> >>> >> > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is
>> >>> >> > matrix
>> >>> >> > multiply and not * with broadcasting.) Or should that sort of
>> >>> >> > operation
>> >>> >> > through an error?
>> >>> >>
>> >>> >> That should throw an error -- matrix computation is heavily
>> >>> >> influenced
>> >>> >> by the shape and size of matrices, so I think IGNORES really don't
>> >>> >> make
>> >>> >> sense there.
>> >>> >>
>> >>> >>
>> >>> >
>> >>> > If the IGNORES don't make sense in basic numpy computations then I'm
>> >>> > kinda
>> >>> > confused why they'd be included at the numpy core level.
>> >>> >
>> >>> >>
>> >>> >> Nathaniel Smith wrote:
>> >>> >> > It's exactly this transparency that worries Matthew and me -- we
>> >>> >> > feel
>> >>> >> > that the alterNEP preserves it, and the NEP attempts to erase it.
>> >>> >> > In
>> >>> >> > the NEP, there are two totally different underlying data
>> >>> >> > structures,
>> >>> >> > but this difference is blurred at the Python level. The idea is
>> >>> >> > that
>> >>> >> > you shouldn't have to think about which you have, but if you work
>> >>> >> > with
>> >>> >> > C/Fortran, then of course you do have to be constantly aware of
>> >>> >> > the
>> >>> >> > underlying implementation anyway.
>> >>> >>
>> >>> >> I don't think this bothers me -- I think it's analogous to things
>> >>> >> in
>> >>> >> numpy like Fortran order and non-contiguous arrays -- you can
>> >>> >> ignore
>> >>> >> all
>> >>> >> that when working in pure python when performance isn't critical,
>> >>> >> but
>> >>> >> you need a deeper understanding if you want to work with the data
>> >>> >> in C
>> >>> >> or Fortran or to tune performance in python.
>> >>> >>
>> >>> >> So as long as there is an API to query and control how things work,
>> >>> >> I
>> >>> >> like that it's hidden from simple python code.
>> >>> >>
>> >>> >> -Chris
>> >>> >>
>> >>> >>
>> >>> >
>> >>> > I'm similarly not too concerned about it. Performance seems finicky
>> >>> > when
>> >>> > you're dealing with missing data, since a lot of arrays will likely
>> >>> > have
>> >>> > to
>> >>> > be copied over to other arrays containing only complete data before
>> >>> > being
>> >>> > handed over to BLAS.
>> >>>
>> >>> Unless you know the neutral value for the computation or you just want
>> >>> to do a forward_fill in time series, and you have to ask the user not
>> >>> to give you an unmutable array with NAs if they don't want extra
>> >>> copies.
>> >>>
>> >>> Josef
>> >>>
>> >>
>> >> Mean value replacement, or more generally single scalar value
>> >> replacement,
>> >> is generally not a good idea. It biases downward your standard error
>> >> estimates if you use mean replacement, and it will bias both if you use
>> >> anything other than mean replacement. The bias is gets worse with more
>> >> missing data. So it's worst in the precisely the cases where you'd want
>> >> to
>> >> fill in the data the most. (Though I admit I'm not too familiar with
>> >> time
>> >> series, so maybe this doesn't apply. But it's true as a general
>> >> principle in
>> >> statistics.) I'm not sure why we'd want to make this use case easier.
>>
>> Another qualification on this (I cannot help it).
>> I think this only applies if you use a prefabricated no-missing-values
>> algorithm. If I write it myself, I can do the proper correction for
>> the reduced number of observations. (similar to the case when we
>> ignore correlated information and use statistics based on uncorrelated
>> observations which also overestimate the amount of information we have
>> available.)
>>
>
> Can you do that sort of technique with longitudinal (panel) data? I'm
> honestly curious because I haven't looked into such corrections before. I
> haven't been able to find a reference after a few quick google searches. I
> don't suppose you know one off the top of your head?

I was thinking mainly of simple cases where the correction only
requires to correctly count the number of observations in order to
adjust the degrees of freedom. For example, statistical tests that are
based on relatively simple statistics or ANOVA which just needs a
correct counting of the number of observations by groups. (This might
be partially covered by any NA ufunc implementation, that does mean,
var and co

Re: [Numpy-discussion] NA/Missing Data Conference Call Summary

2011-07-06 Thread josef . pktd
On Wed, Jul 6, 2011 at 4:38 PM,   wrote:
> On Wed, Jul 6, 2011 at 4:22 PM, Christopher Jordan-Squire
>  wrote:
>>
>>
>> On Wed, Jul 6, 2011 at 1:08 PM,  wrote:
>>>
>>> On Wed, Jul 6, 2011 at 3:38 PM, Christopher Jordan-Squire
>>>  wrote:
>>> >
>>> >
>>> > On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker
>>> > 
>>> > wrote:
>>> >>
>>> >> Christopher Jordan-Squire wrote:
>>> >> > If we follow those rules for IGNORE for all computations, we
>>> >> > sometimes
>>> >> > get some weird output. For example:
>>> >> > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix
>>> >> > multiply and not * with broadcasting.) Or should that sort of
>>> >> > operation
>>> >> > through an error?
>>> >>
>>> >> That should throw an error -- matrix computation is heavily influenced
>>> >> by the shape and size of matrices, so I think IGNORES really don't make
>>> >> sense there.
>>> >>
>>> >>
>>> >
>>> > If the IGNORES don't make sense in basic numpy computations then I'm
>>> > kinda
>>> > confused why they'd be included at the numpy core level.
>>> >
>>> >>
>>> >> Nathaniel Smith wrote:
>>> >> > It's exactly this transparency that worries Matthew and me -- we feel
>>> >> > that the alterNEP preserves it, and the NEP attempts to erase it. In
>>> >> > the NEP, there are two totally different underlying data structures,
>>> >> > but this difference is blurred at the Python level. The idea is that
>>> >> > you shouldn't have to think about which you have, but if you work
>>> >> > with
>>> >> > C/Fortran, then of course you do have to be constantly aware of the
>>> >> > underlying implementation anyway.
>>> >>
>>> >> I don't think this bothers me -- I think it's analogous to things in
>>> >> numpy like Fortran order and non-contiguous arrays -- you can ignore
>>> >> all
>>> >> that when working in pure python when performance isn't critical, but
>>> >> you need a deeper understanding if you want to work with the data in C
>>> >> or Fortran or to tune performance in python.
>>> >>
>>> >> So as long as there is an API to query and control how things work, I
>>> >> like that it's hidden from simple python code.
>>> >>
>>> >> -Chris
>>> >>
>>> >>
>>> >
>>> > I'm similarly not too concerned about it. Performance seems finicky when
>>> > you're dealing with missing data, since a lot of arrays will likely have
>>> > to
>>> > be copied over to other arrays containing only complete data before
>>> > being
>>> > handed over to BLAS.
>>>
>>> Unless you know the neutral value for the computation or you just want
>>> to do a forward_fill in time series, and you have to ask the user not
>>> to give you an unmutable array with NAs if they don't want extra
>>> copies.
>>>
>>> Josef
>>>
>>
>> Mean value replacement, or more generally single scalar value replacement,
>> is generally not a good idea. It biases downward your standard error
>> estimates if you use mean replacement, and it will bias both if you use
>> anything other than mean replacement. The bias is gets worse with more
>> missing data. So it's worst in the precisely the cases where you'd want to
>> fill in the data the most. (Though I admit I'm not too familiar with time
>> series, so maybe this doesn't apply. But it's true as a general principle in
>> statistics.) I'm not sure why we'd want to make this use case easier.

Another qualification on this (I cannot help it).
I think this only applies if you use a prefabricated no-missing-values
algorithm. If I write it myself, I can do the proper correction for
the reduced number of observations. (similar to the case when we
ignore correlated information and use statistics based on uncorrelated
observations which also overestimate the amount of information we have
available.)

Josef

>
> We just discussed a use case for pandas on the statsmodels mailing
> list, minute data of stock quotes (prices), if the quote is NA then
> fill it with the last price quote. If it would be necessary for memory
> usage and performance, this can be handled efficiently and with
> minimal copying.
>
> If you want to fill in a missing value without messing up any result
> statistics, then there is a large literature in statistics on
> imputations, repeatedly assigning values to a NA from an underlying
> distribution. scipy/statsmodels doesn't have anything like this (yet)
> but R and the others have it available, and it looks more popular in
> bio-statistics.
>
> (But similar to what Dag said, for statistical analysis it will be
> necessary to keep case specific masks and data arrays around. I
> haven't actually written any missing values algorithm yet, so I'm
> quite again.)
>
> Josef
>
>> -Chris Jordan-Squire
>>
>>>
>>> > My primary concern is that the np.NA stuff 'just
>>> > works'. Especially since I've never run into use cases in statistics
>>> > where
>>> > the difference between IGNORE and NA mattered.
>>> >
>>> >
>>> >>
>>> >>
>>> >> --
>>> >> Christopher Barker, Ph.D.
>>> >> Oceanographer
>>> >>
>>> >> Emergency Response Division
>>> >

Re: [Numpy-discussion] NA/Missing Data Conference Call Summary

2011-07-06 Thread josef . pktd
On Wed, Jul 6, 2011 at 4:22 PM, Christopher Jordan-Squire
 wrote:
>
>
> On Wed, Jul 6, 2011 at 1:08 PM,  wrote:
>>
>> On Wed, Jul 6, 2011 at 3:38 PM, Christopher Jordan-Squire
>>  wrote:
>> >
>> >
>> > On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker
>> > 
>> > wrote:
>> >>
>> >> Christopher Jordan-Squire wrote:
>> >> > If we follow those rules for IGNORE for all computations, we
>> >> > sometimes
>> >> > get some weird output. For example:
>> >> > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix
>> >> > multiply and not * with broadcasting.) Or should that sort of
>> >> > operation
>> >> > through an error?
>> >>
>> >> That should throw an error -- matrix computation is heavily influenced
>> >> by the shape and size of matrices, so I think IGNORES really don't make
>> >> sense there.
>> >>
>> >>
>> >
>> > If the IGNORES don't make sense in basic numpy computations then I'm
>> > kinda
>> > confused why they'd be included at the numpy core level.
>> >
>> >>
>> >> Nathaniel Smith wrote:
>> >> > It's exactly this transparency that worries Matthew and me -- we feel
>> >> > that the alterNEP preserves it, and the NEP attempts to erase it. In
>> >> > the NEP, there are two totally different underlying data structures,
>> >> > but this difference is blurred at the Python level. The idea is that
>> >> > you shouldn't have to think about which you have, but if you work
>> >> > with
>> >> > C/Fortran, then of course you do have to be constantly aware of the
>> >> > underlying implementation anyway.
>> >>
>> >> I don't think this bothers me -- I think it's analogous to things in
>> >> numpy like Fortran order and non-contiguous arrays -- you can ignore
>> >> all
>> >> that when working in pure python when performance isn't critical, but
>> >> you need a deeper understanding if you want to work with the data in C
>> >> or Fortran or to tune performance in python.
>> >>
>> >> So as long as there is an API to query and control how things work, I
>> >> like that it's hidden from simple python code.
>> >>
>> >> -Chris
>> >>
>> >>
>> >
>> > I'm similarly not too concerned about it. Performance seems finicky when
>> > you're dealing with missing data, since a lot of arrays will likely have
>> > to
>> > be copied over to other arrays containing only complete data before
>> > being
>> > handed over to BLAS.
>>
>> Unless you know the neutral value for the computation or you just want
>> to do a forward_fill in time series, and you have to ask the user not
>> to give you an unmutable array with NAs if they don't want extra
>> copies.
>>
>> Josef
>>
>
> Mean value replacement, or more generally single scalar value replacement,
> is generally not a good idea. It biases downward your standard error
> estimates if you use mean replacement, and it will bias both if you use
> anything other than mean replacement. The bias is gets worse with more
> missing data. So it's worst in the precisely the cases where you'd want to
> fill in the data the most. (Though I admit I'm not too familiar with time
> series, so maybe this doesn't apply. But it's true as a general principle in
> statistics.) I'm not sure why we'd want to make this use case easier.

We just discussed a use case for pandas on the statsmodels mailing
list, minute data of stock quotes (prices), if the quote is NA then
fill it with the last price quote. If it would be necessary for memory
usage and performance, this can be handled efficiently and with
minimal copying.

If you want to fill in a missing value without messing up any result
statistics, then there is a large literature in statistics on
imputations, repeatedly assigning values to a NA from an underlying
distribution. scipy/statsmodels doesn't have anything like this (yet)
but R and the others have it available, and it looks more popular in
bio-statistics.

(But similar to what Dag said, for statistical analysis it will be
necessary to keep case specific masks and data arrays around. I
haven't actually written any missing values algorithm yet, so I'm
quite again.)

Josef

> -Chris Jordan-Squire
>
>>
>> > My primary concern is that the np.NA stuff 'just
>> > works'. Especially since I've never run into use cases in statistics
>> > where
>> > the difference between IGNORE and NA mattered.
>> >
>> >
>> >>
>> >>
>> >> --
>> >> Christopher Barker, Ph.D.
>> >> Oceanographer
>> >>
>> >> Emergency Response Division
>> >> NOAA/NOS/OR&R            (206) 526-6959   voice
>> >> 7600 Sand Point Way NE   (206) 526-6329   fax
>> >> Seattle, WA  98115       (206) 526-6317   main reception
>> >>
>> >> chris.bar...@noaa.gov
>> >> ___
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion@scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> >
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> >
>> ___

Re: [Numpy-discussion] NA/Missing Data Conference Call Summary

2011-07-06 Thread josef . pktd
On Wed, Jul 6, 2011 at 3:38 PM, Christopher Jordan-Squire
 wrote:
>
>
> On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker 
> wrote:
>>
>> Christopher Jordan-Squire wrote:
>> > If we follow those rules for IGNORE for all computations, we sometimes
>> > get some weird output. For example:
>> > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix
>> > multiply and not * with broadcasting.) Or should that sort of operation
>> > through an error?
>>
>> That should throw an error -- matrix computation is heavily influenced
>> by the shape and size of matrices, so I think IGNORES really don't make
>> sense there.
>>
>>
>
> If the IGNORES don't make sense in basic numpy computations then I'm kinda
> confused why they'd be included at the numpy core level.
>
>>
>> Nathaniel Smith wrote:
>> > It's exactly this transparency that worries Matthew and me -- we feel
>> > that the alterNEP preserves it, and the NEP attempts to erase it. In
>> > the NEP, there are two totally different underlying data structures,
>> > but this difference is blurred at the Python level. The idea is that
>> > you shouldn't have to think about which you have, but if you work with
>> > C/Fortran, then of course you do have to be constantly aware of the
>> > underlying implementation anyway.
>>
>> I don't think this bothers me -- I think it's analogous to things in
>> numpy like Fortran order and non-contiguous arrays -- you can ignore all
>> that when working in pure python when performance isn't critical, but
>> you need a deeper understanding if you want to work with the data in C
>> or Fortran or to tune performance in python.
>>
>> So as long as there is an API to query and control how things work, I
>> like that it's hidden from simple python code.
>>
>> -Chris
>>
>>
>
> I'm similarly not too concerned about it. Performance seems finicky when
> you're dealing with missing data, since a lot of arrays will likely have to
> be copied over to other arrays containing only complete data before being
> handed over to BLAS.

Unless you know the neutral value for the computation or you just want
to do a forward_fill in time series, and you have to ask the user not
to give you an immutable array with NAs if they don't want extra
copies.

Josef

> My primary concern is that the np.NA stuff 'just
> works'. Especially since I've never run into use cases in statistics where
> the difference between IGNORE and NA mattered.
>
>
>>
>>
>> --
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> chris.bar...@noaa.gov
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Conditional random variables

2011-07-05 Thread josef . pktd
On Tue, Jul 5, 2011 at 12:26 PM, Ted To  wrote:
> On 07/05/2011 11:07 AM, josef.p...@gmail.com wrote:
>> For example sample x>=U and then sample y>=u-x. That's two univariate
>> normal samples.
>
> Ah, that's what I was looking for!  Many thanks!

just in case I wasn't clear, if x and y are correlated, then y: y>u-x
needs to be sampled from the conditional distribution y|x
http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Conditional_distributions

Josef

> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Conditional random variables

2011-07-05 Thread josef . pktd
On Tue, Jul 5, 2011 at 10:33 AM, Ted To  wrote:
> On 07/05/2011 10:17 AM, josef.p...@gmail.com wrote:
>> On Mon, Jul 4, 2011 at 10:13 PM, Ted To  wrote:
>>> Hi,
>>>
>>> Is there an easy way to make random draws from a conditional random
>>> variable?  E.g., draw a random variable, x conditional on x>=\bar x.
>>
>> If you mean here truncated distribution, then I asked a similar
>> question on the scipy user list a month ago for the normal
>> distribution.
>>
>> The answer was use rejection sampling, Gibbs or MCMC.
>>
>> I just sample from the original distribution and throw away those
>> values that are not in the desired range. This works fine if there is
>> only a small truncation, but not so well for distribution with support
>> only in the tails. It's reasonably fast for distributions that
>> numpy.random produces relatively fast.
>>
>> (Having a bi- or multi-variate distribution and sampling y conditional
>> on given x sounds more "fun".)
>
> Yes, that is what I had been doing but in some cases my truncations
> moves into the upper tail and it takes an extraordinary amount of time.
>  I found that I could use scipy.stats.truncnorm but I haven't yet
> figured out how to use it for a joint distribution.  E.g., I have 2
> normal rv's X and Y from which I would like to draw X and Y where X+Y>= U.
>
> Any suggestions?

If you only need to sample the sum Z=X+Y, then it would be just a
univariate normal again (in Z).

For the general case, I'm at least a month away from being able to
sample from a generic multivariate distribution. There is an integral
transform that does recursive conditioning y|x.
(like F^{-1} transform for multivariate distributions, used for
example for copulas)

For example sample x>=U and then sample y>=u-x. That's two univariate
normal samples.

Another trick I used for the tail is to take the absolute value around
the mean; because of symmetry you get twice as many valid samples.

I also never tried importance sampling and the other biased sampling procedures.

If you find something, then I'm also interested in a solution.

Cheers,

Josef


>
> Cheers,
> Ted To
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Conditional random variables

2011-07-05 Thread josef . pktd
On Mon, Jul 4, 2011 at 10:13 PM, Ted To  wrote:
> Hi,
>
> Is there an easy way to make random draws from a conditional random
> variable?  E.g., draw a random variable, x conditional on x>=\bar x.

If you mean here truncated distribution, then I asked a similar
question on the scipy user list a month ago for the normal
distribution.

The answer was use rejection sampling, Gibbs or MCMC.

I just sample from the original distribution and throw away those
values that are not in the desired range. This works fine if there is
only a small truncation, but not so well for distributions with support
only in the tails. It's reasonably fast for distributions that
numpy.random produces relatively fast.
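
(A minimal sketch of that rejection approach for a lower-truncated
normal; the function name is just illustrative:

import numpy as np

def sample_truncated_normal(lower, size, loc=0.0, scale=1.0):
    # draw from the untruncated normal and keep only x >= lower;
    # this gets very slow when `lower` is far in the upper tail,
    # which is exactly the problem described above
    out = np.empty(0)
    while out.size < size:
        x = np.random.normal(loc, scale, size=2 * size)
        out = np.concatenate([out, x[x >= lower]])
    return out[:size]

samples = sample_truncated_normal(1.0, 1000)
)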

(Having a bi- or multi-variate distribution and sampling y conditional
on given x sounds more "fun".)

Josef


>
> Thank you,
> Ted To
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] alterNEP - was: missing data discussion round 2

2011-07-02 Thread josef . pktd
On Sat, Jul 2, 2011 at 4:10 PM, Benjamin Root  wrote:
> On Fri, Jul 1, 2011 at 11:40 PM, Nathaniel Smith  wrote:
>>
>> I'm not sure what you mean here. If we have masked array support at
>> all (and some people seem to want it), then we have to say more than
>> "it's an array with a mask". Indexing such a beast has to do
>> *something*, so we need some kind of theory to say what, ufuncs have
>> to do *something*, ditto. I mean, I guess we could just say that a
>> masked array is literally an np.ndarray where you have attached a
>> field named "mask" that doesn't do anything, but I don't think that
>> would really satisfy most users :-).
>>
>
> Indexing a masked array just returns an array with np.NA in the appropriate
> elements.  This is no different than with regular ndarray objects or in
> numpy.ma.  As for ufuncs, the NEP already addresses this in multiple ways.
> For element-wise ufuncs, a "where" parameter is available for indicating
> which elements to skip.  For reduction ufuncs, a "skipna" parameter will
> indicate whether or not to skip the values.  On top of that, subclassed
> ndarrays (such as numpy.ma, I guess) can create a __ufunc_wrap__ function
> that can set a default value for those parameters to make things easier for
> masked array users.
>
>> I don't know about others, but my main objection is this: He's
>> proposing two different implementations for NA. I only need one, so
>> having two is redundant and confusing. Of these two, the bit-pattern
>> one has lower memory overhead (which many people have spoken up to say
>> matters to them), and really obvious semantics (assignment is
>> implemented as assignment, etc.). So why force people to make this
>> confusing choice? What does the mask implementation add? AFAICT, its
>> only purpose is to satisfy a rather different set of use cases. (See
>> Gary Strangman's email here for a good description of these use cases:
>> http://www.mail-archive.com/numpy-discussion@scipy.org/msg32385.html)
>> But AFAICT again, it's been crippled for those use cases in order to
>> give it the NA semantics. So I just don't see who the masking part is
>> supposed to help.
>>
>
> As a user of numpy.ma, masked arrays have always been a second-class citizen
> to me. Developing new code with it always brought about new surprises and
> discoveries of strange behavior from various functions. In this sense,
> numpy.ma has always been crippled.  By sacrificing *some* of the existing
> semantics (which would likely be taken care of by a re-implemented numpy.ma
> to preserve backwards-compatibility), the masked array community gains a
> first-class citizen in numpy, and numpy developers will have the
> masked/missing data issue in the forefront whenever developing new functions
> and libraries.  I am more than happy with that trade-off.  I am willing to
> learn the semantics so long as I have a guarantee that the functions I use
> behaves the way I expect them to.
>
>>
>> BTW, you can't access the memory of a masked value by taking a view,
>> at least if I'm reading this version of the NEP correctly, and it
>> seems to be the latest:
>>
>>  https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e41aa/doc/neps/c-masked-array.rst
>> The only way to access the memory of a masked value is take a view
>> *before* you mask it. And if the array has a mask at all when you take
>> the view, you also have to set a.flags.ownmask = True, before you mask
>> the value.
>
> This isn't actually as bad as it sounds.  From a function's perspective, it
> should only know the values that it has been given access to.  If I -- as a
> user of said function -- decide that certain values should be unknown to the
> function, I wouldn't want the function to be able to override that
> decision.  Remember, it is possible that the masked element never was
> initialized.  Therefore, we wouldn't want the function to use that element.
> (Note, this is one of those "fun" surprises that a numpy.ma user sometimes
> encounters when a function uses np.asarray instead of np.asanyarray).

But as far as I understand it, this takes away the ability to temporarily
fill in the masked values with values that are neutral for a
calculation, e.g. zero when taking a sum or dot product.
Instead it looks like a copy of the array has to be made in the new version.
(I'm thinking more correlate, convolution, linalg, scipy.signal, not
simple ufuncs. In many cases new arrays might be created anyway so the
loss from getting a copy of the non-NA data might not be so severe.)
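
(For concreteness, a small sketch of the "neutral fill value" idiom with
today's numpy.ma, not the proposed NA machinery:

import numpy as np

a = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
b = np.array([4.0, 5.0, 6.0])

# fill the masked entry with the value that is neutral for the operation
# (0 for a sum or dot product), then hand a plain ndarray to np.dot
res = np.dot(a.filled(0.0), b)     # 1*4 + 0*5 + 3*6 = 22.0
)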

I guess the "fun" surprises will remain fun since most functions in
scipy or other libraries won't suddenly learn how to handle masked
arrays or NAs. What happens if you feed the new animals to linalg.svd,
or linalg.inv or fft ... that are all designed for asarray and not for
asanyarray?

Josef

>
> Ben Root
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] Moving lib.recfunctions?

2011-07-01 Thread josef . pktd
On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold  wrote:
> lib.recfunctions has never been fully advertised. The two bugs I just
> discovered lead me to believe that it's not that well vetted, but it
> is useful. I can't be the only one using these?
>
> What do people think of either deprecating lib.recfunctions or at
> least importing them into the numpy.rec namespace?
>
> I'm sure this has come up before, but gmane search isn't working for me.

about once a year

http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655

my guess is not much has changed since then

Josef

>
> Skipper
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fitting Log normal or truncated log normal distribution to three points

2011-06-30 Thread josef . pktd
On Thu, Jun 30, 2011 at 5:25 AM, Christoph Deil
 wrote:
>
> On Jun 30, 2011, at 10:03 AM, Sachin Kumar Sharma wrote:
>
> Hi,
>
> I have three points 10800, 81100, 582000.
>
> What is the easiest way of fitting a log normal and truncated log normal
> distribution to these three points using numpy.
>
> The lognormal and maximum likelihood fit is available in scipy.stats. It's
> easier to use this than to implement your own fit in numpy, although if you
> can't use scipy for some reason you can have a look there on how to
> implement it in numpy.
> The rest of this reply uses scipy.stats, so if you are only interested in
> numpy, please stop reading.
> >>> data = [10800, 81100, 582000]
> >>> import scipy.stats
> >>> print scipy.stats.lognorm.extradoc
>
> Lognormal distribution
> lognorm.pdf(x,s) = 1/(s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2)
> for x > 0, s > 0.
> If log x is normally distributed with mean mu and variance sigma**2,
> then x is log-normally distributed with shape parameter sigma and scale
> parameter exp(mu).
>
> >>> scipy.stats.lognorm.fit(data)
> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280:
> RuntimeWarning: invalid value encountered in subtract
>   and max(abs(fsim[0]-fsim[1:])) <= ftol):
> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280:
> RuntimeWarning: invalid value encountered in absolute
>   and max(abs(fsim[0]-fsim[1:])) <= ftol):
> (1.0, 30618.493505379971, 117675.94879488947)
> >>> scipy.stats.lognorm.fit(data, floc=0, fscale=1)
> [11.40507812522, 0, 1]
> See http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html for
> further info on how the location and scale parameter are handled, basically
> the x in the lognormal.pdf formula above is x = (x_actual - loc) / scale.
> Now how to fit a truncated lognorm?
> First note that because log(x) can only be computed for x>0,
> scipy.stats.lognorm is already truncated to x > 0.
> >>> scipy.stats.lognorm.cdf(0, 1) # arguments: (x, s)
> 0.0
> >>> scipy.stats.lognorm.cdf(1, 1) # arguments: (x, s)
> 0.5
> >>> scipy.stats.lognorm.cdf(2, 1) # arguments: (x, s)
> 0.75589140421441725
> As documented here, you can use parameters a and b to set the support of the
> distribution (i.e. the lower and upper truncation points):
> >>> help(scipy.stats.rv_continuous)
>  |  a : float, optional
>  |      Lower bound of the support of the distribution, default is minus
>  |      infinity.
>  |  b : float, optional
>  |      Upper bound of the support of the distribution, default is plus
>  |      infinity.
> However when I try to use the a, b parameters to call pdf() (as a simpler
> method than fit() to check if it works)  I run into the following problem:
> >>> scipy.stats.lognorm.pdf(1, 1)
> 0.3989422804014327
> >>> scipy.stats.lognorm(a=1).pdf(1, 1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: pdf() takes exactly 2 arguments (3 given)
> >>> scipy.stats.lognorm(a=1).pdf(1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File
> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py",
> line 335, in pdf
>     return self.dist.pdf(x, *self.args, **self.kwds)
>   File
> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py",
> line 1113, in pdf
>     place(output,cond,self._pdf(*goodargs) / scale)
> TypeError: _pdf() takes exactly 3 arguments (2 given)
> For a distribution without parameters besides (loc, scale), setting a works:
> >>> scipy.stats.norm(a=-2).pdf(3)
> 0.0044318484119380075
> Is this a bug or am I simply using it wrong?
> It would be nice
> if http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html contained
> an example or two of how to use the a, b, xa, xb, xtol parameters of
> scipy.stats.rv_continuous .

in your last example a=-2 is ignored

>>> stats.norm.pdf(3)
0.0044318484119380075

a, b, xa, xb are attributes of the distribution that are set when the
distribution instance is created.

changing a and b doesn't make much sense since you would truncate the
support of the distribution without changing the distribution

>>> stats.norm.pdf(-3)
0.0044318484119380075
>>> stats.norm.a = -2
>>> stats.norm.pdf(-3)
0.0

stats.norm and the other distributions are instances, and if you
change a in one of them it will stay that way and might mess up other
calculations you might want to do.

xa, xb are only used internally as limits for the inversion of the cdf
(limits to fsolve) and I don't think they have any other effect.

what's a truncated lognormal? left- or right-side truncated?
I think it might need to be created by subclassing.

there might be a direct way of estimating lognormal parameters from
log(x).mean() and log(x).std() if x is a lognormal sample.
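
(Something along these lines, I think -- just a sketch; scipy's lognorm
uses shape s = sigma and scale = exp(mu), with loc fixed at 0:

import numpy as np
from scipy import stats

data = np.array([10800.0, 81100.0, 582000.0])

logx = np.log(data)
mu, sigma = logx.mean(), logx.std()            # estimates on the log scale

fitted = stats.lognorm(sigma, scale=np.exp(mu))   # shape s = sigma, loc = 0
)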

(that's a scipy user question)

Josef


> Christoph
> 

Re: [Numpy-discussion] Multiply along axis

2011-06-29 Thread josef . pktd
On Wed, Jun 29, 2011 at 11:05 AM, Robert Elsner  wrote:
>
> Yeah great that was spot-on. And I thought I knew most of the slicing
> tricks. I combined it with a slice object so that
>
> idx_obj = [ None for i in xrange(a.ndim) ]

or
idx_obj = [None] * a.ndim

otherwise this is also what I do quite often

Josef
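
(Putting the pieces of this thread together, a small sketch of a general
helper -- the name is just illustrative:

import numpy as np

def multiply_along_axis(a, x, axis):
    # index object: np.newaxis everywhere except a full slice on `axis`
    idx = [None] * a.ndim
    idx[axis] = slice(None)
    return a * x[tuple(idx)]

a = np.random.random((2, 3))
x = np.zeros(2)
out = multiply_along_axis(a, x, 0)    # same as a * x[:, None]
)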

> idx_obj[axis] = slice(None)
>
> a * x[idx_object]
>
> works the way I want it. Suggestions are welcome but I am happy with the
> quick solution you pointed out. Thanks
>
>
> On 29.06.2011 16:38, Skipper Seabold wrote:
>> On Wed, Jun 29, 2011 at 10:32 AM, Robert Elsner  wrote:
>>> Hello everyone,
>>>
>>> I would like to solve the following problem (preferably without
>>> reshaping / flipping the array a).
>>>
>>> Assume I have a vector v of length x and an n-dimensional array a where
>>> one dimension has length x as well. Now I would like to multiply the
>>> vector v along a given axis of a.
>>>
>>> Some example code
>>>
>>> a = np.random.random((2,3))
>>> x = np.zeros(2)
>>>
>>> a * x   # Fails because not broadcastable
>>>
>>>
>>> So how do I multiply x with the columns of a so that for each column j
>>> a[:,j] = a[:,j] * x
>>>
>>> without using a loop. Is there some (fast) fast way to accomplish that
>>> with numpy/scipy?
>>>
>> Is this what you want?
>>
>> a * x[:,None]
>>
>> or
>>
>> a *  x[:,np.newaxis]
>>
>> or more generally
>>
>> a * np.expand_dims(x,1)
>>
>> Skipper
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] missing data discussion round 2

2011-06-27 Thread josef . pktd
On Mon, Jun 27, 2011 at 5:01 PM, Mark Wiebe  wrote:
> On Mon, Jun 27, 2011 at 2:59 PM,  wrote:
>>
>> On Mon, Jun 27, 2011 at 2:24 PM, eat  wrote:
>> >
>> >
>> > On Mon, Jun 27, 2011 at 8:53 PM, Mark Wiebe  wrote:
>> >>
>> >> On Mon, Jun 27, 2011 at 12:44 PM, eat  wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> On Mon, Jun 27, 2011 at 6:55 PM, Mark Wiebe  wrote:
>> 
>>  First I'd like to thank everyone for all the feedback you're
>>  providing,
>>  clearly this is an important topic to many people, and the discussion
>>  has
>>  helped clarify the ideas for me. I've renamed and updated the NEP,
>>  then
>>  placed it into the master NumPy repository so it has a more permanent
>>  home
>>  here:
>>  https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
>>  In the NEP, I've tried to address everything that was raised in the
>>  original thread and in Nathaniel's followup 'Concepts' thread. To
>>  deal with
>>  the issue of whether a mask is True or False for a missing value,
>>  I've
>>  removed the 'mask' attribute entirely, except for ufunc-like
>>  functions
>>  np.ismissing and np.isavail which return the two styles of masks.
>>  Here's a
>>  high level summary of how I'm thinking of the topic, and what I will
>>  implement:
>>  Missing Data Abstraction
>>  There appear to be two useful ways to think about missing data that
>>  are
>>  worth supporting.
>>  1) Unknown yet existing data
>>  2) Data that doesn't exist
>>  In 1), an NA value causes outputs to become NA except in a small
>>  number
>>  of exceptions such as boolean logic, and in 2), operations treat the
>>  data as
>>  if there were a smaller array without the NA values.
>>  Temporarily Ignoring Data
>>  In some cases, it is useful to flag data as NA temporarily, possibly
>>  in
>>  several different ways, for particular calculations or testing out
>>  different
>>  ways of throwing away outliers. This is independent of the missing
>>  data
>>  abstraction, still requiring a choice of 1) or 2) above.
>>  Implementation Techniques
>>  There are two mechanisms generally used to implement missing data
>>  abstractions,
>>  1) An NA bit pattern
>>  2) A mask
>>  I've described a design in the NEP which can include both techniques
>>  using the same interface. The mask approach is strictly more general
>>  than
>>  the NA bit pattern approach, except for a few things like the idea of
>>  supporting the dtype 'NA[f8,InfNan]' which you can read about in the
>>  NEP.
>>  My intention is to implement the mask-based design, and possibly also
>>  implement the NA bit pattern design, but if anything gets cut it will
>>  be the
>>  NA bit patterns.
>>  Thanks again for all your input so far, and thanks in advance for
>>  your
>>  suggestions for improving this new revision of the NEP.
>> >>>
>> >>> A very impressive PEP indeed.
>> >
>> > Hi,
>> >>>
>> >>> However, how would corner cases, like
>> >>>
>> >>> >>> a = np.array([np.NA, np.NA], dtype='f8', masked=True)
>> >>> >>> np.mean(a, skipna=True)
>> >>
>> >> This should be equivalent to removing all the NA values, then calling
>> >> mean, like this:
>> >> >>> b = np.array([], dtype='f8')
>> >> >>> np.mean(b)
>> >>
>> >>
>> >> /home/mwiebe/virtualenvs/dev/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2374:
>> >> RuntimeWarning: invalid value encountered in double_scalars
>> >>   return mean(axis, dtype, out)
>> >> nan
>> >>>
>> >>> >>> np.mean(a)
>> >>
>> >> This would return NA, since NA values are sitting in positions that
>> >> would
>> >> affect the output result.
>> >
>> > OK.
>> >>
>> >>
>> >>>
>> >>> be handled?
>> >>> My concern here is that there always seems to be such corner cases
>> >>> which
>> >>> can only be handled with specific context knowledge. Thus producing
>> >>> 100%
>> >>> generic code to handle 'missing data' is not doable.
>> >>
>> >> Working out the corner cases for the functions that are already in
>> >> numpy
>> >> seems tractable to me, how to or whether to support missing data is
>> >> something the author of each new function will have to consider when
>> >> missing
>> >> data support is in NumPy, but I don't think we can do more than provide
>> >> the
>> >> mechanisms for people to use.
>> >
>> > Sure. I'll ride up with this and wait when I'll have some tangible to
>> > outperform the 'traditional' NaN handling.
>> > - eat
>>
>> Just a question how things would work with the new model.
>> How can you implement the "use" keyword from R's cov (or cor), with
>> minimal data copying?
>>
>> I think the basic masked array version would (or does) just assign 0
>> to the missing values, calculate the covariance or correlation, and then
>> correct with the correct count.
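
(For reference, numpy.ma already has cov/corrcoef that handle the mask by
propagating it pair-wise -- a small sketch, not the proposed NA mechanism:

import numpy as np

x = np.ma.array([1.0, 2.0, 3.0, 4.0], mask=[0, 0, 1, 0])
y = np.ma.array([2.0, 1.0, 4.0, 3.0], mask=[0, 1, 0, 0])

# masked entries are propagated pair-wise before the covariance is computed
c = np.ma.cov(x, y)
)
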
>>
>> 
>> cov(x, y = NULL, use = "everything",
>>    meth
