[Numpy-discussion] Error in Covariance and Variance calculation

2020-03-20 Thread Gunter Meissner
Dear Programmers,

This is Gunter Meissner. I am currently writing a book on Forecasting and
derived the regression coefficient with NumPy:

import numpy as np
X=[1,2,3,4]
Y=[1,8000,5000,1000]
print(np.cov(X,Y))
print(np.var(X))
Beta1 = np.cov(X,Y)/np.var(X)
print(Beta1)



However, NumPy is using the SAMPLE covariance (which divides by n-1) and the
POPULATION variance VarX (which divides by n). Therefore the regression
coefficient Beta1 is not correct.

The solution is easy: Please use the population approach (dividing by n) for
BOTH covariance and variance, or use the sample approach (dividing by n-1)
for BOTH covariance and variance. You may also allow the user to choose, as in
EXCEL, where the user can choose between Var.S and Var.P and between Cov.S and
Cov.P.
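To make the two conventions concrete, here is a minimal pure-Python sketch of the arithmetic. The helper names `mean`, `covariance` and `variance` are hypothetical and written only for this illustration; the parameter name `ddof` is borrowed from NumPy's keyword, where the divisor is n - ddof. The sketch suggests that mixing the two divisors inflates the slope by a factor of n/(n-1), while either consistent choice gives the same slope.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(x, y, ddof):
    """Sum of cross-deviations divided by (n - ddof)."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cross / (len(x) - ddof)


def variance(x, ddof):
    """Variance is the covariance of a series with itself."""
    return covariance(x, x, ddof)


X = [1, 2, 3, 4]
Y = [1, 8000, 5000, 1000]
n = len(X)

# Same divisor in numerator and denominator: the divisor cancels.
beta_sample = covariance(X, Y, ddof=1) / variance(X, ddof=1)
beta_population = covariance(X, Y, ddof=0) / variance(X, ddof=0)

# Sample covariance over population variance: the divisors do not cancel.
beta_mixed = covariance(X, Y, ddof=1) / variance(X, ddof=0)

print(beta_sample, beta_population, beta_mixed)
print(beta_mixed / beta_sample, n / (n - 1))
```

For this data the consistent slope is about -0.3 and the mixed one about -0.4.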

 

Thanks!!!

Gunter

Gunter Meissner, PhD
University of Hawaii
Adjunct Professor of MathFinance at Columbia University and NYU
President of Derivatives Software www.dersoft.com
CEO Cassandra Capital Management www.cassandracm.com
CV: www.dersoft.com/cv.pdf
Email: meiss...@hawaii.edu
Tel: USA (808) 779 3660

From: NumPy-Discussion On Behalf Of Ralf Gommers
Sent: Wednesday, March 18, 2020 5:16 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Proposal: NEP 41 -- First step towards a new Datatype System

On Tue, Mar 17, 2020 at 9:03 PM Sebastian Berg <sebast...@sipsolutions.net> wrote:

Hi all,

in the spirit of trying to keep this moving, can I assume that the main
reason for little discussion is that the actual changes proposed are
not very far reaching as of now?  Or is the reason that this is a
fairly complex topic and you need more time to think about it?

Probably (a) it's a long NEP on a complex topic, (b) the past week has been a
very weird week for everyone (in the extra-news-reading-time I could easily
have re-reviewed the NEP), and (c) the amount of feedback one expects to get on
a NEP is roughly inversely proportional to the scope and complexity of the NEP
contents.

Today I re-read the parts I commented on before. This version is a big
improvement over the previous ones. Thanks in particular for adding clear
examples and the diagram, it helps a lot.

If it is the latter, is there some way I can help with it?  I tried to
minimize how much is part of this initial NEP.

If there is not much need for discussion, I would like to officially
accept the NEP very soon, sending out an official one week notice in
the next days.

I agree. I think I would like to keep the option open though to come back to
the NEP later to improve the clarity of the text about
motivation/plan/examples/scope, given that this will be the reference for a
major amount of work for a long time to come.

To summarize one more time, the main point is that:

This point seems fine, and I'm +1 for going ahead with the described parts of
the technical design.

Cheers,

Ralf

 


type(np.dtype(np.float64))

will be `np.dtype[float64]`, a subclass of dtype, so that:

issubclass(np.dtype[float64], np.dtype)

is true. This means that we will have one class for every current type
number: `dtype.num`. The implementation of these subclasses will be a
C-written (extension) MetaClass; all details of this class are supposed
to remain experimental and in flux at this time.
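As a rough illustration of the point above, the following sketch inspects the class of two dtype instances. It assumes a NumPy release that implements this part of NEP 41; on older releases `type(...)` simply returns `np.dtype` itself, so the two classes would be identical. The sketch goes through `type()` rather than the `np.dtype[float64]` spelling, since how that subscript behaves is not something the text above pins down.

```python
import numpy as np

float64_dtype = np.dtype(np.float64)
int32_dtype = np.dtype(np.int32)

# Under the proposal each dtype instance has its own class,
# and that class is a subclass of np.dtype.
float64_class = type(float64_dtype)
int32_class = type(int32_dtype)

print(float64_class)
print(issubclass(float64_class, np.dtype))

# Whether the two classes differ depends on the NumPy version (see above).
print(float64_class is int32_class)

# One class per type number: the numbers themselves are distinct.
print(float64_dtype.num, int32_dtype.num)
```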

Cheers

Sebastian


On Wed, 2020-03-11 at 17:02 -0700, Sebastian Berg wrote:
> Hi all,
> 
> I am pleased to propose NEP 41: First step towards a new Datatype
> System https://numpy.org/neps/nep-0041-improved-dtype-support.html
> 
> This NEP motivates the larger restructure of the datatype machinery in
> NumPy and defines a few fundamental design aspects. The long term user
> impact will be allowing easier and more feature-rich user defined
> datatypes.
> 
> As this is a large restructure, the NEP represents only the first steps
> with some additional information in further NEPs being drafted [1]
> (this may be helpful to look at depending on the level of detail you
> are interested in).
> The NEP itself does not propose to add significant new public API.
> Instead it proposes to move forward with an incremental internal
> refactor and lays the foundation for this process.
> 
> The main user facing change at this time is that datatypes will become
> classes (e.g. ``type(np.dtype("float64"))`` will be a float64 specific
> class).
> For most users, the main impact should be many new datatypes in the
> long run (see the user impact section). However, for those interested
> in API design within NumPy or with respect to implementing new
> datatypes, this and the following NEPs are important decisions in the
> future roadmap for NumPy.
> 
> The current full text is reproduced below, although the above link is
> probably a better way to read it.
> 
> Cheers
> 
> Sebastian

Re: [Numpy-discussion] Error in Covariance and Variance calculation

2020-03-20 Thread Warren Weckesser
On 3/20/20, Gunter Meissner wrote:
> Dear Programmers,
>
>
>
> This is Gunter Meissner. I am currently writing a book on Forecasting and
> derived the regression coefficient with Numpy:
>
>
>
> import numpy as np
> X=[1,2,3,4]
> Y=[1,8000,5000,1000]
> print(np.cov(X,Y))
> print(np.var(X))
> Beta1 = np.cov(X,Y)/np.var(X)
> print(Beta1)
>
>
>
> However, NumPy is using the SAMPLE covariance (which divides by n-1) and
> the POPULATION variance VarX (which divides by n). Therefore the
> regression coefficient Beta1 is not correct.
>
> The solution is easy: Please use the population approach (dividing by n) for
> BOTH covariance and variance, or use the sample approach (dividing by n-1)
> for BOTH covariance and variance. You may also allow the user to choose, as
> in EXCEL, where the user can choose between Var.S and Var.P and between
> Cov.S and Cov.P.
>
>
>
> Thanks!!!
>
> Gunter
>


Gunter,

This is an unfortunate discrepancy in the API:  `var` uses the default
`ddof=0`, while `cov` uses, in effect, `ddof=1` by default.

You can get the consistent behavior you want by using `ddof=1` in both
functions.  E.g.

Beta1 = np.cov(X,Y, ddof=1) / np.var(X, ddof=1)

Using `ddof=1` in `np.cov` is redundant, but in this context, it is
probably useful to make explicit to the reader of the code that both
functions are using the same convention.
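A slightly fuller sketch of the same idea, for readers who want to check the numbers. It assumes the slope is wanted as a single scalar: `np.cov(X, Y)` returns the full 2x2 covariance matrix, so the sketch picks out the off-diagonal entry `[0, 1]`. The variable names and the `np.polyfit` cross-check are additions for illustration only.

```python
import numpy as np

X = [1, 2, 3, 4]
Y = [1, 8000, 5000, 1000]

# Entry [0, 1] of the 2x2 covariance matrix is cov(X, Y).
cov_sample = np.cov(X, Y, ddof=1)[0, 1]   # divides by n - 1
cov_pop = np.cov(X, Y, ddof=0)[0, 1]      # divides by n

var_sample = np.var(X, ddof=1)            # divides by n - 1
var_pop = np.var(X, ddof=0)               # divides by n

# Consistent conventions give the same slope either way.
beta1_sample = cov_sample / var_sample
beta1_pop = cov_pop / var_pop

# The defaults mix the conventions: cov uses ddof=1, var uses ddof=0.
beta1_mixed = np.cov(X, Y)[0, 1] / np.var(X)

# Cross-check against an ordinary least-squares line fit.
slope, intercept = np.polyfit(X, Y, 1)

print(beta1_sample, beta1_pop, beta1_mixed, slope)
```

With this data the consistent versions and the least-squares slope agree at about -0.3, while the mixed defaults give about -0.4.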

Changing the default in either function breaks backwards
compatibility.   That would require a long and potentially painful
deprecation process.

Warren




[Numpy-discussion] New Wednesday Community Meeting Time slot (Current 11:00am California time)

2020-03-20 Thread Sebastian Berg
Hi all,

since we are spread out all over the world, we are considering moving
the Community meeting time. If you are interested in joining
occasionally, please fill out the doodle:

https://doodle.com/poll/p3gik4xxdra93cwt

I have currently limited the times to hourly times in the California
work day.
The most likely change will be a slight shift to make it later than the
current 11:00am California time to still be reasonable for those from
Europe. But if you are generally interested and never joined due to the
time, now is a good time to bring it up :).

Note that this meeting is bi-weekly on Wednesday; I do not anticipate
moving the day itself for now.

This is primarily for the Community and not the Triage meeting.

All the best,

Sebastian


NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Error in Covariance and Variance calculation

2020-03-20 Thread Gunter Meissner
Thanks Warren! Worked like a charm 😊 Will mention you in the book...


Gunter Meissner, PhD
University of Hawaii
Adjunct Professor of MathFinance at Columbia University and NYU
President of Derivatives Software www.dersoft.com  
CEO Cassandra Capital Management www.cassandracm.com 
CV: www.dersoft.com/cv.pdf 
Email: meiss...@hawaii.edu
Tel: USA (808) 779 3660


